Docker Failed to Start After a Trivy Scan

Root Cause Analysis, Deep Troubleshooting, and Final Fix
Today I ran into a Docker issue that initially looked like a simple service failure, but it turned into a valuable real-world DevOps learning experience.
What made this issue tricky was not the error itself, but the sequence of events that led to it:
Docker was already installed on my system (I forgot about it)
I installed Trivy for vulnerability scanning
I interrupted a Trivy image scan midway
Docker stopped starting
Reinstalling Docker did not fix the problem
In this post, I’ll explain what happened, why it happened, how I troubleshot it step by step, and how I finally solved it.
Background
Before the issue occurred:
Docker had been installed earlier on my system
I installed Trivy to scan container images
I ran:
trivy image nginx:latest
I cancelled the scan midway using Ctrl + C
Later, when I tried to use Docker again, it completely failed to start
At first, I did not connect the Trivy scan to the Docker failure, which made the issue harder to reason about.
What Happened (The Problem)
When I checked the Docker service status:
sudo systemctl status docker
I saw:
docker.service - Docker Application Container Engine
Active: failed (Result: exit-code)
Start request repeated too quickly
This indicates that:
The Docker daemon (dockerd) tried to start
It crashed immediately
systemd retried multiple times
systemd eventually stopped retrying to prevent a restart loop
Checking Docker Logs
Next, I checked the Docker logs:
sudo journalctl -u docker.service -xe
Surprisingly, there was:
No meaningful error message
No stack trace
Only generic service failure logs
This is an important signal.
When Docker fails without useful logs, it often means the daemon is crashing very early during startup.
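When the journal is mostly systemd boilerplate, it can help to drop the -x explanations and keep only the lines the daemon itself wrote. The sketch below assumes dockerd's usual level=error / level=fatal log fields and its "failed to start daemon" message. The function name docker_error_lines is my own, not part of Docker or systemd.

```shell
# Hypothetical helper: reads journal text on stdin and keeps only lines
# that look like dockerd errors. The patterns are assumptions about the
# usual dockerd log format, so widen them if nothing matches.
docker_error_lines() {
  grep -Ei 'level=(error|fatal)|failed to start daemon' || true
}

# Typical use on the affected host (current boot only, plain output):
#   sudo journalctl -u docker.service -b --no-pager | docker_error_lines
```

If this prints nothing at all, that supports the "crashing very early" reading rather than contradicting it.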
Verifying containerd
Since Docker depends on containerd, the next step was to verify its status:
sudo systemctl status containerd
The service was active and running.
This strongly suggested that:
The OS was healthy
The kernel was not the issue
The container runtime was functioning correctly
The failure was isolated to Docker itself
Understanding dockerd vs containerd (Bonus Insight)
At this stage, understanding the distinction between dockerd and containerd became critical.
dockerd is Docker’s control plane. It manages images, containers, networks, volumes, and Docker’s internal state.
containerd is the lower-level container runtime that manages the container lifecycle and hands the actual execution to an OCI runtime such as runc.
In simple terms:
dockerd = orchestration and state management
containerd = container execution
Why This Matters in Troubleshooting
Because containerd was healthy, I could confidently rule out:
OS-level issues
Runtime-level failures
Kernel or dependency problems
That narrowed the issue to Docker’s internal state, most notably:
/var/lib/docker
If Docker’s metadata is corrupted, dockerd can crash before meaningful logs are generated — which matched the symptoms perfectly.
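The isolation logic above can be written down as a tiny decision helper. This is only a sketch of my own reasoning, not an official diagnostic: next_step is a made-up name, and it takes the strings that systemctl is-active prints for each service.

```shell
# Hypothetical helper: turns the two service states into a next step.
# Arguments are the strings printed by `systemctl is-active`
# (for example: active, inactive, failed).
next_step() {
  containerd_state=$1
  docker_state=$2
  if [ "$docker_state" = "active" ]; then
    echo "docker is running; nothing to isolate"
  elif [ "$containerd_state" = "active" ]; then
    echo "containerd is healthy; suspect /var/lib/docker or /etc/docker"
  else
    echo "containerd is not active; fix containerd first"
  fi
}

# On a systemd host:
#   next_step "$(systemctl is-active containerd)" "$(systemctl is-active docker)"
```

In my case the inputs were "active" and "failed", which lands on the second branch.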
What Trivy Actually Did
When running:
trivy image nginx:latest
Trivy:
Uses the Docker daemon
Triggers Docker to pull the image
Scans the image layers stored by Docker
During this process:
Docker was actively writing image layers and metadata
I interrupted the operation mid-way
Why Interrupting the Scan Broke Docker
Interrupting an image pull can leave Docker in an inconsistent state:
Partially downloaded layers
Broken metadata references
Invalid storage driver state (e.g., overlay2)
Docker is particularly sensitive during image operations.
As a result:
➡️ Docker metadata under /var/lib/docker became corrupted
➡️ dockerd crashed immediately on startup
Why Reinstalling Docker Did Not Fix the Issue
Before reinstalling, I removed Docker packages using apt:
sudo apt remove docker docker.io containerd runc
This removed:
Docker binaries
CLI tools
Service files
However, APT does not remove runtime data.
These directories remained:
/var/lib/docker
/etc/docker
So when Docker was reinstalled, it reused the same corrupted state and failed again.
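A quick way to see this for yourself is to check which of those paths survive the package removal. The helper below is a minimal sketch; leftover_paths is a hypothetical name, and it simply prints every argument that still exists on disk.

```shell
# Hypothetical helper: prints each given path that still exists.
leftover_paths() {
  for path in "$@"; do
    if [ -e "$path" ]; then
      echo "$path"
    fi
  done
}

# After removing the packages:
#   leftover_paths /var/lib/docker /etc/docker
```

Any path it prints will be picked up again by the next installation.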
Root Cause (Clear Statement)
Docker failed to start because its internal metadata under /var/lib/docker was corrupted by an interrupted image operation triggered by a Trivy scan while Docker was already installed.
Step-by-Step Troubleshooting Process
Confirmed Docker service failure
Checked Docker logs (no useful errors)
Verified containerd was running
Isolated the issue to Docker’s internal state
The Fix (Final Solution)
Stop Docker
sudo systemctl stop docker
Remove corrupted Docker state
sudo rm -rf /var/lib/docker
⚠️ This deletes all images, containers, volumes, and build cache, so only do it where that data is disposable (for example, a development environment).
Reset systemd state and restart Docker
sudo systemctl daemon-reexec
sudo systemctl reset-failed docker
sudo systemctl start docker
Quick explanation for beginners 👇
• systemctl daemon-reexec → re-executes the systemd manager itself while keeping its state (a heavier refresh than is usually needed)
• systemctl reset-failed docker → clears Docker’s failed status and its start rate limit counter so it can start again
• systemctl start docker → starts Docker normally
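If you would rather not delete the state outright, a more cautious variant of the removal step is to rename the directory and let Docker create a fresh one on the next start. This is a sketch of that idea, not what I actually ran; move_aside and the .bak-<timestamp> suffix are my own conventions.

```shell
# Hypothetical helper: renames a directory with a timestamp suffix
# instead of deleting it, and prints the new name.
move_aside() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  backup="${dir}.bak-$(date +%Y%m%d-%H%M%S)"
  mv -- "$dir" "$backup" && echo "$backup"
}

# With Docker stopped, in a root shell (sudo cannot call shell functions),
# in place of `rm -rf /var/lib/docker`:
#   move_aside /var/lib/docker
```

The old directory stays available for inspection, and it can be removed once Docker is confirmed healthy.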
Final Result
sudo systemctl status docker
Active: active (running)
Verification:
docker run hello-world
Docker started successfully.
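When the fix is scripted, it can be handy to confirm that the daemon actually answers before running anything against it. The sketch below polls docker info, assuming it exits non-zero while the daemon is unreachable; wait_for_docker and its two arguments are hypothetical names of mine.

```shell
# Hypothetical helper: polls `docker info` until the daemon answers or
# the attempts run out. Arguments: attempts (default 10) and the delay
# in seconds between attempts (default 1).
wait_for_docker() {
  attempts=${1:-10}
  delay=${2:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if docker info >/dev/null 2>&1; then
      echo "docker is ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "docker did not respond after $attempts attempts" >&2
  return 1
}

# After `sudo systemctl start docker`:
#   wait_for_docker 15 2 && docker run --rm hello-world
```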
Key Learnings
Docker failures are often state-related, not installation-related
Removing packages does not remove corrupted Docker metadata
Interrupting image pulls can break Docker
Checking containerd early helps isolate issues quickly
/var/lib/docker plays a critical role in Docker startup
Structured troubleshooting beats blind reinstalling
🧰 Debugging Checklist (Save This)
When Docker fails to start:
☑ Check Docker status
systemctl status docker
☑ Inspect Docker logs
journalctl -u docker.service -xe
☑ Verify containerd
systemctl status containerd
☑ If containerd is healthy and Docker crashes early:
Inspect /var/lib/docker
Suspect corrupted metadata
☑ Reset Docker state (dev environments):
sudo systemctl stop docker
sudo rm -rf /var/lib/docker
☑ Restart Docker cleanly
Conclusion
This issue reinforced an important DevOps principle:
Understanding system internals matters more than memorizing commands.
By reasoning through dependencies and state, I was able to identify the real root cause and fix the issue cleanly — without guesswork.




