Fix Docker Start Issue Post Trivy Scan

Root Cause Analysis, Deep Troubleshooting, and Final Fix

Today I ran into a Docker issue that initially looked like a simple service failure, but it turned into a valuable real-world DevOps learning experience.

What made this issue tricky was not the error itself, but the sequence of events that led to it:

Docker was already installed on my system (I forgot about it)
I installed Trivy for vulnerability scanning
I interrupted a Trivy image scan midway
Docker stopped starting
Reinstalling Docker did not fix the problem

In this post, I’ll explain what happened, why it happened, how I troubleshot it step by step, and how I finally solved it.

Background

Before the issue occurred:

Docker had been installed earlier on my system
I installed Trivy to scan container images
I ran:
```
trivy image nginx:latest
```
I cancelled the scan midway using Ctrl + C
Later, when I tried to use Docker again, it completely failed to start

At first, I did not connect the Trivy scan to the Docker failure, which made the issue harder to reason about.

What Happened (The Problem)

When I checked the Docker service status:

sudo systemctl status docker

I saw:

docker.service - Docker Application Container Engine
Active: failed (Result: exit-code)
Start request repeated too quickly

This indicates that:

Docker daemon (dockerd) tried to start
It crashed immediately
systemd retried multiple times
systemd eventually stopped retrying to prevent a restart loop
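The restart-loop signature can also be checked mechanically. Here is a minimal sketch, assuming the status or journal text is available on stdin. `hit_start_limit` is a hypothetical helper name, not part of systemd or Docker.

```shell
#!/usr/bin/env bash
# hit_start_limit: hypothetical helper. Succeeds when the text on stdin
# contains systemd's "Start request repeated too quickly" message.
hit_start_limit() {
  grep -q 'Start request repeated too quickly'
}

# Status text quoted in this post; on a real host, pipe in
# `systemctl status docker --no-pager` instead.
status_text='docker.service - Docker Application Container Engine
Active: failed (Result: exit-code)
Start request repeated too quickly'

if printf '%s\n' "$status_text" | hit_start_limit; then
  echo "systemd stopped retrying docker.service"
else
  echo "no restart-limit message found"
fi
```

A match means the service is parked in the failed state, so it will not come back until the underlying crash is fixed and the unit is started again.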

Checking Docker Logs

Next, I checked the Docker logs:

sudo journalctl -u docker.service -xe

Surprisingly, there was:

No meaningful error message
No stack trace
Only generic service failure logs

This is an important signal.

When Docker fails without useful logs, it often means the daemon is crashing very early during startup.
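An additional step that can help when the journal is this quiet (it is not one of the steps I took in this post) is to start the daemon in the foreground with debug logging, so startup errors print straight to the terminal. This is a sketch under assumptions: a systemd host, and a `docker.socket` unit that may or may not exist on your install. `print_debug_steps` is a hypothetical helper that only prints the commands, so nothing is stopped or started by accident.

```shell
#!/usr/bin/env bash
# print_debug_steps: hypothetical helper that prints, but does not run,
# the commands for a foreground debug start of the Docker daemon.
print_debug_steps() {
  cat <<'EOF'
sudo systemctl stop docker.socket docker.service
sudo dockerd --debug
EOF
}

print_debug_steps
```

Run the printed commands by hand, read the output, then stop the foreground daemon with Ctrl + C before going back to systemd.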

Verifying containerd

Since Docker depends on containerd, the next step was to verify its status:

sudo systemctl status containerd

The service was active and running.

This told me:

The OS was basically healthy
The kernel was unlikely to be the issue
The container runtime was functioning correctly
The failure was most likely isolated to Docker itself

Understanding dockerd vs containerd (Bonus Insight)

At this stage, understanding the distinction between dockerd and containerd became critical.

dockerd is Docker’s control plane. It manages images, containers, networks, volumes, and Docker’s internal state.
containerd is the lower-level container runtime. It manages the container lifecycle and hands the actual execution to an OCI runtime such as runc.

In simple terms:

dockerd = orchestration and state management
containerd = container lifecycle and execution

Why This Matters in Troubleshooting

Because containerd was healthy, I could treat these as unlikely:

OS-level issues
Runtime-level failures
Kernel or dependency problems

That narrowed the issue to Docker’s internal state, most notably:

/var/lib/docker

If Docker’s metadata is corrupted, dockerd can crash before it logs anything meaningful, which matched the symptoms.
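The narrowing-down logic can be written as a small decision helper. It is a sketch, assuming the two arguments are unit states as printed by `systemctl is-active`. `narrow_down` is a hypothetical name, and its messages are my own wording.

```shell
#!/usr/bin/env bash
# narrow_down: hypothetical helper encoding the reasoning in this section.
# $1 = state of docker, $2 = state of containerd (e.g. active, failed).
narrow_down() {
  local docker_state="$1" containerd_state="$2"
  if [ "$docker_state" = "active" ]; then
    echo "docker is running"
  elif [ "$containerd_state" = "active" ]; then
    echo "containerd is fine: suspect Docker's own state in /var/lib/docker"
  else
    echo "containerd is down too: look at the runtime and host first"
  fi
}

# The situation from this post: docker failed, containerd active.
narrow_down failed active
```

On a live host the call would be `narrow_down "$(systemctl is-active docker)" "$(systemctl is-active containerd)"`.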

What Trivy Actually Did

When running:

trivy image nginx:latest

Trivy:

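Connects to the Docker daemon through its API and looks for the image there
Reads the image's layers from Docker when the image is stored locally (otherwise Trivy fetches it from the registry)
Scans those layers for vulnerabilities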

During this process:

Docker was actively writing image layers and metadata
I interrupted the operation mid-way

Why Interrupting the Scan Broke Docker

Interrupting an image pull can leave Docker in an inconsistent state:

Partially downloaded layers
Broken metadata references
Invalid storage driver state (e.g., overlay2)

Image operations touch both layer data and metadata, so an interruption at the wrong moment can leave the two out of sync.

As a result:
➡️ Docker metadata under /var/lib/docker became corrupted
➡️ dockerd crashed immediately on startup
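One cheap way to look for damage before deleting anything is to search the data directory for empty metadata files. It is a sketch built on an assumption: a zero-byte `.json` file is one possible leftover of an interrupted write, not proof of corruption, and a clean result does not prove the state is healthy. `find_empty_json` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# find_empty_json: hypothetical helper. Lists zero-byte *.json files under
# the directory given as $1.
find_empty_json() {
  find "$1" -type f -name '*.json' -size 0 2>/dev/null
}

# Demo on a throwaway directory so the sketch is safe to run anywhere.
demo_dir="$(mktemp -d)"
printf '{}\n' > "$demo_dir/ok.json"
: > "$demo_dir/truncated.json"    # simulate a write that never finished
find_empty_json "$demo_dir"       # prints only the truncated.json path
rm -rf "$demo_dir"
```

Against the real directory, the equivalent one-liner is `sudo find /var/lib/docker -type f -name '*.json' -size 0`.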

Why Reinstalling Docker Did Not Fix the Issue

Before reinstalling, I removed Docker packages using apt:

sudo apt remove docker docker.io containerd runc

This removed:

Docker binaries
CLI tools
Service files

However, apt remove only removes the packages. It does not delete Docker's runtime data or configuration.

These directories remained:

/var/lib/docker
/etc/docker

So when Docker was reinstalled, it reused the same corrupted state and failed again.
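A quick check after removing the packages makes the leftover state visible. This is a minimal sketch; `leftover_dirs` is a hypothetical helper name, and the two paths are the ones listed in this section.

```shell
#!/usr/bin/env bash
# leftover_dirs: hypothetical helper. Prints each given path that still
# exists as a directory, e.g. after the Docker packages were removed.
leftover_dirs() {
  local path
  for path in "$@"; do
    [ -d "$path" ] && echo "$path"
  done
  return 0
}

leftover_dirs /var/lib/docker /etc/docker
```

Any path it prints will be picked up again by the next installation.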

Root Cause (Clear Statement)

Docker failed to start because its internal metadata under /var/lib/docker was corrupted, most likely by the image operation I interrupted during the Trivy scan. Reinstalling did not help because that state survived the package removal.

Step-by-Step Troubleshooting Process

Confirmed Docker service failure
Checked Docker logs (no useful errors)
Verified containerd was running
Isolated the issue to Docker’s internal state

The Fix (Final Solution)

Stop Docker

sudo systemctl stop docker

Remove corrupted Docker state

sudo rm -rf /var/lib/docker

⚠️ This deletes all images, containers, volumes, and build cache stored by Docker. Only do it where that data is disposable, such as a development environment.
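If you would rather keep the old data around for inspection, a gentler variant is to rename the directory instead of deleting it. Docker creates a fresh data directory on the next start. This is a sketch, not the fix I used; `set_aside` is a hypothetical helper name, and it has to run as root with Docker stopped.

```shell
#!/usr/bin/env bash
# set_aside: hypothetical helper. Renames a directory to <dir>.bak-<timestamp>
# instead of deleting it, and prints the new name on success.
set_aside() {
  local dir="${1%/}"                                  # drop a trailing slash
  local backup="${dir}.bak-$(date +%Y%m%d%H%M%S)"
  mv -- "$dir" "$backup" && echo "$backup"
}

# On a Docker host (root shell, Docker stopped):
#   set_aside /var/lib/docker
```

The renamed copy keeps using disk space, so delete it once you are sure nothing in it is needed.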

Reset systemd state and restart Docker

sudo systemctl daemon-reexec # refreshes systemd (like restarting the service manager)
sudo systemctl reset-failed docker # clears Docker’s failed status so it can start again
sudo systemctl start docker # starts Docker normally

Quick explanation for beginners 👇

• systemctl daemon-reexec → re-executes the systemd manager itself (a heavier step than daemon-reload, and often not needed for this fix)
• systemctl reset-failed docker → clears Docker’s failed status so it can start again
• systemctl start docker → starts Docker normally
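Put together, the fix can be sketched as one script with a dry-run switch, so the destructive step is printed before it is ever executed. `run` and `reset_docker_state` are hypothetical helper names. Stopping `docker.socket` as well is my assumption for socket-activated installs, where the socket can start the service again while the state is being removed.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reset_docker_state: hypothetical wrapper around the steps from this post.
reset_docker_state() {
  run sudo systemctl stop docker.socket || true   # unit may not exist everywhere
  run sudo systemctl stop docker.service &&
  run sudo rm -rf /var/lib/docker &&              # destroys images and containers
  run sudo systemctl daemon-reexec &&
  run sudo systemctl reset-failed docker.service &&
  run sudo systemctl start docker.service
}

reset_docker_state
```

Review the printed commands first, then run it again as `DRY_RUN=0 reset_docker_state` on a machine where losing the Docker data is acceptable.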

Final Result

sudo systemctl status docker

Active: active (running)

Verification:

docker run hello-world

Docker started successfully.

Key Learnings

Docker failures are often state-related, not installation-related
Removing packages does not remove corrupted Docker metadata
Interrupting image pulls can break Docker
Checking containerd early helps isolate issues quickly
/var/lib/docker plays a critical role in Docker startup
Structured troubleshooting beats blind reinstalling

🧰 Debugging Checklist (Save This)

When Docker fails to start:

☑ Check Docker status

systemctl status docker

☑ Inspect Docker logs

journalctl -u docker.service -xe

☑ Verify containerd

systemctl status containerd

☑ If containerd is healthy and Docker crashes early:

Inspect /var/lib/docker
Suspect corrupted metadata

☑ Reset Docker state (dev environments):

sudo rm -rf /var/lib/docker

☑ Restart Docker cleanly

Conclusion

This issue reinforced an important DevOps principle:

Understanding system internals matters more than memorizing commands.

By reasoning through dependencies and state, I was able to identify the real root cause and fix the issue cleanly — without guesswork.
