Docker Failed to Start After a Trivy Scan

Welcome! I’m Prajwal P. I stand at the intersection of technology and efficiency, exploring the dynamic world of DevOps ⚙️. From mastering Cloud infrastructure to orchestrating containers, I am passionate about automating the complex to create the simple. Join me as I document my learning curve, share technical insights, and navigate the ever-evolving landscape of software deployment.

Root Cause Analysis, Deep Troubleshooting, and Final Fix

Today I ran into a Docker issue that initially looked like a simple service failure, but it turned into a valuable real-world DevOps learning experience.

What made this issue tricky was not the error itself, but the sequence of events that led to it:

  • Docker was already installed on my system (I forgot about it)

  • I installed Trivy for vulnerability scanning

  • I interrupted a Trivy image scan midway

  • Docker stopped starting

  • Reinstalling Docker did not fix the problem

In this post, I’ll explain what happened, why it happened, how I troubleshot it step by step, and how I finally solved it.


Background

Before the issue occurred:

  • Docker had been installed earlier on my system

  • I installed Trivy to scan container images

  • I ran:

      trivy image nginx:latest
    
  • I cancelled the scan midway using Ctrl + C

  • Later, when I tried to use Docker again, it completely failed to start

At first, I did not connect the Trivy scan to the Docker failure, which made the issue harder to reason about.


What Happened (The Problem)

When I checked the Docker service status:

sudo systemctl status docker

I saw:

docker.service - Docker Application Container Engine
Active: failed (Result: exit-code)
Start request repeated too quickly

This indicates that:

  • Docker daemon (dockerd) tried to start

  • It crashed immediately

  • systemd retried multiple times

  • systemd eventually stopped retrying to prevent a restart loop
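
To check for this pattern from a script, one option is to search the status output for the rate-limit message. This is a minimal sketch that assumes the exact message text shown above; restart_loop_detected is a hypothetical helper name, not part of systemd or Docker.

```shell
# Hypothetical helper: reads `systemctl status` text on stdin and succeeds
# if systemd reported that it stopped retrying the unit.
restart_loop_detected() {
    grep -q 'Start request repeated too quickly'
}
```

It could be used like this:

sudo systemctl status docker --no-pager | restart_loop_detected && echo "systemd gave up restarting docker"

The limits that systemd applies to the unit can be read with systemctl show docker -p Restart -p StartLimitBurst -p StartLimitIntervalUSec.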


Checking Docker Logs

Next, I checked the Docker logs:

sudo journalctl -u docker.service -xe

Surprisingly, there was:

  • No meaningful error message

  • No stack trace

  • Only generic service failure logs

This is an important signal.

When Docker fails without useful logs, it often means the daemon is crashing very early during startup.
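
When the journal is this quiet, two things are worth trying. One is to run the daemon in the foreground with sudo dockerd --debug (with the service stopped), so startup messages print straight to the terminal. The other is to filter whatever log lines exist down to errors. The sketch below does the second. It assumes the level=... fields that dockerd normally writes, and docker_error_lines is a hypothetical helper name.

```shell
# Hypothetical helper: keeps only error/fatal lines from dockerd log text on stdin.
# Assumes the logfmt-style `level=` field that dockerd normally writes.
# `|| true` keeps "no matches" from being reported as a failure.
docker_error_lines() {
    grep -E 'level=(error|fatal)' || true
}
```

It could be used like this:

sudo journalctl -u docker.service -b --no-pager | docker_error_lines

Empty output here is consistent with a daemon that dies before it logs anything useful.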


Verifying containerd

Since Docker depends on containerd, the next step was to verify its status:

sudo systemctl status containerd

The service was active and running.

This made the following likely:

  • The OS was healthy

  • The kernel was not the issue

  • The container runtime was functioning correctly

  • The failure was isolated to Docker itself
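
The same reasoning can be written down as a small decision helper. This is only a sketch of the triage order I followed; triage_docker_stack is a hypothetical name, and the inputs are the state strings that systemctl is-active prints (active, inactive, failed, and so on).

```shell
# Hypothetical helper: takes the states printed by `systemctl is-active`
# for containerd and docker, and prints where to look next.
triage_docker_stack() {
    containerd_state=$1
    docker_state=$2
    if [ "$containerd_state" != "active" ]; then
        echo "containerd is $containerd_state: start with the runtime"
    elif [ "$docker_state" != "active" ]; then
        echo "containerd is active, docker is $docker_state: look at dockerd and its state"
    else
        echo "both services are active"
    fi
}
```

It could be used like this:

triage_docker_stack "$(systemctl is-active containerd)" "$(systemctl is-active docker)"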


Understanding dockerd vs containerd (Bonus Insight)

At this stage, understanding the distinction between dockerd and containerd became critical.

  • dockerd is Docker’s control plane. It manages images, containers, networks, volumes, and Docker’s internal state.

  • containerd is the low-level container runtime that actually executes containers and manages their lifecycle.

In simple terms:

  • dockerd = orchestration and state management

  • containerd = container execution


Why This Matters in Troubleshooting

Because containerd was healthy, I could treat these as unlikely:

  • OS-level issues

  • Runtime-level failures

  • Kernel or dependency problems

That narrowed the issue to Docker’s internal state, most notably:

/var/lib/docker

If Docker’s metadata is corrupted, dockerd can crash before it writes any meaningful logs, which matched the symptoms I was seeing.
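
Before touching anything, it can help to simply look at what is in that directory. The helper below is a minimal sketch: list_docker_state is a hypothetical name, and it assumes the default data root of /var/lib/docker (a custom data-root setting would change the path).

```shell
# Hypothetical helper: prints the top-level entries of Docker's data root,
# one per line, so missing or unexpected directories stand out.
# Defaults to /var/lib/docker; pass another path to inspect a copy.
list_docker_state() {
    root=${1:-/var/lib/docker}
    if [ ! -d "$root" ]; then
        echo "no data root at $root" >&2
        return 1
    fi
    ls -1A "$root"
}
```

The directory is normally readable only by root, so the function is meant to live in a script that is started with sudo.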


What Trivy Actually Did

When running:

trivy image nginx:latest

Trivy:

  • Uses the Docker daemon

  • Triggers Docker to pull the image

  • Scans the image layers stored by Docker

During this process:

  • Docker was actively writing image layers and metadata

  • I interrupted the operation mid-way


Why Interrupting the Scan Broke Docker

Interrupting an image pull can leave Docker in an inconsistent state:

  • Partially downloaded layers

  • Broken metadata references

  • Invalid storage driver state (e.g., overlay2)

Docker is particularly sensitive during image operations.

As a result:
➡️ Docker metadata under /var/lib/docker became corrupted
➡️ dockerd crashed immediately on startup


Why Reinstalling Docker Did Not Fix the Issue

Before reinstalling, I removed Docker packages using apt:

sudo apt remove docker docker.io containerd runc

This removed:

  • Docker binaries

  • CLI tools

  • Service files

However, apt remove does not delete Docker’s runtime data or its configuration.

These directories remained:

  • /var/lib/docker

  • /etc/docker

So when Docker was reinstalled, it reused the same corrupted state and failed again.
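
A quick check after removing the packages would have shown this. The sketch below prints whichever of the given paths still exist; leftover_paths is a hypothetical helper name.

```shell
# Hypothetical helper: prints each given path that still exists on disk.
leftover_paths() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "$path"
        fi
    done
}
```

It could be used like this, right after the apt remove step:

leftover_paths /var/lib/docker /etc/docker

Any path it prints will be picked up again by a fresh install.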


Root Cause (Clear Statement)

Docker failed to start because its internal metadata under /var/lib/docker was corrupted when I interrupted an image operation triggered by a Trivy scan. Reinstalling did not help because that directory survived the package removal.


Step-by-Step Troubleshooting Process

  1. Confirmed Docker service failure

  2. Checked Docker logs (no useful errors)

  3. Verified containerd was running

  4. Isolated the issue to Docker’s internal state


The Fix (Final Solution)

Stop Docker

sudo systemctl stop docker

Remove corrupted Docker state

sudo rm -rf /var/lib/docker

⚠️ This deletes all images, containers, volumes, and build cache. It is only safe where that data is disposable, such as a development environment.
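
A more cautious alternative, which I did not use but which fits the same step, is to rename the directory instead of deleting it, so the old state can still be inspected or restored. This is a sketch under that assumption; set_aside is a hypothetical helper name.

```shell
# Hypothetical helper: renames a directory to <dir>.bak-<timestamp> instead of
# deleting it, and prints the new name. The old state can be removed later,
# once Docker is confirmed healthy.
set_aside() {
    dir=${1%/}   # drop a trailing slash so the suffix lands on the directory name
    backup="$dir.bak-$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    mv "$dir" "$backup" && echo "$backup"
}
```

For /var/lib/docker this needs root and a stopped daemon, so it belongs in a script started with sudo. Docker creates a fresh, empty data directory on the next start.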

Reset systemd state and restart Docker

sudo systemctl daemon-reexec
sudo systemctl reset-failed docker
sudo systemctl start docker

Quick explanation for beginners 👇

systemctl daemon-reexec → re-executes the systemd manager itself (heavier than daemon-reload, which only re-reads unit files)
systemctl reset-failed docker → clears Docker’s failed status and start-rate counter so it can start again
systemctl start docker → starts Docker normally
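
Because this sequence includes a destructive step, it can be useful to preview it first. The sketch below wraps the same commands in a dry-run helper that only prints them unless DRY_RUN=0 is set. The names run and fix_docker_state are hypothetical.

```shell
# Hypothetical wrapper: prints the command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The fix from this post, in order. Stops at the first step that fails.
fix_docker_state() {
    run sudo systemctl stop docker &&
    run sudo rm -rf /var/lib/docker &&
    run sudo systemctl daemon-reexec &&
    run sudo systemctl reset-failed docker &&
    run sudo systemctl start docker
}
```

Calling fix_docker_state prints the five commands. Calling it as DRY_RUN=0 fix_docker_state actually runs them, including the rm -rf, so the warning above still applies.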


Final Result

sudo systemctl status docker
Active: active (running)

Verification:

docker run hello-world

Docker started successfully.
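
In a script, the daemon may need a moment after systemctl start before it answers requests. The sketch below polls docker info until it succeeds or the attempts run out. wait_for_docker is a hypothetical helper name, and the one-second pause is an arbitrary choice.

```shell
# Hypothetical helper: polls `docker info` until the daemon answers or the
# attempts run out. Returns 0 once the daemon responds, 1 otherwise.
wait_for_docker() {
    attempts=${1:-10}
    while [ "$attempts" -gt 0 ]; do
        if docker info >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

It could be used like this:

wait_for_docker 15 && docker run hello-world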


Key Learnings

  • Docker failures are often state-related, not installation-related

  • Removing packages does not remove corrupted Docker metadata

  • Interrupting image pulls can break Docker

  • Checking containerd early helps isolate issues quickly

  • /var/lib/docker plays a critical role in Docker startup

  • Structured troubleshooting beats blind reinstalling


🧰 Debugging Checklist (Save This)

When Docker fails to start:

☑ Check Docker status

systemctl status docker

☑ Inspect Docker logs

sudo journalctl -u docker.service -xe

☑ Verify containerd

systemctl status containerd

☑ If containerd is healthy and Docker crashes early:

  • Inspect /var/lib/docker

  • Suspect corrupted metadata

☑ Stop Docker, then reset its state (dev environments only):

sudo systemctl stop docker
sudo rm -rf /var/lib/docker

☑ Restart Docker cleanly


Conclusion

This issue reinforced an important DevOps principle:

Understanding system internals matters more than memorizing commands.

By reasoning through dependencies and state, I was able to identify the real root cause and fix the issue cleanly — without guesswork.

