An alert fires. You open logs. They look fine. Production is still on fire.
This isn't bad observability. This is misplaced trust.
Kubernetes logs are great at telling you what your app thinks happened. They are terrible at telling you why your system is behaving the way it is. During incidents, that gap is where most time is wasted.
Here's what logs consistently fail to tell you — and what you should look at instead.
1. Logs Don't Show Resource Starvation (Until It's Too Late)
Your logs say:
"Request processed successfully"
Your users say:
"The app is slow as hell."
What logs won't tell you:
- Your pods are CPU-throttled
- Your containers are technically "running"
- Your requests are queued, not failing
CPU throttling doesn't crash pods. It stretches latency silently. Kubernetes will happily keep your container alive while it gets 20ms of CPU every 100ms.
By the time logs show timeouts, the damage is already done.
What actually helps
- Container CPU throttling metrics
- Node-level CPU saturation
- Request latency percentiles (not averages)
Hard-earned rule: If latency is high and logs are clean, suspect CPU first.
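Both signals above are cheap to compute once you scrape cAdvisor's counters (container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total). A minimal sketch with made-up counter values; the function names are mine, not from any library:

```python
def throttle_ratio(throttled_periods: int, total_periods: int) -> float:
    """Fraction of CFS scheduling periods in which the container was throttled."""
    if total_periods == 0:
        return 0.0
    return throttled_periods / total_periods

def p99(latencies_ms: list) -> float:
    """Nearest-rank 99th percentile. Averages hide exactly the tail that throttling creates."""
    ordered = sorted(latencies_ms)
    idx = max(0, round(0.99 * len(ordered)) - 1)
    return ordered[idx]

# Hypothetical counters, diffed over a scrape interval:
ratio = throttle_ratio(throttled_periods=4200, total_periods=6000)
print(f"throttled in {ratio:.0%} of periods")  # 70% of periods: pods "running", users suffering
```

If that ratio is high while logs are clean, you have found the starvation the logs never mentioned.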
2. Logs Don't Explain Why Pods Are Restarting
You see:
"Container terminated with exit code 137"
Cool. Exit code 137 just means the container got SIGKILL (128 + 9). It doesn't tell you who sent it, or why.
Logs won't tell you:
- Whether the pod was OOM-killed due to node pressure
- Whether kubelet evicted it proactively
- Whether another workload starved it
The container log ends abruptly — because the container never got a chance to log its own death.
What actually helps
- Pod lastState: OOMKilled vs Evicted
- Node memory pressure events
- Which other pods spiked memory at the same time
Senior mistake I made early: Chasing app bugs when the real issue was node-level memory contention.
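The distinction lives in pod status (kubectl get pod -o json, under .status.containerStatuses[].lastState.terminated), which survives the restart even though the container's own logs don't. A sketch that mimics that JSON shape; the classifier and its messages are my own, not a kubectl feature:

```python
def classify_termination(container_status: dict) -> str:
    """Distinguish a cgroup OOM kill from an eviction or external kill
    using lastState, which outlives the dead container."""
    last = container_status.get("lastState", {}).get("terminated", {})
    reason = last.get("reason", "")
    code = last.get("exitCode")
    if reason == "OOMKilled":
        return "cgroup OOM kill: check the pod's memory limit and node pressure"
    if code == 137:
        return "SIGKILL (137): likely eviction or external kill; check node events"
    return f"terminated: reason={reason or 'unknown'}, exitCode={code}"

status = {"lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}}
print(classify_termination(status))
```

Pair this with node memory-pressure events and you stop chasing app bugs that were never there.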
3. Logs Don't Show Scheduler Decisions
Your deployment is "stuck".
Logs show:
"Scaled to 10 replicas"
But only 6 pods are running.
Logs won't tell you:
- Why the scheduler can't place the remaining pods
- Which constraints are blocking scheduling
- Whether bin-packing failed silently
The scheduler does record why it rejected each node, but in events, not anywhere your app logs can reach.
What actually helps
- Pod scheduling events
- Node allocatable vs requested resources
- Affinity and taint conflicts
Hard truth: Most "Kubernetes bugs" during incidents are scheduler math problems.
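The "scheduler math" really is arithmetic over allocatable minus requests. A toy sketch (resource names and numbers are invented; real scheduling also weighs affinity, taints, and topology spread):

```python
def fits(node_allocatable: dict, scheduled_requests: list, pod_request: dict) -> bool:
    """The core placement check: a pod fits only if its requests fit inside
    allocatable minus everything already requested on the node."""
    for resource, alloc in node_allocatable.items():
        used = sum(r.get(resource, 0) for r in scheduled_requests)
        if used + pod_request.get(resource, 0) > alloc:
            return False
    return True

node = {"cpu_m": 3800, "memory_mi": 14500}          # allocatable, not capacity
running = [{"cpu_m": 1000, "memory_mi": 4096}] * 3  # three pods already placed
print(fits(node, running, {"cpu_m": 1000, "memory_mi": 1024}))  # False: 3000 + 1000 > 3800 millicores
```

When "scaled to 10" produces 6 pods, this inequality failing on every node is usually the whole story.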
4. Logs Don't Capture Network Degradation
Logs say:
"Request sent"
They don't say:
- DNS resolution took 800ms
- Packet loss spiked between nodes
- kube-proxy rules exploded
From the app's perspective, nothing failed. From the user's perspective, everything is slow.
Network issues degrade before they break.
What actually helps
- DNS latency metrics
- Node-to-node packet drops
- Connection retry rates
Senior hack: If everything is slow across multiple services, stop reading logs and start looking at DNS and networking.
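Resolution latency is measurable from inside any pod with nothing but the standard library. A minimal sketch; it resolves localhost so it runs anywhere, but in a cluster you would point it at a Service name:

```python
import socket
import time

def dns_latency_ms(host: str, samples: int = 5) -> float:
    """Worst-case resolution time over a few samples. A slow resolver
    shows up here long before anything actually fails in app logs."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        socket.getaddrinfo(host, None)  # same path your HTTP client takes
        worst = max(worst, (time.perf_counter() - start) * 1000)
    return worst

print(f"localhost: {dns_latency_ms('localhost'):.2f} ms")
```

If that number is in the hundreds of milliseconds across pods, the "slow app" is your resolver.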
5. Logs Lie During Rolling Updates
During rollouts, logs are actively misleading.
You'll see:
"Pod started successfully"
What logs won't tell you:
- Readiness probes passed too early
- Traffic hit a pod before caches warmed
- Old pods drained too slowly (or not at all)
The app thinks it's ready. The system isn't.
What actually helps
- Real traffic success rates during rollout
- Readiness delay vs actual readiness
- Load balancer connection draining behavior
Lesson learned the hard way: A green rollout is not the same as a safe rollout.
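The gate worth automating is not "did the probe pass" but "did real traffic keep succeeding". A toy sketch with invented numbers and an arbitrary tolerance:

```python
def rollout_is_safe(baseline_success: float, rollout_success: float,
                    max_drop: float = 0.01) -> bool:
    """Compare the request success rate during the rollout window against
    the pre-rollout baseline; flag any drop beyond a small tolerance."""
    return (baseline_success - rollout_success) <= max_drop

# Hypothetical rollout: every probe went green, but cold caches cost ~4% of requests.
print(rollout_is_safe(baseline_success=0.999, rollout_success=0.958))  # False
```

This is the difference between a green rollout and a safe one, expressed as a single comparison.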
6. Logs Don't Show Control Plane Pain
Your app logs are quiet. Your cluster is slow.
Logs won't tell you:
- The API server is throttling requests
- Controllers are backlogged
- Watch events are delayed
From the workload's point of view, Kubernetes is the problem — but Kubernetes doesn't log that to your app.
What actually helps
- API server latency
- Request throttling metrics
- Controller reconciliation lag
If kubectl feels slow during an incident, that's a signal, not an annoyance.
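If you export the API server's request-duration histogram (apiserver_request_duration_seconds), the "kubectl feels slow" instinct turns into a check. A sketch with invented sample values and an arbitrary 1-second SLO:

```python
def control_plane_suspect(request_durations_s: list, slo_s: float = 1.0) -> bool:
    """If a meaningful share of API server requests exceed the SLO,
    treat control plane latency as a suspect, not an annoyance."""
    if not request_durations_s:
        return False
    slow = sum(1 for d in request_durations_s if d > slo_s)
    return slow / len(request_durations_s) > 0.05

samples = [0.02, 0.03, 0.04, 2.1, 1.8, 0.05, 3.0, 0.02, 2.5, 0.03]
print(control_plane_suspect(samples))  # True: 4 of 10 requests blew past 1s
```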
7. Logs Don't Tell You What Didn't Happen
This is the most dangerous one.
Logs show what happened. They don't show:
- Requests that never reached the pod
- Pods that never received traffic
- Jobs that never started
Absence of logs is rarely treated as data — but during incidents, it's often the most important clue.
What actually helps
- Traffic metrics vs expected volume
- Request drop rates upstream
- Control plane event gaps
Senior instinct upgrade: When logs are empty, ask what should have logged but didn't.
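The check for "what didn't happen" is a comparison between two counters that usually live in different places: what upstream says it sent versus what your pods say they received. A toy sketch, numbers invented:

```python
def missing_traffic(expected_rps: float, observed_rps: float,
                    tolerance: float = 0.2) -> bool:
    """The silent failure: requests that never arrived produce no logs,
    so compare observed volume against what upstream claims it sent."""
    if expected_rps == 0:
        return False
    return (expected_rps - observed_rps) / expected_rps > tolerance

# The load balancer reports 1200 rps; the pods only saw 700. Nothing logged an error.
print(missing_traffic(expected_rps=1200, observed_rps=700))  # True
```

This is the one check where an empty result set is the finding.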
The Real Lesson: Logs Are a Lagging Signal
Logs are symptoms, not causes.
By the time logs scream:
- Latency is already bad
- Users already noticed
- The incident clock is already running
Senior engineers don't stop using logs — they stop trusting them alone.
What I Check Before Logs Now
Hard rule list I follow in every incident:
- Node CPU & memory pressure
- Pod scheduling events
- Throttling and eviction signals
- DNS and network latency
- API server health
- Traffic vs capacity mismatch
Only then do I read logs — to confirm, not to discover.
Final Take
If your incident response starts and ends with logs, you're debugging too late in the chain.
Kubernetes incidents live in the space between components — scheduler, nodes, network, control plane — and logs were never designed to tell that story.
Logs don't lie. They just don't tell the whole truth.
And in production, half the truth is exactly why outages last longer than they should.