- Published on
Kubernetes Pod Keeps Restarting for no Apparent Reason
- Authors
- Name
- Yair Mark
- @yairmark
Today I was busy deploying changes to an environment when I noticed that a pod had been restarted many times, was in a CrashLoopBackOff state, and was not ready (0/1).
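You can spot this state quickly by listing the pods; the output below is just an illustration of what it looked like (the restart count and age will vary):
kubectl get pods
NAME                       READY   STATUS             RESTARTS   AGE
your-app-873240921-b87cg   0/1     CrashLoopBackOff   6          12m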
I looked at the previous logs of the pod to try to debug this further:
kubectl logs your-app-873240921-b87cg -p | less
I could not see any errors in the logs, but I did see that the app logged that it was shutting down. All other pods were fine, so this did not seem to be a cluster-wide problem. To dig further, I described the pod:
kubectl describe pod your-app-873240921-b87cg
The output was similar to the below (most of it has been left out for brevity; the important bits are at the bottom):
Name: your-app-873240921-b87cg
Namespace: your-namespace
...
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned your-app-873240921-b87cg to your-company2
Normal SuccessfulMountVolume 11m kubelet, your-company2 MountVolume.SetUp succeeded for volume "default-token-nkpqp"
Normal Pulling 11m kubelet, your-company2 pulling image "your.company/your.company/your-app:b512288"
Normal Pulled 10m kubelet, your-company2 Successfully pulled image "your.company/your.company/your-app:b512288"
Normal Created 8m (x2 over 10m) kubelet, your-company2 Created container
Normal Started 8m (x2 over 10m) kubelet, your-company2 Started container
Warning Unhealthy 7m (x5 over 9m) kubelet, your-company2 Liveness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
Normal Killing 7m (x2 over 8m) kubelet, your-company2 Killing container with id docker://your-app-app:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 7m (x2 over 8m) kubelet, your-company2 Container image "your.company/your.company/your-app:b512288" already present on machine
Warning Unhealthy 5m (x14 over 9m) kubelet, your-company2 Readiness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
Warning FailedSync 1m (x7 over 2m) kubelet, your-company2 Error syncing pod
Looking at the Events section, we can clearly see that the liveness probe failed, which resulted in the app being killed as Kubernetes tried to recover it:
Liveness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
The readiness probe was failing against the same endpoint as well, which is why the pod was never reported as ready (the 0/1 we saw earlier):
Readiness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
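Both probes are defined on the container in the Deployment spec, so that is a good place to check what Kubernetes is actually hitting. Ours were shaped roughly like the sketch below; the path, port, and timing values here are illustrative rather than our exact configuration:
kubectl get deployment your-app -o yaml
livenessProbe:        # if this fails repeatedly, the container is restarted
  httpGet:
    path: /management/health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:       # if this fails, the pod is taken out of service endpoints
  httpGet:
    path: /management/health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 10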
This led us in the right direction and resulted in us looking at what was causing the app's probes to fail. In our case the app appeared to be taking longer to start up because it did not have enough CPU, so we increased the pod's CPU resources and gave the liveness probe a bit more time, which did the trick.
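As a rough sketch of the kind of change that did the trick (the numbers are illustrative and will depend on your app), we bumped the container's CPU allocation and gave the liveness probe a longer initial delay so the app has time to come up before Kubernetes starts checking it:
resources:
  requests:
    cpu: 500m           # more CPU so startup is not starved
  limits:
    cpu: "1"
livenessProbe:
  httpGet:
    path: /management/health
    port: 8000
  initialDelaySeconds: 120   # give the app more time to start before liveness checks begin
  periodSeconds: 10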