Improving Kubernetes reliability: quicker detection of a Node down | Fatal failure
-
When a node goes down, the pods on the broken node are still considered running for some time and still receive requests, and those requests will fail.
-
1- The kubelet posts its status to the masters every --node-status-update-frequency=10s.
2- A node dies.
3- The kube controller manager is the component that monitors the nodes; every --node-monitor-period=5s it checks, in the masters, the node status reported by the kubelet.
4- The kube controller manager sees that the node is unresponsive and waits for the grace period --node-monitor-grace-period=40s before it considers the node unhealthy.
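The steps above can be turned into a rough estimate of how long a dead node keeps looking healthy. The following is only a sketch under simplifying assumptions that are not stated in the original: it ignores network, API server, and etcd latency, and it assumes the grace period is counted from the last status the kubelet managed to post. The names `NodeTimings` and `detection_window` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeTimings:
    """Timing flags, in seconds, with the values quoted in the steps above."""
    status_update_frequency: float = 10.0  # kubelet --node-status-update-frequency
    monitor_period: float = 5.0            # controller manager --node-monitor-period
    monitor_grace_period: float = 40.0     # controller manager --node-monitor-grace-period


def detection_window(t: NodeTimings) -> Tuple[float, float]:
    """Return (earliest, latest) seconds between node death and 'unhealthy'.

    Assumption: the grace period starts at the last posted status.
    Earliest: the node died just before its next status post, so the last
    status is already one update interval old, and a check runs exactly
    when the grace period expires.
    Latest: the node died right after posting, and the grace period expires
    just after a check, so one more monitor period passes.
    """
    earliest = t.monitor_grace_period - t.status_update_frequency
    latest = t.monitor_grace_period + t.monitor_period
    return earliest, latest


if __name__ == "__main__":
    low, high = detection_window(NodeTimings())
    print(f"node marked unhealthy roughly {low:.0f}s to {high:.0f}s after it dies")
```

Under these assumptions, lowering the grace period shortens the window directly, while lowering only the update frequency or the monitor period narrows it by a few seconds at most.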
-
node-status-update-frequency x (N-1) != node-monitor-grace-period