kubernetes/node-problem-detector
A Kubernetes daemon that detects and reports various node problems to the apiserver, making node health visible for improved cluster management.
Core Features
Detailed Introduction
Kubernetes nodes can suffer from various underlying issues such as hardware failures, kernel deadlocks, or unresponsive container runtimes, which often remain invisible to the cluster's scheduling layers. Node-problem-detector addresses this by running as a daemon on each node, actively monitoring for these problems. It then reports critical, persistent issues as `NodeCondition` and temporary, informative ones as `Event` directly to the Kubernetes apiserver. This visibility empowers upstream Kubernetes components to make informed decisions, preventing pods from being scheduled onto unhealthy nodes and enabling more robust cluster operations.