Count of pods per cluster and namespace. Knowing how many pods run in each namespace helps you spot an unusually high or low pod count.
sum by (namespace) (kube_pod_info)
Number of containers by cluster and namespace without CPU limits. Setting the right limits and requests is essential for optimizing application and cluster performance. This query counts the containers in each namespace that have no CPU limit.
count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
Pod restarts by namespace. This query shows the namespaces whose pods have been restarting. It counts changes in each pod's Ready condition over the last five minutes, which serves as a proxy for restarts. This matters because a high restart rate often indicates a CrashLoopBackOff.
sum by (namespace)(changes(kube_pod_status_ready{condition="true"}[5m]))
Pods not ready. This query counts, per namespace, the Pods that are not in the Ready condition. It can be the first step when troubleshooting.
sum by (namespace) (kube_pod_status_ready{condition="false"})
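The queries in this list can also be run from a script through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The following is a minimal sketch, not a definitive client: the server address, the helper names, and the sample payload are assumptions made for illustration. It builds the request URL for the query above and extracts the namespaces that currently report at least one not-ready pod.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace it with your own server.
PROMETHEUS_URL = "http://localhost:9090"

QUERY = 'sum by (namespace) (kube_pod_status_ready{condition="false"})'


def build_query_url(base_url: str, query: str) -> str:
    """Return the instant-query URL (/api/v1/query) for the given PromQL query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def not_ready_by_namespace(payload: dict) -> dict:
    """Map namespace -> not-ready pod count, keeping only counts above zero."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed")
    counts = {}
    for sample in payload["data"]["result"]:
        namespace = sample["metric"].get("namespace", "")
        # Each instant-vector sample is [timestamp, "value as a string"].
        value = float(sample["value"][1])
        if value > 0:
            counts[namespace] = value
    return counts


# Made-up payload in the shape of an instant-vector response.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "default"}, "value": [1700000000, "0"]},
            {"metric": {"namespace": "payments"}, "value": [1700000000, "2"]},
        ],
    },
}

if __name__ == "__main__":
    print(build_query_url(PROMETHEUS_URL, QUERY))
    print(not_ready_by_namespace(SAMPLE_RESPONSE))
```

Filtering out zero values is a design choice: the query returns every namespace, including those where all pods are ready, and only the non-zero ones need attention.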
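CPU overcommit. CPU limits that add up to more than the capacity of the cluster are a scenario to avoid; otherwise, you can end up with CPU throttling issues. The following query detects CPU overcommit: a positive result means the limits exceed the capacity. Note that kube-state-metrics v2 removed kube_node_status_capacity_cpu_cores; on those versions, use kube_node_status_capacity{resource="cpu"} instead.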
sum(kube_pod_container_resource_limits{resource="cpu"}) - sum(kube_node_status_capacity_cpu_cores)
Memory overcommit. Memory limits that exceed the capacity of the cluster can lead to pod evictions when a node runs out of memory. This PromQL query flags that situation: a positive result means the limits exceed the capacity. On kube-state-metrics v2 and later, use kube_node_status_capacity{resource="memory"} in place of kube_node_status_capacity_memory_bytes.
sum(kube_pod_container_resource_limits{resource="memory"}) - sum (kube_node_status_capacity_memory_bytes)
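To make the arithmetic behind the overcommit queries concrete, here is a small worked sketch. The node sizes and container limits are invented for illustration, and the function name is hypothetical; it only mirrors the "sum of limits minus sum of capacity" subtraction that the query performs.

```python
GIB = 1024 ** 3


def overcommit_bytes(limits, capacities):
    """Mirror sum(limits) - sum(capacity); a positive result means overcommit."""
    return sum(limits) - sum(capacities)


# Hypothetical cluster: two nodes with 16 GiB each.
node_capacity = [16 * GIB, 16 * GIB]
# Hypothetical memory limits of four containers (36 GiB in total).
container_limits = [8 * GIB, 8 * GIB, 12 * GIB, 8 * GIB]

result = overcommit_bytes(container_limits, node_capacity)
print(result / GIB)  # 4.0, so limits exceed capacity by 4 GiB
```

A zero or negative result means the configured limits fit within the cluster's capacity.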
Number of ready nodes per cluster. This query returns the number of nodes in the Ready state in each cluster.
sum(kube_node_status_condition{condition="Ready", status="true"}==1)
Nodes readiness flapping. This query identifies nodes that flap between the ready and not-ready states, that is, nodes whose Ready condition changed more than twice in the last 15 minutes.
sum(changes(kube_node_status_condition{status="true",condition="Ready"}[15m])) by (node) > 2
CPU idle by cluster. Sizing computing capacity is one of the most delicate things to configure, and it is a fundamental step in Kubernetes capacity planning. This query shows how many requested CPU cores are going unused.
sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)
Memory idle by cluster. Save money by detecting how much requested memory goes unused in your cluster. This query returns the result in GiB.
sum((container_memory_usage_bytes{container!="POD",container!=""} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="memory"})) * -1 >0 ) / (1024*1024*1024)
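The idle-memory query relies on a small trick: it computes usage minus request, flips the sign with `* -1`, and keeps only positive values with `> 0`, so containers that use more than they request do not offset the idle total. The sketch below reproduces that logic on invented sample data; the container keys, the numbers, and the function name are assumptions for illustration only.

```python
GIB = 1024 ** 3


def idle_gib(usage, requests):
    """Sum (request - usage) over containers whose request exceeds usage, in GiB.

    Containers with no request are skipped, since the query's binary
    operator drops series that have no match on the other side.
    """
    idle = 0
    for key, used in usage.items():
        if key not in requests:
            continue
        unused = requests[key] - used  # same as (usage - request) * -1
        if unused > 0:  # same as the "> 0" filter in the query
            idle += unused
    return idle / GIB


# Hypothetical samples keyed by (namespace, pod, container).
usage = {
    ("shop", "api-0", "api"): 1 * GIB,
    ("shop", "db-0", "db"): 6 * GIB,
    ("shop", "api-0", "sidecar"): 0.5 * GIB,
}
requests = {
    ("shop", "api-0", "api"): 4 * GIB,  # 3 GiB idle
    ("shop", "db-0", "db"): 4 * GIB,    # over its request, filtered out
}

print(idle_gib(usage, requests))  # 3.0
```

The CPU idle query follows the same pattern, with the usage rate in cores in place of memory bytes.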