{"id":23938,"date":"2021-09-28T03:31:09","date_gmt":"2021-09-28T03:31:09","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=23938"},"modified":"2025-01-23T12:45:26","modified_gmt":"2025-01-23T12:45:26","slug":"prometheus-promql-example-query-monitoring-kubernetes","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/prometheus-promql-example-query-monitoring-kubernetes\/","title":{"rendered":"Prometheus PromQL Example Query: Monitoring Kubernetes"},"content":{"rendered":"\n<p>Count of pods per cluster and namespace. Having a list of how many pods your namespaces have in your cluster can be useful for detecting an unusually high or low number of pods on your namespaces.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\">sum by (<span class=\"hljs-keyword\">namespace<\/span>) (<span class=\"hljs-title\">kube_pod_info<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Number of containers by cluster and namespace without CPU limits. Setting the right limits and requests in your cluster is essential in optimizing application and cluster performance. 
This query counts, per namespace, the containers that have no CPU limit set.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=<span class=\"hljs-string\">\"\"<\/span>}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource=<span class=\"hljs-string\">\"cpu\"<\/span>}))<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Pod restarts by namespace. This query sums, per namespace, how many times pod readiness changed over the last 5 minutes, which serves as a proxy for pods that keep restarting. A high value often points to pods stuck in CrashLoopBackOff.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum by (namespace)(changes(kube_pod_status_ready{condition=<span class=\"hljs-string\">\"true\"<\/span>}&#91;<span class=\"hljs-number\">5<\/span>m]))<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Pods not ready. 
This query counts the pods in each namespace that are not ready. It can be a first step when troubleshooting.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum by (namespace) (kube_pod_status_ready{condition=<span class=\"hljs-string\">\"false\"<\/span>})\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>CPU overcommit. When the sum of CPU limits exceeds the cluster\u2019s capacity, containers can end up competing for CPU and being throttled. The following query returns the difference in cores; a positive result indicates overcommit.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum(kube_pod_container_resource_limits{resource=<span class=\"hljs-string\">\"cpu\"<\/span>}) - sum(kube_node_status_capacity{resource=<span class=\"hljs-string\">\"cpu\"<\/span>})<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Memory overcommit. When the sum of memory limits exceeds the cluster\u2019s capacity, pods can be evicted if a node runs out of memory. 
A positive result from this PromQL query indicates memory overcommit.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum(kube_pod_container_resource_limits{resource=<span class=\"hljs-string\">\"memory\"<\/span>}) - sum(kube_node_status_capacity{resource=<span class=\"hljs-string\">\"memory\"<\/span>})\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Number of ready nodes per cluster. This query counts the nodes that report the Ready condition.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum(kube_node_status_condition{condition=<span class=\"hljs-string\">\"Ready\"<\/span>, status=<span class=\"hljs-string\">\"true\"<\/span>}==<span class=\"hljs-number\">1<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Node readiness flapping. 
Identify nodes flapping between the ready and not ready state.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum(changes(kube_node_status_condition{status=<span class=\"hljs-string\">\"true\"<\/span>,condition=<span class=\"hljs-string\">\"Ready\"<\/span>}&#91;<span class=\"hljs-number\">15<\/span>m])) by (node) &gt; <span class=\"hljs-number\">2<\/span><\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>CPU idle by cluster. Computing capacity is one of the most delicate things to configure, and it\u2019s one of the fundamental steps when performing Kubernetes capacity planning. 
With this query, you can detect how many CPU cores are underutilized.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum((rate(container_cpu_usage_seconds_total{container!=<span class=\"hljs-string\">\"POD\"<\/span>,container!=<span class=\"hljs-string\">\"\"<\/span>}&#91;<span class=\"hljs-number\">30<\/span>m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource=<span class=\"hljs-string\">\"cpu\"<\/span>})) * <span class=\"hljs-number\">-1<\/span> &gt;<span class=\"hljs-number\">0<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Memory idle by cluster. 
Save money by detecting how much requested memory goes unused in your cluster. The result is expressed in GiB.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">sum((container_memory_usage_bytes{container!=<span class=\"hljs-string\">\"POD\"<\/span>,container!=<span class=\"hljs-string\">\"\"<\/span>} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource=<span class=\"hljs-string\">\"memory\"<\/span>})) * <span class=\"hljs-number\">-1<\/span> &gt;<span class=\"hljs-number\">0<\/span> ) \/ (<span class=\"hljs-number\">1024<\/span>*<span class=\"hljs-number\">1024<\/span>*<span class=\"hljs-number\">1024<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<hr class=\"wp-block-separator\"\/>\n","protected":false},"excerpt":{"rendered":"<p>Count of pods per cluster and namespace. 
Having a list of how many pods your namespaces have in your cluster can be useful for detecting an unusually high or low&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[4859],"tags":[],"class_list":["post-23938","post","type-post","status-publish","format-standard","hentry","category-kubernetes"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/23938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=23938"}],"version-history":[{"count":2,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/23938\/revisions"}],"predecessor-version":[{"id":23941,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/23938\/revisions\/23941"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=23938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=23938"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=23938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}