{"id":76216,"date":"2026-05-25T10:39:52","date_gmt":"2026-05-25T10:39:52","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/?p=76216"},"modified":"2026-05-25T10:39:54","modified_gmt":"2026-05-25T10:39:54","slug":"why-every-devops-team-needs-an-ai-red-teaming-strategy","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/why-every-devops-team-needs-an-ai-red-teaming-strategy\/","title":{"rendered":"Why Every DevOps Team Needs an AI Red Teaming Strategy"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/05\/image-10-1024x683.jpeg\" alt=\"\" class=\"wp-image-76217\" srcset=\"https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/05\/image-10-1024x683.jpeg 1024w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/05\/image-10-300x200.jpeg 300w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/05\/image-10-768x512.jpeg 768w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/05\/image-10-1536x1024.jpeg 1536w, https:\/\/www.devopsschool.com\/blog\/wp-content\/uploads\/2026\/05\/image-10.jpeg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Source: DepositPhotos<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">AI agents are already being connected to internal APIs, ticketing systems, cloud infrastructure, and deployment workflows. In many environments, they also interact with customer data, internal documentation, and operational tooling, often with relatively broad permissions, because overly restrictive access can slow adoption and create friction for engineering teams.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Most DevSecOps pipelines were designed around predictable application behavior. 
Teams scan dependencies, validate infrastructure-as-code templates, harden containers, review IAM permissions, and block known vulnerabilities before deployment. Those workflows still matter, but AI systems behave differently once they begin interacting with live environments, external context, and user-generated prompts.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A model may pass every CI\/CD validation step and still behave unsafely later because of prompt manipulation, chained instructions, retrieval context, or unexpected interactions with connected tools. As a result, more engineering teams are spending time testing runtime behavior instead of relying entirely on pre-deployment validation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Traditional Security Testing Does Not Fully Cover AI Behavior<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Most application security tooling focuses on code, infrastructure, and known vulnerability patterns. That works well for conventional software because execution paths are usually deterministic and easier to validate before release.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">AI systems exhibit much less predictable behavior because their responses depend heavily on prompts, memory, external data sources, and access to tools. An internal AI assistant connected to Slack, Jira, or cloud environments may technically operate within approved permissions while still exposing sensitive information or performing actions developers never intended during implementation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is one reason more engineering teams are evaluating <a href=\"https:\/\/www.checkpoint.com\/ai-security\/ai-red-teaming\/\">AI red teaming solutions<\/a> before deploying AI systems into production. 
The focus is increasingly shifting toward understanding how the model behaves under adversarial or unexpected conditions rather than only validating the surrounding infrastructure.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">AI Red Teaming Focuses on Runtime Decisions<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Traditional penetration testing usually targets exposed infrastructure, authentication weaknesses, privilege escalation paths, or vulnerable services. AI red teaming focuses much more heavily on how models and agents behave when their normal assumptions break down.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Teams intentionally test scenarios involving prompt injection, unsafe instruction chaining, data leakage, tool misuse, and attempts to bypass restrictions built into the orchestration layer. The idea is to observe how the system reacts to inputs or contextual signals that developers did not anticipate during normal testing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This becomes much more important with agentic systems that can automatically interact with APIs, infrastructure, deployment tooling, or internal operational systems. Many unsafe actions still appear technically legitimate from an infrastructure perspective because authentication succeeds, permissions are validated correctly, and API requests look normal. In those cases, the problem is usually the model\u2019s reasoning path and contextual interpretation rather than the infrastructure itself.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">NIST recently organized a <a href=\"https:\/\/www.nist.gov\/blogs\/caisi-research-blog\/insights-ai-agent-security-large-scale-red-teaming-competition\">large-scale public competition focused on red teaming AI agents<\/a> to evaluate how modern AI agents behave under adversarial conditions. 
One recurring pattern involved agents failing due to contextual manipulation and chained actions rather than obvious infrastructure vulnerabilities, which closely aligns with what many DevOps teams are already seeing internally.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Runtime Validation Is Becoming Part of AI Operations<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Static validation catches infrastructure and dependency issues fairly well, but AI systems often behave differently once they start interacting with real users, production data, and external tools. Teams that only test models before deployment usually discover quickly that runtime behavior varies with prompts, retrieval pipelines, orchestration logic, and connected services.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Because of that, more organizations are combining adversarial testing with runtime telemetry, behavioral monitoring, and policy enforcement around what agents can access and execute. Some teams now apply infrastructure-level restrictions around agent permissions regardless of what the model attempts to do, while others monitor for abnormal patterns such as unexpected API usage or unusual sequences of actions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This operational model starts to look much closer to runtime governance and observability than to traditional application security scanning. Instead of treating AI validation as a one-time release checkpoint, teams increasingly handle it as a continuous operational process tied directly to production behavior.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">AI Red Teaming Fits Naturally Into Existing DevSecOps Workflows<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Most mature DevOps teams already understand the operational workflow behind this type of testing. 
Teams test the system, identify unsafe behavior, reproduce the issue, patch it, retest, and continue monitoring over time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The main difference is that the testing target now includes model behavior, not just infrastructure posture or application code. Teams already trying to <a href=\"https:\/\/www.devopsschool.com\/blog\/devsecops-in-action-strengthening-security-without-slowing-down-development\/\">embed security testing throughout the development lifecycle<\/a> usually adapt fairly quickly because the underlying engineering process itself remains familiar.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The larger adjustment is understanding that deployment is no longer the final security checkpoint. With AI systems, some of the most important validation happens after the model begins interacting with live environments, real users, and connected operational systems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DevOps Teams Are Becoming Responsible for AI Runtime Safety<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">One noticeable shift over the last year is that DevOps teams increasingly own the operational behavior of AI systems running in production environments. Infrastructure reliability alone is no longer enough because teams also need visibility into how models behave when interacting with users, APIs, internal data sources, and automated workflows.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Traditional monitoring can confirm that services remain available and infrastructure stays healthy, but it does not necessarily explain whether an autonomous agent is operating safely under real-world conditions. 
As more organizations deploy AI agents deeper into operational workflows, runtime testing and behavioral validation are gradually becoming part of standard engineering and security practices rather than isolated research exercises.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source: DepositPhotos AI agents are already being connected to internal APIs, ticketing systems, cloud infrastructure, and deployment workflows. In many environments, they also interact with customer data,&#8230; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[11138],"tags":[],"class_list":["post-76216","post","type-post","status-publish","format-standard","hentry","category-best-tools"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/76216","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=76216"}],"version-history":[{"count":1,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/76216\/revisions"}],"predecessor-version":[{"id":76218,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/76216\/revisions\/76218"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=76216"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=76216"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=76216"}],"curies":[{"name":"wp",
"href":"https:\/\/api.w.org\/{rel}","templated":true}]}}