{"id":373,"date":"2026-04-13T20:19:41","date_gmt":"2026-04-13T20:19:41","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/azure-observability-in-foundry-control-plane-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning\/"},"modified":"2026-04-13T20:19:41","modified_gmt":"2026-04-13T20:19:41","slug":"azure-observability-in-foundry-control-plane-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/azure-observability-in-foundry-control-plane-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning\/","title":{"rendered":"Azure Observability in Foundry Control Plane Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for AI + Machine Learning"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>AI + Machine Learning<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Observability in <strong>Foundry Control Plane<\/strong> in <strong>Azure<\/strong> is about gaining reliable visibility into what the platform is doing when you build, configure, secure, and operate AI systems\u2014especially the <em>management-plane<\/em> actions that create, update, deploy, and govern AI resources.<\/p>\n\n\n\n<p>In simple terms: <strong>it helps you answer \u201cwhat changed, who changed it, when, and what happened next?\u201d<\/strong> for your AI platform setup. This includes tracking administrative operations, policy outcomes, service health signals, and operational logs that explain why an AI environment is healthy, degraded, or failing.<\/p>\n\n\n\n<p>Technically, \u201cObservability in Foundry Control Plane\u201d is not typically a single standalone Azure resource with its own billing meter. 
Instead, it is best understood as the <strong>set of observability signals and integrations<\/strong> that cover Foundry-related control-plane operations\u2014implemented through standard Azure observability building blocks such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Monitor<\/strong> (metrics, alerts, workbooks)<\/li>\n<li><strong>Azure Monitor Logs \/ Log Analytics<\/strong> (central log store + KQL queries)<\/li>\n<li><strong>Azure Activity Log<\/strong> (subscription-level control-plane events)<\/li>\n<li><strong>Diagnostic settings<\/strong> (routing platform logs to Log Analytics \/ Storage \/ Event Hubs)<\/li>\n<li>Optional integrations like <strong>Microsoft Sentinel<\/strong> (SIEM) and <strong>ITSM<\/strong> connectors<\/li>\n<\/ul>\n\n\n\n<p>What problem does it solve? It reduces the risk and toil caused by \u201cinvisible\u201d platform changes and failures\u2014like unexpected access changes, deployments that don\u2019t take effect, policy blocks, quota issues, or regional incidents\u2014by giving you <strong>auditability, troubleshooting data, and actionable alerts<\/strong> for the AI platform control plane.<\/p>\n\n\n\n<blockquote>\n<p>Naming note (verify in official docs): Microsoft\u2019s AI platform branding has evolved (for example, Azure AI Studio and Azure AI Foundry naming). This tutorial treats <strong>\u201cObservability in Foundry Control Plane\u201d<\/strong> as the <strong>observability scope for Foundry\u2019s management plane<\/strong> and shows how to implement it using current Azure Monitor capabilities. If your tenant uses different portal names, follow the equivalent resources and blades.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is Observability in Foundry Control Plane?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose (practical definition aligned with Azure)<\/h3>\n\n\n\n<p>Observability in Foundry Control Plane is the practice and implementation of <strong>collecting, centralizing, analyzing, and alerting on control-plane signals<\/strong> related to Foundry-based AI platform resources in Azure.<\/p>\n\n\n\n<p>Because control-plane operations in Azure are fundamentally governed by Azure Resource Manager (ARM), most \u201ccontrol plane observability\u201d relies on:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Activity Log<\/strong> for subscription-level events (create\/update\/delete, RBAC changes, policy actions)<\/li>\n<li><strong>Resource logs<\/strong> (when supported by specific resource types) routed using <strong>Diagnostic settings<\/strong><\/li>\n<li><strong>Service Health \/ Resource Health<\/strong> for platform and regional incidents<\/li>\n<li><strong>Azure Monitor<\/strong> alerts and dashboards to detect, notify, and triage issues<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<p>In a Foundry control-plane context, observability typically includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Audit trail of administrative actions<\/strong><\/li>\n<li>Who created\/updated\/deleted AI resources and configurations<\/li>\n<li>Who changed access, keys, networking, or identity settings<\/li>\n<li><strong>Policy and governance visibility<\/strong><\/li>\n<li>Policy compliance results and \u201cdeny\u201d outcomes<\/li>\n<li>Drift detection for \u201capproved\u201d configurations<\/li>\n<li><strong>Operational troubleshooting<\/strong><\/li>\n<li>Correlating a deployment\/configuration change to an outage<\/li>\n<li>Explaining authorization failures (RBAC), networking blocks, quota failures, or region issues<\/li>\n<li><strong>Alerting and reporting<\/strong><\/li>\n<li>Alerts for suspicious or risky 
control-plane actions (deletions, public network enablement, key rotations)<\/li>\n<li>Periodic reporting and dashboards for platform operations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (Azure building blocks)<\/h3>\n\n\n\n<p>The most common components used to implement Observability in Foundry Control Plane are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Activity Log<\/strong> (subscription scope)<\/li>\n<li><strong>Log Analytics workspace<\/strong> (central log store)<\/li>\n<li><strong>Diagnostic settings<\/strong> (routing control-plane logs to sinks)<\/li>\n<li><strong>Azure Monitor Alerts<\/strong> (metric alerts, log alerts)<\/li>\n<li><strong>Azure Monitor Workbooks<\/strong> (dashboards)<\/li>\n<li>Optional:<\/li>\n<li><strong>Microsoft Sentinel<\/strong> (security analytics, incident management)<\/li>\n<li><strong>Event Hubs<\/strong> (stream logs to external platforms)<\/li>\n<li><strong>Storage accounts<\/strong> (long retention\/archival)<\/li>\n<li><strong>Azure Managed Grafana<\/strong> (visualization, when appropriate)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<p>Observability in Foundry Control Plane is best viewed as a <strong>solution pattern<\/strong> implemented using Azure\u2019s native observability services. 
It is not usually purchased as a single SKU.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional\/global\/subscription<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Activity Log<\/strong> is <strong>subscription-scoped<\/strong> and not tied to a single region.<\/li>\n<li><strong>Log Analytics workspaces<\/strong> are <strong>regional resources<\/strong> (you choose a region).<\/li>\n<li><strong>Service Health<\/strong> is a <strong>global<\/strong> service whose events are <strong>scoped to your tenant and subscriptions<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Azure ecosystem (AI + Machine Learning)<\/h3>\n\n\n\n<p>Foundry-based AI systems frequently rely on a mix of services (for example: model endpoints, orchestration, data stores, networking, identity). Foundry control-plane observability connects those operations back to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Identity (Microsoft Entra ID)<\/strong> and <strong>RBAC<\/strong> decisions<\/li>\n<li><strong>ARM deployments<\/strong> (Bicep\/Terraform\/Portal changes)<\/li>\n<li><strong>Policy<\/strong> enforcement (Azure Policy)<\/li>\n<li><strong>Operational governance<\/strong> (tagging, naming, budget alerts, resource locks)<\/li>\n<\/ul>\n\n\n\n<p>This is especially important in <strong>AI + Machine Learning<\/strong>, where misconfiguration can create:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>data exposure risks<\/li>\n<li>runaway costs<\/li>\n<li>model deployment failures<\/li>\n<li>compliance gaps<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Why use Observability in Foundry Control Plane?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Reduce downtime and incident duration<\/strong>: Faster root cause analysis when you can correlate outages with recent control-plane changes.<\/li>\n<li><strong>Lower operational risk<\/strong>: Catch risky actions early (e.g., public network enabled, diagnostic logs disabled, key vault access changed).<\/li>\n<li><strong>Improve audit readiness<\/strong>: Maintain traceability of changes for regulated workloads.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Single source of truth for change events<\/strong>: Centralize control-plane events and resource logs into Log Analytics.<\/li>\n<li><strong>Correlation across services<\/strong>: Track changes across AI resources, networking, identity, and data services in one timeline.<\/li>\n<li><strong>Evidence-based troubleshooting<\/strong>: Replace guesswork with logs and structured events.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons (SRE\/Platform\/DevOps)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Actionable alerting<\/strong>: Notify the right teams on destructive operations, policy denies, or repeated failures.<\/li>\n<li><strong>Operational dashboards<\/strong>: Workbooks for recurring operational questions (who changed what, what failed, what\u2019s trending).<\/li>\n<li><strong>Change management integration<\/strong>: Stream audit logs to SIEM\/ITSM tools.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Detect unauthorized or unexpected changes<\/strong>: RBAC, identity, and network posture changes are common sources of security incidents.<\/li>\n<li><strong>Support least privilege<\/strong>: Use logs to validate that roles and permissions are used 
as intended.<\/li>\n<li><strong>Retention controls<\/strong>: Store logs to meet regulatory retention requirements (often via Storage or Sentinel).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<p>Control-plane observability helps scaling indirectly:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When you scale AI systems, you create more resources, deployments, and changes. Observability prevents that scale from turning into chaos.<\/li>\n<li>Alerting on throttling\/quota and policy issues helps prevent repeated failed rollouts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose Observability in Foundry Control Plane when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You operate AI environments in shared subscriptions or landing zones.<\/li>\n<li>You need audit trails and governance evidence.<\/li>\n<li>Multiple teams deploy models and services frequently.<\/li>\n<li>You must respond to incidents quickly and consistently.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>You may not need a full control-plane observability implementation if:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You are running a short-lived prototype in a sandbox with no compliance requirements.<\/li>\n<li>You have a single developer and minimal change frequency.<\/li>\n<li>You do not retain resources beyond a few days.<\/li>\n<\/ul>\n\n\n\n<p>Even then, enabling basic Activity Log routing is usually low-effort and pays off quickly.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Observability in Foundry Control Plane used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Finance and insurance (auditability, change control)<\/li>\n<li>Healthcare and life sciences (compliance, access tracking)<\/li>\n<li>Retail and e-commerce (availability + rapid releases)<\/li>\n<li>Manufacturing (operational reliability, OT\/IT boundaries)<\/li>\n<li>Public sector (policy enforcement, retention requirements)<\/li>\n<li>SaaS\/ISVs building AI features (multi-tenant governance)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Platform engineering teams operating Azure landing zones<\/li>\n<li>SRE\/Operations teams managing incident response<\/li>\n<li>Security engineering and SOC teams<\/li>\n<li>AI\/ML engineering teams deploying models at scale<\/li>\n<li>DevOps teams managing CI\/CD and infrastructure as code<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads and architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hub-and-spoke networks with private endpoints<\/li>\n<li>Multi-subscription environments with centralized logging<\/li>\n<li>Production AI platforms with strict role separation<\/li>\n<li>Regulated environments using Azure Policy and Sentinel<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test<\/strong>: Focus on rapid debugging, basic change tracking, cost guardrails.<\/li>\n<li><strong>Production<\/strong>: Add retention, SIEM integration, strict alerting, and governance reporting.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Observability in Foundry Control Plane is directly useful.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Audit \u201cwho changed the model deployment configuration\u201d<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A model endpoint starts returning errors after a configuration change.<\/li>\n<li><strong>Why this fits<\/strong>: Control-plane logs reveal the change operation, identity, time, and target resource.<\/li>\n<li><strong>Example<\/strong>: An engineer updates a deployment SKU or networking setting; Activity Log shows the update event and the caller.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Alert on destructive actions (delete, purge, disable logging)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Critical AI resources are deleted or logging is turned off.<\/li>\n<li><strong>Why this fits<\/strong>: Log alerts can detect delete operations or diagnostic settings changes.<\/li>\n<li><strong>Example<\/strong>: Alert when a resource delete occurs under AI resource groups, page the on-call team.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Detect RBAC drift and privilege escalation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Unexpected access grants appear on AI resources or resource groups.<\/li>\n<li><strong>Why this fits<\/strong>: Activity Log captures role assignment changes.<\/li>\n<li><strong>Example<\/strong>: Notify security when \u201cOwner\u201d is assigned to a non-approved group.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Troubleshoot policy denies that block deployments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A pipeline fails with a vague \u201cforbidden\u201d error.<\/li>\n<li><strong>Why this fits<\/strong>: Policy events and Activity Log entries help identify the policy assignment causing the 
deny.<\/li>\n<li><strong>Example<\/strong>: A policy requiring private endpoints blocks a deployment; logs show the policy name and assignment scope.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Quota and capacity incident correlation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Deployments fail intermittently due to quota\/capacity constraints.<\/li>\n<li><strong>Why this fits<\/strong>: Control-plane failure events plus service health context can guide remediation.<\/li>\n<li><strong>Example<\/strong>: Activity Log shows repeated \u201cfailed\u201d create operations; correlate with region service health advisory.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Build an operational \u201cAI platform change timeline\u201d<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Incident reviews require a consistent timeline of changes across resources.<\/li>\n<li><strong>Why this fits<\/strong>: Centralized logs let you query by time range and resource group tags.<\/li>\n<li><strong>Example<\/strong>: A workbook shows all create\/update\/delete operations in the last 24 hours for the AI platform.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Multi-team governance reporting (chargeback\/showback support)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Leadership asks which teams are creating AI resources and whether they follow standards.<\/li>\n<li><strong>Why this fits<\/strong>: Control-plane logs + tags provide evidence for reporting.<\/li>\n<li><strong>Example<\/strong>: Report top resource creators per subscription and whether tagging policies were satisfied.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Incident response automation with Sentinel<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: SOC needs to detect suspicious admin activity and open incidents.<\/li>\n<li><strong>Why this fits<\/strong>: Stream logs to Microsoft 
Sentinel for correlation and automated response.<\/li>\n<li><strong>Example<\/strong>: Sentinel rule triggers when multiple role changes happen outside business hours.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Validate infrastructure-as-code deployments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You want proof that a CI\/CD pipeline applied the intended changes.<\/li>\n<li><strong>Why this fits<\/strong>: Activity Log shows deployment operations and outcomes.<\/li>\n<li><strong>Example<\/strong>: Confirm that a Bicep deployment updated diagnostic settings and network rules.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Prove compliance for regulated AI environments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Auditors require evidence of access control, retention, and change tracking.<\/li>\n<li><strong>Why this fits<\/strong>: Centralized logs + retention policies + audit trails support compliance.<\/li>\n<li><strong>Example<\/strong>: Provide evidence of RBAC changes, key rotations, and policy compliance over time.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. 
Core Features<\/h2>\n\n\n\n<p>Because Observability in Foundry Control Plane is typically implemented using Azure Monitor primitives, the \u201cfeatures\u201d are best described as the capabilities you enable.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 1: Subscription-level control-plane event capture (Azure Activity Log)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Captures administrative events such as create\/update\/delete operations, RBAC changes, policy actions, and service health notifications at subscription scope.<\/li>\n<li><strong>Why it matters<\/strong>: Most critical AI platform incidents involve \u201cwhat changed\u201d in the control plane.<\/li>\n<li><strong>Practical benefit<\/strong>: A single timeline for changes across Foundry-related resources.<\/li>\n<li><strong>Caveat<\/strong>: Activity Log retention in the portal is limited; for longer retention you must export via diagnostic settings.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 2: Diagnostic settings routing to Log Analytics \/ Storage \/ Event Hubs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Exports logs to one or more sinks for retention, analysis, or streaming.<\/li>\n<li><strong>Why it matters<\/strong>: Centralization is required for cross-resource correlation and alerting.<\/li>\n<li><strong>Practical benefit<\/strong>: Query across subscriptions\/workloads; store long term; feed SIEM.<\/li>\n<li><strong>Caveat<\/strong>: Not every resource type exposes the same resource logs\/metrics categories. 
Verify per resource in Azure portal.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 3: Centralized log search and analytics (Log Analytics + KQL)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Stores logs and allows querying with Kusto Query Language (KQL).<\/li>\n<li><strong>Why it matters<\/strong>: Control-plane troubleshooting often needs filtering by caller, operation, resource, status, correlation ID.<\/li>\n<li><strong>Practical benefit<\/strong>: Fast investigations and reusable queries for SRE runbooks.<\/li>\n<li><strong>Caveat<\/strong>: Costs depend on ingestion and retention; implement filters\/retention tiers carefully.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 4: Alerting on risky or anomalous control-plane events (Azure Monitor Alerts)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Generates notifications\/incidents when queries match conditions (log alerts) or metrics cross thresholds.<\/li>\n<li><strong>Why it matters<\/strong>: You shouldn\u2019t learn about deletions, access changes, or policy denies from users.<\/li>\n<li><strong>Practical benefit<\/strong>: Proactive operations and security response.<\/li>\n<li><strong>Caveat<\/strong>: Poorly tuned alerts create noise. 
Start with a small set of high-signal detections.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 5: Dashboards and reporting (Azure Monitor Workbooks)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Visualizes queries and metrics with parameterized dashboards.<\/li>\n<li><strong>Why it matters<\/strong>: Platform operations need repeatable \u201cdaily view\u201d dashboards.<\/li>\n<li><strong>Practical benefit<\/strong>: Self-service visibility for engineers and stakeholders.<\/li>\n<li><strong>Caveat<\/strong>: Workbooks are only as good as the underlying log hygiene (tags, consistent scopes, routed logs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 6: Service Health \/ Resource Health integration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides Azure platform incident notifications and per-resource health signals.<\/li>\n<li><strong>Why it matters<\/strong>: Separates \u201cour change broke it\u201d from \u201cAzure incident is impacting it.\u201d<\/li>\n<li><strong>Practical benefit<\/strong>: Faster triage and clearer comms during outages.<\/li>\n<li><strong>Caveat<\/strong>: Health signals are not a substitute for your app\/data-plane monitoring\u2014use both.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 7: Governance visibility (Policy + Activity Log + optional compliance reporting)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Shows what policies were evaluated, denied, or remediated.<\/li>\n<li><strong>Why it matters<\/strong>: Foundry control plane often must enforce private networking, encryption, tagging, and restricted SKUs.<\/li>\n<li><strong>Practical benefit<\/strong>: Clear evidence of enforcement and drift.<\/li>\n<li><strong>Caveat<\/strong>: Policy event coverage and details vary by resource provider and policy effect. 
Validate policy logging behavior.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 8: Security analytics via SIEM (optional Microsoft Sentinel)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Correlates events, applies detections, and manages incidents.<\/li>\n<li><strong>Why it matters<\/strong>: AI platforms are high-value targets; admin actions are high-signal events.<\/li>\n<li><strong>Practical benefit<\/strong>: SOC-ready detections and incident workflows.<\/li>\n<li><strong>Caveat<\/strong>: Additional cost and operational ownership required; don\u2019t forward everything without a plan.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p>Observability in Foundry Control Plane follows a straightforward pattern:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Control-plane events occur<\/strong> whenever someone or something (portal, CLI, IaC pipeline) performs ARM operations on Foundry-related resources.<\/li>\n<li>Azure emits:\n<ul class=\"wp-block-list\">\n<li><strong>Activity Log events<\/strong> at subscription scope<\/li>\n<li>Optional <strong>resource logs\/metrics<\/strong> for specific resources (where supported)<\/li>\n<\/ul>\n<\/li>\n<li><strong>Diagnostic settings<\/strong> export these signals to:\n<ul class=\"wp-block-list\">\n<li><strong>Log Analytics<\/strong> for query\/alert\/dashboard<\/li>\n<li><strong>Storage<\/strong> for archival\/retention<\/li>\n<li><strong>Event Hubs<\/strong> for streaming to third-party tools<\/li>\n<\/ul>\n<\/li>\n<li><strong>Azure Monitor<\/strong> evaluates alert rules and triggers notifications\/actions.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow (what flows where)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control flow<\/strong>: User\/CI \u2192 ARM \u2192 Resource Provider (AI\/ML services)<\/li>\n<li><strong>Telemetry 
flow<\/strong>:\n<ul class=\"wp-block-list\">\n<li>ARM writes <strong>Activity Log events<\/strong><\/li>\n<li>Resource Provider may emit <strong>resource logs\/metrics<\/strong><\/li>\n<li>Diagnostic settings route telemetry to Log Analytics \/ Storage \/ Event Hubs<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related services<\/h3>\n\n\n\n<p>Common integrations in Azure AI + Machine Learning environments include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Microsoft Entra ID<\/strong>: identity and authentication to Azure<\/li>\n<li><strong>Azure RBAC<\/strong>: authorization decisions<\/li>\n<li><strong>Azure Policy<\/strong>: governance controls; policy deny events affect deployments<\/li>\n<li><strong>Private Link \/ Private Endpoints<\/strong>: networking posture; changes are critical to observe<\/li>\n<li><strong>Key Vault<\/strong>: secrets and keys; access changes should be monitored<\/li>\n<li><strong>Azure DevOps \/ GitHub Actions<\/strong>: IaC pipelines generating control-plane events<\/li>\n<li><strong>Microsoft Sentinel<\/strong>: SIEM for high-value control-plane events<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<p>To implement this pattern you typically need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Log Analytics workspace<\/li>\n<li>Azure Monitor alert rules<\/li>\n<li>Diagnostic settings at subscription\/resource scope<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Authentication uses <strong>Microsoft Entra ID<\/strong><\/li>\n<li>Authorization uses <strong>Azure RBAC<\/strong><\/li>\n<li>Access to logs is governed by:\n<ul class=\"wp-block-list\">\n<li>Log Analytics workspace RBAC (Log Analytics Reader\/Contributor)<\/li>\n<li>Azure Monitor roles (Monitoring Reader\/Contributor)<\/li>\n<li>Azure subscription\/Resource Group RBAC<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Activity Log and Log Analytics are 
Azure services accessed over Azure\u2019s public endpoints by default.<\/li>\n<li>You can harden access with:\n<ul class=\"wp-block-list\">\n<li>Private Link options (availability varies by service; verify in official docs)<\/li>\n<li>Network restrictions and firewall rules where supported<\/li>\n<li>Restricting who can read logs via RBAC, rather than relying only on network controls<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Decide <strong>which subscriptions<\/strong> and <strong>which resource groups<\/strong> represent Foundry platform boundaries.<\/li>\n<li>Standardize:\n<ul class=\"wp-block-list\">\n<li>naming conventions (to filter queries)<\/li>\n<li>tagging (owner, env, cost center)<\/li>\n<li>retention strategy (hot vs archive)<\/li>\n<\/ul>\n<\/li>\n<li>Treat \u201cdisable diagnostic settings\u201d as a high-severity event and alert on it.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[\"Engineer \/ CI Pipeline\"] --&gt; ARM[Azure Resource Manager]\n  ARM --&gt; RP[Foundry-related Resource Providers]\n  ARM --&gt; AL[Azure Activity Log]\n  RP --&gt; RL[\"Resource Logs \/ Metrics&lt;br\/&gt;(when supported)\"]\n\n  AL --&gt; DS[Diagnostic Settings]\n  RL --&gt; DS\n\n  DS --&gt; LAW[Log Analytics Workspace]\n  LAW --&gt; AM[Azure Monitor Alerts]\n  LAW --&gt; WB[\"Workbooks \/ Dashboards\"]\n  AM --&gt; N[\"Notifications \/ ITSM \/ Webhook\"]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Management[\"Management &amp; Governance\"]\n    AAD[Microsoft Entra ID]\n    RBAC[Azure RBAC]\n    POL[Azure Policy]\n    SH[\"Service Health \/ Resource Health\"]\n  end\n\n  subgraph Platform[\"Foundry Platform Subscriptions\"]\n    CI[\"GitHub Actions \/ Azure DevOps\"]\n    ARM[Azure Resource Manager]\n    AI[\"AI + ML 
Resources&lt;br\/&gt;(Foundry-related)\"]\n    KV[Key Vault]\n    NET[\"Networking&lt;br\/&gt;(VNet\/Private Endpoints)\"]\n  end\n\n  subgraph Observability[\"Central Observability Subscription\"]\n    DS[\"Diagnostic Settings&lt;br\/&gt;(Subscription + Resource)\"]\n    LAW[Log Analytics Workspace]\n    STO[\"Storage Account (Archive)\"]\n    EH[\"Event Hubs (Streaming)\"]\n    WB[Azure Monitor Workbooks]\n    ALRT[Azure Monitor Alerts]\n    SENT[\"Microsoft Sentinel (Optional)\"]\n  end\n\n  CI --&gt; ARM\n  ARM --&gt; AI\n  ARM --&gt; KV\n  ARM --&gt; NET\n\n  AAD --&gt; ARM\n  RBAC --&gt; ARM\n  POL --&gt; ARM\n\n  ARM --&gt;|Control-plane events| DS\n  AI --&gt;|Resource logs\/metrics| DS\n  SH --&gt; DS\n\n  DS --&gt; LAW\n  DS --&gt; STO\n  DS --&gt; EH\n\n  LAW --&gt; WB\n  LAW --&gt; ALRT\n  LAW --&gt; SENT\n  EH --&gt; SENT\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account\/subscription requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>Azure subscription<\/strong> where you can:\n<ul class=\"wp-block-list\">\n<li>Configure diagnostic settings at the subscription level, and\/or<\/li>\n<li>Configure diagnostic settings on Foundry-related resources<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions (IAM roles)<\/h3>\n\n\n\n<p>Typical minimum roles (scope varies by where you configure things):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To create a Log Analytics workspace: <strong>Contributor<\/strong> on a resource group (or higher)<\/li>\n<li>To configure diagnostic settings:\n<ul class=\"wp-block-list\">\n<li><strong>Owner<\/strong> or <strong>Contributor<\/strong> at the subscription\/resource scope is commonly required<\/li>\n<li>Some environments use a dedicated role with monitoring permissions; verify your org\u2019s RBAC model<\/li>\n<\/ul>\n<\/li>\n<li>To query logs: <strong>Log Analytics Reader<\/strong><\/li>\n<li>To create alerts: <strong>Monitoring Contributor<\/strong><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Billing 
requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A payment method enabled for Azure Monitor Logs ingestion\/retention and alerting.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure Portal<\/li>\n<li>Azure CLI (<code>az<\/code>)<br\/>\n  Install: https:\/\/learn.microsoft.com\/cli\/azure\/install-azure-cli<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Log Analytics workspace is regional; choose a region consistent with your data residency requirements.<\/li>\n<li>Activity Log is subscription-level and not tied to one region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits (verify in official docs)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Log Analytics ingestion\/retention constraints<\/li>\n<li>Alert rules per subscription\/workspace limits<\/li>\n<li>Diagnostic settings per resource limits<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure Monitor<\/li>\n<li>Log Analytics workspace (recommended for the lab)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. 
Pricing \/ Cost<\/h2>\n\n\n\n<p>Observability in Foundry Control Plane is priced through the Azure services you use to store, query, and act on telemetry\u2014not usually as a standalone \u201cFoundry observability\u201d SKU.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Primary pricing dimensions (what you pay for)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Azure Monitor Logs (Log Analytics)<\/strong>\n   &#8211; Data ingestion (GB\/day)\n   &#8211; Retention (days stored in the workspace)\n   &#8211; Optional archive and restore costs (where used)<\/li>\n<li><strong>Alerting<\/strong>\n   &#8211; Log alerts may have charges depending on alert type and evaluation frequency (verify current Azure Monitor pricing details).<\/li>\n<li><strong>Data export \/ streaming<\/strong>\n   &#8211; Event Hubs throughput units and retention (if streaming)\n   &#8211; Storage costs for archived logs (capacity + transactions)<\/li>\n<li><strong>SIEM (optional)<\/strong>\n   &#8211; Microsoft Sentinel charges (typically based on data ingestion\/retention)<\/li>\n<\/ol>\n\n\n\n<p>Official pricing:\n&#8211; Azure Monitor pricing: https:\/\/azure.microsoft.com\/pricing\/details\/monitor\/\n&#8211; Azure Pricing Calculator: https:\/\/azure.microsoft.com\/pricing\/calculator\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (what may be free)<\/h3>\n\n\n\n<p>Azure pricing changes over time. 
Some aspects that are commonly \u201cincluded\u201d or low-cost:\n&#8211; Viewing recent Activity Log entries in the portal (limited retention)\n&#8211; Some basic platform logs may not incur additional charges until exported\/ingested<\/p>\n\n\n\n<p><strong>Verify in official docs<\/strong> for the current free allowances for Log Analytics ingestion and retention in your region.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (most important)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>High-volume Activity Log export<\/strong> across many subscriptions<\/li>\n<li><strong>Verbose resource logs<\/strong> exported at high frequency<\/li>\n<li><strong>Long retention periods<\/strong> kept in hot storage<\/li>\n<li><strong>Unfiltered logs<\/strong> streamed to multiple sinks (Log Analytics + Event Hubs + Storage)<\/li>\n<li><strong>Noisy alerts<\/strong> evaluated too frequently<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden or indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cross-team access<\/strong>: More users querying logs may increase operational load (not a direct cost, but real toil).<\/li>\n<li><strong>Data egress<\/strong>: Streaming to third-party tools may incur network charges depending on architecture.<\/li>\n<li><strong>Retention compliance<\/strong>: Long-term retention in hot tier can be expensive; storage archive patterns may be cheaper.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Exporting logs to Event Hubs and then to non-Azure tools can introduce egress charges.<\/li>\n<li>Centralized logging across regions may create additional complexity. 
Prefer regionally aligned workspaces where required by policy.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost (without losing auditability)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start with <strong>Activity Log export<\/strong> only; add resource logs selectively.<\/li>\n<li>Use <strong>short hot retention<\/strong> in Log Analytics + <strong>archive to Storage<\/strong> for long retention (verify the recommended approach in current Azure docs).<\/li>\n<li>Reduce alert frequency; use high-signal conditions.<\/li>\n<li>Use KQL to focus on:<\/li>\n<li>specific resource groups<\/li>\n<li>specific operation names (delete, write, role assignments)<\/li>\n<li>failures only (when appropriate)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (model, not numbers)<\/h3>\n\n\n\n<p>A minimal setup for a small team:\n&#8211; 1 Log Analytics workspace\n&#8211; Subscription Activity Log routed to workspace\n&#8211; 2\u20135 log alerts (delete operations, RBAC changes)\nMain cost components:\n&#8211; Workspace ingestion from Activity Log volume\n&#8211; Retention days chosen\n&#8211; Alerts evaluation frequency<\/p>\n\n\n\n<p>Because the exact price per GB and alert charges vary by region and plan, <strong>use the Azure Pricing Calculator<\/strong> to estimate with your expected GB\/day.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>In production, cost planning should include:\n&#8211; Central workspace per region or per landing zone\n&#8211; Storage archive for multi-year retention\n&#8211; Sentinel (if SOC required)\n&#8211; Event Hubs streaming to enterprise SIEM\n&#8211; Multiple workbooks and alerts\n&#8211; Budget alerts and cost anomaly detection (FinOps)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. 
Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Implement a practical baseline for <strong>Observability in Foundry Control Plane<\/strong> by:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Creating a <strong>Log Analytics workspace<\/strong><\/li>\n<li>Exporting <strong>Azure Activity Log<\/strong> to that workspace (subscription-level control-plane visibility)<\/li>\n<li>Running <strong>KQL queries<\/strong> to inspect Foundry-related control-plane events (by filtering to AI\/ML resource providers)<\/li>\n<li>Creating a basic <strong>alert<\/strong> for a high-risk control-plane action<\/li>\n<li>Cleaning up to avoid ongoing cost<\/li>\n<\/ol>\n\n\n\n<p>This lab is designed to be safe and low-cost. You will generate only a small number of control-plane events.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n&#8211; Create a resource group and Log Analytics workspace.\n&#8211; Configure a <strong>subscription diagnostic setting<\/strong> to send Activity Logs to Log Analytics.\n&#8211; Generate a control-plane event by creating and deleting a small Azure resource (you can use a minimal AI\/ML-related resource <em>if available in your subscription<\/em>; otherwise any resource will still validate the pipeline).\n&#8211; Query Activity Log data in Log Analytics.\n&#8211; Create an alert for delete operations.<\/p>\n\n\n\n<blockquote>\n<p>Note: Foundry-specific resource types vary by tenant and by how your organization provisions AI services. 
The Activity Log approach still applies because it captures ARM operations across resource providers.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a resource group<\/h3>\n\n\n\n<p><strong>Action (Azure CLI):<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">az account show\naz group create \\\n  --name rg-foundry-observability-lab \\\n  --location eastus\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; A resource group named <code>rg-foundry-observability-lab<\/code> exists.<\/p>\n\n\n\n<p><strong>Verification:<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">az group show --name rg-foundry-observability-lab --query \"{name:name, location:location}\" -o table\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create a Log Analytics workspace<\/h3>\n\n\n\n<p><strong>Action (Azure CLI):<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">az monitor log-analytics workspace create \\\n  --resource-group rg-foundry-observability-lab \\\n  --workspace-name law-foundry-obsv-lab \\\n  --location eastus\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; A Log Analytics workspace is created.<\/p>\n\n\n\n<p><strong>Verification:<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">az monitor log-analytics workspace show \\\n  --resource-group rg-foundry-observability-lab \\\n  --workspace-name law-foundry-obsv-lab \\\n  --query \"{name:name, customerId:customerId, location:location}\" -o table\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Export Azure Activity Log to Log Analytics (subscription diagnostic setting)<\/h3>\n\n\n\n<p>This is the key step for <strong>control-plane observability<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option A (recommended): Azure Portal<\/h4>\n\n\n\n<ol 
class=\"wp-block-list\">\n<li>Go to <strong>Monitor<\/strong> in the Azure portal.<\/li>\n<li>Navigate to <strong>Activity log<\/strong>.<\/li>\n<li>Select <strong>Export Activity Logs<\/strong> (or <strong>Diagnostic settings<\/strong> depending on portal layout).<\/li>\n<li>Create a diagnostic setting:\n   &#8211; Destination: <strong>Send to Log Analytics workspace<\/strong>\n   &#8211; Select your workspace: <code>law-foundry-obsv-lab<\/code>\n   &#8211; Categories to include (typical baseline):<ul>\n<li>Administrative<\/li>\n<li>Policy<\/li>\n<li>Security<\/li>\n<li>ServiceHealth<\/li>\n<li>ResourceHealth<\/li>\n<li>Alert (if available)<\/li>\n<li>Recommendation (if available)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; A diagnostic setting exists for the subscription that exports Activity Logs to Log Analytics.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option B: Azure CLI (if available in your environment)<\/h4>\n\n\n\n<p>Azure CLI support for subscription diagnostic settings can vary by CLI version\/extension. 
If the following commands fail, use the portal.<\/p>\n\n\n\n<p>1) Get your subscription ID:<\/p>\n\n\n\n<pre><code class=\"language-bash\">SUB_ID=$(az account show --query id -o tsv)\necho \"$SUB_ID\"\n<\/code><\/pre>\n\n\n\n<p>2) Create the subscription diagnostic setting (command group may vary; verify in official docs if it differs). This command is subscription-scoped, so it takes no <code>--resource-group<\/code> argument, and <code>--workspace<\/code> expects the workspace resource ID rather than its name:<\/p>\n\n\n\n<pre><code class=\"language-bash\"># Look up the resource ID of the workspace created in Step 2\nLAW_ID=$(az monitor log-analytics workspace show \\\n  --resource-group rg-foundry-observability-lab \\\n  --workspace-name law-foundry-obsv-lab \\\n  --query id -o tsv)\n\naz monitor diagnostic-settings subscription create \\\n  --name ds-activitylog-to-law \\\n  --subscription \"$SUB_ID\" \\\n  --location eastus \\\n  --workspace \"$LAW_ID\" \\\n  --logs '[\n    {\"category\":\"Administrative\",\"enabled\":true},\n    {\"category\":\"Policy\",\"enabled\":true},\n    {\"category\":\"Security\",\"enabled\":true},\n    {\"category\":\"ServiceHealth\",\"enabled\":true},\n    {\"category\":\"ResourceHealth\",\"enabled\":true}\n  ]'\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; Activity Log events begin flowing to Log Analytics (may take a few minutes).<\/p>\n\n\n\n<p><strong>Verification (Portal):<\/strong>\n&#8211; Go to the workspace \u2192 <strong>Logs<\/strong> \u2192 run a query (next step).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Generate a control-plane event<\/h3>\n\n\n\n<p>To validate end-to-end, create a small resource. If your subscription allows provisioning an AI\/ML resource you normally use with Foundry, prefer that (because it will generate provider-specific events). 
If not, any Azure resource will still prove the control-plane logging pipeline.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example (safe and generally available): Create and delete a Storage account<\/h4>\n\n\n\n<p>Storage is not \u201cAI\u201d, but the Activity Log pipeline is identical and confirms your setup.<\/p>\n\n\n\n<pre><code class=\"language-bash\">az storage account create \\\n  --name stfoundryobsv$RANDOM \\\n  --resource-group rg-foundry-observability-lab \\\n  --location eastus \\\n  --sku Standard_LRS\n<\/code><\/pre>\n\n\n\n<p>Wait ~1\u20133 minutes, then delete it:<\/p>\n\n\n\n<pre><code class=\"language-bash\">ST_NAME=$(az storage account list -g rg-foundry-observability-lab --query \"[0].name\" -o tsv)\naz storage account delete --name $ST_NAME --resource-group rg-foundry-observability-lab --yes\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; You generated at least two Activity Log events: a create and a delete.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Query Activity Log data in Log Analytics (KQL)<\/h3>\n\n\n\n<p>In Azure portal:\n1. Open your Log Analytics workspace: <code>law-foundry-obsv-lab<\/code>\n2. Select <strong>Logs<\/strong>\n3. Run this query:<\/p>\n\n\n\n<pre><code class=\"language-kusto\">AzureActivity\n| where TimeGenerated &gt; ago(1h)\n| project TimeGenerated, OperationNameValue, ActivityStatusValue, Caller, ResourceGroup, ResourceProviderValue, ResourceId\n| order by TimeGenerated desc\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; You see recent control-plane operations, including your storage create\/delete (or other resource operations).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Filter to AI\/ML-related providers (examples)<\/h4>\n\n\n\n<p>Depending on what you use with Foundry, you might filter to providers like these. 
Use what matches your environment.<\/p>\n\n\n\n<pre><code class=\"language-kusto\">AzureActivity\n| where TimeGenerated &gt; ago(24h)\n| where ResourceProviderValue has_any (\"Microsoft.MachineLearningServices\", \"Microsoft.CognitiveServices\")\n| project TimeGenerated, OperationNameValue, ActivityStatusValue, Caller, ResourceGroup, ResourceId\n| order by TimeGenerated desc\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong>\n&#8211; If you have AI\/ML resources and recent operations, you\u2019ll see them here.\n&#8211; If not, you\u2019ll get zero results, which means you need to generate an AI\/ML operation in your subscription to validate provider-specific coverage.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Create a high-signal alert for delete operations<\/h3>\n\n\n\n<p>A practical baseline alert is: <strong>any delete operation in your Foundry platform resource group(s)<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Create a log alert (Portal method)<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Monitor<\/strong> \u2192 <strong>Alerts<\/strong> \u2192 <strong>Create<\/strong> \u2192 <strong>Alert rule<\/strong><\/li>\n<li>Scope: select your <strong>Log Analytics workspace<\/strong><\/li>\n<li>Condition: <strong>Custom log search<\/strong><\/li>\n<li>Use this query:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-kusto\">AzureActivity\n| where TimeGenerated &gt; ago(10m)\n| where OperationNameValue endswith \"\/delete\"\n\/\/ =~ compares case-insensitively; resource group names can appear in varying case in AzureActivity\n| where ResourceGroup =~ \"rg-foundry-observability-lab\"\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"5\">\n<li>Set:\n   &#8211; Evaluation frequency: e.g., 5 minutes\n   &#8211; Lookback period: e.g., 10 minutes\n   &#8211; Threshold: greater than 0<\/li>\n<li>Action group: email yourself (and\/or webhook\/ITSM connector)<\/li>\n<li>Name: <code>alert-delete-ops-rg-foundry-observability-lab<\/code><\/li>\n<\/ol>\n\n\n\n<p><strong>Expected 
outcome:<\/strong>\n&#8211; If a delete happens in the resource group, the alert fires.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Activity Log export enabled<\/strong>\n   &#8211; Portal: Monitor \u2192 Activity log \u2192 Export\/Diagnostic settings shows your Log Analytics destination.<\/li>\n<li><strong>Data arriving in Log Analytics<\/strong>\n   &#8211; <code>AzureActivity | where TimeGenerated &gt; ago(1h)<\/code> returns rows.<\/li>\n<li><strong>Alert rule created and enabled<\/strong>\n   &#8211; Monitor \u2192 Alerts shows the rule as enabled.<\/li>\n<li><strong>Test alert<\/strong>\n   &#8211; Delete a small resource in the lab resource group and confirm the alert triggers.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p><strong>Issue: <code>AzureActivity<\/code> table has no data<\/strong>\n&#8211; Wait 5\u201315 minutes after enabling export.\n&#8211; Confirm diagnostic setting is configured at the <strong>subscription<\/strong> level, not only on a resource.\n&#8211; Ensure you selected relevant categories (Administrative is essential).<\/p>\n\n\n\n<p><strong>Issue: Permission denied creating diagnostic settings<\/strong>\n&#8211; You likely need <strong>Owner<\/strong> or <strong>Contributor<\/strong> at subscription scope (or a role that includes <code>Microsoft.Insights\/diagnosticSettings\/*<\/code>).\n&#8211; In locked-down environments, request help from the platform team.<\/p>\n\n\n\n<p><strong>Issue: Alert never fires<\/strong>\n&#8211; Confirm your query is correct:\n  &#8211; Use a larger time window temporarily (e.g., <code>ago(1h)<\/code>).\n  &#8211; Remove the ResourceGroup filter to confirm delete operations appear.\n&#8211; Confirm the alert evaluation period\/frequency matches your query 
window.<\/p>\n\n\n\n<p><strong>Issue: Too many alerts (noise)<\/strong>\n&#8211; Narrow by:\n  &#8211; Resource group(s) for Foundry platform\n  &#8211; Specific operation names (role assignments, delete, write)\n  &#8211; Only failed operations (<code>ActivityStatusValue == \"Failure\"<\/code>; confirm the status values that appear in your workspace)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing charges, remove what you created.<\/p>\n\n\n\n<p>1) Delete the lab resource group (deletes workspace and any remaining lab resources):<\/p>\n\n\n\n<pre><code class=\"language-bash\">az group delete --name rg-foundry-observability-lab --yes --no-wait\n<\/code><\/pre>\n\n\n\n<p>2) Remove the subscription diagnostic setting (if you created one). It lives at subscription scope, so deleting the resource group does not remove it.\n&#8211; Portal: Monitor \u2192 Activity Log \u2192 Export\/Diagnostic settings \u2192 delete the setting<br\/>\n  or use CLI if available, for example <code>az monitor diagnostic-settings subscription delete --name ds-activitylog-to-law<\/code> (command patterns vary; verify in official docs).<\/p>\n\n\n\n<p>3) Remove alert rules created for the lab (rules stored in the lab resource group are already removed by step 1):\n&#8211; Portal: Monitor \u2192 Alerts \u2192 Alert rules \u2192 delete the lab rule<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Centralize logs<\/strong> by landing zone or platform subscription to enable cross-resource correlation.<\/li>\n<li>Use a <strong>tiered retention strategy<\/strong>:<\/li>\n<li>Hot retention in Log Analytics for active investigations<\/li>\n<li>Archive in Storage for long-term compliance (verify best practice in current Azure docs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Restrict who can:<\/li>\n<li>change diagnostic settings,<\/li>\n<li>delete workspaces,<\/li>\n<li>disable alerts.<\/li>\n<li>Use <strong>separation of duties<\/strong>:<\/li>\n<li>Platform team owns export pipelines and workspaces<\/li>\n<li>App\/ML teams have reader access and create team-level workbooks (where appropriate)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Export what you need:<\/li>\n<li>Start with Activity Log categories: Administrative, Policy, Security<\/li>\n<li>Add resource logs selectively<\/li>\n<li>Tune alerts for <strong>signal<\/strong>, not completeness.<\/li>\n<li>Use budgets and cost alerts for observability resources too (workspaces can grow unexpectedly).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer focused queries (time-bounded, filtered by resource group\/provider).<\/li>\n<li>Build \u201cinvestigation queries\u201d as saved queries or workbook components.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alert if diagnostic settings are removed or modified (control-plane observability must be protected).<\/li>\n<li>Use resource locks on critical logging resources (carefully\u2014locks can block legitimate 
changes).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Maintain an on-call runbook:<\/li>\n<li>where to look first (Activity Log timeline),<\/li>\n<li>key KQL queries,<\/li>\n<li>escalation paths (platform vs Azure incident).<\/li>\n<li>Create a standard workbook for:<\/li>\n<li>recent changes,<\/li>\n<li>failed operations,<\/li>\n<li>RBAC changes,<\/li>\n<li>policy denies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Standardize tags like:<\/li>\n<li><code>env<\/code> (dev\/test\/prod)<\/li>\n<li><code>owner<\/code><\/li>\n<li><code>costCenter<\/code><\/li>\n<li><code>dataClassification<\/code><\/li>\n<li>Use consistent resource group naming for Foundry platform boundaries (makes queries and alerts precise).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Control-plane actions authenticate via <strong>Microsoft Entra ID<\/strong>.<\/li>\n<li>Authorization is enforced by <strong>Azure RBAC<\/strong> (and sometimes resource-specific roles).<\/li>\n<li>Log access is also RBAC-controlled:<\/li>\n<li>Use least privilege for Log Analytics readers.<\/li>\n<li>Restrict write permissions to avoid tampering.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure services encrypt data at rest by default (verify specifics for Log Analytics and Storage in current docs).<\/li>\n<li>For archives in Storage, consider:<\/li>\n<li>encryption keys (Microsoft-managed vs customer-managed keys), if required by policy.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat observability endpoints 
as sensitive:<\/li>\n<li>They can reveal resource names, IDs, and operational details.<\/li>\n<li>Prefer RBAC restrictions as the primary control.<\/li>\n<li>Where available\/required, evaluate private connectivity options (verify support per service).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid embedding secrets in alert webhooks or automation scripts.<\/li>\n<li>Store secrets in <strong>Azure Key Vault<\/strong> and use managed identities for automation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your observability pipeline itself must be observable:<\/li>\n<li>Alert when diagnostic settings are changed.<\/li>\n<li>Alert when the workspace is deleted (activity log events).<\/li>\n<li>Consider streaming to Sentinel for tamper-resistant security operations (with proper governance).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define retention requirements (e.g., 90 days hot, 1\u20137 years archive) based on your regulatory obligations.<\/li>\n<li>Ensure logs do not violate data residency rules\u2014choose workspace region accordingly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Allowing too many users to modify diagnostic settings<\/li>\n<li>Storing logs only in short-retention default views<\/li>\n<li>Not alerting on role assignment changes<\/li>\n<li>Not separating production and non-production logging workspaces<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use IaC (Bicep\/Terraform) to define:<\/li>\n<li>diagnostic settings<\/li>\n<li>workspaces<\/li>\n<li>alerts<\/li>\n<li>action groups<\/li>\n<li>Apply policy to require diagnostic settings on critical resource types 
(verify feasibility per resource provider).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Not all resources emit the same logs<\/strong>: Some Foundry-related services may have limited resource logs. Always check the resource\u2019s <strong>Diagnostic settings<\/strong> categories.<\/li>\n<li><strong>Activity Log is necessary but not sufficient<\/strong>: It shows control-plane operations, not application\/data-plane telemetry (e.g., model inference latency inside your app).<\/li>\n<li><strong>Retention defaults can be short<\/strong>: Relying only on portal views risks losing critical evidence.<\/li>\n<li><strong>Alert noise is easy to create<\/strong>: Without filters, you\u2019ll overwhelm responders with low-signal events.<\/li>\n<li><strong>CLI\/portal differences<\/strong>: Some diagnostic setting operations are easier in the portal; CLI support can vary by version. Use the portal if commands don\u2019t match your environment.<\/li>\n<li><strong>Costs can grow quietly<\/strong>: Log ingestion increases with organizational scale and change frequency. Implement budgets and periodic reviews.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>Observability in Foundry Control Plane is a control-plane-focused approach. 
Here\u2019s how it compares with nearby options.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Observability in Foundry Control Plane (Azure Monitor + Activity Log + Log Analytics)<\/strong><\/td>\n<td>Auditing and operating Foundry management plane<\/td>\n<td>Strong change tracking, native Azure integration, flexible KQL<\/td>\n<td>Doesn\u2019t automatically cover app\/data-plane telemetry<\/td>\n<td>When you need governance, audit trails, and control-plane alerting<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Monitor (general)<\/strong><\/td>\n<td>Broad monitoring across Azure<\/td>\n<td>Standard platform for metrics\/logs\/alerts<\/td>\n<td>Requires design to cover Foundry boundaries<\/td>\n<td>When you want a unified monitoring strategy<\/td>\n<\/tr>\n<tr>\n<td><strong>Application Insights (app telemetry)<\/strong><\/td>\n<td>Application performance monitoring<\/td>\n<td>Traces, dependencies, distributed tracing for apps<\/td>\n<td>Not a control-plane audit trail<\/td>\n<td>When you need app-level observability for AI apps (APIs, RAG services)<\/td>\n<\/tr>\n<tr>\n<td><strong>Microsoft Sentinel<\/strong><\/td>\n<td>Security operations and incident response<\/td>\n<td>SIEM\/SOAR, correlation, detections<\/td>\n<td>Additional cost\/ops overhead<\/td>\n<td>When SOC needs detections for admin actions and suspicious changes<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS CloudTrail + CloudWatch<\/strong><\/td>\n<td>AWS control-plane observability<\/td>\n<td>Mature change\/audit tracking<\/td>\n<td>Different cloud; not Azure-native<\/td>\n<td>If your AI platform runs on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>GCP Cloud Audit Logs + Cloud Monitoring<\/strong><\/td>\n<td>GCP control-plane observability<\/td>\n<td>Strong audit logs<\/td>\n<td>Different cloud; not Azure-native<\/td>\n<td>If your AI platform runs on 
GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Datadog \/ Splunk (self-managed or SaaS)<\/strong><\/td>\n<td>Cross-cloud enterprise observability<\/td>\n<td>Powerful search\/correlation<\/td>\n<td>Cost, integration complexity, data residency concerns<\/td>\n<td>When you need a unified multi-cloud observability layer<\/td>\n<\/tr>\n<tr>\n<td><strong>Prometheus\/Grafana (self-managed)<\/strong><\/td>\n<td>Metrics-focused observability<\/td>\n<td>Open ecosystem<\/td>\n<td>Control-plane audit coverage is not the focus<\/td>\n<td>When you primarily need metrics and have platform maturity<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example (regulated)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A financial services company runs AI workloads with strict governance. Auditors require proof of change control for AI platform resources, and incidents must be triaged quickly.<\/li>\n<li><strong>Proposed architecture<\/strong>:<\/li>\n<li>Subscription Activity Log exported to a central Log Analytics workspace<\/li>\n<li>Resource logs enabled on critical AI, networking, and Key Vault resources (where supported)<\/li>\n<li>Workbooks for \u201cchange timeline\u201d, \u201cRBAC changes\u201d, \u201cpolicy denies\u201d<\/li>\n<li>Alerts on delete operations, role assignment changes, and diagnostic setting changes<\/li>\n<li>Optional Microsoft Sentinel for SOC detections and incident workflows<\/li>\n<li><strong>Why this service was chosen<\/strong>: Observability in Foundry Control Plane aligns with Azure-native governance and audit requirements and integrates with existing Azure Monitor and security tooling.<\/li>\n<li><strong>Expected outcomes<\/strong>:<\/li>\n<li>Faster audits (repeatable evidence)<\/li>\n<li>Lower MTTR through change correlation<\/li>\n<li>Reduced risk of unauthorized configuration 
drift<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A small SaaS team ships AI features weekly. A few outages were caused by accidental config changes and lack of visibility.<\/li>\n<li><strong>Proposed architecture<\/strong>:<\/li>\n<li>Single Log Analytics workspace<\/li>\n<li>Activity Log export enabled<\/li>\n<li>3 log alerts: deletes, role assignment changes, repeated failed writes<\/li>\n<li>One workbook showing last 7 days of changes<\/li>\n<li><strong>Why this service was chosen<\/strong>: Minimal setup effort, low operational overhead, and immediate value from change visibility.<\/li>\n<li><strong>Expected outcomes<\/strong>:<\/li>\n<li>Quick identification of \u201cwhat changed\u201d<\/li>\n<li>Better on-call experience with fewer blind spots<\/li>\n<li>Cost-controlled logging with short retention<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1) What does \u201ccontrol plane\u201d mean in Foundry Control Plane observability?<\/h3>\n\n\n\n<p>Control plane refers to <strong>management operations<\/strong> (create\/update\/delete\/configure) executed through Azure Resource Manager. It is different from data plane traffic such as application requests to your AI endpoint.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Is Observability in Foundry Control Plane a standalone Azure product?<\/h3>\n\n\n\n<p>Usually no. It is commonly implemented using <strong>Azure Monitor, Activity Log, Log Analytics, diagnostic settings, and alerts<\/strong>. Verify your organization\u2019s Foundry documentation for any Foundry-specific dashboards or integrations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3) What is the first thing to enable?<\/h3>\n\n\n\n<p>Enable <strong>subscription Activity Log export<\/strong> to a Log Analytics workspace. 
It provides immediate, broad control-plane visibility.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4) Does this replace application monitoring for AI apps?<\/h3>\n\n\n\n<p>No. Control-plane observability explains platform changes. You still need application\/data-plane monitoring (often with Application Insights and distributed tracing).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5) How long does Activity Log data take to appear in Log Analytics?<\/h3>\n\n\n\n<p>Typically minutes, but delays can occur. If you see no data after 15 minutes, re-check diagnostic settings and permissions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6) Which events are most important to alert on?<\/h3>\n\n\n\n<p>Start with high-signal events:\n&#8211; delete operations\n&#8211; role assignment changes\n&#8211; policy denies affecting deployments\n&#8211; diagnostic setting modifications<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7) Can I route logs to both Log Analytics and Storage?<\/h3>\n\n\n\n<p>Yes, diagnostic settings often support multiple sinks. This is common for \u201chot search\u201d in Log Analytics plus long-term archive in Storage.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8) What\u2019s the difference between Activity Log and resource logs?<\/h3>\n\n\n\n<p>Activity Log is <strong>subscription-level control-plane events<\/strong>. Resource logs are <strong>resource-specific telemetry<\/strong> exposed via diagnostic settings (varies by resource type).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">9) How do I prove \u201cwho changed what\u201d during an incident?<\/h3>\n\n\n\n<p>Use Activity Log records in Log Analytics, filtering by time range, resource group, and operation. The <code>Caller<\/code> field is commonly used to identify the actor.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">10) How do I protect observability from tampering?<\/h3>\n\n\n\n<p>Use RBAC to restrict modification of diagnostic settings and workspaces. 
Consider alerts when diagnostic settings change and use resource locks where appropriate.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11) Do I need Microsoft Sentinel?<\/h3>\n\n\n\n<p>Not always. Sentinel is beneficial when you need SOC workflows, correlation, and incident management. Many teams start with Azure Monitor and add Sentinel later.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12) Will this increase my Azure bill significantly?<\/h3>\n\n\n\n<p>It can, depending on log volume and retention. The main cost drivers are Log Analytics ingestion and retention. Start small, measure GB\/day, and optimize.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">13) Can I use this across multiple subscriptions?<\/h3>\n\n\n\n<p>Yes. Many organizations export logs from multiple subscriptions into central workspaces (or per-region workspaces) to support a platform view.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">14) What KQL table should I query for control-plane events?<\/h3>\n\n\n\n<p>If you export Activity Log to Log Analytics, you\u2019ll typically query the <code>AzureActivity<\/code> table.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">15) What if Foundry resource logs aren\u2019t available?<\/h3>\n\n\n\n<p>Rely on Activity Log (control plane) plus health signals and policy logs. For deeper telemetry, implement data-plane observability in your applications and AI services.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn Observability in Foundry Control Plane<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure Monitor overview: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/overview<\/td>\n<td>Foundation for metrics, logs, alerts, and visualization in Azure<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure Activity log: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/essentials\/activity-log<\/td>\n<td>Core control-plane event source for subscriptions<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Diagnostic settings: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/essentials\/diagnostic-settings<\/td>\n<td>How to route Activity Log and resource logs to Log Analytics\/Storage\/Event Hubs<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Log Analytics workspace overview: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/logs\/log-analytics-workspace-overview<\/td>\n<td>How to design and operate a workspace<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>KQL query overview: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/logs\/log-query-overview<\/td>\n<td>How to query control-plane logs effectively<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure Monitor alerts: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/alerts\/alerts-overview<\/td>\n<td>How to create actionable alerts from logs\/metrics<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure Monitor workbooks: https:\/\/learn.microsoft.com\/azure\/azure-monitor\/visualize\/workbooks-overview<\/td>\n<td>How to build dashboards for operations<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure Service Health: https:\/\/learn.microsoft.com\/azure\/service-health\/overview<\/td>\n<td>Platform incident visibility for 
triage<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Microsoft Sentinel overview: https:\/\/learn.microsoft.com\/azure\/sentinel\/overview<\/td>\n<td>SIEM\/SOAR option for security-driven observability<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Azure Monitor pricing: https:\/\/azure.microsoft.com\/pricing\/details\/monitor\/<\/td>\n<td>Understand ingestion\/retention\/alerting cost model<\/td>\n<\/tr>\n<tr>\n<td>Official tool<\/td>\n<td>Azure Pricing Calculator: https:\/\/azure.microsoft.com\/pricing\/calculator\/<\/td>\n<td>Build region-specific cost estimates<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<blockquote>\n<p>Foundry-specific documentation links can change with product naming. If you cannot find \u201cFoundry Control Plane\u201d by that name, search Microsoft Learn for \u201cAzure AI Foundry\u201d + \u201cmonitoring\u201d + \u201cdiagnostic settings\u201d and use the relevant resource provider documentation.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. 
Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, SREs, platform teams<\/td>\n<td>Azure operations, monitoring, DevOps practices, CI\/CD integration<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>DevOps fundamentals, tooling, process, and governance<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations teams<\/td>\n<td>Cloud operations patterns, monitoring, reliability<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, reliability engineers<\/td>\n<td>SRE principles, alerting strategy, incident response<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + AI teams<\/td>\n<td>AIOps concepts, monitoring automation, operational analytics<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/Cloud training content (verify scope)<\/td>\n<td>Beginners to intermediate<\/td>\n<td>https:\/\/www.rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training and coaching (verify offerings)<\/td>\n<td>DevOps engineers, SREs<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps help\/training (verify offerings)<\/td>\n<td>Teams needing short-term support<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and guidance (verify offerings)<\/td>\n<td>Ops teams and engineers<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify exact services)<\/td>\n<td>Platform engineering, operational readiness, monitoring foundations<\/td>\n<td>Central logging design, alerting standards, IaC observability rollout<\/td>\n<td>https:\/\/www.cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps consulting and training<\/td>\n<td>DevOps\/SRE enablement, monitoring practices<\/td>\n<td>Implement Azure Monitor baselines, dashboards, CI\/CD guardrails<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify exact services)<\/td>\n<td>DevOps processes, automation, operations<\/td>\n<td>Activity log export rollout, RBAC governance, incident response runbooks<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure fundamentals: subscriptions, resource groups, regions<\/li>\n<li>Microsoft Entra ID basics and Azure RBAC<\/li>\n<li>Azure Resource Manager concepts (deployments, resource providers)<\/li>\n<li>Azure Monitor basics (metrics vs logs, diagnostic settings)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Advanced KQL (joins, parsing, workbook parameters)<\/li>\n<li>Microsoft Sentinel detections and incident workflows<\/li>\n<li>IaC-based monitoring (Bicep\/Terraform modules for diagnostic settings and alerts)<\/li>\n<li>Data-plane observability for AI apps:<\/li>\n<li>Application Insights<\/li>\n<li>OpenTelemetry tracing patterns<\/li>\n<li>SLOs\/SLIs and error budgets<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud engineer \/ platform engineer<\/li>\n<li>DevOps engineer<\/li>\n<li>SRE<\/li>\n<li>Security engineer \/ SOC analyst<\/li>\n<li>Solutions architect (AI platform governance)<\/li>\n<li>FinOps practitioner (logging cost governance)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (Azure)<\/h3>\n\n\n\n<p>There is no single \u201cFoundry control plane observability\u201d certification. 
Helpful Microsoft certifications (verify current names\/availability):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure Administrator (operations and monitoring foundations)<\/li>\n<li>Azure Security Engineer (security monitoring and governance)<\/li>\n<li>Azure Solutions Architect (architecture and platform design)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a \u201cFoundry platform change timeline\u201d workbook (last 7\/30\/90 days)<\/li>\n<li>Create an alert pack:<\/li>\n<li>role assignment changes<\/li>\n<li>delete operations<\/li>\n<li>diagnostic settings changes<\/li>\n<li>repeated failed write operations<\/li>\n<li>Implement a multi-subscription export pattern with standardized retention and tags<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Observability<\/strong>: The ability to understand a system\u2019s internal state using outputs like logs, metrics, and traces.<\/li>\n<li><strong>Control plane<\/strong>: Management operations (create\/update\/delete\/configure) performed via Azure Resource Manager.<\/li>\n<li><strong>Data plane<\/strong>: Runtime operations (e.g., application requests, model inference calls).<\/li>\n<li><strong>Azure Activity Log<\/strong>: Subscription-level log of control-plane events.<\/li>\n<li><strong>Diagnostic settings<\/strong>: Azure mechanism to route logs\/metrics to Log Analytics, Storage, or Event Hubs.<\/li>\n<li><strong>Log Analytics workspace<\/strong>: Azure Monitor Logs store for querying and retention.<\/li>\n<li><strong>KQL (Kusto Query Language)<\/strong>: Query language used for Azure Monitor Logs.<\/li>\n<li><strong>Azure Monitor<\/strong>: Azure\u2019s platform for metrics, logs, alerts, and dashboards.<\/li>\n<li><strong>Workbook<\/strong>: Azure Monitor visualization artifact built from queries and parameters.<\/li>\n<li><strong>Alert rule<\/strong>: 
Condition that triggers notifications\/actions based on logs or metrics.<\/li>\n<li><strong>Action group<\/strong>: Notification and automation targets for alert rules (email, webhook, ITSM, etc.).<\/li>\n<li><strong>RBAC<\/strong>: Role-Based Access Control for authorization in Azure.<\/li>\n<li><strong>Azure Policy<\/strong>: Governance service for enforcing rules and compliance.<\/li>\n<li><strong>Service Health<\/strong>: Azure service providing incident and maintenance notifications.<\/li>\n<li><strong>Resource Health<\/strong>: Health status for a specific Azure resource.<\/li>\n<li><strong>SIEM<\/strong>: Security Information and Event Management system (e.g., Microsoft Sentinel).<\/li>\n<li><strong>Retention<\/strong>: How long logs are stored and searchable.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>Observability in <strong>Foundry Control Plane<\/strong> (Azure) is the discipline of capturing and operationalizing <strong>control-plane telemetry<\/strong>\u2014especially Activity Logs, diagnostic exports, queries, dashboards, and alerts\u2014so you can reliably answer what changed, who changed it, and how it impacted your AI platform.<\/p>\n\n\n\n<p>It matters because AI + Machine Learning platforms are configuration-heavy and security-sensitive; control-plane visibility reduces outages, improves audit readiness, and strengthens governance. Cost is primarily driven by <strong>Log Analytics ingestion and retention<\/strong>, plus optional SIEM and streaming. Security hinges on <strong>RBAC<\/strong>, protecting diagnostic settings, and alerting on risky admin actions.<\/p>\n\n\n\n<p>Use it when you operate Foundry-based AI environments beyond basic prototypes\u2014especially in shared, regulated, or fast-changing production platforms. 
Next step: expand from baseline Activity Log export to <strong>targeted resource logs<\/strong>, <strong>workbooks<\/strong>, and <strong>high-signal alert packs<\/strong>, then integrate with <strong>Microsoft Sentinel<\/strong> if SOC workflows are required.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI + Machine Learning<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,40],"tags":[],"class_list":["post-373","post","type-post","status-publish","format-standard","hentry","category-ai-machine-learning","category-azure"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=373"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/373\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}