{"id":358,"date":"2026-04-13T19:08:29","date_gmt":"2026-04-13T19:08:29","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/azure-openai-in-foundry-models-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning\/"},"modified":"2026-04-13T19:08:29","modified_gmt":"2026-04-13T19:08:29","slug":"azure-openai-in-foundry-models-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/azure-openai-in-foundry-models-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning\/","title":{"rendered":"Azure OpenAI in Foundry Models Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for AI + Machine Learning"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">AI + Machine Learning<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What this service is<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Azure OpenAI in Foundry Models<\/strong> is the experience of using <strong>Azure OpenAI models<\/strong> (such as chat, reasoning, and embedding models) through the <strong>Azure AI Foundry<\/strong> \u201cModels\u201d workflow\u2014where you discover models, deploy them, test them in a playground, and integrate them into applications with Azure-native security, governance, and operations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph simple explanation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">If you want to add high-quality generative AI (chatbots, summarization, extraction, embeddings for search, etc.) 
to an application using Azure, <strong>Azure OpenAI in Foundry Models<\/strong> is the practical path: pick a model from the Foundry model catalog, deploy it to your Azure OpenAI resource, test prompts, and then call the deployment from your code using a secure endpoint.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph technical explanation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Technically, your application calls an <strong>Azure OpenAI deployment endpoint<\/strong> using HTTPS. <strong>Azure AI Foundry (Foundry Models)<\/strong> provides the model discovery and deployment experience, while the underlying inference endpoint is served by <strong>Azure OpenAI<\/strong> (an Azure AI service). Authentication is typically via API key or Microsoft Entra ID (Azure AD) depending on your setup and supported auth mode. You can integrate network controls (Private Link, disabling public access), diagnostics to Log Analytics, and governance (Azure Policy, RBAC, tags) to operate safely at scale.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What problem it solves<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Teams need generative AI that is:\n&#8211; <strong>Production-ready<\/strong> (SLA\/quotas\/monitoring\/governance)\n&#8211; <strong>Secure by design<\/strong> (Azure identity, private networking, logging)\n&#8211; <strong>Operationally manageable<\/strong> (cost controls, rate limits, retries, deployments)\n&#8211; <strong>Easy to adopt<\/strong> (model catalog + playground + code samples)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Azure OpenAI in Foundry Models solves the gap between \u201ca model you can demo\u201d and \u201ca model you can run reliably in an enterprise Azure environment.\u201d<\/p>\n\n\n\n<blockquote>\n<p>Naming note (verify in official docs): Microsoft introduced <strong>Azure AI Foundry<\/strong> as the evolution of <strong>Azure AI Studio<\/strong>. 
The \u201cFoundry Models \/ model catalog\u201d experience is part of Azure AI Foundry, while <strong>Azure OpenAI Service<\/strong> remains the underlying service that hosts model deployments. If your tenant UI still shows \u201cAzure AI Studio,\u201d the steps are similar but labels may differ.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is Azure OpenAI in Foundry Models?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The purpose of <strong>Azure OpenAI in Foundry Models<\/strong> is to enable customers to <strong>select, deploy, evaluate, and operationalize OpenAI-family models on Azure<\/strong> using the Azure AI Foundry model experience, with Azure-grade identity, security, compliance options, monitoring, and integration patterns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common core capabilities include (availability varies by region\/model\/tenant\u2014verify in official docs):\n&#8211; <strong>Model catalog discovery<\/strong> in Azure AI Foundry (filter by provider\/type\/capabilities)\n&#8211; <strong>Deployments<\/strong> of Azure OpenAI models (chat, embeddings, etc.) 
to an Azure OpenAI resource\n&#8211; <strong>Playground testing<\/strong> for prompts and responses\n&#8211; <strong>SDK\/REST integration<\/strong> using the deployment\u2019s endpoint\n&#8211; <strong>Governance and operations<\/strong> via Azure (RBAC, tags, policy, diagnostics)\n&#8211; <strong>Safety controls<\/strong> through Azure OpenAI content filtering and\/or integration with Azure AI Content Safety (exact workflow depends on your configuration)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Major components<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In practical deployments, you\u2019ll see these components:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Component<\/th>\n<th>What it is<\/th>\n<th>Why it matters<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Azure AI Foundry (portal experience)<\/strong><\/td>\n<td>Web experience for projects, model catalog, evaluation, and app building<\/td>\n<td>Central place to manage AI work<\/td>\n<\/tr>\n<tr>\n<td><strong>Foundry Models \/ Model catalog<\/strong><\/td>\n<td>Curated catalog of models available to deploy\/use<\/td>\n<td>Helps choose the right model and workflow<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure OpenAI resource<\/strong><\/td>\n<td>The Azure resource that hosts your model deployments<\/td>\n<td>Where inference happens and where quotas apply<\/td>\n<\/tr>\n<tr>\n<td><strong>Model deployment<\/strong><\/td>\n<td>A named deployment of a specific model\/version\/capacity<\/td>\n<td>Your application calls the deployment name<\/td>\n<\/tr>\n<tr>\n<td><strong>Endpoint + Auth<\/strong><\/td>\n<td>Endpoint URL and API key and\/or Entra ID auth<\/td>\n<td>Secure access for apps and devs<\/td>\n<\/tr>\n<tr>\n<td><strong>Diagnostics<\/strong><\/td>\n<td>Azure Monitor logs\/metrics via Diagnostic settings<\/td>\n<td>Troubleshooting, audit, and cost control<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<p 
class=\"wp-block-paragraph\">This is a <strong>managed AI inference service experience<\/strong>:\n&#8211; Foundry Models provides the <strong>model selection\/deployment workflow<\/strong>\n&#8211; Azure OpenAI provides the <strong>managed inference API<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Scope (regional\/global, subscription, etc.)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure OpenAI resources are regional<\/strong>: you create the resource in an Azure region and deploy models supported in that region. Model availability varies by region and may require access approval\u2014<strong>verify in official docs<\/strong>.<\/li>\n<li><strong>Azure AI Foundry projects\/hubs are Azure resources<\/strong> tied to your tenant\/subscription and typically associated with a region and resource group (exact resource topology and naming can evolve\u2014verify in official docs).<\/li>\n<li>Access and management are controlled via <strong>Azure RBAC<\/strong> and (optionally) <strong>private networking<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Azure ecosystem<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Azure OpenAI in Foundry Models commonly integrates with:\n&#8211; <strong>Microsoft Entra ID (Azure AD)<\/strong> for identity governance\n&#8211; <strong>Azure Key Vault<\/strong> for secrets (if you use API keys)\n&#8211; <strong>Azure Monitor \/ Log Analytics<\/strong> for logs and metrics\n&#8211; <strong>Azure Private Link<\/strong> for private endpoints\n&#8211; <strong>Azure App Service \/ Azure Functions \/ AKS<\/strong> for hosting AI-powered apps\n&#8211; <strong>Azure AI Search<\/strong> for Retrieval-Augmented Generation (RAG) patterns (optional)\n&#8211; <strong>Storage accounts<\/strong> for documents\/data used in downstream workflows (optional)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Why use Azure OpenAI in Foundry Models?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster time-to-value<\/strong>: model catalog + deployment workflow reduces \u201cintegration friction.\u201d<\/li>\n<li><strong>Risk management<\/strong>: enterprise controls (identity, logs, network) reduce security\/compliance risk.<\/li>\n<li><strong>Reuse and standardization<\/strong>: shared patterns across teams (deployments, monitoring, naming conventions).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed inference endpoints<\/strong>: no GPU cluster management for common LLM use.<\/li>\n<li><strong>Model choice within Azure<\/strong>: select models that fit latency, cost, and quality requirements.<\/li>\n<li><strong>First-class Azure integrations<\/strong>: monitoring, private networking, policy, and DevOps automation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Diagnostics and auditing<\/strong>: send logs\/metrics to central workspaces.<\/li>\n<li><strong>Quota and capacity awareness<\/strong>: avoid accidental overload with rate limits and scaling planning.<\/li>\n<li><strong>Repeatable deployments<\/strong>: consistent model deployment naming and environments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Tenant-controlled access<\/strong> through RBAC and (where supported) Entra ID auth.<\/li>\n<li><strong>Network isolation<\/strong> using Private Link and disabling public access (where supported).<\/li>\n<li><strong>Centralized governance<\/strong>: tags, policies, resource locks, and standard Azure controls.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure 
OpenAI is designed for <strong>high-throughput<\/strong> inference, but practical scalability depends on:<\/li>\n<li>Model type, token volumes, regional availability<\/li>\n<li>Quotas and rate limits for your subscription\/resource<\/li>\n<li>Your app\u2019s retry\/caching\/backpressure design<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Choose Azure OpenAI in Foundry Models when you need:\n&#8211; A secure, Azure-governed path to production LLM deployments\n&#8211; Centralized model discovery + repeatable deployments\n&#8211; Clear operational tooling (monitoring, logs, RBAC, private networking)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When they should not choose it<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Consider alternatives when:\n&#8211; You require <strong>full model weight control<\/strong> or custom low-level inference tuning (self-hosting may fit better).\n&#8211; You need models not available in your Azure region or under your tenant\u2019s eligibility.\n&#8211; Your workload is extremely latency-sensitive and must run on-prem\/edge with no cloud dependency.\n&#8211; You want a provider-agnostic platform with minimal cloud coupling (though you can still design abstractions).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Azure OpenAI in Foundry Models used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common adoption patterns include:\n&#8211; <strong>Customer service<\/strong> (contact centers, ticket triage)\n&#8211; <strong>Healthcare and life sciences<\/strong> (clinical documentation support\u2014ensure compliance)\n&#8211; <strong>Financial services<\/strong> (document intelligence, policy Q&amp;A, risk summaries)\n&#8211; <strong>Retail\/e-commerce<\/strong> (product Q&amp;A, search, personalization)\n&#8211; <strong>Manufacturing<\/strong> (maintenance logs, SOP assistants)\n&#8211; <strong>Software\/SaaS<\/strong> (in-product copilots and help experiences)\n&#8211; <strong>Public sector<\/strong> (knowledge assistants with strict governance)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Application developers integrating AI features<\/li>\n<li>Platform teams building shared AI foundations (guardrails, logging, cost controls)<\/li>\n<li>Security teams validating identity\/network\/logging posture<\/li>\n<li>Data\/ML teams evaluating models and prompt strategies<\/li>\n<li>DevOps\/SRE teams operating production endpoints<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Chat assistants (internal\/external)<\/li>\n<li>Document summarization and classification<\/li>\n<li>Code assistance (where policy allows)<\/li>\n<li>Embeddings for semantic search and RAG<\/li>\n<li>Workflow automation and agent-like orchestration (ensure strict tool permissions)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Web\/API apps calling Azure OpenAI deployments<\/li>\n<li>Event-driven processing (Functions) for batch summarization\/extraction<\/li>\n<li>RAG (Azure AI Search + embeddings + chat model)<\/li>\n<li>Multi-tenant SaaS with per-tenant governance 
controls<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test<\/strong>: prompt experiments, evaluation harnesses, limited quotas<\/li>\n<li><strong>Production<\/strong>: private networking, diagnostics, alerting, cost governance, CI\/CD for config, standard prompt\/versioning practices<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Below are realistic scenarios where <strong>Azure OpenAI in Foundry Models<\/strong> is a good fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Internal knowledge base assistant (RAG-ready)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Employees can\u2019t find policy\/process info quickly.<\/li>\n<li><strong>Why it fits:<\/strong> Deploy chat + embeddings; integrate with Azure AI Search later.<\/li>\n<li><strong>Example:<\/strong> HR assistant answers \u201cHow do I file expenses?\u201d with citations from internal docs (after you implement retrieval).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Customer support ticket triage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Tickets come in unstructured; routing is slow.<\/li>\n<li><strong>Why it fits:<\/strong> Use a chat model for classification and summarization; integrate with CRM.<\/li>\n<li><strong>Example:<\/strong> Incoming emails summarized and labeled (\u201cbilling\u201d, \u201cbug\u201d, \u201cpriority\u201d).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Meeting and call summarization<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Meeting notes are inconsistent and time-consuming.<\/li>\n<li><strong>Why it fits:<\/strong> Text summarization with structured output.<\/li>\n<li><strong>Example:<\/strong> Teams transcript summarized into action items and 
decisions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Contract clause extraction<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Legal ops needs key fields from contracts.<\/li>\n<li><strong>Why it fits:<\/strong> Strong at extraction into JSON (with schema constraints in your app).<\/li>\n<li><strong>Example:<\/strong> Extract renewal date, termination clause, and governing law.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) PII detection assistance (with human review)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Sensitive data appears in logs\/documents.<\/li>\n<li><strong>Why it fits:<\/strong> Use AI to flag likely PII; combine with Microsoft Purview or DLP workflows.<\/li>\n<li><strong>Example:<\/strong> Flag content likely containing SSNs; route to review queue.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Developer documentation assistant<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Engineering teams struggle to navigate internal docs.<\/li>\n<li><strong>Why it fits:<\/strong> Chat Q&amp;A over internal docs with governance.<\/li>\n<li><strong>Example:<\/strong> \u201cHow do I rotate secrets in service X?\u201d answered with links and steps.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Product catalog enrichment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Product descriptions and attributes are incomplete.<\/li>\n<li><strong>Why it fits:<\/strong> Generate descriptions and extract attributes at scale.<\/li>\n<li><strong>Example:<\/strong> Generate SEO-safe descriptions and extract material\/color\/size fields.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Incident postmortem draft generation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Postmortems are delayed and inconsistent.<\/li>\n<li><strong>Why it fits:<\/strong> Summarize incident timeline and contributing factors 
from notes.<\/li>\n<li><strong>Example:<\/strong> Generate a structured template from incident channel transcripts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Semantic search with embeddings<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Keyword search misses meaning\/synonyms.<\/li>\n<li><strong>Why it fits:<\/strong> Embedding models power semantic similarity search.<\/li>\n<li><strong>Example:<\/strong> \u201cVPN not working\u201d returns \u201cremote access configuration\u201d solutions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Compliance policy Q&amp;A (guardrailed)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Staff need quick compliance answers without misstatements.<\/li>\n<li><strong>Why it fits:<\/strong> Azure governance + logging + controlled prompts; ensure disclaimers.<\/li>\n<li><strong>Example:<\/strong> Provide references to policy text and require human approval for decisions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Multilingual support responses<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Support in multiple languages is inconsistent.<\/li>\n<li><strong>Why it fits:<\/strong> High-quality translation and response drafting.<\/li>\n<li><strong>Example:<\/strong> Draft Spanish replies from English ticket context.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Data-to-text executive reporting<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Stakeholders want narrative summaries from metrics.<\/li>\n<li><strong>Why it fits:<\/strong> Convert structured KPIs to executive-ready language.<\/li>\n<li><strong>Example:<\/strong> Weekly business summary from a dashboard export.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability varies by region, model, and tenant eligibility. 
Always confirm in official docs and in your Azure AI Foundry tenant UI.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 1: Model discovery via Foundry Models catalog<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Lets you browse\/search models and view descriptions and usage patterns.<\/li>\n<li><strong>Why it matters:<\/strong> Reduces guesswork and speeds up model selection.<\/li>\n<li><strong>Practical benefit:<\/strong> Faster prototyping and fewer wrong model choices.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Catalog contents differ by region\/permissions; some models require approval.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 2: Model deployments (named endpoints)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Creates a <strong>deployment name<\/strong> mapped to a specific model\/version\/capacity in your Azure OpenAI resource.<\/li>\n<li><strong>Why it matters:<\/strong> Your app targets the deployment name, enabling controlled upgrades\/rollbacks.<\/li>\n<li><strong>Practical benefit:<\/strong> Stable integration contract for applications.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Quotas and rate limits apply; model availability varies by region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 3: Playground testing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Interactive testing of prompts and parameters before coding.<\/li>\n<li><strong>Why it matters:<\/strong> Most failures are prompt\/format issues; playground shortens iteration cycles.<\/li>\n<li><strong>Practical benefit:<\/strong> Validate prompt style, safety behavior, and output format quickly.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Playground results may differ from production if your app adds retrieval\/tools\/system prompts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 4: Multiple model types (chat + 
embeddings)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Supports common LLM patterns: conversational generation and vector embeddings for search\/RAG.<\/li>\n<li><strong>Why it matters:<\/strong> Most production assistants require both generation and retrieval.<\/li>\n<li><strong>Practical benefit:<\/strong> One Azure-governed ecosystem for both steps.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Embedding dimensionality and token limits vary by model.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 5: Authentication options (keys and\/or Entra ID)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Supports secure access via API keys; some configurations support Microsoft Entra ID-based auth.<\/li>\n<li><strong>Why it matters:<\/strong> Keys are simple; Entra ID improves governance and reduces secret sprawl.<\/li>\n<li><strong>Practical benefit:<\/strong> Aligns with enterprise identity and least-privilege.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Entra ID support and recommended approach can vary\u2014verify current docs for your API version and SDK.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 6: Networking controls (Private Link, public access control)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Enables private endpoints and restricting public network access (where supported).<\/li>\n<li><strong>Why it matters:<\/strong> Reduces data exfiltration risk and meets internal network policy requirements.<\/li>\n<li><strong>Practical benefit:<\/strong> Keep traffic on private IPs within your Azure virtual network.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Requires DNS planning and private connectivity from your app environment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 7: Content filtering \/ safety features<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Applies safety 
filters to prompts and completions; may integrate with additional safety services.<\/li>\n<li><strong>Why it matters:<\/strong> Reduces risk of harmful output and policy violations.<\/li>\n<li><strong>Practical benefit:<\/strong> Baseline guardrails without custom moderation pipelines.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Not a complete safety solution; you still need app-level checks, user policies, and human review workflows for high-risk domains.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 8: Monitoring and diagnostics (Azure Monitor)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Exposes logs\/metrics via Azure Monitor (via Diagnostic settings).<\/li>\n<li><strong>Why it matters:<\/strong> Production systems need observability for incidents and cost anomalies.<\/li>\n<li><strong>Practical benefit:<\/strong> Centralized troubleshooting and audit.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Logging may include sensitive prompts depending on configuration\u2014review governance and data handling carefully.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 9: Quotas and rate limits management (service-side)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Enforces per-resource\/subscription limits for throughput.<\/li>\n<li><strong>Why it matters:<\/strong> Protects the service and forces capacity planning.<\/li>\n<li><strong>Practical benefit:<\/strong> Predictable operation when combined with app-side backpressure and retries.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Hitting 429s is common without load planning.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 10: Enterprise governance alignment (RBAC, policy, tags)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Uses Azure\u2019s standard governance toolchain.<\/li>\n<li><strong>Why it matters:<\/strong> AI services must follow the same controls as the 
rest of your platform.<\/li>\n<li><strong>Practical benefit:<\/strong> Standardization for audit, cost allocation, and environment separation.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Governance needs deliberate design; defaults are rarely enough.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level service architecture<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At a high level:\n1. A developer uses <strong>Azure AI Foundry<\/strong> to select an <strong>Azure OpenAI<\/strong> model from Foundry Models and creates a <strong>deployment<\/strong>.\n2. The application sends HTTPS requests to the <strong>Azure OpenAI endpoint<\/strong>, specifying the <strong>deployment name<\/strong>.\n3. Azure OpenAI performs inference, applies applicable <strong>content filters<\/strong>, and returns the response.\n4. Logs and metrics flow into <strong>Azure Monitor<\/strong> (if configured).\n5. 
Secrets and identity are managed through <strong>Key Vault<\/strong> and\/or <strong>Microsoft Entra ID<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane<\/strong>: creating resources, deployments, configuring diagnostics, networking, RBAC.<\/li>\n<li><strong>Data plane<\/strong>: inference calls with user prompts and system instructions; returns generated text and usage metadata (tokens).<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">A common flow:\n&#8211; User \u2192 App UI \u2192 App backend \u2192 Azure OpenAI deployment \u2192 App backend \u2192 User<br\/>\nOptionally:\n&#8211; App backend \u2192 embedding model \u2192 vector store\/search \u2192 retrieved context \u2192 chat model<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related services<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Typical integrations include:\n&#8211; <strong>Azure Key Vault<\/strong>: store API keys, rotate secrets.\n&#8211; <strong>Azure App Service \/ Azure Functions \/ AKS<\/strong>: host AI-enabled services.\n&#8211; <strong>Azure AI Search<\/strong>: retrieval layer for RAG.\n&#8211; <strong>Azure Monitor \/ Log Analytics<\/strong>: logs, metrics, alerting.\n&#8211; <strong>Private Link<\/strong>: private endpoints for the Azure OpenAI resource.\n&#8211; <strong>API Management<\/strong>: wrap and secure the inference endpoint; enforce quotas per client.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure subscription, resource group<\/li>\n<li>Azure OpenAI resource and deployment<\/li>\n<li>Optional: VNet, Private DNS zones, Log Analytics workspace, Key Vault<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common patterns:\n&#8211; <strong>API key<\/strong>: simplest; store key in Key Vault; never embed in client 
apps.\n&#8211; <strong>Entra ID (where supported)<\/strong>: use managed identity from your compute (Functions\/App Service\/AKS) to access the service without static secrets. <strong>Verify exact support and setup steps in official docs<\/strong> for your SDK and API version.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Public endpoint (default)<\/strong>: simplest for dev\/test; control via firewalls and key management.<\/li>\n<li><strong>Private endpoint (recommended for production)<\/strong>: Azure OpenAI is reachable privately from your VNet; disable public network access where feasible.<\/li>\n<li>Plan DNS: private endpoint deployments require correct private DNS zone linkage.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable <strong>Diagnostic settings<\/strong> to Log Analytics for centralized visibility.<\/li>\n<li>Establish <strong>tagging<\/strong>: environment, cost center, owner, data classification.<\/li>\n<li>Implement <strong>budget alerts<\/strong> and cost anomaly monitoring.<\/li>\n<li>Adopt <strong>deployment naming conventions<\/strong> so you can trace which app uses which model.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[User] --&gt; A[App Backend]\n  A --&gt;|HTTPS: deployment call| OAI[\"Azure OpenAI Deployment&lt;br\/&gt;(via Foundry Models)\"]\n  OAI --&gt; A\n  A --&gt; U\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Client\n    U[Users]\n  end\n\n  subgraph Azure[\"Azure Subscription\"]\n    subgraph Net[\"VNet (optional but recommended)\"]\n      APIM[\"API Management (optional)\"]\n      APP[App Service \/ AKS \/ Functions]\n    
  PE[Private Endpoint to Azure OpenAI]\n      DNS[Private DNS Zone]\n    end\n\n    KV[Azure Key Vault]\n    MON[Azure Monitor + Log Analytics]\n    OAI[Azure OpenAI Resource&lt;br\/&gt;Model Deployments]\n    AIS[\"Azure AI Search (optional for RAG)\"]\n    STO[\"Storage Account (optional)\"]\n  end\n\n  U --&gt; APIM --&gt; APP\n  APP --&gt;|Managed Identity or Key| KV\n  APP --&gt;|\"Embeddings + Retrieval (optional)\"| AIS\n  APP --&gt;|Private traffic| PE --&gt; OAI\n  DNS --- PE\n  OAI --&gt; MON\n  APP --&gt; MON\n  AIS --&gt; MON\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account\/subscription\/tenant requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Azure subscription<\/strong><\/li>\n<li>Access to <strong>Azure AI Foundry<\/strong> in your tenant (portal experience at https:\/\/ai.azure.com\/ is commonly used\u2014verify current entry point in your tenant)<\/li>\n<li>Eligibility for <strong>Azure OpenAI Service<\/strong> in your tenant (Azure OpenAI often requires an application\/approval process\u2014verify current requirements in official docs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At minimum you typically need:\n&#8211; Permission to create resources in a resource group: <strong>Contributor<\/strong> (or a custom role)\n&#8211; For Azure OpenAI management: roles that allow creating and managing the Azure OpenAI resource and deployments (exact roles can differ; verify official docs)\n&#8211; For using the endpoint with Entra ID: appropriate data-plane permissions (verify official docs)\n&#8211; For diagnostics: permission to configure Diagnostic settings and write to Log Analytics workspace<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A subscription with a valid billing 
method<\/li>\n<li>Cost controls: budgets\/alerts recommended before production testing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">CLI\/SDK\/tools needed<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure CLI<\/strong> (optional but useful): https:\/\/learn.microsoft.com\/cli\/azure\/install-azure-cli<\/li>\n<li><strong>Python 3.10+<\/strong> (for the lab code example)<\/li>\n<li><code>pip<\/code> to install dependencies<\/li>\n<li>Optional: <code>curl<\/code> for quick REST validation<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure OpenAI is <strong>region-dependent<\/strong><\/li>\n<li>Model availability is <strong>region-dependent<\/strong><\/li>\n<li>Foundry Models catalog visibility can be <strong>tenant\/region-dependent<\/strong><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Always confirm supported regions\/models:\n&#8211; Azure OpenAI documentation: https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/\n&#8211; Azure AI Foundry documentation: https:\/\/learn.microsoft.com\/azure\/ai-foundry\/ (verify this is the current doc path)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common constraints (verify exact values in your environment):\n&#8211; Tokens per minute \/ requests per minute (rate limits)\n&#8211; Maximum input\/output tokens per request (model-specific)\n&#8211; Concurrent requests guidance\n&#8211; Per-region capacity<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For the core lab:\n&#8211; Resource group\n&#8211; Azure OpenAI resource\n&#8211; Azure AI Foundry project (or equivalent construct in your tenant UI)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Optional for production patterns:\n&#8211; Log Analytics workspace\n&#8211; Key Vault\n&#8211; VNet + Private Endpoint + Private DNS\n&#8211; API 
Management<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (high level)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Azure OpenAI pricing is typically <strong>usage-based<\/strong> and depends on:\n&#8211; <strong>Model family and model variant<\/strong>\n&#8211; <strong>Tokens processed<\/strong> (input + output tokens) for text\/chat\n&#8211; Additional features (if enabled) such as certain hosted tools or specialized model operations (verify in official pricing)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Because pricing is <strong>region- and model-dependent<\/strong> and changes over time, do not hardcode unit rates in internal docs. Use official sources.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Official pricing page (Azure OpenAI Service):\n&#8211; https:\/\/azure.microsoft.com\/pricing\/details\/cognitive-services\/openai-service\/<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Azure Pricing Calculator:\n&#8211; https:\/\/azure.microsoft.com\/pricing\/calculator\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions to understand<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Dimension<\/th>\n<th>What it means<\/th>\n<th>Cost impact<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Input tokens<\/strong><\/td>\n<td>Tokens you send (system + user + retrieved context)<\/td>\n<td>Often a major cost driver in RAG (context can get large)<\/td>\n<\/tr>\n<tr>\n<td><strong>Output tokens<\/strong><\/td>\n<td>Tokens the model generates<\/td>\n<td>Controls response length and cost<\/td>\n<\/tr>\n<tr>\n<td><strong>Model choice<\/strong><\/td>\n<td>Larger models generally cost more<\/td>\n<td>Biggest lever for cost\/latency tradeoffs<\/td>\n<\/tr>\n<tr>\n<td><strong>Throughput\/quotas<\/strong><\/td>\n<td>Higher quotas enable more traffic<\/td>\n<td>May require request to increase 
limits<\/td>\n<\/tr>\n<tr>\n<td><strong>Networking<\/strong><\/td>\n<td>Private endpoints, egress, etc.<\/td>\n<td>Usually smaller than token costs but can matter at scale<\/td>\n<\/tr>\n<tr>\n<td><strong>Observability<\/strong><\/td>\n<td>Log ingestion into Log Analytics<\/td>\n<td>Can become meaningful at high volume<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (if applicable)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Azure OpenAI typically does <strong>not<\/strong> provide a general always-free tier like some developer services. Trials\/credits depend on your subscription offers. <strong>Verify in official pricing and your subscription benefits<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Primary cost drivers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prompt size (system prompt + user prompt + conversation history)<\/li>\n<li>RAG context size (retrieved passages can multiply tokens)<\/li>\n<li>Response length (max tokens)<\/li>\n<li>Model selection (quality vs. 
cost)<\/li>\n<li>Traffic patterns (bursty traffic may increase retries\/timeouts if quotas are tight)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden or indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Log Analytics ingestion<\/strong> if you send detailed logs<\/li>\n<li><strong>Key Vault operations<\/strong> (small but measurable at scale)<\/li>\n<li><strong>API Management<\/strong> costs if you front the endpoint<\/li>\n<li><strong>Search\/vector store<\/strong> costs if you implement RAG (Azure AI Search, Storage)<\/li>\n<li><strong>Data egress<\/strong> if your app is outside Azure or cross-region<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Running the app and the Azure OpenAI resource in the same region reduces latency and cross-region data transfer charges.<\/li>\n<li>Private Link can simplify compliance but adds networking components to run and pay for (private endpoints, Private DNS zones, endpoint management).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Practical techniques:\n&#8211; <strong>Choose the smallest model<\/strong> that meets quality requirements.\n&#8211; <strong>Minimize tokens<\/strong>: summarize chat history; trim retrieved context; avoid verbose system prompts.\n&#8211; <strong>Set strict output limits<\/strong> (<code>max_tokens<\/code>, or <code>max_completion_tokens<\/code> for models that require it; check the API reference for your model and <code>api-version<\/code>) and stop sequences where appropriate.\n&#8211; <strong>Cache<\/strong> frequent answers (app-level caching) when allowed by policy.\n&#8211; <strong>Batch<\/strong> non-interactive workloads (e.g., nightly summarization) and implement backoff to avoid 429 retry storms.\n&#8211; <strong>Use embeddings efficiently<\/strong>: chunk documents carefully; avoid recomputing embeddings unnecessarily.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A realistic \u201cstarter\u201d approach:\n&#8211; Deploy one chat model and test a few 
dozen short prompts in the playground and via a small script.\n&#8211; Keep input prompts short (&lt; 1\u20132 KB text) and limit outputs.\n&#8211; Expect costs to be dominated by token usage; you can estimate by:\n  1) Measuring average input\/output tokens per call\n  2) Multiplying by expected daily calls\n  3) Applying the model\u2019s per-token price from the official pricing page for your region\/model<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In production, costs scale with:\n&#8211; Active users \u00d7 prompts per user \u00d7 average tokens per prompt\n&#8211; RAG expansions (retrieval adds context tokens)\n&#8211; Long-running conversations (history grows)\n&#8211; Multiple environments (dev\/stage\/prod)\n&#8211; Monitoring\/logging retention policies<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A strong practice is to build a <strong>token budget<\/strong> per feature (e.g., \u201cSupport chat answer must stay under X input tokens and Y output tokens on average\u201d) and treat it like performance budgets.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This lab deploys an Azure OpenAI model using the Foundry Models experience and calls it from Python. 
It is designed to be low-risk and relatively low-cost (actual cost depends on model choice and token usage).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create an <strong>Azure OpenAI resource<\/strong><\/li>\n<li>Use <strong>Azure AI Foundry (Foundry Models)<\/strong> to <strong>deploy a chat model<\/strong><\/li>\n<li>Test the deployment in the playground<\/li>\n<li>Call the model from <strong>Python<\/strong> using the deployment endpoint<\/li>\n<li>Configure basic diagnostics (optional but recommended)<\/li>\n<li>Clean up resources to stop charges<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You will create these resources:\n&#8211; Resource Group\n&#8211; Azure OpenAI resource (regional)\n&#8211; Azure AI Foundry project (or equivalent)\n&#8211; A model deployment (chat model) inside Azure OpenAI\n&#8211; (Optional) Log Analytics workspace + diagnostic settings<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You will produce:\n&#8211; A working response from the model in the Foundry playground\n&#8211; A working response from a Python script\n&#8211; A clear cleanup path<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a resource group<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Have a dedicated container for easy cleanup.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Option A: Azure Portal<\/strong>\n1. Go to https:\/\/portal.azure.com\/\n2. Search <strong>Resource groups<\/strong>\n3. Select <strong>Create<\/strong>\n4. Fill:\n   &#8211; Subscription: your subscription\n   &#8211; Resource group: <code>rg-oai-foundry-lab<\/code>\n   &#8211; Region: choose a region that supports Azure OpenAI for your tenant (verify)\n5. 
Select <strong>Review + create<\/strong> \u2192 <strong>Create<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Option B: Azure CLI<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">az login\naz account set --subscription \"&lt;SUBSCRIPTION_ID&gt;\"\n\naz group create \\\n  --name \"rg-oai-foundry-lab\" \\\n  --location \"&lt;AZURE_REGION&gt;\"\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> Resource group <code>rg-oai-foundry-lab<\/code> exists.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an Azure OpenAI resource<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Create the resource that will host your model deployments.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In Azure Portal, select <strong>Create a resource<\/strong><\/li>\n<li>Search for <strong>Azure OpenAI<\/strong> (service name may appear under Azure AI services)<\/li>\n<li>Select <strong>Create<\/strong><\/li>\n<li>Configure:\n   &#8211; Subscription: your subscription\n   &#8211; Resource group: <code>rg-oai-foundry-lab<\/code>\n   &#8211; Region: choose a supported region\n   &#8211; Name: <code>oai-foundry-lab-&lt;unique&gt;<\/code>\n   &#8211; Pricing tier: as available in your region (verify)<\/li>\n<li>Select <strong>Review + create<\/strong> \u2192 <strong>Create<\/strong><\/li>\n<li>Wait for deployment to complete.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> Azure OpenAI resource is deployed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; Open the resource and confirm it exists in the correct region.\n&#8211; Locate <strong>Keys and Endpoint<\/strong> (names may vary). Do not copy into documents; store securely.<\/p>\n\n\n\n<blockquote>\n<p>If you cannot create the resource due to access policy, you likely need Azure OpenAI eligibility\/approval. 
See Troubleshooting.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create or open an Azure AI Foundry project<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Use Foundry Models to manage deployment through the Foundry experience.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to Azure AI Foundry: https:\/\/ai.azure.com\/<\/li>\n<li>Sign in with your Azure account.<\/li>\n<li>Create a <strong>Hub<\/strong> and <strong>Project<\/strong> (exact UI names can vary\u2014verify in your tenant):\n   &#8211; Hub: <code>hub-oai-foundry-lab<\/code>\n   &#8211; Project: <code>proj-oai-foundry-lab<\/code>\n   &#8211; Region: prefer the same region as your Azure OpenAI resource (reduces latency and complexity)<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> You have an AI Foundry project where you can browse models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; You can open the project and see options like Models\/Playground\/Deployments (exact navigation may differ).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Deploy an Azure OpenAI chat model from Foundry Models<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Create a named deployment you can call from code.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In your Foundry project, go to <strong>Models<\/strong> (or <strong>Model catalog \/ Foundry Models<\/strong>).<\/li>\n<li>Filter to <strong>Azure OpenAI<\/strong> models (wording may vary).<\/li>\n<li>Choose a <strong>chat<\/strong> model that is available for your region and subscription.\n   &#8211; Use any available chat model in the catalog.\n   &#8211; If you\u2019re unsure, pick the model recommended for general chat in your tenant UI.<\/li>\n<li>Select <strong>Deploy<\/strong>.<\/li>\n<li>When prompted:\n   &#8211; Choose your existing 
<strong>Azure OpenAI resource<\/strong> <code>oai-foundry-lab-&lt;unique&gt;<\/code>\n   &#8211; Set a <strong>deployment name<\/strong> (important): <code>chat-lab<\/code>\n   &#8211; Keep default settings unless you have quota\/capacity requirements<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Wait for deployment completion.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> A deployment named <code>chat-lab<\/code> is available.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; In Foundry and\/or Azure OpenAI resource, confirm the deployment appears.\n&#8211; Open the deployment details and find:\n  &#8211; Endpoint\n  &#8211; Authentication method (keys and\/or Entra ID)\n  &#8211; Sample code (often includes the correct <code>api-version<\/code>)<\/p>\n\n\n\n<blockquote>\n<p>Tip: Copy the <strong>sample request<\/strong> shown by the portal for your deployment. That sample is the most reliable source for endpoint format and <code>api-version<\/code> for your environment.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Test the deployment in the playground<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Confirm the deployment works before coding.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In Foundry, open the <strong>Chat playground<\/strong> (or equivalent).<\/li>\n<li>Select the deployment <code>chat-lab<\/code>.<\/li>\n<li>Enter a test prompt:\n   &#8211; \u201cWrite a 5-bullet checklist for securely storing API keys in Azure.\u201d<\/li>\n<li>Run.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> You receive a coherent response.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; Confirm the response arrives without errors.\n&#8211; If you see content filtering warnings, try a benign prompt and verify your policy 
configuration.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Call the model using REST (curl)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Validate data-plane access outside the portal.<\/p>\n\n\n\n<blockquote>\n<p>Use the <strong>exact endpoint format<\/strong> and <strong>api-version<\/strong> shown in your deployment\u2019s sample code in the Azure portal\/Foundry UI. The REST shape can differ by API version and model capability.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Export environment variables (bash\/zsh):<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-bash\">export AZURE_OPENAI_ENDPOINT=\"https:\/\/&lt;your-resource-name&gt;.openai.azure.com\"\nexport AZURE_OPENAI_API_KEY=\"&lt;your-api-key&gt;\"\nexport AZURE_OPENAI_DEPLOYMENT=\"chat-lab\"\nexport AZURE_OPENAI_API_VERSION=\"&lt;copy-from-portal-sample&gt;\"\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li>Send a request:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-bash\">curl -sS \"$AZURE_OPENAI_ENDPOINT\/openai\/deployments\/$AZURE_OPENAI_DEPLOYMENT\/chat\/completions?api-version=$AZURE_OPENAI_API_VERSION\" \\\n  -H \"Content-Type: application\/json\" \\\n  -H \"api-key: $AZURE_OPENAI_API_KEY\" \\\n  -d '{\n    \"messages\": [\n      {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n      {\"role\": \"user\", \"content\": \"Give me three naming conventions for Azure OpenAI deployments.\"}\n    ],\n    \"temperature\": 0.2\n  }' | python -m json.tool\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> JSON response with a message containing the answer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; Confirm the call succeeded: the output is a completion object, not an <code>error<\/code> object. The command above does not print the HTTP status; to see the status line (expect 200), rerun it with <code>curl -i<\/code> and without the <code>json.tool<\/code> pipe.\n&#8211; Confirm <code>choices[0].message.content<\/code> exists.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Call the model 
using Python (recommended for app integration)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Use a supported SDK approach.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">7.1 Create a virtual environment and install dependencies<\/h4>\n\n\n\n<pre><code class=\"language-bash\">python -m venv .venv\nsource .venv\/bin\/activate  # Linux\/macOS\n# .venv\\Scripts\\activate   # Windows PowerShell\n\npip install --upgrade pip\npip install openai\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">7.2 Create <code>chat_lab.py<\/code><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Use the Azure OpenAI pattern supported by the OpenAI Python library (verify the latest recommended SDK approach in official docs for Azure OpenAI; SDKs evolve).<\/p>\n\n\n\n<pre><code class=\"language-python\">import os\nfrom openai import AzureOpenAI\n\nendpoint = os.environ[\"AZURE_OPENAI_ENDPOINT\"]\napi_key = os.environ[\"AZURE_OPENAI_API_KEY\"]\ndeployment = os.environ[\"AZURE_OPENAI_DEPLOYMENT\"]\napi_version = os.environ[\"AZURE_OPENAI_API_VERSION\"]\n\nclient = AzureOpenAI(\n    azure_endpoint=endpoint,\n    api_key=api_key,\n    api_version=api_version,\n)\n\nresp = client.chat.completions.create(\n    model=deployment,  # In Azure OpenAI, 'model' is typically the deployment name\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are an Azure cloud assistant.\"},\n        {\"role\": \"user\", \"content\": \"Explain Private Link for Azure OpenAI in 4 bullet points.\"},\n    ],\n    temperature=0.2,\n)\n\nprint(resp.choices[0].message.content)\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">7.3 Run it<\/h4>\n\n\n\n<pre><code class=\"language-bash\">python chat_lab.py\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> Four bullet points printed to your terminal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; If it prints a coherent answer, your deployment and auth are 
correct.\n&#8211; If you get an auth error, confirm endpoint\/key and whether your resource allows key auth.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 8 (Optional but recommended): Enable diagnostics to Log Analytics<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Goal:<\/strong> Improve observability for troubleshooting and governance.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create a Log Analytics workspace (Portal \u2192 <strong>Log Analytics workspaces<\/strong> \u2192 Create).<\/li>\n<li>Go to your <strong>Azure OpenAI resource<\/strong> \u2192 <strong>Diagnostic settings<\/strong>.<\/li>\n<li>Add a diagnostic setting:\n   &#8211; Send logs to your workspace\n   &#8211; Select available log categories and metrics (names vary)<\/li>\n<li>Save.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> Logs\/metrics start flowing to Log Analytics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Verification<\/strong>\n&#8211; In Log Analytics, run queries for Azure resource logs (exact table names vary by configuration\u2014verify in your workspace).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Use this checklist:\n&#8211; [ ] Azure OpenAI resource exists in the intended region\n&#8211; [ ] Deployment <code>chat-lab<\/code> is created and \u201cSucceeded\u201d\n&#8211; [ ] Playground returns responses\n&#8211; [ ] <code>curl<\/code> call returns HTTP 200\n&#8211; [ ] Python script prints the model response\n&#8211; [ ] (Optional) Diagnostic settings configured and logs\/metrics visible<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Problem: \u201cYou do not have access to Azure OpenAI\u201d<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Azure OpenAI 
may require eligibility approval for your tenant\/subscription.<\/li>\n<li><strong>Fix:<\/strong> Follow the official Azure OpenAI access\/eligibility process:\n<ul class=\"wp-block-list\">\n<li>https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/ (see access requirements)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Problem: 404 \u201cDeployment not found\u201d<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Wrong deployment name or wrong endpoint\/resource.<\/li>\n<li><strong>Fix:<\/strong> Confirm:\n<ul class=\"wp-block-list\">\n<li><code>AZURE_OPENAI_ENDPOINT<\/code> matches the resource hosting the deployment<\/li>\n<li><code>AZURE_OPENAI_DEPLOYMENT<\/code> exactly matches the deployment name (case-sensitive)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Problem: 401\/403 unauthorized<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Invalid key, wrong header, or RBAC\/Entra ID misconfiguration.<\/li>\n<li><strong>Fix:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Ensure the <code>api-key<\/code> header is used for key auth<\/li>\n<li>Regenerate the key if needed and update Key Vault\/app settings<\/li>\n<li>If using Entra ID, verify the supported auth steps for your SDK\/API version<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Problem: 429 Too Many Requests<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Rate limits\/quota exceeded.<\/li>\n<li><strong>Fix:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Implement exponential backoff with jitter in code, and honor the <code>Retry-After<\/code> response header when it is present<\/li>\n<li>Reduce concurrency, shorten prompts, limit output tokens<\/li>\n<li>Request a quota increase (if eligible)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Problem: Model not available in region<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Region\/model support mismatch.<\/li>\n<li><strong>Fix:<\/strong> Choose a supported model for your region or deploy in a supported region (subject to policy).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Problem: Private endpoint enabled 
but app can\u2019t connect<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> DNS and routing are not configured for Private Link.<\/li>\n<li><strong>Fix:<\/strong> Verify private DNS zone linkage and that the app runs inside the VNet or has connectivity (VPN\/ExpressRoute).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To stop charges, delete resources you created.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Option A: Delete the resource group (recommended for labs)<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">az group delete --name \"rg-oai-foundry-lab\" --yes --no-wait\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Option B: Delete individual resources<\/strong>\n&#8211; Delete the Azure OpenAI resource\n&#8211; Delete Log Analytics workspace (if created)\n&#8211; Delete Foundry hub\/project resources (if they created billable artifacts)\n&#8211; Remove diagnostic settings<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome:<\/strong> No remaining billable resources related to the lab.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Separate environments<\/strong>: use separate resource groups\/subscriptions for dev\/test\/prod.<\/li>\n<li><strong>Abstract model calls<\/strong> behind your own service layer so you can swap deployments\/models safely.<\/li>\n<li><strong>Prefer retrieval over long prompts<\/strong>: for enterprise knowledge, use RAG patterns rather than stuffing large context into every prompt.<\/li>\n<li><strong>Design for idempotency<\/strong>: retries must not duplicate side effects (especially with tool-calling\/agents).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>managed identity + Entra ID<\/strong> where supported for data-plane access; otherwise:<\/li>\n<li>Store API keys in <strong>Key Vault<\/strong><\/li>\n<li>Rotate keys regularly and automate rotation<\/li>\n<li>Apply <strong>least privilege<\/strong> with Azure RBAC and scoped roles.<\/li>\n<li>Do not expose Azure OpenAI keys in front-end apps or mobile clients.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Put <strong>token budgets<\/strong> into feature requirements.<\/li>\n<li>Use <strong>smaller\/cheaper models<\/strong> for classification and extraction.<\/li>\n<li>Reduce tokens:<\/li>\n<li>Truncate conversation history<\/li>\n<li>Summarize history periodically<\/li>\n<li>Limit retrieved passages for RAG<\/li>\n<li>Set conservative output limits<\/li>\n<li>Add <strong>budgets and alerts<\/strong> at subscription and resource group levels.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>streaming<\/strong> responses when supported for better UX (verify SDK support).<\/li>\n<li>Implement <strong>client-side 
timeouts<\/strong> and <strong>circuit breakers<\/strong>.<\/li>\n<li>Use <strong>concurrency controls<\/strong> and queues for burst smoothing.<\/li>\n<li>Cache stable outputs where policy allows.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Plan for <strong>rate limiting<\/strong>:<\/li>\n<li>exponential backoff + jitter<\/li>\n<li>fallbacks (smaller model, reduced context, \u201ctry again later\u201d UX)<\/li>\n<li>Keep deployments stable:<\/li>\n<li>version your prompts<\/li>\n<li>controlled rollout when changing models or parameters<\/li>\n<li>Use <strong>multi-region<\/strong> only if your compliance and architecture require it; cross-region increases complexity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable <strong>diagnostic settings<\/strong> early and decide retention policies.<\/li>\n<li>Create <strong>dashboards<\/strong> for:<\/li>\n<li>request volume<\/li>\n<li>error rates (401\/403\/429\/5xx)<\/li>\n<li>latency<\/li>\n<li>token usage trends (where visible)<\/li>\n<li>Maintain a runbook for common failures (quota, auth, network, DNS).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tags: <code>env<\/code>, <code>owner<\/code>, <code>costCenter<\/code>, <code>dataClassification<\/code>, <code>app<\/code>, <code>team<\/code><\/li>\n<li>Naming convention example:<\/li>\n<li>Resource group: <code>rg-&lt;app&gt;-&lt;env&gt;-&lt;region&gt;<\/code><\/li>\n<li>Azure OpenAI resource: <code>oai-&lt;app&gt;-&lt;env&gt;-&lt;region&gt;-&lt;nn&gt;<\/code><\/li>\n<li>Deployment: <code>&lt;capability&gt;-&lt;model&gt;-&lt;env&gt;<\/code> (keep it short), e.g. 
<code>chat-core-prod<\/code>, <code>embed-docs-prod<\/code><\/li>\n<li>Use Azure Policy to require tags and restrict regions.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Management plane<\/strong>: Azure RBAC controls who can create resources, deployments, and diagnostics.<\/li>\n<li><strong>Data plane<\/strong>: often API-key based; in some setups, <strong>Entra ID<\/strong> can be used for data-plane calls\u2014verify current Azure OpenAI authentication guidance:<\/li>\n<li>https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Recommended approach:\n&#8211; Use <strong>managed identity<\/strong> from Azure compute where supported.\n&#8211; If using keys, store them in <strong>Key Vault<\/strong> and reference them via managed identity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data in transit: HTTPS\/TLS.<\/li>\n<li>Data at rest: governed by Azure service defaults and your configuration. 
For specific guarantees and options, verify the Azure OpenAI security documentation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>Private Link<\/strong> for production.<\/li>\n<li>If public endpoint is required:<\/li>\n<li>restrict access via networking features available for the service<\/li>\n<li>tightly control key distribution<\/li>\n<li>front with API Management for additional policy enforcement (quotas, IP filtering, JWT validation)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Never commit keys to git.<\/li>\n<li>Use Key Vault + managed identities.<\/li>\n<li>Rotate keys and update downstream apps automatically.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable diagnostic logs to Log Analytics for:<\/li>\n<li>security investigation<\/li>\n<li>operational debugging<\/li>\n<li>capacity planning<\/li>\n<li>Be careful: logs may contain sensitive prompt content depending on what is logged and how your app logs requests. 
Set <strong>data handling policies<\/strong> and redact where appropriate.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Compliance depends on:\n&#8211; Region selection and data residency needs\n&#8211; Your tenant\u2019s compliance requirements\n&#8211; Model\/provider terms and data handling policies<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Always review:\n&#8211; Azure OpenAI documentation\n&#8211; Your organization\u2019s compliance requirements (HIPAA, PCI, SOC, etc.)\n&#8211; Data classification of prompts and outputs<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Embedding API keys in client-side code<\/li>\n<li>No Private Link for sensitive workloads<\/li>\n<li>Overly broad RBAC roles (e.g., subscription-wide Contributor)<\/li>\n<li>Logging full prompts\/responses without redaction<\/li>\n<li>No rate limiting or abuse controls in front of the endpoint<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Put Azure OpenAI behind a <strong>backend<\/strong> service you control.<\/li>\n<li>Add <strong>authentication\/authorization<\/strong> at your app layer (and optionally APIM).<\/li>\n<li>Implement <strong>prompt injection defenses<\/strong> for RAG (treat retrieved content as untrusted).<\/li>\n<li>Use <strong>allow-listed tools\/actions<\/strong> if you build agentic workflows.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. 
Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>These are common constraints; verify current limits and behaviors in official docs for your region\/models.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations \/ operational realities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Access\/eligibility<\/strong>: Azure OpenAI may require approval; not every subscription can create it immediately.<\/li>\n<li><strong>Regional model availability<\/strong>: not all models are offered in all regions.<\/li>\n<li><strong>Quotas and rate limits<\/strong>: hitting 429s is common without load planning.<\/li>\n<li><strong>API version differences<\/strong>: request\/response fields can change across <code>api-version<\/code>. Always follow the sample code for your deployment.<\/li>\n<li><strong>Deployment naming coupling<\/strong>: applications are coupled to deployment names\u2014plan versioning and migration.<\/li>\n<li><strong>Private endpoint complexity<\/strong>: DNS misconfiguration is a frequent cause of outages.<\/li>\n<li><strong>Cost surprises from RAG<\/strong>: retrieval context can dramatically increase input tokens.<\/li>\n<li><strong>Logging sensitivity<\/strong>: prompts\/responses can contain regulated data; avoid uncontrolled logging.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moving from one model to another can change:<\/li>\n<li>output style and format stability<\/li>\n<li>token usage<\/li>\n<li>latency and cost<\/li>\n<li>Use canary releases and automated evals (if available in your Foundry workflow) to compare outputs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u201cModel\u201d in SDK calls often means <strong>deployment name<\/strong> for Azure OpenAI, which differs from some other providers.<\/li>\n<li>The Foundry Models catalog is an experience layer; the actual 
inference endpoint and limits are enforced by the underlying Azure OpenAI resource.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">How Azure OpenAI in Foundry Models compares<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Below is a practical comparison. Exact features and pricing change frequently\u2014verify with official docs.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Azure OpenAI in Foundry Models<\/strong><\/td>\n<td>Azure-first teams deploying OpenAI models with governance<\/td>\n<td>Azure RBAC\/governance, private networking options, integrated Azure ops<\/td>\n<td>Access approvals, regional constraints, quotas; Azure-specific coupling<\/td>\n<td>When you want production Azure controls and Foundry model workflow<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure AI Foundry (non-OpenAI models)<\/strong><\/td>\n<td>Teams exploring multiple model providers in Foundry<\/td>\n<td>Catalog-based exploration, potential serverless options (verify)<\/td>\n<td>Some models may have different SLAs\/tooling<\/td>\n<td>When you want broader model choice beyond OpenAI family<\/td>\n<\/tr>\n<tr>\n<td><strong>OpenAI API (direct)<\/strong><\/td>\n<td>Fast prototyping outside Azure<\/td>\n<td>Rapid access, often latest models first (varies)<\/td>\n<td>Different governance model; may not meet enterprise Azure requirements<\/td>\n<td>When you don\u2019t need Azure governance and want direct provider access<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Bedrock<\/strong><\/td>\n<td>AWS-native generative AI platform<\/td>\n<td>Multi-model catalog; AWS integrations<\/td>\n<td>Different IAM\/networking model; migration cost if Azure-first<\/td>\n<td>When your platform is primarily on 
AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Vertex AI<\/strong><\/td>\n<td>GCP-native ML\/LLM platform<\/td>\n<td>Strong MLOps integration; GCP ecosystem<\/td>\n<td>Different governance\/tooling; Azure integration overhead<\/td>\n<td>When your platform is primarily on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Self-hosted OSS models (AKS + GPUs)<\/strong><\/td>\n<td>Maximum control, custom inference<\/td>\n<td>Full control over weights, custom optimizations<\/td>\n<td>Significant ops burden, GPU cost, scaling complexity<\/td>\n<td>When you need on-prem\/edge, full control, or specialized requirements<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Regulated internal policy assistant<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Problem<\/strong>\nA financial services company needs an internal assistant that answers policy questions with strong governance, auditability, and network isolation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Proposed architecture<\/strong>\n&#8211; Azure AI Foundry project for model deployment workflow (Foundry Models)\n&#8211; Azure OpenAI deployment for chat + embeddings\n&#8211; Azure AI Search for indexed policy documents (RAG)\n&#8211; App hosted on AKS with managed identity\n&#8211; Private Link to Azure OpenAI and Azure AI Search\n&#8211; Key Vault for secrets (if keys used) and certificate management\n&#8211; Azure Monitor + Log Analytics for diagnostics, alerts, and audit workflows<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Why this service was chosen<\/strong>\n&#8211; Azure-native identity and governance\n&#8211; Private networking support (when configured correctly)\n&#8211; Standard operations tooling (diagnostics, RBAC, policy)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcomes<\/strong>\n&#8211; Faster policy answers with 
citations\n&#8211; Reduced support load on compliance teams\n&#8211; Strong audit trail of system usage\n&#8211; Controlled rollout with environment separation (dev\/test\/prod)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: SaaS support copilot<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Problem<\/strong>\nA 10-person SaaS startup wants to speed up support responses and reduce time-to-resolution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Proposed architecture<\/strong>\n&#8211; Azure AI Foundry for quick deployment + playground iteration\n&#8211; Azure OpenAI chat deployment for drafting responses\n&#8211; Optional embeddings deployment for searching past tickets\/KB\n&#8211; App hosted on Azure App Service\n&#8211; API Management in front (optional) for per-tenant throttling and auth\n&#8211; Basic budgets and alerts<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Why this service was chosen<\/strong>\n&#8211; Minimal infrastructure management\n&#8211; Fast prototyping in playground\n&#8211; Straightforward integration from Python\/Node\/.NET backends<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcomes<\/strong>\n&#8211; Support agents respond faster with consistent tone\n&#8211; Measurable reduction in average handling time\n&#8211; Controlled costs via token limits and caching<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1) Is \u201cAzure OpenAI in Foundry Models\u201d the same as Azure OpenAI Service?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Not exactly. <strong>Azure OpenAI Service<\/strong> is the underlying managed service and resource you deploy. <strong>Foundry Models<\/strong> is the <strong>Azure AI Foundry experience<\/strong> used to discover and deploy models (including Azure OpenAI models). 
Your application ultimately calls the Azure OpenAI endpoint.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) Do I always need Azure AI Foundry to use Azure OpenAI?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">No. You can deploy and call Azure OpenAI without Foundry, but Foundry Models can simplify discovery, deployment, and testing workflows.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3) Does Azure OpenAI require approval?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Often yes. Requirements change; verify the current access process in official docs:\nhttps:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4) Are all models available in every Azure region?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">No. Availability is region-specific and can also depend on tenant eligibility. Always check your region\/model availability in the portal and docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5) What should I store as the \u201cmodel name\u201d in my app config?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For Azure OpenAI APIs, your code typically references the <strong>deployment name<\/strong> (for example <code>chat-lab<\/code>). The underlying base model name is managed behind the deployment.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6) What\u2019s the fastest way to verify my API version?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Use the <strong>sample code<\/strong> shown in the Azure portal\/Foundry deployment details. 
Copy the <code>api-version<\/code> from there.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7) Should I use API keys or Microsoft Entra ID?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>API keys<\/strong> are simplest but require secret management and rotation.<\/li>\n<li><strong>Entra ID<\/strong> (where supported) reduces secret sprawl and improves governance.\nFollow the latest Azure OpenAI authentication guidance for your environment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) How do I prevent data leakage via prompts?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Don\u2019t send secrets to the model<\/li>\n<li>Redact sensitive fields where feasible<\/li>\n<li>Use private networking where required<\/li>\n<li>Restrict who can access the endpoint and logs<\/li>\n<li>Apply least privilege and strong app authentication<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Why am I getting 429 errors?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You are hitting rate limits\/quota. Implement backoff, reduce tokens, reduce concurrency, and request quota increases if eligible.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">10) How do I control cost in RAG systems?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">RAG cost is often driven by <strong>retrieved context tokens<\/strong>. Limit retrieval results, chunk smartly, and compress\/summarize context.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11) Can I log prompts and completions for debugging?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You can, but do it carefully. Prompts may contain regulated data. 
Prefer metadata logs (token counts, latency, status codes) and redact content when needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12) What\u2019s the recommended production network setup?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Typically:\n&#8211; Private Endpoint for Azure OpenAI\n&#8211; Disable public network access where feasible\n&#8211; Ensure private DNS is configured correctly\nExact steps depend on your network topology; verify in official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">13) How do I rotate Azure OpenAI keys safely?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Store keys in Key Vault and use a rotation runbook:\n&#8211; Regenerate the secondary key (apps keep using the primary key)\n&#8211; Update the Key Vault secret with the new secondary key\n&#8211; Roll apps to use the new key\n&#8211; Regenerate the primary key, which is no longer in use\nAutomate where possible.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">14) Can I use Azure OpenAI from a client-side SPA?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Not safely with API keys. Put Azure OpenAI calls behind your backend or API gateway so secrets are not exposed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">15) What\u2019s the safest way to upgrade models?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Create a <strong>new deployment<\/strong> (or staged deployment), run automated evaluations and canary traffic, then switch over. Avoid \u201cbig bang\u201d changes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">16) How does Foundry Models help day-to-day development?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">It speeds up:\n&#8211; model selection\n&#8211; deployment creation\n&#8211; prompt iteration in playground\n&#8211; sharing repeatable setup across team members<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">17) Does this replace MLOps tooling?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Not entirely. For the full ML lifecycle (datasets, training pipelines, registries), Azure Machine Learning may be needed. 
Foundry Models is focused on model consumption\/deployment experience.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn Azure OpenAI in Foundry Models<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure OpenAI documentation \u2014 https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/<\/td>\n<td>Core concepts, REST\/SDK guidance, regions, auth, networking<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure AI Foundry documentation \u2014 https:\/\/learn.microsoft.com\/azure\/ai-foundry\/<\/td>\n<td>Foundry portal concepts, projects, model catalog workflows (verify current path)<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Azure OpenAI pricing \u2014 https:\/\/azure.microsoft.com\/pricing\/details\/cognitive-services\/openai-service\/<\/td>\n<td>Current pricing model and regional pricing entries<\/td>\n<\/tr>\n<tr>\n<td>Pricing tool<\/td>\n<td>Azure Pricing Calculator \u2014 https:\/\/azure.microsoft.com\/pricing\/calculator\/<\/td>\n<td>Build estimates with your region and expected usage<\/td>\n<\/tr>\n<tr>\n<td>Architecture reference<\/td>\n<td>Azure Architecture Center \u2014 https:\/\/learn.microsoft.com\/azure\/architecture\/<\/td>\n<td>Patterns for secure, scalable Azure designs (search for RAG\/OpenAI)<\/td>\n<\/tr>\n<tr>\n<td>Official security<\/td>\n<td>Azure OpenAI security and governance (within docs) \u2014 https:\/\/learn.microsoft.com\/azure\/ai-services\/openai\/<\/td>\n<td>Guidance on identity, networking, and safe usage<\/td>\n<\/tr>\n<tr>\n<td>Samples (official\/community-trusted)<\/td>\n<td>Azure Samples on GitHub \u2014 https:\/\/github.com\/Azure-Samples<\/td>\n<td>Many practical Azure OpenAI integration examples<\/td>\n<\/tr>\n<tr>\n<td>Sample app (widely referenced)<\/td>\n<td>Azure 
Search + OpenAI demo \u2014 https:\/\/github.com\/Azure-Samples\/azure-search-openai-demo<\/td>\n<td>Practical RAG reference architecture and code (review before production)<\/td>\n<\/tr>\n<tr>\n<td>Videos<\/td>\n<td>Microsoft Azure YouTube \u2014 https:\/\/www.youtube.com\/@MicrosoftAzure<\/td>\n<td>Product updates, walkthroughs, architecture talks<\/td>\n<\/tr>\n<tr>\n<td>Updates<\/td>\n<td>Azure Updates \u2014 https:\/\/azure.microsoft.com\/updates\/<\/td>\n<td>Track changes to Azure AI Foundry and Azure OpenAI availability<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps, cloud engineers, architects, developers<\/td>\n<td>Azure + DevOps + AI engineering foundations and applied labs<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate DevOps learners<\/td>\n<td>SCM, CI\/CD, cloud fundamentals that support AI app delivery<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Ops\/SRE\/CloudOps teams<\/td>\n<td>Cloud operations practices, reliability, monitoring for AI workloads<\/td>\n<td>Check website<\/td>\n<td>https:\/\/cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, platform and operations teams<\/td>\n<td>SRE principles, incident management, reliability for AI services<\/td>\n<td>Check website<\/td>\n<td>https:\/\/sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + AI practitioners<\/td>\n<td>AIOps, monitoring\/automation concepts useful for AI-enabled platforms<\/td>\n<td>Check 
website<\/td>\n<td>https:\/\/aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud training and guidance (verify offerings)<\/td>\n<td>Individuals and teams looking for hands-on coaching<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training resources (verify offerings)<\/td>\n<td>DevOps engineers and beginners<\/td>\n<td>https:\/\/devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps consulting\/training platform (verify offerings)<\/td>\n<td>Small teams needing practical implementation help<\/td>\n<td>https:\/\/devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support\/training resources (verify offerings)<\/td>\n<td>Ops and DevOps teams needing operational support<\/td>\n<td>https:\/\/devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps\/IT consulting (verify exact services)<\/td>\n<td>Architecture, implementation support, operations setup<\/td>\n<td>Secure Azure OpenAI rollout, monitoring setup, CI\/CD integration<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps and cloud consulting\/training<\/td>\n<td>Platform engineering, DevOps pipelines, cloud adoption<\/td>\n<td>Landing zone + governance, deployment automation for Azure AI services<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting services (verify exact services)<\/td>\n<td>DevOps transformation, automation, operations maturity<\/td>\n<td>CI\/CD for AI apps, infrastructure-as-code, reliability practices<\/td>\n<td>https:\/\/devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To be effective with Azure OpenAI in Foundry Models, learn:\n&#8211; Azure fundamentals: subscriptions, resource groups, RBAC, VNets\n&#8211; Basic security: Key Vault, managed identity, private endpoints\n&#8211; API basics: REST, authentication headers, rate limits, retries\n&#8211; Basic Python\/Node\/.NET skills for service integration\n&#8211; Intro to LLM concepts: tokens, prompt design, embeddings, RAG basics<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Next steps for deeper capability:\n&#8211; <strong>RAG architecture<\/strong> on Azure:\n  &#8211; Azure AI Search indexing, chunking strategies, evaluation\n&#8211; <strong>Observability\/SRE<\/strong>:\n  &#8211; dashboards, SLOs, incident response, capacity planning for AI endpoints\n&#8211; <strong>Governance<\/strong>:\n  &#8211; Azure Policy, tagging standards, cost management\n&#8211; <strong>App patterns<\/strong>:\n  &#8211; API Management policies, caching, multi-tenant controls\n&#8211; <strong>Safety engineering<\/strong>:\n  &#8211; prompt injection defenses, content moderation workflows, human-in-the-loop review<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud Engineer \/ DevOps Engineer<\/li>\n<li>Solutions Architect<\/li>\n<li>Platform Engineer<\/li>\n<li>SRE \/ Operations Engineer<\/li>\n<li>AI Engineer (applied LLM developer)<\/li>\n<li>Security Engineer (cloud governance)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">There isn\u2019t a single \u201cAzure OpenAI certification\u201d universally established as a standalone credential. 
Practical paths often include:\n&#8211; Azure fundamentals and architecture certifications\n&#8211; Azure developer certifications\n&#8211; Security certifications for Azure\n&#8211; AI fundamentals certifications<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Verify current Microsoft certification offerings:\nhttps:\/\/learn.microsoft.com\/credentials\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Prompt-gated FAQ bot<\/strong> (no retrieval): strict output format + logging metadata only.<\/li>\n<li><strong>RAG prototype<\/strong> with Azure AI Search: embeddings + citations + token budget monitoring.<\/li>\n<li><strong>Batch summarization pipeline<\/strong> with Azure Functions and queues.<\/li>\n<li><strong>API Management front door<\/strong>: per-client quotas, JWT auth, request size limits.<\/li>\n<li><strong>Private Link deployment<\/strong>: run app in VNet, validate DNS and outbound restrictions.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure AI Foundry<\/strong>: Azure experience for building and managing AI solutions (evolved from Azure AI Studio; verify current naming in your tenant).<\/li>\n<li><strong>Foundry Models \/ Model catalog<\/strong>: The model discovery and deployment experience within Azure AI Foundry.<\/li>\n<li><strong>Azure OpenAI resource<\/strong>: Azure resource that hosts OpenAI model deployments and endpoints.<\/li>\n<li><strong>Deployment<\/strong>: A named configuration mapping to a model\/version in Azure OpenAI. 
Apps call the deployment name.<\/li>\n<li><strong>Endpoint<\/strong>: The base URL for your Azure OpenAI resource (regional).<\/li>\n<li><strong>API version (<code>api-version<\/code>)<\/strong>: Version string controlling REST API shape and behavior.<\/li>\n<li><strong>Tokens<\/strong>: Units of text processed by the model; cost and limits are token-based.<\/li>\n<li><strong>Embeddings<\/strong>: Numeric vectors representing text meaning; used for semantic search and retrieval.<\/li>\n<li><strong>RAG (Retrieval-Augmented Generation)<\/strong>: Pattern that retrieves relevant documents and includes them as context to improve accuracy.<\/li>\n<li><strong>Private Link \/ Private Endpoint<\/strong>: Azure networking feature to access services privately within a VNet.<\/li>\n<li><strong>RBAC<\/strong>: Role-Based Access Control in Azure; governs management-plane permissions and, for some services, data-plane access.<\/li>\n<li><strong>Managed Identity<\/strong>: Azure identity for workloads to access resources without storing secrets.<\/li>\n<li><strong>Log Analytics<\/strong>: Azure Monitor component for log storage, querying, and alerting.<\/li>\n<li><strong>429 error<\/strong>: HTTP status code returned when a rate limit or quota is exceeded; handle it with retries\/backoff and capacity planning.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Azure OpenAI in Foundry Models<\/strong> is the Azure-native way to <strong>discover, deploy, test, and operate Azure OpenAI model deployments<\/strong> using the <strong>Azure AI Foundry<\/strong> experience. 
It matters because it blends practical developer workflows (catalog + playground + sample code) with production requirements (RBAC, monitoring, private networking, and governance).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Key takeaways:\n&#8211; <strong>Fit:<\/strong> Best for Azure-first teams building secure generative AI features in the AI + Machine Learning space.\n&#8211; <strong>Cost:<\/strong> Driven primarily by <strong>token usage<\/strong> and model choice; RAG can multiply input tokens quickly.\n&#8211; <strong>Security:<\/strong> Use <strong>least privilege<\/strong>, prefer <strong>managed identity<\/strong> where supported, store keys in <strong>Key Vault<\/strong>, and strongly consider <strong>Private Link<\/strong> for production.\n&#8211; <strong>When to use:<\/strong> When you need an enterprise-ready, operationally manageable LLM endpoint in Azure with a guided deployment workflow.\n&#8211; <strong>Next learning step:<\/strong> Implement a small RAG proof-of-concept with Azure AI Search and add production-grade controls (APIM, budgets, diagnostics, and private networking).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI + Machine 
Learning<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,40],"tags":[],"class_list":["post-358","post","type-post","status-publish","format-standard","hentry","category-ai-machine-learning","category-azure"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/358","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=358"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/358\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=358"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=358"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=358"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}