{"id":362,"date":"2026-04-13T19:26:55","date_gmt":"2026-04-13T19:26:55","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/azure-document-intelligence-in-foundry-tools-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning\/"},"modified":"2026-04-13T19:26:55","modified_gmt":"2026-04-13T19:26:55","slug":"azure-document-intelligence-in-foundry-tools-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/azure-document-intelligence-in-foundry-tools-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-ai-machine-learning\/","title":{"rendered":"Azure Document Intelligence in Foundry Tools Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for AI + Machine Learning"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>AI + Machine Learning<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What this service is<\/h3>\n\n\n\n<p><strong>Azure Document Intelligence in Foundry Tools<\/strong> refers to using <strong>Azure Document Intelligence<\/strong> (an Azure AI service for extracting text and structured data from documents) from within the <strong>Azure AI Foundry<\/strong> developer experience and tooling (often described as \u201cFoundry tools\u201d), such as projects, workflows, and app-building utilities.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph simple explanation<\/h3>\n\n\n\n<p>If you have PDFs, scans, invoices, receipts, contracts, or forms and you want to turn them into usable data (fields, tables, key-value pairs), <strong>Azure Document Intelligence in Foundry Tools<\/strong> is a practical way to build and operate that extraction workflow in Azure\u2014so your apps can reliably read documents and feed the results into downstream systems.<\/p>\n\n\n\n<h3 
class=\"wp-block-heading\">One-paragraph technical explanation<\/h3>\n\n\n\n<p>Azure Document Intelligence provides <strong>prebuilt document models<\/strong> (like invoice\/receipt\/ID) and <strong>custom extraction models<\/strong> for your organization\u2019s document types. You send documents to the service via API (REST\/SDK), and it returns structured outputs with text, detected layout elements, and extracted fields with confidence. When used \u201cin Foundry Tools,\u201d you typically integrate those calls into an end-to-end AI application workflow (for example, orchestrating extraction + validation + enrichment) managed in the Azure AI Foundry environment.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What problem it solves<\/h3>\n\n\n\n<p>Organizations often store critical information in unstructured documents. Manual data entry is slow, inconsistent, and expensive. <strong>Azure Document Intelligence in Foundry Tools<\/strong> solves this by:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Automating extraction of structured data from documents at scale<\/li>\n<li>Reducing human effort while improving consistency and traceability<\/li>\n<li>Enabling new AI + Machine Learning workflows such as retrieval, compliance checks, and downstream analytics<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>Naming note (important): <strong>Azure Document Intelligence<\/strong> is the current name for what many teams previously knew as <strong>Azure Form Recognizer<\/strong>. If you find older blogs or SDK references, treat them as legacy and <strong>verify in official docs<\/strong> before implementing.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is Azure Document Intelligence in Foundry Tools?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p>The purpose of <strong>Azure Document Intelligence in Foundry Tools<\/strong> is to let teams build document-processing and data-extraction solutions using <strong>Azure Document Intelligence<\/strong> as the extraction engine, while using <strong>Azure AI Foundry tools<\/strong> to organize, orchestrate, and operationalize the workflow as part of broader AI + Machine Learning solutions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Extract text (OCR), tables, key-value pairs, and layout structure<\/li>\n<li>Use <strong>prebuilt models<\/strong> for common document types (for example invoices and receipts)<\/li>\n<li>Build <strong>custom models<\/strong> for organization-specific documents<\/li>\n<li>Integrate extraction results into downstream systems (databases, ERPs, data lakes, search indexes, LLM pipelines)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Document Intelligence resource<\/strong> (billable AI service endpoint in your Azure subscription)<\/li>\n<li><strong>Models<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Prebuilt models (document-type specific)<\/li>\n<li>Custom models (trained or composed for your formats)<\/li>\n<li>Layout\/OCR capabilities for structure and text<\/li>\n<\/ul>\n<\/li>\n<li><strong>Client access<\/strong>:\n<ul class=\"wp-block-list\">\n<li>REST API<\/li>\n<li>SDKs (for example Python, .NET, Java, JavaScript\u2014verify the latest in official docs)<\/li>\n<\/ul>\n<\/li>\n<li><strong>Foundry tools layer<\/strong> (Azure AI Foundry):\n<ul class=\"wp-block-list\">\n<li>Project\/workspace organization for AI apps<\/li>\n<li>Tooling to connect to resources and orchestrate tasks (capabilities vary\u2014verify in official docs for your tenant\/region)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Managed AI service<\/strong> (PaaS) for document processing and information extraction.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope and availability model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Resource scope<\/strong>: provisioned inside an <strong>Azure subscription<\/strong> and <strong>resource group<\/strong><\/li>\n<li><strong>Regional<\/strong>: you choose an Azure region when creating the Document Intelligence resource. Data processing and feature availability can vary by region\u2014<strong>verify region support<\/strong> in official documentation.<\/li>\n<li><strong>Foundry scope<\/strong>: Azure AI Foundry typically organizes work in <strong>projects<\/strong> tied to your Azure environment. The Foundry tools themselves are not usually billed like a separate compute service, but they can connect to billable resources (Document Intelligence, storage, logging, etc.).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Azure ecosystem<\/h3>\n\n\n\n<p>Azure Document Intelligence is part of the broader Azure AI services portfolio and commonly integrates with:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Storage<\/strong> (Blob Storage) for document ingestion and archiving<\/li>\n<li><strong>Azure Functions \/ Logic Apps<\/strong> for automation and event-driven workflows<\/li>\n<li><strong>Azure Key Vault<\/strong> for secrets management<\/li>\n<li><strong>Azure Monitor \/ Log Analytics<\/strong> for logging and metrics<\/li>\n<li><strong>Azure AI Foundry<\/strong> for building AI workflows and applications<\/li>\n<li><strong>Azure OpenAI<\/strong> for summarization, validation, or routing after extraction (when appropriate)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Why use Azure Document Intelligence in Foundry Tools?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduce manual data entry and document handling costs<\/li>\n<li>Improve turnaround time for document-heavy processes (AP, onboarding, claims)<\/li>\n<li>Increase consistency and auditability in extracted data<\/li>\n<li>Accelerate automation initiatives without building OCR\/extraction from scratch<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prebuilt models handle common business documents with minimal setup<\/li>\n<li>Custom models support specialized formats and organization-specific fields<\/li>\n<li>Strong integration patterns across Azure (identity, networking, monitoring)<\/li>\n<li>Production-grade APIs for scalable ingestion and processing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Centralized resource management, diagnostics, and quota governance<\/li>\n<li>Repeatable workflows (dev\/test\/prod) when combined with Foundry tools and standard Azure practices<\/li>\n<li>Easier collaboration across data scientists, app developers, and platform teams<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enterprise controls via Azure (RBAC, Key Vault, private networking options)<\/li>\n<li>Logging and auditing with Azure Monitor<\/li>\n<li>Region selection for data residency (subject to service behavior\u2014<strong>verify in official docs<\/strong>)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Designed for high-throughput document processing patterns<\/li>\n<li>Supports asynchronous processing patterns (implementation dependent\u2014verify API patterns in docs)<\/li>\n<li>Can be placed behind 
private endpoints for internal-only access (where supported)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose <strong>Azure Document Intelligence in Foundry Tools<\/strong> when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You need repeatable extraction from PDFs\/scans and want structured results<\/li>\n<li>You want to build an AI + Machine Learning pipeline that starts with documents<\/li>\n<li>You need enterprise governance: identity, network controls, monitoring, and cost tracking<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Avoid or reconsider when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You only need basic OCR for a few images and a simpler OCR tool suffices<\/li>\n<li>You need fully offline\/on-prem processing with no cloud calls (Document Intelligence containers may cover some of these scenarios; verify availability, supported models, and licensing in official docs)<\/li>\n<li>Your documents are extremely specialized and require heavy bespoke computer vision research (you may need a custom ML approach, possibly with Azure Machine Learning)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Azure Document Intelligence in Foundry Tools used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Finance and accounting (invoices, receipts, statements)<\/li>\n<li>Insurance (claims forms, evidence documents)<\/li>\n<li>Healthcare (intake forms; ensure strong compliance review)<\/li>\n<li>Government (applications, permits)<\/li>\n<li>Legal (contracts, exhibits\u2014often paired with search and summarization)<\/li>\n<li>Logistics and manufacturing (bills of lading, packing lists)<\/li>\n<li>HR (onboarding documents)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>App development teams integrating extraction into business apps<\/li>\n<li>Data engineering teams building ingestion pipelines<\/li>\n<li>Platform\/security teams implementing governance and private networking<\/li>\n<li>AI + Machine Learning teams combining extraction with classification\/summarization<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Document ingestion pipelines (batch and event-driven)<\/li>\n<li>Back-office automation (AP automation, onboarding)<\/li>\n<li>Content intelligence for search and analytics<\/li>\n<li>AI agent workflows that need reliable structured data from documents<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Event-driven: Blob upload \u2192 Event Grid \u2192 Function \u2192 Document Intelligence \u2192 Database<\/li>\n<li>Batch: scheduled jobs processing a backlog of documents<\/li>\n<li>Interactive: user uploads document in a web app and gets extracted fields back<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multi-environment deployments (dev\/test\/prod) with separate resources<\/li>\n<li>Multi-region strategies for latency or 
residency<\/li>\n<li>Centralized extraction service used by multiple internal apps via API<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test<\/strong>: small sample sets, lower throughput, relaxed SLA, careful with PII<\/li>\n<li><strong>Production<\/strong>: private endpoints (where applicable), Key Vault, monitoring, retries, queue-based decoupling, cost controls, and documented runbooks<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where <strong>Azure Document Intelligence in Foundry Tools<\/strong> is a strong fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Accounts Payable invoice extraction<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Invoices arrive in many formats; manual entry is slow and error-prone.<\/li>\n<li><strong>Why this fits<\/strong>: Prebuilt invoice extraction + structured fields and tables.<\/li>\n<li><strong>Example<\/strong>: A finance team processes thousands of vendor invoices daily and automatically posts extracted totals and line items to an ERP after validation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Receipt capture for expense reporting<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Employees submit receipts as photos; inconsistent quality and formats.<\/li>\n<li><strong>Why this fits<\/strong>: Prebuilt receipt model + OCR and normalized fields.<\/li>\n<li><strong>Example<\/strong>: A mobile app uploads receipt images; extracted merchant\/date\/total are pre-filled in expense forms.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Customer onboarding packet processing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Multiple forms and IDs must be processed and checked.<\/li>\n<li><strong>Why this 
fits<\/strong>: Document extraction + ID parsing patterns (capabilities depend on region\/SKU\u2014verify).<\/li>\n<li><strong>Example<\/strong>: A bank extracts identity fields and address details, then routes exceptions to manual review.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Claims intake automation (insurance)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Claims arrive with mixed document types and attachments.<\/li>\n<li><strong>Why this fits<\/strong>: Combine classification + extraction, store structured outputs.<\/li>\n<li><strong>Example<\/strong>: A claims portal ingests PDFs, extracts claimant details, and triggers downstream workflows.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Contract metadata extraction for search<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Contracts are PDFs; searching by parties\/dates\/terms is hard.<\/li>\n<li><strong>Why this fits<\/strong>: Layout + custom field extraction to build a metadata index.<\/li>\n<li><strong>Example<\/strong>: Legal ops extracts effective date, parties, renewal terms into a search index.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Shipping and logistics document processing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Bills of lading and packing lists vary by carrier and region.<\/li>\n<li><strong>Why this fits<\/strong>: Custom model for organization-specific formats; table extraction for line items.<\/li>\n<li><strong>Example<\/strong>: A logistics system extracts SKU quantities and shipment IDs and reconciles inventory.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Compliance evidence collection<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Compliance teams need structured evidence from scanned documents.<\/li>\n<li><strong>Why this fits<\/strong>: Repeatable extraction pipeline + auditable 
logs\/metrics.<\/li>\n<li><strong>Example<\/strong>: Automated extraction populates compliance records and flags missing fields.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Clinical form digitization (with strict controls)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Paper forms must become structured records; privacy is critical.<\/li>\n<li><strong>Why this fits<\/strong>: Managed service with Azure security primitives; private networking options.<\/li>\n<li><strong>Example<\/strong>: Intake forms are digitized; extracted data is stored in a secure database with access controls.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Real estate document intake<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Leases and disclosures include tables and key fields.<\/li>\n<li><strong>Why this fits<\/strong>: Layout understanding + custom extraction for domain fields.<\/li>\n<li><strong>Example<\/strong>: A property management platform extracts rent, term, and deposit information into the system of record.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Manufacturing QA documentation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Certificates and test reports arrive as PDFs\/scans.<\/li>\n<li><strong>Why this fits<\/strong>: Custom extraction + consistent structured output.<\/li>\n<li><strong>Example<\/strong>: Extract serial numbers, test dates, and pass\/fail metrics for audit trails.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Student admissions document processing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Transcripts and forms need indexing.<\/li>\n<li><strong>Why this fits<\/strong>: OCR + custom model for specific transcript layouts.<\/li>\n<li><strong>Example<\/strong>: Extract student ID, GPA, and completion dates, then route to reviewers when confidence is low.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) 
LLM\/RAG pipeline \u201cdocument-to-structure\u201d stage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: LLMs are unreliable at raw OCR and precise field extraction.<\/li>\n<li><strong>Why this fits<\/strong>: Deterministic extraction first, then LLM summarization\/QA on structured results.<\/li>\n<li><strong>Example<\/strong>: Extract invoice tables and totals with Document Intelligence, then ask an LLM to explain anomalies.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability can vary by region, API version, and SKU. For anything you plan to standardize across environments, <strong>verify in official docs<\/strong>.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Prebuilt document models<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Extracts structured fields from common document types (for example invoices and receipts).<\/li>\n<li><strong>Why it matters<\/strong>: Minimal setup; fast time-to-value.<\/li>\n<li><strong>Practical benefit<\/strong>: You can build an MVP without collecting\/training data.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Field sets and accuracy depend on document quality and supported languages; not all regions support the same prebuilt models.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.2 Layout extraction (document structure + OCR)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Detects text, lines, words, selection marks, tables, and layout geometry (bounding regions).<\/li>\n<li><strong>Why it matters<\/strong>: Layout is the foundation for downstream parsing and custom extraction.<\/li>\n<li><strong>Practical benefit<\/strong>: Enables table extraction and reliable citation of where data came from in the page.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Scans with skew, blur, or low resolution 
reduce accuracy; handwriting support depends on service capabilities\u2014verify.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.3 Custom models for specialized documents<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Lets you train or configure models to extract fields from your own document formats.<\/li>\n<li><strong>Why it matters<\/strong>: Real businesses rarely have one standardized template.<\/li>\n<li><strong>Practical benefit<\/strong>: Consistent extraction from vendor-specific forms, internal templates, and region-specific documents.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Requires representative sample documents; model lifecycle management (retraining, drift monitoring) is your responsibility.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.4 Confidence scores and traceability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Returns confidence values for extracted fields and often includes bounding regions for where the field was found.<\/li>\n<li><strong>Why it matters<\/strong>: You can route low-confidence results to human review.<\/li>\n<li><strong>Practical benefit<\/strong>: Enables \u201chuman-in-the-loop\u201d QA and compliance.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Confidence is not a guarantee of correctness; calibrate thresholds with real validation data.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.5 Batch and asynchronous-friendly patterns<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Supports processing patterns suitable for higher volume document ingestion (commonly via async pollers in SDKs).<\/li>\n<li><strong>Why it matters<\/strong>: Production workloads often require concurrency and backpressure controls.<\/li>\n<li><strong>Practical benefit<\/strong>: Better throughput and resilience compared to synchronous \u201csingle request\u201d patterns.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: 
You must implement retries, idempotency, and queue-based decoupling for reliability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.6 Integration with Azure security primitives<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Works with Azure resource management, keys, and (in many Azure AI services) Microsoft Entra ID \/ RBAC.<\/li>\n<li><strong>Why it matters<\/strong>: Helps meet enterprise security requirements.<\/li>\n<li><strong>Practical benefit<\/strong>: Central access control, auditing, and secret rotation.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Exact auth methods vary by resource type and configuration\u2014verify Entra ID support for your setup in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.7 Networking controls (where supported)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Can be integrated with private networking patterns (for example private endpoints) to avoid public exposure.<\/li>\n<li><strong>Why it matters<\/strong>: Many document workloads include sensitive data.<\/li>\n<li><strong>Practical benefit<\/strong>: Reduces exfiltration risk and simplifies compliance arguments.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Private networking patterns can complicate development\/testing and require DNS planning.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.8 Tooling for testing and iteration (Studio\/Foundry tools)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides UI-based testing (for example Document Intelligence Studio) and workflow integration approaches through Foundry tools.<\/li>\n<li><strong>Why it matters<\/strong>: Faster iteration on extraction results without writing full apps first.<\/li>\n<li><strong>Practical benefit<\/strong>: Rapid evaluation of model fit, sample documents, and output structure.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Tool capabilities evolve; treat 
UI as a helper, not a substitute for automated tests.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p>At runtime, an application or workflow:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Receives a document (upload, email, portal submission, or batch input)<\/li>\n<li>Stores it (optional but common) in Blob Storage for durability and audit<\/li>\n<li>Calls <strong>Azure Document Intelligence<\/strong> to analyze it using a chosen model<\/li>\n<li>Receives structured output (fields\/tables\/text + confidence)<\/li>\n<li>Validates, enriches, and stores results in downstream systems<\/li>\n<li>Monitors performance, errors, and costs via Azure Monitor<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data flow<\/strong>: Document bytes or a document URL \u2192 Document Intelligence \u2192 structured extraction result<\/li>\n<li><strong>Control flow<\/strong>: Orchestrator (Function\/Logic App\/Foundry workflow) schedules work, retries on transient errors, and routes exceptions<\/li>\n<li><strong>Human-in-the-loop<\/strong>: Low-confidence results can be routed to a review UI or task queue<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related services<\/h3>\n\n\n\n<p>Common Azure integrations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Storage (Blob)<\/strong>: document staging, immutability policies, lifecycle management<\/li>\n<li><strong>Azure Functions<\/strong>: event-driven processing<\/li>\n<li><strong>Azure Logic Apps<\/strong>: workflow automation and connectors<\/li>\n<li><strong>Azure Service Bus \/ Storage Queues<\/strong>: decouple ingestion from processing<\/li>\n<li><strong>Azure SQL \/ Cosmos DB<\/strong>: store extracted structured data<\/li>\n<li><strong>Azure AI Search<\/strong>: index extracted text and metadata for retrieval<\/li>\n<li><strong>Azure Key Vault<\/strong>: manage keys, secrets, and certificates<\/li>\n<li><strong>Azure Monitor + Log Analytics<\/strong>: operational visibility<\/li>\n<li><strong>Azure AI Foundry tools<\/strong>: project organization, workflow building, and collaboration (capabilities vary\u2014verify)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<p>Minimum dependencies are typically:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An Azure subscription<\/li>\n<li>A Document Intelligence resource<\/li>\n<\/ul>\n\n\n\n<p>Optional but strongly recommended for production:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Key Vault<\/li>\n<li>Storage<\/li>\n<li>Logging (Log Analytics workspace)<\/li>\n<li>Queueing (Service Bus\/Storage queue)<\/li>\n<li>Private networking (VNet + private endpoints), if required by security policy<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<p>Common approaches:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>API key authentication<\/strong>: simple, but keys must be protected and rotated<\/li>\n<li><strong>Microsoft Entra ID \/ RBAC<\/strong>: preferred in enterprise where supported; reduces reliance on long-lived secrets (verify support for your resource type)<\/li>\n<li><strong>Managed identities<\/strong>: ideal for Functions and other Azure compute calling the service without embedded secrets (verify supported auth flows)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Public endpoint by default (restrictable with network rules)<\/li>\n<li>Private endpoint patterns may be available depending on service configuration\u2014<strong>verify<\/strong> requirements, DNS, and supported scenarios<\/li>\n<li>For document URLs: if you provide a URL to Document Intelligence, the service must be able to reach it; private documents typically require <strong>SAS URLs<\/strong> or an internal access pattern (verify supported private access patterns)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable <strong>diagnostic 
settings<\/strong> to send logs\/metrics to Log Analytics<\/li>\n<li>Track:\n<ul class=\"wp-block-list\">\n<li>Request counts and latency<\/li>\n<li>Error rates by model<\/li>\n<li>Throttling occurrences (quota\/rate limits)<\/li>\n<li>Cost per document type (page counts and model selection)<\/li>\n<\/ul>\n<\/li>\n<li>Use Azure Policy for:\n<ul class=\"wp-block-list\">\n<li>Required tags (cost center, environment)<\/li>\n<li>Allowed regions<\/li>\n<li>Private endpoint enforcement (where applicable)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[User \/ App] --&gt;|PDF\/Image| A[Foundry Tools Workflow or App Logic]\n  A --&gt;|Analyze document| DI[Azure Document Intelligence]\n  DI --&gt; R[Structured Results: fields, tables, text + confidence]\n  R --&gt; S[App \/ Database \/ Search Index]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Ingestion\n    UP[Upload Portal \/ Email Intake] --&gt; BLOB[Azure Blob Storage]\n    BLOB --&gt; EG[Event Grid]\n    EG --&gt; Q[Queue: Service Bus or Storage Queue]\n  end\n\n  subgraph Processing\n    Q --&gt; FN[\"Azure Functions (Managed Identity)\"]\n    FN --&gt;|Call| DI[Azure Document Intelligence]\n    DI --&gt; OUT[Extraction Output]\n    OUT --&gt; DB[(Azure SQL \/ Cosmos DB)]\n    OUT --&gt; SEARCH[Azure AI Search]\n  end\n\n  subgraph Governance_and_Security\n    KV[Azure Key Vault] --&gt; FN\n    MON[Azure Monitor + Log Analytics] &lt;---&gt; FN\n    MON &lt;---&gt; DI\n    POL[Azure Policy \/ Tags] --&gt; DI\n    VNET[\"VNet + Private Endpoints (where supported)\"] --- DI\n    VNET --- BLOB\n  end\n\n  subgraph AI_App_Layer\n    FOUND[\"Azure AI Foundry Tools (Project\/Workflow)\"] --&gt; FN\n    FOUND --&gt; SEARCH\n  end\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account\/subscription\/tenant requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Azure subscription<\/strong><\/li>\n<li>Access to create:\n<ul class=\"wp-block-list\">\n<li>Resource groups<\/li>\n<li>Azure Document Intelligence resources<\/li>\n<li>(Optional) Storage account, Key Vault, Log Analytics workspace<\/li>\n<\/ul>\n<\/li>\n<li>Ability to use <strong>Azure AI Foundry<\/strong> in your tenant (if your organization restricts it, request access)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p>Minimum recommended roles:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For resource creation: <strong>Contributor<\/strong> on the target resource group (or a custom role)<\/li>\n<li>For operational management: <strong>Reader<\/strong> + specific monitoring roles<\/li>\n<li>If using Entra ID auth: appropriate Azure AI service roles (often \u201cCognitive Services User\u201d or similar\u2014<strong>verify exact role names for Document Intelligence in official docs<\/strong>)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A payment method on the subscription<\/li>\n<li>Ensure your organization allows provisioning Azure AI services<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">CLI\/SDK\/tools needed<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure CLI<\/strong> (optional but helpful): https:\/\/learn.microsoft.com\/cli\/azure\/install-azure-cli<\/li>\n<li><strong>Python 3.9+<\/strong> (recommended for the lab)<\/li>\n<li>A modern browser for Azure Portal and Foundry tools<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Document Intelligence is regional; <strong>choose a supported region<\/strong> for your workloads.<\/li>\n<li>Azure AI Foundry features can vary by region\/tenant\u2014<strong>verify<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<p>Expect limits around:\n&#8211; Request rate (transactions per second\/minute)\n&#8211; Maximum document size and page count\n&#8211; Concurrency limits per resource\nThese vary by model and API version\u2014<strong>verify in official docs<\/strong> and plan load testing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services (recommended for production)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure Storage account (Blob)<\/li>\n<li>Log Analytics workspace<\/li>\n<li>Key Vault<\/li>\n<li>Queueing (Service Bus\/Storage queue)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<blockquote>\n<p>Do not rely on blog posts for pricing\u2014use the official pricing page and calculator for current rates and regional differences.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (how it is typically billed)<\/h3>\n\n\n\n<p>Azure Document Intelligence is generally billed based on:\n&#8211; <strong>Number of pages processed<\/strong> (a primary cost driver)\n&#8211; <strong>Model type used<\/strong> (prebuilt vs layout vs custom can differ)\n&#8211; <strong>Training or build operations<\/strong> for custom models (if applicable)\n&#8211; Potential differences by:\n  &#8211; Region\n  &#8211; SKU\/plan (for example free vs standard tiers where available)\n  &#8211; API version and feature set<\/p>\n\n\n\n<p><strong>Official pricing page<\/strong> (verify current SKUs and units):<br\/>\nhttps:\/\/azure.microsoft.com\/pricing\/details\/ai-document-intelligence\/<\/p>\n\n\n\n<p><strong>Azure Pricing Calculator<\/strong> (estimate your region and volume):<br\/>\nhttps:\/\/azure.microsoft.com\/pricing\/calculator\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions to track<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pages\/month by document type (invoice, receipt, general docs, 
etc.)<\/li>\n<li>Average pages per document<\/li>\n<li>Percent of documents requiring reprocessing<\/li>\n<li>Ratio of prebuilt vs custom model usage<\/li>\n<li>Dev\/test usage (often overlooked)<\/li>\n<li>Outbound data transfer (if exporting results across regions)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (if applicable)<\/h3>\n\n\n\n<p>Document Intelligence has offered a free (F0) tier with a limited monthly page allowance for development and evaluation. Availability and limits can change, so <strong>verify on the official pricing page<\/strong> and in the Azure Portal SKU selection.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers<\/h3>\n\n\n\n<p>Direct drivers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Total pages analyzed<\/li>\n<li>Higher-cost model selection (if applicable)<\/li>\n<li>Re-analysis due to workflow errors (duplicate processing)<\/li>\n<li>Custom model training\/build cycles<\/li>\n<\/ul>\n\n\n\n<p>Indirect\/hidden drivers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Storage costs for document retention (Blob)<\/li>\n<li>Network egress (downloading documents\/results across regions or to on-prem)<\/li>\n<li>Logging costs (Log Analytics ingestion volume)<\/li>\n<li>Human review tooling (if you build it)<\/li>\n<li>Queueing and function execution (if used)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Keeping storage and Document Intelligence in the <strong>same region<\/strong> reduces latency and potential egress costs.<\/li>\n<li>If documents must be processed in a different region due to service availability, you may incur data transfer costs and compliance complexity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use the least-expensive model that meets accuracy needs (for example, don\u2019t use an expensive model when layout\/OCR is sufficient).<\/li>\n<li>Implement <strong>idempotency<\/strong> (avoid double-processing the same document).<\/li>\n<li>Cache results and store structured outputs so you 
don\u2019t re-run extraction.<\/li>\n<li>Use confidence thresholds and route only low-confidence fields to human review, rather than reprocessing entire documents.<\/li>\n<li>Control Log Analytics volume (sample logs, adjust retention, filter noisy logs).<\/li>\n<li>Separate dev\/test resources and enforce budgets.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (conceptual)<\/h3>\n\n\n\n<p>A starter pilot might process:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A few hundred documents\/month<\/li>\n<li>1\u20132 pages per document<\/li>\n<li>Mostly prebuilt invoice or receipt extraction<\/li>\n<\/ul>\n\n\n\n<p>To estimate:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Count total pages per month (for example, 300 documents at 2 pages each is 600 pages)<\/li>\n<li>Select your region and model type in the pricing page\/calculator<\/li>\n<li>Add storage costs for retaining originals and results<\/li>\n<li>Add logging costs if diagnostics are enabled<\/li>\n<\/ol>\n\n\n\n<p>(Exact numbers vary by region and SKU\u2014<strong>use the official calculator<\/strong>.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>For production (tens\/hundreds of thousands of pages\/month), consider:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Billing is consumption-based (per page) by default. For steady high volumes, check whether <strong>commitment tiers<\/strong> (a prepaid monthly page volume at a discounted rate) are available for the models you use; verify on the pricing page and against your enterprise agreement.<\/li>\n<li>Use multiple resources if you need higher throughput (but manage governance).<\/li>\n<li>Build dashboards: cost per vendor, per document type, per business unit.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. 
Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab shows a realistic, beginner-friendly way to use <strong>Azure Document Intelligence in Foundry Tools<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Provision Document Intelligence in Azure<\/li>\n<li>Validate extraction quickly in official tooling<\/li>\n<li>Use Azure AI Foundry tools to run a small workflow step that calls Document Intelligence and returns structured fields<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>UI labels in Azure AI Foundry evolve. If a screen name differs, use the nearest equivalent and <strong>verify in official docs<\/strong>.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Extract key fields from an invoice PDF using <strong>Azure Document Intelligence<\/strong>, then run the extraction as a small callable step inside an <strong>Azure AI Foundry<\/strong> workflow\/project context.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create an Azure Document Intelligence resource<\/li>\n<li>Test extraction using Document Intelligence Studio (tool-based verification)<\/li>\n<li>Create an Azure AI Foundry project and store the Document Intelligence endpoint\/key securely (as secret variables or a connection)<\/li>\n<li>Add a simple Python-based tool\/step that calls the Document Intelligence SDK and returns the extracted fields<\/li>\n<li>Validate results<\/li>\n<li>Clean up to avoid charges<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a resource group<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: A resource group exists to contain all lab resources.<\/p>\n\n\n\n<p>Option A (Azure Portal):<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to https:\/\/portal.azure.com<\/li>\n<li>Search for <strong>Resource groups<\/strong><\/li>\n<li>Select <strong>Create<\/strong><\/li>\n<li>Choose:\n<ul class=\"wp-block-list\">\n<li>Subscription<\/li>\n<li>Resource group name (example: <code>rg-di-foundry-lab<\/code>)<\/li>\n<li>Region (any; it only sets where the resource group metadata is stored)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>Option B (Azure CLI):<\/p>\n\n\n\n<pre><code class=\"language-bash\">az login\naz account set --subscription \"&lt;YOUR_SUBSCRIPTION_ID&gt;\"\naz group create --name rg-di-foundry-lab --location &lt;YOUR_AZURE_REGION&gt;\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an Azure Document Intelligence resource<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: You have an endpoint URL and an API key for Document Intelligence.<\/p>\n\n\n\n<p>Azure Portal steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the portal, search for <strong>Document Intelligence<\/strong> (Azure AI Document Intelligence).<\/li>\n<li>Select <strong>Create<\/strong>.<\/li>\n<li>Select:\n<ul class=\"wp-block-list\">\n<li>Subscription<\/li>\n<li>Resource group: <code>rg-di-foundry-lab<\/code><\/li>\n<li>Region: a region where Document Intelligence and the models you need are available (ideally the same region as your storage)<\/li>\n<li>Pricing tier\/SKU: the lowest tier that supports your test (verify free tier availability)<\/li>\n<\/ul>\n<\/li>\n<li>Create the resource.<\/li>\n<li>After deployment:\n<ul class=\"wp-block-list\">\n<li>Open the resource<\/li>\n<li>Find <strong>Keys and Endpoint<\/strong><\/li>\n<li>Copy the <strong>Endpoint<\/strong> and <strong>Key 1<\/strong> (or Key 2)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>Store these securely. 
For the lab you can temporarily keep them in a password manager.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Quick verification using Document Intelligence Studio<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: You confirm the service works before writing any code.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In your Document Intelligence resource, locate the link to <strong>Document Intelligence Studio<\/strong> (or navigate via official docs entry point).<\/li>\n<li>Choose the <strong>prebuilt invoice<\/strong> model (or a similar prebuilt model).<\/li>\n<li>Provide a sample invoice:\n<ul class=\"wp-block-list\">\n<li>Use a known test document (many official quickstarts link to sample docs), or<\/li>\n<li>Upload a non-sensitive sample PDF you own<\/li>\n<\/ul>\n<\/li>\n<li>Run analysis.<\/li>\n<\/ol>\n\n\n\n<p>Verify you see extracted fields such as invoice number, vendor name, invoice date, total, and line items (exact field names depend on the model).<\/p>\n\n\n\n<p>If Studio fails, do not proceed until you have ruled out:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Wrong region\/SKU<\/li>\n<li>Network restrictions<\/li>\n<li>Invalid key\/endpoint<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Create an Azure AI Foundry project (Foundry tools)<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: A Foundry project exists where you can run a workflow\/tool that calls Document Intelligence.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open Azure AI Foundry (start from Azure\u2019s official Foundry landing page or your organization\u2019s entry point). Official documentation entry point (verify current URL): https:\/\/learn.microsoft.com\/azure\/ai-foundry\/ (older links under <code>\/azure\/ai-studio\/<\/code> may redirect; documentation paths may change)<\/li>\n<li>Create a <strong>Project<\/strong> (or equivalent container).<\/li>\n<li>Associate it with the same Azure subscription\/resource group if prompted.<\/li>\n<\/ol>\n\n\n\n<p>In the project, locate where to configure secrets or connections. 
Common patterns include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Project \u201cConnections\u201d<\/li>\n<li>Secret\/environment variables<\/li>\n<li>Linked Key Vault (recommended)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Store Document Intelligence credentials securely for the project<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: Your workflow can access the endpoint\/key without hardcoding them.<\/p>\n\n\n\n<p>Preferred approach (production): <strong>Azure Key Vault + managed identity<\/strong> (more secure).<br\/>\nLab approach: project <strong>secret variables<\/strong> (if available).<\/p>\n\n\n\n<p>Create two secret values in the Foundry project context:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>DI_ENDPOINT<\/code> = your Document Intelligence endpoint<\/li>\n<li><code>DI_KEY<\/code> = your Document Intelligence key<\/li>\n<\/ul>\n\n\n\n<p>The Python step in Step 6 reads both values with <code>os.environ<\/code>, so make sure they are exposed to that step as environment variables. If you use a connection instead, adapt how the code obtains them.<\/p>\n\n\n\n<p>If the Foundry UI supports \u201cconnections\u201d to Azure resources, use that approach instead and reference the connection in your tool step. If you cannot find a connection option, use secret variables.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Create a small Python tool\/step in Foundry tools to call Document Intelligence<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: You run a step that calls Document Intelligence and returns extracted invoice fields.<\/p>\n\n\n\n<p>In your Foundry project, create a workflow\/flow where you can add a <strong>Python tool<\/strong> (naming varies: \u201ctool,\u201d \u201ccomponent,\u201d \u201cstep,\u201d or \u201cnode\u201d).<\/p>\n\n\n\n<p>Use the following Python code as the core logic.<\/p>\n\n\n\n<blockquote>\n<p>SDK note: Azure SDK package names and classes can change across major versions. The safest approach is to start from the <strong>official quickstart<\/strong> for your chosen language and adapt it into this tool step. 
The code below demonstrates the pattern and should work with the current Document Intelligence SDK family, but <strong>verify imports and install commands in official docs<\/strong>.<\/p>\n<\/blockquote>\n\n\n\n<p>Install dependencies in the environment used by the Foundry tool (if the UI provides a package\/dependency setting). If you can run pip:<\/p>\n\n\n\n<pre><code class=\"language-bash\">pip install azure-ai-documentintelligence\n<\/code><\/pre>\n\n\n\n<p>Python tool code (core extraction logic):<\/p>\n\n\n\n<pre><code class=\"language-python\">import os\n\ndef extract_invoice_fields(invoice_url: str) -&gt; dict:\n    \"\"\"\n    Calls Azure Document Intelligence prebuilt invoice model for a document URL.\n    Returns a small dictionary of commonly used fields.\n    \"\"\"\n    endpoint = os.environ[\"DI_ENDPOINT\"]\n    key = os.environ[\"DI_KEY\"]\n\n    # Verify the latest SDK usage in official docs if this import fails.\n    from azure.ai.documentintelligence import DocumentIntelligenceClient\n    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest\n    from azure.core.credentials import AzureKeyCredential\n\n    client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))\n\n    # Model ID for prebuilt invoice is commonly \"prebuilt-invoice\" in Document Intelligence.\n    # Verify model IDs in official docs for your API version.\n    # Arguments are passed positionally because the keyword for the request body differs\n    # across package versions (\"body\" in the 1.0.0 GA release, \"analyze_request\" in earlier betas).\n    poller = client.begin_analyze_document(\n        \"prebuilt-invoice\",\n        AnalyzeDocumentRequest(url_source=invoice_url),\n    )\n    result = poller.result()\n\n    # Extract a few common fields (field names vary; verify with your results)\n    extracted = {\n        \"document_count\": len(result.documents) if getattr(result, \"documents\", None) else 0,\n        \"fields\": {}\n    }\n\n    if getattr(result, \"documents\", None):\n        doc = result.documents[0]\n        fields = getattr(doc, \"fields\", {}) or {}\n\n        def get_field_value(name: str):\n            f = fields.get(name)\n            if f is None:\n                
return None\n            # The current package exposes typed properties (value_string, value_date,\n            # value_currency, ...) plus the raw text in .content; the legacy Form Recognizer\n            # SDK used .value. Try .value, then fall back to the raw text (verify in your environment).\n            return getattr(f, \"value\", None) or getattr(f, \"content\", None)\n\n        extracted[\"fields\"][\"VendorName\"] = get_field_value(\"VendorName\")\n        extracted[\"fields\"][\"InvoiceId\"] = get_field_value(\"InvoiceId\")\n        extracted[\"fields\"][\"InvoiceDate\"] = get_field_value(\"InvoiceDate\")\n        extracted[\"fields\"][\"InvoiceTotal\"] = get_field_value(\"InvoiceTotal\")\n\n    return extracted\n\ndef main(invoice_url: str) -&gt; dict:\n    return extract_invoice_fields(invoice_url)\n<\/code><\/pre>\n\n\n\n<p>Provide an input URL to a sample invoice PDF. For a low-friction test, use a public sample document URL from official documentation or a trusted repository. In production, you typically:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Upload the document to Blob Storage<\/li>\n<li>Generate a short-lived, read-only SAS URL<\/li>\n<li>Analyze using that SAS URL<\/li>\n<\/ol>\n\n\n\n<p>Run the flow\/tool.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Review output and add a basic validation rule<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: You see extracted field values and can detect missing\/low-quality results.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Confirm output contains some non-null fields.<\/li>\n<li>Add a simple rule in your workflow:\n<ul class=\"wp-block-list\">\n<li>If <code>InvoiceTotal<\/code> is missing \u2192 route to manual review queue<\/li>\n<li>If vendor name is missing \u2192 attempt fallback (layout extraction + regex) or manual review<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>Even for prebuilt models, you should expect occasional misses due to scan quality or unusual templates.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Studio validation<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Document Intelligence Studio shows extracted invoice fields for the same sample document<\/li>\n<\/ul>\n<\/li>\n<li><strong>Workflow 
validation<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Foundry tool run returns a dictionary with non-empty fields<\/li>\n<\/ul>\n<\/li>\n<li><strong>Azure resource validation<\/strong>:\n<ul class=\"wp-block-list\">\n<li>In the Azure Portal, your Document Intelligence resource shows recent activity (metrics\/requests) after running the flow (exact metric names vary; verify)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p>Common issues and fixes:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Authentication error (401\/403)<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Verify <code>DI_ENDPOINT<\/code> is correct (no typos, correct region endpoint).<\/li>\n<li>Verify <code>DI_KEY<\/code> is correct and has not been regenerated (keys do not expire, but rotating a key invalidates the old value).<\/li>\n<li>If using Entra ID auth, verify role assignments and token scope (advanced; verify in docs).<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Model not found \/ invalid model ID<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Confirm the model ID in official docs for your API version.<\/li>\n<li>Prebuilt model names can differ across API versions; verify.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Document URL not accessible<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>The service fetches the document itself, so the URL must be publicly reachable or a valid SAS URL.<\/li>\n<li>Storage firewalls or private networking can block that fetch; confirm network rules, or send the document bytes directly instead.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Timeouts \/ throttling<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Reduce concurrency in tests.<\/li>\n<li>Implement retry with exponential backoff in production (throttled requests typically return HTTP 429).<\/li>\n<li>Verify quota limits and request rate caps.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Unexpected empty fields<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Try a higher-quality PDF or image.<\/li>\n<li>Confirm the document matches the model type (invoice vs receipt).<\/li>\n<li>Validate results in Studio to isolate whether it\u2019s a workflow\/code issue or an extraction issue.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p><strong>Expected outcome<\/strong>: You stop ongoing charges.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Delete the 
resource group:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-bash\">az group delete --name rg-di-foundry-lab --yes --no-wait\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li>\n<p>In Azure AI Foundry, either:<\/p>\n<ul class=\"wp-block-list\">\n<li>Delete the project (if it was created only for this lab), or<\/li>\n<li>Remove stored secrets\/connections and any uploaded documents.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>If you created any of these outside <code>rg-di-foundry-lab<\/code>:<\/p>\n<ul class=\"wp-block-list\">\n<li>Log Analytics workspace: confirm retention settings or delete it<\/li>\n<li>Storage account: delete it to avoid ongoing storage costs<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Decouple ingestion from extraction<\/strong> using queues, so traffic spikes are buffered instead of overwhelming the pipeline or triggering throttling.<\/li>\n<li>Store original documents in <strong>Blob Storage<\/strong> with lifecycle policies (hot \u2192 cool \u2192 archive).<\/li>\n<li>Keep extraction outputs in a <strong>structured store<\/strong> (SQL\/Cosmos DB) and optionally index text in <strong>Azure AI Search<\/strong>.<\/li>\n<li>Use a <strong>multi-stage pipeline<\/strong>:\n<ol class=\"wp-block-list\">\n<li>Classify document type (if needed)<\/li>\n<li>Extract fields<\/li>\n<li>Validate<\/li>\n<li>Enrich\/normalize<\/li>\n<li>Publish to downstream systems<\/li>\n<\/ol>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>managed identities<\/strong> and <strong>Entra ID\/RBAC<\/strong> where supported.<\/li>\n<li>If using API keys:\n<ul class=\"wp-block-list\">\n<li>Store keys in <strong>Key Vault<\/strong><\/li>\n<li>Rotate keys regularly<\/li>\n<li>Restrict access via RBAC to Key Vault secrets<\/li>\n<\/ul>\n<\/li>\n<li>Use least privilege. 
Split roles by:<\/li>\n<li>operators (monitoring)<\/li>\n<li>developers (read-only to prod logs)<\/li>\n<li>CI\/CD (deploy permissions)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Track <strong>pages processed<\/strong> by environment and workload.<\/li>\n<li>Prevent duplicate processing with content hashes or blob metadata.<\/li>\n<li>Use budgets and alerts in Azure Cost Management.<\/li>\n<li>Turn on verbose logging only for debugging; keep normal operations lean.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use async processing patterns and concurrency controls.<\/li>\n<li>Keep storage and Document Intelligence in the same region when possible.<\/li>\n<li>Use batching and queues; avoid synchronous processing in user-facing request paths for large PDFs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Implement retries with exponential backoff for transient errors.<\/li>\n<li>Make processing idempotent (safe to retry without duplicating downstream effects).<\/li>\n<li>Use dead-letter queues for documents that repeatedly fail.<\/li>\n<li>Track \u201cpoison documents\u201d and handle them separately.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable diagnostics to Log Analytics; create dashboards for:<\/li>\n<li>success rate by model<\/li>\n<li>latency percentiles<\/li>\n<li>throttling events<\/li>\n<li>Maintain runbooks:<\/li>\n<li>key rotation<\/li>\n<li>quota increase requests<\/li>\n<li>incident response steps<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tag resources: <code>env<\/code>, <code>app<\/code>, <code>owner<\/code>, <code>costCenter<\/code>, 
<code>dataClassification<\/code><\/li>\n<li>Use consistent naming:<\/li>\n<li><code>di-&lt;app&gt;-&lt;env&gt;-&lt;region&gt;<\/code><\/li>\n<li>Use Azure Policy for:<\/li>\n<li>allowed regions<\/li>\n<li>required tags<\/li>\n<li>private endpoint enforcement (if applicable)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use Azure RBAC wherever supported to avoid distributing API keys.<\/li>\n<li>For API keys:<\/li>\n<li>Treat keys as secrets<\/li>\n<li>Avoid storing them in code, repos, or build logs<\/li>\n<li>Prefer Key Vault references in runtime environments<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure services typically encrypt data at rest by default; verify specifics for Document Intelligence.<\/li>\n<li>Ensure TLS is used in transit (HTTPS endpoints).<\/li>\n<li>If storing documents in Blob Storage, consider:<\/li>\n<li>customer-managed keys (CMK) for encryption (where required)<\/li>\n<li>immutability policies for compliance evidence<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Restrict public access where possible.<\/li>\n<li>Use private endpoints and VNet integration patterns when supported and required.<\/li>\n<li>Avoid exposing document analysis endpoints to the public internet without strict controls.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Key Vault for:<\/li>\n<li>Document Intelligence keys (if used)<\/li>\n<li>Storage SAS signing keys (avoid long-lived SAS; prefer short-lived SAS)<\/li>\n<li>Rotate keys and validate rotation does not break workflows.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Enable diagnostic logs and maintain retention aligned to compliance needs.<\/li>\n<li>Monitor for:<\/li>\n<li>spikes in usage<\/li>\n<li>repeated auth failures<\/li>\n<li>unexpected regions or client IPs (where logged)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Documents often contain PII\/PHI\/financial data.<\/li>\n<li>Ensure:<\/li>\n<li>data classification policy<\/li>\n<li>least-privilege access controls<\/li>\n<li>region selection aligns with residency requirements<\/li>\n<li>vendor risk and compliance review is completed<\/li>\n<li>For regulated workloads, confirm whether additional controls are required (DLP, private networking, CMK, etc.).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Hardcoding API keys in application configuration files checked into source control<\/li>\n<li>Using long-lived SAS URLs for document access<\/li>\n<li>Not restricting who can read extracted results (often more sensitive than the original document)<\/li>\n<li>Logging raw document content or full extraction payloads unnecessarily<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Private endpoint + Key Vault + managed identity (where supported) for production<\/li>\n<li>Separate dev\/test\/prod subscriptions or at least resource groups with strong RBAC boundaries<\/li>\n<li>Document retention policy: delete or archive documents based on compliance requirements<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. 
Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>The exact limits depend on model, region, and API version\u2014verify in official docs before committing to an architecture.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations (typical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Quality dependency<\/strong>: poor scans, skewed photos, and low resolution reduce extraction quality.<\/li>\n<li><strong>Template variance<\/strong>: extreme layout differences may require custom models or multiple models.<\/li>\n<li><strong>Language support<\/strong> varies (OCR accuracy and prebuilt model support differ by language).<\/li>\n<li><strong>URL accessibility<\/strong>: analyzing by URL requires the service to fetch the document from that URL (SAS\/public reachability considerations).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Rate limits and concurrency caps can cause throttling under load.<\/li>\n<li>You may need multiple resources or a quota increase process\u2014verify Azure support paths.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Not every region supports every feature\/model. 
Plan a region strategy early.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Large PDFs with many pages can multiply costs quickly.<\/li>\n<li>Reprocessing documents because of workflow bugs means paying for the same pages twice (or more).<\/li>\n<li>Diagnostic logging can increase operational cost (Log Analytics ingestion).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Legacy SDKs may reference \u201cForm Recognizer\u201d naming; ensure you\u2019re using current packages and endpoints.<\/li>\n<li>Output field names can change by model version; do not hardcode assumptions without tests.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Missing idempotency causes duplicate processing and duplicate downstream records.<\/li>\n<li>Lack of dead-letter handling leads to silent data loss.<\/li>\n<li>Forgotten dev resources keep accruing costs (storage and logs remain even when no documents are processed).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Migrating from legacy Form Recognizer endpoints\/SDKs may require code changes.<\/li>\n<li>Custom model management differs by API versions\u2014plan time for validation and regression testing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Document Intelligence is best at extraction; it is not a full BPM\/workflow solution by itself. Use Functions\/Logic Apps\/Foundry tools for orchestration.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. 
Comparison with Alternatives<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">How to think about alternatives<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you need <strong>field-level structured extraction<\/strong> from documents, Document Intelligence is usually the closest match.<\/li>\n<li>If you need only <strong>basic OCR<\/strong>, a simpler OCR service may be enough.<\/li>\n<li>If you need <strong>end-to-end document workflow<\/strong>, pair extraction with orchestration services.<\/li>\n<li>If you want <strong>open-source\/offline<\/strong>, you trade off managed scaling and enterprise controls.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Azure Document Intelligence in Foundry Tools<\/strong><\/td>\n<td>Structured extraction at scale integrated into AI workflows<\/td>\n<td>Prebuilt + custom extraction, Azure governance, integrates with Foundry tools<\/td>\n<td>Consumption cost at scale, model\/region variability, requires pipeline engineering<\/td>\n<td>When documents are core inputs to AI + Machine Learning workflows and you want managed extraction<\/td>\n<\/tr>\n<tr>\n<td>Azure AI Vision OCR<\/td>\n<td>Basic OCR needs<\/td>\n<td>Simpler OCR scenarios, can be cost-effective for pure text<\/td>\n<td>Not the same as prebuilt invoice\/receipt structured extraction<\/td>\n<td>When you only need text, not structured fields\/tables<\/td>\n<\/tr>\n<tr>\n<td>Azure OpenAI (LLM-only extraction)<\/td>\n<td>Flexible parsing for highly variable docs (after OCR)<\/td>\n<td>Great for summarization and reasoning<\/td>\n<td>Not deterministic; can hallucinate; needs OCR\/structure first<\/td>\n<td>Use after Document Intelligence for enrichment, not as the primary extractor<\/td>\n<\/tr>\n<tr>\n<td>Azure Machine Learning (custom 
CV\/NLP)<\/td>\n<td>Highly specialized extraction beyond standard models<\/td>\n<td>Full control, custom training<\/td>\n<td>Higher build\/ops complexity<\/td>\n<td>When managed extraction can\u2019t meet requirements and you have ML expertise<\/td>\n<\/tr>\n<tr>\n<td>Amazon Textract<\/td>\n<td>Teams already on AWS<\/td>\n<td>Strong document extraction features<\/td>\n<td>Different ecosystem; migration\/integration overhead for Azure shops<\/td>\n<td>When you\u2019re already standardized on AWS<\/td>\n<\/tr>\n<tr>\n<td>Google Document AI<\/td>\n<td>Google Cloud ecosystems<\/td>\n<td>Specialized processors and tooling<\/td>\n<td>Different ecosystem; may not align with Azure governance<\/td>\n<td>When you\u2019re standardized on Google Cloud<\/td>\n<\/tr>\n<tr>\n<td>Tesseract + custom pipelines (self-managed)<\/td>\n<td>Offline\/edge constraints<\/td>\n<td>No per-page cloud fees; offline<\/td>\n<td>Ops burden, scaling, accuracy challenges for complex docs<\/td>\n<td>When cloud calls are not allowed and you accept higher engineering cost<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example (regulated finance)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A finance enterprise processes multi-page invoices and supporting documents across multiple subsidiaries. 
They need consistent extraction, audit trails, and strict access controls.<\/li>\n<li><strong>Proposed architecture<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Documents uploaded to <strong>Blob Storage<\/strong> (immutable retention for audit)<\/li>\n<li><strong>Event Grid<\/strong> triggers ingestion \u2192 <strong>Service Bus<\/strong> queue<\/li>\n<li><strong>Azure Functions<\/strong> (managed identity) calls <strong>Azure Document Intelligence<\/strong><\/li>\n<li>Results stored in <strong>Azure SQL<\/strong> (structured fields) + <strong>Azure AI Search<\/strong> (full text)<\/li>\n<li><strong>Azure AI Foundry tools<\/strong> used to manage the AI app workflow, evaluation, and iteration for downstream enrichment (for example, anomaly detection and summaries using approved models)<\/li>\n<li><strong>Key Vault<\/strong> for secrets, <strong>Private Endpoints<\/strong> for network isolation, <strong>Log Analytics<\/strong> for monitoring<\/li>\n<\/ul>\n<\/li>\n<li><strong>Why this service was chosen<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Strong structured extraction capability<\/li>\n<li>Fits Azure governance and networking model<\/li>\n<li>Supports scalable ingestion patterns<\/li>\n<\/ul>\n<\/li>\n<li><strong>Expected outcomes<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Reduced manual entry<\/li>\n<li>Measurable improvements in processing time against SLAs<\/li>\n<li>Clear auditability (who processed what, when, and with what result confidence)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example (SaaS expense automation)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A small SaaS team needs to extract receipt and invoice fields quickly without building OCR and parsers.<\/li>\n<li><strong>Proposed architecture<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Simple web app uploads to Blob Storage<\/li>\n<li>App calls Document Intelligence directly for receipt\/invoice extraction<\/li>\n<li>Results stored in a managed database<\/li>\n<li>Minimal Foundry tools usage to standardize the AI workflow and keep experimentation organized as the product grows<\/li>\n<\/ul>\n<\/li>\n<li><strong>Why this service was chosen<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Fast MVP via prebuilt models<\/li>\n<li>Simple API integration<\/li>\n<li>Clear scaling path to queues\/functions later<\/li>\n<\/ul>\n<\/li>\n<li><strong>Expected outcomes<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Faster product iteration<\/li>\n<li>Lower operational burden compared to self-managed OCR<\/li>\n<li>Predictable cost model based on pages processed<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Is \u201cAzure Document Intelligence\u201d the same as \u201cForm Recognizer\u201d?<\/strong><br\/>\n   Azure Document Intelligence is the current product name. \u201cForm Recognizer\u201d is commonly used in older SDKs, samples, and blog posts. Verify the latest naming, endpoints, and SDK packages in official docs.<\/p>\n<\/li>\n<li>\n<p><strong>What does \u201cin Foundry Tools\u201d mean?<\/strong><br\/>\n   It generally means you\u2019re using Document Intelligence as part of an AI application workflow built and managed using Azure AI Foundry tooling (projects, workflow orchestration tools, connections, and app-building utilities).<\/p>\n<\/li>\n<li>\n<p><strong>Do I need Azure AI Foundry to use Azure Document Intelligence?<\/strong><br\/>\n   No. Document Intelligence can be called from any application via API\/SDK. Foundry tools help structure and operationalize the broader AI workflow.<\/p>\n<\/li>\n<li>\n<p><strong>Can I process documents stored in private Blob Storage?<\/strong><br\/>\n   Yes, commonly via <strong>short-lived SAS URLs<\/strong> or by uploading document bytes directly (depending on API\/SDK patterns). Verify the recommended secure approach in official docs.<\/p>\n<\/li>\n<li>\n<p><strong>Should I send document bytes or use URLs?<\/strong><br\/>\n   For many production patterns, you store documents in Blob and use SAS URLs for analysis, or you send bytes directly from your service. 
URLs simplify some flows but require careful access control.<\/p>\n<\/li>\n<li>\n<p><strong>How do I choose between prebuilt and custom models?<\/strong><br\/>\n   Start with prebuilt models for common document types. Use custom models when your document formats are unique or when prebuilt output doesn\u2019t match required fields.<\/p>\n<\/li>\n<li>\n<p><strong>How accurate is it?<\/strong><br\/>\n   Accuracy depends on document quality, template variability, language, and model selection. Always validate with representative samples and implement human review for low-confidence fields.<\/p>\n<\/li>\n<li>\n<p><strong>Can it extract tables and line items?<\/strong><br\/>\n   The layout model and many prebuilt models return detected tables with their row and column structure. For invoices, the prebuilt invoice model returns line items as a structured field; validate it against your own invoice layouts.<\/p>\n<\/li>\n<li>\n<p><strong>What\u2019s the best practice for human review?<\/strong><br\/>\n   Use confidence thresholds and route only the exceptions (not every document) to a review queue. Store each field\u2019s bounding region (page number and polygon) so reviewers can quickly locate and confirm the value.<\/p>\n<\/li>\n<li>\n<p><strong>Does it support Microsoft Entra ID authentication?<\/strong><br\/>\n   Yes. Document Intelligence supports Microsoft Entra ID authentication with Azure RBAC (for example, the Cognitive Services User role); token-based authentication requires the resource to use a custom subdomain endpoint. Verify the current requirements in official docs.<\/p>\n<\/li>\n<li>\n<p><strong>How do I monitor failures and performance?<\/strong><br\/>\n   Enable diagnostic settings, ship logs\/metrics to Log Analytics, and build dashboards for throughput, error rate, latency, and throttling.<\/p>\n<\/li>\n<li>\n<p><strong>How do I avoid reprocessing the same document?<\/strong><br\/>\n   Record a hash of the document content (or the blob ETag\/version ID) in a processing ledger and check the ledger before each analysis call, so retries and duplicate uploads don\u2019t trigger a second paid analysis. This makes processing idempotent.<\/p>\n<\/li>\n<li>\n<p><strong>Is it suitable for real-time user uploads?<\/strong><br\/>\n   For small documents, yes. 
For large PDFs or unpredictable loads, use async processing with a queue and notify the user when results are ready.<\/p>\n<\/li>\n<li>\n<p><strong>What are the main cost risks?<\/strong><br\/>\n   Unexpectedly high page counts, repeated processing of the same documents, and heavy dev\/test usage. Set up budgets and cost dashboards early.<\/p>\n<\/li>\n<li>\n<p><strong>Can I use it for RAG pipelines?<\/strong><br\/>\n   Yes. It\u2019s often best to extract clean text and structure first with Document Intelligence, then index the results in Azure AI Search and use LLMs for summarization and question answering.<\/p>\n<\/li>\n<li>\n<p><strong>Do I need to keep the original documents after extraction?<\/strong><br\/>\n   It depends on compliance and business requirements. Many teams keep originals for audit; others delete them quickly for privacy. Decide on a retention period and enforce it with Blob Storage lifecycle management policies.<\/p>\n<\/li>\n<li>\n<p><strong>How does Foundry help operationally?<\/strong><br\/>\n   Foundry tools can help standardize AI app development, manage resource connections, and keep workflows organized. Exact capabilities depend on your Foundry setup; verify them in official docs.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn Azure Document Intelligence in Foundry Tools<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>Azure AI Document Intelligence docs: https:\/\/learn.microsoft.com\/azure\/ai-services\/document-intelligence\/<\/td>\n<td>Primary source for concepts, models, limits, API versions, and how-to guides<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Pricing page: https:\/\/azure.microsoft.com\/pricing\/details\/ai-document-intelligence\/<\/td>\n<td>Current SKUs, billing units, and regional pricing notes<\/td>\n<\/tr>\n<tr>\n<td>Pricing calculator<\/td>\n<td>Azure Pricing Calculator: https:\/\/azure.microsoft.com\/pricing\/calculator\/<\/td>\n<td>Build estimates for your region, volume, and environment split<\/td>\n<\/tr>\n<tr>\n<td>Official quickstarts<\/td>\n<td>Document Intelligence quickstarts (see docs section \u201cQuickstarts\u201d)<\/td>\n<td>Copy\/paste working SDK patterns and authentication approaches<\/td>\n<\/tr>\n<tr>\n<td>Official studio\/tooling<\/td>\n<td>Document Intelligence Studio (linked from docs\/resource)<\/td>\n<td>Rapidly test extraction on real documents and inspect fields\/confidence<\/td>\n<\/tr>\n<tr>\n<td>Foundry documentation<\/td>\n<td>Azure AI Foundry \/ Azure AI Studio docs: https:\/\/learn.microsoft.com\/azure\/ai-studio\/<\/td>\n<td>How to use Foundry tools (projects\/workflows\/connections) that can incorporate Document Intelligence<\/td>\n<\/tr>\n<tr>\n<td>Azure Architecture Center<\/td>\n<td>Azure Architecture Center: https:\/\/learn.microsoft.com\/azure\/architecture\/<\/td>\n<td>Reference architectures for ingestion, event-driven processing, and governance patterns<\/td>\n<\/tr>\n<tr>\n<td>SDK references<\/td>\n<td>Azure SDK docs (language-specific) via https:\/\/learn.microsoft.com\/azure\/<\/td>\n<td>Find current SDK packages\/classes and 
supported authentication methods<\/td>\n<\/tr>\n<tr>\n<td>Samples (official\/trusted)<\/td>\n<td>Azure samples repositories on GitHub (search \u201cdocument intelligence\u201d under https:\/\/github.com\/Azure-Samples)<\/td>\n<td>Practical examples for end-to-end pipelines (verify recency and API versions)<\/td>\n<\/tr>\n<tr>\n<td>Official training<\/td>\n<td>Microsoft Learn training paths: https:\/\/learn.microsoft.com\/training\/<\/td>\n<td>Structured learning modules that align with Azure services and fundamentals<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, cloud engineers, architects<\/td>\n<td>Azure operations, automation, CI\/CD, cloud governance around AI workloads<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate<\/td>\n<td>DevOps fundamentals, SDLC tooling, cloud basics useful for AI workload delivery<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations teams<\/td>\n<td>Cloud ops practices, monitoring, reliability patterns<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, platform engineers<\/td>\n<td>Reliability engineering, observability, incident response for cloud services<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + AI practitioners<\/td>\n<td>AIOps concepts, monitoring automation, ML-assisted ops workflows<\/td>\n<td>Check 
website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>Cloud\/DevOps training content (verify current offerings)<\/td>\n<td>Beginners to intermediate<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training (verify current offerings)<\/td>\n<td>DevOps engineers and students<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps guidance and services (verify current offerings)<\/td>\n<td>Small teams needing practical help<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support\/training resources (verify current offerings)<\/td>\n<td>Operations teams<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps\/engineering services (verify exact portfolio)<\/td>\n<td>Delivery support, automation, cloud operations around AI services<\/td>\n<td>Implement event-driven doc ingestion; set up monitoring and cost controls<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Training + consulting (verify current consulting offerings)<\/td>\n<td>Enablement, DevOps processes, platform practices<\/td>\n<td>Build CI\/CD for AI workflows; governance standards for AI resources<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify exact offerings)<\/td>\n<td>Cloud operations, SRE practices, delivery pipelines<\/td>\n<td>Observability rollout; incident response runbooks for document pipelines<\/td>\n<td>https:\/\/devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure fundamentals: subscriptions, resource groups, RBAC, networking basics<\/li>\n<li>Security fundamentals: Key Vault, managed identities, least privilege<\/li>\n<li>Data fundamentals: Blob Storage, queues, basic database concepts<\/li>\n<li>Basic Python\/.NET\/JavaScript development for SDK usage<\/li>\n<li>Monitoring basics: Azure Monitor, Log Analytics, alerts<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Workflow orchestration at scale: Azure Functions, Logic Apps, Service Bus<\/li>\n<li>Search + retrieval: Azure AI Search indexing patterns<\/li>\n<li>AI enrichment: Azure OpenAI for summarization, classification, and anomaly explanation (use responsibly)<\/li>\n<li>MLOps and governance: model\/version management (especially for custom models), CI\/CD and environment promotion patterns<\/li>\n<li>Compliance engineering: data retention, DLP, access reviews, audit readiness<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud Engineer \/ Platform Engineer<\/li>\n<li>Solutions Architect<\/li>\n<li>Backend Developer \/ Integration Engineer<\/li>\n<li>Data Engineer<\/li>\n<li>SRE \/ Operations Engineer<\/li>\n<li>AI Engineer (when documents are inputs to AI apps)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Azure certifications change over time. 
Commonly relevant certifications include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Azure Fundamentals (AZ-900)<\/li>\n<li>Azure Administrator (AZ-104)<\/li>\n<li>Azure Developer (AZ-204)<\/li>\n<li>Azure Solutions Architect (AZ-305)<\/li>\n<li>AI-focused Azure certifications, such as Azure AI Engineer Associate (AI-102); verify current offerings on Microsoft Learn<\/li>\n<\/ul>\n\n\n\n<p>Always confirm current certification names and objectives on Microsoft Learn:\nhttps:\/\/learn.microsoft.com\/credentials\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Invoice ingestion pipeline with queue + Function + Document Intelligence + SQL<\/li>\n<li>Receipt extraction mobile backend with confidence-based review<\/li>\n<li>Contract metadata extraction + Azure AI Search indexing<\/li>\n<li>Multi-tenant extraction service with per-tenant keys, budgets, and tagging<\/li>\n<li>RAG pipeline: extract \u2192 index \u2192 chat over documents with citations<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. 
Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Azure Document Intelligence<\/strong>: Azure AI service that extracts structured information from documents (formerly named Azure Form Recognizer).<\/li>\n<li><strong>Foundry Tools \/ Azure AI Foundry<\/strong>: Azure\u2019s app-building and workflow tooling for AI solutions (capabilities vary; verify current docs).<\/li>\n<li><strong>Prebuilt model<\/strong>: A Microsoft-provided model optimized for common document types (for example, invoices).<\/li>\n<li><strong>Custom model<\/strong>: A model configured\/trained for your organization\u2019s document layouts and fields.<\/li>\n<li><strong>Layout extraction<\/strong>: Detection of document structure, including text, tables, selection marks, and geometry.<\/li>\n<li><strong>OCR<\/strong>: Optical Character Recognition, which turns images and scans into machine-readable text.<\/li>\n<li><strong>Confidence score<\/strong>: A model-provided score indicating extraction certainty; used to drive human review.<\/li>\n<li><strong>Human-in-the-loop<\/strong>: A process where low-confidence results are reviewed\/approved by a person.<\/li>\n<li><strong>SAS URL<\/strong>: Shared Access Signature URL granting time-limited, scoped access to Azure Storage resources.<\/li>\n<li><strong>RBAC<\/strong>: Role-Based Access Control in Azure for managing who can do what.<\/li>\n<li><strong>Managed identity<\/strong>: Azure-provided identity for services to authenticate without stored secrets.<\/li>\n<li><strong>Private endpoint<\/strong>: A network interface with a private IP address in your VNet that connects privately to an Azure service (availability depends on the service).<\/li>\n<li><strong>Idempotency<\/strong>: Designing processing so retries do not cause duplicate downstream side effects.<\/li>\n<li><strong>Throttling<\/strong>: Rate limiting by the service when quotas are exceeded.<\/li>\n<li><strong>Diagnostic settings<\/strong>: Azure configuration to send logs\/metrics to Log 
Analytics, Storage, or Event Hubs.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p><strong>Azure Document Intelligence in Foundry Tools<\/strong> combines a practical document extraction engine (Azure Document Intelligence) with Azure\u2019s AI application tooling ecosystem (Foundry tools) to help you build end-to-end document-to-data workflows.<\/p>\n\n\n\n<p>It matters because many business processes still rely on PDFs, scans, and forms, and turning those into structured, validated data unlocks automation across finance, onboarding, claims, compliance, and analytics.<\/p>\n\n\n\n<p>From an architecture standpoint, it fits best as a managed extraction component in a queue-based, observable pipeline integrated with Storage, Functions\/Logic Apps, Key Vault, and monitoring. From a cost standpoint, your biggest levers are pages processed, model choice, and avoiding reprocessing. From a security standpoint, prioritize least privilege, secret hygiene (Key Vault), and private networking patterns where required.<\/p>\n\n\n\n<p>Use it when you need reliable structured extraction and want an Azure-native path to production. 
Next step: build a small proof of concept that already uses production patterns (a queue, a Function, Blob Storage, and dashboards), then validate accuracy and costs with real documents before scaling.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI + Machine Learning<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,40],"tags":[],"class_list":["post-362","post","type-post","status-publish","format-standard","hentry","category-ai-machine-learning","category-azure"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/362","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=362"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/362\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=362"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=362"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=362"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}