{"id":240,"date":"2026-04-13T08:04:53","date_gmt":"2026-04-13T08:04:53","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/aws-amazon-kendra-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-machine-learning-ml-and-artificial-intelligence-ai\/"},"modified":"2026-04-13T08:04:53","modified_gmt":"2026-04-13T08:04:53","slug":"aws-amazon-kendra-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-machine-learning-ml-and-artificial-intelligence-ai","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/aws-amazon-kendra-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-machine-learning-ml-and-artificial-intelligence-ai\/","title":{"rendered":"AWS Amazon Kendra Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Machine Learning (ML) and Artificial Intelligence (AI)"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Machine Learning (ML) and Artificial Intelligence (AI)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Amazon Kendra is an AWS-managed enterprise search service that uses Machine Learning (ML) to help users find accurate answers and relevant documents across many content repositories (documents, wikis, knowledge bases, file shares, and SaaS tools).<\/p>\n\n\n\n<p>In simple terms: you connect Amazon Kendra to your data (for example, an Amazon S3 bucket or a wiki), let it index content, and then your users can search using natural language (for example, \u201cHow do I reset my VPN token?\u201d). Kendra returns ranked results and often highlights the exact passage that answers the question.<\/p>\n\n\n\n<p>Technically, Amazon Kendra builds and manages a search index that combines semantic ranking, document understanding, metadata filtering, and (optionally) access control enforcement. It supports ingesting data through pre-built connectors (data sources), custom ingestion APIs, and document enrichment pipelines. 
Applications query Kendra via AWS APIs\/SDKs and can integrate the results into portals, chatbots, and Retrieval-Augmented Generation (RAG) workflows (for example, using Amazon Bedrock) without having to operate their own search infrastructure.<\/p>\n\n\n\n<p><strong>What problem it solves:<\/strong> Traditional keyword search often fails in enterprise environments because content is scattered, titles are inconsistent, users ask questions (not keywords), and relevance depends on context and permissions. Amazon Kendra aims to deliver \u201centerprise-grade search\u201d with better relevance, easier integration, and managed operations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is Amazon Kendra?<\/h2>\n\n\n\n<p><strong>Official purpose:<\/strong> Amazon Kendra is a fully managed intelligent search service for enterprise content. It is designed to help organizations index and search large volumes of unstructured and semi-structured data stored across AWS and third-party systems.<\/p>\n\n\n\n<p><strong>Core capabilities (high level):<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create and manage <strong>indexes<\/strong> for enterprise search.<\/li>\n<li>Ingest content from <strong>data sources\/connectors<\/strong> (for example Amazon S3, wiki platforms, CRM\/ITSM tools, and web pages) and\/or via <strong>custom ingestion APIs<\/strong>.<\/li>\n<li>Run <strong>natural-language queries<\/strong> and return ranked results with highlighted passages and metadata.<\/li>\n<li>Apply <strong>filters\/facets<\/strong> and <strong>relevance tuning<\/strong>.<\/li>\n<li>Enforce document visibility with <strong>access control<\/strong> (when configured and supported by the ingestion method\/connector).<\/li>\n<li>Improve ingestion quality with <strong>document enrichment<\/strong> (for example, extracting metadata, transforming text, or adding tags via AWS Lambda).<\/li>\n<\/ul>\n\n\n\n<p><strong>Major components:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Index<\/strong>: The core searchable repository that stores processed 
content and metadata.<\/li>\n<li><strong>Data sources (connectors)<\/strong>: Managed connectors and sync jobs that pull documents, metadata, and (in some cases) ACLs from repositories into the index.<\/li>\n<li><strong>Custom document ingestion<\/strong>: APIs to push documents directly (useful for proprietary systems or custom pipelines).<\/li>\n<li><strong>Query APIs<\/strong>: APIs to query\/search the index and retrieve results.<\/li>\n<li><strong>Access control configuration<\/strong>: Options to associate user identity\/group information with documents and enforce \u201cwho can see what\u201d during query.<\/li>\n<li><strong>Relevance tuning &amp; metadata<\/strong>: Field mapping, boosting, facets, and query-time filtering.<\/li>\n<\/ul>\n\n\n\n<p><strong>Service type:<\/strong> Fully managed AWS service (SaaS-like within AWS). You do not manage servers, clusters, or shards.<\/p>\n\n\n\n<p><strong>Scope (regional\/global\/account):<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Amazon Kendra is a <strong>regional<\/strong> service. 
You create indexes in a specific AWS Region, and data sources\/sync jobs run in that Region.<\/li>\n<li>Resources are <strong>account-scoped<\/strong> within the Region (subject to IAM permissions).<\/li>\n<\/ul>\n\n\n\n<p>Verify current Region availability in the official docs: https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/what-is-kendra.html<\/p>\n\n\n\n<p><strong>How it fits into the AWS ecosystem:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Uses <strong>IAM<\/strong> for authentication\/authorization to Kendra APIs and for granting Kendra permission to read from your repositories (for example, S3).<\/li>\n<li>Integrates with <strong>AWS KMS<\/strong> for encryption at rest (service-managed encryption and\/or customer-managed keys depending on configuration; verify exact options in docs).<\/li>\n<li>Works well with <strong>Amazon S3<\/strong> (common document store), <strong>AWS Lambda<\/strong> (document enrichment), <strong>Amazon CloudWatch<\/strong> (metrics), and <strong>AWS CloudTrail<\/strong> (API auditing).<\/li>\n<li>Commonly paired with <strong>Amazon Lex<\/strong> (chatbots), <strong>Amazon Bedrock<\/strong> (RAG), <strong>AWS IAM Identity Center<\/strong> (enterprise identity), and application front ends via <strong>Amazon Cognito<\/strong>, <strong>API Gateway<\/strong>, or custom web apps.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Why use Amazon Kendra?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster answers, less time wasted:<\/strong> Reduce time employees spend hunting through wikis, PDFs, ticket systems, and shared drives.<\/li>\n<li><strong>Better self-service:<\/strong> Improve customer or employee self-service by indexing knowledge bases and support documentation.<\/li>\n<li><strong>Improved knowledge reuse:<\/strong> Make institutional knowledge discoverable even when content is poorly titled or inconsistently tagged.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Natural language search:<\/strong> Designed for question-like queries, not just keywords.<\/li>\n<li><strong>Connectors reduce integration time:<\/strong> Many common repositories can be indexed without writing a custom crawler.<\/li>\n<li><strong>Metadata filtering and facets:<\/strong> Combine semantic ranking with structured filtering (department, product, date, confidentiality, etc.).<\/li>\n<li><strong>APIs for application integration:<\/strong> Use AWS SDKs to embed search into portals, apps, and chat systems.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed infrastructure:<\/strong> No cluster provisioning, patching, scaling, or shard management.<\/li>\n<li><strong>Repeatable ingestion:<\/strong> Scheduled or on-demand sync jobs with status tracking.<\/li>\n<li><strong>Observability hooks:<\/strong> Metrics and audit events integrate into standard AWS operational tooling.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS IAM integration:<\/strong> Fine-grained control over who can administer indexes and data sources.<\/li>\n<li><strong>Encryption and auditing:<\/strong> Standard AWS encryption 
and API audit patterns (verify exact encryption capabilities for your configuration in official docs).<\/li>\n<li><strong>Access control-aware search (when configured):<\/strong> Search results can be filtered by user identity\/ACL rules, which is critical for enterprise content.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Designed for enterprise content volumes:<\/strong> Kendra is intended for large document sets and many concurrent users (subject to quotas and edition limits).<\/li>\n<li><strong>Relevance at scale:<\/strong> ML ranking is managed by AWS; you focus on content quality and metadata.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose Amazon Kendra<\/h3>\n\n\n\n<p>Choose Amazon Kendra when you need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enterprise search across multiple repositories<\/li>\n<li>Strong relevance for natural language queries<\/li>\n<li>Managed connectors and ingestion workflows<\/li>\n<li>Access control-aware search in a managed service<\/li>\n<li>A search layer that can feed RAG systems (retrieve relevant passages for an LLM)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose Amazon Kendra<\/h3>\n\n\n\n<p>Consider alternatives when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You only need <strong>simple keyword search<\/strong> on one small dataset (OpenSearch or database full-text search may be cheaper\/simpler).<\/li>\n<li>You need <strong>full control<\/strong> of ranking algorithms, analyzers, or low-level search internals (OpenSearch\/Elasticsearch gives more control).<\/li>\n<li>You are primarily building <strong>vector similarity search<\/strong> with custom embeddings and scoring (Amazon OpenSearch Service vector search or purpose-built vector databases may fit better; Kendra is not marketed as a general vector database).<\/li>\n<li>You have strict constraints that require <strong>on-prem-only<\/strong> operation (Kendra is an AWS-managed 
service).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">4. Where is Amazon Kendra used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Technology &amp; SaaS:<\/strong> Internal engineering knowledge search, runbooks, product docs.<\/li>\n<li><strong>Financial services:<\/strong> Policy\/procedure search, compliance documents, internal knowledge bases (with strong access control requirements).<\/li>\n<li><strong>Healthcare &amp; life sciences:<\/strong> Research document discovery, internal SOPs (ensure compliance requirements are met).<\/li>\n<li><strong>Manufacturing:<\/strong> Maintenance manuals, part catalogs, troubleshooting docs.<\/li>\n<li><strong>Retail &amp; e-commerce:<\/strong> Customer support knowledge, product information aggregation.<\/li>\n<li><strong>Public sector\/education:<\/strong> Policy search, intranet knowledge, research repositories (subject to governance requirements).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Platform teams building internal portals<\/li>\n<li>Support engineering \/ IT service management teams<\/li>\n<li>Data\/ML teams building RAG assistants<\/li>\n<li>Security and compliance teams managing knowledge access<\/li>\n<li>DevOps\/SRE teams indexing operational runbooks and incident retrospectives<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enterprise search portals and intranets<\/li>\n<li>Support agent assist tools (\u201csuggest the best KB article for this ticket\u201d)<\/li>\n<li>Knowledge retrieval layer for chatbots and virtual assistants<\/li>\n<li>Compliance and policy discovery<\/li>\n<li>Document discovery across multiple silos<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Central search index<\/strong> with multiple 
connectors<\/li>\n<li><strong>Per-department index<\/strong> model with strict separation (sometimes used for governance\/cost control)<\/li>\n<li><strong>RAG architecture<\/strong>: Kendra retrieval \u2192 LLM summarization\/answering (via Amazon Bedrock or another LLM platform)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Indexing documents from S3 + SharePoint + Confluence into one index<\/li>\n<li>Adding an internal search bar to a company portal<\/li>\n<li>Integrating with ticketing systems for support knowledge<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test:<\/strong> Usually one small index, limited documents, infrequent sync. Delete when not needed to control hourly costs.<\/li>\n<li><strong>Production:<\/strong> Carefully designed index strategy, ingestion schedules, monitoring, ACL enforcement, and change management for metadata\/schema.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Amazon Kendra is commonly a strong fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Internal IT Helpdesk Knowledge Search<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Employees submit repetitive tickets because solutions are hard to find.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Indexes IT knowledge articles, PDFs, and runbooks; supports natural language questions.<\/li>\n<li><strong>Scenario:<\/strong> \u201cHow do I connect to the corporate VPN from macOS?\u201d returns the exact step-by-step doc and highlights the relevant passage.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Support Agent Assist for Faster Ticket Resolution<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Support agents waste time searching multiple systems while on a call.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Single search layer over KB + product docs + past resolutions; can filter by product\/version.<\/li>\n<li><strong>Scenario:<\/strong> A support console calls Kendra for each ticket, showing top 5 suggested articles and known fixes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Enterprise Policy and Compliance Search<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Policies exist in many PDFs and sites; people can\u2019t find the \u201cright\u201d version.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Indexes policies with metadata (effective date, owner, department). 
Facets improve discovery.<\/li>\n<li><strong>Scenario:<\/strong> \u201cTravel reimbursement for contractors\u201d returns the correct policy section with excerpt.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Engineering Runbook and Incident Retrospective Discovery<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> On-call engineers can\u2019t quickly find relevant runbooks and past incidents.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Natural language works well (\u201clatency spike in us-east-1\u201d), and metadata filters help (service\/team).<\/li>\n<li><strong>Scenario:<\/strong> During an outage, a chatbot uses Kendra to retrieve relevant runbooks and links.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) HR and Employee Self-Service Portal<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Employees ask HR the same questions repeatedly.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Index HR policies, benefits docs, and internal wiki pages; support synonyms (PTO vs vacation).<\/li>\n<li><strong>Scenario:<\/strong> \u201cHow many vacation days do I have?\u201d returns the benefits guide and highlights the PTO accrual section.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Knowledge Search Across Multiple SaaS Tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Critical content is spread across Confluence, SharePoint, Google Drive, and internal docs.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Connectors reduce custom development; unified index improves user experience.<\/li>\n<li><strong>Scenario:<\/strong> A single search UI queries Kendra and returns results from multiple sources with source badges.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Product Documentation Search for Customers (Authenticated)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Customers can\u2019t find relevant product docs quickly; search results 
are noisy.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Can index documentation and support semantic ranking; combine with authentication and filtering.<\/li>\n<li><strong>Scenario:<\/strong> Logged-in customers search \u201cconfigure SSO for Okta\u201d, getting the best matching docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Retrieval Layer for RAG (LLM Assistants)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> LLMs hallucinate without trustworthy context and citations.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Retrieves relevant documents\/passages; your app can provide sources to the LLM.<\/li>\n<li><strong>Scenario:<\/strong> A Bedrock-powered assistant uses Kendra results as context and returns an answer with citations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Document Discovery for Research\/Legal Teams<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Teams need to find documents and clauses quickly across large corpora.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Semantic ranking and excerpt highlighting help locate relevant sections.<\/li>\n<li><strong>Scenario:<\/strong> \u201cIndemnification clause termination\u201d retrieves and highlights the clause across templates.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Central Search for Technical Training Materials<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Training content is fragmented; learners can\u2019t find the right lab or module.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Indexes PDFs, HTML, and wiki pages; metadata facets by course\/topic.<\/li>\n<li><strong>Scenario:<\/strong> \u201cKubernetes ingress troubleshooting lab\u201d returns the lab guide and prerequisites.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) M&amp;A Knowledge Integration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> After an acquisition, documentation is split 
across two toolchains.<\/li>\n<li><strong>Why Kendra fits:<\/strong> Connectors can index both repositories into one searchable index (with governance).<\/li>\n<li><strong>Scenario:<\/strong> Users search once and see results labeled by legacy company source.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Field Service \/ Maintenance Manual Search<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Technicians need answers quickly from manuals and service bulletins.<\/li>\n<li><strong>Why Kendra fits:<\/strong> PDF-heavy content and question queries are common; excerpt highlighting is useful.<\/li>\n<li><strong>Scenario:<\/strong> \u201cError code E17 compressor\u201d returns the manual section and troubleshooting steps.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Note: Amazon Kendra features evolve. Always verify the latest capabilities and connector list in the official documentation: https:\/\/docs.aws.amazon.com\/kendra\/<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">1) Managed indexes (Developer\/Enterprise editions)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Provides managed search indexes without running servers.<\/li>\n<li><strong>Why it matters:<\/strong> Removes operational burden of scaling and maintaining search clusters.<\/li>\n<li><strong>Practical benefit:<\/strong> Faster time to value; consistent managed experience.<\/li>\n<li><strong>Caveats:<\/strong> Pricing is typically hourly and can be significant; choose the correct edition and delete unused indexes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Data source connectors (managed ingestion)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Connects to supported repositories and syncs documents and metadata into Kendra.<\/li>\n<li><strong>Why it matters:<\/strong> Integration is often the hardest part of enterprise 
search.<\/li>\n<li><strong>Practical benefit:<\/strong> Faster onboarding for common systems (for example S3 and popular collaboration tools).<\/li>\n<li><strong>Caveats:<\/strong> Connector availability and ACL support vary by connector; verify capabilities per connector in docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Custom document ingestion APIs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Allows you to push documents directly using APIs (for proprietary systems or event-driven pipelines).<\/li>\n<li><strong>Why it matters:<\/strong> Not every repository has a connector.<\/li>\n<li><strong>Practical benefit:<\/strong> You can index content generated by applications or stored in custom databases.<\/li>\n<li><strong>Caveats:<\/strong> You must manage batching, retries, idempotency, and mapping metadata fields correctly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Natural language query understanding and semantic ranking<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Interprets user questions and ranks results using ML-based relevance.<\/li>\n<li><strong>Why it matters:<\/strong> Users ask questions (\u201cHow do I\u2026\u201d) rather than exact keywords.<\/li>\n<li><strong>Practical benefit:<\/strong> Better top results and fewer \u201cno results found\u201d experiences.<\/li>\n<li><strong>Caveats:<\/strong> Relevance depends on content quality, metadata, and correct field mappings.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Excerpts, highlights, and answer-like results<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Returns snippets from documents that match the query and highlights relevant passages.<\/li>\n<li><strong>Why it matters:<\/strong> Users can quickly validate if a result contains the answer.<\/li>\n<li><strong>Practical benefit:<\/strong> Faster click-through and less time scanning long 
documents.<\/li>\n<li><strong>Caveats:<\/strong> Quality varies by document format and extraction; scanned PDFs may require OCR before indexing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Metadata schema, facets, and filtering<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Supports metadata fields and query-time filters (and faceted navigation in UIs).<\/li>\n<li><strong>Why it matters:<\/strong> Enterprise search often needs \u201cfilter by department\/product\/date\/confidentiality\u201d.<\/li>\n<li><strong>Practical benefit:<\/strong> Higher precision searches, better UX for large corpora.<\/li>\n<li><strong>Caveats:<\/strong> You must design metadata carefully; incorrect field types or sparse metadata reduces usefulness.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Relevance tuning (boosting, field importance)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Adjusts ranking by boosting certain fields or data sources.<\/li>\n<li><strong>Why it matters:<\/strong> Business context matters (official policies &gt; drafts, latest version &gt; old).<\/li>\n<li><strong>Practical benefit:<\/strong> Aligns search results with what users actually need.<\/li>\n<li><strong>Caveats:<\/strong> Over-boosting can hide relevant results; test changes and monitor user feedback.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Synonyms \/ thesaurus (terminology alignment)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Helps treat related terms as equivalent (for example, \u201cPTO\u201d and \u201cvacation\u201d).<\/li>\n<li><strong>Why it matters:<\/strong> Organizations use inconsistent terminology.<\/li>\n<li><strong>Practical benefit:<\/strong> Better recall and fewer missed results.<\/li>\n<li><strong>Caveats:<\/strong> Poor synonym design can increase noise; treat as a controlled vocabulary.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Document 
enrichment (preprocessing and metadata extraction)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Applies transformations to documents during ingestion (often via AWS Lambda) to add metadata, redact content, or normalize text.<\/li>\n<li><strong>Why it matters:<\/strong> \u201cGarbage in, garbage out\u201d applies strongly to enterprise search.<\/li>\n<li><strong>Practical benefit:<\/strong> Adds tags, cleans up content, extracts key fields for filtering.<\/li>\n<li><strong>Caveats:<\/strong> Enrichment adds complexity, latency, and Lambda costs; ensure idempotency and handle failures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Access control-aware search (ACLs and user context)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Restricts results so users only see documents they are permitted to view.<\/li>\n<li><strong>Why it matters:<\/strong> Enterprise content is rarely all-public.<\/li>\n<li><strong>Practical benefit:<\/strong> Enables indexing sensitive repositories while respecting permissions.<\/li>\n<li><strong>Caveats:<\/strong> Correct ACL ingestion and identity mapping is critical; support varies by connector and configuration. 
Verify connector ACL support and identity requirements.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Query suggestions (type-ahead)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Can suggest queries as users type (depending on configuration and API usage).<\/li>\n<li><strong>Why it matters:<\/strong> Improves UX and helps users discover common queries.<\/li>\n<li><strong>Practical benefit:<\/strong> Faster search and more consistent query patterns.<\/li>\n<li><strong>Caveats:<\/strong> Suggestions are not always desired for sensitive environments; evaluate privacy and UX.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Amazon Kendra Intelligent Ranking (related capability)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Provides ML-based re-ranking for search results from other search engines (for example, OpenSearch\/Elasticsearch), so you can improve relevance without migrating the index.<br\/>\n  Verify current compatibility in official docs.<\/li>\n<li><strong>Why it matters:<\/strong> Many organizations already have search engines but want better ranking.<\/li>\n<li><strong>Practical benefit:<\/strong> Incremental improvement path.<\/li>\n<li><strong>Caveats:<\/strong> This is a distinct capability with its own setup and pricing model; not the same as running a full Kendra index.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">7. 
Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level service architecture<\/h3>\n\n\n\n<p>Amazon Kendra sits between your content repositories and your search applications:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Ingestion:<\/strong> Kendra connects to content repositories via data source connectors or receives documents via APIs.<\/li>\n<li><strong>Processing:<\/strong> It extracts text and metadata, applies enrichment (optional), and builds an index.<\/li>\n<li><strong>Query:<\/strong> Applications call Kendra query APIs. Kendra evaluates user query + filters + (optional) user context for access control.<\/li>\n<li><strong>Results:<\/strong> Returns ranked documents\/snippets, metadata, and links back to the source.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request \/ data \/ control flows<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane:<\/strong> Create index, configure schema, configure data sources, run syncs, manage relevance tuning, and configure access control.<\/li>\n<li><strong>Data plane:<\/strong> Documents flow into the index during sync\/ingestion; queries flow from apps to Kendra and results back.<\/li>\n<li><strong>Security plane:<\/strong> IAM policies govern who can administer and query. IAM roles govern what Kendra can read from data sources (for example, S3). 
Optional user identity context can constrain results.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related AWS services<\/h3>\n\n\n\n<p>Common integrations include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Amazon S3<\/strong> for document storage and as a primary data source.<\/li>\n<li><strong>AWS Lambda<\/strong> for document enrichment during ingestion (custom metadata extraction, normalization, redaction workflows).<\/li>\n<li><strong>Amazon CloudWatch<\/strong> for metrics (and operational dashboards\/alarms).<\/li>\n<li><strong>AWS CloudTrail<\/strong> for auditing API calls.<\/li>\n<li><strong>AWS KMS<\/strong> for encryption keys (depending on configuration).<\/li>\n<li><strong>Amazon Cognito \/ IAM Identity Center<\/strong> for authenticating end users of a search portal.<\/li>\n<li><strong>Amazon Bedrock<\/strong> (or other LLM providers) for RAG: Kendra retrieves relevant context; the LLM generates an answer with citations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For data sources: the repository itself (S3, SaaS applications, or on-premises systems reachable over the required network connectivity).<\/li>\n<li>For enrichment: Lambda (and any services your Lambda calls).<\/li>\n<li>For identity\/ACL: your identity provider and group mapping strategy.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS API authentication:<\/strong> IAM (SigV4). Applications use IAM roles\/users to call Kendra APIs.<\/li>\n<li><strong>Repository access:<\/strong> Kendra assumes a service role you provide for connectors (for example, an IAM role granting <code>s3:GetObject<\/code> on a bucket).<\/li>\n<li><strong>End-user authorization:<\/strong> If you need \u201cper-user\u201d result filtering, you typically pass user context to the query and ensure ACLs were ingested correctly. 
The exact approach depends on the connector and identity strategy; verify in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kendra is a managed regional AWS service with public regional endpoints.<\/li>\n<li>Amazon Kendra supports <strong>VPC interface endpoints (AWS PrivateLink)<\/strong>, which keep API traffic on the AWS network. Endpoint availability can vary by Region; <strong>verify PrivateLink support for Amazon Kendra in your Region<\/strong> in official AWS documentation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Track:\n<ul class=\"wp-block-list\">\n<li>Index status and data source sync status (success\/failure, document counts).<\/li>\n<li>Query volume and latency (metrics).<\/li>\n<li>API activity (CloudTrail).<\/li>\n<\/ul>\n<\/li>\n<li>Govern:\n<ul class=\"wp-block-list\">\n<li>Index naming\/tagging standards.<\/li>\n<li>IAM least privilege for admins, sync roles, and query clients.<\/li>\n<li>Cost controls: number of indexes, edition choice, and sync schedules.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[Users \/ App] --&gt;|Query API| K[Amazon Kendra Index]\n  S3[(Amazon S3 Documents)] --&gt;|Data Source Sync| K\n  K --&gt;|Ranked results + excerpts| U\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Identity\n    IDP[Corporate IdP] --&gt; IC[IAM Identity Center \/ SSO Mapping]\n    IC --&gt; COG[Amazon Cognito \/ App Auth Layer]\n  end\n\n  subgraph Content\n    S3[(Amazon S3)]\n    CONF[Confluence \/ Wiki]\n    SP[SharePoint]\n    ITSM[Service Desk \/ ITSM]\n  end\n\n  subgraph Ingestion\n    DS[Amazon Kendra Data Sources] --&gt; IDX[Amazon Kendra Index]\n    L[\"Document 
Enrichment\\n(AWS Lambda)\"] --&gt; IDX\n  end\n\n  S3 --&gt; DS\n  CONF --&gt; DS\n  SP --&gt; DS\n  ITSM --&gt; DS\n\n  subgraph Apps\n    PORTAL[Internal Search Portal]\n    BOT[Chatbot \/ Agent Assist]\n    RAG[RAG Service]\n    LLM[\"Amazon Bedrock (LLM)\"]\n  end\n\n  COG --&gt; PORTAL\n  PORTAL --&gt;|Query + Filters + User Context| IDX\n  BOT --&gt;|Query| IDX\n\n  RAG --&gt;|Retrieve relevant passages| IDX\n  RAG --&gt;|Context + citations| LLM\n  LLM --&gt;|Answer| RAG\n\n  subgraph Security_Operations\n    IAM[IAM Roles\/Policies]\n    KMS[AWS KMS]\n    CW[Amazon CloudWatch Metrics\/Alarms]\n    CT[AWS CloudTrail]\n  end\n\n  IAM --&gt; DS\n  IAM --&gt; IDX\n  KMS --&gt; IDX\n  IDX --&gt; CW\n  DS --&gt; CW\n  IDX --&gt; CT\n  DS --&gt; CT\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">AWS account and billing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>AWS account<\/strong> with <strong>billing enabled<\/strong>.<\/li>\n<li>Amazon Kendra is not a \u201cfree\u201d service by default; expect hourly charges for indexes. 
Plan to clean up resources after labs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p>You will need permissions to:\n&#8211; Create and manage Kendra resources: index, data sources, sync jobs.\n&#8211; Create and manage an S3 bucket and upload sample documents.\n&#8211; Create or pass an IAM role for Kendra to access S3 (and optionally KMS).<\/p>\n\n\n\n<p>Typical IAM permissions (high level):\n&#8211; <code>kendra:*<\/code> for lab\/admin (scope down for production).\n&#8211; <code>s3:CreateBucket<\/code>, <code>s3:PutObject<\/code>, <code>s3:GetObject<\/code>, <code>s3:ListBucket<\/code>.\n&#8211; <code>iam:CreateRole<\/code>, <code>iam:PutRolePolicy<\/code>, <code>iam:PassRole<\/code>.\n&#8211; <code>kms:*<\/code> only if using customer-managed keys (scope down in production).\n&#8211; <code>cloudtrail:LookupEvents<\/code> and CloudWatch read permissions for validation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AWS Management Console access, or:<\/li>\n<li><strong>AWS CLI v2<\/strong> (optional, used in validation steps)<\/li>\n<li>(Optional) <strong>Python 3 + boto3<\/strong> for query examples<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Choose an AWS Region where <strong>Amazon Kendra is available<\/strong>.<\/li>\n<li>Verify availability and supported Regions: https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/what-is-kendra.html<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas \/ limits<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Amazon Kendra has service quotas (for example, number of indexes, document limits, throughput).<\/li>\n<li>Always verify current quotas in <strong>Service Quotas<\/strong> and Kendra documentation:<\/li>\n<li>https:\/\/docs.aws.amazon.com\/servicequotas\/<\/li>\n<li>https:\/\/docs.aws.amazon.com\/kendra\/<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<p>For this tutorial:\n&#8211; Amazon S3 (for storing sample documents)\n&#8211; IAM (for roles\/policies)<\/p>\n\n\n\n<p>Optional (production patterns):\n&#8211; CloudTrail (recommended)\n&#8211; CloudWatch alarms\/dashboards (recommended)\n&#8211; KMS customer-managed key (optional; verify support\/configuration requirements)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<p>Amazon Kendra pricing changes over time and varies by Region and edition. Do <strong>not<\/strong> rely on blog posts for exact numbers\u2014use official pricing.<\/p>\n\n\n\n<p><strong>Official pricing page:<\/strong> https:\/\/aws.amazon.com\/kendra\/pricing\/<br\/>\n<strong>AWS Pricing Calculator:<\/strong> https:\/\/calculator.aws\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions (typical model)<\/h3>\n\n\n\n<p>Amazon Kendra cost is generally driven by:\n&#8211; <strong>Index capacity billed per time (hourly)<\/strong>, and the <strong>edition<\/strong> (for example Developer vs Enterprise).<br\/>\n  Exact included capacity and scaling model is documented on the pricing page\u2014<strong>verify the current edition definitions and hourly rates<\/strong>.\n&#8211; Potential additional charges for related capabilities such as <strong>Amazon Kendra Intelligent Ranking<\/strong> (if used).<br\/>\n  Verify on the official pricing page and the Intelligent Ranking docs.<\/p>\n\n\n\n<p>Kendra also indirectly drives costs in connected services:\n&#8211; <strong>S3 storage<\/strong> for your documents.\n&#8211; <strong>Data transfer<\/strong> (for example, if connectors pull from outside AWS or across Regions).\n&#8211; <strong>AWS Lambda<\/strong> costs if you use enrichment functions.\n&#8211; <strong>Secrets management<\/strong> costs if connectors require stored credentials (for example AWS Secrets Manager).\n&#8211; <strong>CloudWatch<\/strong> costs for metrics, dashboards, and alarms 
(usually modest, but not always zero).\n&#8211; <strong>KMS<\/strong> costs if using customer-managed keys (API requests, key usage).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier<\/h3>\n\n\n\n<p>Amazon Kendra historically has not had a broad \u201calways-free\u201d tier like some AWS services. Some AWS services offer limited free usage, but <strong>verify whether Amazon Kendra currently offers any free tier or trial<\/strong> on the pricing page.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Key cost drivers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Number of indexes<\/strong>: Each index typically incurs hourly charges. Multiple environments (dev\/test\/prod) can multiply cost quickly.<\/li>\n<li><strong>Edition choice<\/strong>: Developer vs Enterprise can change the baseline hourly cost and capacity.<\/li>\n<li><strong>Document volume and update frequency<\/strong>: More content and frequent re-syncs may require larger capacity or more operational effort (pricing impact depends on current Kendra model\u2014verify).<\/li>\n<li><strong>Query volume<\/strong>: Depending on pricing model, queries may or may not be a direct line item. 
Verify on pricing page.<\/li>\n<li><strong>Enrichment<\/strong>: Lambda-based enrichment can add compute cost and ingestion latency.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Uploading documents to S3 in the same Region as the Kendra index is typically the simplest and avoids cross-Region transfer.<\/li>\n<li>If indexing external SaaS or on-prem systems, network egress and connector connectivity can introduce costs (and security constraints).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Start with one index<\/strong> and prove value before scaling to many per team\/department.<\/li>\n<li><strong>Use Developer edition for labs\/dev<\/strong> when appropriate (verify edition constraints).<\/li>\n<li><strong>Minimize idle indexes<\/strong>: If you don\u2019t need an index, delete it. (Kendra is managed; you typically can\u2019t \u201cstop\u201d it to avoid hourly costs.)<\/li>\n<li><strong>Control sync frequency<\/strong>: Don\u2019t sync every 5 minutes if daily is enough.<\/li>\n<li><strong>Keep documents clean<\/strong>: Avoid indexing duplicates, stale versions, and low-value content.<\/li>\n<li><strong>Evaluate alternatives for simple search<\/strong>: If requirements are basic keyword search, OpenSearch can be more cost-efficient.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated numbers)<\/h3>\n\n\n\n<p>A typical low-cost lab scenario includes:\n&#8211; 1 Kendra index (Developer edition if supported\/appropriate)\n&#8211; 1 S3 data source\n&#8211; A small set of documents (10\u2013100)\n&#8211; A single manual sync\n&#8211; A few interactive queries for validation<\/p>\n\n\n\n<p><strong>How to estimate:<\/strong>\n1. Go to https:\/\/aws.amazon.com\/kendra\/pricing\/ and identify the hourly rate for the chosen edition in your Region.\n2. 
Multiply by the number of hours you plan to keep the index.\n3. Add S3 storage (small) and any Lambda enrichment costs (if used).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>For production, account for:\n&#8211; Multiple indexes (prod + staging + dev)\n&#8211; Higher availability requirements and governance (more tooling and operational work)\n&#8211; Ongoing sync schedules\n&#8211; Identity\/ACL integration (often increases complexity and operational overhead)\n&#8211; Potential need for multiple data sources and content growth\n&#8211; RAG usage: additional costs for LLM inference (Amazon Bedrock) and any caching layers<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Build a small, realistic Amazon Kendra search experience on AWS by:\n1. Creating a Kendra index\n2. Indexing documents stored in Amazon S3\n3. Querying the index via the console and AWS CLI\n4. 
Cleaning up resources to avoid ongoing charges<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will create:\n&#8211; An S3 bucket with a few small text\/HTML\/PDF documents (keep it minimal)\n&#8211; An IAM role that Amazon Kendra assumes to read from the bucket\n&#8211; An Amazon Kendra index (choose the lowest-cost edition appropriate for labs\u2014verify current options)\n&#8211; An S3 data source and a one-time sync job\n&#8211; A few test queries to validate results<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> You can type a natural language query and get ranked results with excerpts from your uploaded documents.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Choose a Region and confirm service access<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the AWS Console, select an AWS Region where Amazon Kendra is supported.<\/li>\n<li>Confirm Amazon Kendra console loads: https:\/\/console.aws.amazon.com\/kendra\/  <\/li>\n<li>(Recommended) Confirm you have permission to create IAM roles and S3 buckets.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> You can open the Amazon Kendra console without permission errors.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an S3 bucket and upload sample documents<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open the S3 console: https:\/\/console.aws.amazon.com\/s3\/<\/li>\n<li>\n<p>Create a bucket (example):<br\/>\n   &#8211; Bucket name: <code>kendra-lab-&lt;your-unique-suffix&gt;<\/code>\n   &#8211; Region: same as your Kendra index Region\n   &#8211; Keep defaults for a lab (do not enable public access)<\/p>\n<\/li>\n<li>\n<p>Upload a few small documents. 
Create three files locally and upload them:<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p><strong><code>vpn-reset.txt<\/code><\/strong><\/p>\n\n\n\n<pre><code class=\"language-text\">VPN Token Reset Procedure\n1) Open the VPN portal.\n2) Click \"Reset token\".\n3) Confirm using MFA.\nIf you are locked out, contact IT Support.\n<\/code><\/pre>\n\n\n\n<p><strong><code>expense-policy.txt<\/code><\/strong><\/p>\n\n\n\n<pre><code class=\"language-text\">Travel Expense Policy (Summary)\n- Meals are reimbursable up to the daily limit.\n- Receipts are required for expenses over $25.\n- Contractors must obtain manager approval before booking travel.\n<\/code><\/pre>\n\n\n\n<p><strong><code>oncall-runbook.txt<\/code><\/strong><\/p>\n\n\n\n<pre><code class=\"language-text\">On-Call Runbook: API Latency Spikes\n1) Check dashboards for error rates and p95 latency.\n2) Review recent deployments.\n3) Verify upstream dependencies.\n4) If needed, rollback the last deployment.\n<\/code><\/pre>\n\n\n\n<p>Upload these into a prefix like <code>docs\/<\/code> (optional but tidy):\n&#8211; <code>docs\/vpn-reset.txt<\/code>\n&#8211; <code>docs\/expense-policy.txt<\/code>\n&#8211; <code>docs\/oncall-runbook.txt<\/code><\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> Your S3 bucket contains at least 3 documents.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> In S3 console, open the bucket \u2192 verify objects exist and you can view\/download them.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create an IAM role for Amazon Kendra to read from S3<\/h3>\n\n\n\n<p>Amazon Kendra needs permission to read objects from your S3 bucket when running the data source sync.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open IAM console: https:\/\/console.aws.amazon.com\/iam\/<\/li>\n<li>\n<p>Create a role:\n   &#8211; Trusted entity type: <strong>AWS service<\/strong>\n   &#8211; Use case: look for <strong>Amazon Kendra<\/strong> (or a generic service trust if 
presented differently)\n   &#8211; If the console doesn\u2019t provide a Kendra-specific wizard, <strong>use the trust policy recommended by Kendra documentation<\/strong> (it trusts the <code>kendra.amazonaws.com<\/code> service principal).<br\/>\n     Verify in official docs: https:\/\/docs.aws.amazon.com\/kendra\/<\/p>\n<\/li>\n<li>\n<p>Attach a policy that allows reading the bucket (minimum for this lab). AWS documentation for S3 data source roles also lists <code>kendra:BatchPutDocument<\/code> and <code>kendra:BatchDeleteDocument<\/code> on the index, so the policy below includes them:<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p>Replace <code>kendra-lab-&lt;your-unique-suffix&gt;<\/code> with your bucket name, and <code>&lt;region&gt;<\/code> and <code>&lt;account-id&gt;<\/code> with your own values (you can narrow <code>index\/*<\/code> to the specific index ID after Step 4):<\/p>\n\n\n\n<pre><code class=\"language-json\">{\n  \"Version\": \"2012-10-17\",\n  \"Statement\": [\n    {\n      \"Sid\": \"ListBucket\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\"s3:ListBucket\"],\n      \"Resource\": [\"arn:aws:s3:::kendra-lab-&lt;your-unique-suffix&gt;\"]\n    },\n    {\n      \"Sid\": \"ReadObjects\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\"s3:GetObject\"],\n      \"Resource\": [\"arn:aws:s3:::kendra-lab-&lt;your-unique-suffix&gt;\/*\"]\n    },\n    {\n      \"Sid\": \"WriteToKendraIndex\",\n      \"Effect\": \"Allow\",\n      \"Action\": [\"kendra:BatchPutDocument\", \"kendra:BatchDeleteDocument\"],\n      \"Resource\": [\"arn:aws:kendra:&lt;region&gt;:&lt;account-id&gt;:index\/*\"]\n    }\n  ]\n}\n<\/code><\/pre>\n\n\n\n<p>Name the role something like: <code>KendraS3DataSourceRole<\/code>.<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> You have an IAM role that Amazon Kendra can assume to read your S3 documents.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> In IAM \u2192 Roles \u2192 open the role \u2192 confirm:\n&#8211; Trust relationship includes the Kendra service principal (<code>kendra.amazonaws.com<\/code>, as documented)\n&#8211; Permissions include <code>s3:ListBucket<\/code> and <code>s3:GetObject<\/code> for the bucket, plus the two <code>kendra:Batch*Document<\/code> actions for the index<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Create an Amazon Kendra index<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open the Amazon Kendra console: https:\/\/console.aws.amazon.com\/kendra\/<\/li>\n<li>Choose <strong>Create an index<\/strong><\/li>\n<li>\n<p>Configure:\n   &#8211; Index name: <code>kendra-lab-index<\/code>\n   &#8211; Description: optional\n   &#8211; IAM role: choose\/create the index service role as prompted (Kendra uses it to write CloudWatch logs and metrics for the index; it is separate from the data source role in Step 3)\n   &#8211; Edition: 
choose the lowest-cost edition suitable for labs (often \u201cDeveloper edition\u201d, if available).<br\/>\n<strong>Verify current edition options and constraints in the console and pricing page.<\/strong><\/p>\n<\/li>\n<li>\n<p>Create the index.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> The index enters a \u201cCreating\u201d state, then becomes \u201cActive\/Ready\u201d.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> In the Kendra console, the index status shows <strong>Active<\/strong> (or equivalent) before proceeding.<\/p>\n\n\n\n<p><strong>Common wait time:<\/strong> Index creation is slower than most resource creation; allow up to roughly 30 minutes before treating it as stuck.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Add an S3 data source to the index<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the Kendra console, open your index: <code>kendra-lab-index<\/code><\/li>\n<li>Go to <strong>Data sources<\/strong> \u2192 <strong>Add data source<\/strong><\/li>\n<li>Choose <strong>Amazon S3<\/strong><\/li>\n<li>\n<p>Configure the data source:\n   &#8211; Name: <code>kendra-lab-s3<\/code>\n   &#8211; S3 bucket: <code>kendra-lab-&lt;your-unique-suffix&gt;<\/code>\n   &#8211; (Optional) Inclusion prefix: <code>docs\/<\/code> to limit indexing\n   &#8211; IAM role: select <code>KendraS3DataSourceRole<\/code>\n   &#8211; Sync schedule: set to <strong>Run on demand<\/strong> (or disable schedule) for the lab<\/p>\n<\/li>\n<li>\n<p>Save\/add the data source.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> The data source is created and ready to run a sync.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> The data source appears in the list with a status like \u201cReady\u201d (wording may vary).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Run a sync job<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the data source details page, choose <strong>Sync now<\/strong> (or <strong>Run<\/strong>).<\/li>\n<li>Monitor the sync 
status:\n   &#8211; It will move through states such as \u201cSyncing\u201d, then \u201cSucceeded\u201d or \u201cFailed\u201d.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> Sync completes successfully and documents are indexed.<\/p>\n\n\n\n<p><strong>Verification:<\/strong>\n&#8211; Data source shows <strong>Last sync status: Succeeded<\/strong>\n&#8211; Document count in index statistics increases (exact UI varies)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Query the index in the Kendra console<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the index view, open the <strong>Search console<\/strong> (Kendra provides a built-in test search UI)<\/li>\n<li>Run queries such as:\n   &#8211; <code>How do I reset my VPN token?<\/code>\n   &#8211; <code>expense receipts required<\/code>\n   &#8211; <code>what to do during API latency spikes<\/code><\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> You get ranked results and excerpts matching the correct document.<\/p>\n\n\n\n<p><strong>Verification tips:<\/strong>\n&#8211; The VPN query should return <code>vpn-reset.txt<\/code> near the top.\n&#8211; The expense query should return <code>expense-policy.txt<\/code>.\n&#8211; The on-call query should return <code>oncall-runbook.txt<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 8 (Optional): Query using AWS CLI<\/h3>\n\n\n\n<p>This helps you validate programmatic access\u2014how real applications will query Kendra.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">8.1 Configure AWS CLI<\/h4>\n\n\n\n<pre><code class=\"language-bash\">aws configure\naws sts get-caller-identity\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">8.2 Find your index ID<\/h4>\n\n\n\n<p>In the Kendra console, open the index details and copy the <strong>Index ID<\/strong>.<\/p>\n\n\n\n<p>Or use CLI (if permissions allow):<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws 
kendra list-indices\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">8.3 Run a query<\/h4>\n\n\n\n<p>Replace <code>INDEX_ID<\/code> with your index ID:<\/p>\n\n\n\n<pre><code class=\"language-bash\">aws kendra query \\\n  --index-id \"INDEX_ID\" \\\n  --query-text \"How do I reset my VPN token?\"\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> JSON output includes matching document(s), titles, URIs, and excerpt text fields.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 9 (Optional): Query with Python (boto3)<\/h3>\n\n\n\n<p>Install dependencies:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python3 -m pip install boto3\n<\/code><\/pre>\n\n\n\n<p>Example script (<code>query_kendra.py<\/code>):<\/p>\n\n\n\n<pre><code class=\"language-python\">import boto3\n\n# Replace with the Index ID from the Kendra console or `aws kendra list-indices`.\nINDEX_ID = \"INDEX_ID\"\n\n# Uses the default credentials and Region from your AWS configuration.\nkendra = boto3.client(\"kendra\")\n\nresp = kendra.query(\n    IndexId=INDEX_ID,\n    QueryText=\"expense receipts required\",\n)\n\nfor item in resp.get(\"ResultItems\", []):\n    # DocumentTitle and DocumentExcerpt are dicts; the display string is under \"Text\".\n    title = item.get(\"DocumentTitle\", {}).get(\"Text\", \"\")\n    print(item.get(\"Type\"), title)\n    excerpt = item.get(\"DocumentExcerpt\", {}).get(\"Text\", \"\")\n    if excerpt:\n        print(\"Excerpt:\", excerpt[:200])\n    print(\"---\")\n<\/code><\/pre>\n\n\n\n<p>Run:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python3 query_kendra.py\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> Printed results include excerpts referencing receipts and policy limits.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>[ ] Index status is <strong>Active<\/strong><\/li>\n<li>[ ] Data source last sync status is <strong>Succeeded<\/strong><\/li>\n<li>[ ] Query in console returns correct top documents<\/li>\n<li>[ ] (Optional) CLI query returns structured results<\/li>\n<li>[ ] (Optional) Python query prints excerpts<\/li>\n<\/ul>\n\n\n\n<p>If validation fails, 
use the troubleshooting section.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p>Common issues and fixes:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Data source sync failed: AccessDenied to S3<\/strong>\n   &#8211; Cause: IAM role doesn\u2019t have correct <code>s3:ListBucket<\/code>\/<code>s3:GetObject<\/code> permissions, or bucket policy blocks access.\n   &#8211; Fix:<\/p>\n<ul>\n<li>Recheck IAM role permissions.<\/li>\n<li>Ensure bucket policy doesn\u2019t deny access.<\/li>\n<li>Confirm the data source is using the intended role.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Index never becomes Active<\/strong>\n   &#8211; Cause: Missing service-linked roles\/permissions, or account restrictions.\n   &#8211; Fix:<\/p>\n<ul>\n<li>Check IAM permissions for creating Kendra resources.<\/li>\n<li>Check AWS Health Dashboard and service limits.<\/li>\n<li>Review CloudTrail for failed API calls.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>No results returned<\/strong>\n   &#8211; Cause: Sync didn\u2019t index documents, wrong prefix, unsupported file type, or query mismatch.\n   &#8211; Fix:<\/p>\n<ul>\n<li>Confirm objects exist under the inclusion prefix.<\/li>\n<li>Confirm the sync status and document counts.<\/li>\n<li>Try simpler queries (keywords) to sanity-check.<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>Results returned but excerpts look empty<\/strong>\n   &#8211; Cause: Document text extraction issues (format, encoding, scanned PDFs).\n   &#8211; Fix:<\/p>\n<ul>\n<li>Use plain text files to validate.<\/li>\n<li>For PDFs, ensure they contain selectable text (OCR may be required upstream).<\/li>\n<\/ul>\n<\/li>\n<li>\n<p><strong>CLI returns <code>AccessDeniedException<\/code> for <code>kendra:Query<\/code><\/strong>\n   &#8211; Cause: Your IAM identity doesn\u2019t have query permissions.\n   &#8211; Fix: Attach an IAM policy allowing <code>kendra:Query<\/code> on the index 
ARN.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing charges, delete resources in this order:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Delete Kendra index<\/strong>\n   &#8211; Kendra console \u2192 Indexes \u2192 select <code>kendra-lab-index<\/code> \u2192 <strong>Delete<\/strong>\n   &#8211; This is the main cost driver; delete it even if you keep the S3 bucket.<\/p>\n<\/li>\n<li>\n<p><strong>Confirm the Kendra data source is gone<\/strong> (data sources belong to the index, so deleting the index normally removes them; delete the data source separately only if the console still lists it)<\/p>\n<\/li>\n<li>\n<p><strong>Delete S3 objects and bucket<\/strong>\n   &#8211; Empty the bucket\n   &#8211; Delete the bucket<\/p>\n<\/li>\n<li>\n<p><strong>Delete IAM roles<\/strong>\n   &#8211; IAM \u2192 Roles \u2192 delete <code>KendraS3DataSourceRole<\/code> (and any inline policies)\n   &#8211; Also delete the index service role created in Step 4 if nothing else uses it<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> No Kendra index remains, preventing further hourly charges.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Design your index strategy intentionally<\/strong><\/li>\n<li>One index for the whole organization is simpler for users, but can be harder for governance and cost attribution.<\/li>\n<li>Multiple indexes (per department\/app) can simplify access control boundaries but increases cost and operational overhead.<\/li>\n<li><strong>Keep data in-region<\/strong><\/li>\n<li>Store documents in S3 in the same Region as the Kendra index to reduce latency and cross-Region transfer.<\/li>\n<li><strong>Use metadata as a first-class design element<\/strong><\/li>\n<li>Define fields like <code>department<\/code>, <code>product<\/code>, <code>document_type<\/code>, <code>effective_date<\/code>, <code>owner<\/code>, <code>confidentiality<\/code>.<\/li>\n<li>Enforce consistent tagging at ingestion time.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Least privilege<\/strong><\/li>\n<li>Separate roles for: Kendra admins, data source sync role(s), and application query role(s).<\/li>\n<li><strong>Restrict who can modify relevance tuning<\/strong><\/li>\n<li>Ranking changes can impact business outcomes; treat as controlled configuration with change management.<\/li>\n<li><strong>Use explicit <code>iam:PassRole<\/code> constraints<\/strong><\/li>\n<li>Allow passing only the specific data source role(s) to Kendra.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Delete dev\/test indexes quickly<\/strong><\/li>\n<li>Kendra index hourly charges can accumulate.<\/li>\n<li><strong>Avoid duplicate content<\/strong><\/li>\n<li>Deduplicate documents and remove outdated versions.<\/li>\n<li><strong>Tune sync schedules<\/strong><\/li>\n<li>Sync only as often as needed; use on-demand sync for 
low-change repositories.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use filters to narrow broad queries<\/strong><\/li>\n<li>Expose facets in UIs where possible.<\/li>\n<li><strong>Optimize document quality<\/strong><\/li>\n<li>Prefer machine-readable PDFs and clean text. Extract text from scanned docs before indexing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Monitor sync health<\/strong><\/li>\n<li>Alert on sync failures or prolonged sync durations.<\/li>\n<li><strong>Have a rollback plan for schema changes<\/strong><\/li>\n<li>Metadata schema changes can affect filters and relevance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Tag resources<\/strong><\/li>\n<li>Add tags for <code>Env<\/code>, <code>App<\/code>, <code>Owner<\/code>, <code>CostCenter<\/code>, <code>DataClassification<\/code>.<\/li>\n<li><strong>Use CloudTrail for auditing<\/strong><\/li>\n<li>Track who changed data sources, index settings, or access control configuration.<\/li>\n<li><strong>Document connector credentials lifecycle<\/strong><\/li>\n<li>Rotate credentials and store in secure services (for example AWS Secrets Manager) when applicable.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Naming:<\/li>\n<li><code>kendra-&lt;app&gt;-&lt;env&gt;-index<\/code><\/li>\n<li><code>kendra-&lt;app&gt;-&lt;env&gt;-ds-&lt;source&gt;<\/code><\/li>\n<li>Tagging:<\/li>\n<li><code>Environment=dev|staging|prod<\/code><\/li>\n<li><code>OwnerEmail=...<\/code><\/li>\n<li><code>CostCenter=...<\/code><\/li>\n<li><code>DataSensitivity=public|internal|confidential<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Administrative access<\/strong> is controlled by IAM permissions (create\/update\/delete indexes and data sources).<\/li>\n<li><strong>Query access<\/strong> should be granted to application roles\/users with <code>kendra:Query<\/code> (and related APIs needed).<\/li>\n<li><strong>Data source access<\/strong> is controlled by the IAM role Kendra assumes to read the repository (for example S3).<\/li>\n<li><strong>End-user document access control<\/strong> requires:<\/li>\n<li>Correct ACL ingestion (connector-dependent)<\/li>\n<li>Correct identity mapping (user\/group) provided to Kendra at query time (implementation varies\u2014verify in docs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>In transit:<\/strong> Use HTTPS endpoints for Kendra APIs.<\/li>\n<li><strong>At rest:<\/strong> Kendra stores indexed content and metadata. Encryption at rest is expected in AWS services; confirm your exact options (AWS-owned keys vs customer-managed keys) in Kendra docs and console for your Region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>By default, applications call regional Kendra endpoints over the internet (HTTPS).<\/li>\n<li>If your environment requires private connectivity, check whether <strong>VPC interface endpoints (PrivateLink)<\/strong> are available for Kendra in your Region and architecture. 
<strong>Verify in official AWS docs<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For connectors that require credentials (SaaS systems), store secrets in a managed secret store (commonly <strong>AWS Secrets Manager<\/strong>) and restrict access with IAM.<\/li>\n<li>Rotate credentials and audit secret access.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable <strong>CloudTrail<\/strong> in all Regions (or at least the Kendra Region) to log Kendra API calls.<\/li>\n<li>Use CloudWatch metrics and alarms for:<\/li>\n<li>Data source sync failures<\/li>\n<li>Unusual query spikes<\/li>\n<li>Operational anomalies<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Understand what content is indexed (including sensitive fields).<\/li>\n<li>Ensure your access control model matches compliance requirements (least privilege, separation of duties).<\/li>\n<li>For regulated industries, validate:<\/li>\n<li>Data residency (Region)<\/li>\n<li>Encryption model (KMS options)<\/li>\n<li>Audit requirements (CloudTrail retention, log immutability)<\/li>\n<li>Connector handling of ACLs and permissions<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Granting <code>kendra:*<\/code> to broad roles used by applications (over-privileged).<\/li>\n<li>Indexing sensitive repositories without ACL enforcement, then exposing a search UI broadly.<\/li>\n<li>Forgetting to restrict <code>iam:PassRole<\/code>, allowing users to attach overly permissive roles to data sources.<\/li>\n<li>Sync roles with overly broad S3 permissions (for example <code>s3:*<\/code> on <code>*<\/code>).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Use separate IAM roles for:<\/li>\n<li>Index administration<\/li>\n<li>Data source sync<\/li>\n<li>Application query<\/li>\n<li>Use resource-level permissions where supported (restrict to specific index ARNs).<\/li>\n<li>Keep documents in private S3 buckets; avoid public access.<\/li>\n<li>Implement defense-in-depth: authentication (Cognito\/SSO), authorization, logging, and monitoring.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>Always verify current service quotas and connector limitations in official docs.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations \/ constraints (common categories)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Connector variability:<\/strong> Not all connectors support all features (especially ACL ingestion). Verify per-connector capabilities.<\/li>\n<li><strong>Document format limitations:<\/strong> Some file types may not extract well; scanned PDFs often require OCR before indexing.<\/li>\n<li><strong>Quota limits:<\/strong> Number of indexes per account, document limits, data source limits, and query throughput limits exist.<\/li>\n<li><strong>Regional availability:<\/strong> Kendra is not available in every Region; connector support can also vary by Region.<\/li>\n<li><strong>Cost visibility:<\/strong> Index hourly costs can surprise teams if indexes are left running in dev\/test.<\/li>\n<li><strong>Access control complexity:<\/strong> Proper ACL enforcement requires careful identity mapping and testing; mistakes can cause overexposure or missing results.<\/li>\n<li><strong>Schema changes require planning:<\/strong> Metadata changes can break filters\/facets in UIs and require re-indexing behaviors depending on configuration.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Sync failures may not be obvious to end 
users<\/strong><\/li>\n<li>If sync silently stops, search results become stale. Add alarms\/workflows.<\/li>\n<li><strong>Inclusion\/exclusion prefix mistakes<\/strong><\/li>\n<li>A wrong S3 prefix can lead to \u201c0 documents indexed\u201d.<\/li>\n<li><strong>Duplicate content<\/strong><\/li>\n<li>Indexing multiple repositories with duplicates can degrade relevance.<\/li>\n<li><strong>Over-broad synonyms<\/strong><\/li>\n<li>Poor thesaurus design can significantly reduce precision.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moving from an existing search engine to Kendra often requires:<\/li>\n<li>Metadata normalization<\/li>\n<li>New ingestion pipelines<\/li>\n<li>Access control mapping<\/li>\n<li>Query UX updates (filters, facets)<\/li>\n<li>Consider phased rollout: one repository first, then expand.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>Amazon Kendra is one option in AWS\u2019s broader search + AI ecosystem. 
The best choice depends on relevance needs, control requirements, and cost.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Amazon Kendra<\/strong><\/td>\n<td>Enterprise search across multiple repositories with ML relevance<\/td>\n<td>Managed connectors, semantic ranking, excerpt highlighting, enterprise-focused features<\/td>\n<td>Can be costly; less low-level control than self-managed engines; ACL\/identity can be complex<\/td>\n<td>You need managed enterprise search with strong relevance and connectors<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon OpenSearch Service<\/strong><\/td>\n<td>Custom search applications, logs\/observability search, keyword + vector search<\/td>\n<td>Deep control, flexible indexing\/analyzers, predictable cluster sizing, broad ecosystem<\/td>\n<td>You manage cluster sizing\/tuning; relevance tuning is more manual; connectors are DIY<\/td>\n<td>You need control, custom scoring, vector search, or already run OpenSearch<\/td>\n<\/tr>\n<tr>\n<td><strong>OpenSearch (self-managed)<\/strong><\/td>\n<td>Maximum control, on-prem\/hybrid<\/td>\n<td>Full control, extensibility<\/td>\n<td>High ops burden: scaling, patching, and security hardening are yours to manage<\/td>\n<td>You must run on-prem or need custom extensions that the managed service does not offer<\/td>\n<\/tr>\n<tr>\n<td><strong>Database full-text search (Aurora\/RDS engine-specific)<\/strong><\/td>\n<td>Simple search in app databases<\/td>\n<td>Minimal extra infra, simple integration<\/td>\n<td>Not designed for enterprise multi-repo search; limited semantic relevance<\/td>\n<td>Small apps with basic search requirements<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure AI Search (other cloud)<\/strong><\/td>\n<td>Enterprise search in Azure ecosystem<\/td>\n<td>Tight Azure integration; managed 
search<\/td>\n<td>Cross-cloud complexity; data gravity<\/td>\n<td>Organization is standardized on Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Vertex AI Search (other cloud)<\/strong><\/td>\n<td>Enterprise search in GCP ecosystem<\/td>\n<td>Tight GCP integration<\/td>\n<td>Cross-cloud complexity; data gravity<\/td>\n<td>Organization is standardized on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>RAG-only vector DB approach<\/strong><\/td>\n<td>Similarity search for LLM context<\/td>\n<td>Strong semantic similarity; embeddings-driven retrieval<\/td>\n<td>Requires embedding pipelines; governance\/ACL must be designed carefully<\/td>\n<td>You primarily need embedding similarity retrieval for LLMs, not enterprise connector search<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Global financial services internal policy + procedures search<\/h3>\n\n\n\n<p><strong>Problem<\/strong>\n&#8211; Policies and procedures exist in SharePoint, PDFs in S3, and wiki pages.\n&#8211; Employees need quick answers, but content is sensitive and access differs by department and region.\n&#8211; Compliance requires auditable access and controlled changes.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong>\n&#8211; Amazon Kendra index in the primary Region for the organization.\n&#8211; Data sources:\n  &#8211; SharePoint connector for controlled sites\n  &#8211; S3 connector for policy PDFs\n  &#8211; Confluence connector for engineering procedures\n&#8211; Enrichment:\n  &#8211; Lambda enrichment extracts metadata: <code>policy_owner<\/code>, <code>effective_date<\/code>, <code>region<\/code>, <code>classification<\/code>\n&#8211; Access control:\n  &#8211; Connector-level ACL ingestion where supported\n  &#8211; Query-time user context tied to corporate identity (verify the best practice for your identity setup in Kendra docs)\n&#8211; Front end:\n  &#8211; 
Internal portal using Cognito\/SSO authentication\n  &#8211; API layer (API Gateway + Lambda) calling Kendra Query APIs\n&#8211; Monitoring:\n  &#8211; CloudWatch alarms on sync failures and index health\n  &#8211; CloudTrail for audit trails<\/p>\n\n\n\n<p><strong>Why Amazon Kendra was chosen<\/strong>\n&#8211; Managed connectors reduce integration effort.\n&#8211; Better relevance for question-like queries than legacy keyword search.\n&#8211; Enterprise features (metadata, access control patterns) align with compliance needs.<\/p>\n\n\n\n<p><strong>Expected outcomes<\/strong>\n&#8211; Reduced time to locate policies.\n&#8211; Fewer compliance escalations caused by staff relying on outdated information.\n&#8211; Centralized search with clear auditing and governance.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: Support knowledge base and RAG assistant<\/h3>\n\n\n\n<p><strong>Problem<\/strong>\n&#8211; A fast-growing startup has docs in S3 and a wiki, but support agents can\u2019t find answers quickly.\n&#8211; They want an LLM assistant, but need citations and reliable grounding.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong>\n&#8211; Single Amazon Kendra index (start small).\n&#8211; S3 as the primary source for product docs and troubleshooting guides.\n&#8211; Simple metadata: <code>product_area<\/code>, <code>version<\/code>.\n&#8211; RAG service:\n  &#8211; Application queries Kendra to retrieve top passages\n  &#8211; Sends passages + citations to an LLM (for example, Amazon Bedrock)\n&#8211; Minimal ops:\n  &#8211; On-demand sync after doc updates, then move to scheduled sync when stable<\/p>\n\n\n\n<p><strong>Why Amazon Kendra was chosen<\/strong>\n&#8211; Faster setup than building a search cluster.\n&#8211; Works as a retrieval layer for RAG with citations back to source docs.\n&#8211; Reduces engineering time spent building search infrastructure.<\/p>\n\n\n\n<p><strong>Expected 
outcomes<\/strong>\n&#8211; Faster support resolution time.\n&#8211; Fewer escalations to engineering.\n&#8211; Higher customer satisfaction due to consistent answers with citations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is Amazon Kendra the same as Amazon OpenSearch Service?<\/strong><br\/>\nNo. Amazon Kendra is a managed enterprise search service focused on ML relevance and connectors. Amazon OpenSearch Service is a managed search\/analytics engine (OpenSearch) that offers more low-level control and broader use cases (logs, metrics, custom search).<\/p>\n\n\n\n<p>2) <strong>Is Amazon Kendra a vector database?<\/strong><br\/>\nAmazon Kendra is not typically positioned as a general-purpose vector database. It provides ML-based relevance for enterprise search and retrieval. For dedicated vector similarity search, evaluate OpenSearch vector search or specialized vector stores. Verify current Kendra retrieval capabilities in the docs.<\/p>\n\n\n\n<p>3) <strong>Can Amazon Kendra enforce document permissions?<\/strong><br\/>\nYes, when configured correctly and when ACL ingestion\/user context is supported for your connector or ingestion approach. This requires careful identity mapping and testing\u2014verify the recommended approach in official docs.<\/p>\n\n\n\n<p>4) <strong>Does Amazon Kendra support Amazon S3 as a data source?<\/strong><br\/>\nYes. S3 is one of the most common Kendra data sources. You provide an IAM role for Kendra to read objects.<\/p>\n\n\n\n<p>5) <strong>Can I index content from SaaS tools like Confluence or SharePoint?<\/strong><br\/>\nKendra supports multiple connectors for third-party repositories. Connector availability and features vary\u2014check the current connector list in the Kendra documentation.<\/p>\n\n\n\n<p>6) <strong>How long does indexing take?<\/strong><br\/>\nIt depends on document count, document size, and connector. Small labs usually finish in minutes; large repositories take longer. 
Monitor sync status in the console.<\/p>\n\n\n\n<p>7) <strong>Can I run Kendra in multiple Regions?<\/strong><br\/>\nYou can create indexes in multiple Regions, but each is separate. Consider data residency, latency, and cost.<\/p>\n\n\n\n<p>8) <strong>Can I \u201cpause\u201d a Kendra index to stop hourly charges?<\/strong><br\/>\nTypically, managed indexes are billed while they exist. The reliable way to stop charges is usually to delete the index. Verify current billing behavior on the pricing page.<\/p>\n\n\n\n<p>9) <strong>How do I improve poor search relevance?<\/strong><br\/>\nStart with content hygiene and metadata:\n&#8211; Ensure titles and headings are meaningful\n&#8211; Add consistent metadata fields\n&#8211; Use relevance tuning (boost fields\/sources)\n&#8211; Add synonyms carefully\n&#8211; Remove duplicates and outdated versions<\/p>\n\n\n\n<p>10) <strong>What\u2019s the best way to support RAG with Amazon Kendra?<\/strong><br\/>\nUse Kendra to retrieve top relevant passages\/documents, then provide them as grounded context to an LLM (for example Amazon Bedrock). Keep citations (source URIs) and implement guardrails (don\u2019t let the model answer without retrieved context).<\/p>\n\n\n\n<p>11) <strong>Does Amazon Kendra integrate with Amazon Lex?<\/strong><br\/>\nKendra is commonly used as a knowledge search backend for chatbots. Validate the current best practice integration patterns in AWS docs for Lex and Kendra.<\/p>\n\n\n\n<p>12) <strong>How do I secure a public-facing search experience?<\/strong><br\/>\nDon\u2019t expose Kendra directly to browsers. Put an authenticated API layer in front (API Gateway + Lambda) and apply IAM least privilege, rate limiting, and logging.<\/p>\n\n\n\n<p>13) <strong>How do I handle scanned PDFs?<\/strong><br\/>\nKendra may not extract text well from scanned images. 
Perform OCR upstream (for example using Amazon Textract or another OCR solution) and index the extracted text.<\/p>\n\n\n\n<p>14) <strong>Can I index multiple S3 buckets?<\/strong><br\/>\nYes, typically by creating multiple S3 data sources, each with its own configuration and IAM access. Validate quotas and best practices for your scale.<\/p>\n\n\n\n<p>15) <strong>How do I track who changed index settings?<\/strong><br\/>\nEnable CloudTrail and review events for Kendra API calls (create\/update\/delete index, data source changes, sync triggers).<\/p>\n\n\n\n<p>16) <strong>What\u2019s the difference between a data source sync and custom ingestion?<\/strong><br\/>\n&#8211; Data source sync: Kendra pulls from a repository on schedule\/on-demand via connector.\n&#8211; Custom ingestion: your pipeline pushes documents into Kendra using APIs. Choose based on repository type and control needs.<\/p>\n\n\n\n<p>17) <strong>How do I avoid indexing sensitive data accidentally?<\/strong><br\/>\nUse inclusion\/exclusion patterns, metadata classification, and (if needed) enrichment-based redaction\/tagging. Apply IAM controls and review the content scope during onboarding.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn Amazon Kendra<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official Documentation<\/td>\n<td>Amazon Kendra Documentation<\/td>\n<td>Primary source for features, APIs, connectors, quotas, and security guidance. https:\/\/docs.aws.amazon.com\/kendra\/<\/td>\n<\/tr>\n<tr>\n<td>Official \u201cWhat is\u201d page<\/td>\n<td>What is Amazon Kendra?<\/td>\n<td>Good conceptual overview and core terminology. 
https:\/\/docs.aws.amazon.com\/kendra\/latest\/dg\/what-is-kendra.html<\/td>\n<\/tr>\n<tr>\n<td>Official Pricing<\/td>\n<td>Amazon Kendra Pricing<\/td>\n<td>Current pricing by edition\/Region and related capabilities. https:\/\/aws.amazon.com\/kendra\/pricing\/<\/td>\n<\/tr>\n<tr>\n<td>Pricing Tool<\/td>\n<td>AWS Pricing Calculator<\/td>\n<td>Build Region-specific estimates including related services. https:\/\/calculator.aws\/<\/td>\n<\/tr>\n<tr>\n<td>API Reference<\/td>\n<td>Amazon Kendra API Reference<\/td>\n<td>Exact request\/response shapes for Query, ingestion, and admin APIs. Verify latest endpoints via docs navigation: https:\/\/docs.aws.amazon.com\/kendra\/<\/td>\n<\/tr>\n<tr>\n<td>Security\/Auditing<\/td>\n<td>AWS CloudTrail User Guide<\/td>\n<td>How to audit Kendra API calls and set retention. https:\/\/docs.aws.amazon.com\/awscloudtrail\/latest\/userguide\/<\/td>\n<\/tr>\n<tr>\n<td>Monitoring<\/td>\n<td>Amazon CloudWatch Documentation<\/td>\n<td>Metrics, dashboards, and alarms for operational visibility. https:\/\/docs.aws.amazon.com\/cloudwatch\/<\/td>\n<\/tr>\n<tr>\n<td>Architecture Guidance<\/td>\n<td>AWS Architecture Center<\/td>\n<td>Patterns for building secure, scalable AWS solutions (search for Kendra references). https:\/\/aws.amazon.com\/architecture\/<\/td>\n<\/tr>\n<tr>\n<td>Samples (Trusted)<\/td>\n<td>AWS Samples on GitHub (search: \u201camazon kendra\u201d)<\/td>\n<td>Example code for querying and integration patterns. https:\/\/github.com\/aws-samples (use search for \u201ckendra\u201d)<\/td>\n<\/tr>\n<tr>\n<td>Videos<\/td>\n<td>AWS YouTube Channel (search: \u201cAmazon Kendra\u201d)<\/td>\n<td>Service deep dives, demos, and integration examples. https:\/\/www.youtube.com\/user\/AmazonWebServices<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">18. 
Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Beginners to experienced cloud\/DevOps practitioners<\/td>\n<td>AWS fundamentals, DevOps, and practical cloud labs (verify current Kendra coverage on site)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Students and early-career engineers<\/td>\n<td>DevOps\/SCM basics, cloud introductions, hands-on learning<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations and platform teams<\/td>\n<td>Cloud operations practices, automation, operational readiness<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, DevOps, operations engineers<\/td>\n<td>Reliability engineering practices, monitoring, incident response<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + AI\/ML practitioners<\/td>\n<td>AIOps concepts, automation, AI-assisted operations<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>Cloud\/DevOps training content (verify specific offerings)<\/td>\n<td>Learners seeking guided training and mentorship<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps and cloud training<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps help\/training platform (verify services)<\/td>\n<td>Teams needing short-term coaching or implementation help<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and training resources (verify scope)<\/td>\n<td>Operations teams seeking practical support-style learning<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify offerings)<\/td>\n<td>Architecture reviews, implementation support, automation<\/td>\n<td>Designing an AWS search portal architecture; setting up IAM governance; CI\/CD for related apps<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Training + consulting (verify offerings)<\/td>\n<td>Enablement, cloud\/DevOps delivery, workshops<\/td>\n<td>Kendra proof-of-concept; operational readiness; cost and security review for a search\/RAG rollout<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify offerings)<\/td>\n<td>DevOps transformation, cloud operations, platform practices<\/td>\n<td>Secure AWS deployment patterns; monitoring strategy; IAM least-privilege design for Kendra apps<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before Amazon Kendra<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS fundamentals<\/strong><\/li>\n<li>IAM users\/roles\/policies, least privilege, <code>iam:PassRole<\/code><\/li>\n<li>S3 buckets, object permissions, bucket policies<\/li>\n<li>CloudWatch and CloudTrail basics<\/li>\n<li><strong>Search fundamentals<\/strong><\/li>\n<li>Precision\/recall, relevance, metadata, facets<\/li>\n<li>Content lifecycle and governance<\/li>\n<li><strong>Security fundamentals<\/strong><\/li>\n<li>Data classification, encryption basics, audit logging<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after Amazon Kendra<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>RAG architectures<\/strong><\/li>\n<li>Retrieval strategies, chunking, citations, evaluation<\/li>\n<li>Integrating Kendra retrieval with <strong>Amazon Bedrock<\/strong><\/li>\n<li><strong>Enterprise identity<\/strong><\/li>\n<li>IAM Identity Center, SAML\/OIDC concepts, user\/group mapping<\/li>\n<li><strong>Operational excellence<\/strong><\/li>\n<li>Dashboards, alarms, incident playbooks for ingestion failures<\/li>\n<li><strong>Alternatives<\/strong><\/li>\n<li>Amazon OpenSearch Service for custom ranking\/vector search<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud engineer \/ cloud developer<\/li>\n<li>Solutions architect<\/li>\n<li>DevOps \/ SRE (internal tooling and portals)<\/li>\n<li>Knowledge management engineer<\/li>\n<li>ML engineer (RAG integrations)<\/li>\n<li>Security engineer (governance and access control validation)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>AWS certifications don\u2019t typically focus on a single service, but relevant paths include:\n&#8211; AWS Certified Solutions Architect (Associate\/Professional)\n&#8211; AWS Certified 
Developer (Associate)\n&#8211; AWS Certified Machine Learning \/ AI-related certifications (check current AWS certification catalog)<br\/>\nVerify current certification names and availability: https:\/\/aws.amazon.com\/certification\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a secure internal search portal with:<\/li>\n<li>Cognito auth<\/li>\n<li>API Gateway + Lambda<\/li>\n<li>Kendra query + metadata filters<\/li>\n<li>Implement document enrichment:<\/li>\n<li>Add tags like <code>team<\/code>, <code>severity<\/code>, <code>service<\/code> based on content<\/li>\n<li>Build a RAG assistant:<\/li>\n<li>Kendra retrieval + Bedrock generation + citations<\/li>\n<li>Implement governance:<\/li>\n<li>Multiple indexes by environment with tagging + cost reporting<\/li>\n<li>CloudWatch alarms on sync failures<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>ACL (Access Control List):<\/strong> Rules that define which users\/groups can access a document. 
In search, ACLs must be enforced so users only see permitted results.<\/li>\n<li><strong>Connector (Data source):<\/strong> A managed integration that syncs documents from a repository (S3, wiki, SaaS tool) into Kendra.<\/li>\n<li><strong>Data source sync:<\/strong> The job that reads from the repository and updates the Kendra index.<\/li>\n<li><strong>Document enrichment:<\/strong> A pipeline step that transforms documents or adds metadata during ingestion (often using AWS Lambda).<\/li>\n<li><strong>Facet:<\/strong> A UI element that lets users filter results by a metadata field (for example department or date).<\/li>\n<li><strong>IAM:<\/strong> AWS Identity and Access Management\u2014controls permissions for AWS API calls and role assumption.<\/li>\n<li><strong>Index:<\/strong> The searchable structure Kendra builds from ingested documents.<\/li>\n<li><strong>Metadata:<\/strong> Structured fields attached to documents (owner, department, date, tags) used for filtering and relevance.<\/li>\n<li><strong>RAG (Retrieval-Augmented Generation):<\/strong> An architecture where a retrieval system fetches relevant context (documents\/passages) for an LLM to generate grounded answers.<\/li>\n<li><strong>Relevance tuning:<\/strong> Adjusting ranking behavior (boosting fields\/sources) to improve result quality.<\/li>\n<li><strong>Synonyms\/Thesaurus:<\/strong> Configuration that maps related terms (PTO\/vacation) to improve recall.<\/li>\n<li><strong>User context:<\/strong> Information about the querying user (identity\/groups) used to enforce access control during query.<\/li>\n<li><strong>VPC interface endpoint (PrivateLink):<\/strong> A private network path to AWS service APIs without traversing the public internet (availability varies; verify for Kendra).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">23. 
Summary<\/h2>\n\n\n\n<p>Amazon Kendra is AWS\u2019s managed enterprise search service in the <strong>Machine Learning (ML) and Artificial Intelligence (AI)<\/strong> category, built to index documents across repositories and return highly relevant results for natural language queries. It matters when your organization needs better-than-keyword relevance, unified search across silos, and a managed operational model with strong AWS integrations.<\/p>\n\n\n\n<p>From an architecture perspective, Kendra sits between content sources (often S3 and SaaS tools) and applications (portals, chatbots, and RAG assistants). Cost is primarily driven by the existence and edition\/capacity of indexes (often billed hourly), so cost control depends on minimizing unnecessary indexes, choosing the correct edition, and tuning sync schedules. Security depends on IAM least privilege, careful connector role design, encryption configuration (verify options), and\u2014if needed\u2014correct ACL and identity mapping so users only see what they should.<\/p>\n\n\n\n<p>Use Amazon Kendra when you want managed enterprise search with connectors and ML relevance. Consider alternatives like Amazon OpenSearch Service when you need deeper control, lower-level customization, or dedicated vector search. 
Next, extend this tutorial by adding metadata schema design, document enrichment via Lambda, and a production-grade authenticated search API\u2014then evaluate a RAG workflow using Amazon Bedrock with Kendra as the retrieval layer.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine Learning (ML) and Artificial Intelligence (AI)<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[20,32],"tags":[],"class_list":["post-240","post","type-post","status-publish","format-standard","hentry","category-aws","category-machine-learning-ml-and-artificial-intelligence-ai"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/240","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=240"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/240\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=240"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=240"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=240"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}