AWS Amazon Kendra Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Machine Learning (ML) and Artificial Intelligence (AI)

Category

Machine Learning (ML) and Artificial Intelligence (AI)

1. Introduction

Amazon Kendra is an AWS-managed enterprise search service that uses Machine Learning (ML) to help users find accurate answers and relevant documents across many content repositories (documents, wikis, knowledge bases, file shares, and SaaS tools).

In simple terms: you connect Amazon Kendra to your data (for example, an Amazon S3 bucket or a wiki), let it index content, and then your users can search using natural language (for example, “How do I reset my VPN token?”). Kendra returns ranked results and often highlights the exact passage that answers the question.

Technically, Amazon Kendra builds and manages a search index that combines semantic ranking, document understanding, metadata filtering, and (optionally) access control enforcement. It supports ingesting data through pre-built connectors (data sources), custom ingestion APIs, and document enrichment pipelines. Applications query Kendra via AWS APIs/SDKs, and can integrate the results into portals, chatbots, and Retrieval-Augmented Generation (RAG) workflows (for example, using Amazon Bedrock) without having to operate their own search infrastructure.
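The query path described above can be sketched in Python with boto3 (the AWS SDK the tutorial later uses). The index ID and query text are placeholders; the helper that builds the request parameters is separated out so it can be inspected without AWS credentials.

```python
# Sketch of querying an Amazon Kendra index with boto3.
# The index ID is a placeholder; results come back ranked by Kendra.

def build_query_params(index_id: str, query_text: str, page_size: int = 5) -> dict:
    """Build the keyword arguments for kendra.query()."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "PageSize": page_size,
    }

def run_query(index_id: str, query_text: str) -> None:
    # Requires AWS credentials and a real index; shown for illustration.
    import boto3
    kendra = boto3.client("kendra")
    response = kendra.query(**build_query_params(index_id, query_text))
    for item in response.get("ResultItems", []):
        # Each result carries a type, a title, and an excerpt with highlights.
        print(item.get("Type"), item.get("DocumentTitle", {}).get("Text"))

# Usage (requires AWS credentials):
# run_query("<your-index-id>", "How do I reset my VPN token?")
```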

What problem it solves: Traditional keyword search often fails in enterprise environments because content is scattered, titles are inconsistent, users ask questions (not keywords), and relevance depends on context and permissions. Amazon Kendra aims to deliver “enterprise-grade search” with better relevance, easier integration, and managed operations.

2. What is Amazon Kendra?

Official purpose: Amazon Kendra is a fully managed intelligent search service for enterprise content. It is designed to help organizations index and search large volumes of unstructured and semi-structured data stored across AWS and third-party systems.

Core capabilities (high level):
  • Create and manage indexes for enterprise search.
  • Ingest content from data sources/connectors (for example Amazon S3, wiki platforms, CRM/ITSM tools, and web pages) and/or via custom ingestion APIs.
  • Run natural-language queries and return ranked results with highlighted passages and metadata.
  • Apply filters/facets and relevance tuning.
  • Enforce document visibility with access control (when configured and supported by the ingestion method/connector).
  • Improve ingestion quality with document enrichment (for example, extracting metadata, transforming text, or adding tags via AWS Lambda).

Major components:
  • Index: The core searchable repository that stores processed content and metadata.
  • Data sources (connectors): Managed connectors and sync jobs that pull documents, metadata, and (in some cases) ACLs from repositories into the index.
  • Custom document ingestion: APIs to push documents directly (useful for proprietary systems or custom pipelines).
  • Query APIs: APIs to query/search the index and retrieve results.
  • Access control configuration: Options to associate user identity/group information with documents and enforce “who can see what” during query.
  • Relevance tuning & metadata: Field mapping, boosting, facets, and query-time filtering.

Service type: Fully managed AWS service (SaaS-like within AWS). You do not manage servers, clusters, or shards.

Scope (regional/global/account):
  • Amazon Kendra is a regional service. You create indexes in a specific AWS Region, and data sources/sync jobs run in that Region.
  • Resources are account-scoped within the Region (subject to IAM permissions).
Verify current Region availability in the official docs: https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html

How it fits into the AWS ecosystem:
  • Uses IAM for authentication/authorization to Kendra APIs and for granting Kendra permission to read from your repositories (for example, S3).
  • Integrates with AWS KMS for encryption at rest (service-managed encryption and/or customer-managed keys depending on configuration; verify exact options in docs).
  • Works well with Amazon S3 (common document store), AWS Lambda (document enrichment), Amazon CloudWatch (metrics), and AWS CloudTrail (API auditing).
  • Commonly paired with Amazon Lex (chatbots), Amazon Bedrock (RAG), AWS IAM Identity Center (enterprise identity), and application front ends via Amazon Cognito, API Gateway, or custom web apps.

3. Why use Amazon Kendra?

Business reasons

  • Faster answers, less time wasted: Reduce time employees spend hunting through wikis, PDFs, ticket systems, and shared drives.
  • Better self-service: Improve customer or employee self-service by indexing knowledge bases and support documentation.
  • Improved knowledge reuse: Make institutional knowledge discoverable even when content is poorly titled or inconsistently tagged.

Technical reasons

  • Natural language search: Designed for question-like queries, not just keywords.
  • Connectors reduce integration time: Many common repositories can be indexed without writing a custom crawler.
  • Metadata filtering and facets: Combine semantic ranking with structured filtering (department, product, date, confidentiality, etc.).
  • APIs for application integration: Use AWS SDKs to embed search into portals, apps, and chat systems.

Operational reasons

  • Managed infrastructure: No cluster provisioning, patching, scaling, or shard management.
  • Repeatable ingestion: Scheduled or on-demand sync jobs with status tracking.
  • Observability hooks: Metrics and audit events integrate into standard AWS operational tooling.

Security/compliance reasons

  • AWS IAM integration: Fine-grained control over who can administer indexes and data sources.
  • Encryption and auditing: Standard AWS encryption and API audit patterns (verify exact encryption capabilities for your configuration in official docs).
  • Access control-aware search (when configured): Search results can be filtered by user identity/ACL rules, which is critical for enterprise content.

Scalability/performance reasons

  • Designed for enterprise content volumes: Kendra is intended for large document sets and many concurrent users (subject to quotas and edition limits).
  • Relevance at scale: ML ranking is managed by AWS; you focus on content quality and metadata.

When teams should choose Amazon Kendra

Choose Amazon Kendra when you need:
  • Enterprise search across multiple repositories
  • Strong relevance for natural language queries
  • Managed connectors and ingestion workflows
  • Access control-aware search in a managed service
  • A search layer that can feed RAG systems (retrieve relevant passages for an LLM)

When teams should not choose Amazon Kendra

Consider alternatives when:
  • You only need simple keyword search on one small dataset (OpenSearch or database full-text search may be cheaper/simpler).
  • You need full control of ranking algorithms, analyzers, or low-level search internals (OpenSearch/Elasticsearch gives more control).
  • You are primarily building vector similarity search with custom embeddings and scoring (Amazon OpenSearch Service vector search or purpose-built vector databases may fit better; Kendra is not marketed as a general vector database).
  • You have strict constraints that require on-prem-only operation (Kendra is an AWS managed service).

4. Where is Amazon Kendra used?

Industries

  • Technology & SaaS: Internal engineering knowledge search, runbooks, product docs.
  • Financial services: Policy/procedure search, compliance documents, internal knowledge bases (with strong access control requirements).
  • Healthcare & life sciences: Research document discovery, internal SOPs (ensure compliance requirements are met).
  • Manufacturing: Maintenance manuals, part catalogs, troubleshooting docs.
  • Retail & e-commerce: Customer support knowledge, product information aggregation.
  • Public sector/education: Policy search, intranet knowledge, research repositories (subject to governance requirements).

Team types

  • Platform teams building internal portals
  • Support engineering / IT service management teams
  • Data/ML teams building RAG assistants
  • Security and compliance teams managing knowledge access
  • DevOps/SRE teams indexing operational runbooks and incident retrospectives

Workloads

  • Enterprise search portals and intranets
  • Support agent assist tools (“suggest the best KB article for this ticket”)
  • Knowledge retrieval layer for chatbots and virtual assistants
  • Compliance and policy discovery
  • Document discovery across multiple silos

Architectures

  • Central search index with multiple connectors
  • Per-department index model with strict separation (sometimes used for governance/cost control)
  • RAG architecture: Kendra retrieval → LLM summarization/answering (via Amazon Bedrock or another LLM platform)

Real-world deployment contexts

  • Indexing documents from S3 + SharePoint + Confluence into one index
  • Adding an internal search bar to a company portal
  • Integrating with ticketing systems for support knowledge

Production vs dev/test usage

  • Dev/test: Usually one small index, limited documents, infrequent sync. Delete when not needed to control hourly costs.
  • Production: Carefully designed index strategy, ingestion schedules, monitoring, ACL enforcement, and change management for metadata/schema.

5. Top Use Cases and Scenarios

Below are realistic scenarios where Amazon Kendra is commonly a strong fit.

1) Internal IT Helpdesk Knowledge Search

  • Problem: Employees submit repetitive tickets because solutions are hard to find.
  • Why Kendra fits: Indexes IT knowledge articles, PDFs, and runbooks; supports natural language questions.
  • Scenario: “How do I connect to the corporate VPN from macOS?” returns the exact step-by-step doc and highlights the relevant passage.

2) Support Agent Assist for Faster Ticket Resolution

  • Problem: Support agents waste time searching multiple systems while on a call.
  • Why Kendra fits: Single search layer over KB + product docs + past resolutions; can filter by product/version.
  • Scenario: A support console calls Kendra for each ticket, showing top 5 suggested articles and known fixes.

3) Enterprise Policy and Compliance Search

  • Problem: Policies exist in many PDFs and sites; people can’t find the “right” version.
  • Why Kendra fits: Indexes policies with metadata (effective date, owner, department). Facets improve discovery.
  • Scenario: “Travel reimbursement for contractors” returns the correct policy section with excerpt.

4) Engineering Runbook and Incident Retrospective Discovery

  • Problem: On-call engineers can’t quickly find relevant runbooks and past incidents.
  • Why Kendra fits: Natural language works well (“latency spike in us-east-1”), and metadata filters help (service/team).
  • Scenario: During an outage, a chatbot uses Kendra to retrieve relevant runbooks and links.

5) HR and Employee Self-Service Portal

  • Problem: Employees ask HR the same questions repeatedly.
  • Why Kendra fits: Index HR policies, benefits docs, and internal wiki pages; support synonyms (PTO vs vacation).
  • Scenario: “How many vacation days do I have?” returns the benefits guide and highlights the PTO accrual section.

6) Knowledge Search Across Multiple SaaS Tools

  • Problem: Critical content is spread across Confluence, SharePoint, Google Drive, and internal docs.
  • Why Kendra fits: Connectors reduce custom development; unified index improves user experience.
  • Scenario: A single search UI queries Kendra and returns results from multiple sources with source badges.

7) Product Documentation Search for Customers (Authenticated)

  • Problem: Customers can’t find relevant product docs quickly; search results are noisy.
  • Why Kendra fits: Can index documentation and support semantic ranking; combine with authentication and filtering.
  • Scenario: Logged-in customers search “configure SSO for Okta”, getting the best matching docs.

8) Retrieval Layer for RAG (LLM Assistants)

  • Problem: LLMs hallucinate without trustworthy context and citations.
  • Why Kendra fits: Retrieves relevant documents/passages; your app can provide sources to the LLM.
  • Scenario: A Bedrock-powered assistant uses Kendra results as context and returns an answer with citations.

9) Document Discovery for Research/Legal Teams

  • Problem: Teams need to find documents and clauses quickly across large corpora.
  • Why Kendra fits: Semantic ranking and excerpt highlighting help locate relevant sections.
  • Scenario: “Indemnification clause termination” retrieves and highlights the clause across templates.

10) Central Search for Technical Training Materials

  • Problem: Training content is fragmented; learners can’t find the right lab or module.
  • Why Kendra fits: Indexes PDFs, HTML, and wiki pages; metadata facets by course/topic.
  • Scenario: “Kubernetes ingress troubleshooting lab” returns the lab guide and prerequisites.

11) M&A Knowledge Integration

  • Problem: After an acquisition, documentation is split across two toolchains.
  • Why Kendra fits: Connectors can index both repositories into one searchable index (with governance).
  • Scenario: Users search once and see results labeled by legacy company source.

12) Field Service / Maintenance Manual Search

  • Problem: Technicians need answers quickly from manuals and service bulletins.
  • Why Kendra fits: PDF-heavy content and question queries are common; excerpt highlighting is useful.
  • Scenario: “Error code E17 compressor” returns the manual section and troubleshooting steps.

6. Core Features

Note: Amazon Kendra features evolve. Always verify the latest capabilities and connector list in the official documentation: https://docs.aws.amazon.com/kendra/

1) Managed indexes (Developer/Enterprise editions)

  • What it does: Provides managed search indexes without running servers.
  • Why it matters: Removes operational burden of scaling and maintaining search clusters.
  • Practical benefit: Faster time to value; consistent managed experience.
  • Caveats: Pricing is typically hourly and can be significant; choose the correct edition and delete unused indexes.

2) Data source connectors (managed ingestion)

  • What it does: Connects to supported repositories and syncs documents and metadata into Kendra.
  • Why it matters: Integration is often the hardest part of enterprise search.
  • Practical benefit: Faster onboarding for common systems (for example S3 and popular collaboration tools).
  • Caveats: Connector availability and ACL support vary by connector; verify capabilities per connector in docs.

3) Custom document ingestion APIs

  • What it does: Allows you to push documents directly using APIs (for proprietary systems or event-driven pipelines).
  • Why it matters: Not every repository has a connector.
  • Practical benefit: You can index content generated by applications or stored in custom databases.
  • Caveats: You must manage batching, retries, idempotency, and mapping metadata fields correctly.
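Because batching is left to the caller, a common pattern is to chunk documents client-side before calling BatchPutDocument and to inspect the per-document failure list. The batch size used below is an assumption; verify the current per-call document limit in the Kendra quotas documentation.

```python
# Sketch of pushing documents to Kendra via BatchPutDocument with client-side batching.

BATCH_LIMIT = 10  # assumed per-call limit; verify against current Kendra quotas

def chunk(items: list, size: int = BATCH_LIMIT) -> list:
    """Split a document list into batches no larger than the API limit."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def push_documents(index_id: str, documents: list) -> None:
    # documents: [{"Id": "...", "Title": "...", "Blob": b"...", "ContentType": "PLAIN_TEXT"}, ...]
    import boto3
    kendra = boto3.client("kendra")
    for batch in chunk(documents):
        resp = kendra.batch_put_document(IndexId=index_id, Documents=batch)
        # FailedDocuments lists per-document errors; log or retry these.
        for failure in resp.get("FailedDocuments", []):
            print("failed:", failure.get("Id"), failure.get("ErrorMessage"))
```

Retries and idempotency remain your responsibility: reusing a stable document Id makes re-pushing the same content an update rather than a duplicate.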

4) Natural language query understanding and semantic ranking

  • What it does: Interprets user questions and ranks results using ML-based relevance.
  • Why it matters: Users ask questions (“How do I…”) rather than exact keywords.
  • Practical benefit: Better top results and fewer “no results found” experiences.
  • Caveats: Relevance depends on content quality, metadata, and correct field mappings.

5) Excerpts, highlights, and answer-like results

  • What it does: Returns snippets from documents that match the query and highlights relevant passages.
  • Why it matters: Users can quickly validate if a result contains the answer.
  • Practical benefit: Faster click-through and less time scanning long documents.
  • Caveats: Quality varies by document format and extraction; scanned PDFs may require OCR before indexing.

6) Metadata schema, facets, and filtering

  • What it does: Supports metadata fields and query-time filters (and faceted navigation in UIs).
  • Why it matters: Enterprise search often needs “filter by department/product/date/confidentiality”.
  • Practical benefit: Higher precision searches, better UX for large corpora.
  • Caveats: You must design metadata carefully; incorrect field types or sparse metadata reduces usefulness.
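As an illustration, a query-time filter is expressed as an AttributeFilter object passed to the query call. In the sketch below, `department` is a hypothetical custom metadata field and `_last_updated_at` is a Kendra reserved field; verify both against your index schema.

```python
# Sketch of a Kendra AttributeFilter combining an equality match on a custom
# string field with a date lower bound, joined with AND semantics.

from datetime import datetime, timezone

def department_after(department: str, since: datetime) -> dict:
    """Build a filter for kendra.query(..., AttributeFilter=...)."""
    return {
        "AndAllFilters": [
            {
                "EqualsTo": {
                    "Key": "department",  # hypothetical custom field
                    "Value": {"StringValue": department},
                }
            },
            {
                "GreaterThanOrEquals": {
                    "Key": "_last_updated_at",  # Kendra reserved field
                    "Value": {"DateValue": since},
                }
            },
        ]
    }

flt = department_after("HR", datetime(2024, 1, 1, tzinfo=timezone.utc))
```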

7) Relevance tuning (boosting, field importance)

  • What it does: Adjusts ranking by boosting certain fields or data sources.
  • Why it matters: Business context matters (official policies > drafts, latest version > old).
  • Practical benefit: Aligns search results with what users actually need.
  • Caveats: Over-boosting can hide relevant results; test changes and monitor user feedback.

8) Synonyms / thesaurus (terminology alignment)

  • What it does: Helps treat related terms as equivalent (for example, “PTO” and “vacation”).
  • Why it matters: Organizations use inconsistent terminology.
  • Practical benefit: Better recall and fewer missed results.
  • Caveats: Poor synonym design can increase noise; treat as a controlled vocabulary.
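A thesaurus is typically supplied as a file of synonym rules; Kendra accepts the Solr synonym format (verify current format details and size limits in the docs). A minimal sketch:

```
# Lines starting with "#" are comments.
# Bidirectional synonyms (comma-separated):
PTO, paid time off, vacation
# One-way mapping (left-hand terms expand to the right-hand side):
infosec => information security
```

Treat this file as a controlled vocabulary: review additions, and prefer a small set of high-value mappings over a large noisy one.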

9) Document enrichment (preprocessing and metadata extraction)

  • What it does: Applies transformations to documents during ingestion (often via AWS Lambda) to add metadata, redact content, or normalize text.
  • Why it matters: “Garbage in, garbage out” applies strongly to enterprise search.
  • Practical benefit: Adds tags, cleans up content, extracts key fields for filtering.
  • Caveats: Enrichment adds complexity, latency, and Lambda costs; ensure idempotency and handle failures.
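To illustrate the kind of logic an enrichment step performs, the sketch below derives a metadata attribute from a document's S3 key. The actual event/response contract for Kendra's enrichment Lambda is defined in the official docs; only the tagging logic is shown here, and the `department` field is hypothetical.

```python
# Sketch of enrichment logic: map a key like "docs/hr/benefits.pdf" to a
# Kendra document attribute. Pure function, so it is easy to test and keep
# idempotent; wire it into your enrichment Lambda per the official contract.

def derive_attributes(s3_key: str) -> list:
    """Derive a 'department' attribute (hypothetical field) from the S3 key."""
    parts = s3_key.split("/")
    attributes = []
    if len(parts) >= 3 and parts[0] == "docs":
        attributes.append({
            "Key": "department",
            "Value": {"StringValue": parts[1]},
        })
    return attributes
```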

10) Access control-aware search (ACLs and user context)

  • What it does: Restricts results so users only see documents they are permitted to view.
  • Why it matters: Enterprise content is rarely all-public.
  • Practical benefit: Enables indexing sensitive repositories while respecting permissions.
  • Caveats: Correct ACL ingestion and identity mapping is critical; support varies by connector and configuration. Verify connector ACL support and identity requirements.
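When ACLs have been ingested, the query can carry end-user context so Kendra filters results to what that user may see. The sketch below passes an explicit user and groups; whether you use this shape or a signed token depends on your identity setup and connector support (verify in the docs).

```python
# Sketch of adding UserContext to a Kendra query so results respect ACLs.
# The user ID and group name are illustrative values.

def query_params_with_user(index_id: str, query_text: str,
                           user_id: str, groups: list) -> dict:
    """Build kendra.query() kwargs that include end-user identity context."""
    return {
        "IndexId": index_id,
        "QueryText": query_text,
        "UserContext": {
            "UserId": user_id,
            "Groups": groups,
        },
    }

params = query_params_with_user("<index-id>", "vacation policy",
                                "alice@example.com", ["hr-readers"])
```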

11) Query suggestions (type-ahead)

  • What it does: Can suggest queries as users type (depending on configuration and API usage).
  • Why it matters: Improves UX and helps users discover common queries.
  • Practical benefit: Faster search and more consistent query patterns.
  • Caveats: Suggestions are not always desired for sensitive environments; evaluate privacy and UX.

12) Amazon Kendra Intelligent Ranking (related capability)

  • What it does: Provides ML-based re-ranking for search results from other search engines (for example, OpenSearch/Elasticsearch), so you can improve relevance without migrating the index.
    Verify current compatibility in official docs.
  • Why it matters: Many organizations already have search engines but want better ranking.
  • Practical benefit: Incremental improvement path.
  • Caveats: This is a distinct capability with its own setup and pricing model; not the same as running a full Kendra index.

7. Architecture and How It Works

High-level service architecture

Amazon Kendra sits between your content repositories and your search applications:

  1. Ingestion: Kendra connects to content repositories via data source connectors or receives documents via APIs.
  2. Processing: It extracts text and metadata, applies enrichment (optional), and builds an index.
  3. Query: Applications call Kendra query APIs. Kendra evaluates user query + filters + (optional) user context for access control.
  4. Results: Returns ranked documents/snippets, metadata, and links back to the source.

Request / data / control flows

  • Control plane: Create index, configure schema, configure data sources, run syncs, manage relevance tuning, and configure access control.
  • Data plane: Documents flow into the index during sync/ingestion; queries flow from apps to Kendra and results back.
  • Security plane: IAM policies govern who can administer and query. IAM roles govern what Kendra can read from data sources (for example, S3). Optional user identity context can constrain results.

Integrations with related AWS services

Common integrations include:
  • Amazon S3 for document storage and as a primary data source.
  • AWS Lambda for document enrichment during ingestion (custom metadata extraction, normalization, redaction workflows).
  • Amazon CloudWatch for metrics (and operational dashboards/alarms).
  • AWS CloudTrail for auditing API calls.
  • AWS KMS for encryption keys (depending on configuration).
  • Amazon Cognito / IAM Identity Center for authenticating end users of a search portal.
  • Amazon Bedrock (or other LLM providers) for RAG: Kendra retrieves relevant context; the LLM generates an answer with citations.

Dependency services

  • For data sources: the repository itself (S3, SaaS, on-prem connectors via required connectivity).
  • For enrichment: Lambda (and any services your Lambda calls).
  • For identity/ACL: your identity provider and group mapping strategy.

Security/authentication model

  • AWS API authentication: IAM (SigV4). Applications use IAM roles/users to call Kendra APIs.
  • Repository access: Kendra assumes a service role you provide for connectors (for example, an IAM role granting s3:GetObject on a bucket).
  • End-user authorization: If you need “per-user” result filtering, you typically pass user context to the query and ensure ACLs were ingested correctly. The exact approach depends on connector and identity strategy—verify in official docs.

Networking model

  • Kendra is a managed regional AWS service with public regional endpoints.
  • Many AWS services support VPC interface endpoints (AWS PrivateLink) to keep traffic on the AWS network. Availability can vary by Region and service—verify PrivateLink support for Amazon Kendra in your Region in official AWS documentation.

Monitoring/logging/governance considerations

  • Track:
    – Index status and data source sync status (success/failure, document counts).
    – Query volume and latency (metrics).
    – API activity (CloudTrail).
  • Govern:
    – Index naming/tagging standards.
    – IAM least privilege for admins, sync roles, and query clients.
    – Cost controls: number of indexes, edition choice, and sync schedules.

Simple architecture diagram (Mermaid)

flowchart LR
  U[Users / App] -->|Query API| K[Amazon Kendra Index]
  S3[(Amazon S3 Documents)] -->|Data Source Sync| K
  K -->|Ranked results + excerpts| U

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph Identity
    IDP[Corporate IdP] --> IC[IAM Identity Center / SSO Mapping]
    IC --> COG[Amazon Cognito / App Auth Layer]
  end

  subgraph Content
    S3[(Amazon S3)]
    CONF[Confluence / Wiki]
    SP[SharePoint]
    ITSM[Service Desk / ITSM]
  end

  subgraph Ingestion
    DS[Amazon Kendra Data Sources] --> IDX[Amazon Kendra Index]
    L["Document Enrichment (AWS Lambda)"] --> IDX
  end

  S3 --> DS
  CONF --> DS
  SP --> DS
  ITSM --> DS

  subgraph Apps
    PORTAL[Internal Search Portal]
    BOT[Chatbot / Agent Assist]
    RAG[RAG Service]
    LLM["Amazon Bedrock (LLM)"]
  end

  COG --> PORTAL
  PORTAL -->|Query + Filters + User Context| IDX
  BOT -->|Query| IDX

  RAG -->|Retrieve relevant passages| IDX
  RAG -->|Context + citations| LLM
  LLM -->|Answer| RAG

  subgraph Security_Operations
    IAM[IAM Roles/Policies]
    KMS[AWS KMS]
    CW[Amazon CloudWatch Metrics/Alarms]
    CT[AWS CloudTrail]
  end

  IAM --> DS
  IAM --> IDX
  KMS --> IDX
  IDX --> CW
  DS --> CW
  IDX --> CT
  DS --> CT

8. Prerequisites

AWS account and billing

  • An active AWS account with billing enabled.
  • Amazon Kendra is not a “free” service by default; expect hourly charges for indexes. Plan to clean up resources after labs.

Permissions / IAM roles

You will need permissions to:
  • Create and manage Kendra resources: index, data sources, sync jobs.
  • Create and manage an S3 bucket and upload sample documents.
  • Create or pass an IAM role for Kendra to access S3 (and optionally KMS).

Typical IAM permissions (high level):
  • kendra:* for lab/admin (scope down for production).
  • s3:CreateBucket, s3:PutObject, s3:GetObject, s3:ListBucket.
  • iam:CreateRole, iam:PutRolePolicy, iam:PassRole.
  • kms:* only if using customer-managed keys (scope down in production).
  • cloudtrail:LookupEvents and CloudWatch read permissions for validation.

Tools

  • AWS Management Console access, or:
  • AWS CLI v2 (optional, used in validation steps)
  • (Optional) Python 3 + boto3 for query examples

Region availability

  • Choose an AWS Region where Amazon Kendra is available.
  • Verify availability and supported Regions: https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html

Quotas / limits

  • Amazon Kendra has service quotas (for example, number of indexes, document limits, throughput).
  • Always verify current quotas in Service Quotas and Kendra documentation:
  • https://docs.aws.amazon.com/servicequotas/
  • https://docs.aws.amazon.com/kendra/

Prerequisite services

For this tutorial:
  • Amazon S3 (for storing sample documents)
  • IAM (for roles/policies)

Optional (production patterns):
  • CloudTrail (recommended)
  • CloudWatch alarms/dashboards (recommended)
  • KMS customer-managed key (optional; verify support/configuration requirements)

9. Pricing / Cost

Amazon Kendra pricing changes over time and varies by Region and edition. Do not rely on blog posts for exact numbers—use official pricing.

Official pricing page: https://aws.amazon.com/kendra/pricing/
AWS Pricing Calculator: https://calculator.aws/

Pricing dimensions (typical model)

Amazon Kendra cost is generally driven by:
  • Index capacity billed per unit time (hourly) and the edition (for example, Developer vs Enterprise). The exact included capacity and scaling model is documented on the pricing page; verify the current edition definitions and hourly rates.
  • Potential additional charges for related capabilities such as Amazon Kendra Intelligent Ranking (if used). Verify on the official pricing page and the Intelligent Ranking docs.

Kendra also indirectly drives costs in connected services:
  • S3 storage for your documents.
  • Data transfer (for example, if connectors pull from outside AWS or across Regions).
  • AWS Lambda costs if you use enrichment functions.
  • Secrets management costs if connectors require stored credentials (for example AWS Secrets Manager).
  • CloudWatch costs for metrics, dashboards, and alarms (usually modest, but not always zero).
  • KMS costs if using customer-managed keys (API requests, key usage).

Free tier

Amazon Kendra historically has not had a broad “always-free” tier like some AWS services. Some AWS services offer limited free usage, but verify whether Amazon Kendra currently offers any free tier or trial on the pricing page.

Key cost drivers

  • Number of indexes: Each index typically incurs hourly charges. Multiple environments (dev/test/prod) can multiply cost quickly.
  • Edition choice: Developer vs Enterprise can change the baseline hourly cost and capacity.
  • Document volume and update frequency: More content and frequent re-syncs may require larger capacity or more operational effort (pricing impact depends on current Kendra model—verify).
  • Query volume: Depending on pricing model, queries may or may not be a direct line item. Verify on pricing page.
  • Enrichment: Lambda-based enrichment can add compute cost and ingestion latency.

Network/data transfer implications

  • Uploading documents to S3 in the same Region as the Kendra index is typically the simplest and avoids cross-Region transfer.
  • If indexing external SaaS or on-prem systems, network egress and connector connectivity can introduce costs (and security constraints).

How to optimize cost

  • Start with one index and prove value before scaling to many per team/department.
  • Use Developer edition for labs/dev when appropriate (verify edition constraints).
  • Minimize idle indexes: If you don’t need an index, delete it. (Kendra is managed; you typically can’t “stop” it to avoid hourly costs.)
  • Control sync frequency: Don’t sync every 5 minutes if daily is enough.
  • Keep documents clean: Avoid indexing duplicates, stale versions, and low-value content.
  • Evaluate alternatives for simple search: If requirements are basic keyword search, OpenSearch can be more cost-efficient.

Example low-cost starter estimate (no fabricated numbers)

A typical low-cost lab scenario includes:
  • 1 Kendra index (Developer edition if supported/appropriate)
  • 1 S3 data source
  • A small set of documents (10–100)
  • A single manual sync
  • A few interactive queries for validation

How to estimate:
  1. Go to https://aws.amazon.com/kendra/pricing/ and identify the hourly rate for the chosen edition in your Region.
  2. Multiply by the number of hours you plan to keep the index.
  3. Add S3 storage (small) and any Lambda enrichment costs (if used).
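The estimation arithmetic fits in a one-line helper. The rate used in the example is a placeholder, not a real price; look up the current hourly rate on the official pricing page.

```python
# Sketch of the lab cost estimate: hourly index rate x hours kept, plus extras
# (S3 storage, Lambda enrichment). Rates below are placeholders, NOT real prices.

def index_cost(hourly_rate: float, hours: float, other_costs: float = 0.0) -> float:
    """Estimated total cost for keeping an index running."""
    return hourly_rate * hours + other_costs

# Hypothetical 1.00/hour rate for a 48-hour lab, plus 0.50 of extras:
estimate = index_cost(hourly_rate=1.0, hours=48, other_costs=0.5)
```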

Example production cost considerations

For production, account for:
  • Multiple indexes (prod + staging + dev)
  • Higher availability requirements and governance (more tooling and operational work)
  • Ongoing sync schedules
  • Identity/ACL integration (often increases complexity and operational overhead)
  • Potential need for multiple data sources and content growth
  • RAG usage: additional costs for LLM inference (Amazon Bedrock) and any caching layers

10. Step-by-Step Hands-On Tutorial

Objective

Build a small, realistic Amazon Kendra search experience on AWS by:
  1. Creating a Kendra index
  2. Indexing documents stored in Amazon S3
  3. Querying the index via the console and AWS CLI
  4. Cleaning up resources to avoid ongoing charges

Lab Overview

You will create:
  • An S3 bucket with a few small text/HTML/PDF documents (keep it minimal)
  • An IAM role that Amazon Kendra assumes to read from the bucket
  • An Amazon Kendra index (choose the lowest-cost edition appropriate for labs; verify current options)
  • An S3 data source and a one-time sync job
  • A few test queries to validate results

Expected outcome: You can type a natural language query and get ranked results with excerpts from your uploaded documents.


Step 1: Choose a Region and confirm service access

  1. In the AWS Console, select an AWS Region where Amazon Kendra is supported.
  2. Confirm Amazon Kendra console loads: https://console.aws.amazon.com/kendra/
  3. (Recommended) Confirm you have permission to create IAM roles and S3 buckets.

Expected outcome: You can open the Amazon Kendra console without permission errors.


Step 2: Create an S3 bucket and upload sample documents

  1. Open the S3 console: https://console.aws.amazon.com/s3/
  2. Create a bucket (example):
    – Bucket name: kendra-lab-<your-unique-suffix>
    – Region: same as your Kendra index Region
    – Keep defaults for a lab (do not enable public access)

  3. Upload a few small documents. Create three files locally and upload them:

vpn-reset.txt

VPN Token Reset Procedure
1) Open the VPN portal.
2) Click "Reset token".
3) Confirm using MFA.
If you are locked out, contact IT Support.

expense-policy.txt

Travel Expense Policy (Summary)
- Meals are reimbursable up to the daily limit.
- Receipts are required for expenses over $25.
- Contractors must obtain manager approval before booking travel.

oncall-runbook.txt

On-Call Runbook: API Latency Spikes
1) Check dashboards for error rates and p95 latency.
2) Review recent deployments.
3) Verify upstream dependencies.
4) If needed, rollback the last deployment.

Upload these into a prefix like docs/ (optional but tidy):
  • docs/vpn-reset.txt
  • docs/expense-policy.txt
  • docs/oncall-runbook.txt

Expected outcome: Your S3 bucket contains at least 3 documents.

Verification: In S3 console, open the bucket → verify objects exist and you can view/download them.


Step 3: Create an IAM role for Amazon Kendra to read from S3

Amazon Kendra needs permission to read objects from your S3 bucket when running the data source sync.

  1. Open IAM console: https://console.aws.amazon.com/iam/
  2. Create a role:
    – Trusted entity type: AWS service
    – Use case: look for Amazon Kendra (or a generic service trust if presented differently)
    – If the console doesn’t provide a Kendra-specific wizard, use the trust policy recommended by Kendra documentation.
    Verify in official docs: https://docs.aws.amazon.com/kendra/

  3. Attach a policy that allows reading the bucket (minimum for this lab):

Replace kendra-lab-<your-unique-suffix> with your bucket name:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::kendra-lab-<your-unique-suffix>"]
    },
    {
      "Sid": "ReadObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::kendra-lab-<your-unique-suffix>/*"]
    }
  ]
}

Name the role something like: KendraS3DataSourceRole.

Expected outcome: You have an IAM role that Amazon Kendra can assume to read your S3 documents.

Verification: In IAM → Roles → open the role → confirm:
  – Trust relationship includes the Kendra service principal (as documented)
  – Permissions include s3:ListBucket and s3:GetObject for the bucket
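The same role and policy can be created programmatically. A minimal sketch follows; the trust principal `kendra.amazonaws.com` is the commonly documented Kendra service principal, and the role and policy names are this lab's examples—verify the principal in the current Kendra docs before relying on it.

```python
import json

ROLE_NAME = "KendraS3DataSourceRole"

# Commonly documented Kendra service principal -- verify in the Kendra docs.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "kendra.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

def bucket_read_policy(bucket):
    """Minimal read-only policy for the sync role (mirrors the JSON above)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "ListBucket", "Effect": "Allow",
             "Action": ["s3:ListBucket"],
             "Resource": [f"arn:aws:s3:::{bucket}"]},
            {"Sid": "ReadObjects", "Effect": "Allow",
             "Action": ["s3:GetObject"],
             "Resource": [f"arn:aws:s3:::{bucket}/*"]},
        ],
    }

def create_sync_role(bucket):
    """Create the role and attach the inline policy. Requires IAM permissions."""
    import boto3
    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.put_role_policy(
        RoleName=ROLE_NAME,
        PolicyName="KendraLabS3Read",
        PolicyDocument=json.dumps(bucket_read_policy(bucket)),
    )
    return role["Role"]["Arn"]
```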


Step 4: Create an Amazon Kendra index

  1. Open Amazon Kendra console: https://console.aws.amazon.com/kendra/
  2. Choose Create an index
  3. Configure:
    – Index name: kendra-lab-index
    – Description: optional
    – IAM role: choose/create the service role for Kendra index management as prompted
    – Edition: choose the lowest-cost edition suitable for labs (often “Developer edition”, if available).
    Verify current edition options and constraints in the console and pricing page.

  4. Create the index.

Expected outcome: The index enters a “Creating” state, then becomes “Active/Ready”.

Verification: In Kendra console, the index status shows Active (or equivalent) before proceeding.

Common wait time: Several minutes.
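For automation, index creation can be sketched with boto3. The edition names (`DEVELOPER_EDITION`, `ENTERPRISE_EDITION`) and the separate index-management role ARN are assumptions to verify against the current API reference; note this role is distinct from the S3 sync role.

```python
import time

def index_params(name, role_arn, edition="DEVELOPER_EDITION"):
    """Arguments for kendra.create_index. Edition values may change over
    time -- verify the current options in the API reference."""
    return {"Name": name, "RoleArn": role_arn, "Edition": edition}

def create_index_and_wait(name, role_arn):
    """Create the index and poll until it leaves CREATING (several minutes)."""
    import boto3
    kendra = boto3.client("kendra")
    index_id = kendra.create_index(**index_params(name, role_arn))["Id"]
    while kendra.describe_index(Id=index_id)["Status"] == "CREATING":
        time.sleep(30)
    return index_id
```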


Step 5: Add an S3 data source to the index

  1. In Kendra console, open your index: kendra-lab-index
  2. Go to Data sources → Add data source
  3. Choose Amazon S3
  4. Configure the data source:
    – Name: kendra-lab-s3
    – S3 bucket: kendra-lab-<your-unique-suffix>
    – (Optional) Inclusion prefix: docs/ to limit indexing
    – IAM role: select KendraS3DataSourceRole
    – Sync schedule: set to Run on demand (or disable schedule) for the lab

  5. Save/add the data source.

Expected outcome: The data source is created and ready to run a sync.

Verification: The data source appears in the list with a status like “Ready” (wording may vary).
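The same configuration can be expressed with boto3. A sketch, assuming the `S3Configuration` field name from the boto3 API reference (verify against your SDK version); omitting `Schedule` leaves the data source on demand.

```python
def s3_data_source_params(index_id, bucket, role_arn, prefix="docs/"):
    """Arguments for kendra.create_data_source. No Schedule key means the
    source syncs on demand only."""
    return {
        "IndexId": index_id,
        "Name": "kendra-lab-s3",
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {
            "S3Configuration": {
                "BucketName": bucket,
                "InclusionPrefixes": [prefix],
            }
        },
    }

def create_data_source(index_id, bucket, role_arn):
    """Create the S3 data source and return its ID."""
    import boto3
    kendra = boto3.client("kendra")
    return kendra.create_data_source(
        **s3_data_source_params(index_id, bucket, role_arn))["Id"]
```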


Step 6: Run a sync job

  1. In the data source details page, choose Sync now (or Run).
  2. Monitor the sync status: it will move through states such as “Syncing”, then “Succeeded” or “Failed”.

Expected outcome: Sync completes successfully and documents are indexed.

Verification:
  – Data source shows Last sync status: Succeeded
  – Document count in index statistics increases (exact UI varies)
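Triggering and watching a sync can also be scripted. A sketch, assuming the sync-job status names listed below (verify the current enum values in the API reference):

```python
import time

# Terminal states for a data source sync job -- verify current values.
TERMINAL_SYNC_STATES = {"SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"}

def sync_finished(status):
    """True once a sync job has reached a terminal state."""
    return status in TERMINAL_SYNC_STATES

def run_sync(index_id, data_source_id):
    """Start a sync and poll its status until it finishes."""
    import boto3
    kendra = boto3.client("kendra")
    kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    while True:
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id)["History"]
        status = history[0]["Status"] if history else "SYNCING"
        print("Sync status:", status)
        if sync_finished(status):
            return status
        time.sleep(30)
```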


Step 7: Query the index in the Kendra console

  1. In the index view, open the Search console (Kendra provides a built-in test search UI)
  2. Run queries such as:
    – How do I reset my VPN token?
    – expense receipts required
    – what to do during API latency spikes

Expected outcome: You get ranked results and excerpts matching the correct document.

Verification tips:
  – The VPN query should return vpn-reset.txt near the top.
  – The expense query should return expense-policy.txt.
  – The on-call query should return oncall-runbook.txt.


Step 8 (Optional): Query using AWS CLI

This helps you validate programmatic access—how real applications will query Kendra.

8.1 Configure AWS CLI

aws configure
aws sts get-caller-identity

8.2 Find your index ID

In the Kendra console, open the index details and copy the Index ID.

Or use CLI (if permissions allow):

aws kendra list-indices

8.3 Run a query

Replace INDEX_ID:

aws kendra query \
  --index-id "INDEX_ID" \
  --query-text "How do I reset my VPN token?"

Expected outcome: JSON output includes matching document(s), titles, URIs, and excerpt text fields.


Step 9 (Optional): Query with Python (boto3)

Install dependencies:

python3 -m pip install boto3

Example script (query_kendra.py):

import boto3

INDEX_ID = "INDEX_ID"  # replace with your index ID

kendra = boto3.client("kendra")

resp = kendra.query(
    IndexId=INDEX_ID,
    QueryText="expense receipts required",
)

# Each result item has a type (for example ANSWER or DOCUMENT), a title,
# and an excerpt. DocumentTitle and DocumentExcerpt are objects, so read
# their "Text" fields rather than printing the raw dicts.
for item in resp.get("ResultItems", []):
    title = item.get("DocumentTitle", {}).get("Text", "")
    print(item.get("Type"), title)
    excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
    if excerpt:
        print("Excerpt:", excerpt[:200])
    print("---")

Run:

python3 query_kendra.py

Expected outcome: Printed results include excerpts referencing receipts and policy limits.


Validation

Use this checklist:

  • [ ] Index status is Active
  • [ ] Data source last sync status is Succeeded
  • [ ] Query in console returns correct top documents
  • [ ] (Optional) CLI query returns structured results
  • [ ] (Optional) Python query prints excerpts

If validation fails, use the troubleshooting section.


Troubleshooting

Common issues and fixes:

  1. Data source sync failed: AccessDenied to S3
    – Cause: IAM role doesn’t have correct s3:ListBucket/s3:GetObject permissions, or bucket policy blocks access.
    – Fix:

    • Recheck IAM role permissions.
    • Ensure bucket policy doesn’t deny access.
    • Confirm the data source is using the intended role.
  2. Index never becomes Active
    – Cause: Missing service-linked roles/permissions, or account restrictions.
    – Fix:

    • Check IAM permissions for creating Kendra resources.
    • Check AWS Health Dashboard and service limits.
    • Review CloudTrail for failed API calls.
  3. No results returned
    – Cause: Sync didn’t index documents, wrong prefix, unsupported file type, or query mismatch.
    – Fix:

    • Confirm objects exist under the inclusion prefix.
    • Confirm the sync status and document counts.
    • Try simpler queries (keywords) to sanity-check.
  4. Results returned but excerpts look empty
    – Cause: Document text extraction issues (format, encoding, scanned PDFs).
    – Fix:

    • Use plain text files to validate.
    • For PDFs, ensure they contain selectable text (OCR may be required upstream).
  5. CLI returns AccessDeniedException for kendra:Query
    – Cause: Your IAM identity doesn’t have query permissions.
    – Fix: Attach an IAM policy allowing kendra:Query on the index ARN.
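A minimal query-only policy for the application identity might look like the sketch below. The Region, account ID, and index ID are placeholders, and the index ARN format and resource-level permission support should be verified in the Kendra documentation.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowKendraQuery",
      "Effect": "Allow",
      "Action": ["kendra:Query"],
      "Resource": ["arn:aws:kendra:REGION:ACCOUNT_ID:index/INDEX_ID"]
    }
  ]
}
```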


Cleanup

To avoid ongoing charges, delete resources in this order:

  1. Delete the Kendra index – Kendra console → Indexes → select kendra-lab-index → Delete. This is the main cost driver; delete it even if you keep the S3 bucket.

  2. Delete Kendra data source (if required separately by the console flow)

  3. Delete S3 objects and bucket – Empty the bucket – Delete the bucket

  4. Delete IAM role – IAM → Roles → delete KendraS3DataSourceRole (and any inline policies)

Expected outcome: No Kendra index remains, preventing further hourly charges.

11. Best Practices

Architecture best practices

  • Design your index strategy intentionally
    – One index for the whole organization is simpler for users, but can be harder for governance and cost attribution.
    – Multiple indexes (per department/app) can simplify access control boundaries but increase cost and operational overhead.
  • Keep data in-region
    – Store documents in S3 in the same Region as the Kendra index to reduce latency and cross-Region transfer.
  • Use metadata as a first-class design element
    – Define fields like department, product, document_type, effective_date, owner, confidentiality.
    – Enforce consistent tagging at ingestion time.

IAM/security best practices

  • Least privilege
    – Separate roles for: Kendra admins, data source sync role(s), and application query role(s).
  • Restrict who can modify relevance tuning
    – Ranking changes can impact business outcomes; treat them as controlled configuration with change management.
  • Use explicit iam:PassRole constraints
    – Allow passing only the specific data source role(s) to Kendra.

Cost best practices

  • Delete dev/test indexes quickly
    – Kendra index hourly charges can accumulate.
  • Avoid duplicate content
    – Deduplicate documents and remove outdated versions.
  • Tune sync schedules
    – Sync only as often as needed; use on-demand sync for low-change repositories.

Performance best practices

  • Use filters to narrow broad queries
    – Expose facets in UIs where possible.
  • Optimize document quality
    – Prefer machine-readable PDFs and clean text. Extract text from scanned docs before indexing.

Reliability best practices

  • Monitor sync health
    – Alert on sync failures or prolonged sync durations.
  • Have a rollback plan for schema changes
    – Metadata schema changes can affect filters and relevance.

Operations best practices

  • Tag resources
    – Add tags for Env, App, Owner, CostCenter, DataClassification.
  • Use CloudTrail for auditing
    – Track who changed data sources, index settings, or access control configuration.
  • Document the connector credentials lifecycle
    – Rotate credentials and store them in secure services (for example AWS Secrets Manager) when applicable.

Governance/tagging/naming best practices

  • Naming:
    – kendra-<app>-<env>-index
    – kendra-<app>-<env>-ds-<source>
  • Tagging:
    – Environment=dev|staging|prod
    – OwnerEmail=...
    – CostCenter=...
    – DataSensitivity=public|internal|confidential

12. Security Considerations

Identity and access model

  • Administrative access is controlled by IAM permissions (create/update/delete indexes and data sources).
  • Query access should be granted to application roles/users with kendra:Query (and related APIs needed).
  • Data source access is controlled by the IAM role Kendra assumes to read the repository (for example S3).
  • End-user document access control requires:
    – Correct ACL ingestion (connector-dependent)
    – Correct identity mapping (user/group) provided to Kendra at query time (implementation varies—verify in docs)

Encryption

  • In transit: Use HTTPS endpoints for Kendra APIs.
  • At rest: Kendra stores indexed content and metadata. Encryption at rest is expected in AWS services; confirm your exact options (AWS-owned keys vs customer-managed keys) in Kendra docs and console for your Region.

Network exposure

  • By default, applications call regional Kendra endpoints over the internet (HTTPS).
  • If your environment requires private connectivity, check whether VPC interface endpoints (PrivateLink) are available for Kendra in your Region and architecture. Verify in official AWS docs.

Secrets handling

  • For connectors that require credentials (SaaS systems), store secrets in a managed secret store (commonly AWS Secrets Manager) and restrict access with IAM.
  • Rotate credentials and audit secret access.

Audit/logging

  • Enable CloudTrail in all Regions (or at least the Kendra Region) to log Kendra API calls.
  • Use CloudWatch metrics and alarms for:
    – Data source sync failures
    – Unusual query spikes
    – Operational anomalies

Compliance considerations

  • Understand what content is indexed (including sensitive fields).
  • Ensure your access control model matches compliance requirements (least privilege, separation of duties).
  • For regulated industries, validate:
    – Data residency (Region)
    – Encryption model (KMS options)
    – Audit requirements (CloudTrail retention, log immutability)
    – Connector handling of ACLs and permissions

Common security mistakes

  • Granting kendra:* to broad roles used by applications (over-privileged).
  • Indexing sensitive repositories without ACL enforcement, then exposing a search UI broadly.
  • Forgetting to restrict iam:PassRole, allowing users to attach overly permissive roles to data sources.
  • Sync roles with overly broad S3 permissions (for example s3:* on *).

Secure deployment recommendations

  • Use separate IAM roles for:
    – Index administration
    – Data source sync
    – Application query
  • Use resource-level permissions where supported (restrict to specific index ARNs).
  • Keep documents in private S3 buckets; avoid public access.
  • Implement defense-in-depth: authentication (Cognito/SSO), authorization, logging, and monitoring.

13. Limitations and Gotchas

Always verify current service quotas and connector limitations in official docs.

Known limitations / constraints (common categories)

  • Connector variability: Not all connectors support all features (especially ACL ingestion). Verify per-connector capabilities.
  • Document format limitations: Some file types may not extract well; scanned PDFs often require OCR before indexing.
  • Quota limits: Number of indexes per account, document limits, data source limits, and query throughput limits exist.
  • Regional availability: Kendra is not available in every Region; connector support can also vary by Region.
  • Cost visibility: Index hourly costs can surprise teams if indexes are left running in dev/test.
  • Access control complexity: Proper ACL enforcement requires careful identity mapping and testing; mistakes can cause overexposure or missing results.
  • Schema changes require planning: Metadata changes can break filters/facets in UIs and require re-indexing behaviors depending on configuration.

Operational gotchas

  • Sync failures may not be obvious to end users
    – If sync silently stops, search results become stale. Add alarms/workflows.
  • Inclusion/exclusion prefix mistakes
    – A wrong S3 prefix can lead to “0 documents indexed”.
  • Duplicate content
    – Indexing multiple repositories with duplicates can degrade relevance.
  • Over-broad synonyms
    – Poor thesaurus design can significantly reduce precision.

Migration challenges

  • Moving from an existing search engine to Kendra often requires:
    – Metadata normalization
    – New ingestion pipelines
    – Access control mapping
    – Query UX updates (filters, facets)
  • Consider a phased rollout: one repository first, then expand.

14. Comparison with Alternatives

Amazon Kendra is one option in AWS’s broader search + AI ecosystem. The best choice depends on relevance needs, control requirements, and cost.

Comparison table

Option | Best For | Strengths | Weaknesses | When to Choose
Amazon Kendra | Enterprise search across multiple repositories with ML relevance | Managed connectors, semantic ranking, excerpt highlighting, enterprise-focused features | Can be costly; less low-level control than self-managed engines; ACL/identity can be complex | You need managed enterprise search with strong relevance and connectors
Amazon OpenSearch Service | Custom search applications, logs/observability search, keyword + vector search | Deep control, flexible indexing/analyzers, predictable cluster sizing, broad ecosystem | You manage cluster sizing/tuning; relevance tuning is more manual; connectors are DIY | You need control, custom scoring, vector search, or already run OpenSearch
OpenSearch (self-managed) | Maximum control, on-prem/hybrid | Full control, extensibility | High ops burden, scaling, patching, security hardening | You must run on-prem or need custom extensions not available managed
Database full-text search (Aurora/RDS engine-specific) | Simple search in app databases | Minimal extra infra, simple integration | Not designed for enterprise multi-repo search; limited semantic relevance | Small apps with basic search requirements
Azure AI Search (other cloud) | Enterprise search in Azure ecosystem | Tight Azure integration; managed search | Cross-cloud complexity; data gravity | Organization is standardized on Azure
Google Cloud Vertex AI Search (other cloud) | Enterprise search in GCP ecosystem | Tight GCP integration | Cross-cloud complexity; data gravity | Organization is standardized on GCP
RAG-only vector DB approach | Similarity search for LLM context | Strong semantic similarity; embeddings-driven retrieval | Requires embedding pipelines; governance/ACL must be designed carefully | You primarily need embedding similarity retrieval for LLMs, not enterprise connector search

15. Real-World Example

Enterprise example: Global financial services internal policy + procedures search

Problem
  – Policies and procedures exist in SharePoint, PDFs in S3, and wiki pages.
  – Employees need quick answers, but content is sensitive and access differs by department and region.
  – Compliance requires auditable access and controlled changes.

Proposed architecture
  – Amazon Kendra index in the primary Region for the organization.
  – Data sources:
    – SharePoint connector for controlled sites
    – S3 connector for policy PDFs
    – Confluence connector for engineering procedures
  – Enrichment:
    – Lambda enrichment extracts metadata: policy_owner, effective_date, region, classification
  – Access control:
    – Connector-level ACL ingestion where supported
    – Query-time user context tied to corporate identity (verify the best practice for your identity setup in Kendra docs)
  – Front end:
    – Internal portal using Cognito/SSO authentication
    – API layer (API Gateway + Lambda) calling Kendra Query APIs
  – Monitoring:
    – CloudWatch alarms on sync failures and index health
    – CloudTrail for audit trails

Why Amazon Kendra was chosen
  – Managed connectors reduce integration effort.
  – Better relevance for question-like queries compared to legacy keyword search.
  – Enterprise features (metadata, access control patterns) align with compliance needs.

Expected outcomes
  – Reduced time to locate policies.
  – Fewer compliance escalations due to outdated information usage.
  – Centralized search with clear auditing and governance.


Startup/small-team example: Support knowledge base and RAG assistant

Problem
  – A fast-growing startup has docs in S3 and a wiki, but support agents can’t find answers quickly.
  – They want an LLM assistant, but need citations and reliable grounding.

Proposed architecture
  – Single Amazon Kendra index (start small).
  – S3 as the primary source for product docs and troubleshooting guides.
  – Simple metadata: product_area, version.
  – RAG service:
    – Application queries Kendra to retrieve top passages
    – Sends passages + citations to an LLM (for example, Amazon Bedrock)
  – Minimal ops:
    – On-demand sync after doc updates, then move to scheduled sync when stable

Why Amazon Kendra was chosen
  – Faster setup than building a search cluster.
  – Works as a retrieval layer for RAG with citations back to source docs.
  – Reduces engineering time spent building search infrastructure.

Expected outcomes
  – Faster support resolution time.
  – Fewer escalations to engineering.
  – Higher customer satisfaction due to consistent answers with citations.

16. FAQ

1) Is Amazon Kendra the same as Amazon OpenSearch Service?
No. Amazon Kendra is a managed enterprise search service focused on ML relevance and connectors. Amazon OpenSearch Service is a managed search/analytics engine (OpenSearch) that offers more low-level control and broader use cases (logs, metrics, custom search).

2) Is Amazon Kendra a vector database?
Amazon Kendra is not typically positioned as a general-purpose vector database. It provides ML-based relevance for enterprise search and retrieval. For dedicated vector similarity search, evaluate OpenSearch vector search or specialized vector stores. Verify current Kendra retrieval capabilities in the docs.

3) Can Amazon Kendra enforce document permissions?
Yes, when configured correctly and when ACL ingestion/user context is supported for your connector or ingestion approach. This requires careful identity mapping and testing—verify the recommended approach in official docs.

4) Does Amazon Kendra support Amazon S3 as a data source?
Yes. S3 is one of the most common Kendra data sources. You provide an IAM role for Kendra to read objects.

5) Can I index content from SaaS tools like Confluence or SharePoint?
Kendra supports multiple connectors for third-party repositories. Connector availability and features vary—check the current connector list in the Kendra documentation.

6) How long does indexing take?
It depends on document count, size, and connector. For small labs, minutes. For large repositories, longer. Monitor sync status in the console.

7) Can I run Kendra in multiple Regions?
You can create indexes in multiple Regions, but each is separate. Consider data residency, latency, and cost.

8) Can I “pause” a Kendra index to stop hourly charges?
Typically, managed indexes are billed while they exist. The reliable way to stop charges is usually to delete the index. Verify current billing behavior on the pricing page.

9) How do I improve poor search relevance?
Start with content hygiene and metadata:
  – Ensure titles and headings are meaningful
  – Add consistent metadata fields
  – Use relevance tuning (boost fields/sources)
  – Add synonyms carefully
  – Remove duplicates and outdated versions

10) What’s the best way to support RAG with Amazon Kendra?
Use Kendra to retrieve top relevant passages/documents, then provide them as grounded context to an LLM (for example Amazon Bedrock). Keep citations (source URIs) and implement guardrails (don’t let the model answer without retrieved context).
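The retrieve-then-generate flow can be sketched as below. The prompt format is illustrative, and the Kendra Retrieve API usage is an assumption to verify in current docs; the guardrail here is simply refusing to build a prompt when nothing was retrieved.

```python
def build_grounded_prompt(question, passages):
    """Assemble a citation-preserving prompt from retrieved passages.
    Returns None when nothing was retrieved -- a simple "don't answer
    without context" guardrail."""
    if not passages:
        return None
    context = "\n\n".join(
        f"[{i}] {p['text']} (source: {p['uri']})"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )

def retrieve_passages(index_id, question):
    """Fetch passages via the Kendra Retrieve API (verify availability/shape),
    then pass the resulting prompt to your LLM (for example Amazon Bedrock)."""
    import boto3
    kendra = boto3.client("kendra")
    resp = kendra.retrieve(IndexId=index_id, QueryText=question)
    return [{"text": r.get("Content", ""), "uri": r.get("DocumentURI", "")}
            for r in resp.get("ResultItems", [])]
```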

11) Does Amazon Kendra integrate with Amazon Lex?
Kendra is commonly used as a knowledge search backend for chatbots. Validate the current best practice integration patterns in AWS docs for Lex and Kendra.

12) How do I secure a public-facing search experience?
Don’t expose Kendra directly to browsers. Put an authenticated API layer in front (API Gateway + Lambda) and apply IAM least privilege, rate limiting, and logging.

13) How do I handle scanned PDFs?
Kendra may not extract text well from scanned images. Perform OCR upstream (for example using Amazon Textract or another OCR solution) and index the extracted text.

14) Can I index multiple S3 buckets?
Yes, typically by creating multiple S3 data sources, each with its own configuration and IAM access. Validate quotas and best practices for your scale.

15) How do I track who changed index settings?
Enable CloudTrail and review events for Kendra API calls (create/update/delete index, data source changes, sync triggers).

16) What’s the difference between a data source sync and custom ingestion?
  – Data source sync: Kendra pulls from a repository on a schedule or on demand via a connector.
  – Custom ingestion: your pipeline pushes documents into Kendra using APIs.
Choose based on repository type and control needs.
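Custom (push) ingestion can be sketched with the BatchPutDocument API. The `ContentType` value and the per-call batch limit are assumptions to verify against the current API reference.

```python
def make_document(doc_id, title, text):
    """One inline document for kendra.batch_put_document (push ingestion).
    ContentType values such as PLAIN_TEXT come from the API reference -- verify."""
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
    }

def push_documents(index_id, docs):
    """Push a small batch of documents (check the current per-call limit)."""
    import boto3
    kendra = boto3.client("kendra")
    return kendra.batch_put_document(IndexId=index_id, Documents=docs)
```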

17) How do I avoid indexing sensitive data accidentally?
Use inclusion/exclusion patterns, metadata classification, and (if needed) enrichment-based redaction/tagging. Apply IAM controls and review the content scope during onboarding.

17. Top Online Resources to Learn Amazon Kendra

Resource Type | Name | Why It Is Useful
Official Documentation | Amazon Kendra Documentation | Primary source for features, APIs, connectors, quotas, and security guidance. https://docs.aws.amazon.com/kendra/
Official “What is” page | What is Amazon Kendra? | Good conceptual overview and core terminology. https://docs.aws.amazon.com/kendra/latest/dg/what-is-kendra.html
Official Pricing | Amazon Kendra Pricing | Current pricing by edition/Region and related capabilities. https://aws.amazon.com/kendra/pricing/
Pricing Tool | AWS Pricing Calculator | Build Region-specific estimates including related services. https://calculator.aws/
API Reference | Amazon Kendra API Reference | Exact request/response shapes for Query, ingestion, and admin APIs. Verify latest endpoints via docs navigation: https://docs.aws.amazon.com/kendra/
Security/Auditing | AWS CloudTrail User Guide | How to audit Kendra API calls and set retention. https://docs.aws.amazon.com/awscloudtrail/latest/userguide/
Monitoring | Amazon CloudWatch Documentation | Metrics, dashboards, and alarms for operational visibility. https://docs.aws.amazon.com/cloudwatch/
Architecture Guidance | AWS Architecture Center | Patterns for building secure, scalable AWS solutions (search for Kendra references). https://aws.amazon.com/architecture/
Samples (Trusted) | AWS Samples on GitHub (search: “amazon kendra”) | Example code for querying and integration patterns. https://github.com/aws-samples (use search for “kendra”)
Videos | AWS YouTube Channel (search: “Amazon Kendra”) | Service deep dives, demos, and integration examples. https://www.youtube.com/user/AmazonWebServices

18. Training and Certification Providers

Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL
DevOpsSchool.com | Beginners to experienced cloud/DevOps practitioners | AWS fundamentals, DevOps, and practical cloud labs (verify current Kendra coverage on site) | Check website | https://www.devopsschool.com/
ScmGalaxy.com | Students and early-career engineers | DevOps/SCM basics, cloud introductions, hands-on learning | Check website | https://www.scmgalaxy.com/
CloudOpsNow.in | Cloud operations and platform teams | Cloud operations practices, automation, operational readiness | Check website | https://www.cloudopsnow.in/
SreSchool.com | SREs, DevOps, operations engineers | Reliability engineering practices, monitoring, incident response | Check website | https://www.sreschool.com/
AiOpsSchool.com | Ops + AI/ML practitioners | AIOps concepts, automation, AI-assisted operations | Check website | https://www.aiopsschool.com/

19. Top Trainers

Platform/Site | Likely Specialization | Suitable Audience | Website URL
RajeshKumar.xyz | Cloud/DevOps training content (verify specific offerings) | Learners seeking guided training and mentorship | https://rajeshkumar.xyz/
devopstrainer.in | DevOps and cloud training | Beginners to intermediate engineers | https://www.devopstrainer.in/
devopsfreelancer.com | Freelance DevOps help/training platform (verify services) | Teams needing short-term coaching or implementation help | https://www.devopsfreelancer.com/
devopssupport.in | DevOps support and training resources (verify scope) | Operations teams seeking practical support-style learning | https://www.devopssupport.in/

20. Top Consulting Companies

Company Name | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL
cotocus.com | Cloud/DevOps consulting (verify offerings) | Architecture reviews, implementation support, automation | Designing an AWS search portal architecture; setting up IAM governance; CI/CD for related apps | https://cotocus.com/
DevOpsSchool.com | Training + consulting (verify offerings) | Enablement, cloud/DevOps delivery, workshops | Kendra proof-of-concept; operational readiness; cost and security review for a search/RAG rollout | https://www.devopsschool.com/
DEVOPSCONSULTING.IN | DevOps consulting (verify offerings) | DevOps transformation, cloud operations, platform practices | Secure AWS deployment patterns; monitoring strategy; IAM least-privilege design for Kendra apps | https://www.devopsconsulting.in/

21. Career and Learning Roadmap

What to learn before Amazon Kendra

  • AWS fundamentals
    – IAM users/roles/policies, least privilege, iam:PassRole
    – S3 buckets, object permissions, bucket policies
    – CloudWatch and CloudTrail basics
  • Search fundamentals
    – Precision/recall, relevance, metadata, facets
    – Content lifecycle and governance
  • Security fundamentals
    – Data classification, encryption basics, audit logging

What to learn after Amazon Kendra

  • RAG architectures
    – Retrieval strategies, chunking, citations, evaluation
    – Integrating Kendra retrieval with Amazon Bedrock
  • Enterprise identity
    – IAM Identity Center, SAML/OIDC concepts, user/group mapping
  • Operational excellence
    – Dashboards, alarms, incident playbooks for ingestion failures
  • Alternatives
    – Amazon OpenSearch Service for custom ranking/vector search

Job roles that use it

  • Cloud engineer / cloud developer
  • Solutions architect
  • DevOps / SRE (internal tooling and portals)
  • Knowledge management engineer
  • ML engineer (RAG integrations)
  • Security engineer (governance and access control validation)

Certification path (if available)

AWS certifications don’t typically focus on a single service, but relevant paths include:
  – AWS Certified Solutions Architect (Associate/Professional)
  – AWS Certified Developer (Associate)
  – AWS Certified Machine Learning / AI-related certifications (check current AWS certification catalog)
Verify current certification names and availability: https://aws.amazon.com/certification/

Project ideas for practice

  • Build a secure internal search portal with:
    – Cognito auth
    – API Gateway + Lambda
    – Kendra query + metadata filters
  • Implement document enrichment:
    – Add tags like team, severity, service based on content
  • Build a RAG assistant:
    – Kendra retrieval + Bedrock generation + citations
  • Implement governance:
    – Multiple indexes by environment with tagging + cost reporting
    – CloudWatch alarms on sync failures

22. Glossary

  • ACL (Access Control List): Rules that define which users/groups can access a document. In search, ACLs must be enforced so users only see permitted results.
  • Connector (Data source): A managed integration that syncs documents from a repository (S3, wiki, SaaS tool) into Kendra.
  • Data source sync: The job that reads from the repository and updates the Kendra index.
  • Document enrichment: A pipeline step that transforms documents or adds metadata during ingestion (often using AWS Lambda).
  • Facet: A UI element that lets users filter results by a metadata field (for example department or date).
  • IAM: AWS Identity and Access Management—controls permissions for AWS API calls and role assumption.
  • Index: The searchable structure Kendra builds from ingested documents.
  • Metadata: Structured fields attached to documents (owner, department, date, tags) used for filtering and relevance.
  • RAG (Retrieval-Augmented Generation): An architecture where a retrieval system fetches relevant context (documents/passages) for an LLM to generate grounded answers.
  • Relevance tuning: Adjusting ranking behavior (boosting fields/sources) to improve result quality.
  • Synonyms/Thesaurus: Configuration that maps related terms (PTO/vacation) to improve recall.
  • User context: Information about the querying user (identity/groups) used to enforce access control during query.
  • VPC interface endpoint (PrivateLink): A private network path to AWS service APIs without traversing the public internet (availability varies; verify for Kendra).

23. Summary

Amazon Kendra is AWS’s managed enterprise search service in the Machine Learning (ML) and Artificial Intelligence (AI) category, built to index documents across repositories and return highly relevant results for natural language queries. It matters when your organization needs better-than-keyword relevance, unified search across silos, and a managed operational model with strong AWS integrations.

From an architecture perspective, Kendra sits between content sources (often S3 and SaaS tools) and applications (portals, chatbots, and RAG assistants). Cost is primarily driven by the existence and edition/capacity of indexes (often billed hourly), so cost control depends on minimizing unnecessary indexes, choosing the correct edition, and tuning sync schedules. Security depends on IAM least privilege, careful connector role design, encryption configuration (verify options), and—if needed—correct ACL and identity mapping so users only see what they should.

Use Amazon Kendra when you want managed enterprise search with connectors and ML relevance. Consider alternatives like Amazon OpenSearch Service when you need deeper control, lower-level customization, or dedicated vector search. Next, extend this tutorial by adding metadata schema design, document enrichment via Lambda, and a production-grade authenticated search API—then evaluate a RAG workflow using Amazon Bedrock with Kendra as the retrieval layer.