Google Cloud Vertex AI Search Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for AI and ML

Category

AI and ML

1. Introduction

What this service is

Vertex AI Search is Google Cloud’s managed enterprise search service for building high-quality search experiences over your organization’s content (documents, web pages, knowledge bases, product catalogs, etc.). It is designed to reduce the time and expertise required to build relevant, scalable search with Google-quality ranking and query understanding.

One-paragraph simple explanation

If you have a bucket of PDFs, internal documentation, or a content repository and you want a search box that returns the “right” results (with filters, relevance tuning, and secure access patterns), Vertex AI Search provides the ingestion, indexing, and search APIs so you don’t have to manage a search cluster.

One-paragraph technical explanation

Technically, Vertex AI Search provides managed indexing pipelines (“data stores”), a serving layer (“search apps/engines” and “serving configs”), and APIs for querying, filtering, faceting, and ranking content. It is built on Google Cloud infrastructure and exposes the Discovery Engine API surface for programmatic operations. You integrate it with Cloud Storage, BigQuery, or supported connectors, configure schema and relevance, and query via the API (or through hosted widget/UI patterns).

What problem it solves

Teams often struggle with:

  • Relevance: keyword-only search, poor ranking, and inconsistent results.
  • Operations: managing Elasticsearch/OpenSearch clusters (shards, upgrades, scaling, backups).
  • Time-to-value: building ingestion, indexing, and query pipelines from scratch.
  • Governance: securing access, auditability, and compliance needs.

Vertex AI Search addresses these by providing a managed search system optimized for enterprise content and application search patterns.

Naming note (important): Google’s product and API naming has evolved. Vertex AI Search is commonly delivered via Vertex AI Agent Builder experiences in the console, and the underlying API is Discovery Engine (discoveryengine.googleapis.com). If you see “Vertex AI Search and Conversation” in docs or pricing, that is the broader umbrella that includes search plus conversational (generative) experiences. This tutorial focuses on Vertex AI Search. Verify the latest naming in official docs if your console labels differ.

2. What is Vertex AI Search?

Official purpose

Vertex AI Search helps you build enterprise-grade search applications on Google Cloud by ingesting content, building an index, and serving low-latency, relevant search results through managed APIs and configurable ranking.

Official docs entry point (product landing/documentation hub):
https://cloud.google.com/vertex-ai-search

Core capabilities

Commonly supported capabilities include (availability can vary by data type and configuration—verify in official docs for your exact scenario):

  • Ingest and index content from supported sources (for example, Cloud Storage and other connectors)
  • Full-text search with relevance ranking
  • Filters and facets over metadata
  • Synonyms and relevance tuning controls
  • Query suggestions / autocomplete (where supported)
  • Programmatic query via REST APIs (Discovery Engine Search)

Major components (conceptual model)

While exact resource names can differ across console vs API views, the typical building blocks are:

  • Data store: Where your content is defined and ingested from (documents, records, or pages).
  • Ingestion pipeline / import: How documents are brought in (batch import, connector sync, or updates).
  • Index: The managed search index created from your content.
  • Search app / engine: The serving application that exposes search to clients.
  • Serving config: A serving endpoint configuration (e.g., default search settings) used by the Search API.
  • Schema / metadata fields: Field definitions used for filtering, faceting, sorting, and relevance.

Service type

  • Fully managed search service (you do not manage servers, shards, or patching).
  • Exposed via Google Cloud APIs and configured in Google Cloud Console.

Scope: regional/global/project

Vertex AI Search resources are generally:

  • Project-scoped (tied to a Google Cloud project and its IAM/billing)
  • Location-scoped (many examples use global as the location for Discovery Engine resources; confirm supported locations for your edition/data type in official docs)
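This project/location scoping shows up directly in the API's resource paths. A small Python sketch that assembles the serving-config path used by the search endpoint (the collection and serving-config IDs below are common defaults—verify yours in the console):

```python
def serving_config_path(
    project_id: str,
    engine_id: str,
    location: str = "global",                # many examples use global; verify supported locations
    collection: str = "default_collection",  # common default; verify in console
    serving_config: str = "default_search",  # common default; verify in console
) -> str:
    """Build the Discovery Engine serving-config resource path used by :search."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/engines/{engine_id}"
        f"/servingConfigs/{serving_config}"
    )

# The search endpoint is then:
# https://discoveryengine.googleapis.com/v1/<serving_config_path(...)>:search
```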

How it fits into the Google Cloud ecosystem

Vertex AI Search commonly integrates with:

  • Cloud Storage (document repositories)
  • BigQuery (structured content sources)
  • IAM (project access control)
  • Cloud Audit Logs / Cloud Logging (governance and troubleshooting)
  • VPC Service Controls (for perimeter-based data exfiltration controls—verify product support)
  • Vertex AI / Agent Builder (to add conversational experiences on top of search, if desired)

3. Why use Vertex AI Search?

Business reasons

  • Faster delivery: Stand up search in days instead of months.
  • Better findability: Improved relevance can reduce support tickets and wasted internal time.
  • Lower operational overhead: Avoid operating search clusters and re-index pipelines yourself.
  • Consistent experience: Standardize search across teams and apps.

Technical reasons

  • Managed indexing and serving: No shard sizing, node tuning, or rolling upgrades.
  • Relevance and tuning: Built-in controls for ranking behavior and synonyms (capabilities vary).
  • Metadata-aware search: Filter/facet using document attributes for app-like search experiences.
  • API-first: Use REST APIs to integrate with web, mobile, internal tools, and services.

Operational reasons

  • Elastic scaling: Designed to handle variable query volume (subject to quotas).
  • Observability integration: Use audit logs and API logs to track access and errors.
  • Lifecycle management: Clear separation of dev/test/prod projects and data stores.

Security/compliance reasons

  • Google Cloud IAM for administrative control.
  • Auditability through Cloud Audit Logs.
  • Potential integration with organization policies and VPC Service Controls (verify support for your specific deployment).

Scalability/performance reasons

  • Managed service designed for search latency and throughput at scale.
  • Supports structured filters/facets for large catalogs and content sets (subject to quotas/limits).

When teams should choose it

Choose Vertex AI Search when you need:

  • A managed search backend for enterprise content
  • Good relevance without building custom IR (information retrieval) systems
  • Integration with Google Cloud data sources
  • A path to add conversational search experiences later (optional)

When teams should not choose it

Avoid or reconsider when:

  • You must run fully on-premises or in an environment without Google Cloud connectivity.
  • You need complete control over analyzers/tokenizers/ranking algorithms and low-level index internals (self-managed search may fit better).
  • Your constraints require a specific open-source ecosystem (e.g., advanced Elasticsearch plugins) or strict data residency not supported by available locations.
  • You only need vector similarity search for embeddings at scale (consider Vertex AI Vector Search—but note it solves a different problem).

4. Where is Vertex AI Search used?

Industries

  • SaaS and software documentation portals
  • Retail and marketplaces (catalog search)
  • Healthcare and life sciences (internal knowledge search; ensure compliance and PHI controls)
  • Financial services (policies, procedures, research documents; strong audit needs)
  • Manufacturing and logistics (parts catalogs, SOPs, manuals)
  • Media and publishing (article archives, content libraries)
  • Education (course materials and knowledge bases)

Team types

  • Platform and developer experience teams
  • Data/AI platform teams supporting internal search
  • Product engineering teams building search into applications
  • Security and compliance teams (governance oversight)
  • IT/helpdesk teams (internal knowledge search)

Workloads

  • Internal knowledge base search (wikis, runbooks, policies)
  • Customer-facing help center search
  • Product and inventory search (with filters and facets)
  • Document repository search (PDFs, Office docs, HTML—based on supported formats)

Architectures

  • Web app → Search API → Results rendering (classic)
  • API gateway → backend service → Vertex AI Search (centralized auth and logging)
  • Event-driven ingestion (content updates to storage/DB → scheduled re-import or incremental updates)

Production vs dev/test usage

  • Dev/test: small sample data stores, limited query traffic, quick experiments with schema/filters.
  • Production: strict IAM, separate projects, controlled ingestion pipelines, monitoring/error budgets, and cost controls around query volume and indexing.

5. Top Use Cases and Scenarios

Below are realistic scenarios where Vertex AI Search is commonly a good fit.

1) Internal policy and procedure search

  • Problem: Employees can’t find the latest HR/IT/security policies across shared drives and PDFs.
  • Why this service fits: Ingest unstructured documents and provide relevance-ranked search with metadata filters (department, policy type).
  • Scenario: Import policy PDFs from Cloud Storage; build a search app used in the company intranet.

2) Customer help center search

  • Problem: Customers search documentation but get irrelevant results and open support tickets.
  • Why this service fits: Managed search with tuning and consistent ranking.
  • Scenario: Index support articles; add facets for product/version; reduce ticket volume.

3) Product catalog search for e-commerce

  • Problem: Catalog search needs filters (price, brand, size), facets, and sorting.
  • Why this service fits: Designed for structured/metadata-heavy search patterns (verify the best data model for retail/catalog in docs).
  • Scenario: Ingest catalog data (e.g., from BigQuery) and serve results to web/mobile.

4) Technical runbook and incident knowledge search

  • Problem: During incidents, engineers waste time searching past postmortems and runbooks.
  • Why this service fits: Central search for operational content; can be integrated into internal tools.
  • Scenario: Index runbooks and postmortems; add tags like service name, severity, and environment for filtering.

5) Legal document discovery (internal)

  • Problem: Legal teams need to search contracts and clauses quickly.
  • Why this service fits: Fast search over a large corpus; metadata-driven filtering.
  • Scenario: Store contracts in Cloud Storage with metadata (counterparty, effective date); enable search for counsel.

6) Multi-team engineering documentation search

  • Problem: Docs are spread across multiple repositories; search is inconsistent.
  • Why this service fits: Unified search layer over multiple sources (depending on connector availability).
  • Scenario: Index docs from approved sources; provide a single internal developer portal search.

7) Compliance evidence search

  • Problem: Auditors ask for evidence and teams scramble through tickets, docs, and reports.
  • Why this service fits: Organized index with metadata for audit period, control ID, system.
  • Scenario: Build a searchable compliance evidence library with strict IAM and audit logs.

8) Knowledge search for field service and maintenance

  • Problem: Technicians need manuals and SOPs on-site with unreliable connectivity.
  • Why this service fits: Central search backend; cache results in an app where needed.
  • Scenario: Index equipment manuals; technician app queries and caches relevant docs for jobs.

9) Research archive search (papers, memos)

  • Problem: Researchers can’t find prior work due to inconsistent naming and formats.
  • Why this service fits: Relevance ranking and robust indexing across unstructured content.
  • Scenario: Index PDFs and HTML reports; filter by year/team/topic metadata.

10) Secure internal portal search with “audited access”

  • Problem: Need search, but must demonstrate who accessed what and when.
  • Why this service fits: Google Cloud audit logging + structured control plane operations.
  • Scenario: Use Vertex AI Search with strict IAM; correlate audit logs with portal usage logs.

11) SaaS in-app search for user-generated content (UGC)

  • Problem: Users need search across their own workspaces/projects.
  • Why this service fits: Multi-tenant patterns can be implemented via metadata filters and per-tenant segregation strategies (design carefully).
  • Scenario: Tag documents by tenant_id; enforce tenant isolation in your application layer and queries.
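A minimal sketch of the server-side tenant isolation described above. The `tenant_id` field name and the `ANY()` filter operator are illustrative assumptions—verify the exact filter grammar in the official Search API docs:

```python
def build_tenant_search_request(query: str, tenant_id: str, page_size: int = 5) -> dict:
    """Always inject the caller's tenant filter server-side; never accept a
    client-supplied filter string as the isolation mechanism."""
    if not tenant_id.isalnum():
        # Defensive check before interpolating into the filter expression.
        raise ValueError("unexpected tenant_id format")
    return {
        "query": query,
        "pageSize": page_size,
        "filter": f'tenant_id: ANY("{tenant_id}")',
    }
```

The key design point is that the filter is constructed in your backend from the authenticated caller's identity, not passed through from the client.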

12) Migration off self-managed search clusters

  • Problem: Existing Elasticsearch/OpenSearch cluster is expensive and operationally heavy.
  • Why this service fits: Managed indexing and search; fewer operational tasks.
  • Scenario: Rebuild ingestion pipeline to data store; migrate search endpoints behind a feature flag.

6. Core Features

Feature availability can vary by edition, data source type (structured vs unstructured), and region/location. Verify in official docs for your exact configuration.

Managed data stores (content ingestion and indexing)

  • What it does: Lets you define a content repository and ingest documents/records into a managed index.
  • Why it matters: Removes the need to provision and operate indexing infrastructure.
  • Practical benefit: Faster onboarding of content sources; repeatable indexing.
  • Limitations/caveats: Import formats, document sizes, and update strategies have limits/quotas—verify current limits.

Search apps / engines (serving layer)

  • What it does: Provides the search endpoint configuration used by client applications.
  • Why it matters: Separates ingestion/indexing from query serving configuration.
  • Practical benefit: You can tune serving behavior without rebuilding ingestion.
  • Limitations/caveats: Some advanced controls may be constrained compared to self-managed search engines.

REST API access via Discovery Engine

  • What it does: Exposes programmatic operations and query requests over HTTPS.
  • Why it matters: Enables integration with any application stack.
  • Practical benefit: Works with serverless backends, Kubernetes services, batch jobs, and CI pipelines.
  • Limitations/caveats: Quotas apply; not all console features are always 1:1 with API fields.

Relevance tuning and synonyms (where supported)

  • What it does: Adjusts ranking behavior and term matching via configuration.
  • Why it matters: Search relevance is often the #1 success factor.
  • Practical benefit: Improve “findability” without rewriting content.
  • Limitations/caveats: The granularity of tuning differs from engines like Elasticsearch (custom analyzers, scoring scripts). Verify exact tuning options.

Metadata-based filtering and faceting

  • What it does: Filters results by structured attributes (e.g., department=IT, year=2025) and provides facet counts.
  • Why it matters: Users often need to narrow down results quickly.
  • Practical benefit: Enables e-commerce style navigation and enterprise content refinement.
  • Limitations/caveats: Requires consistent metadata; schema planning is important.
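As a hedged illustration of metadata filtering and faceting, here is what a Search request body might look like. The field names (department, year) are hypothetical, and the filter/facetSpecs shapes should be verified against the current Search API reference:

```python
# Illustrative Search request body combining a metadata filter with facet
# counts. Field names and filter grammar are assumptions to verify in docs.
search_request = {
    "query": "laptop stand",
    "pageSize": 10,
    "filter": 'department: ANY("IT")',
    "facetSpecs": [
        {"facetKey": {"key": "department"}, "limit": 10},
        {"facetKey": {"key": "year"}, "limit": 5},
    ],
}
```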

Access patterns and governance hooks

  • What it does: Uses Google Cloud IAM for administrative access and Cloud Audit Logs for auditing.
  • Why it matters: Enterprise deployments need clear governance and traceability.
  • Practical benefit: Standard GCP security posture, log retention, SIEM export.
  • Limitations/caveats: Document-level authorization is usually an application design concern unless a connector/source supports ACL propagation; verify current capabilities.

Integration with broader Vertex AI / Agent Builder (optional)

  • What it does: Allows adding conversational interfaces on top of search results in some configurations (often marketed under “Search and Conversation”).
  • Why it matters: Some teams want both search and Q&A experiences.
  • Practical benefit: Evolves from search-first to assistant-like experiences.
  • Limitations/caveats: Generative features may require additional setup, policies, and have different pricing.

7. Architecture and How It Works

Service architecture at a high level

At a high level, Vertex AI Search has two planes:

  • Control plane: Create/manage data stores, configure schema/metadata, create apps/engines, configure serving settings, manage access.
  • Data plane: Ingest content, build indexes, serve search queries with low latency.

Request/data/control flow (typical)

  1. Ingestion: Content exists in Cloud Storage, BigQuery, or another supported source.
  2. Import/sync: Data store import reads content and builds/updates a managed index.
  3. Serving: Your application sends a Search request to the Discovery Engine Search endpoint for your engine/serving config.
  4. Response: Results return with document references, snippets, and metadata fields.
  5. Rendering: Your UI renders results; optionally uses facets/filters.
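Steps 3–5 of the flow above can be sketched as a thin backend function. The HTTP transport is injected as a callable so the same logic works with any client library, and the payload shape mirrors the curl examples later in this tutorial:

```python
from typing import Callable

def search(post_json: Callable[[str, dict], dict],
           serving_config_url: str, query: str, page_size: int = 5) -> list:
    """Send a search request and return simplified documents for rendering.
    `post_json(url, body) -> response_dict` is supplied by the caller
    (e.g., urllib or requests with an OAuth2 bearer token attached)."""
    body = {"query": query, "pageSize": page_size}
    response = post_json(f"{serving_config_url}:search", body)
    # Each result references a document; snippet/metadata layout varies by
    # data type, so render defensively.
    return [r.get("document", {}) for r in response.get("results", [])]

# Example with a stubbed transport (no network call):
stub = lambda url, body: {"results": [{"document": {"id": "doc1"}}]}
docs = search(stub, "https://discoveryengine.googleapis.com/v1/.../default_search", "shipping")
# docs -> [{"id": "doc1"}]
```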

Integrations with related services

  • Cloud Storage: Common source for PDFs/HTML/text.
  • BigQuery: Common source for structured records (catalogs, KB entries).
  • IAM: Control plane access.
  • Cloud Logging / Audit Logs: Operational visibility and compliance.
  • Secret Manager (indirect): Store API keys/secrets for your application layers (Vertex AI Search itself typically uses IAM auth rather than static keys).
  • API Gateway / Apigee (indirect): Front the search calls for centralized policy enforcement.

Dependency services

  • Discovery Engine API: The underlying API surface for many operations and search requests.
  • Service agents: Google-managed service identities used during ingestion (e.g., reading from Cloud Storage). Ensure the correct bucket permissions.

Security/authentication model

  • Administrative and API access typically uses:
    • OAuth2 access tokens (user or service account) for REST API calls.
    • IAM roles to authorize management operations.
  • In production, call the API from a backend using a service account with least privilege.
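A minimal sketch of the backend auth pattern: a short-lived access token is fetched at call time and attached as a bearer header. The token provider is injected so you can plug in google-auth, the metadata server, or (for local testing) the gcloud CLI:

```python
from typing import Callable

def auth_headers(token_provider: Callable[[], str]) -> dict:
    """Build request headers carrying a short-lived OAuth2 access token."""
    return {
        "Authorization": f"Bearer {token_provider()}",
        "Content-Type": "application/json",
    }

# Local testing could use the gcloud CLI as the provider, matching the curl
# examples in this tutorial, e.g.:
#   import subprocess
#   token = lambda: subprocess.run(
#       ["gcloud", "auth", "print-access-token"],
#       capture_output=True, text=True, check=True).stdout.strip()
```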

Networking model

  • Client applications call Google APIs endpoints over the internet by default.
  • Enterprises often place requests behind a backend and restrict egress using organization policies and/or VPC controls (verify support and constraints for Vertex AI Search/Discovery Engine in your environment).

Monitoring/logging/governance considerations

  • Cloud Audit Logs: Tracks admin activity and data access events where applicable.
  • Cloud Logging: API request logs, errors.
  • Cloud Monitoring: Use available metrics for API usage/latency where exposed; otherwise track in your app and via logs (verify metric availability in official docs).
  • Governance: Use separate projects for environments; label/tag resources; restrict who can create data stores and who can import content.

Simple architecture diagram (Mermaid)

flowchart LR
  U[User in Web App] --> A[Backend API]
  A -->|Search request| V["Vertex AI Search<br/>(Discovery Engine API)"]
  V -->|Ranked results| A
  A --> U

  S[Cloud Storage Bucket<br/>Documents] -->|Import/Ingest| V

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph Client
    B[Browser / Mobile]
  end

  subgraph Edge
    CDN[Cloud CDN]
    WAF["Cloud Armor (optional)"]
  end

  subgraph App
    FE[Web Frontend]
    BE[Search Backend Service]
    IAM[Service Account<br/>Least Privilege]
    SM["Secret Manager (app secrets)"]
  end

  subgraph DataSources
    GCS[Cloud Storage<br/>Docs]
    BQ[BigQuery<br/>Records]
  end

  subgraph Vertex
    VAS[Vertex AI Search<br/>Data Store + Engine]
    API[Discovery Engine API]
  end

  subgraph Ops
    LOG[Cloud Logging / Audit Logs]
    MON[Cloud Monitoring]
    SIEM["Export to SIEM<br/>(optional)"]
  end

  B --> CDN --> WAF --> FE --> BE
  BE -->|OAuth token via SA| API
  API --> VAS

  GCS -->|Import| VAS
  BQ -->|Import| VAS

  API --> LOG
  VAS --> LOG
  BE --> LOG
  LOG --> MON
  LOG --> SIEM
  BE --> SM
  IAM -.grants.-> BE

8. Prerequisites

Account/project requirements

  • A Google Cloud project with billing enabled.
  • Organization policies should allow enabling required APIs and creating resources.

Permissions / IAM roles

You typically need permissions to:

  • Enable APIs
  • Create/manage Vertex AI Search resources (data stores, engines/apps)
  • Read from your data source (e.g., Cloud Storage bucket)

Commonly relevant roles (exact role names and recommended combinations can change—verify in official IAM docs):

  • Project-level: Editor (broad; not least-privilege) or specific admin roles for Discovery Engine / Vertex AI Search
  • For Cloud Storage ingestion: bucket-level permissions like Storage Object Viewer for the service agent or ingestion identity
  • For API calling from a backend: a role that allows search queries against the engine (verify the least-privilege “user”/“viewer” role for Discovery Engine)

Billing requirements

  • Billing account attached to the project
  • Budgets and alerts recommended (see cost section)

CLI/SDK/tools needed

  • Google Cloud CLI: gcloud
    Install: https://cloud.google.com/sdk/docs/install
  • gsutil (bundled with gcloud) for Cloud Storage operations
  • curl for REST API testing
  • Optional: a programming runtime (Python/Node/Java) if you build an app integration

Region/location availability

  • Many Discovery Engine examples use global. Some features and data stores may support specific locations.
  • Verify supported locations in the official documentation for Vertex AI Search/Discovery Engine before production rollout.

Quotas/limits

  • API quotas for requests per minute/day
  • Limits on document size, number of documents, indexing throughput, etc.
  • Check Quotas page in Google Cloud console and product docs; raise quota via support if needed.

Prerequisite services

  • Discovery Engine API (Vertex AI Search uses it)
  • Cloud Storage API (if using Cloud Storage)
  • IAM (always)

9. Pricing / Cost

Vertex AI Search pricing can vary based on:

  • The edition/features you enable (search-only vs “search and conversation” capabilities)
  • Location/region
  • Your data type and ingestion method
  • Contracted enterprise agreements (in some cases)

Because pricing and SKUs change, do not rely on static blog numbers. Use official sources:

  • Official pricing page: https://cloud.google.com/vertex-ai-search-and-conversation/pricing (verify this is the current pricing page for your product view)
  • Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

Pricing dimensions (typical model)

Pricing commonly includes some combination of:

  • Indexing / ingestion: charges based on the amount of content indexed or processed (varies by connector/type).
  • Storage: charges for storing indexed content or embeddings/metadata (model depends on product SKUs).
  • Query requests: charges per number of search queries (and sometimes per feature used).
  • Optional advanced features: generative answer features (if enabled under the broader “Search and Conversation” umbrella) may be priced separately (often by token usage or “answer” units).

Verify in official docs: the exact SKUs for “data store size”, “document processing”, “search requests”, “answer generation”, and any connector-specific fees.

Free tier (if applicable)

Google Cloud products sometimes provide limited free usage (trial credits for new accounts or free-tier units). For Vertex AI Search, verify current free tier availability on the official pricing page. Do not assume free indexing or free query volume.

Cost drivers (direct)

  • Number of documents/records indexed
  • Document sizes and parsing complexity (PDFs vs plain text)
  • Frequency of re-indexing / updates
  • Query volume and peak QPS
  • Use of facets/filters, advanced ranking, or add-on features (depending on SKUs)

Hidden or indirect costs

  • Cloud Storage costs for storing source documents (standard storage, operations, retrieval).
  • BigQuery costs if your pipeline stages data there (storage + query processing).
  • Network egress if your clients or workloads access APIs from outside Google Cloud regions (egress rules apply).
  • Logging: very high request volume can increase Cloud Logging ingestion costs if not managed (consider log sampling or exclusions carefully—without harming audit requirements).

Network/data transfer implications

  • Calling Google APIs from outside Google Cloud can incur egress and latency.
  • If you front the API via a backend in Google Cloud, you often reduce external egress and centralize access control.

How to optimize cost

  • Start with a small pilot data store and a controlled query test plan.
  • Avoid unnecessary re-imports; design an update strategy appropriate for your content cadence.
  • Use metadata fields intentionally—index what you need for filtering/faceting, not everything.
  • Control query traffic with caching and UI guardrails (debounce autocomplete, limit per-keystroke queries).
  • Use budgets/alerts and monitor the product’s usage metrics and billing export.
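Caching repeated queries is one of the simplest guardrails in the list above. A minimal in-process TTL cache sketch (a production system would more likely use a shared cache such as Memorystore, but the idea is the same):

```python
import time

class QueryCache:
    """Tiny TTL cache keyed by normalized query text."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (expiry_timestamp, results)

    def get(self, query: str):
        key = query.strip().lower()
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]  # fresh hit: no billable search request issued
        return None

    def put(self, query: str, results) -> None:
        key = query.strip().lower()
        self._store[key] = (time.monotonic() + self.ttl, results)
```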

Example low-cost starter estimate (non-numeric)

A low-cost lab/pilot typically includes:

  • A small Cloud Storage bucket with a few documents
  • One data store and one engine/app
  • Low query volume (manual testing)
  • Minimal logging retention beyond defaults

Use the pricing calculator and product pricing page to estimate:

  • Indexed content size
  • Expected query count per month
  • Any optional feature usage

Example production cost considerations

For production, expect cost to scale with:

  • Content growth (more documents, larger PDFs, more metadata)
  • Query volume (daily active users × searches per session)
  • Environments (dev + staging + prod)
  • Logging and monitoring retention
  • Additional features (connectors, advanced ranking, conversational experiences)
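Those drivers can be turned into a rough planning model. All unit rates below are placeholder inputs you must replace with current numbers from the official pricing page or calculator; the structure, not the figures, is the point:

```python
def monthly_search_cost(indexed_gib: float, queries_per_month: int,
                        rate_per_gib_month: float,
                        rate_per_1000_queries: float) -> float:
    """Rough cost model: index storage plus query volume. Rates are caller
    inputs, not real prices; source them from the official pricing page."""
    storage_cost = indexed_gib * rate_per_gib_month
    query_cost = (queries_per_month / 1000.0) * rate_per_1000_queries
    return storage_cost + query_cost
```

Multiply this out per environment (dev + staging + prod) to see how non-production copies contribute to the total.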

10. Step-by-Step Hands-On Tutorial

This lab builds a small, real search experience over a few documents stored in Cloud Storage, then queries it using the Discovery Engine Search REST API.

Objective

  • Create a Cloud Storage bucket with sample documents
  • Create a Vertex AI Search data store and import those documents
  • Create a Vertex AI Search app/engine
  • Run search queries via REST API and validate results
  • Clean up all resources to avoid ongoing costs

Lab Overview

You will:

  1. Prepare a Google Cloud project and enable APIs
  2. Upload sample documents to Cloud Storage
  3. Create a Vertex AI Search data store and import Cloud Storage documents
  4. Create a search app (engine) backed by the data store
  5. Query the engine using curl and view results
  6. Troubleshoot common issues
  7. Clean up resources

Notes before you begin:

  • Console steps may change as Google Cloud UI evolves. The resource concepts remain the same.
  • Many APIs/examples use locations/global. If your organization requires regional resources, verify supported locations for your use case.


Step 1: Set up your environment (project, billing, gcloud)

1) Choose or create a project:

export PROJECT_ID="YOUR_PROJECT_ID"
gcloud config set project "$PROJECT_ID"

2) Confirm billing is enabled:

  • Console: Billing → My projects, and ensure the project is linked to an active billing account.

Expected outcome: Your project is set and billable.

3) Enable required APIs:

gcloud services enable \
  discoveryengine.googleapis.com \
  storage.googleapis.com

Expected outcome: APIs enable successfully.

Verification:

gcloud services list --enabled --filter="name:discoveryengine OR name:storage"

Step 2: Create a Cloud Storage bucket and upload sample documents

1) Set a bucket name (must be globally unique):

export BUCKET_NAME="vas-lab-$PROJECT_ID-$(date +%s)"
export BUCKET_URI="gs://$BUCKET_NAME"

2) Create the bucket (pick a location that matches your policy; example uses us multi-region):

gcloud storage buckets create "$BUCKET_URI" --location=US

3) Create a few sample text documents locally:

mkdir -p vas-docs

cat > vas-docs/return-policy.txt <<'EOF'
Return Policy
Customers can return items within 30 days of delivery.
Items must be unused and in original packaging.
Refunds are processed within 5-7 business days after inspection.
EOF

cat > vas-docs/shipping-info.txt <<'EOF'
Shipping Information
Standard shipping takes 3-5 business days.
Expedited shipping takes 1-2 business days.
International shipping times vary by destination and customs.
EOF

cat > vas-docs/security-guidelines.txt <<'EOF'
Security Guidelines
Use multi-factor authentication for all admin accounts.
Rotate credentials every 90 days.
Report suspected phishing immediately to the security team.
EOF

4) Upload documents to Cloud Storage:

gcloud storage cp vas-docs/* "$BUCKET_URI/"

Expected outcome: Objects appear in the bucket.

Verification:

gcloud storage ls "$BUCKET_URI/"

Step 3: Create a Vertex AI Search data store and import documents (Console)

The console experience is commonly under Vertex AI and/or Agent Builder. Use the official product entry point:
https://cloud.google.com/vertex-ai-search

1) In Google Cloud Console, navigate to the Vertex AI Search/Agent Builder area:

  • Open Console → search for Vertex AI Search (or Agent Builder).
  • Look for a workflow to create a Search app or Data store.

2) Create a Data store:

  • Choose Search as the solution type (wording may vary).
  • Choose a data source type such as Cloud Storage.
  • Choose unstructured document ingestion (if prompted).
  • Select the bucket or provide the path: gs://YOUR_BUCKET_NAME/

3) Start an import/sync:

  • Confirm it discovers the files you uploaded.
  • Start ingestion.

Expected outcome: A data store is created, and document import begins.

Verification:

  • In console, find the data store and check Import status / Document count.
  • Wait until indexing shows completed/successful (may take several minutes even for small data).

Common permission requirement:

  • The ingestion process may require a Google-managed service identity to read the bucket.
  • If import fails with permission errors, grant Storage Object Viewer on the bucket to the relevant service agent (the console error usually identifies it). If it doesn’t, consult official docs for “Vertex AI Search Cloud Storage permissions” and verify which service account is used in your project.


Step 4: Create a Search app (engine) backed by the data store (Console)

1) Create a Search app (sometimes called an Engine):

  • Select the data store you created.
  • Use default settings for this lab.
  • Note the App/Engine ID and Location shown in the app details.

Expected outcome: A search app is created and linked to the data store.

Verification:

  • Use any built-in “Preview” or “Test” search in the console if available.
  • Try a query like return or shipping.


Step 5: Query Vertex AI Search via REST API (Discovery Engine)

This step validates that your search app works programmatically.

1) Set environment variables (replace values from the console):

export LOCATION="global"                 # verify in console
export COLLECTION="default_collection"   # common default; verify in console
export ENGINE_ID="YOUR_ENGINE_ID"        # from the Search app details
export SERVING_CONFIG="default_search"   # common default; verify in console

2) Run a search query:

curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/collections/${COLLECTION}/engines/${ENGINE_ID}/servingConfigs/${SERVING_CONFIG}:search" \
  -d '{
    "query": "How long is the return window?",
    "pageSize": 5
  }' | sed 's/\\n/\n/g' | head -n 60

Expected outcome: You receive a JSON response containing results that reference your uploaded documents (for example, return-policy.txt).

3) Try another query:

curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/collections/${COLLECTION}/engines/${ENGINE_ID}/servingConfigs/${SERVING_CONFIG}:search" \
  -d '{
    "query": "credential rotation",
    "pageSize": 5
  }' | sed 's/\\n/\n/g' | head -n 60

Expected outcome: Results should include content from security-guidelines.txt.
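To pull just the document titles and links out of the response, you can pipe the JSON through a small parser. The field names below (results[].document.derivedStructData with title and link) reflect a common response shape for unstructured data stores; verify them against the actual JSON your app returns.

```shell
# Abridged sample response (hypothetical shape) standing in for the curl output.
response='{"results":[{"document":{"id":"doc1","derivedStructData":{"title":"Return Policy","link":"gs://bucket/return-policy.txt"}}}],"totalSize":1}'

# Extract title and link per result; jq works equally well if you have it installed.
echo "$response" | python3 -c '
import json, sys
for r in json.load(sys.stdin).get("results", []):
    d = r.get("document", {}).get("derivedStructData", {})
    print(d.get("title", "?"), "->", d.get("link", "?"))
'
# prints: Return Policy -> gs://bucket/return-policy.txt
```

In practice, replace the sample response with the real curl output from the queries above.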


Step 6: (Optional) Add metadata and filters (conceptual)

Many real applications rely on filters/facets using metadata fields. How metadata is attached depends on your ingestion method (e.g., structured records, document fields, or connector-provided attributes). For this lab, keep it simple and focus on end-to-end indexing + querying.

If you want filters:
  • Plan a schema with fields like department, doc_type, created_date.
  • Ingest documents with metadata fields populated.
  • Use filters in the Search request (verify filter syntax in official Search API docs).
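A filtered request might look like the sketch below. The metadata field names (department, doc_type) and the ANY(...) filter expression are assumptions for illustration; verify the filter grammar for your schema in the current Search API documentation.

```shell
# Hypothetical filtered search request body. Field names and the ANY(...)
# syntax are assumptions; check the official filter expression reference.
body='{
  "query": "return window",
  "pageSize": 5,
  "filter": "department: ANY(\"support\") AND doc_type: ANY(\"policy\")"
}'

# Send only when gcloud is available (uses the variables from Step 5).
if command -v gcloud >/dev/null 2>&1; then
  curl -s -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/collections/${COLLECTION}/engines/${ENGINE_ID}/servingConfigs/${SERVING_CONFIG}:search" \
    -d "$body"
fi
```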


Validation

Use this checklist:

1) Bucket has documents:

gcloud storage ls "$BUCKET_URI/"

2) Data store indexing completed (console):
  • Document count > 0
  • Import job shows success

3) Search app works in console (if preview/test exists):
  • Query returns relevant documents

4) API query works:
  • curl returns HTTP 200 and includes results referencing your documents

To see HTTP status quickly:

curl -i -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/collections/${COLLECTION}/engines/${ENGINE_ID}/servingConfigs/${SERVING_CONFIG}:search" \
  -d '{"query":"shipping","pageSize":3}' | head -n 20

Troubleshooting

Error: PERMISSION_DENIED when importing from Cloud Storage

Cause: The ingestion identity cannot read objects in your bucket.
Fix:
  • Identify the service account mentioned in the console error.
  • Grant it roles/storage.objectViewer on the bucket.
  • Re-run the import.

If you can’t find the service account, verify in official docs which service agent Vertex AI Search uses for Cloud Storage ingestion for your project.

Error: 404 NOT_FOUND on the search endpoint

Cause: Incorrect ENGINE_ID, LOCATION, COLLECTION, or SERVING_CONFIG.
Fix:
  • Confirm the engine/app ID in the console.
  • Confirm whether the API path uses engines/... or a different resource path for your app type.
  • Verify the location (often global).
  • Verify the serving config name (commonly default_search, but not guaranteed).

Error: 403 PERMISSION_DENIED when calling the Search API

Cause: Your identity lacks permission to query the engine.
Fix:
  • Ensure you are authenticated: run gcloud auth list.
  • If using a service account, ensure it has the correct Discovery Engine/Vertex AI Search role to query.
  • Verify IAM bindings at the project level.

Index shows 0 documents

Cause: Import failed silently, unsupported file types, or wrong bucket path/prefix.
Fix:
  • Check import job details in the console.
  • Ensure objects exist under the specified path.
  • Verify supported formats in official docs.


Cleanup

To avoid ongoing costs, delete what you created:

1) Delete the Search app/engine and data store in the console:
  • Navigate to the Vertex AI Search/Agent Builder area.
  • Delete the Search app.
  • Delete the Data store.
  • Confirm deletion completes.

2) Delete the Cloud Storage bucket and objects:

gcloud storage rm -r "$BUCKET_URI"

3) (Optional) Disable APIs if this was a dedicated lab project:

gcloud services disable discoveryengine.googleapis.com

11. Best Practices

Architecture best practices

  • Separate environments: Use separate projects for dev/stage/prod to isolate data and IAM.
  • Design for ingestion: Decide early whether your source of truth is Cloud Storage, BigQuery, or a connector, and define update cadence (hourly/daily/on-change).
  • Metadata strategy first: Plan metadata fields needed for filters/facets and governance (owner, sensitivity, department).
  • Backend mediation: In many cases, query Vertex AI Search from a backend service (not directly from browsers) to centralize auth, enforce tenant isolation, and add caching/rate limits.

IAM/security best practices

  • Least privilege: Grant only roles needed to import content vs query engines vs administer resources.
  • Separate ingestion identity: Use controlled identities for import jobs and keep bucket permissions tight.
  • Audit logs: Ensure audit logging is enabled and retained per policy.

Cost best practices

  • Control query rates: Debounce autocomplete, cache popular queries, limit repeated calls on UI events.
  • Minimize re-index churn: Avoid full re-imports unless necessary; prefer incremental updates if supported by your ingestion model.
  • Budget alerts: Configure budgets and alerts on the project.
  • Logging discipline: Avoid overly verbose app logs at high QPS; use sampling while preserving security/audit requirements.
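The query-caching advice above can be sketched as a small wrapper. This is a simplified illustration, not a production cache: run_search is a stand-in for the real curl call from Step 5, and the sha256sum invocation assumes GNU coreutils (use shasum -a 256 on macOS).

```shell
# Sketch: cache responses on disk keyed by a hash of the query text so
# repeated identical queries don't trigger additional (billable) API calls.
cache_dir=$(mktemp -d)
calls_file=$(mktemp); echo 0 > "$calls_file"

run_search() {                    # stand-in for the real API call; counts invocations
  echo $(( $(cat "$calls_file") + 1 )) > "$calls_file"
  printf '{"query":"%s","results":[]}\n' "$1"
}

cached_search() {
  local query=$1 key cache_file
  key=$(printf '%s' "$query" | sha256sum | awk '{print $1}')
  cache_file="$cache_dir/$key.json"
  if [ -f "$cache_file" ]; then
    cat "$cache_file"                            # cache hit: no API call
  else
    run_search "$query" | tee "$cache_file"      # miss: call API, store result
  fi
}

cached_search "shipping" > /dev/null
cached_search "shipping" > /dev/null             # served from cache
echo "api calls: $(cat "$calls_file")"           # api calls: 1
```

A real implementation would also expire cache entries so stale results don't outlive re-indexing.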

Performance best practices

  • Use filters instead of post-filtering: Filter at query time using metadata rather than filtering in your application after results return.
  • Keep metadata clean: Poor metadata quality hurts facet usefulness and relevance.
  • Test with real queries: Build a query test set (top searches, misspellings, synonyms) and evaluate relevance regularly.

Reliability best practices

  • Retry transient errors: Implement exponential backoff for 429/5xx responses.
  • Graceful degradation: If the search API is unavailable, show a friendly message and log correlation IDs.
  • Change management: Treat schema/tuning changes as versioned releases (staging validation before prod).
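The retry guidance above can be sketched as a small backoff wrapper. The flaky_call stub is illustrative only; a real wrapper around the curl call from Step 5 would inspect the HTTP status code and retry only on 429/5xx.

```shell
# Sketch: retry a command with exponential backoff (1s, 2s, 4s, ...).
retry_with_backoff() {
  local max_attempts=$1 delay=1 attempt=1
  shift
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Stub that fails twice (as if receiving two 429s), then succeeds.
attempts_file=$(mktemp); echo 0 > "$attempts_file"
flaky_call() {
  local n=$(( $(cat "$attempts_file") + 1 ))
  echo "$n" > "$attempts_file"
  [ "$n" -ge 3 ]
}

retry_with_backoff 5 flaky_call && echo "succeeded after $(cat "$attempts_file") attempts"
# prints: succeeded after 3 attempts
```

Production clients should also add jitter to the delay to avoid synchronized retry storms.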

Operations best practices

  • Monitoring: Track latency, error rate, request volume from your app layer; supplement with API logs/metrics.
  • Runbooks: Document common failures (permissions, ingestion failures, quota errors).
  • Labeling/naming: Use consistent naming for data stores/engines (e.g., kb-search-prod, catalog-search-dev).

Governance/tagging/naming best practices

  • Use resource naming that includes:
      • system name
      • environment (dev/stage/prod)
      • data classification (public/internal/confidential)
  • Apply project labels and billing export to attribute costs by product/team.

12. Security Considerations

Identity and access model

  • Vertex AI Search is controlled through Google Cloud IAM.
  • Use role separation:
      • Admins: manage data stores, imports, engines/apps, and tuning
      • Operators: monitor jobs, view logs
      • Query callers: service accounts used by your application runtime

Recommendation: Do not let end-user browsers call the Search API directly unless you have a clear, secure design for authentication, authorization, abuse prevention, and keyless (IAM-based) credential handling.

Encryption

  • Google Cloud encrypts data at rest by default.
  • Data in transit uses TLS.
  • If you require customer-managed encryption keys (CMEK), verify whether Vertex AI Search supports CMEK for your resource types in current docs (do not assume).

Network exposure

  • API endpoints are accessed over the public internet by default.
  • Reduce exposure by:
      • calling the API from a backend in Google Cloud
      • using organization egress controls and policies
      • considering VPC Service Controls, if supported for this product in your org

Secrets handling

  • Prefer service account identity over static API keys.
  • Store application secrets (if any) in Secret Manager.
  • Rotate secrets and restrict access.

Audit/logging

  • Enable and retain:
      • Admin Activity logs
      • Data Access logs, where applicable
  • Export logs to a SIEM for correlation when required.

Compliance considerations

  • Data residency: confirm the location and data processing region behavior.
  • Regulated data (PII/PHI): ensure classification, access control, audit trails, and retention policies.
  • Review legal requirements for indexing sensitive documents (some orgs restrict indexing certain classes of data).

Common security mistakes

  • Over-broad IAM roles (project Editor for everyone)
  • Importing sensitive documents into a shared dev project
  • Storing confidential documents in a bucket with public access or overly broad viewers
  • No audit log retention / no monitoring on ingestion failures

Secure deployment recommendations

  • Separate projects per environment and sensitivity.
  • Use least-privilege IAM and bucket-level IAM conditions if appropriate.
  • Place an application backend in front of the Search API for:
      • tenant isolation
      • rate limiting
      • consistent authentication
      • response shaping and redaction

13. Limitations and Gotchas

These are common gotchas; always confirm the current official limits and behaviors.

  • Location constraints: Some features may require global or specific locations.
  • Connector variability: Supported sources/connectors and ACL propagation differ—verify for your source system.
  • Document format limits: Large PDFs, scanned documents, or unsupported encodings may import poorly.
  • Ingestion permissions: Cloud Storage imports often fail due to missing service agent permissions.
  • Schema changes can be disruptive: Schema/metadata changes may require re-indexing or careful rollout (depends on data model).
  • Quotas and throttling: Expect 429 errors under load until quotas are raised.
  • Cost surprises: High query volume (especially autocomplete) can scale quickly; add client-side debouncing and caching.
  • Multi-tenant isolation: Achieving strict tenant isolation requires careful design (separate engines or strict metadata filters + backend enforcement). Don’t rely on UI-level filtering for security.
  • Not a drop-in Elasticsearch replacement: If you rely on custom analyzers, complex scoring scripts, or low-level index tuning, you may need redesign.

14. Comparison with Alternatives

Nearest services in Google Cloud

  • Vertex AI Vector Search: best for vector similarity search over embeddings; often paired with RAG pipelines. It does not replace enterprise search relevance/ranking and metadata-based search end-to-end.
  • Self-managed Elasticsearch/OpenSearch on GKE/Compute Engine: maximum control, maximum ops overhead.
  • Google Workspace/Cloud Search: oriented around Google Workspace content search; not the same as Vertex AI Search for app embedding and custom corpora (verify current product scope).

Nearest services in other clouds

  • Amazon Kendra: managed enterprise search.
  • Azure AI Search: managed search with keyword + vector capabilities.

Open-source/self-managed

  • Elasticsearch / OpenSearch: flexible and powerful but operationally heavy.
  • Meilisearch / Typesense: simpler search engines for smaller deployments; fewer enterprise features.

Comparison table

  • Vertex AI Search (Google Cloud). Best for: managed enterprise/app search on Google Cloud. Strengths: managed indexing + serving, Google Cloud integration, faster time-to-value. Weaknesses: less low-level control than self-managed engines; quotas/editions apply. Choose when: you want managed search with strong relevance and GCP integration.
  • Vertex AI Vector Search. Best for: vector similarity and ANN search. Strengths: great for embeddings and semantic retrieval at scale. Weaknesses: not a complete enterprise search product by itself. Choose when: you're building embedding-based retrieval/RAG and need ANN search.
  • Self-managed Elasticsearch/OpenSearch. Best for: custom search stacks and fine-grained control. Strengths: custom analyzers, plugins, deep control. Weaknesses: ops overhead (scaling, upgrades, shards, backups). Choose when: you must control internals or rely on specific Elasticsearch features.
  • Amazon Kendra. Best for: enterprise search on AWS. Strengths: managed, with a connector ecosystem. Weaknesses: AWS-native; different integration story. Choose when: your platform is primarily AWS and Kendra fits your sources.
  • Azure AI Search. Best for: managed search on Azure. Strengths: strong search features, hybrid keyword/vector patterns. Weaknesses: Azure-native. Choose when: your platform is primarily Azure and you want Azure-native search.
  • Algolia (SaaS). Best for: UX-focused app/site search. Strengths: fast autocomplete, great developer UX. Weaknesses: SaaS cost model; enterprise governance varies. Choose when: you need best-in-class frontend search UX and accept SaaS tradeoffs.
  • Open-source lightweight engines (Typesense/Meilisearch). Best for: small to mid-size search deployments. Strengths: simple setup, good performance. Weaknesses: fewer enterprise features; self-managed ops. Choose when: you need a simpler, cheaper self-managed option.

15. Real-World Example

Enterprise example: Internal compliance and policy search

  • Problem: A regulated enterprise has thousands of policy PDFs and evidence documents spread across multiple repositories. Auditors require rapid retrieval and proof of access controls.
  • Proposed architecture:
      • Source documents stored in Cloud Storage with strict bucket IAM
      • Vertex AI Search data store indexes approved folders/prefixes
      • A backend service mediates queries, enforces user authorization, and logs searches
      • Cloud Audit Logs and app logs exported to a SIEM
  • Why Vertex AI Search was chosen:
      • Managed search reduces operational burden
      • Strong GCP-native governance (IAM + audit logs)
      • Faster deployment than building a custom search cluster
  • Expected outcomes:
      • Faster audit response times (minutes instead of hours)
      • Reduced time spent searching for evidence
      • Centralized monitoring and access logging

Startup/small-team example: Customer documentation search

  • Problem: A startup’s docs site grows quickly and users can’t find answers; support volume increases.
  • Proposed architecture:
      • Docs exported nightly to a Cloud Storage bucket
      • Vertex AI Search indexes the docs and serves search queries
      • The frontend calls a lightweight backend (Cloud Run) that queries the Search API
  • Why Vertex AI Search was chosen:
      • Minimal ops; the team doesn't want to manage Elasticsearch
      • Quick to integrate into a web app
  • Expected outcomes:
      • Improved self-serve success rate
      • Reduced support tickets
      • A simple path to add richer experiences later (if needed)

16. FAQ

1) Is Vertex AI Search the same as Discovery Engine?
Vertex AI Search is the product experience; Discovery Engine is the underlying Google Cloud API (discoveryengine.googleapis.com) used for programmatic operations and search requests. The console may use “Agent Builder” labels while still using Discovery Engine under the hood.

2) Is Vertex AI Search only for unstructured documents like PDFs?
No. It can support different data types (unstructured and structured). The best ingestion method depends on your content source and desired filters/facets. Verify supported data models in official docs.

3) Do I need to run servers or manage a cluster?
No. Vertex AI Search is managed; you manage configuration, content ingestion, and your application integration.

4) How do I authenticate API calls?
Typically with OAuth2 access tokens from a Google identity (user) or a service account. For production, prefer service accounts and least-privilege roles.

5) Can I call the Search API directly from the browser?
It’s usually better to call from a backend to avoid exposing tokens and to enforce authorization, rate limiting, and tenant isolation. If you do direct calls, design security very carefully.

6) How do I restrict which users can see which documents?
In many deployments, authorization is enforced in your application layer using metadata filters and backend checks. Some connectors may propagate ACLs. Verify current document-level security support for your ingestion method.

7) How long does indexing take?
Depends on document count, size, and ingestion method. Small labs can take minutes; large corpora can take longer. Monitor import job status in the console.

8) Does Vertex AI Search support synonyms?
Synonym support and tuning options are commonly provided, but exact capabilities can vary. Check the current Vertex AI Search tuning docs for your app type.

9) What’s the difference between Vertex AI Search and Vertex AI Vector Search?
Vertex AI Search is managed enterprise/app search with indexing and serving for text and metadata-driven queries. Vertex AI Vector Search is for approximate nearest neighbor (ANN) search over embeddings.

10) How do I estimate costs?
Use the official pricing page and Google Cloud Pricing Calculator. Key drivers are indexed content size, ingestion frequency, and query volume.

11) What are common causes of PERMISSION_DENIED during import?
Missing Cloud Storage bucket permissions for the service agent used by ingestion. Grant appropriate read permissions at the bucket level and retry.

12) Can I keep separate dev and prod indexes?
Yes—recommended. Use separate projects or separate data stores/engines with clear naming and IAM separation.

13) How do I monitor search latency and errors?
Track it in your application (client and backend), and review Cloud Logging and Audit Logs for API activity. Check Cloud Monitoring for available metrics.

14) Can I migrate from Elasticsearch/OpenSearch?
Yes, but it’s not a one-click migration. You’ll redesign ingestion, metadata schema, and query logic. Plan relevance evaluation and A/B testing.

15) Is global the only location?
Many examples use global, but location support depends on product configuration and features. Always verify available locations and data residency requirements in official docs.

16) What’s the safest way to start?
Index a small, non-sensitive corpus in a dev project, validate relevance and cost, then scale with governance controls and environment separation.

17. Top Online Resources to Learn Vertex AI Search

  • Official documentation: Vertex AI Search docs (entry point), https://cloud.google.com/vertex-ai-search. Canonical product docs, concepts, setup, and references.
  • Official API docs: Discovery Engine API documentation, https://cloud.google.com/generative-ai-app-builder/docs/reference/rest (verify current URL/section for Discovery Engine). REST method details for search queries and resource paths.
  • Official pricing: Vertex AI Search and Conversation pricing, https://cloud.google.com/vertex-ai-search-and-conversation/pricing. Current SKUs, units, and billing model (verify latest).
  • Cost estimation: Google Cloud Pricing Calculator, https://cloud.google.com/products/calculator. Build estimates without guessing static prices.
  • Getting started: Vertex AI Search "Get started" guides, https://cloud.google.com/vertex-ai-search/docs (navigate to quickstarts). Step-by-step official onboarding.
  • Architecture guidance: Google Cloud Architecture Center, https://cloud.google.com/architecture. Reference architectures and best practices (search for "Vertex AI Search" in the center).
  • IAM/security: IAM overview, https://cloud.google.com/iam/docs/overview. How to design least-privilege for production.
  • Logging/auditing: Cloud Audit Logs, https://cloud.google.com/logging/docs/audit. Governance and compliance logging patterns.
  • Storage ingestion: Cloud Storage documentation, https://cloud.google.com/storage/docs. Buckets, IAM, object lifecycle, and permissions needed for ingestion.
  • Samples: Google Cloud samples on GitHub, https://github.com/GoogleCloudPlatform (search within for Discovery Engine / Vertex AI Search samples). Practical code examples (verify repository freshness and compatibility).
  • Videos: Google Cloud Tech on YouTube, https://www.youtube.com/@googlecloudtech. Product walkthroughs and architecture sessions (search within the channel).
  • Community: Google Cloud Community, https://www.googlecloudcommunity.com/. Peer discussion and troubleshooting patterns (validate against official docs).

18. Training and Certification Providers

  • DevOpsSchool.com (https://www.devopsschool.com/). Audience: engineers, DevOps, architects. Focus: cloud/DevOps training programs that may include Google Cloud and AI/ML topics. Mode: check website.
  • ScmGalaxy.com (https://www.scmgalaxy.com/). Audience: beginners to intermediate practitioners. Focus: software engineering, DevOps, and tooling fundamentals. Mode: check website.
  • CloudOpsNow.in (https://www.cloudopsnow.in/). Audience: cloud ops and platform teams. Focus: cloud operations practices, SRE/ops enablement. Mode: check website.
  • SreSchool.com (https://www.sreschool.com/). Audience: SREs, operations, reliability engineers. Focus: reliability engineering, monitoring, incident response. Mode: check website.
  • AiOpsSchool.com (https://www.aiopsschool.com/). Audience: ops + AI practitioners. Focus: AIOps concepts, monitoring automation, ML-assisted ops. Mode: check website.

19. Top Trainers

  • RajeshKumar.xyz (https://rajeshkumar.xyz/). Specialization: DevOps/cloud training content (verify current offerings). Audience: beginners to advanced.
  • devopstrainer.in (https://www.devopstrainer.in/). Specialization: DevOps training and workshops (verify current offerings). Audience: engineers and teams.
  • devopsfreelancer.com (https://www.devopsfreelancer.com/). Specialization: freelance DevOps consulting/training resources (verify current offerings). Audience: teams needing short-term help.
  • devopssupport.in (https://www.devopssupport.in/). Specialization: DevOps support and training resources (verify current offerings). Audience: operations and DevOps teams.

20. Top Consulting Companies

  • cotocus.com (https://cotocus.com/). Service area: cloud and software consulting (verify offerings). Where they may help: architecture, implementation, operational support. Example use cases: build a search-backed internal portal; integrate GCP services; set up logging/monitoring.
  • DevOpsSchool.com (https://www.devopsschool.com/). Service area: DevOps/cloud consulting and training services. Where they may help: platform enablement, DevOps processes, cloud migrations. Example use cases: CI/CD for search-related apps; infra governance; operational best practices.
  • DEVOPSCONSULTING.IN (https://www.devopsconsulting.in/). Service area: DevOps consulting services (verify offerings). Where they may help: DevOps transformation, automation, cloud ops. Example use cases: secure deployment pipelines; reliability improvements; cost governance.

21. Career and Learning Roadmap

What to learn before this service

  • Google Cloud fundamentals: projects, billing, IAM, service accounts
  • Cloud Storage basics: buckets, IAM, lifecycle, object organization
  • APIs on Google Cloud: enabling services, OAuth tokens, quotas
  • Search fundamentals:
      • precision/recall
      • relevance ranking
      • facets/filters
      • query logs and evaluation

What to learn after this service

  • Relevance evaluation and tuning practices (building test query sets, offline evaluation)
  • Production SRE practices:
      • SLIs/SLOs for search latency and error rate
      • incident response and runbooks
  • Security governance:
      • org policies
      • audit log exports
      • data classification and retention
  • Adjacent AI/ML patterns:
      • semantic retrieval and embeddings
      • RAG architectures (where Vertex AI Vector Search may be relevant)
      • conversational experiences (under the Vertex AI Agent Builder / "Search and Conversation" umbrella, if desired)

Job roles that use it

  • Cloud Engineer / Platform Engineer
  • Solutions Architect
  • Search Engineer (application search)
  • DevOps / SRE supporting internal platforms
  • Data Engineer (content ingestion pipelines)
  • Security Engineer (governance and compliance)

Certification path (if available)

Google Cloud certifications that complement this skill (verify current certification names and paths):
  • Associate Cloud Engineer
  • Professional Cloud Architect
  • Professional Data Engineer (helpful for ingestion pipelines)
  • AI/ML certifications (if integrating with broader Vertex AI capabilities)

Project ideas for practice

1) Build a searchable internal handbook with filters for team/department.
2) Index release notes and runbooks; add "service" and "severity" metadata facets.
3) Create a product catalog search with brand/category/price facets (verify best data model).
4) Build a multi-tenant knowledge base search backend (careful with authorization design).
5) Create a "search quality dashboard" by logging queries and measuring click-through and zero-result rates.

22. Glossary

  • Data store: A managed container that defines the content source and indexing setup for Vertex AI Search.
  • Engine / Search app: The serving application that exposes search functionality over one or more data stores.
  • Serving config: A named configuration used by the Search API to serve results (often includes default ranking/tuning behavior).
  • Discovery Engine API: The Google Cloud API (discoveryengine.googleapis.com) commonly used to query and manage search resources programmatically.
  • Ingestion: The process of importing documents/records into the data store for indexing.
  • Indexing: Building internal search structures from ingested content so queries can be served quickly.
  • Facet: A category breakdown (with counts) used to refine search (e.g., by department, year, product category).
  • Filter: A query constraint applied to metadata fields (e.g., department = "HR").
  • Relevance tuning: Adjusting ranking/matching so the most useful results appear first.
  • Quota: A limit on API usage (requests per minute/day, etc.) enforced by Google Cloud.
  • Service account: A non-human identity used by applications to authenticate to Google Cloud services securely.
  • Cloud Audit Logs: Logs that record administrative actions and (where enabled) data access for Google Cloud services.

23. Summary

Vertex AI Search is Google Cloud’s managed enterprise search service in the AI and ML category that helps you ingest content, build a managed index, and serve relevant results via APIs—without operating a search cluster. It matters because search quality and operational simplicity are difficult to achieve with DIY systems, especially at enterprise scale.

From an architecture perspective, treat it as a managed indexing + serving layer: design your ingestion strategy, plan metadata and access control patterns, and integrate it behind a backend for security and governance. Cost is primarily driven by indexed content and query volume, with potential add-on costs for advanced features—so use the official pricing page and calculator, and control query rates in your application.

Use Vertex AI Search when you want a Google Cloud-native, managed search foundation for internal or customer-facing applications. Next step: build a production-ready proof of concept with real content, a metadata schema, a relevance evaluation set, and least-privilege IAM—then scale with monitoring, quotas, and cost controls.