Oracle Cloud Generative AI Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics and AI

Category

Analytics and AI

1. Introduction

Oracle Cloud Generative AI is a managed service in the Analytics and AI portfolio that lets you call large language models (LLMs) and related foundation models through APIs and the Oracle Cloud Console. You use it to generate text, chat, summarize, extract information, and create embeddings for semantic search and retrieval-augmented generation (RAG), without standing up and operating model-serving infrastructure yourself.

In simple terms: you send a prompt (and optional context) to Generative AI, and it returns a model response. You pay for usage (pricing varies by model and region), and Oracle Cloud handles capacity, patching, and the service control plane.

Technically, Generative AI exposes model inference endpoints (and associated request/response schemas) secured by Oracle Cloud Infrastructure (OCI) Identity and Access Management (IAM). Your application authenticates with OCI (API keys, instance principals, resource principals, etc.), calls the Generative AI inference API in a specific region and compartment scope, and receives outputs such as generated text or vector embeddings. You typically integrate it with data sources (Object Storage, databases, search engines), application runtimes (Functions, Kubernetes), and observability (Logging, Audit) to build production systems.

The service solves the problem of reliably consuming foundation models in enterprise environments—with OCI IAM, compartments, policies, audit trails, and architecture patterns that platform teams can govern—while reducing the operational burden of self-hosting GPUs and model servers.

Naming note (verify in official docs): Oracle’s official documentation often refers to the service as “OCI Generative AI”. In the Console it may appear as “Generative AI” under Analytics & AI. This tutorial uses Generative AI as the primary service name and calls out OCI-specific terms where required.


2. What is Generative AI?

Official purpose (what it’s for)

Generative AI in Oracle Cloud is a managed service to run inference on supported generative models (for example, chat/text generation models and embedding models) using OCI-native security and governance. It is intended for building applications such as assistants, document summarizers, knowledge search, content generation, and automation workflows.

For the most current statement of scope and supported model families, verify in the official documentation: https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm

Core capabilities (what you can do)

Common, practical capabilities include:

  • Chat/text generation: Provide instructions and context; get natural-language responses.
  • Summarization and rewriting: Summarize long text, rewrite for tone, extract action items.
  • Information extraction: Extract entities, structured fields, or key points (often via prompting).
  • Embeddings: Convert text into vectors for semantic similarity search and RAG pipelines.
  • Reranking (if available for selected models; verify): Improve search relevance by reranking candidate passages.

Exact features depend on which models Oracle makes available in your region and your tenancy.

Major components (how it’s organized)

Generative AI typically includes:

  • Models / model catalog: The list of supported models and their identifiers (often used in API requests).
  • Inference API: Endpoints you call for chat/text generation and embeddings.
  • Serving modes (availability varies; verify): Many managed AI offerings distinguish on-demand shared capacity vs. dedicated capacity. If your tenancy has access to multiple serving options, choose based on throughput, latency, and isolation needs.
  • OCI IAM integration: Policies to control which groups/apps can use the service and in which compartments.
  • Audit and observability hooks: OCI Audit events for API calls; optional Logging/Monitoring patterns depending on your application design.

Service type

  • Managed AI inference service (not a general compute service)
  • Consumed primarily via API/SDK/CLI and sometimes via Console playgrounds (availability depends on region and current UI).

Scope: regional vs. global, tenancy vs. project

Generative AI is typically regional in OCI: you select a region, and inference calls go to that region’s service endpoint. Resources and access control are tenancy-based and compartment-scoped (OCI’s standard governance model).

Because Oracle may expand regions and model availability over time, confirm your region support here: – Verify in official docs and your Console’s region selector.

Fit in the Oracle Cloud ecosystem

Generative AI is usually part of an OCI solution that includes:

  • Networking: VCN, private subnets, NAT Gateway or Service Gateway (depending on architecture)
  • Data: Object Storage, Autonomous Database / Oracle Database services, streaming/logs
  • App runtime: OCI Functions, Compute instances, OKE (Kubernetes)
  • Security: IAM policies, Vault for secrets, Cloud Guard (governance), Audit

3. Why use Generative AI?

Business reasons

  • Faster delivery of AI-powered features: Add chat/search/summarization without building model hosting.
  • Lower operational burden: No GPU fleet management, patching, scaling, or model server maintenance.
  • Enterprise governance: OCI compartments, tagging, IAM policies, and audit trails support regulated environments.

Technical reasons

  • Standard APIs and SDKs: Integrate using OCI SDKs (Python/Java/Go/JS, etc.) and authenticated REST calls.
  • Embeddings support for RAG: A practical path to build knowledge assistants over internal documents.
  • Regional deployment: Keep workloads near your data and applications in OCI regions (subject to availability).

Operational reasons

  • Separation of duties: Platform teams manage IAM/policies and networking; app teams consume APIs.
  • Repeatable deployment patterns: Use Terraform for IAM/networking and CI/CD for apps.
  • Observability alignment: Use OCI Audit for API activity and your application logs for prompt/response metadata (with redaction).

Security/compliance reasons

  • OCI IAM policy enforcement: Centralized access control with least privilege.
  • OCI Audit: Trace who invoked what (for governance).
  • Data residency alignment: Regional endpoints help with residency requirements (verify exact commitments in Oracle policies and your contracts).

Important: Data handling for prompts/responses is a policy and contractual topic. Verify Oracle’s published data usage statements and your organization’s compliance requirements in the official documentation and legal terms.

Scalability/performance reasons

  • Elastic inference: On-demand inference reduces capacity planning for variable workloads.
  • Dedicated capacity (if available; verify): For consistent latency and throughput, some tenancies can use dedicated serving options.

When teams should choose it

Choose Generative AI on Oracle Cloud when:

  • You already run workloads on OCI and want native IAM + compartment governance.
  • You need managed inference rather than self-hosting models.
  • Your workloads benefit from embeddings + RAG and you want a managed foundation model endpoint.
  • You need a path to production that fits OCI networking/security patterns.

When teams should not choose it

Consider alternatives when:

  • You must run a specific model not available in Generative AI and cannot accept substitutes.
  • You need full control over model weights, fine-tuning pipelines, or custom model serving.
  • You require offline/air-gapped deployments that cannot call managed cloud endpoints.
  • Your cost model strongly favors self-hosted inference at very high steady-state volumes (after careful benchmarking).

4. Where is Generative AI used?

Industries

  • Financial services (customer support summarization, compliance drafting assistance)
  • Healthcare/life sciences (non-diagnostic document workflows, policy Q&A)
  • Retail/e-commerce (product content generation, support bots)
  • Manufacturing (maintenance knowledge assistants)
  • SaaS/technology (in-app copilots, ticket triage)
  • Public sector (policy document search and summarization; verify procurement constraints)

Team types

  • App developers building assistants and copilots
  • Data engineers and analytics teams building search/RAG pipelines
  • Platform and cloud engineering teams governing AI access
  • Security teams implementing data controls and monitoring
  • SRE/operations teams running production services at scale

Workloads

  • Internal knowledge assistant for SOPs/runbooks
  • Customer service agent assist (draft replies, summarize conversations)
  • Document processing pipeline (summaries + extracted fields)
  • Semantic search over policies/contracts/product docs
  • Content generation with human review (marketing, documentation, email drafts)

Architectures

  • Web application + backend API calling Generative AI
  • Event-driven pipelines (Functions) summarizing new documents
  • RAG with embeddings + vector store (DB/search) + Generative AI chat
  • Hybrid: on-prem data sources with secure transfer into OCI + inference calls
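The RAG architecture above follows a consistent shape regardless of vector store or model: retrieve passages, ground the prompt in them, then generate. A minimal sketch, where `retrieve` and `generate` are stand-ins for your vector-store query and Generative AI chat call (the names are illustrative, not SDK APIs):

```python
def answer_with_rag(question, retrieve, generate, top_k=3):
    """Minimal RAG flow: retrieve passages, ground the prompt, generate."""
    passages = retrieve(question, top_k)  # e.g., vector-store similarity search
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the context below. "
        "Cite passage numbers like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)  # e.g., a Generative AI chat call
```

In production, `retrieve` usually wraps an embeddings similarity search and `generate` wraps the inference client, with retries and logging around both.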

Real-world deployment contexts

  • Dev/test: Explore prompts in a playground or a small app; use on-demand capacity; minimal IAM.
  • Production: Enforce least privilege policies; integrate Vault; implement logging/redaction; use rate limiting; implement fallbacks; track cost and quality metrics.
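The rate limiting and fallback points above often start as retrying transient failures with exponential backoff. A minimal sketch (written generically; the OCI Python SDK also accepts a retry_strategy on its clients, which you should prefer where available — verify in the SDK docs):

```python
import random
import time

def call_with_backoff(fn, attempts=4, base_delay=0.5):
    """Retry a callable with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller fall back
            # 0.5s, 1s, 2s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Wrap only idempotent inference calls this way, and cap attempts so a throttled endpoint does not amplify load.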

5. Top Use Cases and Scenarios

Below are realistic scenarios where Oracle Cloud Generative AI is commonly applied.

1) Support ticket summarization and next-step extraction

  • Problem: Support queues contain long back-and-forth threads; agents need quick context and next actions.
  • Why this service fits: Chat/text generation models can summarize and extract structured outputs (via prompting).
  • Example: A helpdesk system sends ticket history to Generative AI and stores a summary + “action items” field.

2) Internal knowledge base Q&A (RAG)

  • Problem: Employees can’t find answers across scattered docs.
  • Why this service fits: Embeddings enable semantic search; the chat model can answer using retrieved passages.
  • Example: Index HR policies into embeddings; retrieve top passages; generate answers with citations.

3) Meeting notes summarization (non-realtime)

  • Problem: Teams lose decisions and action items in long notes.
  • Why this service fits: Summarization prompts are straightforward; batch processing is cost-controlled.
  • Example: After a meeting, a pipeline summarizes notes into decisions/risks/tasks.

4) Contract clause extraction (assistive, with review)

  • Problem: Legal teams need key clause extraction at scale.
  • Why this service fits: Models can extract fields (termination date, governing law) with careful prompting and validation.
  • Example: A document workflow extracts clauses and flags missing items; attorneys review.

5) Developer documentation assistant

  • Problem: Engineers need answers from internal runbooks and service docs.
  • Why this service fits: RAG reduces hallucinations by grounding answers in retrieved documents.
  • Example: A Slack bot answers “How do I rotate OCI API keys?” citing internal SOP sections.

6) Product content drafting with brand guidelines

  • Problem: Thousands of SKUs need consistent descriptions.
  • Why this service fits: Template-based prompting generates drafts quickly; humans approve.
  • Example: For each SKU, generate a description in the company tone and store as a draft.

7) Log/incident report summarization (ops)

  • Problem: Postmortems take time; incident timelines are long.
  • Why this service fits: Summarize timelines and extract likely root cause hypotheses (with human validation).
  • Example: Compile incident Slack thread + alerts into a structured incident summary.

8) Call center agent assist (draft responses)

  • Problem: Agents need suggested responses that match policy and tone.
  • Why this service fits: Chat models can draft replies; RAG can ensure policy grounding.
  • Example: Provide relevant policy snippets; ask the model to draft a response; agent edits before sending.

9) Multilingual rewriting and translation (with review)

  • Problem: Global teams need content localized quickly.
  • Why this service fits: Many LLMs handle multilingual tasks; outputs still require QA.
  • Example: Translate internal announcements; keep a consistent tone.

10) Search relevance improvements (embeddings + rerank if available)

  • Problem: Keyword search returns irrelevant results.
  • Why this service fits: Embeddings and reranking can improve relevance beyond keyword matching.
  • Example: For a query, retrieve candidates via BM25, rerank with a model, and show top answers.

11) Data catalog description generation

  • Problem: Data assets lack usable descriptions and ownership metadata.
  • Why this service fits: Generate plain-language descriptions from schema/metadata.
  • Example: Summarize a table’s columns into a human-friendly description and suggested owners.

12) Compliance policy drafting assistant (not final authority)

  • Problem: Security/compliance teams draft repetitive policy language.
  • Why this service fits: Drafting patterns are consistent; humans review and approve.
  • Example: Generate an initial access control policy draft based on requirements, then review.

6. Core Features

Feature availability depends on region, tenancy, and the models offered at the time. Always verify current capabilities in the official Generative AI docs.

1) Managed access to foundation models (model catalog)

  • What it does: Provides a set of supported models you can call via OCI APIs.
  • Why it matters: Reduces the need to source, host, and patch model servers.
  • Practical benefit: Faster proof-of-concept to production with standard governance.
  • Limitations/caveats: Model availability differs by region; some models may have specific usage policies.

2) Chat/text generation inference API

  • What it does: Accepts prompts and returns generated text (chat responses, summaries, rewrites).
  • Why it matters: Enables natural language interfaces and automation.
  • Practical benefit: Implement assistants, summarizers, and drafting tools.
  • Limitations/caveats: Outputs can be incorrect; you must implement validation, grounding, and human review where needed.

3) Embeddings inference API

  • What it does: Converts text to vectors for semantic similarity search.
  • Why it matters: Embeddings are foundational for RAG and semantic search.
  • Practical benefit: Build “search by meaning” across documents and tickets.
  • Limitations/caveats: You must manage a vector store (database/search engine/local index). Embedding dimension/model choice affects recall and cost.
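Before embedding documents into the vector store mentioned above, long text is typically split into overlapping chunks so each vector covers a coherent span. A minimal fixed-size chunker sketch (the sizes are illustrative; tune them to your embedding model’s input limits):

```python
def chunk_text(text, max_chars=800, overlap=100):
    """Split text into fixed-size chunks with overlap to preserve context."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap keeps sentences from being cut cold
    return chunks
```

Sentence- or heading-aware splitting usually retrieves better than raw character windows, but this fixed-size version is a reasonable baseline to measure against.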

4) Serving mode options (on-demand vs. dedicated) (verify availability)

  • What it does: Lets you choose shared on-demand inference or dedicated capacity, depending on what Oracle offers in your tenancy/region.
  • Why it matters: Production workloads often need predictable latency and throughput.
  • Practical benefit: Match cost and performance to workload shape.
  • Limitations/caveats: Dedicated capacity may require provisioning, quotas, or contracts.

5) OCI IAM integration (compartments + policies)

  • What it does: Controls who/what can call Generative AI and where.
  • Why it matters: Enterprise-grade access control and separation of duties.
  • Practical benefit: Enforce least privilege, environment separation (dev/test/prod), and auditing.
  • Limitations/caveats: Misconfigured policies are a common cause of “NotAuthorizedOrNotFound”.

6) OCI Audit visibility for API activity

  • What it does: Records calls to OCI services, including Generative AI API operations, in Audit logs.
  • Why it matters: Governance, incident response, and compliance.
  • Practical benefit: Track who accessed AI endpoints and when.
  • Limitations/caveats: Audit logs won’t automatically capture full prompts/responses; your application logging design matters.

7) SDK and API support (multi-language)

  • What it does: Enables integration via OCI SDKs and REST APIs.
  • Why it matters: Reduces custom auth code; consistent OCI patterns.
  • Practical benefit: Quick integration from Python/Java/Node/Go apps.
  • Limitations/caveats: SDK versions must match API versions; verify sample code against current SDK docs.

8) Compartment-based organization and governance

  • What it does: Lets you isolate AI usage by project/team/environment.
  • Why it matters: Cost tracking, least privilege, blast-radius control.
  • Practical benefit: Separate dev/test/prod access and budgets.
  • Limitations/caveats: You must design your compartment structure intentionally.

7. Architecture and How It Works

High-level service architecture

At a high level, your app (or a developer in the Console) sends an inference request to the Generative AI endpoint in a region. The request includes:

  • Authentication (OCI signed request via SDK/CLI)
  • Compartment context (where policies apply)
  • Model identifier (which model to run)
  • Prompt/input parameters (messages, temperature, max tokens, etc.)

The service returns a response:

  • Generated text or chat response
  • Embeddings vectors
  • Metadata (request id, token usage if provided by model/service; verify exact fields)

Request/data/control flow

  1. Identity: The caller authenticates using OCI IAM (user API key, instance principal, resource principal).
  2. Authorization: OCI IAM evaluates policies for the target compartment and the Generative AI service family.
  3. Inference: The service routes the request to the selected model/serving mode.
  4. Response: Output is returned over HTTPS to the client.
  5. Audit: OCI Audit records relevant API activity.

Integrations with related OCI services

Common integrations include:

  • OCI Object Storage: Store documents for RAG ingestion and audit-friendly retention.
  • OCI Functions: Event-driven summarization and extraction pipelines.
  • OKE (Kubernetes): Run your RAG backend and API services.
  • OCI Vault: Store secrets (API keys for external systems; OCI API keys usually live in config files—prefer principals when possible).
  • OCI Logging: Centralize app logs (prompt metadata, latency, errors) with redaction.
  • OCI Monitoring: Track application metrics (requests, latency, token usage).
  • Databases/Search: Store embeddings and content indexes (verify best-fit OCI offerings for vector storage in your environment).

Dependency services

Generative AI depends on standard OCI foundations:

  • IAM (users, groups, policies)
  • Networking (public endpoints; private connectivity patterns depend on your design)
  • Regional availability of the service and chosen models

Security/authentication model

OCI authentication options you’ll commonly use:

  • User principals (API keys): Good for developer testing; riskier for production if keys leak.
  • Instance principals: Best for OCI Compute-based apps (no long-lived keys).
  • Resource principals: Best for OCI Functions and some managed services.
  • OKE workload identity: If supported in your environment; otherwise use instance principals via node pools (verify current OCI guidance).

Authorization is controlled by policies granting access to the Generative AI service family in a compartment.

Networking model

  • Generative AI is typically accessed via HTTPS endpoints.
  • From private subnets, you may route outbound traffic via:
    • NAT Gateway for internet egress, or
    • Service Gateway for access to Oracle services on the Oracle Services Network (common OCI pattern; verify that Generative AI is reachable via service gateway in your region).

Monitoring/logging/governance considerations

  • Audit: Enable and review for access patterns.
  • Application logs: Store request ids, latency, model id, and token counts (if available).
  • Redaction: Avoid logging raw PII prompts/responses.
  • Tagging: Tag compartments/resources used by surrounding architecture for cost allocation.
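The redaction point above can start as simple pattern scrubbing applied to prompts and responses before they reach application logs. A minimal sketch (two example patterns only; real deployments need broader PII coverage and review):

```python
import re

# Example patterns only; extend for your data (IDs, account numbers, etc.)
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
]

def redact(text):
    """Replace common PII patterns before writing prompts/responses to logs."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Call this in your logging layer, not in the prompt path, so the model still sees the original text while logs stay scrubbed.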

Simple architecture diagram (Mermaid)

flowchart LR
  U[Developer / App] -->|OCI Auth + HTTPS| GAI[Oracle Cloud Generative AI<br/>Inference API]
  GAI --> R[Response<br/>Text / Embeddings]

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph VCN[OCI VCN]
    subgraph APP[Private Subnet]
      SVC[App Service<br/>API / RAG Backend]
      VS[(Vector Store<br/>DB/Search)]
      OBJ[(Object Storage<br/>Docs)]
    end
    NAT["NAT or Service Gateway<br/>(depends on design)"]
  end

  ID[IAM Policies<br/>Groups/Principals] --> SVC
  SVC -->|Retrieve docs| OBJ
  SVC -->|Embed + similarity search| VS
  SVC -->|OCI-signed HTTPS call| NAT --> GAI["Generative AI Inference API<br/>(Regional)"]
  GAI --> SVC

  SVC --> LOG[OCI Logging]
  SVC --> MET[OCI Monitoring Metrics]
  AUD[OCI Audit] --> SEC[Security Review / SIEM]

8. Prerequisites

Tenancy/account requirements

  • An Oracle Cloud tenancy with access to Analytics and AI services.
  • Generative AI service enabled/available in your tenancy and chosen region (availability may vary).

Permissions/IAM roles

You need permissions to:

  • Use Generative AI in a target compartment
  • Read model information (if required by the workflow)

OCI policy examples (verify exact policy verbs and service family name in official docs):

Allow group <group-name> to use generative-ai-family in compartment <compartment-name>

In some environments you may need broader permissions during setup, then tighten later. Always validate the least-privilege set that still works for your use case.

Billing requirements

  • A paid tenancy or an active billing account.
  • Free Tier applicability varies; do not assume Generative AI is covered. Verify on Oracle’s Free Tier and pricing pages.

Tools needed

Choose one approach:

  • Console-only (quick testing):
    • Oracle Cloud Console access
  • CLI/SDK lab (recommended for this tutorial):
    • OCI Cloud Shell (recommended) or local machine
    • Python 3.9+ (or a version supported by OCI Python SDK)
    • OCI Python SDK: https://docs.oracle.com/en-us/iaas/tools/python/latest/
    • OCI CLI (optional): https://docs.oracle.com/en-us/iaas/tools/oci-cli/latest/

Region availability

  • Select a region where Generative AI is available and where your chosen model is offered.
  • Confirm in the Console under Analytics & AI → Generative AI (or via official docs).

Quotas/limits

  • Service limits exist for request rates, concurrency, token sizes, and possibly dedicated capacity.
  • Review OCI service limits and request increases if needed (common for production).

Prerequisite services (for the hands-on lab)

  • IAM user or principal configuration
  • (Optional) Object Storage bucket if you extend the lab to RAG with documents

9. Pricing / Cost

Do not rely on blog posts for AI pricing. Always confirm on Oracle’s official pricing pages for your region and the specific model/SKU.

Current pricing model (how you’re billed)

Generative AI pricing is typically usage-based, and commonly depends on dimensions such as:

  • Model type (chat/generation vs embeddings; premium vs standard models)
  • Tokens processed (input tokens + output tokens) for text/chat models (common industry pattern; verify OCI’s exact metering units)
  • Characters/tokens for embeddings (verify metering units)
  • Serving mode (on-demand vs dedicated capacity) if both are available in your tenancy
  • Region (pricing differs by region)

Official pricing entry points:

  • Oracle Cloud price list: https://www.oracle.com/cloud/price-list/
  • Oracle Cloud cost estimator: https://www.oracle.com/cloud/costestimator.html

Search the price list for AI Services and Generative AI. If your organization has a contract, your negotiated rates may differ from list pricing.

Free tier

Oracle has a Free Tier program, but Generative AI may not be included or may have limited free usage depending on current offers. Verify at: https://www.oracle.com/cloud/free/

Cost drivers (what increases spend)

Direct drivers:

  • Larger prompts (more input tokens)
  • Larger outputs (more output tokens)
  • More calls (higher request volume)
  • Higher-priced models

Indirect drivers:

  • Vector storage costs (database/search service)
  • Object Storage costs for documents
  • Compute/runtime costs for your RAG service (Functions/Compute/OKE)
  • Logging retention costs if you store large payloads
  • Data egress if you send responses outside OCI (networking costs vary)

Network/data transfer implications

  • Calls to Generative AI are HTTPS; if your app runs outside OCI, you may pay egress from OCI or from your hosting provider depending on traffic direction.
  • If your app runs inside OCI, prefer OCI-native networking patterns to minimize public internet exposure.

How to optimize cost (practical)

  • Set max output tokens to a reasonable limit for each endpoint.
  • Summarize context before sending long documents; don’t stuff entire PDFs into prompts.
  • Use RAG: retrieve only the top relevant chunks rather than sending entire corpora.
  • Choose embeddings model appropriate for your recall/latency/cost needs.
  • Implement caching for repeated queries (prompt hash → response).
  • Track token usage (if exposed) and build cost dashboards at the app level.
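The caching suggestion above (prompt hash → response) can be a thin layer in front of the inference call. A minimal in-memory sketch; production versions typically use Redis or a database and add TTLs so stale answers expire:

```python
import hashlib
import json

class PromptCache:
    """Cache responses keyed by a hash of model id, prompt, and parameters."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model_id, prompt, params):
        # Canonical JSON so identical requests always hash the same way
        payload = json.dumps(
            {"model": model_id, "prompt": prompt, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model_id, prompt, params):
        return self._store.get(self._key(model_id, prompt, params))

    def put(self, model_id, prompt, params, response):
        self._store[self._key(model_id, prompt, params)] = response
```

Including generation parameters (temperature, max tokens) in the key matters: the same prompt with different parameters is a different request.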

Example low-cost starter estimate (no fabricated prices)

A small pilot’s cost is primarily a function of:

  • number of requests/day,
  • average input size,
  • average output size,
  • chosen model.

To estimate without guessing numbers:

  1. Pick a model in the OCI pricing list.
  2. Estimate average tokens in/out per request.
  3. Multiply by expected request volume.
  4. Add compute/logging/storage.
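That estimation is simple arithmetic once you have list prices. A sketch with placeholder rates; the zeros are deliberately not Oracle prices and must be filled in from the official price list, and the workload numbers are example assumptions:

```python
# Workload assumptions (examples; replace with your pilot's numbers):
requests_per_day = 500
avg_input_tokens = 1_200
avg_output_tokens = 300
days_per_month = 30

# Placeholder rates: look these up at https://www.oracle.com/cloud/price-list/
price_per_1k_input_tokens = 0.0
price_per_1k_output_tokens = 0.0

monthly_input_tokens = requests_per_day * avg_input_tokens * days_per_month
monthly_output_tokens = requests_per_day * avg_output_tokens * days_per_month

monthly_model_cost = (
    monthly_input_tokens / 1000 * price_per_1k_input_tokens
    + monthly_output_tokens / 1000 * price_per_1k_output_tokens
)
print(monthly_input_tokens, monthly_output_tokens, monthly_model_cost)
```

Remember this covers only model usage; add vector storage, compute, and logging as separate line items.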

Use Oracle’s cost estimator: https://www.oracle.com/cloud/costestimator.html

Example production cost considerations

For production, plan for:

  • peak request rates (capacity and potential dedicated serving)
  • multi-environment usage (dev/test/prod)
  • observability retention (log volume)
  • vector store scale (embedding count × dimension × indexing overhead)
  • A/B testing different models/prompts (can double or triple usage temporarily)


10. Step-by-Step Hands-On Tutorial

This lab is designed to be beginner-friendly, low-risk, and practical. You will:

  1. Confirm access and IAM.
  2. Identify a model to use.
  3. Call Generative AI from Python to summarize text and extract action items.
  4. Optionally generate embeddings for a few sample documents to power a tiny semantic search.
  5. Clean up.

If any UI labels or SDK classes differ in your tenancy, follow the closest equivalent in the official docs and SDK references.

Objective

Build a small command-line tool that uses Oracle Cloud Generative AI to:

  • summarize a support ticket conversation, and
  • extract action items,

then (optional) generate embeddings for three knowledge snippets and perform a local similarity match.

Lab Overview

  • Environment: Oracle Cloud Shell (recommended) or local machine
  • Auth: OCI config file (developer-friendly) or principals (production-friendly)
  • Service: Generative AI inference API
  • Cost control: small prompts, capped output tokens

Step 1: Confirm service availability and choose a compartment

  1. Log in to the Oracle Cloud Console.
  2. Select your target Region (top-right).
  3. Navigate to Analytics & AI → Generative AI (exact navigation can vary).
  4. Pick (or create) a compartment for the lab, for example: ai-labs.

Expected outcome – You can open the Generative AI page in the Console and select a compartment.

Verification – If the service isn’t visible, confirm:

  • your region supports it,
  • your tenancy is entitled to it,
  • your IAM user has permissions.

Step 2: Create or confirm IAM permissions

You need permission to call Generative AI in the compartment.

  1. In Console, go to Identity & Security → IAM → Policies.
  2. Create a policy in the root compartment (or appropriate parent) that grants your group access.

Example (verify exact wording in official docs for your tenancy):

Allow group <your-group> to use generative-ai-family in compartment ai-labs

If you plan to list models or manage related resources, you might need broader permissions, but start with least privilege.

Expected outcome – Policy is created and attached.

Verification

  • Wait a minute (OCI policy propagation can take a short time).
  • You should be able to access Generative AI pages for the compartment.

Step 3: Set up authentication for SDK calls (OCI config)

Use one of these approaches:

Option A (recommended for labs): Cloud Shell + OCI config

Oracle Cloud Shell typically comes with the OCI CLI configured for the logged-in user. Depending on your setup, you may still need an API key for SDK use.

  1. Open Cloud Shell from the Console.
  2. Check if ~/.oci/config exists:
ls -la ~/.oci
cat ~/.oci/config

If it does not exist or is incomplete, create an API key for your IAM user:

  • Console → Identity & Security → IAM → Users → (your user) → API Keys
  • Add API key and download the private key
  • Save it securely and update ~/.oci/config

A minimal ~/.oci/config profile looks like this:

[DEFAULT]
user=ocid1.user.oc1..exampleuniqueID
fingerprint=aa:bb:cc:dd:...
tenancy=ocid1.tenancy.oc1..exampleuniqueID
region=us-ashburn-1
key_file=/home/<you>/.oci/oci_api_key.pem

Keep the private key file permissions restricted (chmod 600).

Option B (production pattern): Instance principals or resource principals

For production, prefer instance principals (Compute) or resource principals (Functions) to avoid distributing long-lived keys. Implementation differs by runtime; verify with OCI docs.

Expected outcome – You have working OCI authentication for SDK calls.

Verification – Run a simple CLI command (optional):

oci os ns get

If this works, your identity/auth is likely set.

Step 4: Identify a model to call (model OCID/identifier)

In the Console:

  1. Go to Analytics & AI → Generative AI.
  2. Find the Models list/catalog (UI naming varies).
  3. Choose a model suitable for chat/summarization.
  4. Copy the model identifier (often an OCID or a model id string).

Save it in an environment variable in Cloud Shell:

export COMPARTMENT_OCID="ocid1.compartment.oc1..example"
export MODEL_ID="ocid1.generativeaimodel.oc1..example"   # verify format in your console

Expected outcome – You have a compartment OCID and a model identifier for inference.

Verification – Double-check there are no extra spaces or quotes.

Step 5: Create a Python virtual environment and install OCI SDK

In Cloud Shell:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install oci

Expected outcomeoci Python package installed.

Verification

python -c "import oci; print(oci.__version__)"

Step 6: Call Generative AI to summarize and extract action items

Create a file named genai_summarize.py:

import os
import sys
import oci

def main():
    compartment_id = os.environ.get("COMPARTMENT_OCID")
    model_id = os.environ.get("MODEL_ID")
    if not compartment_id or not model_id:
        print("Set COMPARTMENT_OCID and MODEL_ID environment variables.")
        sys.exit(1)

    config = oci.config.from_file()  # uses ~/.oci/config [DEFAULT]

    # NOTE: OCI SDK module/class names can change across versions.
    # Verify current Generative AI Inference SDK in official docs if this import fails.
    from oci.generative_ai_inference import GenerativeAiInferenceClient
    from oci.generative_ai_inference.models import (
        ChatDetails,
        OnDemandServingMode,
        CohereChatRequest,  # model request schema depends on provider/model; verify in docs
    )

    client = GenerativeAiInferenceClient(config)

    ticket_text = """
Subject: VPN access failing

User: I can't connect to VPN since yesterday. Error: TLS handshake failed.
Agent: Are you on home Wi-Fi or mobile hotspot?
User: Home Wi-Fi. It worked last week.
Agent: Please confirm your client version.
User: 5.1.2
Agent: We recently rotated certificates. Try updating to 5.1.4 and re-import the new profile.
User: Update done, still fails.
Agent: We'll check if your account is blocked and reset your VPN profile. Also please try from hotspot to rule out ISP interference.
"""

    prompt = f"""
You are an IT support assistant. Summarize the ticket and extract action items.
Return the result in plain text with two sections:

Summary:
- (3-5 bullets)

Action Items:
- (bulleted list, each item starts with an owner: User/Agent/SRE)
Ticket text:
{ticket_text}
"""

    details = ChatDetails(
        compartment_id=compartment_id,
        serving_mode=OnDemandServingMode(model_id=model_id),
        chat_request=CohereChatRequest(
            message=prompt,
            temperature=0.2,
            max_tokens=400,
        ),
    )

    resp = client.chat(details)
    # Response fields vary; print the whole data structure safely.
    print(resp.data)

if __name__ == "__main__":
    main()

Run it:

python genai_summarize.py

Expected outcome – The script prints a structured response object containing the model output (summary + action items).

Verification
– Confirm the output contains your two requested sections and references the ticket content.
– If the SDK returns a nested object, you may need to print a specific text field. Use print(resp.data) first, then adjust.
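If you prefer not to eyeball the nested object, a small helper can walk it and return the first substantial string field. This is a sketch of our own: extract_text and its 40-character heuristic are not part of the OCI SDK, and you should still verify the real field path for your SDK version and model provider.

```python
def extract_text(obj, depth=0, max_depth=6):
    """Walk a response object/dict/list and return the first long string field.

    Works on plain dicts, lists, and SDK model objects (via __dict__); the
    40-character threshold is a heuristic to skip ids and short labels.
    """
    if depth > max_depth:
        return None
    if isinstance(obj, str):
        return obj if len(obj) > 40 else None
    if isinstance(obj, dict):
        items = obj.values()
    elif isinstance(obj, (list, tuple)):
        items = obj
    elif hasattr(obj, "__dict__"):
        items = vars(obj).values()
    else:
        return None
    for item in items:
        found = extract_text(item, depth + 1, max_depth)
        if found:
            return found
    return None
```

Usage: call `print(extract_text(resp.data))` after the chat call; fall back to `print(resp.data)` if the heuristic picks the wrong field.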

Step 7 (Optional): Generate embeddings and do a tiny local semantic search

This step demonstrates the embeddings workflow without provisioning a database. You’ll embed three short knowledge snippets, embed a query, compute cosine similarity locally, and print the best match.

Create genai_embeddings_search.py:

import os
import sys
import math
import oci

def cosine(a, b):
    dot = sum(x*y for x, y in zip(a, b))
    na = math.sqrt(sum(x*x for x in a))
    nb = math.sqrt(sum(y*y for y in b))
    return dot / (na * nb + 1e-12)

def main():
    compartment_id = os.environ.get("COMPARTMENT_OCID")
    model_id = os.environ.get("MODEL_ID_EMBED") or os.environ.get("MODEL_ID")
    if not compartment_id or not model_id:
        print("Set COMPARTMENT_OCID and MODEL_ID_EMBED (or MODEL_ID).")
        sys.exit(1)

    config = oci.config.from_file()

    # NOTE: Verify current SDK names/schemas for embeddings in official docs.
    from oci.generative_ai_inference import GenerativeAiInferenceClient
    from oci.generative_ai_inference.models import (
        EmbedTextDetails,
        OnDemandServingMode,
        CohereEmbedRequest,  # depends on selected embeddings model/provider
    )

    client = GenerativeAiInferenceClient(config)

    docs = [
        ("vpn_profile_reset", "To fix VPN TLS handshake failures after certificate rotation, reset the VPN profile and import the new configuration."),
        ("client_upgrade", "If the VPN client is outdated, upgrade to the latest approved version and reboot before reconnecting."),
        ("network_isolation", "Test from a mobile hotspot to isolate ISP or home router issues when corporate VPN fails."),
    ]

    query = "VPN TLS handshake failed after certificate changes. What should I do first?"

    def embed(texts):
        details = EmbedTextDetails(
            compartment_id=compartment_id,
            serving_mode=OnDemandServingMode(model_id=model_id),
            embed_text_request=CohereEmbedRequest(
                texts=texts,
                input_type="search_document"  # verify allowed values; queries may need a query-specific input type (e.g., "search_query")
            ),
        )
        resp = client.embed_text(details)
        return resp.data

    doc_vectors_resp = embed([d[1] for d in docs])
    query_vector_resp = embed([query])

    # Response parsing varies. Inspect resp.data shape if needed.
    # The following assumes a structure like: resp.data.embeddings = [[...], [...]]
    doc_vectors = doc_vectors_resp.embeddings
    query_vec = query_vector_resp.embeddings[0]

    scored = []
    for (doc_id, _), vec in zip(docs, doc_vectors):
        scored.append((cosine(query_vec, vec), doc_id))
    scored.sort(reverse=True)

    print("Query:", query)
    print("Top match:", scored[0])

if __name__ == "__main__":
    main()

Run:

python genai_embeddings_search.py

Expected outcome – The script prints the query and the best-matching snippet id (likely vpn_profile_reset or client_upgrade).

Verification – If it fails due to response field names, print the entire response objects and adjust field access.

Validation

You have successfully completed the lab if you can:

  • Call Generative AI and receive a summary/action items response.
  • (Optional) Generate embeddings and compute a local similarity match.
  • See successful requests in your application output and (optionally) OCI Audit logs.

To check OCI Audit (high level):
– Console → Identity & Security → Audit
– Filter by your user and time window, and look for Generative AI-related events (service naming may vary).

Troubleshooting

1) NotAuthorizedOrNotFound – Likely IAM policy missing or in wrong compartment. – Fix: verify the policy is in the correct parent compartment and references the right group and compartment.

2) Service not visible in Console – Region may not support Generative AI or your tenancy may not be enabled. – Fix: switch regions; verify entitlement and docs.

3) Python import errors for oci.generative_ai_inference – OCI Python SDK version may be old. – Fix: upgrade SDK: pip install --upgrade oci – Verify SDK docs: https://docs.oracle.com/en-us/iaas/tools/python/latest/

4) Model id invalid – You might be using the wrong identifier format (OCID vs model string). – Fix: copy the model identifier from the Generative AI model list for your region.

5) Request too large / token limit errors – Prompt/context too long. – Fix: reduce text, chunk documents, cap max_tokens, and use RAG retrieval to send only relevant chunks.

Cleanup

To avoid ongoing cost:
– Delete any optional resources you created (Object Storage bucket, Functions, databases) if you extended the lab.
– Remove the local virtual environment if desired:

deactivate
rm -rf .venv
rm -f genai_summarize.py genai_embeddings_search.py

For security hygiene:
– Rotate/revoke user API keys used for testing if they’re no longer needed.
– Prefer principals (instance/resource) for production.


11. Best Practices

Architecture best practices

  • Use RAG for enterprise knowledge: Don’t rely on the model’s latent knowledge for internal policies. Retrieve relevant text and ground the answer.
  • Chunk documents deliberately: 300–1,000 tokens per chunk is a common starting point; tune based on retrieval quality.
  • Add citations: Store chunk ids/URLs and ask the model to cite them; display citations to users.
  • Implement fallbacks: If the model call fails or times out, degrade gracefully (keyword search, cached answer).
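The chunking guidance above can be sketched with a simple fixed-size chunker. This is a hypothetical helper, not a library function, and the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer; production pipelines usually split on sentence or heading boundaries instead.

```python
def chunk_text(text, max_tokens=500, overlap_tokens=50):
    """Split text into overlapping fixed-size chunks.

    Uses ~4 characters per token as an approximation; overlap_tokens must be
    smaller than max_tokens so the window always advances.
    """
    chars_per_token = 4
    size = max_tokens * chars_per_token
    step = size - overlap_tokens * chars_per_token
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]
```

Tune both parameters against retrieval quality on your own documents rather than treating the defaults as correct.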

IAM/security best practices

  • Least privilege policies: Limit access to only required compartments and actions.
  • Prefer principals over API keys: Instance principals/resource principals reduce secret sprawl.
  • Separate environments: Use separate compartments for dev/test/prod and separate policies.
  • Use Vault for non-OCI secrets: Store DB passwords, external API keys, etc.

Cost best practices

  • Control output length with max_tokens.
  • Avoid sending entire documents; retrieve top passages only.
  • Cache embeddings and reuse them; don’t re-embed unchanged documents.
  • Track cost per feature: measure requests/user/day and tokens/request.
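Caching embeddings by content hash is one way to avoid re-embedding unchanged documents. In this sketch, cached_embedding is a hypothetical helper and embed_fn stands in for a real Generative AI embeddings call; a production cache would live in a database or object store rather than process memory.

```python
import hashlib

_cache = {}  # content-hash -> embedding vector (in-memory for illustration)

def cached_embedding(text, embed_fn):
    """Return a cached embedding for text, calling embed_fn only on a miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(text)
    return _cache[key]
```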

Performance best practices

  • Batch embedding requests where supported to reduce overhead.
  • Keep context tight: shorter prompts reduce latency.
  • Use asynchronous pipelines for bulk summarization jobs.
  • Plan rate limiting: protect the service and your budget from accidental loops.

Reliability best practices

  • Timeouts and retries: Use bounded retries with jitter for transient errors.
  • Circuit breakers: Disable model calls temporarily if error rates spike.
  • Multi-region planning (advanced): If your product needs HA across regions, design for it at the app layer (verify service parity across regions).
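Bounded retries with jitter can be sketched in a few lines. This is an illustrative wrapper of our own (call_with_retries is not an SDK function); in production you would catch transient service errors specifically (e.g., HTTP 429/5xx) instead of bare Exception, and cap total elapsed time.

```python
import random
import time

def call_with_retries(fn, max_attempts=3, base_delay=0.5):
    """Call fn with exponential backoff and full jitter; re-raise on final failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Sleep a random amount up to base_delay * 2^(attempt-1).
            time.sleep(random.uniform(0, base_delay * (2 ** (attempt - 1))))
```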

Operations best practices

  • Log request ids returned by OCI/SDK for support.
  • Redact sensitive content from logs.
  • Monitor latency and error rates at the application level.
  • Tag resources used by the broader solution for chargeback.

Governance/tagging/naming best practices

  • Consistent compartment naming: prod-ai, dev-ai, shared-ai
  • Tags: CostCenter, DataSensitivity, Owner, Environment
  • Maintain a “model registry” document: which models are allowed for which data types.

12. Security Considerations

Identity and access model

  • OCI IAM controls access to Generative AI APIs.
  • Use:
    • groups + policies for humans,
    • dynamic groups + instance principals for compute,
    • resource principals for Functions.

Key principle: the app identity should have only the minimum permissions to call Generative AI in the required compartment(s).
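As a hedged illustration of this principle, a policy along the following lines grants a single group inference access in one compartment. The group and compartment names here are hypothetical, and you should verify the exact resource-type and verb names (e.g., generative-ai-family) in the OCI IAM documentation for your tenancy:

```text
Allow group genai-app-users to use generative-ai-family in compartment dev-ai
```

Scoping to a single compartment and the narrow "use" verb keeps the application identity from managing service resources it only needs to call.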

Encryption

  • Data in transit is protected by HTTPS/TLS.
  • For stored data in your architecture (documents, embeddings, logs), enable OCI encryption features (default encryption is common for OCI storage services; verify service-specific encryption options).

Network exposure

  • Prefer private networking patterns:
  • Run apps in private subnets.
  • Use Service Gateway/NAT as appropriate.
  • Avoid exposing internal RAG endpoints publicly without authentication and WAF protections.

Secrets handling

  • Avoid hardcoding:
    • OCI private keys
    • database credentials
    • API tokens for external systems
  • Use OCI Vault for secrets and rotate them regularly.
  • For production, avoid long-lived user API keys where possible.

Audit/logging

  • Use OCI Audit for governance of API calls.
  • Keep application-level logs focused on:
    • request id,
    • model id,
    • latency,
    • token counts (if available),
    • coarse metadata (not raw PII).
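A log sanitizer that enforces this allow-list is easy to sketch. The loggable function and its field names are illustrative choices, not an OCI API; adapt the allow-list to whatever metadata your application actually records.

```python
def loggable(event):
    """Return a log-safe view of an event: allow-listed metadata only.

    The raw prompt is never copied through; only its length is recorded.
    """
    safe = {k: event[k] for k in ("request_id", "model_id", "latency_ms") if k in event}
    if "prompt" in event:
        safe["prompt_chars"] = len(event["prompt"])  # length only, never the text
    return safe
```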

Compliance considerations

  • Data classification matters:
    • Don’t send regulated data to any model endpoint unless your policies and Oracle’s terms explicitly permit it.
  • Ensure your solution supports:
    • retention policies,
    • deletion workflows,
    • access reviews.

Common security mistakes

  • Logging full prompts/responses that include PII/secrets.
  • Using a broad policy like “manage all-resources in tenancy” for an app.
  • Sharing one API key across multiple apps/teams.
  • No rate limiting → runaway costs and potential denial-of-wallet incidents.

Secure deployment recommendations

  • Start with a reference threat model:
    • who can submit prompts,
    • what data can be retrieved,
    • where logs go,
    • how to detect misuse.
  • Add content filters and input validation:
    • prompt injection defenses (strip or isolate instructions from retrieved docs),
    • allow-lists for retrieval sources,
    • user authorization checks before retrieving documents.
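One simple defense is to wrap retrieved chunks in clear delimiters and drop lines that look like instructions to the model. This is a heuristic sketch only (wrap_retrieved and the SUSPECT phrase list are ours); it must be combined with allow-listed sources and authorization checks, not used alone.

```python
# Phrases that commonly signal injected instructions in retrieved content.
SUSPECT = ("ignore previous", "disregard", "system prompt", "you are now")

def wrap_retrieved(chunks):
    """Delimit retrieved chunks and drop instruction-like lines (heuristic)."""
    cleaned = []
    for chunk in chunks:
        lines = [l for l in chunk.splitlines()
                 if not any(s in l.lower() for s in SUSPECT)]
        cleaned.append("<retrieved-document>\n" + "\n".join(lines) + "\n</retrieved-document>")
    return "\n".join(cleaned)
```

Pair this with a system prompt that tells the model to treat delimited content strictly as reference material, never as instructions.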

13. Limitations and Gotchas

These are common constraints; verify exact limits in official Oracle Cloud documentation for Generative AI.

  • Regional availability: Not all regions support the service or the same models.
  • Model availability changes: Models can be added/updated; behavior and output quality may shift.
  • Token/context limits: Each model has max input/output sizes; large docs must be chunked.
  • Latency variability: Shared on-demand serving can vary under load; dedicated options may be needed.
  • IAM propagation delay: New policies can take time to apply.
  • SDK/API versioning: Example code may break if SDK modules/classes change; pin versions in production.
  • Prompt injection in RAG: Retrieved documents can contain malicious instructions. Treat retrieved text as untrusted.
  • Data leakage through logs: Over-verbose logging can become your biggest security issue.
  • Cost surprises: Long prompts + long outputs + high volume = fast cost growth. Add hard limits.
  • Environment drift: Different compartments/regions can have different model catalogs; document your dependencies.
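The "add hard limits" advice for cost and token-limit errors can be as simple as capping input size before a call. This is a hypothetical guard using the same rough 4-characters-per-token approximation as earlier; a real implementation would use the model's tokenizer and budget output tokens too.

```python
def cap_prompt(prompt, max_input_tokens=3000):
    """Truncate a prompt to an approximate token budget (~4 chars/token)."""
    limit = max_input_tokens * 4
    return prompt if len(prompt) <= limit else prompt[:limit]
```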

14. Comparison with Alternatives

Nearest services in Oracle Cloud

  • OCI Data Science: For building/training/hosting your own models and MLOps pipelines (more control, more ops).
  • Other OCI AI Services (Language, Vision, Speech, etc.): Task-specific APIs; may be more predictable for certain workloads than general LLM prompting.
  • OCI Search / Database services: Not generative, but critical for RAG storage/retrieval.

Nearest services in other clouds

  • AWS: Amazon Bedrock (managed foundation model access) and SageMaker (custom ML).
  • Azure: Azure OpenAI Service (managed OpenAI models) and Azure AI Foundry/ML.
  • Google Cloud: Vertex AI (Gemini + model garden, embeddings, MLOps).

Open-source/self-managed alternatives

  • Self-host LLM inference on OCI Compute GPU instances (or Kubernetes with GPU nodes) using vLLM/TGI/Ollama (operationally heavier, potentially cost-effective at scale).
  • Open-source embedding models with self-managed vector DB (more control, more setup).

Comparison table

| Option | Best For | Strengths | Weaknesses | When to Choose |
| --- | --- | --- | --- | --- |
| Oracle Cloud Generative AI | OCI-native apps needing managed LLM/embeddings | OCI IAM/compartments, managed inference, enterprise governance | Model/region availability constraints; usage-based costs can spike | You want managed inference with OCI security/governance |
| OCI Data Science (self-host inference) | Teams needing custom models/control | Full control, custom serving, can optimize cost at high scale | GPU ops complexity, scaling, patching | You need a specific model or custom serving behavior |
| AWS Bedrock | AWS-centric foundation model consumption | Broad model marketplace, strong ecosystem | AWS IAM/networking alignment needed; cross-cloud adds complexity | Your platform is primarily on AWS |
| Azure OpenAI | Microsoft-centric apps needing OpenAI models | Strong enterprise integration, tooling | Model/provider constraints; region capacity considerations | You’re standardized on Azure and need OpenAI APIs |
| Google Vertex AI | Google Cloud AI platform users | Integrated MLOps + foundation models | Cross-cloud complexity if you’re on OCI | Your data and apps are on GCP |
| Self-managed open-source (vLLM on GPUs) | Cost/latency control at steady high volume | Full control, no per-token managed fees | Significant ops/security burden | You have ML infra maturity and predictable workload |

15. Real-World Example

Enterprise example: Policy-aware employee assistant (RAG)

  • Problem: A large enterprise has thousands of internal policies and runbooks. Employees ask repetitive questions, and the helpdesk is overloaded.
  • Proposed architecture:
    • Store documents in OCI Object Storage.
    • Extract text and chunk documents.
    • Generate embeddings with Generative AI embeddings.
    • Store vectors in an enterprise-approved vector store (database/search).
    • A backend service on OKE calls:
      • the vector store for top-k chunks,
      • a Generative AI chat model with a grounded prompt that includes the retrieved chunks.
    • Logs go to OCI Logging; access is tracked via OCI Audit.
  • Why Generative AI was chosen:
    • Managed inference reduces GPU operations.
    • OCI IAM and compartments align with enterprise governance.
  • Expected outcomes:
    • Reduced mean time to answer internal questions.
    • Lower helpdesk ticket volume.
    • Auditable access patterns via IAM + Audit.

Startup/small-team example: Customer support copilot

  • Problem: A startup needs to respond quickly to customer emails and chats with a consistent tone and accurate product info.
  • Proposed architecture:
    • a lightweight backend on OCI Compute (or Functions),
    • the product FAQ stored in a small database, plus embeddings for semantic retrieval,
    • Generative AI used to draft replies, with a human approval step.
  • Why Generative AI was chosen:
    • A small team avoids managing model servers.
    • Usage-based pricing fits early-stage variability.
  • Expected outcomes:
    • Faster first-response time.
    • Consistent messaging.
    • Easy iteration on prompts without redeploying infrastructure.

16. FAQ

1) Is Generative AI in Oracle Cloud the same as Oracle Database AI features?
No. Generative AI is a managed inference service under Analytics and AI. Oracle Database has separate AI-related features (including vector and ML capabilities depending on edition/version). Use the right tool for inference vs. storage/query.

2) Do I need GPUs to use Generative AI?
No. Oracle hosts the model inference infrastructure. You call it via API.

3) Is Generative AI regional in OCI?
Typically yes—choose a region endpoint. Verify region support in official docs and the Console.

4) How do I control who can use Generative AI?
Use OCI IAM policies scoped to compartments and least privilege.

5) Can I use instance principals instead of user API keys?
Yes, and it’s recommended for production when your app runs on OCI Compute. Use resource principals for Functions. Verify the exact setup in OCI IAM docs.

6) What’s the difference between embeddings and chat models?
Embeddings convert text to vectors for similarity search. Chat models generate text responses. RAG uses both.

7) How do I reduce hallucinations?
Use RAG grounding, require citations, constrain prompts, validate outputs, and add human review for critical workflows.

8) Can I send sensitive data in prompts?
Only if your security/compliance policies allow it and Oracle’s service terms and data handling commitments meet your requirements. Verify with official docs and your legal/security team.

9) How do I estimate cost?
Estimate volume and token usage, then apply Oracle’s price list for the specific model and region. Use the OCI cost estimator.
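A back-of-envelope version of this estimate is simple arithmetic. The $0.50 per million tokens below is a HYPOTHETICAL placeholder, not an Oracle price; substitute the real per-model rate from the official price list.

```python
def monthly_cost(requests_per_day, tokens_per_request, price_per_million=0.50):
    """Rough monthly cost: 30 days of traffic at a hypothetical per-token rate."""
    tokens = requests_per_day * 30 * tokens_per_request
    return tokens / 1_000_000 * price_per_million
```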

10) What should I log in production?
Log request ids, latency, model id, and high-level metadata. Avoid logging raw prompts/responses unless redacted and explicitly approved.

11) How do I choose a model?
Start with a model suited for your task (summarization/chat vs embeddings). Benchmark quality, latency, and cost with representative prompts.

12) Can I do RAG without a vector database?
For tiny datasets, yes (in-memory vectors like the lab). For production, use a durable store optimized for vector search.

13) What are common IAM errors?
Wrong compartment, missing policy, or using the wrong principal. Also allow time for policy propagation.

14) How do I handle prompt injection in RAG?
Treat retrieved text as untrusted. Use system prompts that reject tool override instructions, filter retrieved content, and enforce authorization checks before retrieval.

15) Can I use Generative AI from outside OCI?
Yes if networking and IAM allow it, but consider egress costs and security posture. Many production teams keep the calling service inside OCI.

16) Is there a playground in the Console?
Often there is a UI for testing prompts, but UI availability can change. Verify in your region’s Console.

17) Does Generative AI support dedicated capacity?
Some managed AI services offer on-demand vs dedicated serving. Availability and setup vary—verify in official docs and your tenancy options.


17. Top Online Resources to Learn Generative AI

| Resource Type | Name | Why It Is Useful |
| --- | --- | --- |
| Official documentation | Oracle Cloud Generative AI docs | Primary source for features, limits, IAM, and API usage: https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm |
| Official pricing | Oracle Cloud Price List | Find the Generative AI SKUs and pricing dimensions: https://www.oracle.com/cloud/price-list/ |
| Pricing tool | Oracle Cloud Cost Estimator | Build rough estimates for PoCs and production: https://www.oracle.com/cloud/costestimator.html |
| Free tier info | Oracle Cloud Free Tier | Check whether any AI usage is included (often limited/changes): https://www.oracle.com/cloud/free/ |
| SDK docs | OCI Python SDK docs | SDK install/auth patterns and examples: https://docs.oracle.com/en-us/iaas/tools/python/latest/ |
| CLI docs | OCI CLI docs | Useful for environment validation and automation: https://docs.oracle.com/en-us/iaas/tools/oci-cli/latest/ |
| IAM concepts | OCI IAM documentation | Policies, compartments, principals, audit basics: https://docs.oracle.com/en-us/iaas/Content/Identity/home.htm |
| Observability | OCI Audit documentation | How to review service activity for governance: https://docs.oracle.com/en-us/iaas/Content/Audit/home.htm |
| Official GitHub org | oracle on GitHub | SDK source and official repos: https://github.com/oracle |
| Official samples hub | oracle-samples on GitHub | Look for OCI AI/Generative AI samples (verify repo relevance): https://github.com/oracle-samples |
| Architecture guidance | Oracle Architecture Center | Reference architectures and patterns (search for “generative ai”): https://docs.oracle.com/en/solutions/ |

18. Training and Certification Providers

| Institute | Suitable Audience | Likely Learning Focus | Mode | Website |
| --- | --- | --- | --- | --- |
| DevOpsSchool.com | Developers, DevOps, platform teams | Cloud + DevOps practices; may include OCI and AI ops topics (verify offerings) | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate engineers | DevOps/SCM and automation foundations; may complement OCI deployments | Check website | https://www.scmgalaxy.com/ |
| CLoudOpsNow.in | Cloud engineers, operations | Cloud operations and deployment practices (verify OCI modules) | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, reliability engineers | SRE practices for production systems (useful for AI services ops) | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops + AI/automation practitioners | AIOps concepts, monitoring, automation around AI workloads | Check website | https://www.aiopsschool.com/ |
19. Top Trainers

| Platform/Site | Likely Specialization | Suitable Audience | Website |
| --- | --- | --- | --- |
| RajeshKumar.xyz | Cloud/DevOps training content (verify specifics) | Engineers seeking hands-on guidance | https://www.rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training and mentoring (verify course list) | Beginners to intermediate DevOps engineers | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Consulting/training platform (verify services) | Teams needing short-term enablement | https://www.devopsfreelancer.com/ |
| devopssupport.in | Operational support and training resources (verify) | Ops/SRE teams | https://www.devopssupport.in/ |

20. Top Consulting Companies

| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website |
| --- | --- | --- | --- | --- |
| cotocus.com | Cloud/DevOps/engineering consulting (verify offerings) | Architecture, implementation, operations | Deploy an OCI-based RAG service; implement CI/CD and observability | https://cotocus.com/ |
| DevOpsSchool.com | DevOps and cloud consulting/training (verify offerings) | Enablement, platform rollout | Establish IAM/compartments, build deployment templates, run workshops | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting services (verify offerings) | Delivery and operationalization | Build production pipelines, monitoring, incident response processes | https://www.devopsconsulting.in/ |

21. Career and Learning Roadmap

What to learn before this service

  • OCI fundamentals:
    • tenancy, compartments, regions
    • IAM users, groups, policies
    • VCN basics (subnets, gateways)
  • API basics:
    • REST concepts, auth, SDK usage
  • Security fundamentals:
    • least privilege, secrets management, logging hygiene
  • Basic AI concepts:
    • tokens, prompts, temperature, embeddings, vector search

What to learn after this service

  • RAG system design:
    • chunking strategies, evaluation, citations
  • Vector databases/search:
    • indexing, recall/precision tuning, hybrid search
  • Production operations:
    • SLOs for AI features, drift monitoring, prompt versioning
  • Governance:
    • data classification, privacy engineering, red-team testing for prompt injection

Job roles that use it

  • Cloud engineer / platform engineer (governance + deployment)
  • Solutions architect (end-to-end AI architecture)
  • Backend developer (API integrations)
  • DevOps/SRE (reliability, monitoring, cost controls)
  • Security engineer (data protection and IAM)
  • Data engineer (pipelines and indexing)

Certification path (if available)

Oracle’s certification catalog changes. Check Oracle University / Oracle training for:
– OCI foundations
– OCI architect tracks
– AI/analytics tracks (if offered)

Start here and search for OCI certifications: – https://education.oracle.com/

Project ideas for practice

  1. Build a document summarization pipeline for Object Storage uploads.
  2. Build a RAG assistant for operational runbooks with citations and access control.
  3. Implement cost controls: per-user quotas + dashboards for token usage.
  4. Add prompt injection defenses and test with a red-team prompt set.
  5. Build an agent-assist tool that drafts replies with required policy citations.

22. Glossary

  • Compartment (OCI): A logical container for organizing and isolating cloud resources with IAM policies.
  • IAM Policy (OCI): A rule defining who can do what in which scope (tenancy/compartment).
  • Principal: An identity that can authenticate (user, instance principal, resource principal).
  • Prompt: Input instructions and context sent to a generative model.
  • Token: A unit of text used for metering and model processing (not exactly a word).
  • Temperature: A parameter controlling randomness in model output (higher = more varied).
  • Embeddings: Vector representations of text used for semantic similarity search.
  • Vector store: A database/index optimized for storing and searching vectors by similarity.
  • RAG (Retrieval-Augmented Generation): Pattern that retrieves relevant documents and uses them as context for generation.
  • Prompt injection: An attack where malicious instructions are embedded in content to override system behavior.
  • Least privilege: Security principle of granting only required access and nothing more.
  • On-demand serving: Shared capacity model where you pay by usage (term may vary; verify OCI terminology).
  • Dedicated serving/capacity: Provisioned capacity for predictable performance (availability varies).

23. Summary

Generative AI in Oracle Cloud (Analytics and AI) is a managed service for calling foundation models—commonly for chat/text generation and embeddings—using OCI-native IAM, compartments, and audit capabilities. It matters because it enables teams to deliver AI-powered features quickly without operating GPU-based inference infrastructure, while still fitting enterprise governance and security patterns.

Cost is primarily driven by model choice, token/embedding volume, and request rates, with indirect costs from vector storage, compute runtimes, logging, and network egress. Security success depends on least-privilege IAM, careful secrets handling, controlled logging/redaction, and robust RAG defenses against prompt injection and data leakage.

Use Generative AI when you want managed inference tightly integrated with OCI security and operations. For the next learning step, build a small RAG service: embeddings + vector search + grounded prompting—then add production controls (rate limiting, monitoring, cost budgets, and security reviews) before scaling to real users.