{"id":838,"date":"2026-04-16T08:42:04","date_gmt":"2026-04-16T08:42:04","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-generative-ai-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai\/"},"modified":"2026-04-16T08:42:04","modified_gmt":"2026-04-16T08:42:04","slug":"oracle-cloud-generative-ai-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-generative-ai-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai\/","title":{"rendered":"Oracle Cloud Generative AI Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics and AI"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Analytics and AI<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Oracle Cloud <strong>Generative AI<\/strong> is a managed service in the <strong>Analytics and AI<\/strong> portfolio that lets you call large language models (LLMs) and related foundation models through APIs and the Oracle Cloud Console. You use it to generate text, chat, summarize, extract information, and create embeddings for semantic search and retrieval-augmented generation (RAG), without standing up and operating model-serving infrastructure yourself.<\/p>\n\n\n\n<p>In simple terms: <strong>you send a prompt (and optional context) to Generative AI, and it returns a model response<\/strong>. You pay for usage (pricing varies by model and region), and Oracle Cloud handles capacity, patching, and the service control plane.<\/p>\n\n\n\n<p>Technically, Generative AI exposes model inference endpoints (and associated request\/response schemas) secured by Oracle Cloud Infrastructure (OCI) Identity and Access Management (IAM). 
Your application authenticates with OCI (API keys, instance principals, resource principals, etc.), calls the Generative AI inference API in a specific region and compartment scope, and receives outputs such as generated text or vector embeddings. You typically integrate it with data sources (Object Storage, databases, search engines), application runtimes (Functions, Kubernetes), and observability (Logging, Audit) to build production systems.<\/p>\n\n\n\n<p>The service solves the problem of <strong>reliably consuming foundation models in enterprise environments<\/strong>, with OCI IAM, compartments, policies, audit trails, and architecture patterns that platform teams can govern, while reducing the operational burden of self-hosting GPUs and model servers.<\/p>\n\n\n\n<blockquote>\n<p>Naming note (verify in official docs): Oracle\u2019s official documentation often refers to the service as <strong>\u201cOCI Generative AI\u201d<\/strong>. In the Console it may appear as <strong>\u201cGenerative AI\u201d<\/strong> under Analytics &amp; AI. This tutorial uses <strong>Generative AI<\/strong> as the primary service name and calls out OCI-specific terms where required.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is Generative AI?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose (what it\u2019s for)<\/h3>\n\n\n\n<p>Generative AI in Oracle Cloud is a managed service to <strong>run inference<\/strong> on supported generative models (for example, chat\/text generation models and embedding models) using <strong>OCI-native security and governance<\/strong>. 
It is intended for building applications such as assistants, document summarizers, knowledge search, content generation, and automation workflows.<\/p>\n\n\n\n<p>For the most current statement of scope and supported model families, verify in official documentation:\n&#8211; https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/generative-ai\/home.htm<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities (what you can do)<\/h3>\n\n\n\n<p>Common, practical capabilities include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Chat\/text generation<\/strong>: Provide instructions and context; get natural-language responses.<\/li>\n<li><strong>Summarization and rewriting<\/strong>: Summarize long text, rewrite for tone, extract action items.<\/li>\n<li><strong>Information extraction<\/strong>: Extract entities, structured fields, or key points (often via prompting).<\/li>\n<li><strong>Embeddings<\/strong>: Convert text into vectors for semantic similarity search and RAG pipelines.<\/li>\n<li><strong>Reranking (if available for selected models; verify)<\/strong>: Improve search relevance by reranking candidate passages.<\/li>\n<\/ul>\n\n\n\n<p>Exact features depend on which models Oracle makes available in your region and your tenancy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (how it\u2019s organized)<\/h3>\n\n\n\n<p>Generative AI typically includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Models \/ model catalog<\/strong>: The list of supported models and their identifiers (often used in API requests).<\/li>\n<li><strong>Inference API<\/strong>: Endpoints you call for chat\/text generation and embeddings.<\/li>\n<li><strong>Serving modes (availability varies; verify)<\/strong>: Many managed AI offerings distinguish on-demand shared capacity vs. dedicated capacity. 
If your tenancy has access to multiple serving options, choose based on throughput, latency, and isolation needs.<\/li>\n<li><strong>OCI IAM integration<\/strong>: Policies to control which groups\/apps can use the service and in which compartments.<\/li>\n<li><strong>Audit and observability hooks<\/strong>: OCI Audit events for API calls; optional Logging\/Monitoring patterns depending on your application design.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed AI inference service<\/strong> (not a general compute service)<\/li>\n<li>Consumed primarily via <strong>API\/SDK\/CLI<\/strong> and sometimes via <strong>Console playgrounds<\/strong> (availability depends on region and current UI).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional vs. global, tenancy vs. project<\/h3>\n\n\n\n<p>Generative AI is typically <strong>regional<\/strong> in OCI: you select a region, and inference calls go to that region\u2019s service endpoint. 
Resources and access control are <strong>tenancy-based<\/strong> and <strong>compartment-scoped<\/strong> (OCI\u2019s standard governance model).<\/p>\n\n\n\n<p>Because Oracle may expand regions and model availability over time, confirm region support in the official docs and in your Console\u2019s region selector.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Fit in the Oracle Cloud ecosystem<\/h3>\n\n\n\n<p>Generative AI is usually part of an OCI solution that includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Networking<\/strong>: VCN, private subnets, NAT Gateway or Service Gateway (depending on architecture)<\/li>\n<li><strong>Data<\/strong>: Object Storage, Autonomous Database \/ Oracle Database services, streaming\/logs<\/li>\n<li><strong>App runtime<\/strong>: OCI Functions, Compute instances, OKE (Kubernetes)<\/li>\n<li><strong>Security<\/strong>: IAM policies, Vault for secrets, Cloud Guard (governance), Audit<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Generative AI?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster delivery of AI-powered features<\/strong>: Add chat\/search\/summarization without building model hosting.<\/li>\n<li><strong>Lower operational burden<\/strong>: No GPU fleet management, patching, scaling, or model server maintenance.<\/li>\n<li><strong>Enterprise governance<\/strong>: OCI compartments, tagging, IAM policies, and audit trails support regulated environments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Standard APIs and SDKs<\/strong>: Integrate using OCI SDKs (Python\/Java\/Go\/JS, etc.) 
and authenticated REST calls.<\/li>\n<li><strong>Embeddings support for RAG<\/strong>: A practical path to build knowledge assistants over internal documents.<\/li>\n<li><strong>Regional deployment<\/strong>: Keep workloads near your data and applications in OCI regions (subject to availability).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Separation of duties<\/strong>: Platform teams manage IAM\/policies and networking; app teams consume APIs.<\/li>\n<li><strong>Repeatable deployment patterns<\/strong>: Use Terraform for IAM\/networking and CI\/CD for apps.<\/li>\n<li><strong>Observability alignment<\/strong>: Use OCI Audit for API activity and your application logs for prompt\/response metadata (with redaction).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI IAM policy enforcement<\/strong>: Centralized access control with least privilege.<\/li>\n<li><strong>OCI Audit<\/strong>: Trace who invoked what (for governance).<\/li>\n<li><strong>Data residency alignment<\/strong>: Regional endpoints help with residency requirements (verify exact commitments in Oracle policies and your contracts).<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>Important: Data handling for prompts\/responses is a policy and contractual topic. 
Verify Oracle\u2019s published data usage statements and your organization\u2019s compliance requirements in the official documentation and legal terms.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Elastic inference<\/strong>: On-demand inference reduces capacity planning for variable workloads.<\/li>\n<li><strong>Dedicated capacity (if available; verify)<\/strong>: For consistent latency and throughput, some tenancies can use dedicated serving options.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose Generative AI on Oracle Cloud when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You already run workloads on OCI and want <strong>native IAM + compartment governance<\/strong>.<\/li>\n<li>You need <strong>managed inference<\/strong> rather than self-hosting models.<\/li>\n<li>Your workloads benefit from <strong>embeddings + RAG<\/strong> and you want a managed foundation model endpoint.<\/li>\n<li>You need a path to production that fits OCI networking\/security patterns.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Consider alternatives when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You must run a <strong>specific model<\/strong> not available in Generative AI and cannot accept substitutes.<\/li>\n<li>You need <strong>full control<\/strong> over model weights, fine-tuning pipelines, or custom model serving.<\/li>\n<li>You require <strong>offline\/air-gapped<\/strong> deployments that cannot call managed cloud endpoints.<\/li>\n<li>Your cost model strongly favors <strong>self-hosted inference<\/strong> at very high steady-state volumes (after careful benchmarking).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Generative AI used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Financial services (customer support summarization, compliance drafting assistance)<\/li>\n<li>Healthcare\/life sciences (non-diagnostic document workflows, policy Q&amp;A)<\/li>\n<li>Retail\/e-commerce (product content generation, support bots)<\/li>\n<li>Manufacturing (maintenance knowledge assistants)<\/li>\n<li>SaaS\/technology (in-app copilots, ticket triage)<\/li>\n<li>Public sector (policy document search and summarization; verify procurement constraints)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>App developers building assistants and copilots<\/li>\n<li>Data engineers and analytics teams building search\/RAG pipelines<\/li>\n<li>Platform and cloud engineering teams governing AI access<\/li>\n<li>Security teams implementing data controls and monitoring<\/li>\n<li>SRE\/operations teams running production services at scale<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Internal knowledge assistant for SOPs\/runbooks<\/li>\n<li>Customer service agent assist (draft replies, summarize conversations)<\/li>\n<li>Document processing pipeline (summaries + extracted fields)<\/li>\n<li>Semantic search over policies\/contracts\/product docs<\/li>\n<li>Content generation with human review (marketing, documentation, email drafts)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Web application + backend API calling Generative AI<\/li>\n<li>Event-driven pipelines (Functions) summarizing new documents<\/li>\n<li>RAG with embeddings + vector store (DB\/search) + Generative AI chat<\/li>\n<li>Hybrid: on-prem data sources with secure transfer into OCI + inference calls<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment 
contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/test<\/strong>: Explore prompts in a playground or a small app; use on-demand capacity; minimal IAM.<\/li>\n<li><strong>Production<\/strong>: Enforce least privilege policies; integrate Vault; implement logging\/redaction; use rate limiting; implement fallbacks; track cost and quality metrics.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Oracle Cloud Generative AI is commonly applied.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Support ticket summarization and next-step extraction<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Support queues contain long back-and-forth threads; agents need quick context and next actions.<\/li>\n<li><strong>Why this service fits<\/strong>: Chat\/text generation models can summarize and extract structured outputs (via prompting).<\/li>\n<li><strong>Example<\/strong>: A helpdesk system sends ticket history to Generative AI and stores a summary + \u201caction items\u201d field.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Internal knowledge base Q&amp;A (RAG)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Employees can\u2019t find answers across scattered docs.<\/li>\n<li><strong>Why this service fits<\/strong>: Embeddings enable semantic search; the chat model can answer using retrieved passages.<\/li>\n<li><strong>Example<\/strong>: Index HR policies into embeddings; retrieve top passages; generate answers with citations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Meeting notes summarization (non-realtime)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Teams lose decisions and action items in long notes.<\/li>\n<li><strong>Why this service fits<\/strong>: Summarization prompts are straightforward; batch processing is 
cost-controlled.<\/li>\n<li><strong>Example<\/strong>: After a meeting, a pipeline summarizes notes into decisions\/risks\/tasks.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Contract clause extraction (assistive, with review)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Legal teams need key clause extraction at scale.<\/li>\n<li><strong>Why this service fits<\/strong>: Models can extract fields (termination date, governing law) with careful prompting and validation.<\/li>\n<li><strong>Example<\/strong>: A document workflow extracts clauses and flags missing items; attorneys review.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Developer documentation assistant<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Engineers need answers from internal runbooks and service docs.<\/li>\n<li><strong>Why this service fits<\/strong>: RAG reduces hallucinations by grounding answers in retrieved documents.<\/li>\n<li><strong>Example<\/strong>: A Slack bot answers \u201cHow do I rotate OCI API keys?\u201d citing internal SOP sections.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Product content drafting with brand guidelines<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Thousands of SKUs need consistent descriptions.<\/li>\n<li><strong>Why this service fits<\/strong>: Template-based prompting generates drafts quickly; humans approve.<\/li>\n<li><strong>Example<\/strong>: For each SKU, generate a description in the company tone and store as a draft.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Log\/incident report summarization (ops)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Postmortems take time; incident timelines are long.<\/li>\n<li><strong>Why this service fits<\/strong>: Summarize timelines and extract likely root cause hypotheses (with human validation).<\/li>\n<li><strong>Example<\/strong>: Compile incident Slack thread + 
alerts into a structured incident summary.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Call center agent assist (draft responses)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Agents need suggested responses that match policy and tone.<\/li>\n<li><strong>Why this service fits<\/strong>: Chat models can draft replies; RAG can ensure policy grounding.<\/li>\n<li><strong>Example<\/strong>: Provide relevant policy snippets; ask the model to draft a response; agent edits before sending.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Multilingual rewriting and translation (with review)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Global teams need content localized quickly.<\/li>\n<li><strong>Why this service fits<\/strong>: Many LLMs handle multilingual tasks; outputs still require QA.<\/li>\n<li><strong>Example<\/strong>: Translate internal announcements; keep a consistent tone.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Search relevance improvements (embeddings + rerank if available)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Keyword search returns irrelevant results.<\/li>\n<li><strong>Why this service fits<\/strong>: Embeddings and reranking can improve relevance beyond keyword matching.<\/li>\n<li><strong>Example<\/strong>: For a query, retrieve candidates via BM25, rerank with a model, and show top answers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Data catalog description generation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Data assets lack usable descriptions and ownership metadata.<\/li>\n<li><strong>Why this service fits<\/strong>: Generate plain-language descriptions from schema\/metadata.<\/li>\n<li><strong>Example<\/strong>: Summarize a table\u2019s columns into a human-friendly description and suggested owners.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Compliance policy drafting 
assistant (not final authority)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Security\/compliance teams draft repetitive policy language.<\/li>\n<li><strong>Why this service fits<\/strong>: Drafting patterns are consistent; humans review and approve.<\/li>\n<li><strong>Example<\/strong>: Generate an initial access control policy draft based on requirements, then review.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability depends on region, tenancy, and the models offered at the time. Always verify current capabilities in the official Generative AI docs.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">1) Managed access to foundation models (model catalog)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides a set of supported models you can call via OCI APIs.<\/li>\n<li><strong>Why it matters<\/strong>: Reduces the need to source, host, and patch model servers.<\/li>\n<li><strong>Practical benefit<\/strong>: Faster proof-of-concept to production with standard governance.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Model availability differs by region; some models may have specific usage policies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Chat\/text generation inference API<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Accepts prompts and returns generated text (chat responses, summaries, rewrites).<\/li>\n<li><strong>Why it matters<\/strong>: Enables natural language interfaces and automation.<\/li>\n<li><strong>Practical benefit<\/strong>: Implement assistants, summarizers, and drafting tools.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Outputs can be incorrect; you must implement validation, grounding, and human review where needed.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Embeddings inference 
API<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Converts text to vectors for semantic similarity search.<\/li>\n<li><strong>Why it matters<\/strong>: Embeddings are foundational for RAG and semantic search.<\/li>\n<li><strong>Practical benefit<\/strong>: Build \u201csearch by meaning\u201d across documents and tickets.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: You must manage a vector store (database\/search engine\/local index). Embedding dimension\/model choice affects recall and cost.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Serving mode options (on-demand vs. dedicated) (verify availability)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Lets you choose shared on-demand inference or dedicated capacity, depending on what Oracle offers in your tenancy\/region.<\/li>\n<li><strong>Why it matters<\/strong>: Production workloads often need predictable latency and throughput.<\/li>\n<li><strong>Practical benefit<\/strong>: Match cost and performance to workload shape.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Dedicated capacity may require provisioning, quotas, or contracts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) OCI IAM integration (compartments + policies)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Controls who\/what can call Generative AI and where.<\/li>\n<li><strong>Why it matters<\/strong>: Enterprise-grade access control and separation of duties.<\/li>\n<li><strong>Practical benefit<\/strong>: Enforce least privilege, environment separation (dev\/test\/prod), and auditing.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Misconfigured policies are a common cause of \u201cNotAuthorizedOrNotFound\u201d.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) OCI Audit visibility for API activity<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Records calls to OCI services, including 
Generative AI API operations, in Audit logs.<\/li>\n<li><strong>Why it matters<\/strong>: Governance, incident response, and compliance.<\/li>\n<li><strong>Practical benefit<\/strong>: Track who accessed AI endpoints and when.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: Audit logs won\u2019t automatically capture full prompts\/responses; your application logging design matters.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) SDK and API support (multi-language)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Enables integration via OCI SDKs and REST APIs.<\/li>\n<li><strong>Why it matters<\/strong>: Reduces custom auth code; consistent OCI patterns.<\/li>\n<li><strong>Practical benefit<\/strong>: Quick integration from Python\/Java\/Node\/Go apps.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: SDK versions must match API versions; verify sample code against current SDK docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Compartment-based organization and governance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Lets you isolate AI usage by project\/team\/environment.<\/li>\n<li><strong>Why it matters<\/strong>: Cost tracking, least privilege, blast-radius control.<\/li>\n<li><strong>Practical benefit<\/strong>: Separate dev\/test\/prod access and budgets.<\/li>\n<li><strong>Limitations\/caveats<\/strong>: You must design your compartment structure intentionally.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level service architecture<\/h3>\n\n\n\n<p>At a high level, your app (or a developer in the Console) sends an inference request to the Generative AI endpoint in a region. 
The request includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Authentication (OCI signed request via SDK\/CLI)<\/li>\n<li>Compartment context (where policies apply)<\/li>\n<li>Model identifier (which model to run)<\/li>\n<li>Prompt\/input parameters (messages, temperature, max tokens, etc.)<\/li>\n<\/ul>\n\n\n\n<p>The service returns a response:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Generated text or chat response<\/li>\n<li>Embeddings vectors<\/li>\n<li>Metadata (request id, token usage if provided by model\/service; verify exact fields)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Identity<\/strong>: The caller authenticates using OCI IAM (user API key, instance principal, resource principal).<\/li>\n<li><strong>Authorization<\/strong>: OCI IAM evaluates policies for the target compartment and the Generative AI service family.<\/li>\n<li><strong>Inference<\/strong>: The service routes the request to the selected model\/serving mode.<\/li>\n<li><strong>Response<\/strong>: Output is returned over HTTPS to the client.<\/li>\n<li><strong>Audit<\/strong>: OCI Audit records relevant API activity.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related OCI services<\/h3>\n\n\n\n<p>Common integrations include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Object Storage<\/strong>: Store documents for RAG ingestion and audit-friendly retention.<\/li>\n<li><strong>OCI Functions<\/strong>: Event-driven summarization and extraction pipelines.<\/li>\n<li><strong>OKE (Kubernetes)<\/strong>: Run your RAG backend and API services.<\/li>\n<li><strong>OCI Vault<\/strong>: Store secrets (API keys for external systems; OCI API keys usually live in config files\u2014prefer principals when possible).<\/li>\n<li><strong>OCI Logging<\/strong>: Centralize app logs (prompt metadata, latency, errors) with redaction.<\/li>\n<li><strong>OCI 
Monitoring<\/strong>: Track application metrics (requests, latency, token usage).<\/li>\n<li><strong>Databases\/Search<\/strong>: Store embeddings and content indexes (verify best-fit OCI offerings for vector storage in your environment).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<p>Generative AI depends on standard OCI foundations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>IAM (users, groups, policies)<\/li>\n<li>Networking (public endpoints; private connectivity patterns depend on your design)<\/li>\n<li>Regional availability of the service and chosen models<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<p>OCI authentication options you\u2019ll commonly use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>User principals (API keys)<\/strong>: Good for developer testing; riskier for production if keys leak.<\/li>\n<li><strong>Instance principals<\/strong>: Best for OCI Compute-based apps (no long-lived keys).<\/li>\n<li><strong>Resource principals<\/strong>: Best for OCI Functions and some managed services.<\/li>\n<li><strong>OKE workload identity<\/strong>: If supported in your environment; otherwise use instance principals via node pools (verify current OCI guidance).<\/li>\n<\/ul>\n\n\n\n<p>Authorization is controlled by <strong>policies<\/strong> granting access to the Generative AI service family in a compartment.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Generative AI is typically accessed via <strong>HTTPS endpoints<\/strong>.<\/li>\n<li>From private subnets, you may route outbound traffic via:<\/li>\n<li><strong>NAT Gateway<\/strong> for internet egress, or<\/li>\n<li><strong>Service Gateway<\/strong> for access to Oracle services on the Oracle Services Network (common OCI pattern; verify that Generative AI is reachable via service gateway in your region).<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Audit<\/strong>: Enable and review for access patterns.<\/li>\n<li><strong>Application logs<\/strong>: Store request ids, latency, model id, and token counts (if available).<\/li>\n<li><strong>Redaction<\/strong>: Avoid logging raw PII prompts\/responses.<\/li>\n<li><strong>Tagging<\/strong>: Tag compartments\/resources used by surrounding architecture for cost allocation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[Developer \/ App] --&gt;|OCI Auth + HTTPS| GAI[Oracle Cloud Generative AI&lt;br\/&gt;Inference API]\n  GAI --&gt; R[Response&lt;br\/&gt;Text \/ Embeddings]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph VCN[OCI VCN]\n    subgraph APP[Private Subnet]\n      SVC[App Service&lt;br\/&gt;API \/ RAG Backend]\n      VS[(Vector Store&lt;br\/&gt;DB\/Search)]\n      OBJ[(Object Storage&lt;br\/&gt;Docs)]\n    end\n    NAT[NAT or Service Gateway&lt;br\/&gt;(depends on design)]\n  end\n\n  ID[IAM Policies&lt;br\/&gt;Groups\/Principals] --&gt; SVC\n  SVC --&gt;|Retrieve docs| OBJ\n  SVC --&gt;|Embed + similarity search| VS\n  SVC --&gt;|OCI-signed HTTPS call| NAT --&gt; GAI[Generative AI Inference API&lt;br\/&gt;(Regional)]\n  GAI --&gt; SVC\n\n  SVC --&gt; LOG[OCI Logging]\n  SVC --&gt; MET[OCI Monitoring Metrics]\n  AUD[OCI Audit] --&gt; SEC[Security Review \/ SIEM]\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tenancy\/account requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>Oracle Cloud<\/strong> tenancy with access to <strong>Analytics and AI<\/strong> services.<\/li>\n<li>Generative AI service enabled\/available in your tenancy and chosen region (availability may vary).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions\/IAM roles<\/h3>\n\n\n\n<p>You need permissions to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use Generative AI in a target <strong>compartment<\/strong><\/li>\n<li>Read model information (if required by the workflow)<\/li>\n<\/ul>\n\n\n\n<p>OCI policy examples (verify exact policy verbs and service family name in official docs):<\/p>\n\n\n\n<pre><code class=\"language-text\">Allow group &lt;group-name&gt; to use generative-ai-family in compartment &lt;compartment-name&gt;\n<\/code><\/pre>\n\n\n\n<p>In some environments you may need broader permissions during setup, then tighten later. Always validate the least-privilege set that still works for your use case.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A paid tenancy or an active billing account.<\/li>\n<li>Free Tier applicability varies; do not assume Generative AI is covered. 
Verify on Oracle\u2019s Free Tier and pricing pages.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tools needed<\/h3>\n\n\n\n<p>Choose one approach:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Console-only<\/strong> (quick testing):<ul>\n<li>Oracle Cloud Console access<\/li>\n<\/ul>\n<\/li>\n<li><strong>CLI\/SDK lab<\/strong> (recommended for this tutorial):<ul>\n<li>OCI Cloud Shell (recommended) or local machine<\/li>\n<li>Python 3.9+ (or a version supported by the OCI Python SDK)<\/li>\n<li>OCI Python SDK: https:\/\/docs.oracle.com\/en-us\/iaas\/tools\/python\/latest\/<\/li>\n<li>OCI CLI (optional): https:\/\/docs.oracle.com\/en-us\/iaas\/tools\/oci-cli\/latest\/<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Select a region where Generative AI is available and where your chosen model is offered.<\/li>\n<li>Confirm in the Console under Analytics &amp; AI \u2192 Generative AI (or via official docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Service limits exist for request rates, concurrency, token sizes, and possibly dedicated capacity.<\/li>\n<li>Review OCI service limits and request increases if needed (common for production).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services (for the hands-on lab)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>IAM user or principal configuration<\/li>\n<li>(Optional) Object Storage bucket if you extend the lab to RAG with documents<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<blockquote>\n<p>Do not rely on blog posts for AI pricing. 
Always confirm on Oracle\u2019s official pricing pages for your region and the specific model\/SKU.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (how you\u2019re billed)<\/h3>\n\n\n\n<p>Generative AI pricing is typically <strong>usage-based<\/strong>, and commonly depends on dimensions such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Model type<\/strong> (chat\/generation vs embeddings; premium vs standard models)<\/li>\n<li><strong>Tokens processed<\/strong> (input tokens + output tokens) for text\/chat models (common industry pattern; verify OCI\u2019s exact metering units)<\/li>\n<li><strong>Characters\/tokens<\/strong> for embeddings (verify metering units)<\/li>\n<li><strong>Serving mode<\/strong> (on-demand vs dedicated capacity) if both are available in your tenancy<\/li>\n<li><strong>Region<\/strong> (pricing differs by region)<\/li>\n<\/ul>\n\n\n\n<p>Official pricing entry points:\n&#8211; Oracle Cloud price list: https:\/\/www.oracle.com\/cloud\/price-list\/\n&#8211; Oracle Cloud cost estimator: https:\/\/www.oracle.com\/cloud\/costestimator.html<\/p>\n\n\n\n<p>Search the price list for <strong>AI Services<\/strong> and <strong>Generative AI<\/strong>. If your organization has a contract, your negotiated rates may differ from list pricing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier<\/h3>\n\n\n\n<p>Oracle has a Free Tier program, but <strong>Generative AI may not be included<\/strong> or may have limited free usage depending on current offers. 
Verify:\n&#8211; https:\/\/www.oracle.com\/cloud\/free\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (what increases spend)<\/h3>\n\n\n\n<p>Direct drivers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Larger prompts (more input tokens)<\/li>\n<li>Larger outputs (more output tokens)<\/li>\n<li>More calls (higher request volume)<\/li>\n<li>Higher-priced models<\/li>\n<\/ul>\n\n\n\n<p>Indirect drivers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Vector storage costs (database\/search service)<\/li>\n<li>Object Storage costs for documents<\/li>\n<li>Compute\/runtime costs for your RAG service (Functions\/Compute\/OKE)<\/li>\n<li>Logging retention costs if you store large payloads<\/li>\n<li>Data egress if you send responses outside OCI (networking costs vary)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Calls to Generative AI are HTTPS; if your app runs outside OCI, you may pay <strong>egress<\/strong> from OCI or from your hosting provider depending on traffic direction.<\/li>\n<li>If your app runs inside OCI, prefer OCI-native networking patterns to minimize public internet exposure.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost (practical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Set max output tokens<\/strong> to a reasonable limit for each endpoint.<\/li>\n<li><strong>Summarize context<\/strong> before sending long documents; don\u2019t stuff entire PDFs into prompts.<\/li>\n<li>Use <strong>RAG<\/strong>: retrieve only the top relevant chunks rather than sending entire corpora.<\/li>\n<li>Choose an <strong>embeddings model<\/strong> appropriate for your recall\/latency\/cost needs.<\/li>\n<li>Implement <strong>caching<\/strong> for repeated queries (prompt hash \u2192 response).<\/li>\n<li>Track token usage (if exposed) and build cost dashboards at the app level.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated prices)<\/h3>\n\n\n\n<p>A small pilot\u2019s cost is primarily a 
function of:\n&#8211; number of requests\/day,\n&#8211; average input size,\n&#8211; average output size,\n&#8211; chosen model.<\/p>\n\n\n\n<p>To estimate without guessing numbers:\n1. Pick a model in the OCI pricing list.\n2. Estimate average tokens in\/out per request.\n3. Multiply by expected request volume.\n4. Add compute\/logging\/storage.<\/p>\n\n\n\n<p>Use Oracle\u2019s cost estimator:\n&#8211; https:\/\/www.oracle.com\/cloud\/costestimator.html<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>For production, plan for:\n&#8211; peak request rates (capacity and potential dedicated serving)\n&#8211; multi-environment usage (dev\/test\/prod)\n&#8211; observability retention (log volume)\n&#8211; vector store scale (embedding count \u00d7 dimension \u00d7 indexing overhead)\n&#8211; A\/B testing different models\/prompts (can double or triple usage temporarily)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab is designed to be <strong>beginner-friendly<\/strong>, <strong>low-risk<\/strong>, and <strong>practical<\/strong>. 
You will:\n1) confirm access and IAM,\n2) identify a model to use,\n3) call Generative AI from Python to summarize text and extract action items,\n4) optionally generate embeddings for a few sample documents to power a tiny semantic search,\n5) clean up.<\/p>\n\n\n\n<p>If any UI labels or SDK classes differ in your tenancy, follow the closest equivalent in the official docs and SDK references.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Build a small command-line tool that uses Oracle Cloud <strong>Generative AI<\/strong> to:\n&#8211; summarize a support ticket conversation, and\n&#8211; extract action items,\nthen (optional) generate embeddings for three knowledge snippets and perform a local similarity match.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Environment<\/strong>: Oracle Cloud Shell (recommended) or local machine<\/li>\n<li><strong>Auth<\/strong>: OCI config file (developer-friendly) or principals (production-friendly)<\/li>\n<li><strong>Service<\/strong>: Generative AI inference API<\/li>\n<li><strong>Cost control<\/strong>: small prompts, capped output tokens<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Confirm service availability and choose a compartment<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Log in to the <strong>Oracle Cloud Console<\/strong>.<\/li>\n<li>Select your target <strong>Region<\/strong> (top-right).<\/li>\n<li>Navigate to <strong>Analytics &amp; AI \u2192 Generative AI<\/strong> (exact navigation can vary).<\/li>\n<li>Pick (or create) a compartment for the lab, for example: <code>ai-labs<\/code>.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You can open the Generative AI page in the Console and select a compartment.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; If the service isn\u2019t visible, confirm:\n  &#8211; your region supports it,\n  &#8211; your tenancy is entitled to it,\n  
&#8211; your IAM user has permissions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create or confirm IAM permissions<\/h3>\n\n\n\n<p>You need permission to call Generative AI in the compartment.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the Console, go to <strong>Identity &amp; Security \u2192 IAM \u2192 Policies<\/strong>.<\/li>\n<li>Create a policy in the root compartment (or appropriate parent) that grants your group access.<\/li>\n<\/ol>\n\n\n\n<p>Example (verify exact wording in official docs for your tenancy):<\/p>\n\n\n\n<pre><code class=\"language-text\">Allow group &lt;your-group&gt; to use generative-ai-family in compartment ai-labs\n<\/code><\/pre>\n\n\n\n<p>If you plan to list models or manage related resources, you might need broader permissions, but start with least privilege.<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; The policy is created and appears in the policy list of the compartment where you created it.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Wait a minute (OCI policy propagation can take a short time).\n&#8211; You should be able to access Generative AI pages for the compartment.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Set up authentication for SDK calls (OCI config)<\/h3>\n\n\n\n<p>Use one of these approaches:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option A (recommended for labs): Cloud Shell + OCI config<\/h4>\n\n\n\n<p>Oracle Cloud Shell often comes with OCI CLI configured for the logged-in user. 
You still may need an API key for SDK use depending on your setup.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open <strong>Cloud Shell<\/strong> from the Console.<\/li>\n<li>Check if <code>~\/.oci\/config<\/code> exists:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-bash\">ls -la ~\/.oci\ncat ~\/.oci\/config\n<\/code><\/pre>\n\n\n\n<p>If it does not exist or is incomplete, create an API key for your IAM user:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Console \u2192 <strong>Identity &amp; Security \u2192 IAM \u2192 Users \u2192 &lt;your user&gt; \u2192 API Keys<\/strong><\/li>\n<li>Add API key and download the private key<\/li>\n<li>Save it securely and update <code>~\/.oci\/config<\/code><\/li>\n<\/ul>\n\n\n\n<p>A minimal <code>~\/.oci\/config<\/code> profile looks like this:<\/p>\n\n\n\n<pre><code class=\"language-ini\">[DEFAULT]\nuser=ocid1.user.oc1..exampleuniqueID\nfingerprint=aa:bb:cc:dd:...\ntenancy=ocid1.tenancy.oc1..exampleuniqueID\nregion=us-ashburn-1\nkey_file=\/home\/&lt;you&gt;\/.oci\/oci_api_key.pem\n<\/code><\/pre>\n\n\n\n<blockquote>\n<p>Keep the private key file permissions restricted (<code>chmod 600<\/code>).<\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\">Option B (production pattern): Instance principals or resource principals<\/h4>\n\n\n\n<p>For production, prefer <strong>instance principals<\/strong> (Compute) or <strong>resource principals<\/strong> (Functions) to avoid distributing long-lived keys. Implementation differs by runtime; verify with OCI docs.<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You have working OCI authentication for SDK calls.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\nRun a simple CLI command (optional):<\/p>\n\n\n\n<pre><code class=\"language-bash\">oci os ns get\n<\/code><\/pre>\n\n\n\n<p>If this works, your identity\/auth is likely set.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Identify a model to call (model OCID\/identifier)<\/h3>\n\n\n\n<p>In the Console:\n1. 
Go to <strong>Analytics &amp; AI \u2192 Generative AI<\/strong>.\n2. Find the <strong>Models<\/strong> list\/catalog (UI naming varies).\n3. Choose a model suitable for chat\/summarization.\n4. Copy the <strong>model identifier<\/strong> (often an OCID or a model id string).<\/p>\n\n\n\n<p>Save it in an environment variable in Cloud Shell:<\/p>\n\n\n\n<pre><code class=\"language-bash\">export COMPARTMENT_OCID=\"ocid1.compartment.oc1..example\"\nexport MODEL_ID=\"ocid1.generativeaimodel.oc1..example\"   # verify format in your console\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You have a compartment OCID and a model identifier for inference.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Double-check there are no extra spaces or quotes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Create a Python virtual environment and install OCI SDK<\/h3>\n\n\n\n<p>In Cloud Shell:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python3 -m venv .venv\nsource .venv\/bin\/activate\npython -m pip install --upgrade pip\npython -m pip install oci\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; <code>oci<\/code> Python package installed.<\/p>\n\n\n\n<p><strong>Verification<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">python -c \"import oci; print(oci.__version__)\"\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Call Generative AI to summarize and extract action items<\/h3>\n\n\n\n<p>Create a file named <code>genai_summarize.py<\/code>:<\/p>\n\n\n\n<pre><code class=\"language-python\">import os\nimport sys\nimport oci\n\ndef main():\n    compartment_id = os.environ.get(\"COMPARTMENT_OCID\")\n    model_id = os.environ.get(\"MODEL_ID\")\n    if not compartment_id or not model_id:\n        print(\"Set COMPARTMENT_OCID and MODEL_ID environment variables.\")\n        sys.exit(1)\n\n    config = oci.config.from_file()  # uses ~\/.oci\/config [DEFAULT]\n\n    # NOTE: OCI SDK module\/class names can 
change across versions.\n    # Verify current Generative AI Inference SDK in official docs if this import fails.\n    from oci.generative_ai_inference import GenerativeAiInferenceClient\n    from oci.generative_ai_inference.models import (\n        ChatDetails,\n        OnDemandServingMode,\n        CohereChatRequest,  # model request schema depends on provider\/model; verify in docs\n    )\n\n    client = GenerativeAiInferenceClient(config)\n\n    ticket_text = \"\"\"\nSubject: VPN access failing\n\nUser: I can't connect to VPN since yesterday. Error: TLS handshake failed.\nAgent: Are you on home Wi-Fi or mobile hotspot?\nUser: Home Wi-Fi. It worked last week.\nAgent: Please confirm your client version.\nUser: 5.1.2\nAgent: We recently rotated certificates. Try updating to 5.1.4 and re-import the new profile.\nUser: Update done, still fails.\nAgent: We'll check if your account is blocked and reset your VPN profile. Also please try from hotspot to rule out ISP interference.\n\"\"\"\n\n    prompt = f\"\"\"\nYou are an IT support assistant. 
Summarize the ticket and extract action items.\nReturn the result in plain text with two sections:\n\nSummary:\n- (3-5 bullets)\n\nAction Items:\n- (bulleted list, each item starts with an owner: User\/Agent\/SRE)\nTicket text:\n{ticket_text}\n\"\"\"\n\n    details = ChatDetails(\n        compartment_id=compartment_id,\n        serving_mode=OnDemandServingMode(model_id=model_id),\n        chat_request=CohereChatRequest(\n            message=prompt,\n            temperature=0.2,\n            max_tokens=400,\n        ),\n    )\n\n    resp = client.chat(details)\n    # Response fields vary; print the whole data structure safely.\n    print(resp.data)\n\nif __name__ == \"__main__\":\n    main()\n<\/code><\/pre>\n\n\n\n<p>Run it:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python genai_summarize.py\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; The script prints a structured response object containing the model output (summary + action items).<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Confirm the output contains your two requested sections and references the ticket content.\n&#8211; If the SDK returns a nested object, you may need to print the specific text field. Use <code>print(resp.data)<\/code> first, then adjust.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7 (Optional): Generate embeddings and do a tiny local semantic search<\/h3>\n\n\n\n<p>This step demonstrates the <strong>embeddings<\/strong> workflow without provisioning a database. 
You\u2019ll embed three short knowledge snippets, embed a query, compute cosine similarity locally, and print the best match.<\/p>\n\n\n\n<p>Create <code>genai_embeddings_search.py<\/code>:<\/p>\n\n\n\n<pre><code class=\"language-python\">import os\nimport sys\nimport math\nimport oci\n\ndef cosine(a, b):\n    # Cosine similarity; the small epsilon avoids division by zero.\n    dot = sum(x*y for x, y in zip(a, b))\n    na = math.sqrt(sum(x*x for x in a))\n    nb = math.sqrt(sum(y*y for y in b))\n    return dot \/ (na * nb + 1e-12)\n\ndef main():\n    compartment_id = os.environ.get(\"COMPARTMENT_OCID\")\n    model_id = os.environ.get(\"MODEL_ID_EMBED\") or os.environ.get(\"MODEL_ID\")\n    if not compartment_id or not model_id:\n        print(\"Set COMPARTMENT_OCID and MODEL_ID_EMBED (or MODEL_ID).\")\n        sys.exit(1)\n\n    config = oci.config.from_file()\n\n    # NOTE: Verify current SDK names\/schemas for embeddings in official docs.\n    # This sketch assumes EmbedTextDetails accepts the texts directly via `inputs`.\n    from oci.generative_ai_inference import GenerativeAiInferenceClient\n    from oci.generative_ai_inference.models import (\n        EmbedTextDetails,\n        OnDemandServingMode,\n    )\n\n    client = GenerativeAiInferenceClient(config)\n\n    docs = [\n        (\"vpn_profile_reset\", \"To fix VPN TLS handshake failures after certificate rotation, reset the VPN profile and import the new configuration.\"),\n        (\"client_upgrade\", \"If the VPN client is outdated, upgrade to the latest approved version and reboot before reconnecting.\"),\n        (\"network_isolation\", \"Test from a mobile hotspot to isolate ISP or home router issues when corporate VPN fails.\"),\n    ]\n\n    query = \"VPN TLS handshake failed after certificate changes. 
What should I do first?\"\n\n    def embed(texts, input_type):\n        details = EmbedTextDetails(\n            compartment_id=compartment_id,\n            serving_mode=OnDemandServingMode(model_id=model_id),\n            inputs=texts,\n            input_type=input_type,  # verify allowed values for your model\n        )\n        resp = client.embed_text(details)\n        return resp.data\n\n    # Documents and queries use different input types for retrieval-style models.\n    doc_vectors_resp = embed([d[1] for d in docs], \"SEARCH_DOCUMENT\")\n    query_vector_resp = embed([query], \"SEARCH_QUERY\")\n\n    # Response parsing varies. Inspect resp.data shape if needed.\n    # The following assumes a structure like: resp.data.embeddings = [[...], [...]]\n    doc_vectors = doc_vectors_resp.embeddings\n    query_vec = query_vector_resp.embeddings[0]\n\n    scored = []\n    for (doc_id, _), vec in zip(docs, doc_vectors):\n        scored.append((cosine(query_vec, vec), doc_id))\n    scored.sort(reverse=True)\n\n    print(\"Query:\", query)\n    print(\"Top match:\", scored[0])\n\nif __name__ == \"__main__\":\n    main()\n<\/code><\/pre>\n\n\n\n<p>Run:<\/p>\n\n\n\n<pre><code class=\"language-bash\">python genai_embeddings_search.py\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; The script prints the query and the best-matching snippet id (likely <code>vpn_profile_reset<\/code> or <code>client_upgrade<\/code>).<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; If it fails due to request or response field names, print the entire response objects, check the SDK reference for your installed version, and adjust field access.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>You have successfully completed the lab if you can:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Call Generative AI and receive a summary\/action items response.<\/li>\n<li>(Optional) Generate embeddings and compute a local similarity match.<\/li>\n<li>See successful requests in your application output and (optionally) OCI Audit logs.<\/li>\n<\/ul>\n\n\n\n<p>To check 
OCI Audit (high level):\n&#8211; Console \u2192 <strong>Identity &amp; Security \u2192 Audit<\/strong>\n&#8211; Filter by your user and time window, and look for Generative AI-related events (service naming may vary).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p><strong>1) <code>NotAuthorizedOrNotFound<\/code><\/strong>\n&#8211; Likely IAM policy missing or in wrong compartment.\n&#8211; Fix: verify the policy is in the correct parent compartment and references the right group and compartment.<\/p>\n\n\n\n<p><strong>2) Service not visible in Console<\/strong>\n&#8211; Region may not support Generative AI or your tenancy may not be enabled.\n&#8211; Fix: switch regions; verify entitlement and docs.<\/p>\n\n\n\n<p><strong>3) Python import errors for <code>oci.generative_ai_inference<\/code><\/strong>\n&#8211; OCI Python SDK version may be old.\n&#8211; Fix: upgrade SDK: <code>pip install --upgrade oci<\/code>\n&#8211; Verify SDK docs: https:\/\/docs.oracle.com\/en-us\/iaas\/tools\/python\/latest\/<\/p>\n\n\n\n<p><strong>4) Model id invalid<\/strong>\n&#8211; You might be using the wrong identifier format (OCID vs model string).\n&#8211; Fix: copy the model identifier from the Generative AI model list for your region.<\/p>\n\n\n\n<p><strong>5) Request too large \/ token limit errors<\/strong>\n&#8211; Prompt\/context too long.\n&#8211; Fix: reduce text, chunk documents, cap <code>max_tokens<\/code>, and use RAG retrieval to send only relevant chunks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing cost:\n&#8211; Delete any optional resources you created (Object Storage bucket, Functions, databases) if you extended the lab.\n&#8211; Remove local virtual environment if desired:<\/p>\n\n\n\n<pre><code class=\"language-bash\">deactivate\nrm -rf .venv\nrm -f genai_summarize.py genai_embeddings_search.py\n<\/code><\/pre>\n\n\n\n<p>For security hygiene:\n&#8211; Rotate\/revoke user API keys used for testing if 
they\u2019re no longer needed.\n&#8211; Prefer principals (instance\/resource) for production.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Use RAG for enterprise knowledge<\/strong>: Don\u2019t rely on the model\u2019s latent knowledge for internal policies. Retrieve relevant text and ground the answer.<\/li>\n<li><strong>Chunk documents deliberately<\/strong>: 300\u20131,000 tokens per chunk is a common starting point; tune based on retrieval quality.<\/li>\n<li><strong>Add citations<\/strong>: Store chunk ids\/URLs and ask the model to cite them; display citations to users.<\/li>\n<li><strong>Implement fallbacks<\/strong>: If the model call fails or times out, degrade gracefully (keyword search, cached answer).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Least privilege policies<\/strong>: Limit access to only required compartments and actions.<\/li>\n<li><strong>Prefer principals over API keys<\/strong>: Instance principals\/resource principals reduce secret sprawl.<\/li>\n<li><strong>Separate environments<\/strong>: Use separate compartments for dev\/test\/prod and separate policies.<\/li>\n<li><strong>Use Vault for non-OCI secrets<\/strong>: Store DB passwords, external API keys, etc.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control output length<\/strong> with <code>max_tokens<\/code>.<\/li>\n<li><strong>Avoid sending entire documents<\/strong>; retrieve top passages only.<\/li>\n<li><strong>Cache embeddings<\/strong> and reuse them; don\u2019t re-embed unchanged documents.<\/li>\n<li><strong>Track cost per feature<\/strong>: measure requests\/user\/day and tokens\/request.<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Batch embedding requests<\/strong> where supported to reduce overhead.<\/li>\n<li><strong>Keep context tight<\/strong>: shorter prompts reduce latency.<\/li>\n<li><strong>Use asynchronous pipelines<\/strong> for bulk summarization jobs.<\/li>\n<li><strong>Plan rate limiting<\/strong>: protect the service and your budget from accidental loops.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Timeouts and retries<\/strong>: Use bounded retries with jitter for transient errors.<\/li>\n<li><strong>Circuit breakers<\/strong>: Disable model calls temporarily if error rates spike.<\/li>\n<li><strong>Multi-region planning (advanced)<\/strong>: If your product needs HA across regions, design for it at the app layer (verify service parity across regions).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Log request ids<\/strong> returned by OCI\/SDK for support.<\/li>\n<li><strong>Redact sensitive content<\/strong> from logs.<\/li>\n<li><strong>Monitor latency and error rates<\/strong> at the application level.<\/li>\n<li><strong>Tag resources<\/strong> used by the broader solution for chargeback.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Consistent compartment naming: <code>prod-ai<\/code>, <code>dev-ai<\/code>, <code>shared-ai<\/code><\/li>\n<li>Tags: <code>CostCenter<\/code>, <code>DataSensitivity<\/code>, <code>Owner<\/code>, <code>Environment<\/code><\/li>\n<li>Maintain a \u201cmodel registry\u201d document: which models are allowed for which data types.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI IAM controls access to Generative AI APIs.<\/li>\n<li>Use:<\/li>\n<li><strong>Groups + policies<\/strong> for humans,<\/li>\n<li><strong>Dynamic groups + instance principals<\/strong> for compute,<\/li>\n<li><strong>Resource principals<\/strong> for Functions.<\/li>\n<\/ul>\n\n\n\n<p>Key principle: <strong>the app identity should have only the minimum permissions<\/strong> to call Generative AI in the required compartment(s).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data in transit is protected by HTTPS\/TLS.<\/li>\n<li>For stored data in your architecture (documents, embeddings, logs), enable OCI encryption features (default encryption is common for OCI storage services; verify service-specific encryption options).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer private networking patterns:<\/li>\n<li>Run apps in private subnets.<\/li>\n<li>Use Service Gateway\/NAT as appropriate.<\/li>\n<li>Avoid exposing internal RAG endpoints publicly without authentication and WAF protections.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid hardcoding:<\/li>\n<li>OCI private keys<\/li>\n<li>Database credentials<\/li>\n<li>API tokens for external systems<\/li>\n<li>Use OCI Vault for secrets and rotate regularly.<\/li>\n<li>For production, avoid long-lived user API keys where possible.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>OCI Audit<\/strong> for governance of API calls.<\/li>\n<li>Keep application-level logs focused on:<\/li>\n<li>request id,<\/li>\n<li>model id,<\/li>\n<li>latency,<\/li>\n<li>token counts (if available),<\/li>\n<li>coarse 
metadata (not raw PII).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data classification matters:<\/li>\n<li>Don\u2019t send regulated data to any model endpoint unless your policies and Oracle\u2019s terms explicitly permit it.<\/li>\n<li>Ensure your solution supports:<\/li>\n<li>retention policies,<\/li>\n<li>deletion workflows,<\/li>\n<li>access reviews.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Logging full prompts\/responses that include PII\/secrets.<\/li>\n<li>Using a broad policy like \u201cmanage all-resources in tenancy\u201d for an app.<\/li>\n<li>Sharing one API key across multiple apps\/teams.<\/li>\n<li>No rate limiting \u2192 runaway costs and potential denial-of-wallet incidents.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start with a <strong>reference threat model<\/strong>:<\/li>\n<li>who can submit prompts,<\/li>\n<li>what data can be retrieved,<\/li>\n<li>where logs go,<\/li>\n<li>how to detect misuse.<\/li>\n<li>Add <strong>content filters<\/strong> and input validation:<\/li>\n<li>prompt injection defenses (strip or isolate instructions from retrieved docs),<\/li>\n<li>allow-lists for retrieval sources,<\/li>\n<li>user authorization checks before retrieving documents.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. 
Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>These are common constraints; verify exact limits in official Oracle Cloud documentation for Generative AI.<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regional availability<\/strong>: Not all regions support the service or the same models.<\/li>\n<li><strong>Model availability changes<\/strong>: Models can be added\/updated; behavior and output quality may shift.<\/li>\n<li><strong>Token\/context limits<\/strong>: Each model has max input\/output sizes; large docs must be chunked.<\/li>\n<li><strong>Latency variability<\/strong>: Shared on-demand serving can vary under load; dedicated options may be needed.<\/li>\n<li><strong>IAM propagation delay<\/strong>: New policies can take time to apply.<\/li>\n<li><strong>SDK\/API versioning<\/strong>: Example code may break if SDK modules\/classes change; pin versions in production.<\/li>\n<li><strong>Prompt injection in RAG<\/strong>: Retrieved documents can contain malicious instructions. Treat retrieved text as untrusted.<\/li>\n<li><strong>Data leakage through logs<\/strong>: Over-verbose logging can become your biggest security issue.<\/li>\n<li><strong>Cost surprises<\/strong>: Long prompts + long outputs + high volume = fast cost growth. Add hard limits.<\/li>\n<li><strong>Environment drift<\/strong>: Different compartments\/regions can have different model catalogs; document your dependencies.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. 
Comparison with Alternatives<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Nearest services in Oracle Cloud<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Data Science<\/strong>: For building\/training\/hosting your own models and MLOps pipelines (more control, more ops).<\/li>\n<li><strong>Other OCI AI Services (Language, Vision, Speech, etc.)<\/strong>: Task-specific APIs; may be more predictable for certain workloads than general LLM prompting.<\/li>\n<li><strong>OCI Search \/ Database services<\/strong>: Not generative, but critical for RAG storage\/retrieval.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Nearest services in other clouds<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS<\/strong>: Amazon Bedrock (managed foundation model access) and SageMaker (custom ML).<\/li>\n<li><strong>Azure<\/strong>: Azure OpenAI Service (managed OpenAI models) and Azure AI Foundry\/ML.<\/li>\n<li><strong>Google Cloud<\/strong>: Vertex AI (Gemini + model garden, embeddings, MLOps).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Open-source\/self-managed alternatives<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Self-host LLM inference<\/strong> on OCI Compute GPU instances (or Kubernetes with GPU nodes) using vLLM\/TGI\/Ollama (operationally heavier, potentially cost-effective at scale).<\/li>\n<li><strong>Open-source embedding models<\/strong> with self-managed vector DB (more control, more setup).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Comparison table<\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Oracle Cloud Generative AI<\/strong><\/td>\n<td>OCI-native apps needing managed LLM\/embeddings<\/td>\n<td>OCI IAM\/compartments, managed inference, enterprise governance<\/td>\n<td>Model\/region availability constraints; usage-based costs can 
spike<\/td>\n<td>You want managed inference with OCI security\/governance<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI Data Science (self-host inference)<\/strong><\/td>\n<td>Teams needing custom models\/control<\/td>\n<td>Full control, custom serving, can optimize cost at high scale<\/td>\n<td>GPU ops complexity, scaling, patching<\/td>\n<td>You need a specific model or custom serving behavior<\/td>\n<\/tr>\n<tr>\n<td><strong>Amazon Bedrock<\/strong><\/td>\n<td>AWS-centric foundation model consumption<\/td>\n<td>Broad model marketplace, strong ecosystem<\/td>\n<td>AWS IAM\/networking alignment needed; cross-cloud adds complexity<\/td>\n<td>Your platform is primarily on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure OpenAI Service<\/strong><\/td>\n<td>Microsoft-centric apps needing OpenAI models<\/td>\n<td>Strong enterprise integration, tooling<\/td>\n<td>Model\/provider constraints; region capacity considerations<\/td>\n<td>You\u2019re standardized on Azure and need OpenAI APIs<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Vertex AI<\/strong><\/td>\n<td>Google Cloud AI platform users<\/td>\n<td>Integrated MLOps + foundation models<\/td>\n<td>Cross-cloud complexity if you\u2019re on OCI<\/td>\n<td>Your data and apps are on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Self-managed open-source (vLLM on GPUs)<\/strong><\/td>\n<td>Cost\/latency control at steady high volume<\/td>\n<td>Full control, no per-token managed fees<\/td>\n<td>Significant ops\/security burden<\/td>\n<td>You have ML infra maturity and predictable workload<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Policy-aware employee assistant (RAG)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A large enterprise has thousands of internal policies and runbooks. 
Employees ask repetitive questions, and the helpdesk is overloaded.<\/li>\n<li><strong>Proposed architecture<\/strong>:<ul>\n<li>Store documents in <strong>OCI Object Storage<\/strong><\/li>\n<li>Extract text and chunk documents<\/li>\n<li>Generate embeddings with a <strong>Generative AI embedding model<\/strong><\/li>\n<li>Store vectors in an enterprise-approved vector store (database\/search)<\/li>\n<li>Backend service on <strong>OKE<\/strong> calls:<ul>\n<li>vector store for top-k chunks<\/li>\n<li>Generative AI chat model with grounded prompt including retrieved chunks<\/li>\n<\/ul>\n<\/li>\n<li>Send logs to <strong>OCI Logging<\/strong>; track access via <strong>OCI Audit<\/strong><\/li>\n<\/ul>\n<\/li>\n<li><strong>Why Generative AI was chosen<\/strong>:<ul>\n<li>Managed inference reduces GPU operations.<\/li>\n<li>OCI IAM and compartments align with enterprise governance.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Expected outcomes<\/strong>:<ul>\n<li>Reduced mean time to answer internal questions.<\/li>\n<li>Lower helpdesk ticket volume.<\/li>\n<li>Auditable access patterns via IAM + Audit.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: Customer support copilot<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A startup needs to respond quickly to customer emails and chats with a consistent tone and accurate product info.<\/li>\n<li><strong>Proposed architecture<\/strong>:<ul>\n<li>A lightweight backend on <strong>OCI Compute<\/strong> (or Functions)<\/li>\n<li>Product FAQ stored in a small database + embeddings for semantic retrieval<\/li>\n<li>Generative AI used to draft replies, with a human approval step<\/li>\n<\/ul>\n<\/li>\n<li><strong>Why Generative AI was chosen<\/strong>:<ul>\n<li>Small team avoids managing model servers.<\/li>\n<li>Usage-based pricing fits early-stage variability.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Expected outcomes<\/strong>:<ul>\n<li>Faster first-response time.<\/li>\n<li>Consistent messaging.<\/li>\n<li>Easy iteration on prompts without 
redeploying infrastructure.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is Generative AI in Oracle Cloud the same as Oracle Database AI features?<\/strong><br\/>\nNo. Generative AI is a managed inference service under Analytics and AI. Oracle Database has separate AI-related features (including vector and ML capabilities depending on edition\/version). Use the right tool for inference vs. storage\/query.<\/p>\n\n\n\n<p>2) <strong>Do I need GPUs to use Generative AI?<\/strong><br\/>\nNo. Oracle hosts the model inference infrastructure. You call it via API.<\/p>\n\n\n\n<p>3) <strong>Is Generative AI regional in OCI?<\/strong><br\/>\nTypically yes: you call a region-specific endpoint. Verify region support in official docs and the Console.<\/p>\n\n\n\n<p>4) <strong>How do I control who can use Generative AI?<\/strong><br\/>\nUse OCI IAM policies scoped to compartments and least privilege.<\/p>\n\n\n\n<p>5) <strong>Can I use instance principals instead of user API keys?<\/strong><br\/>\nYes, and it\u2019s recommended for production when your app runs on OCI Compute. Use resource principals for Functions. Verify the exact setup in OCI IAM docs.<\/p>\n\n\n\n<p>6) <strong>What\u2019s the difference between embeddings and chat models?<\/strong><br\/>\nEmbeddings convert text to vectors for similarity search. Chat models generate text responses. RAG uses both.<\/p>\n\n\n\n<p>7) <strong>How do I reduce hallucinations?<\/strong><br\/>\nUse RAG grounding, require citations, constrain prompts, validate outputs, and add human review for critical workflows.<\/p>\n\n\n\n<p>8) <strong>Can I send sensitive data in prompts?<\/strong><br\/>\nOnly if your security\/compliance policies allow it and Oracle\u2019s service terms and data handling commitments meet your requirements. 
Verify with official docs and your legal\/security team.<\/p>\n\n\n\n<p>9) <strong>How do I estimate cost?<\/strong><br\/>\nEstimate request volume and token usage per request, then apply Oracle\u2019s price list for the specific model and region. Use the Oracle Cloud Cost Estimator for a rough total.<\/p>\n\n\n\n<p>10) <strong>What should I log in production?<\/strong><br\/>\nLog request IDs, latency, model ID, and high-level metadata. Avoid logging raw prompts\/responses unless redacted and explicitly approved.<\/p>\n\n\n\n<p>11) <strong>How do I choose a model?<\/strong><br\/>\nStart with a model suited for your task (summarization\/chat vs. embeddings). Benchmark quality, latency, and cost with representative prompts.<\/p>\n\n\n\n<p>12) <strong>Can I do RAG without a vector database?<\/strong><br\/>\nFor tiny datasets, yes (in-memory vectors, as in the lab). For production, use a durable store optimized for vector search.<\/p>\n\n\n\n<p>13) <strong>What are common IAM errors?<\/strong><br\/>\nWrong compartment, missing policy, or using the wrong principal. Also allow time for policy propagation.<\/p>\n\n\n\n<p>14) <strong>How do I handle prompt injection in RAG?<\/strong><br\/>\nTreat retrieved text as untrusted. Use system prompts that tell the model to ignore instructions found in retrieved content, filter retrieved content, and enforce authorization checks before retrieval.<\/p>\n\n\n\n<p>15) <strong>Can I use Generative AI from outside OCI?<\/strong><br\/>\nYes, if networking and IAM allow it, but consider egress costs and security posture. Many production teams keep the calling service inside OCI.<\/p>\n\n\n\n<p>16) <strong>Is there a playground in the Console?<\/strong><br\/>\nOften there is a UI for testing prompts, but UI availability can change. Verify in your region\u2019s Console.<\/p>\n\n\n\n<p>17) <strong>Does Generative AI support dedicated capacity?<\/strong><br\/>\nOCI Generative AI documents both on-demand inference and dedicated AI clusters. 
Availability and setup vary; verify in official docs and your tenancy options.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn Generative AI<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>Oracle Cloud Generative AI docs<\/td>\n<td>Primary source for features, limits, IAM, and API usage: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/generative-ai\/home.htm<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Oracle Cloud Price List<\/td>\n<td>Find the Generative AI SKUs and pricing dimensions: https:\/\/www.oracle.com\/cloud\/price-list\/<\/td>\n<\/tr>\n<tr>\n<td>Pricing tool<\/td>\n<td>Oracle Cloud Cost Estimator<\/td>\n<td>Build rough estimates for PoCs and production: https:\/\/www.oracle.com\/cloud\/costestimator.html<\/td>\n<\/tr>\n<tr>\n<td>Free tier info<\/td>\n<td>Oracle Cloud Free Tier<\/td>\n<td>Check whether any AI usage is included (often limited and subject to change): https:\/\/www.oracle.com\/cloud\/free\/<\/td>\n<\/tr>\n<tr>\n<td>SDK docs<\/td>\n<td>OCI Python SDK docs<\/td>\n<td>SDK install\/auth patterns and examples: https:\/\/docs.oracle.com\/en-us\/iaas\/tools\/python\/latest\/<\/td>\n<\/tr>\n<tr>\n<td>CLI docs<\/td>\n<td>OCI CLI docs<\/td>\n<td>Useful for environment validation and automation: https:\/\/docs.oracle.com\/en-us\/iaas\/tools\/oci-cli\/latest\/<\/td>\n<\/tr>\n<tr>\n<td>IAM concepts<\/td>\n<td>OCI IAM documentation<\/td>\n<td>Policies, compartments, principals, audit basics: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Identity\/home.htm<\/td>\n<\/tr>\n<tr>\n<td>Observability<\/td>\n<td>OCI Audit documentation<\/td>\n<td>How to review service activity for governance: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Audit\/home.htm<\/td>\n<\/tr>\n<tr>\n<td>Official GitHub org<\/td>\n<td>oracle on 
GitHub<\/td>\n<td>SDK source and official repos: https:\/\/github.com\/oracle<\/td>\n<\/tr>\n<tr>\n<td>Official samples hub<\/td>\n<td>oracle-samples on GitHub<\/td>\n<td>Look for OCI AI\/Generative AI samples (verify repo relevance): https:\/\/github.com\/oracle-samples<\/td>\n<\/tr>\n<tr>\n<td>Architecture guidance<\/td>\n<td>Oracle Architecture Center<\/td>\n<td>Reference architectures and patterns (search for \u201cgenerative ai\u201d): https:\/\/docs.oracle.com\/en\/solutions\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Developers, DevOps, platform teams<\/td>\n<td>Cloud + DevOps practices; may include OCI and AI ops topics (verify offerings)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>DevOps\/SCM and automation foundations; may complement OCI deployments<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud engineers, operations<\/td>\n<td>Cloud operations and deployment practices (verify OCI modules)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, reliability engineers<\/td>\n<td>SRE practices for production systems (useful for AI services ops)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + AI\/automation practitioners<\/td>\n<td>AIOps concepts, monitoring, automation around AI workloads<\/td>\n<td>Check 
website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>Cloud\/DevOps training content (verify specifics)<\/td>\n<td>Engineers seeking hands-on guidance<\/td>\n<td>https:\/\/www.rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training and mentoring (verify course list)<\/td>\n<td>Beginners to intermediate DevOps engineers<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Consulting\/training platform (verify services)<\/td>\n<td>Teams needing short-term enablement<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>Operational support and training resources (verify)<\/td>\n<td>Ops\/SRE teams<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps\/engineering consulting (verify offerings)<\/td>\n<td>Architecture, implementation, operations<\/td>\n<td>Deploy an OCI-based RAG service; implement CI\/CD and observability<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps and cloud consulting\/training (verify offerings)<\/td>\n<td>Enablement, platform rollout<\/td>\n<td>Establish IAM\/compartments, build deployment templates, run workshops<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting services (verify offerings)<\/td>\n<td>Delivery and operationalization<\/td>\n<td>Build production pipelines, monitoring, incident response processes<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI fundamentals:<ul>\n<li>Tenancy, compartments, regions<\/li>\n<li>IAM users, groups, policies<\/li>\n<li>VCN basics (subnets, gateways)<\/li>\n<\/ul>\n<\/li>\n<li>API basics:<ul>\n<li>REST concepts, auth, SDK usage<\/li>\n<\/ul>\n<\/li>\n<li>Security fundamentals:<ul>\n<li>Least privilege, secrets management, logging hygiene<\/li>\n<\/ul>\n<\/li>\n<li>Basic AI concepts:<ul>\n<li>Tokens, prompts, temperature, embeddings, vector search<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>RAG system design:<ul>\n<li>Chunking strategies, evaluation, citations<\/li>\n<\/ul>\n<\/li>\n<li>Vector databases\/search:<ul>\n<li>Indexing, recall\/precision tuning, hybrid search<\/li>\n<\/ul>\n<\/li>\n<li>Production operations:<ul>\n<li>SLOs for AI features, drift monitoring, prompt versioning<\/li>\n<\/ul>\n<\/li>\n<li>Governance:<ul>\n<li>Data classification, privacy engineering, red-team testing for prompt injection<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud engineer \/ platform engineer (governance + deployment)<\/li>\n<li>Solutions architect (end-to-end AI architecture)<\/li>\n<li>Backend developer (API integrations)<\/li>\n<li>DevOps\/SRE (reliability, monitoring, cost controls)<\/li>\n<li>Security engineer (data protection and IAM)<\/li>\n<li>Data engineer (pipelines and indexing)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Oracle\u2019s certification catalog changes over time. 
Check Oracle University \/ Oracle training for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI foundations<\/li>\n<li>OCI architect tracks<\/li>\n<li>AI\/analytics tracks (if offered)<\/li>\n<\/ul>\n\n\n\n<p>Start here and search for OCI certifications: https:\/\/education.oracle.com\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Build a <strong>document summarization pipeline<\/strong> for Object Storage uploads.<\/li>\n<li>Build a <strong>RAG assistant<\/strong> for operational runbooks with citations and access control.<\/li>\n<li>Implement <strong>cost controls<\/strong>: per-user quotas + dashboards for token usage.<\/li>\n<li>Add <strong>prompt injection defenses<\/strong> and test with a red-team prompt set.<\/li>\n<li>Build an <strong>agent-assist<\/strong> tool that drafts replies with required policy citations.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compartment (OCI)<\/strong>: A logical container for organizing and isolating cloud resources with IAM policies.<\/li>\n<li><strong>IAM Policy (OCI)<\/strong>: A rule defining who can do what in which scope (tenancy\/compartment).<\/li>\n<li><strong>Principal<\/strong>: An identity that can authenticate (user, instance principal, resource principal).<\/li>\n<li><strong>Prompt<\/strong>: Input instructions and context sent to a generative model.<\/li>\n<li><strong>Token<\/strong>: A unit of text used for metering and model processing (not exactly a word).<\/li>\n<li><strong>Temperature<\/strong>: A parameter controlling randomness in model output (higher = more varied).<\/li>\n<li><strong>Embeddings<\/strong>: Vector representations of text used for semantic similarity search.<\/li>\n<li><strong>Vector store<\/strong>: A database\/index optimized for storing and searching vectors by similarity.<\/li>\n<li><strong>RAG (Retrieval-Augmented 
Generation)<\/strong>: Pattern that retrieves relevant documents and uses them as context for generation.<\/li>\n<li><strong>Prompt injection<\/strong>: An attack where malicious instructions are embedded in content to override system behavior.<\/li>\n<li><strong>Least privilege<\/strong>: Security principle of granting only required access and nothing more.<\/li>\n<li><strong>On-demand serving<\/strong>: Shared capacity model where you pay by usage (term may vary; verify OCI terminology).<\/li>\n<li><strong>Dedicated serving\/capacity<\/strong>: Provisioned capacity for predictable performance (availability varies).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>Generative AI in <strong>Oracle Cloud<\/strong> (Analytics and AI) is a managed service for calling foundation models\u2014commonly for chat\/text generation and embeddings\u2014using OCI-native IAM, compartments, and audit capabilities. It matters because it enables teams to deliver AI-powered features quickly without operating GPU-based inference infrastructure, while still fitting enterprise governance and security patterns.<\/p>\n\n\n\n<p>Cost is primarily driven by <strong>model choice<\/strong>, <strong>token\/embedding volume<\/strong>, and <strong>request rates<\/strong>, with indirect costs from vector storage, compute runtimes, logging, and network egress. Security success depends on <strong>least-privilege IAM<\/strong>, careful <strong>secrets handling<\/strong>, controlled <strong>logging\/redaction<\/strong>, and robust <strong>RAG defenses<\/strong> against prompt injection and data leakage.<\/p>\n\n\n\n<p>Use Generative AI when you want managed inference tightly integrated with OCI security and operations. 
For the next learning step, build a small RAG service: embeddings + vector search + grounded prompting\u2014then add production controls (rate limiting, monitoring, cost budgets, and security reviews) before scaling to real users.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Analytics and AI<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[66,62],"tags":[],"class_list":["post-838","post","type-post","status-publish","format-standard","hentry","category-analytics-and-ai","category-oracle-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/838","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=838"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/838\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=838"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=838"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=838"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}