{"id":833,"date":"2026-04-16T08:13:56","date_gmt":"2026-04-16T08:13:56","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-ai-data-platform-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai\/"},"modified":"2026-04-16T08:13:56","modified_gmt":"2026-04-16T08:13:56","slug":"oracle-cloud-ai-data-platform-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-ai-data-platform-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai\/","title":{"rendered":"Oracle Cloud AI Data Platform Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics and AI"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Analytics and AI<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What this service is<\/h3>\n\n\n\n<p><strong>AI Data Platform<\/strong> in <strong>Oracle Cloud<\/strong> is best understood as a <em>platform capability<\/em>\u2014a set of OCI services and reference patterns used together to ingest, store, govern, transform, and serve data for analytics and AI\/ML workloads.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph simple explanation<\/h3>\n\n\n\n<p>If you want to build AI features (recommendations, forecasting, anomaly detection, copilots, RAG, etc.), you first need reliable data pipelines, secure storage, governance, and a way to train and deploy models. 
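<\/p>\n\n\n\n<p>As a purely illustrative sketch of that flow (plain Python with hypothetical function names, not OCI APIs), the stages a platform standardizes look like:<\/p>\n\n\n\n

```python
# Illustrative only: the stages an AI data platform standardizes,
# modeled as plain functions. All names here are hypothetical, not OCI APIs.

def ingest(source_rows):
    """Land raw records (in OCI, typically Object Storage or a database)."""
    return [dict(r) for r in source_rows]

def curate(raw_rows):
    """Clean and validate: drop records missing required fields."""
    return [r for r in raw_rows
            if r.get("customer_id") and r.get("amount") is not None]

def build_features(curated_rows):
    """Aggregate per-customer features for model training."""
    totals = {}
    for r in curated_rows:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
    return totals

raw = ingest([
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 5.0},
    {"customer_id": None, "amount": 3.0},  # dropped by curation
])
features = build_features(curate(raw))
print(features)  # {'c1': 15.0}
```

\n\n\n\n<p>In a real OCI build, each stage becomes a managed, monitored service step (ingestion pipelines, Object Storage zones, Spark or SQL transforms, training jobs) rather than an in-process function.<\/p>\n\n\n\n<p>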
An <strong>AI Data Platform<\/strong> on <strong>Oracle Cloud<\/strong> combines OCI data services (storage, databases, data integration) with AI services (Data Science, AI Services\/Generative AI where applicable) so teams can deliver production AI faster and more safely.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph technical explanation<\/h3>\n\n\n\n<p>Technically, an AI Data Platform on OCI typically includes: ingestion into <strong>Object Storage<\/strong> and\/or OCI databases; transformation with <strong>Data Integration<\/strong>, <strong>Data Flow<\/strong> (Spark), or SQL engines; governance with <strong>Data Catalog<\/strong>, IAM policies, and tagging; and model development\/deployment with <strong>OCI Data Science<\/strong>. Observability is provided by OCI <strong>Logging<\/strong>, <strong>Monitoring<\/strong>, and <strong>Audit<\/strong>, and security by <strong>Vault<\/strong>, KMS, private networking, and least-privilege IAM. The \u201cplatform\u201d is the architecture plus operating model, not just a single API endpoint.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What problem it solves<\/h3>\n\n\n\n<p>Teams often struggle with:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data scattered across systems with inconsistent governance and lineage<\/li>\n<li>Slow, fragile ETL\/ELT pipelines that break ML training<\/li>\n<li>Difficulty moving models from notebooks to secure, scalable deployments<\/li>\n<li>Cost surprises from unmanaged compute\/storage growth<\/li>\n<li>Security\/compliance gaps (PII exposure, weak access controls, missing auditability)<\/li>\n<\/ul>\n\n\n\n<p>An AI Data Platform on Oracle Cloud addresses these gaps by standardizing how data is collected, curated, secured, and operationalized for analytics and AI.<\/p>\n\n\n\n<blockquote>\n<p><strong>Important naming note (verify in official docs):<\/strong> \u201cAI Data Platform\u201d is widely used as a <em>solution term<\/em> across the industry. 
In OCI, the exact product SKUs are typically the underlying services (for example, <strong>OCI Data Science<\/strong>, <strong>Object Storage<\/strong>, <strong>Data Integration<\/strong>, <strong>Data Catalog<\/strong>, <strong>Autonomous Database<\/strong>, etc.). If Oracle Cloud has introduced an explicitly named service \u201cAI Data Platform\u201d in your tenancy\/region, verify its current official definition and scope in the Oracle documentation and console. This tutorial remains strictly aligned to OCI\u2019s currently documented building blocks used to implement an AI data platform.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is AI Data Platform?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p>On <strong>Oracle Cloud<\/strong>, <strong>AI Data Platform<\/strong> refers to the end-to-end capability to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Collect and store data for analytics and AI<\/li>\n<li>Curate and govern data assets<\/li>\n<li>Transform data into features and training datasets<\/li>\n<li>Train, evaluate, and deploy ML models<\/li>\n<li>Operate data and ML workloads securely at scale<\/li>\n<\/ul>\n\n\n\n<p>Because \u201cAI Data Platform\u201d may not appear as a single standalone OCI product page in all accounts, treat it as a <strong>platform architecture<\/strong> implemented with OCI services. 
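<\/p>\n\n\n\n<p>One way to make the \u201carchitecture, not a single product\u201d point concrete is to treat the platform as a mapping from capabilities to the OCI services chosen to provide them. A minimal sketch (plain Python; the mapping mirrors the building blocks discussed in this tutorial and is illustrative, not an official SKU list):<\/p>\n\n\n\n

```python
# Illustrative capability -> service mapping for an AI data platform on OCI.
# Service names mirror this tutorial's building blocks; this is not an
# official Oracle product list.

PLATFORM = {
    "storage": "OCI Object Storage",
    "integration": "OCI Data Integration",
    "catalog": "OCI Data Catalog",
    "processing": "OCI Data Flow (Spark)",
    "warehouse": "Oracle Autonomous Database (optional)",
    "ml": "OCI Data Science",
    "security": "IAM + Vault",
    "observability": "Logging + Monitoring + Audit",
}

# Capabilities a minimal production platform should not omit.
REQUIRED = {"storage", "ml", "security", "observability"}

def missing_capabilities(platform):
    """Return required capabilities with no service assigned yet."""
    return REQUIRED - platform.keys()

print(sorted(missing_capabilities(PLATFORM)))  # []
print(sorted(missing_capabilities({"storage": "OCI Object Storage"})))
```

\n\n\n\n<p>A design review can then ask, per capability, which concrete service fills it and in which compartment\/region it runs.<\/p>\n\n\n\n<p>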
<strong>Verify in official docs<\/strong> if your organization has access to a specific OCI product branded \u201cAI Data Platform.\u201d<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities (platform-level)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data ingestion<\/strong>: batch and streaming ingestion from apps, SaaS, and databases<\/li>\n<li><strong>Data storage<\/strong>: data lake patterns on Object Storage; warehouse\/lakehouse patterns using databases and SQL engines<\/li>\n<li><strong>Data processing<\/strong>: ETL\/ELT, Spark, SQL transformations, data quality checks<\/li>\n<li><strong>Governance<\/strong>: cataloging, discovery, tags, access control, audit<\/li>\n<li><strong>AI\/ML lifecycle<\/strong>: notebooks, training, experiment tracking (capabilities vary), model registry, deployment endpoints<\/li>\n<li><strong>Operations<\/strong>: monitoring, logging, CI\/CD integration, cost controls<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (typical OCI services used)<\/h3>\n\n\n\n<p>Common OCI services in an AI Data Platform architecture include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Object Storage<\/strong>: data lake storage for raw\/curated data and model artifacts<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Object\/home.htm<\/li>\n<li><strong>OCI Data Integration<\/strong>: managed data integration (ETL\/ELT) and pipeline orchestration<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/data-integration\/using\/home.htm<\/li>\n<li><strong>OCI Data Catalog<\/strong>: metadata management and data discovery<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/data-catalog\/using\/home.htm<\/li>\n<li><strong>OCI Data Flow<\/strong>: serverless Apache Spark for large-scale transformation (where used)<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/data-flow\/using\/home.htm<\/li>\n<li><strong>Oracle Autonomous Database (ADW\/ATP)<\/strong>: governed analytics 
and app data store (optional)<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/autonomous-database\/index.html<\/li>\n<li><strong>OCI Data Science<\/strong>: ML development, training, and model deployment<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/data-science\/using\/home.htm<\/li>\n<li><strong>Observability &amp; Security<\/strong>: Logging, Monitoring, Audit, Vault, IAM, Cloud Guard (as applicable)<br\/>\n  Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Security\/home.htm<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AI Data Platform<\/strong> (as used here) is a <strong>solution architecture \/ platform pattern<\/strong> built from multiple managed OCI services.<\/li>\n<li>Individual components are <strong>regional OCI services<\/strong> in specific OCI regions (availability varies by region and tenancy; <strong>verify in official docs and the OCI console<\/strong>).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional\/global\/project\/account<\/h3>\n\n\n\n<p>Because the \u201cplatform\u201d spans services:\n&#8211; <strong>IAM<\/strong> and compartments apply <strong>tenancy-wide<\/strong>.\n&#8211; Most data\/AI services are <strong>regional<\/strong> and deployed into a <strong>compartment<\/strong>.\n&#8211; Networking is <strong>regional<\/strong> (VCNs, subnets, private endpoints).\n&#8211; Governance and logging can be configured across compartments and regions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Oracle Cloud ecosystem<\/h3>\n\n\n\n<p>An AI Data Platform on OCI typically sits between:\n&#8211; <strong>Source systems<\/strong> (apps, databases, SaaS, logs, IoT, clickstreams)\n&#8211; <strong>OCI data services<\/strong> (Object Storage, Autonomous Database, MySQL HeatWave where applicable, etc.)\n&#8211; <strong>OCI AI services<\/strong> (Data Science, AI Services, Generative AI\u2014availability varies)\n&#8211; 
<strong>Downstream consumers<\/strong> (Oracle Analytics Cloud, BI tools, APIs, microservices, data sharing)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use AI Data Platform?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster time-to-value<\/strong>: standard ingestion \u2192 curated datasets \u2192 models \u2192 deployment<\/li>\n<li><strong>Consistency<\/strong>: one platform operating model across teams reduces duplication<\/li>\n<li><strong>Risk reduction<\/strong>: governance, audit, and least-privilege access are built into the platform<\/li>\n<li><strong>Better customer experiences<\/strong>: personalization, automation, forecasting become feasible with reliable data<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Scalable storage and compute<\/strong>: decouple storage (Object Storage) from compute (Spark\/DB\/DS training)<\/li>\n<li><strong>Repeatable pipelines<\/strong>: Data Integration\/Data Flow provide reproducible transformations<\/li>\n<li><strong>Model operationalization<\/strong>: OCI Data Science model deployments provide managed inference endpoints (verify supported runtimes and limits in docs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Central monitoring<\/strong> with OCI Monitoring and Logging<\/li>\n<li><strong>Standardized environments<\/strong> using compartments, tagging, and policies<\/li>\n<li><strong>Easier lifecycle management<\/strong> with clear separation of dev\/test\/prod<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM policies<\/strong> and compartments enforce boundaries<\/li>\n<li><strong>Encryption<\/strong> at rest and in transit (service-dependent; verify 
per service)<\/li>\n<li><strong>Auditability<\/strong> via OCI Audit logs<\/li>\n<li><strong>Secrets management<\/strong> via OCI Vault<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Object Storage scales for large datasets<\/li>\n<li>Spark\/SQL engines can scale for ETL and feature engineering<\/li>\n<li>Data Science training and inference can scale using appropriate compute shapes (verify available shapes and quotas)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose an AI Data Platform approach on OCI when you need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multiple data sources and multiple consumers<\/li>\n<li>Governance requirements (PII, regulated data, data lineage)<\/li>\n<li>Repeatable ML pipelines and controlled model deployments<\/li>\n<li>Multi-team access with consistent security controls<\/li>\n<li>Clear cost management across workloads<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Avoid building a full AI Data Platform (initially) when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You have a single small dataset and a single model in a PoC<\/li>\n<li>You lack an ownership\/operating model (no team to run it)<\/li>\n<li>Your organization only needs embedded analytics in one app and no broader data reuse<\/li>\n<li>You cannot commit to basic governance, tagging, and monitoring (a platform without ops becomes technical debt)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is AI Data Platform used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Financial services (fraud detection, credit risk, AML analytics)<\/li>\n<li>Retail\/e-commerce (recommendations, demand forecasting, personalization)<\/li>\n<li>Manufacturing (predictive maintenance, quality analytics)<\/li>\n<li>Healthcare\/life sciences (claims analytics, patient operations\u2014subject to compliance)<\/li>\n<li>Telecom (churn prediction, network anomaly detection)<\/li>\n<li>Energy (load forecasting, asset monitoring)<\/li>\n<li>Public sector (case triage, resource allocation\u2014policy constraints apply)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data engineering teams (pipelines, lake\/warehouse)<\/li>\n<li>ML engineering teams (training, deployment, MLOps)<\/li>\n<li>Analytics teams (semantic models, dashboards, KPI layers)<\/li>\n<li>Platform\/SRE teams (networking, IAM, monitoring)<\/li>\n<li>Security teams (controls, audit, compliance)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Batch ETL\/ELT and scheduled pipelines<\/li>\n<li>Streaming ingestion and near-real-time scoring (where used)<\/li>\n<li>Feature engineering at scale<\/li>\n<li>Model training, evaluation, registry, and deployment<\/li>\n<li>RAG pipelines (document ingestion \u2192 embeddings \u2192 retrieval) if using supported OCI services (verify product availability)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data lake on Object Storage + curated zones (raw\/bronze, clean\/silver, curated\/gold)<\/li>\n<li>Warehouse-centric (Autonomous Data Warehouse for BI + ML)<\/li>\n<li>Lakehouse-like patterns (depending on chosen engines; verify supported features)<\/li>\n<li>Microservices consuming ML inference endpoints via REST<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multi-compartment enterprise landing zones<\/li>\n<li>Hybrid environments with on-prem sources (VPN\/FastConnect)<\/li>\n<li>Multi-region strategies for DR and data residency (verify service-specific multi-region support)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Dev\/Test<\/strong>: smaller datasets, minimal shapes, relaxed SLAs, sandbox compartments<\/li>\n<li><strong>Prod<\/strong>: private endpoints, strict IAM, robust monitoring, CI\/CD, backup\/DR, data retention and masking<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic use cases where an AI Data Platform on Oracle Cloud is a strong fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Centralized governed data lake for ML<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> ML teams can\u2019t find trusted datasets; data is duplicated across buckets and spreadsheets.<\/li>\n<li><strong>Why this fits:<\/strong> Object Storage + Data Catalog + IAM compartments provide controlled data zones with searchable metadata.<\/li>\n<li><strong>Example:<\/strong> Marketing and fraud teams share curated customer and transaction datasets with consistent tags and access policies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Automated batch ingestion from operational databases<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Nightly exports break, schema drift causes silent failures.<\/li>\n<li><strong>Why this fits:<\/strong> Data Integration pipelines can formalize ingestion and transformations with scheduling and retries (verify exact features).<\/li>\n<li><strong>Example:<\/strong> Ingest orders and returns from an operational database to Object Storage, 
then curate to ADW for analytics.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Feature engineering at scale with Spark<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Python notebooks can\u2019t process tens of millions of rows efficiently.<\/li>\n<li><strong>Why this fits:<\/strong> Data Flow (Spark) can compute aggregates, windows, and joins at scale and write results back to Object Storage.<\/li>\n<li><strong>Example:<\/strong> Build 90-day purchase frequency and recency features for churn prediction.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Train and deploy a classification model with managed inference<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Moving from notebook to an API is slow; ops teams don\u2019t want bespoke servers.<\/li>\n<li><strong>Why this fits:<\/strong> OCI Data Science supports model artifacts and managed deployments (verify supported frameworks).<\/li>\n<li><strong>Example:<\/strong> Deploy a fraud risk model behind a REST endpoint consumed by a checkout service.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Secure AI experimentation environment for multiple teams<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Teams share credentials and buckets; no reproducibility.<\/li>\n<li><strong>Why this fits:<\/strong> Compartment isolation + IAM groups + Vault + standard tagging create safe, traceable environments.<\/li>\n<li><strong>Example:<\/strong> Separate compartments for data science dev and prod; enforce \u201cno public buckets\u201d via policy and review.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Data quality gates before model training<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Model performance drops due to missing values, duplicates, and drift in data.<\/li>\n<li><strong>Why this fits:<\/strong> ETL pipelines can incorporate validation and quarantine zones 
(implementation-specific).<\/li>\n<li><strong>Example:<\/strong> Reject daily loads if null rates exceed thresholds; alert via Monitoring.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Curated analytics + ML on Autonomous Data Warehouse<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> BI dashboards and ML features disagree due to different logic.<\/li>\n<li><strong>Why this fits:<\/strong> Store curated facts\/dimensions in ADW; reuse in both BI and ML training datasets.<\/li>\n<li><strong>Example:<\/strong> Revenue metrics used in Oracle Analytics Cloud and ML forecasting share the same SQL transformations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Model governance and auditability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> No one knows which model version is running, who deployed it, or what data it used.<\/li>\n<li><strong>Why this fits:<\/strong> OCI Audit + standardized model registry practices + tagging + deployment logs provide traceability.<\/li>\n<li><strong>Example:<\/strong> For each deployment, record model version, training dataset snapshot path, and approver.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Near-real-time scoring for operational decisions (architecture-dependent)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Batch scoring is too slow for fraud or next-best-action.<\/li>\n<li><strong>Why this fits:<\/strong> Combine streaming ingestion (if used) + low-latency inference endpoint.<\/li>\n<li><strong>Example:<\/strong> Event stream triggers a scoring call; decision is returned to the app in milliseconds\/seconds depending on design.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Multi-tenant analytics and AI platform for internal business units<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Shared platform must isolate teams while enabling shared reference datasets.<\/li>\n<li><strong>Why this 
fits:<\/strong> OCI compartments + IAM policies + shared curated zones.<\/li>\n<li><strong>Example:<\/strong> Central data platform team manages base datasets; business units get isolated workspaces.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Document analytics and retrieval workflows (availability-dependent)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Teams need to search and extract insights from PDFs and tickets.<\/li>\n<li><strong>Why this fits:<\/strong> Object Storage for documents; AI services for extraction\/embeddings if available in region (verify).<\/li>\n<li><strong>Example:<\/strong> Ingest PDFs into Object Storage; extract text and metadata; store embeddings for search.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Cost-controlled sandbox for student labs and PoCs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Labs get abandoned; resources keep running.<\/li>\n<li><strong>Why this fits:<\/strong> Always Free eligible resources where possible + budgets\/quotas + mandatory tags.<\/li>\n<li><strong>Example:<\/strong> Students use a small Object Storage bucket and a time-boxed notebook session; scheduled cleanup.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<p>Because <strong>AI Data Platform<\/strong> is implemented via OCI services, features map to platform capabilities. 
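<\/p>\n\n\n\n<p>One cross-cutting convention worth fixing before assembling any of these features is the data-zone path layout in Object Storage. A minimal sketch (plain Python; the zone names and path scheme are a common data-lake convention, not an OCI requirement):<\/p>\n\n\n\n

```python
# Illustrative object-name convention for data-lake zones in Object Storage.
# Zone names (raw/curated/features/models) and the dt= partition scheme are
# a common convention, not an OCI requirement.

VALID_ZONES = {"raw", "curated", "features", "models"}

def object_name(zone, domain, dataset, run_date, filename):
    """Build a consistent object name: <zone>/<domain>/<dataset>/dt=<date>/<file>."""
    if zone not in VALID_ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return f"{zone}/{domain}/{dataset}/dt={run_date}/{filename}"

name = object_name("curated", "sales", "orders", "2026-04-16", "part-0001.parquet")
print(name)  # curated/sales/orders/dt=2026-04-16/part-0001.parquet
```

\n\n\n\n<p>Consistent prefixes like these make IAM policies, lifecycle rules, and catalog registration much easier to apply per zone.<\/p>\n\n\n\n<p>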
Below are the core features you typically assemble on Oracle Cloud, with practical benefits and caveats.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Data lake storage on OCI Object Storage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Durable, scalable object storage for raw, curated, and analytics-ready datasets; also stores model artifacts.<\/li>\n<li><strong>Why it matters:<\/strong> Decouples storage from compute; supports large datasets cheaply relative to block storage.<\/li>\n<li><strong>Practical benefit:<\/strong> Use simple bucket path conventions (raw\/curated\/features\/models) and lifecycle rules.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Access control is critical\u2014misconfigured buckets can expose data. Use IAM and avoid public buckets unless explicitly required.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Metadata cataloging with OCI Data Catalog<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Catalogs data assets, technical metadata, business metadata, and supports discovery.<\/li>\n<li><strong>Why it matters:<\/strong> ML and analytics teams need to find \u201ctrusted\u201d datasets and understand schemas.<\/li>\n<li><strong>Practical benefit:<\/strong> Faster onboarding and fewer duplicate datasets.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Catalog usefulness depends on consistent registration and stewardship. 
Verify connectors and supported sources in your region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Managed ETL\/ELT with OCI Data Integration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Creates and runs data pipelines for ingestion and transformations (batch-oriented).<\/li>\n<li><strong>Why it matters:<\/strong> Reduces custom scripting and improves operational reliability.<\/li>\n<li><strong>Practical benefit:<\/strong> Scheduling, parameterization, standardized execution (capabilities vary\u2014verify).<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Not every transformation is easiest in a GUI; complex logic may still require Spark\/SQL\/code.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Serverless Spark processing with OCI Data Flow (optional)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Runs Apache Spark applications without managing clusters.<\/li>\n<li><strong>Why it matters:<\/strong> Handles large-scale joins, aggregations, and feature engineering.<\/li>\n<li><strong>Practical benefit:<\/strong> Spin up processing when needed; stop paying when jobs end (pricing is job\/compute-based).<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Requires Spark skills; job tuning and data layout (partitioning, file formats) still matter.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Analytics warehouse with Autonomous Database (optional)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Provides a managed Oracle database with automation features; ADW is typically used for analytics.<\/li>\n<li><strong>Why it matters:<\/strong> Strong SQL performance, governance, and compatibility with enterprise BI.<\/li>\n<li><strong>Practical benefit:<\/strong> Curated datasets become easy to query and secure.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Not every workload belongs in a warehouse; large raw data may remain in Object 
Storage.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) ML development and training with OCI Data Science<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Notebook sessions, jobs, environments (conda), and managed model deployments (verify exact current features).<\/li>\n<li><strong>Why it matters:<\/strong> Moves ML from ad-hoc notebooks to repeatable training and deployment patterns.<\/li>\n<li><strong>Practical benefit:<\/strong> Standardized environments and controlled endpoints for inference.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Costs can rise if notebooks\/deployments run continuously; enforce shutdown policies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Model deployment as an endpoint (OCI Data Science)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Hosts a model as a managed REST endpoint.<\/li>\n<li><strong>Why it matters:<\/strong> Application teams consume AI via API without building custom serving infrastructure.<\/li>\n<li><strong>Practical benefit:<\/strong> Versioned deployments; consistent authentication and logging patterns.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Latency and throughput depend on shape sizing and model complexity. 
Verify payload limits, concurrency behavior, and autoscaling support.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) IAM, compartments, and policy-based governance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Enforces least-privilege access to buckets, catalogs, pipelines, notebooks, and deployments.<\/li>\n<li><strong>Why it matters:<\/strong> Prevents cross-team data leakage and unsafe production changes.<\/li>\n<li><strong>Practical benefit:<\/strong> Clear separation of duties (data engineer vs ML engineer vs operator).<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Misconfigured policies are a top cause of access issues; invest in policy review and testing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Encryption and secrets with OCI Vault<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Central management of encryption keys and secrets (depending on usage).<\/li>\n<li><strong>Why it matters:<\/strong> Avoid storing credentials in notebooks or code.<\/li>\n<li><strong>Practical benefit:<\/strong> Rotations and audit-friendly secret handling.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Applications must be coded to retrieve secrets securely; Vault access must be locked down.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Observability (Logging, Monitoring, Audit)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Central logs, metrics, alarms, and audit trails.<\/li>\n<li><strong>Why it matters:<\/strong> Production AI fails quietly without monitoring (data freshness, pipeline failures, endpoint errors).<\/li>\n<li><strong>Practical benefit:<\/strong> Alerts for job failures, 5xx errors, unusual spend.<\/li>\n<li><strong>Limitations\/caveats:<\/strong> Logs and metrics can add cost; define retention and sampling.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. 
Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p>An AI Data Platform on Oracle Cloud usually follows a layered pattern:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Ingestion layer<\/strong>: data arrives from sources (databases, SaaS, files, events)<\/li>\n<li><strong>Storage layer<\/strong>: Object Storage holds raw and curated data; databases hold structured\/serving data<\/li>\n<li><strong>Processing layer<\/strong>: Data Integration\/Data Flow\/SQL transform data into curated datasets and features<\/li>\n<li><strong>Governance layer<\/strong>: Data Catalog, tagging, policies, audit<\/li>\n<li><strong>AI layer<\/strong>: Data Science notebooks\/jobs train models; deployments serve predictions<\/li>\n<li><strong>Consumption layer<\/strong>: dashboards, APIs, applications<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data flow (batch)<\/strong>: Sources \u2192 ingestion pipeline \u2192 raw bucket \u2192 transformation job \u2192 curated\/features bucket or warehouse \u2192 ML training dataset<\/li>\n<li><strong>Model flow<\/strong>: Training job \u2192 model artifact in Object Storage \u2192 model registry \u2192 deployment endpoint<\/li>\n<li><strong>Inference flow<\/strong>: App calls endpoint \u2192 endpoint loads model \u2192 returns prediction \u2192 logs\/metrics emitted<\/li>\n<li><strong>Control flow<\/strong>: IAM policies govern every call; Audit logs record administrative actions<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related OCI services<\/h3>\n\n\n\n<p>Common integrations include:\n&#8211; <strong>Networking<\/strong>: VCN, private subnets, service gateways (for private Object Storage access)\n&#8211; <strong>Security<\/strong>: Vault for secrets; Cloud Guard (if enabled) for posture monitoring; WAF\/API Gateway if exposing endpoints\n&#8211; 
<strong>DevOps<\/strong>: OCI DevOps (or external CI\/CD) to deploy pipeline definitions and model-serving code (implementation-specific; verify)\n&#8211; <strong>Data analytics<\/strong>: Oracle Analytics Cloud consuming curated data from ADW or Object Storage<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services (typical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Object Storage<\/strong> is almost always central.<\/li>\n<li><strong>IAM<\/strong> and <strong>compartments<\/strong> are foundational.<\/li>\n<li><strong>Logging\/Audit<\/strong> should be enabled early for traceability.<\/li>\n<li><strong>VCN<\/strong> is required for private deployments and enterprise connectivity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Human users authenticate via OCI console\/SSO; access is governed by IAM groups and policies.<\/li>\n<li>Workloads (notebooks, jobs, services) typically use <strong>resource principals<\/strong> or OCI SDK\/CLI auth patterns (verify current recommended approach per service).<\/li>\n<li>Avoid embedding API keys in notebooks; prefer instance principals\/resource principals where supported.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Public endpoints are simplest for labs but riskier for production.<\/li>\n<li>Production designs commonly use:<\/li>\n<li>Private subnets for notebooks\/deployments<\/li>\n<li>Service Gateway for private access to Object Storage<\/li>\n<li>NAT Gateway for outbound internet (if needed)<\/li>\n<li>Private endpoints and security lists\/NSGs<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define standard log sources: pipeline runs, job runs, deployment access logs<\/li>\n<li>Define standard metrics and alarms:<\/li>\n<li>Pipeline\/job 
failures<\/li>\n<li>Data freshness (custom metric)<\/li>\n<li>Endpoint error rate and latency<\/li>\n<li>Spend anomalies (Budgets)<\/li>\n<li>Use tags (cost center, env, owner) for chargeback and cleanup automation<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  A[Data Sources] --&gt; B[OCI Object Storage&lt;br\/&gt;Raw Zone]\n  B --&gt; C[Transform&lt;br\/&gt;OCI Data Integration or Data Flow]\n  C --&gt; D[Curated\/Features Zone&lt;br\/&gt;Object Storage or ADW]\n  D --&gt; E[OCI Data Science&lt;br\/&gt;Training]\n  E --&gt; F[Model Artifact&lt;br\/&gt;Object Storage]\n  F --&gt; G[OCI Data Science&lt;br\/&gt;Model Deployment Endpoint]\n  G --&gt; H[Apps \/ BI \/ Services]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph OnPrem[On-Prem \/ External]\n    S1[Operational DB]\n    S2[SaaS Apps]\n    S3[Files\/Events]\n  end\n\n  subgraph OCI[\"Oracle Cloud (OCI Region)\"]\n    subgraph Net[VCN]\n      subgraph Priv[Private Subnets]\n        DI[OCI Data Integration&lt;br\/&gt;Pipelines]\n        DF[OCI Data Flow&lt;br\/&gt;Spark Jobs]\n        DSN[OCI Data Science&lt;br\/&gt;Notebook\/Jobs]\n        DSE[OCI Data Science&lt;br\/&gt;Model Deployment]\n      end\n      SGW[Service Gateway]\n      NAT[NAT Gateway]\n      APIGW[\"API Gateway or LB&lt;br\/&gt;(optional)\"]\n    end\n\n    OSRAW[Object Storage&lt;br\/&gt;Raw Bucket]\n    OSCUR[Object Storage&lt;br\/&gt;Curated\/Features]\n    ADW[\"Autonomous Data Warehouse&lt;br\/&gt;(optional)\"]\n    DCAT[OCI Data Catalog]\n    VAULT[OCI Vault]\n    OBS[Logging\/Monitoring]\n    AUD[Audit]\n    IAM[IAM + Compartments + Tags]\n  end\n\n  S1 --&gt; DI\n  S2 --&gt; DI\n  S3 --&gt; OSRAW\n\n  DI --&gt; OSRAW\n  OSRAW --&gt; DF\n  DF --&gt; OSCUR\n  OSCUR --&gt; ADW\n\n  OSCUR --&gt; DSN\n  ADW --&gt; 
DSN\n\n  DSN --&gt; DSE\n  DSE --&gt; APIGW\n  APIGW --&gt; APP[Production Apps]\n\n  DCAT --- OSRAW\n  DCAT --- OSCUR\n  VAULT --- DI\n  VAULT --- DSN\n  VAULT --- DSE\n  OBS --- DI\n  OBS --- DF\n  OBS --- DSE\n  AUD --- IAM\n\n  SGW --- OSRAW\n  SGW --- OSCUR\n  NAT --- DSN\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account\/tenancy requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>Oracle Cloud (OCI) tenancy<\/strong> with permissions to create resources in a compartment.<\/li>\n<li>Ability to use services relevant to your platform design (Object Storage, Data Science, etc.).<\/li>\n<li>Some services may require explicit enablement in your tenancy\/region. <strong>Verify in the OCI console and official docs<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p>At minimum, you need permission to:\n&#8211; Create and manage Object Storage buckets and objects\n&#8211; Create and manage OCI Data Science resources (projects, notebook sessions, models, deployments)\n&#8211; Use Logging\/Monitoring if you enable them<\/p>\n\n\n\n<p>Typical OCI policy patterns (examples; adjust to your compartment and group names):<\/p>\n\n\n\n<pre><code class=\"language-text\">allow group DataPlatformAdmins to manage object-family in compartment &lt;compartment-name&gt;\nallow group DataPlatformAdmins to manage data-science-family in compartment &lt;compartment-name&gt;\nallow group DataPlatformAdmins to manage data-integration-family in compartment &lt;compartment-name&gt;\nallow group DataPlatformAdmins to manage data-catalog-family in compartment &lt;compartment-name&gt;\nallow group DataPlatformAdmins to read metrics in compartment &lt;compartment-name&gt;\nallow group DataPlatformAdmins to use cloud-shell in tenancy\n<\/code><\/pre>\n\n\n\n<blockquote>\n<p>Policies and \u201c<em>-family\u201d 
names vary by service and time.<\/em> <em>Verify the correct policy verbs and resource families in the official OCI IAM policy documentation<\/em>: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Identity\/home.htm<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Many OCI services are usage-based; some have Always Free tiers or free trial credits depending on your account. <strong>Verify your tenancy\u2019s Free Tier eligibility<\/strong>.<\/li>\n<li>You must have a payment method on file for pay-as-you-go accounts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">CLI\/SDK\/tools needed<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Console<\/strong> access<\/li>\n<li><strong>OCI Cloud Shell<\/strong> (recommended for this lab) or local installation of:<\/li>\n<li>OCI CLI: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/SDKDocs\/cliinstall.htm<\/li>\n<li>Python 3.9+ (if doing local scripts)<\/li>\n<li>(Optional) Git for version control<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Service availability differs by region. Before designing your platform, confirm:<\/li>\n<li>OCI Data Science availability<\/li>\n<li>Data Integration\/Data Catalog availability<\/li>\n<li>Any AI Services\/Generative AI availability (if you plan to use them)<\/li>\n<li>Use the OCI console region selector and service documentation pages. 
<strong>Verify in official docs<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Expect quotas for:<\/li>\n<li>Number of notebook sessions and deployments<\/li>\n<li>Total OCPUs\/compute for Data Science jobs\/deployments<\/li>\n<li>Object Storage request rates (rarely limiting for beginners)<\/li>\n<li>Check <strong>Service Limits<\/strong> in the OCI console.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<p>For the hands-on tutorial in section 10, you will need:\n&#8211; Object Storage\n&#8211; OCI Data Science\n&#8211; IAM group\/policies\n&#8211; (Optional) Logging enabled for your deployment endpoint<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (how to think about cost)<\/h3>\n\n\n\n<p>Because <strong>AI Data Platform<\/strong> on Oracle Cloud is a composition of services, your total cost is the <strong>sum of<\/strong>:\n&#8211; Storage (Object Storage, database storage)\n&#8211; Data processing (Data Flow\/Spark, Data Integration runs)\n&#8211; ML compute (Data Science notebooks\/jobs\/deployments)\n&#8211; Networking (egress, NAT, load balancers, FastConnect\/VPN)\n&#8211; Observability (log storage\/ingestion where applicable)\n&#8211; Optional: Oracle Analytics Cloud, Autonomous Database, etc.<\/p>\n\n\n\n<p><strong>Official OCI pricing landing page:<\/strong><br\/>\nhttps:\/\/www.oracle.com\/cloud\/pricing\/<\/p>\n\n\n\n<p><strong>OCI cost estimator (official):<\/strong><br\/>\nhttps:\/\/www.oracle.com\/cloud\/costestimator.html<\/p>\n\n\n\n<blockquote>\n<p>Exact prices vary by region, service SKU, compute shape, and sometimes contract terms. 
Use the pricing pages and estimator for your region and tenancy.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions (typical)<\/h3>\n\n\n\n<p>Below are common cost dimensions you must account for (verify per service):\n&#8211; <strong>Object Storage<\/strong>\n  &#8211; GB-month stored by storage tier\n  &#8211; Requests (PUT\/GET\/LIST) may be priced depending on tier\n  &#8211; Data egress out of OCI region (internet egress)\n&#8211; <strong>Data Integration<\/strong>\n  &#8211; Often priced by usage units (for example, OCPU-hours or equivalent service units). <strong>Verify the current billing metric<\/strong>.\n&#8211; <strong>Data Flow<\/strong>\n  &#8211; Spark job compute time (OCPU-hours) and memory; storage I\/O indirectly impacts cost through runtime.\n&#8211; <strong>Autonomous Database<\/strong>\n  &#8211; OCPU-hours (or ECPU) and storage; optional auto-scaling.\n&#8211; <strong>OCI Data Science<\/strong>\n  &#8211; Notebook session compute time (shape-based)\n  &#8211; Job runs (compute time)\n  &#8211; Model deployments (running compute time)\n&#8211; <strong>Logging<\/strong>\n  &#8211; Ingestion and retention can be billable depending on configuration; verify current model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (if applicable)<\/h3>\n\n\n\n<p>OCI has an Always Free tier for certain services and shapes, but not all AI\/analytics services are Always Free. 
<strong>Verify<\/strong>:\n&#8211; Whether your region supports Always Free resources you intend to use\n&#8211; Whether OCI Data Science has free-tier options (often it is paid usage)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (what makes bills grow)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Leaving <strong>notebook sessions<\/strong> running 24\/7<\/li>\n<li>Running <strong>model deployments<\/strong> continuously with oversized shapes<\/li>\n<li>Reprocessing full datasets instead of incremental loads<\/li>\n<li>Storing multiple copies of large datasets (raw + curated + features + backups) without lifecycle rules<\/li>\n<li>Excessive log retention and verbose logs<\/li>\n<li>Network egress to the internet or across regions<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden or indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>NAT Gateway<\/strong> data processing costs (if you use private subnets and outbound internet)<\/li>\n<li><strong>Load balancer \/ API gateway<\/strong> costs if you place inference behind them<\/li>\n<li><strong>Backups<\/strong> and snapshots (databases)<\/li>\n<li><strong>Cross-region replication<\/strong> (Object Storage replication, DR patterns)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Intra-region traffic between OCI services is typically cheaper than internet egress, but pricing rules can be nuanced. 
<strong>Verify<\/strong>:<\/li>\n<li>Egress to the internet<\/li>\n<li>Cross-region transfer<\/li>\n<li>FastConnect costs if hybrid<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>small shapes<\/strong> for notebooks and deployments; scale only when required.<\/li>\n<li>Enforce <strong>auto-shutdown<\/strong> for notebooks (if supported) or implement an operations process.<\/li>\n<li>Use <strong>lifecycle policies<\/strong> in Object Storage to move older data to cheaper tiers or delete it.<\/li>\n<li>Partition data and use efficient formats (Parquet\/ORC) to reduce Spark runtime.<\/li>\n<li>Right-size log ingestion and retention; store only what you need for audit and troubleshooting.<\/li>\n<li>Tag everything; use budgets and alerts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated numbers)<\/h3>\n\n\n\n<p>A minimal learning setup typically includes:\n&#8211; 1 small Object Storage bucket (raw + curated + models)\n&#8211; 1 short-lived notebook session for exploration\/training\n&#8211; 1 short-lived model deployment for endpoint testing<\/p>\n\n\n\n<p>Because compute shapes and regional prices vary, use the OCI Cost Estimator to model:\n&#8211; <strong>Notebook hours<\/strong> (e.g., a few hours per week)\n&#8211; <strong>Deployment hours<\/strong> (e.g., run only during tests)\n&#8211; <strong>Storage GB-month<\/strong> (e.g., a few GB)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>In production, expect additional spending for:\n&#8211; Always-on inference endpoints (possibly multiple for blue\/green or canary)\n&#8211; Data processing schedules (daily\/hourly Spark jobs)\n&#8211; Warehouse compute (ADW auto-scaling)\n&#8211; Network\/security infrastructure (private endpoints, API gateways)\n&#8211; Higher log volume, longer retention, and compliance 
requirements<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab builds a small, real \u201cAI Data Platform\u201d slice on <strong>Oracle Cloud<\/strong>:\n&#8211; Store a dataset in <strong>Object Storage<\/strong>\n&#8211; Train a simple ML model using <strong>OCI Data Science<\/strong>\n&#8211; Deploy the model as an <strong>inference endpoint<\/strong>\n&#8211; Call the endpoint and validate predictions\n&#8211; Clean up safely to avoid ongoing costs<\/p>\n\n\n\n<blockquote>\n<p>Steps reflect common OCI capabilities. Some UI labels and options can change. If you see differences, <strong>follow the latest OCI Data Science documentation<\/strong>: https:\/\/docs.oracle.com\/en-us\/iaas\/data-science\/using\/home.htm<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Build an end-to-end minimal AI workflow on Oracle Cloud that demonstrates the core mechanics of an AI Data Platform: <strong>data storage \u2192 model training \u2192 model deployment \u2192 inference \u2192 operations<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Create (or choose) a compartment for the lab\n2. Create an Object Storage bucket and upload a sample dataset\n3. Create an OCI Data Science project and notebook session\n4. Train a basic model and save the model artifact\n5. Register the model and deploy it\n6. Invoke the endpoint and verify results\n7. Troubleshoot common issues\n8. 
Clean up all resources<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Prepare your compartment, group, and policies<\/h3>\n\n\n\n<p><strong>Goal:<\/strong> Ensure you have a safe place to create resources and enough IAM permissions.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p>In the OCI Console, create a compartment (or reuse one):\n   &#8211; Name: <code>ai-data-platform-lab<\/code>\n   &#8211; (Optional) Add tags like <code>env=lab<\/code>, <code>owner=&lt;yourname&gt;<\/code><\/p>\n<\/li>\n<li>\n<p>Confirm your user is in a group with policies similar to:\n   &#8211; Manage Object Storage\n   &#8211; Manage Data Science resources<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> You can create buckets and Data Science resources in the <code>ai-data-platform-lab<\/code> compartment.<\/p>\n\n\n\n<p><strong>Verification:<\/strong>\n&#8211; In the console, open <strong>Identity &amp; Security \u2192 Policies<\/strong> and confirm policies apply to your group and compartment.\n&#8211; If you can create a bucket in Step 2, IAM is likely sufficient.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an Object Storage bucket and upload a dataset<\/h3>\n\n\n\n<p><strong>Goal:<\/strong> Establish the \u201cdata lake\u201d storage area.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2.1 Create a bucket<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Storage \u2192 Object Storage &amp; Archive Storage \u2192 Buckets<\/strong><\/li>\n<li>Select compartment: <code>ai-data-platform-lab<\/code><\/li>\n<li>Click <strong>Create Bucket<\/strong><\/li>\n<li>Bucket name: <code>ai-data-platform-lab-raw<\/code><\/li>\n<li>Default storage tier is fine for a lab.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> Bucket exists and is accessible.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2.2 Upload a sample dataset<\/h4>\n\n\n\n<p>Use <strong>Cloud Shell<\/strong> to create a small dataset 
locally and upload it.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open <strong>Cloud Shell<\/strong> from the OCI Console.<\/li>\n<li>Create a working directory and a CSV file:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-bash\">mkdir -p ~\/ai-data-platform-lab\ncd ~\/ai-data-platform-lab\n\ncat &gt; churn_small.csv &lt;&lt; 'EOF'\ncustomer_id,age,tenure_months,monthly_charges,has_internet,churned\nC001,34,12,70.2,1,0\nC002,57,2,95.1,1,1\nC003,45,24,55.0,0,0\nC004,29,5,88.3,1,1\nC005,62,36,40.1,0,0\nC006,41,8,79.5,1,0\nC007,38,3,99.9,1,1\nC008,50,18,65.4,1,0\nEOF\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"3\">\n<li>Upload it to your bucket using OCI CLI.<\/li>\n<\/ol>\n\n\n\n<p>First, confirm your namespace:<\/p>\n\n\n\n<pre><code class=\"language-bash\">oci os ns get\n<\/code><\/pre>\n\n\n\n<p>Upload:<\/p>\n\n\n\n<pre><code class=\"language-bash\">BUCKET=\"ai-data-platform-lab-raw\"\noci os object put --bucket-name \"$BUCKET\" --file churn_small.csv\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> The object <code>churn_small.csv<\/code> is stored in Object Storage.<\/p>\n\n\n\n<p><strong>Verification:<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">oci os object list --bucket-name \"$BUCKET\" --query \"data[].name\"\n<\/code><\/pre>\n\n\n\n<p>You should see <code>churn_small.csv<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create an OCI Data Science project and notebook session<\/h3>\n\n\n\n<p><strong>Goal:<\/strong> Create a workspace for training.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Analytics &amp; AI \u2192 Data Science<\/strong><\/li>\n<li>Select compartment: <code>ai-data-platform-lab<\/code><\/li>\n<li>\n<p>Create a <strong>Project<\/strong>\n   &#8211; Name: <code>ai-data-platform-lab-project<\/code><\/p>\n<\/li>\n<li>\n<p>Create a <strong>Notebook Session<\/strong> in that project.\n   &#8211; Name: 
<code>ai-data-platform-lab-nb<\/code>\n   &#8211; Shape: choose a small\/low-cost shape appropriate for the lab (availability varies by region; <strong>verify<\/strong>)\n   &#8211; Networking: for a beginner lab, public networking may be the default. For production, prefer private subnet + service gateway.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p>Wait until the notebook is in <strong>Active<\/strong> state, then click <strong>Open<\/strong>.<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> You have a running notebook environment.<\/p>\n\n\n\n<p><strong>Verification:<\/strong>\n&#8211; Notebook session shows <strong>Active<\/strong>\n&#8211; You can open JupyterLab (or the current notebook UI)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Load data from Object Storage and train a small model<\/h3>\n\n\n\n<p><strong>Goal:<\/strong> Train a simple churn model and package it for deployment.<\/p>\n\n\n\n<p>In the notebook, run the following Python code.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">4.1 Install\/import dependencies<\/h4>\n\n\n\n<p>Most OCI Data Science notebook environments include common packages, but versions vary.<\/p>\n\n\n\n<pre><code class=\"language-python\">import pandas as pd\nimport numpy as np\n\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.metrics import classification_report\n\nimport joblib\nimport os\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">4.2 Download the dataset from Object Storage<\/h4>\n\n\n\n<p>Use the OCI Python SDK. 
If the SDK isn\u2019t available in your environment, install it (<code>pip install oci<\/code>)\u2014but prefer the prebuilt environment if provided by OCI.<\/p>\n\n\n\n<pre><code class=\"language-python\">import oci\nfrom oci.object_storage import ObjectStorageClient\n\n# Notebook sessions often support resource principals.\n# If resource principal auth doesn't work in your environment, verify the recommended auth method in docs.\n# Fallback approach (less ideal) is using config file auth, but avoid putting keys in notebooks.\nsigner = oci.auth.signers.get_resource_principals_signer()\nconfig = {\"region\": os.environ.get(\"OCI_REGION\")}  # some envs set region automatically\nclient = ObjectStorageClient(config=config, signer=signer)\n\nnamespace = client.get_namespace().data\nbucket_name = \"ai-data-platform-lab-raw\"\nobject_name = \"churn_small.csv\"\n\nresp = client.get_object(namespace, bucket_name, object_name)\ndata = resp.data.content.decode(\"utf-8\")\n\nwith open(\"churn_small.csv\", \"w\") as f:\n    f.write(data)\n\ndf = pd.read_csv(\"churn_small.csv\")\ndf\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> The dataframe prints with 8 rows and the columns shown in the CSV.<\/p>\n\n\n\n<p><strong>If this fails:<\/strong> Jump to <strong>Troubleshooting<\/strong> \u2192 \u201cResource principal \/ SDK auth errors\u201d.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">4.3 Prepare features and train<\/h4>\n\n\n\n<pre><code class=\"language-python\"># Basic feature prep (toy example)\nX = df[[\"age\", \"tenure_months\", \"monthly_charges\", \"has_internet\"]].astype(float)\ny = df[\"churned\"].astype(int)\n\nX_train, X_test, y_train, y_test = train_test_split(\n    X, y, test_size=0.25, random_state=42, stratify=y\n)\n\nmodel = LogisticRegression()\nmodel.fit(X_train, y_train)\n\npred = model.predict(X_test)\nprint(classification_report(y_test, pred))\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> A classification report is printed. 
With a tiny dataset, metrics will be unstable; that\u2019s fine for a deployment mechanics lab.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">4.4 Save the model artifact<\/h4>\n\n\n\n<pre><code class=\"language-python\">artifact_dir = \"model_artifact\"\nos.makedirs(artifact_dir, exist_ok=True)\n\njoblib.dump(model, os.path.join(artifact_dir, \"model.joblib\"))\n\n# Save minimal metadata\nwith open(os.path.join(artifact_dir, \"metadata.txt\"), \"w\") as f:\n    f.write(\"model=logistic_regression\\nfeatures=age,tenure_months,monthly_charges,has_internet\\n\")\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> <code>model_artifact\/model.joblib<\/code> exists.<\/p>\n\n\n\n<p><strong>Verification:<\/strong><\/p>\n\n\n\n<pre><code class=\"language-python\">os.listdir(\"model_artifact\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Create a scoring script (entrypoint) for deployment<\/h3>\n\n\n\n<p>OCI model deployments need a way to load the model and handle requests. 
The exact format depends on OCI Data Science deployment requirements and base images.<\/p>\n\n\n\n<p><strong>Verify the current \u201cModel Deployment\u201d serving format in official docs<\/strong> and adapt accordingly:\nhttps:\/\/docs.oracle.com\/en-us\/iaas\/data-science\/using\/model_deployments.htm (navigate from Data Science docs)<\/p>\n\n\n\n<p>Below is a generic Python scoring example (you may need to adapt file names and handler signatures to OCI\u2019s required format).<\/p>\n\n\n\n<p>Create <code>score.py<\/code>:<\/p>\n\n\n\n<pre><code class=\"language-python\">score_py = r'''\nimport json\nimport joblib\nimport numpy as np\nimport os\n\nMODEL_PATH = os.environ.get(\"MODEL_PATH\", \"model.joblib\")\n\n_model = None\n\ndef load_model():\n    global _model\n    if _model is None:\n        _model = joblib.load(MODEL_PATH)\n    return _model\n\ndef predict(data):\n    \"\"\"\n    Expected input JSON:\n    {\n      \"instances\": [\n        {\"age\": 40, \"tenure_months\": 10, \"monthly_charges\": 80.0, \"has_internet\": 1}\n      ]\n    }\n    \"\"\"\n    model = load_model()\n    instances = data.get(\"instances\", [])\n    X = []\n    for row in instances:\n        X.append([\n            float(row[\"age\"]),\n            float(row[\"tenure_months\"]),\n            float(row[\"monthly_charges\"]),\n            float(row[\"has_internet\"]),\n        ])\n    X = np.array(X)\n    proba = model.predict_proba(X)[:, 1].tolist()\n    pred = model.predict(X).tolist()\n    return {\"predictions\": pred, \"probabilities\": proba}\n\ndef handler(input_data: str):\n    try:\n        payload = json.loads(input_data)\n        result = predict(payload)\n        return json.dumps(result)\n    except Exception as e:\n        return json.dumps({\"error\": str(e)})\n'''\nwith open(os.path.join(artifact_dir, \"score.py\"), \"w\") as f:\n    f.write(score_py)\n\nprint(\"Wrote model_artifact\/score.py\")\n<\/code><\/pre>\n\n\n\n<p>Now validate locally:<\/p>\n\n\n\n<pre><code 
class=\"language-python\">import json\nimport os\nimport importlib.util\n\nos.environ[\"MODEL_PATH\"] = os.path.join(\"model_artifact\", \"model.joblib\")\n\n# Load score.py as a module. importlib.util also works on Python 3.12+,\n# where the older SourceFileLoader.load_module() has been removed.\nspec = importlib.util.spec_from_file_location(\"score\", os.path.join(\"model_artifact\", \"score.py\"))\nsc = importlib.util.module_from_spec(spec)\nspec.loader.exec_module(sc)\n\ntest_payload = {\n    \"instances\": [{\"age\": 57, \"tenure_months\": 2, \"monthly_charges\": 95.1, \"has_internet\": 1}]\n}\nprint(sc.handler(json.dumps(test_payload)))\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> A JSON response with <code>predictions<\/code> and <code>probabilities<\/code>.<\/p>\n\n\n\n<blockquote>\n<p>If OCI requires a different handler interface, keep the model logic but adjust the wrapper to OCI\u2019s required format (verify in docs).<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Upload the model artifact to Object Storage (optional but common)<\/h3>\n\n\n\n<p>Many workflows store model artifacts in Object Storage for traceability.<\/p>\n\n\n\n<p>From the notebook (or Cloud Shell), create a zip:<\/p>\n\n\n\n<pre><code class=\"language-python\">import shutil\nshutil.make_archive(\"churn_model_artifact\", \"zip\", artifact_dir)\n<\/code><\/pre>\n\n\n\n<p>Upload to Object Storage (you can do this from Cloud Shell for simplicity):<\/p>\n\n\n\n<pre><code class=\"language-bash\">cd ~\/ai-data-platform-lab  # or wherever you downloaded the zip\n# If the zip is in the notebook filesystem, download it or upload from the notebook if supported.\n\n# Example upload command (run where the zip exists):\noci os object put --bucket-name ai-data-platform-lab-raw --file churn_model_artifact.zip\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> Artifact zip exists in the bucket.<\/p>\n\n\n\n<p><strong>Verification:<\/strong><\/p>\n\n\n\n<pre><code class=\"language-bash\">oci os object list --bucket-name ai-data-platform-lab-raw --query \"data[].name\"\n<\/code><\/pre>\n\n\n\n<hr 
class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Register the model and create a model deployment<\/h3>\n\n\n\n<p><strong>Goal:<\/strong> Create a managed inference endpoint.<\/p>\n\n\n\n<p>In the OCI Console:\n1. Go to <strong>Analytics &amp; AI \u2192 Data Science \u2192 Projects \u2192 ai-data-platform-lab-project<\/strong>\n2. Find <strong>Models<\/strong> (or \u201cModel catalog\/Model artifacts\u201d depending on UI)\n3. Create a <strong>Model<\/strong>\n   &#8211; Provide a name: <code>churn-lr-v1<\/code>\n   &#8211; Upload the model artifact (zip) or point to an artifact location (depending on supported flow)\n   &#8211; Specify artifact files and runtime details as required (verify exact fields)<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"4\">\n<li>Create a <strong>Model Deployment<\/strong>\n   &#8211; Name: <code>churn-lr-deploy<\/code>\n   &#8211; Choose a small shape for cost control\n   &#8211; Choose networking mode:<ul>\n<li>For a lab: public endpoint if allowed<\/li>\n<li>For production: private endpoint in a private subnet and expose via API Gateway\/LB<\/li>\n<\/ul>\n<\/li>\n<li>Create the deployment<\/li>\n<\/ol>\n\n\n\n<p>Wait until status is <strong>Active<\/strong>.<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> You have a deployment with an HTTPS endpoint URL.<\/p>\n\n\n\n<p><strong>Verification:<\/strong>\n&#8211; Deployment status shows <strong>Active<\/strong>\n&#8211; Endpoint URL is visible in the deployment details<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 8: Invoke the endpoint<\/h3>\n\n\n\n<p><strong>Goal:<\/strong> Test inference end-to-end.<\/p>\n\n\n\n<p>The authentication method depends on how OCI Data Science secures the endpoint (IAM auth, token-based, etc.). 
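<\/p>\n\n\n\n<p>Regardless of which invocation method applies, the request and response bodies follow the contract defined in <code>score.py<\/code> above. Below is a small client-side sketch for building the payload and validating the response before wiring in real authentication (the helper names are illustrative, not an OCI API):<\/p>\n\n\n\n<pre><code class=\"language-python\">import json\n\nFEATURES = [\"age\", \"tenure_months\", \"monthly_charges\", \"has_internet\"]\n\ndef build_payload(rows):\n    \"\"\"Build the request body score.py expects: {\"instances\": [...]}.\"\"\"\n    instances = []\n    for row in rows:\n        missing = [f for f in FEATURES if f not in row]\n        if missing:\n            raise ValueError(f\"missing features: {missing}\")\n        instances.append({f: float(row[f]) for f in FEATURES})\n    return json.dumps({\"instances\": instances})\n\ndef parse_response(body):\n    \"\"\"Parse the body score.py returns; surface handler-reported errors.\"\"\"\n    result = json.loads(body)\n    if \"error\" in result:\n        raise RuntimeError(result[\"error\"])\n    return list(zip(result[\"predictions\"], result[\"probabilities\"]))\n\npayload = build_payload([{\"age\": 57, \"tenure_months\": 2, \"monthly_charges\": 95.1, \"has_internet\": 1}])\nprint(payload)\n<\/code><\/pre>\n\n\n\n<p>Keeping the payload contract in one helper lets you exercise the same code against the local <code>handler<\/code> test from Step 5 and, later, the deployed endpoint.<\/p>\n\n\n\n<p>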
<strong>Verify the current invocation method in the model deployment documentation<\/strong>.<\/p>\n\n\n\n<p>A common pattern is to use an OCI SDK\/CLI-signed request or a provided invocation token. If your deployment provides a simple test interface in console, use it first.<\/p>\n\n\n\n<p>If an HTTP invocation with a bearer token is supported, you might do something like:<\/p>\n\n\n\n<pre><code class=\"language-bash\">ENDPOINT_URL=\"https:\/\/&lt;your-model-deployment-endpoint&gt;\"\nTOKEN=\"&lt;your-token-if-applicable&gt;\"\n\ncurl -sS -X POST \"$ENDPOINT_URL\" \\\n  -H \"Content-Type: application\/json\" \\\n  -H \"Authorization: Bearer $TOKEN\" \\\n  -d '{\n    \"instances\": [\n      {\"age\": 57, \"tenure_months\": 2, \"monthly_charges\": 95.1, \"has_internet\": 1}\n    ]\n  }'\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome:<\/strong> JSON response with prediction and probability.<\/p>\n\n\n\n<p><strong>If you receive 401\/403:<\/strong> see Troubleshooting \u2192 \u201cEndpoint auth errors\u201d.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Object Storage<\/strong>\n   &#8211; Bucket exists\n   &#8211; <code>churn_small.csv<\/code> exists<\/p>\n<\/li>\n<li>\n<p><strong>Notebook<\/strong>\n   &#8211; Notebook session is Active\n   &#8211; You successfully read the dataset and trained a model\n   &#8211; Artifact contains <code>model.joblib<\/code> and <code>score.py<\/code><\/p>\n<\/li>\n<li>\n<p><strong>Model<\/strong>\n   &#8211; Model registered successfully<\/p>\n<\/li>\n<li>\n<p><strong>Deployment<\/strong>\n   &#8211; Deployment is Active\n   &#8211; Endpoint invocation returns predictions<\/p>\n<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: \u201cNotAuthorizedOrNotFound\u201d when 
reading Object Storage<\/h4>\n\n\n\n<p><strong>Cause:<\/strong> IAM policy missing for Object Storage or wrong compartment\/bucket name.<br\/>\n<strong>Fix:<\/strong>\n&#8211; Confirm bucket compartment and name\n&#8211; Confirm group policies include <code>manage object-family<\/code> (or at least read access)\n&#8211; Verify you\u2019re using the correct namespace and region<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Resource principal \/ SDK auth errors in notebook<\/h4>\n\n\n\n<p><strong>Cause:<\/strong> Notebook environment not configured for resource principals, or code expects env vars that aren\u2019t set.<br\/>\n<strong>Fix:<\/strong>\n&#8211; Check OCI Data Science docs for the recommended authentication method for notebooks.\n&#8211; Try retrieving region from notebook metadata or set it explicitly in the SDK config.\n&#8211; As a last resort (not recommended for shared environments), use OCI config file auth\u2014prefer secure methods.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Deployment fails to start \/ stuck provisioning<\/h4>\n\n\n\n<p><strong>Cause:<\/strong> Shape quota exceeded, missing network access, artifact format mismatch, or required handler not found.<br\/>\n<strong>Fix:<\/strong>\n&#8211; Check <strong>Work Requests<\/strong> and deployment logs in the console\n&#8211; Confirm quotas in <strong>Service Limits<\/strong>\n&#8211; Verify your artifact structure matches OCI\u2019s required format (this is the most common issue)\n&#8211; Try a simpler base runtime or official sample format (verify in docs)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Endpoint returns 500 errors<\/h4>\n\n\n\n<p><strong>Cause:<\/strong> Scoring script exception, missing dependency, incorrect request format.<br\/>\n<strong>Fix:<\/strong>\n&#8211; Check deployment logs\n&#8211; Validate JSON schema your handler expects\n&#8211; Add defensive checks and better error output in the scoring script<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: 
Endpoint auth errors (401\/403)<\/h4>\n\n\n\n<p><strong>Cause:<\/strong> Wrong auth method\/token, calling from unauthorized principal\/network.<br\/>\n<strong>Fix:<\/strong>\n&#8211; Verify the endpoint\u2019s required auth method in docs\n&#8211; Ensure caller identity has permissions\n&#8211; If endpoint is private, call from within the VCN (bastion\/compute\/Cloud Shell may not have access)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing charges, clean up in this order:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Delete model deployment<\/strong> (<code>churn-lr-deploy<\/code>)<\/li>\n<li><strong>Delete model registration<\/strong> (<code>churn-lr-v1<\/code>) if not needed<\/li>\n<li><strong>Terminate notebook session<\/strong> (<code>ai-data-platform-lab-nb<\/code>)<\/li>\n<li><strong>Delete Object Storage objects and bucket<\/strong>\n   &#8211; Delete objects (<code>churn_small.csv<\/code>, <code>churn_model_artifact.zip<\/code>) then delete the bucket<\/li>\n<li>(Optional) Delete the project<\/li>\n<li>(Optional) Delete the compartment (only if it was created solely for this lab and is empty)<\/li>\n<\/ol>\n\n\n\n<p><strong>Verification:<\/strong> In the compartment, ensure no running Data Science resources remain and the bucket is removed.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. 
Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Separate zones in Object Storage:<\/li>\n<li><code>raw\/<\/code> for immutable ingested data<\/li>\n<li><code>curated\/<\/code> for cleaned, conformed datasets<\/li>\n<li><code>features\/<\/code> for ML features<\/li>\n<li><code>models\/<\/code> for model artifacts<\/li>\n<li>Use <strong>immutable data snapshots<\/strong> for training reproducibility (date-partitioned paths).<\/li>\n<li>Prefer <strong>private networking<\/strong> for production notebooks, pipelines, and endpoints.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>least privilege<\/strong> and separate roles:<\/li>\n<li>Data engineers: manage pipelines + storage paths<\/li>\n<li>Data scientists: read curated\/features, write models, manage notebooks<\/li>\n<li>ML engineers\/operators: manage deployments in prod<\/li>\n<li>Use separate compartments for <code>dev<\/code>, <code>test<\/code>, <code>prod<\/code>.<\/li>\n<li>Enforce mandatory tags: <code>env<\/code>, <code>owner<\/code>, <code>cost-center<\/code>, <code>data-classification<\/code>.<\/li>\n<li>Avoid embedding API keys and secrets in notebooks; use <strong>OCI Vault<\/strong> and resource principals where supported.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Auto-stop notebooks; delete unused model deployments.<\/li>\n<li>Use lifecycle rules to transition or delete old raw\/temporary data.<\/li>\n<li>Compress and columnarize data (Parquet) to reduce Spark runtime.<\/li>\n<li>Track cost by tags; set budgets and alerts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Partition datasets by date and common query keys.<\/li>\n<li>For Spark, tune:<\/li>\n<li>number of 
partitions<\/li>\n<li>file sizes (avoid too many tiny files)<\/li>\n<li>caching strategies for repeated transformations<\/li>\n<li>For model deployments, load the model once at startup (lazy load) and reuse for requests.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Make pipelines <strong>idempotent<\/strong>: rerunning a job should not corrupt curated data.<\/li>\n<li>Use atomic writes:<\/li>\n<li>write to a temporary path<\/li>\n<li>validate<\/li>\n<li>then move\/rename to final path if supported (object storage semantics differ; design carefully)<\/li>\n<li>Add retry logic and dead-letter\/quarantine patterns for bad data.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Centralize logs and define standard dashboards:<\/li>\n<li>pipeline success rate<\/li>\n<li>data freshness<\/li>\n<li>endpoint latency and 5xx rate<\/li>\n<li>Use runbooks and on-call ownership for production endpoints.<\/li>\n<li>Keep a clear model release process (approval gates, canary, rollback).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Naming convention example:<\/li>\n<li>Buckets: <code>&lt;org&gt;-&lt;env&gt;-ai-raw<\/code>, <code>&lt;org&gt;-&lt;env&gt;-ai-curated<\/code><\/li>\n<li>Models: <code>&lt;usecase&gt;-&lt;algo&gt;-v&lt;version&gt;<\/code><\/li>\n<li>Deployments: <code>&lt;usecase&gt;-&lt;env&gt;-&lt;version&gt;<\/code><\/li>\n<li>Tag everything at creation time; enforce via policy\/process where possible.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI <strong>IAM<\/strong> controls access with:<\/li>\n<li>Users, groups, dynamic groups<\/li>\n<li>Policies scoped to compartments<\/li>\n<li>For workloads, prefer <strong>resource principals<\/strong> (or instance principals) where supported to avoid distributing API keys. <strong>Verify service support<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>At rest:<\/strong> OCI services typically encrypt data at rest by default (service-dependent). Verify encryption behavior for:<\/li>\n<li>Object Storage buckets<\/li>\n<li>Data Science artifacts<\/li>\n<li>Databases<\/li>\n<li><strong>In transit:<\/strong> Use TLS for API calls and endpoint invocation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid public endpoints for:<\/li>\n<li>notebooks<\/li>\n<li>internal datasets<\/li>\n<li>production inference<\/li>\n<li>Use private subnets and restrict traffic with NSGs.<\/li>\n<li>Use API Gateway\/WAF patterns when exposing inference externally.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Store secrets in <strong>OCI Vault<\/strong> (API keys, DB passwords).<\/li>\n<li>Never store secrets in notebooks, git repos, or plain-text object storage.<\/li>\n<li>Use rotation policies where feasible.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable <strong>OCI Audit<\/strong> (usually enabled by default at tenancy level; verify).<\/li>\n<li>Ensure logs exist for:<\/li>\n<li>model deployments (requests\/errors)<\/li>\n<li>pipeline runs<\/li>\n<li>administrative actions (policy changes, endpoint changes)<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data residency: choose OCI regions that meet residency requirements.<\/li>\n<li>PII\/PHI handling:<\/li>\n<li>minimize access<\/li>\n<li>masking\/tokenization where required<\/li>\n<li>separate compartments and stricter policies for sensitive datasets<\/li>\n<li>Retention policies: define and enforce retention per dataset class.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Public Object Storage buckets containing customer data<\/li>\n<li>Overbroad policies (<code>manage all-resources in tenancy<\/code>)<\/li>\n<li>Shared user accounts for notebooks and deployments<\/li>\n<li>No separation between dev and prod compartments<\/li>\n<li>No audit review process for who deployed models and when<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Private inference endpoints + API gateway with auth and rate limiting<\/li>\n<li>Use mTLS or OAuth-based patterns where appropriate (implementation-dependent)<\/li>\n<li>Implement approval gates for promoting models to production<\/li>\n<li>Continuous vulnerability scanning for custom images (if you use them; verify OCI support)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>These are common platform-level pitfalls; service-specific limits change over time. 
<strong>Verify limits and current behaviors in official OCI docs and service limits pages.<\/strong><\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations (typical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Service availability varies by region<\/strong> for Data Integration, Data Catalog, Data Science, and AI services.<\/li>\n<li>Model deployment formats and supported frameworks can be restrictive; confirm supported runtimes.<\/li>\n<li>Some orchestration needs may exceed what GUI-based tools provide; you may need custom orchestration.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Science compute quotas and deployment counts can block scaling.<\/li>\n<li>Spark job concurrency and maximum resources are quota-bound.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Some AI services (especially generative AI) may have limited regional availability.<\/li>\n<li>Cross-region DR increases complexity and cost.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Leaving deployments and notebooks running is the #1 surprise.<\/li>\n<li>NAT\/data egress costs can appear when private subnets access the internet.<\/li>\n<li>Log volume and retention can generate unexpected costs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SDK versions in notebook environments may differ from your local environment.<\/li>\n<li>Artifact packaging requirements can change; use OCI-provided examples.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>No consistent tagging leads to \u201corphaned\u201d resources and waste.<\/li>\n<li>Lack of data versioning breaks reproducibility.<\/li>\n<li>Training on mutable datasets causes 
\u201cmodel drift\u201d debugging nightmares.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moving from on-prem ETL tools requires rethinking governance and scheduling.<\/li>\n<li>Data formats and partitioning strategies matter; lift-and-shift files without optimization can perform poorly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI IAM policies are powerful but can be tricky; test policies in lower environments.<\/li>\n<li>Service-to-service private access patterns (service gateways, private endpoints) require careful network design.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>An AI Data Platform is more about the <em>assembled architecture<\/em> than any single component. Here\u2019s how Oracle Cloud\u2019s typical approach compares.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Nearest services in Oracle Cloud<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Data Science<\/strong>: ML training and deployment (core AI layer)<\/li>\n<li><strong>Oracle Analytics Cloud (OAC)<\/strong>: BI dashboards and self-service analytics<\/li>\n<li><strong>Autonomous Data Warehouse (ADW)<\/strong>: curated analytics store<\/li>\n<li><strong>OCI Data Integration \/ Data Flow<\/strong>: data pipelines and transformation<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Nearest services in other clouds<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS<\/strong>: S3 + Glue + Lake Formation + SageMaker<\/li>\n<li><strong>Azure<\/strong>: Fabric \/ Synapse patterns + Azure ML + Purview<\/li>\n<li><strong>Google Cloud<\/strong>: GCS + Dataflow\/Dataproc + Vertex AI + Dataplex<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Open-source \/ self-managed alternatives<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Databricks (managed platform, multi-cloud)<\/li>\n<li>Kubeflow + MLflow + Airflow on Kubernetes<\/li>\n<li>Spark + Airflow + a model server (KServe\/Triton) self-managed<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Oracle Cloud AI Data Platform (pattern using OCI services)<\/strong><\/td>\n<td>Organizations building governed analytics + ML on OCI<\/td>\n<td>Strong OCI integration (IAM, compartments), enterprise security patterns, managed building blocks<\/td>\n<td>Requires architecture decisions across multiple services; availability varies by region<\/td>\n<td>You already run workloads on OCI and need a consistent data\u2192ML\u2192deployment platform<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI Data Science only<\/strong><\/td>\n<td>Small ML teams that already have data pipelines<\/td>\n<td>Quick start for notebooks\/training\/deployments<\/td>\n<td>Doesn\u2019t solve ingestion\/governance alone<\/td>\n<td>You have a curated dataset already and need to operationalize ML<\/td>\n<\/tr>\n<tr>\n<td><strong>Oracle Analytics Cloud + ADW<\/strong><\/td>\n<td>BI-first organizations<\/td>\n<td>Fast analytics, governed warehouse approach<\/td>\n<td>ML ops may still need Data Science integration<\/td>\n<td>Primary goal is dashboards\/semantic metrics with some ML augmentation<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Lakehouse + SageMaker<\/strong><\/td>\n<td>AWS-native teams<\/td>\n<td>Broad ecosystem, many integrations<\/td>\n<td>Complexity; governance requires careful setup<\/td>\n<td>You are standardized on AWS and want deep integration with AWS services<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Fabric\/Azure ML<\/strong><\/td>\n<td>Microsoft-native teams<\/td>\n<td>Integrated governance with Purview, strong 
enterprise identity patterns<\/td>\n<td>Cost and complexity; product boundaries evolving<\/td>\n<td>You are standardized on Azure\/Microsoft and need integrated data+AI<\/td>\n<\/tr>\n<tr>\n<td><strong>GCP Vertex AI + BigQuery<\/strong><\/td>\n<td>Teams with BigQuery-centric analytics<\/td>\n<td>Strong managed ML integrations, scalable analytics<\/td>\n<td>Region\/service constraints, learning curve<\/td>\n<td>You run analytics primarily in BigQuery and want tight ML integration<\/td>\n<\/tr>\n<tr>\n<td><strong>Self-managed OSS (Spark\/Airflow\/MLflow\/KServe)<\/strong><\/td>\n<td>Highly customized needs, strict portability<\/td>\n<td>Maximum control, portability, no vendor lock-in<\/td>\n<td>High ops burden, security hardening required<\/td>\n<td>You have a mature platform team and need bespoke workflows or portability<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Bank fraud scoring + governance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Fraud models exist in research but can\u2019t be deployed safely. 
Data is sensitive (PII), and audits require traceability.<\/li>\n<li><strong>Proposed architecture (OCI):<\/strong><\/li>\n<li>Ingest transactions to <strong>Object Storage raw<\/strong><\/li>\n<li>Transform and create features via <strong>Data Integration<\/strong> and\/or <strong>Data Flow<\/strong><\/li>\n<li>Store curated aggregates in <strong>Autonomous Data Warehouse<\/strong><\/li>\n<li>Catalog datasets and apply classifications in <strong>OCI Data Catalog<\/strong><\/li>\n<li>Train models in <strong>OCI Data Science<\/strong> using curated snapshots<\/li>\n<li>Deploy model endpoints privately in a <strong>VCN<\/strong>, exposed via <strong>API Gateway<\/strong><\/li>\n<li>Use <strong>Vault<\/strong> for secrets, <strong>Audit<\/strong> for change tracking, <strong>Logging\/Monitoring<\/strong> for ops<\/li>\n<li><strong>Why this service approach was chosen:<\/strong> Strong compartment-based governance, private networking patterns, and managed training\/deployment reduce risk and accelerate compliance alignment.<\/li>\n<li><strong>Expected outcomes:<\/strong><\/li>\n<li>Faster model promotion to production<\/li>\n<li>Audit-ready trail of model versions, approvals, and deployments<\/li>\n<li>Reduced data leakage risk via least-privilege and private endpoints<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: SaaS churn prediction<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> A small SaaS company wants churn prediction and a weekly \u201cat-risk customers\u201d list. 
They have limited ops capacity.<\/li>\n<li><strong>Proposed architecture (OCI):<\/strong><\/li>\n<li>Upload product usage exports to <strong>Object Storage<\/strong><\/li>\n<li>Use a lightweight transform (Data Integration or scripted job) to curate features<\/li>\n<li>Train a model in <strong>OCI Data Science<\/strong> weekly<\/li>\n<li>Deploy a small endpoint or do batch scoring and write results back to Object Storage<\/li>\n<li>Basic monitoring and budget alerts<\/li>\n<li><strong>Why this service approach was chosen:<\/strong> Minimal infrastructure management; pay-as-you-go scaling; easy to start small and grow.<\/li>\n<li><strong>Expected outcomes:<\/strong><\/li>\n<li>A repeatable weekly churn score process<\/li>\n<li>A simple API or report that customer success can use<\/li>\n<li>Cost control via scheduled jobs and turning off idle resources<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1) Is \u201cAI Data Platform\u201d a single OCI service?<\/h3>\n\n\n\n<p>Often it\u2019s used as a <strong>platform\/solution term<\/strong> rather than one SKU. In OCI, you typically assemble it from services like Object Storage, Data Integration, Data Catalog, Data Flow, Autonomous Database, and Data Science. <strong>Verify in official docs<\/strong> if your tenancy includes a specifically branded \u201cAI Data Platform\u201d service.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2) What is the minimum set of services to get started?<\/h3>\n\n\n\n<p>A minimal setup is:\n&#8211; <strong>Object Storage<\/strong> for datasets and artifacts\n&#8211; <strong>OCI Data Science<\/strong> for training and deployment<br\/>\nOptionally add Data Catalog and Data Integration as you scale governance and pipelines.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3) Do I need a VCN for the platform?<\/h3>\n\n\n\n<p>For simple labs, you may use public endpoints. 
For production, you should plan a <strong>VCN with private subnets<\/strong>, service gateway access to Object Storage, and controlled ingress\/egress.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4) How do I prevent public data exposure?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid public buckets<\/li>\n<li>Use least-privilege IAM policies<\/li>\n<li>Use private endpoints and NSGs for workloads<\/li>\n<li>Use Vault for secrets and never store credentials in plain text<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) How do I version datasets for reproducible training?<\/h3>\n\n\n\n<p>Use immutable paths, for example:\n&#8211; <code>oci:\/\/bucket@namespace\/raw\/transactions\/dt=2026-04-16\/<\/code>\nand store the training snapshot path with the model metadata.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6) How do I version models?<\/h3>\n\n\n\n<p>Use a model naming\/metadata standard:\n&#8211; <code>fraud-xgboost-v12<\/code>\nand record:\n&#8211; training dataset snapshot\n&#8211; code version (git commit)\n&#8211; hyperparameters\n&#8211; evaluation metrics<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7) Can I do streaming ingestion?<\/h3>\n\n\n\n<p>Yes, but it depends on which OCI services you select (for example, OCI Streaming) and your architecture. This tutorial focuses on a batch workflow. <strong>Verify streaming service availability and integration patterns<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8) What formats should I store in Object Storage?<\/h3>\n\n\n\n<p>For analytics and Spark, prefer columnar formats like <strong>Parquet<\/strong> where possible. 
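<\/p>\n\n\n\n<p>As a minimal, hedged sketch of converting CSV to Parquet (the column names are hypothetical, echoing the churn lab dataset; assumes <code>pandas<\/code> with a Parquet engine such as <code>pyarrow<\/code> installed):<\/p>\n\n\n\n<pre><code class=\"language-python\">import pandas as pd\n\n# Small demo frame standing in for churn_small.csv (hypothetical columns).\ndf = pd.DataFrame({\n    \"age\": [57, 34],\n    \"tenure_months\": [2, 48],\n    \"monthly_charges\": [95.1, 42.0],\n})\n\n# Columnar and compressed: typically smaller on disk and faster to scan than CSV.\ndf.to_parquet(\"churn_small.parquet\", compression=\"snappy\", index=False)\n\n# Round-trip check before relying on the file downstream.\nback = pd.read_parquet(\"churn_small.parquet\")\n<\/code><\/pre>\n\n\n\n<p>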
For interchange and small datasets, CSV is fine.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">9) How do I control costs in Data Science?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Stop notebook sessions when not in use<\/li>\n<li>Delete or scale down model deployments<\/li>\n<li>Use budgets and tags<\/li>\n<li>Right-size shapes and avoid always-on endpoints for infrequent use cases<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) How do I secure model endpoints?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Private deployment in a VCN<\/li>\n<li>API Gateway with authentication\/authorization<\/li>\n<li>Rate limiting and request validation<\/li>\n<li>Logging and alerting on unusual error rates<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) What monitoring should I implement first?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pipeline\/job success\/failure<\/li>\n<li>Data freshness (time since last successful load)<\/li>\n<li>Endpoint latency, throughput, and 4xx\/5xx rates<\/li>\n<li>Budget alerts for spend anomalies<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) How do I handle PII?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Classify datasets and restrict access<\/li>\n<li>Mask\/tokenize where required<\/li>\n<li>Use separate compartments and tighter policies for sensitive data<\/li>\n<li>Ensure audit trails are retained per compliance needs<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">13) Can I use third-party tools with OCI?<\/h3>\n\n\n\n<p>Yes. Many teams use Terraform, GitHub Actions, dbt, Airflow, MLflow, and BI tools. Integration approach varies; validate security and network connectivity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">14) Is this a replacement for a data warehouse?<\/h3>\n\n\n\n<p>Not necessarily. 
Many AI platforms use both:\n&#8211; Object Storage for raw and large-scale data\n&#8211; A warehouse (ADW) for curated analytics and governed SQL consumption<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">15) What is the fastest path from PoC to production?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start with a minimal pipeline and a single model<\/li>\n<li>Add governance (catalog, tags, access controls)<\/li>\n<li>Add CI\/CD for training and deployments<\/li>\n<li>Add monitoring\/alerting and runbooks<\/li>\n<li>Introduce private networking and approval gates for production<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn AI Data Platform<\/h2>\n\n\n\n<p>Because AI Data Platform on Oracle Cloud is built from multiple services, the best learning path is to follow official OCI documentation for each building block.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Documentation Home \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/home.htm<\/td>\n<td>Entry point for all OCI services used in an AI Data Platform<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Data Science \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/data-science\/using\/home.htm<\/td>\n<td>Core docs for notebooks, training jobs, and model deployments<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Object Storage \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Object\/home.htm<\/td>\n<td>Data lake storage fundamentals, security, lifecycle policies<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Data Integration \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/data-integration\/using\/home.htm<\/td>\n<td>Building and operating ingestion\/transformation pipelines<\/td>\n<\/tr>\n<tr>\n<td>Official 
documentation<\/td>\n<td>OCI Data Catalog \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/data-catalog\/using\/home.htm<\/td>\n<td>Metadata management and data discovery patterns<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Data Flow (Spark) \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/data-flow\/using\/home.htm<\/td>\n<td>Serverless Spark for transformations and feature engineering<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI IAM \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Identity\/home.htm<\/td>\n<td>Policies, groups, compartments, and security boundaries<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Vault \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/KeyManagement\/home.htm<\/td>\n<td>Key management and secret handling patterns<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Logging \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Logging\/home.htm<\/td>\n<td>Central logging for pipelines and deployments<\/td>\n<\/tr>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Audit \u2014 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Audit\/home.htm<\/td>\n<td>Change tracking and compliance evidence<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Oracle Cloud Pricing \u2014 https:\/\/www.oracle.com\/cloud\/pricing\/<\/td>\n<td>Official service pricing and SKU breakdowns<\/td>\n<\/tr>\n<tr>\n<td>Official cost estimator<\/td>\n<td>OCI Cost Estimator \u2014 https:\/\/www.oracle.com\/cloud\/costestimator.html<\/td>\n<td>Build region-specific cost estimates without guessing numbers<\/td>\n<\/tr>\n<tr>\n<td>Official tutorials\/labs<\/td>\n<td>Oracle Cloud Free Tier \/ Getting Started \u2014 https:\/\/www.oracle.com\/cloud\/free\/<\/td>\n<td>Account setup and free-tier guidance (eligibility varies)<\/td>\n<\/tr>\n<tr>\n<td>Official videos<\/td>\n<td>Oracle Cloud Infrastructure YouTube \u2014 https:\/\/www.youtube.com\/user\/OracleCloudInfrastructure<\/td>\n<td>Product 
walkthroughs and architecture sessions (verify relevance to your services)<\/td>\n<\/tr>\n<tr>\n<td>Official samples (verify)<\/td>\n<td>Oracle OCI GitHub org \u2014 https:\/\/github.com\/oracle<\/td>\n<td>Many OCI examples exist; search for Data Science and data platform samples and verify they are maintained<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Beginners to experienced engineers<\/td>\n<td>DevOps, cloud operations, CI\/CD, platform practices that support data\/AI platforms<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Students and early-career professionals<\/td>\n<td>Software lifecycle, DevOps fundamentals, tooling basics<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CLoudOpsNow.in<\/td>\n<td>Cloud engineers, SREs, ops teams<\/td>\n<td>Cloud operations practices, monitoring, reliability, cost awareness<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, platform engineers<\/td>\n<td>Reliability engineering, incident response, observability, SLIs\/SLOs<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops teams and platform teams<\/td>\n<td>AIOps concepts, monitoring automation, operational analytics<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>Cloud\/DevOps training and guidance (verify offerings)<\/td>\n<td>Engineers seeking mentoring-style learning<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps tooling and practices (verify offerings)<\/td>\n<td>Beginners to intermediate DevOps learners<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps services\/training resources (verify offerings)<\/td>\n<td>Teams seeking short-term help or coaching<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and learning resources (verify offerings)<\/td>\n<td>Ops\/DevOps teams needing practical support<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify service catalog)<\/td>\n<td>Platform setup, automation, operational practices<\/td>\n<td>OCI landing zone planning, IaC pipelines, monitoring setups<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Training + consulting (verify consulting offerings)<\/td>\n<td>Skills enablement and process\/tooling improvements<\/td>\n<td>MLOps enablement workshops, CI\/CD design for data\/AI workloads<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify scope)<\/td>\n<td>DevOps transformations, toolchain design<\/td>\n<td>Cost governance, logging\/monitoring, secure release pipelines<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service (recommended foundations)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>OCI fundamentals<\/strong>\n   &#8211; Compartments, IAM, VCN basics, regions\/availability domains<br\/>\n   Docs: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/home.htm<\/li>\n<li><strong>Linux and CLI basics<\/strong>\n   &#8211; Cloud Shell, bash, basic networking<\/li>\n<li><strong>Data fundamentals<\/strong>\n   &#8211; CSV\/JSON\/Parquet, schema design, partitioning\n   &#8211; SQL fundamentals<\/li>\n<li><strong>Python fundamentals<\/strong>\n   &#8211; pandas, scikit-learn basics, packaging<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service (to operate it in production)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Infrastructure as Code<\/strong>\n   &#8211; Terraform for OCI resource provisioning (verify OCI Terraform provider docs)<\/li>\n<li><strong>Data engineering at scale<\/strong>\n   &#8211; Spark tuning, incremental ETL, data quality frameworks<\/li>\n<li><strong>MLOps<\/strong>\n   &#8211; Model CI\/CD, deployment strategies (blue\/green, canary), model monitoring<\/li>\n<li><strong>Security engineering<\/strong>\n   &#8211; Private networking, Vault integrations, policy reviews, threat modeling<\/li>\n<li><strong>Reliability<\/strong>\n   &#8211; SLOs for data freshness and inference latency, incident response<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud Engineer \/ Platform Engineer<\/li>\n<li>Data Engineer<\/li>\n<li>Analytics Engineer<\/li>\n<li>ML Engineer \/ MLOps Engineer<\/li>\n<li>Data Scientist (production-oriented)<\/li>\n<li>Security Engineer (cloud\/data security)<\/li>\n<li>SRE (for AI services and data pipelines)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Oracle 
Cloud certifications change over time and by track. Use Oracle\u2019s official certification site to verify current OCI certifications relevant to data\/AI:\n&#8211; https:\/\/education.oracle.com\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a multi-zone data lake with lifecycle rules and tags<\/li>\n<li>Implement a daily batch pipeline with validation and quarantine<\/li>\n<li>Train a model weekly and deploy a canary endpoint<\/li>\n<li>Add budgets, alerts, and dashboards for pipeline health and spend<\/li>\n<li>Implement private endpoint inference with API Gateway and audit logging<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Compartment (OCI):<\/strong> Logical container to isolate and manage resources with IAM policies.<\/li>\n<li><strong>IAM Policy (OCI):<\/strong> Text-based rule that grants permissions to groups or dynamic groups within a scope.<\/li>\n<li><strong>Object Storage:<\/strong> Storage service for unstructured data objects; used for data lakes and artifacts.<\/li>\n<li><strong>Data lake:<\/strong> Storage-centric architecture storing raw and curated datasets for multiple uses.<\/li>\n<li><strong>ETL\/ELT:<\/strong> Extract-Transform-Load \/ Extract-Load-Transform data processing patterns.<\/li>\n<li><strong>Feature engineering:<\/strong> Creating model input variables (features) from raw data.<\/li>\n<li><strong>Model artifact:<\/strong> Packaged files needed to run inference (model weights, preprocessing code).<\/li>\n<li><strong>Inference endpoint:<\/strong> Network-accessible service that returns predictions from a deployed model.<\/li>\n<li><strong>Resource principal:<\/strong> OCI mechanism allowing a resource (like a notebook\/job) to call OCI APIs without user API keys (support varies; verify).<\/li>\n<li><strong>VCN:<\/strong> Virtual Cloud 
Network\u2014your private network in OCI.<\/li>\n<li><strong>Service Gateway:<\/strong> Enables private access from a VCN to OCI public services (like Object Storage) without internet.<\/li>\n<li><strong>NSG (Network Security Group):<\/strong> Virtual firewall rules applied to VNICs for traffic control.<\/li>\n<li><strong>Audit logs:<\/strong> Records of API calls and administrative actions for compliance and troubleshooting.<\/li>\n<li><strong>Tagging:<\/strong> Applying metadata to resources for governance, cost tracking, and automation.<\/li>\n<li><strong>Data freshness:<\/strong> How up-to-date your curated datasets are relative to sources.<\/li>\n<li><strong>Idempotent pipeline:<\/strong> A pipeline that can run multiple times without producing incorrect duplicates or corruption.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p><strong>AI Data Platform<\/strong> on <strong>Oracle Cloud<\/strong> (Analytics and AI) is best treated as an <strong>end-to-end platform architecture<\/strong>\u2014not just one feature\u2014built from OCI services such as <strong>Object Storage<\/strong>, <strong>Data Integration<\/strong>, <strong>Data Catalog<\/strong>, <strong>Data Flow<\/strong>, <strong>Autonomous Database<\/strong>, and <strong>OCI Data Science<\/strong>.<\/p>\n\n\n\n<p>It matters because AI outcomes depend on disciplined data ingestion, governance, and operationalized model deployment. On the cost side, the biggest drivers are <strong>running compute<\/strong> (notebooks, Spark jobs, model deployments) and <strong>data growth<\/strong> (storage copies, retention). On the security side, success depends on <strong>compartment design, least-privilege IAM, private networking for production, Vault-backed secrets, and auditability<\/strong>.<\/p>\n\n\n\n<p>Use this approach when you need a repeatable, governed path from data to deployed AI in OCI. 
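<\/p>\n\n\n\n<p>To make the \u201cidempotent pipeline\u201d entry from the glossary concrete, here is a minimal, local-filesystem sketch in Python: each run atomically overwrites its own date partition, so re-running a batch cannot duplicate records or leave partially written files. The <em>dt=YYYY-MM-DD<\/em> partition layout and file names are illustrative assumptions, not an OCI API.<\/p>

```python
# Minimal sketch of an idempotent batch write: re-running the same
# run_date replaces the previous output instead of appending to it.
# Paths and the dt= partition layout are illustrative assumptions.
import json
import tempfile
from pathlib import Path

def write_partition(base_dir: Path, run_date: str, records: list) -> Path:
    '''Write one date partition atomically; safe to re-run.'''
    partition = base_dir / f'dt={run_date}'
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / 'part-00000.jsonl'
    # Stage to a temp file in the same directory, then rename:
    # readers never observe partial output, and a re-run simply
    # replaces the earlier file, so no duplicate records appear.
    with tempfile.NamedTemporaryFile('w', dir=partition, delete=False,
                                     suffix='.tmp') as tmp:
        for rec in records:
            tmp.write(json.dumps(rec, sort_keys=True) + '\n')
        staged = Path(tmp.name)
    staged.replace(target)  # atomic rename on POSIX filesystems
    return target

if __name__ == '__main__':
    out_dir = Path(tempfile.mkdtemp())
    rows = [{'id': 1, 'value': 'a'}, {'id': 2, 'value': 'b'}]
    first = write_partition(out_dir, '2026-04-16', rows)
    second = write_partition(out_dir, '2026-04-16', rows)  # re-run
    assert first == second and len(first.read_text().splitlines()) == 2
```

<p>The same write-then-atomic-replace pattern carries over when a pipeline rewrites a per-run prefix in Object Storage rather than appending to it.<\/p>\n\n\n\n<p>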
Start small (Object Storage + Data Science), then add governance and pipeline automation as your scope grows.<\/p>\n\n\n\n<p><strong>Next learning step:<\/strong> Follow the official OCI Data Science and Object Storage documentation, then expand your lab into a multi-compartment dev\/test\/prod setup with tagging, budgets, and private endpoints:\n&#8211; https:\/\/docs.oracle.com\/en-us\/iaas\/data-science\/using\/home.htm\n&#8211; https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Object\/home.htm<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Analytics and AI<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[66,62],"tags":[],"class_list":["post-833","post","type-post","status-publish","format-standard","hentry","category-analytics-and-ai","category-oracle-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/833","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=833"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/833\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=833"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=833"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=833"}],"curies":[{"name":"wp","href":"https:\/\/a
pi.w.org\/{rel}","templated":true}]}}