{"id":918,"date":"2026-04-16T16:56:43","date_gmt":"2026-04-16T16:56:43","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-data-integrator-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-integration\/"},"modified":"2026-04-16T16:56:43","modified_gmt":"2026-04-16T16:56:43","slug":"oracle-cloud-data-integrator-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-integration","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-data-integrator-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-integration\/","title":{"rendered":"Oracle Cloud Data Integrator Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Integration"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Integration<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What this service is<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Data Integrator<\/strong> in <strong>Oracle Cloud<\/strong> is a managed, cloud-native service used to design, run, and operationalize data ingestion and transformation workflows\u2014typically moving data between Oracle and non-Oracle sources, files in Object Storage, and target analytics stores such as Autonomous Database.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph simple explanation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">If you need to <strong>load data from one place to another on a schedule<\/strong> (for example, CSV files in Object Storage into an Autonomous Data Warehouse table), Data Integrator provides a <strong>visual, managed<\/strong> way to build that pipeline, run it reliably, and monitor it\u2014without standing up and maintaining your own ETL servers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph technical explanation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Technically, Data Integrator 
is an OCI-managed data integration runtime with a <strong>design-time studio<\/strong> (projects, data assets, connections, data flows\/pipelines, tasks, schedules) and an <strong>execution engine<\/strong> that runs jobs in Oracle Cloud. It integrates with OCI Identity and Access Management (IAM) for control plane authorization and can connect to data sources\/targets via OCI networking (public endpoints and\/or private connectivity depending on your configuration). It emits operational telemetry through OCI logging\/monitoring capabilities (availability varies by feature; verify in official docs).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What problem it solves<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data Integrator solves the common problems of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Building repeatable ETL\/ELT pipelines<\/strong> without custom scripts per dataset<\/li>\n<li><strong>Reducing operational overhead<\/strong> (patching\/maintaining ETL servers)<\/li>\n<li><strong>Standardizing ingestion and transformation<\/strong> across teams<\/li>\n<li><strong>Scheduling and monitoring<\/strong> data movement jobs in a governed way<\/li>\n<li><strong>Integrating with Oracle Cloud data platforms<\/strong> (Object Storage, Autonomous Database, and other OCI services)<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>Naming note (important): In current Oracle Cloud documentation and Console navigation, the managed service is commonly labeled <strong>\u201cData Integration\u201d<\/strong> (OCI Data Integration). <strong>\u201cOracle Data Integrator (ODI)\u201d<\/strong> is a separate, long-standing product (often on-premises or self-managed). This tutorial uses <strong>Data Integrator<\/strong> as the primary name and maps it to the <strong>OCI-managed Data Integration service<\/strong>. Verify the exact branding in your region\/console because Oracle product names can evolve.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is Data Integrator?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data Integrator\u2019s purpose in Oracle Cloud is to provide a <strong>managed data integration service<\/strong> to <strong>ingest, transform, and load data<\/strong> across common enterprise sources and targets, with orchestration, scheduling, and monitoring.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Typical capabilities include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Design-time development<\/strong> of integration logic using a web UI (projects, data flows\/pipelines)<\/li>\n<li><strong>Connectivity<\/strong> to common sources\/targets (Object Storage, Oracle databases, and other supported systems)<\/li>\n<li><strong>Data preparation\/transformation<\/strong> (mappings, joins, filters, derived columns; exact transforms depend on connector\/runtime, so verify in official docs)<\/li>\n<li><strong>Orchestration<\/strong> (pipelines\/tasks, dependencies, schedules)<\/li>\n<li><strong>Operational management<\/strong> (job runs, status, logs\/diagnostics)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (conceptual)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">While exact names in the UI can vary, the service typically revolves around:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Workspace<\/strong>: Top-level environment in a region\/compartment where you build and run integrations<\/li>\n<li><strong>Projects\/Folders<\/strong>: Organize integration artifacts<\/li>\n<li><strong>Data Assets<\/strong>: Definitions of external systems (e.g., Object Storage, Autonomous Database)<\/li>\n<li><strong>Connections<\/strong>: Credentials and connectivity configuration for a data asset<\/li>\n<li><strong>Data Flows \/ Pipelines<\/strong>: The actual ingestion and transformation logic<\/li>\n<li><strong>Tasks \/ Schedules<\/strong>: Operationalization: run now, run on a schedule, manage dependencies<\/li>\n<li>
<strong>Application\/Runtime<\/strong>: An application holds the tasks you publish for execution, and the managed runtime executes them (capacity, scaling, and billing follow the pricing model)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Type<\/strong>: Managed cloud service (PaaS-style), focused on data integration workloads<\/li>\n<li><strong>Operational model<\/strong>: You design in the Oracle Cloud Console (or APIs where available), then the service runs jobs on managed infrastructure.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope (regional\/global, tenancy\/compartment)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Tenancy<\/strong>: Resources exist within an OCI <strong>tenancy<\/strong><\/li>\n<li><strong>Region<\/strong>: Workspaces are typically <strong>regional<\/strong> (you create a workspace in a chosen OCI region)<\/li>\n<li><strong>Compartment<\/strong>: Resources are usually created in an OCI <strong>compartment<\/strong> for governance and access control<\/li>\n<li><strong>Project-scoped artifacts<\/strong>: Projects and integration artifacts live inside the workspace<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">(Confirm exact resource scoping and supported regions in official docs for your tenancy.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Oracle Cloud ecosystem<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data Integrator is commonly used alongside:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Oracle Cloud Infrastructure (OCI) Object Storage<\/strong> for landing files\/data<\/li>\n<li><strong>Autonomous Database (ATP\/ADW)<\/strong> for analytics and warehousing<\/li>\n<li><strong>Oracle Cloud Networking (VCN, private endpoints, service gateways)<\/strong> for secure connectivity<\/li>\n<li><strong>OCI IAM<\/strong> for access control<\/li>\n<li><strong>OCI Logging\/Monitoring\/Audit<\/strong> for operational governance<\/li>\n<li>Optional ecosystem services such as <strong>Data Catalog<\/strong>, 
<strong>GoldenGate<\/strong>, <strong>Oracle Analytics Cloud<\/strong>, and <strong>Oracle Integration<\/strong> depending on your architecture<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Data Integrator?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster time-to-value<\/strong>: Teams can build ingestion pipelines quickly using a managed service.<\/li>\n<li><strong>Lower operational burden<\/strong>: No ETL servers to patch\/scale manually.<\/li>\n<li><strong>Consistency and governance<\/strong>: Standardized patterns for ingestion, transformations, and scheduling.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed runtime<\/strong>: Execution is handled by Oracle Cloud; you focus on logic.<\/li>\n<li><strong>Native alignment with Oracle data platforms<\/strong>: Particularly strong fit when your targets are Autonomous Database or other Oracle-managed data services.<\/li>\n<li><strong>Repeatable workflows<\/strong>: Versioned artifacts, reusable connections, and orchestrated pipelines.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Scheduling<\/strong>: Built-in scheduling and dependency handling (verify exact scheduling options and granularity).<\/li>\n<li><strong>Observability<\/strong>: Job run history and diagnostics are available in the service; integration with OCI observability features may apply (verify).<\/li>\n<li><strong>Separation of concerns<\/strong>: Workspace\/project organization supports multi-team environments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI IAM control plane<\/strong>: Fine-grained policies at compartment level.<\/li>\n<li><strong>Network 
controls<\/strong>: Can be designed for private connectivity patterns within OCI (where supported).<\/li>\n<li><strong>Auditability<\/strong>: OCI Audit can capture API actions for governance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Elastic managed execution<\/strong>: Suitable for variable workloads and bursty ingestion patterns (exact scaling model depends on service; verify in docs).<\/li>\n<li><strong>Parallelization features<\/strong>: May exist for file loads or data movement depending on connector and task configuration.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Choose Data Integrator when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You\u2019re on <strong>Oracle Cloud<\/strong> and need a managed service for data ingestion\/orchestration.<\/li>\n<li>Your targets include <strong>Autonomous Database<\/strong> or you frequently use <strong>Object Storage<\/strong> as a landing zone.<\/li>\n<li>You need <strong>repeatable scheduled pipelines<\/strong> with centralized monitoring and access control.<\/li>\n<li>You want to avoid operating your own orchestration or processing cluster (for example, Airflow or Spark) for moderate-complexity pipelines.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When they should not choose it<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Consider alternatives when:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You require <strong>complex distributed processing<\/strong> (multi-terabyte transformations requiring Spark clusters) and Data Integrator\u2019s runtime model doesn\u2019t match your needs.<\/li>\n<li>You need <strong>real-time CDC replication<\/strong> at high volume, which is often better served by <strong>OCI GoldenGate<\/strong>.<\/li>\n<li>Your organization has already standardized on another integration platform (e.g., Azure Data Factory, AWS Glue) and multi-cloud friction outweighs the benefits.<\/li>\n<li>You need <strong>full code-first<\/strong> workflows with deep CI\/CD 
integration and you cannot meet that with Data Integrator\u2019s current APIs (verify API coverage).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. Where is Data Integrator used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Commonly used in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Finance and insurance (risk reporting, regulatory extracts)<\/li>\n<li>Retail and e-commerce (sales, inventory, customer analytics)<\/li>\n<li>Healthcare (operational analytics, claims, patient systems; subject to compliance requirements)<\/li>\n<li>Telecom (billing analytics, customer churn pipelines)<\/li>\n<li>Manufacturing (IoT data landing to analytics stores)<\/li>\n<li>Public sector (data consolidation, dashboards, reporting)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data engineering teams<\/li>\n<li>Analytics engineering teams<\/li>\n<li>Cloud platform teams supporting data platforms<\/li>\n<li>Integration teams consolidating enterprise data<\/li>\n<li>App teams that need lightweight ingestion into a warehouse<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Batch ingestion from files (CSV\/JSON\/Parquet depending on support)<\/li>\n<li>Batch ELT\/ETL into Oracle analytics targets<\/li>\n<li>Scheduled refresh pipelines for BI tools<\/li>\n<li>Landing-zone to curated-zone transformations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Object Storage \u201cdata lake landing\u201d \u2192 Autonomous Data Warehouse<\/li>\n<li>Multi-source ingestion \u2192 standardized warehouse model (star\/snowflake)<\/li>\n<li>Staging schema \u2192 curated schema<\/li>\n<li>\u201cExtract from operational DB nightly\u201d \u2192 reporting DB<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Production<\/strong>: Managed schedules, least-privilege IAM, private networking, tagging, runbooks, alerting<\/li>\n<li><strong>Dev\/Test<\/strong>: Separate workspaces or separate compartments; smaller schedules; sample datasets<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Below are <strong>10 realistic use cases<\/strong> for Data Integrator in Oracle Cloud.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Object Storage CSV to Autonomous Data Warehouse (daily load)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Finance receives daily CSV extracts and needs them loaded into ADW.<\/li>\n<li><strong>Why Data Integrator fits<\/strong>: Managed file ingestion, mapping, scheduling, and monitoring.<\/li>\n<li><strong>Example<\/strong>: A daily <code>transactions_YYYYMMDD.csv<\/code> lands in an OCI bucket; Data Integrator loads it to <code>DW.TRANSACTIONS_STAGE<\/code> then merges into <code>DW.TRANSACTIONS<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Multi-file ingestion with schema drift handling (lightweight)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Vendors add columns occasionally; ingestion breaks frequently.<\/li>\n<li><strong>Why it fits<\/strong>: Data flow mappings can be updated centrally; some connectors support flexible mappings (verify schema drift capabilities).<\/li>\n<li><strong>Example<\/strong>: Vendor adds <code>region_code<\/code>; you update mapping once and redeploy.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Operational DB to reporting DB refresh (nightly batch)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Operational Oracle DB is too busy for BI queries.<\/li>\n<li><strong>Why it fits<\/strong>: Scheduled extraction and load into reporting 
schema.<\/li>\n<li><strong>Example<\/strong>: Nightly job extracts orders\/customers and loads them into ADW reporting tables.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Standardized ingestion framework for multiple departments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Each team writes scripts; no standard monitoring\/governance.<\/li>\n<li><strong>Why it fits<\/strong>: Central workspace patterns, shared connections, consistent scheduling.<\/li>\n<li><strong>Example<\/strong>: Shared \u201clanding-to-staging\u201d templates; each department onboards new datasets quickly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Data quality checkpoints during load (basic validations)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Bad rows cause downstream reporting issues.<\/li>\n<li><strong>Why it fits<\/strong>: Transform steps can filter\/reject invalid records (capability depends on transformations available; verify).<\/li>\n<li><strong>Example<\/strong>: Filter rows where <code>amount &lt; 0<\/code>, output rejects to a quarantine table.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Orchestrated pipeline: ingest \u2192 transform \u2192 publish<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: You need multi-step jobs with dependencies.<\/li>\n<li><strong>Why it fits<\/strong>: Pipelines\/tasks can enforce ordering and handle failures.<\/li>\n<li><strong>Example<\/strong>: Step 1 load staging; step 2 run transform; step 3 refresh aggregate table.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Cross-compartment shared data platform (governed)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Platform team owns data services; app teams need controlled access.<\/li>\n<li><strong>Why it fits<\/strong>: Compartment-based IAM and policies.<\/li>\n<li><strong>Example<\/strong>: Platform compartment hosts Data Integrator; app 
compartments grant least-privilege access to run specific tasks.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Migration from self-managed ETL to managed OCI<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Legacy ETL servers are costly and hard to patch.<\/li>\n<li><strong>Why it fits<\/strong>: Replace routine batch ETL jobs with managed service.<\/li>\n<li><strong>Example<\/strong>: Replace cron + scripts that pull files from SFTP (after landing to OCI) with Data Integrator schedules.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Pre-load transformations to standardize reference data<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Multiple systems use different code sets.<\/li>\n<li><strong>Why it fits<\/strong>: Transform stage can map codes to standardized dimension tables.<\/li>\n<li><strong>Example<\/strong>: Map <code>status<\/code> values (<code>A\/ACTIVE\/1<\/code>) into canonical <code>DIM_STATUS<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Controlled reprocessing\/backfills<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Need to re-run loads for a historical date range.<\/li>\n<li><strong>Why it fits<\/strong>: Parameterized runs (if supported) and repeatable pipelines.<\/li>\n<li><strong>Example<\/strong>: Backfill last 30 days of files after a bug fix, without manual SQL scripting.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability can vary by region and by connector type. 
Always confirm with the official Data Integration documentation for your tenancy.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">1) Workspaces (environment boundary)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides an isolated environment to manage projects, connections, jobs, and run history.<\/li>\n<li><strong>Why it matters<\/strong>: Supports dev\/test\/prod separation and team organization.<\/li>\n<li><strong>Practical benefit<\/strong>: Clear ownership and governance at the workspace level.<\/li>\n<li><strong>Caveats<\/strong>: Workspaces are typically regional; cross-region designs require explicit planning.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Projects and artifact organization<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Organizes data flows\/pipelines, connections, and tasks into logical groups.<\/li>\n<li><strong>Why it matters<\/strong>: Maintainability for larger estates.<\/li>\n<li><strong>Practical benefit<\/strong>: Reusable patterns and consistent naming\/tagging.<\/li>\n<li><strong>Caveats<\/strong>: Establish conventions early; refactoring later is painful.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Data assets (source\/target definitions)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Represents a system like Object Storage or a database service.<\/li>\n<li><strong>Why it matters<\/strong>: Centralizes system configuration and governance.<\/li>\n<li><strong>Practical benefit<\/strong>: Multiple pipelines can reuse the same data asset.<\/li>\n<li><strong>Caveats<\/strong>: Connectivity requirements (network, credentials) must be correct for reliable runs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Connections (credentials and connectivity)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Stores connection details used by jobs (endpoints, usernames, 
passwords\/keys).<\/li>\n<li><strong>Why it matters<\/strong>: Security and operational consistency.<\/li>\n<li><strong>Practical benefit<\/strong>: Update credentials once without rewriting pipelines.<\/li>\n<li><strong>Caveats<\/strong>: Secret handling options vary\u2014prefer OCI Vault integration if supported; otherwise tightly control who can view\/edit connections.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Data flows (mapping and transformations)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Defines how data is read, transformed, and written.<\/li>\n<li><strong>Why it matters<\/strong>: This is where the \u201cETL\/ELT logic\u201d lives.<\/li>\n<li><strong>Practical benefit<\/strong>: Visual mapping reduces custom code for common transformations.<\/li>\n<li><strong>Caveats<\/strong>: Very complex transformations might be better in SQL on the target (ELT) or in a dedicated compute engine; decide based on performance and governance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Pipelines (orchestration)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Chains steps together (ingest, transform, publish), handling dependencies and flow control.<\/li>\n<li><strong>Why it matters<\/strong>: Production pipelines usually require multiple steps.<\/li>\n<li><strong>Practical benefit<\/strong>: Fewer external schedulers; clearer run lineage.<\/li>\n<li><strong>Caveats<\/strong>: Understand failure behavior and retry semantics; verify how retries and partial failures are handled.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Tasks and scheduling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Runs a data flow\/pipeline on demand or on a schedule.<\/li>\n<li><strong>Why it matters<\/strong>: Operationalization is what turns a design into a service.<\/li>\n<li><strong>Practical benefit<\/strong>: Predictable refresh cadence for 
analytics.<\/li>\n<li><strong>Caveats<\/strong>: Scheduling granularity, time zone handling, and concurrency limits should be validated in docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Monitoring and run history<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Shows status, run duration, and error details for tasks.<\/li>\n<li><strong>Why it matters<\/strong>: Troubleshooting and SLA management.<\/li>\n<li><strong>Practical benefit<\/strong>: Faster incident response with centralized run diagnostics.<\/li>\n<li><strong>Caveats<\/strong>: For enterprise observability, confirm integration with OCI Logging\/Monitoring and export patterns (if required).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) IAM integration (control plane authorization)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Uses OCI IAM groups\/policies to authorize workspace and artifact management.<\/li>\n<li><strong>Why it matters<\/strong>: Least privilege and auditability.<\/li>\n<li><strong>Practical benefit<\/strong>: Platform teams can delegate safely.<\/li>\n<li><strong>Caveats<\/strong>: The exact policy verbs\/resource-types must match Data Integrator\u2019s IAM model\u2014use official policy examples.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) APIs\/Automation (where available)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Enables automation via OCI APIs\/SDK\/CLI (coverage varies).<\/li>\n<li><strong>Why it matters<\/strong>: CI\/CD and platform operations.<\/li>\n<li><strong>Practical benefit<\/strong>: Repeatable provisioning, promotion between environments.<\/li>\n<li><strong>Caveats<\/strong>: Verify current API support for the artifacts you need (workspace, tasks, runs, etc.).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. 
Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At a high level, Data Integrator has:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>A <strong>control plane<\/strong>: where you define artifacts (workspaces, connections, flows, tasks).<\/li>\n<li>A <strong>runtime plane<\/strong>: managed execution environment that reads from sources and writes to targets.<\/li>\n<li><strong>Integration points<\/strong>: IAM, networking, Object Storage, databases, logging\/monitoring.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>User (or automation) creates\/updates artifacts in the Data Integrator workspace.<\/li>\n<li>A task is started (manual trigger or schedule).<\/li>\n<li>Runtime retrieves connection details and accesses sources\/targets.<\/li>\n<li>Data is extracted, transformed, and loaded.<\/li>\n<li>Runtime emits status and logs; the job is visible in run history.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Integrations with related Oracle Cloud services<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common integrations include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI Object Storage<\/strong>: landing zone for files and staging data<\/li>\n<li><strong>Autonomous Database<\/strong>: common analytics target<\/li>\n<li><strong>OCI IAM<\/strong>: access control to manage and run integration assets<\/li>\n<li><strong>OCI Vault<\/strong> (optional): secrets storage (verify connector support)<\/li>\n<li><strong>OCI Logging\/Monitoring<\/strong> (optional): operational visibility (verify exact integration points)<\/li>\n<li><strong>VCN \/ private networking<\/strong> (optional): private endpoints for databases and private access patterns<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Your pipeline usually depends on:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Object Storage buckets, objects, and policies<\/li>\n<li>Target databases (Autonomous Database or DB 
systems)<\/li>\n<li>Network path between Data Integrator runtime and the endpoints (public or private)<\/li>\n<li>IAM policies for all involved services<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model (practical view)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane<\/strong>: IAM policies decide who can create\/manage\/run workspaces and artifacts.<\/li>\n<li><strong>Data plane access to sources\/targets<\/strong>:\n<ul class=\"wp-block-list\">\n<li>Object Storage access can be via <strong>OCI IAM + resource principals<\/strong> (service-to-service) in some patterns, or via credentials\/config depending on how the connector works (verify).<\/li>\n<li>Database access is typically via <strong>database credentials<\/strong> and secure connectivity options (TLS; wallet for Autonomous Database patterns).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model (practical view)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Typical patterns:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Public endpoints<\/strong>: simplest for labs; ensure you restrict access.<\/li>\n<li><strong>Private endpoints<\/strong>: preferred for production; requires VCN planning, DNS, and routing.<\/li>\n<li><strong>Service gateway<\/strong>: can keep Object Storage access private within OCI.<\/li>\n<li><strong>NAT gateway<\/strong>: for outbound access if needed (avoid if you can).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use OCI <strong>Audit<\/strong> to track who created\/modified artifacts.<\/li>\n<li>Use task run history for operational checks.<\/li>\n<li>Consider exporting logs\/metrics into centralized tooling if your org requires it (verify native integration points).<\/li>\n<li>Use <strong>tagging<\/strong> to separate cost centers, environments, owners.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code 
class=\"language-mermaid\">flowchart LR\n  U[Engineer \/ Data Analyst] --&gt;|Design &amp; Run| DI[Data Integrator Workspace]\n  OS[(OCI Object Storage Bucket)] --&gt;|Read CSV\/Files| DI\n  DI --&gt;|Load Tables| ADB[(Autonomous Database)]\n  DI --&gt; RH[Run History \/ Logs]\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Tenancy[OCI Tenancy]\n    subgraph Net[&quot;VCN (Production)&quot;]\n      PE1[&quot;Private Endpoint \/ Private Access\\n(to Autonomous Database)&quot;]\n      SG[&quot;Service Gateway\\n(private access to Object Storage)&quot;]\n    end\n\n    subgraph DIW[&quot;Data Integrator Workspace (Region)&quot;]\n      CP[Control Plane:\\nProjects, Connections, Tasks]\n      RT[Managed Runtime:\\nJob Execution]\n    end\n\n    subgraph Data[Data Platform]\n      OS[(Object Storage:\\nLanding + Archive)]\n      ADB[(Autonomous Database:\\nStaging + Curated)]\n    end\n\n    IAM[OCI IAM Policies &amp; Groups]\n    AUD[OCI Audit]\n    MON[&quot;OCI Monitoring\/Logging\\n(verify integration details)&quot;]\n  end\n\n  IAM --&gt; CP\n  CP --&gt; RT\n  OS --&gt; RT\n  RT --&gt; ADB\n  CP --&gt; AUD\n  RT --&gt; MON\n  SG --- OS\n  PE1 --- ADB\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tenancy and compartment requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Oracle Cloud (OCI) tenancy<\/strong><\/li>\n<li>A <strong>compartment<\/strong> where you can create:\n<ul class=\"wp-block-list\">\n<li>Data Integrator workspace<\/li>\n<li>Object Storage bucket<\/li>\n<li>Autonomous Database (for this lab)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You need permissions to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create\/manage Data Integrator workspaces and artifacts<\/li>\n<li>Read\/write Object Storage objects in a bucket<\/li>\n<li>Create\/manage Autonomous Database (or at least connect and create tables)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">OCI IAM policies for Data Integrator use service-specific resource types and verbs. Because policy syntax can change and differs by feature, use the official IAM policy examples from Oracle docs for <strong>Data Integration<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Start here (official docs entry point; navigate to \u201cPolicies\u201d \/ \u201cIAM\u201d sections): https:\/\/docs.oracle.com\/en-us\/iaas\/data-integration\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A paid OCI account or sufficient free-tier capacity.<\/li>\n<li>This lab can be designed to be low-cost if you use:\n<ul class=\"wp-block-list\">\n<li><strong>Autonomous Database Always Free<\/strong> (if available in your region\/tenancy)<\/li>\n<li>Small test files (KB\/MB scale)<\/li>\n<\/ul>\n<\/li>\n<li>You may still incur charges for storage, data egress, or non-free resources.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">CLI\/SDK\/tools needed (optional but useful)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Console access (required)<\/li>\n<li>Optional:\n<ul class=\"wp-block-list\">\n<li><strong>OCI CLI<\/strong> for uploading files and basic checks: 
https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/SDKDocs\/cliinstall.htm<\/li>\n<li>A SQL client (SQL Developer, SQLcl, or Autonomous Database SQL Worksheet in Console)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Integrator (OCI Data Integration) is not necessarily available in every region.<\/li>\n<li>Verify region availability in Oracle Cloud documentation or in the Console region selector.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Service limits exist for:\n<ul class=\"wp-block-list\">\n<li>Number of workspaces<\/li>\n<li>Concurrent runs<\/li>\n<li>Artifact counts<\/li>\n<li>Runtime capacity\/billing dimensions<\/li>\n<\/ul>\n<\/li>\n<li>Check OCI service limits for Data Integration in your region\/tenancy. (Limits change; do not rely on blog posts.)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For this tutorial:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI Object Storage<\/li>\n<li>Autonomous Database (ATP\/ADW)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<blockquote>\n<p>Do not rely on static numbers in articles. Oracle Cloud pricing varies by region, currency, and sometimes by contract\/commitment. Always confirm via official pricing pages.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (how to think about it)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Data Integrator pricing is typically <strong>usage-based<\/strong>, where you pay for the <strong>data integration runtime<\/strong> consumed to execute data flows\/pipelines and operationalize workloads. 
Expect the bill to combine:\n&#8211; A runtime charge, which may be expressed in <strong>OCPU-hours<\/strong> or similar compute-time units for the integration runtime\n&#8211; Separate charges for the related resources you use (Object Storage, Autonomous Database, networking)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Because Oracle may adjust SKUs and units, <strong>verify the exact meter names and units<\/strong> on the official pricing page for \u201cData Integration\u201d.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing dimensions to check<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">When estimating cost, confirm these dimensions in official pricing:\n&#8211; Runtime compute consumption per hour (or per run)\n&#8211; Any per-connector, per-feature, or per-capacity pricing (if applicable)\n&#8211; Additional charges for:\n  &#8211; Object Storage (GB-month, requests)\n  &#8211; Autonomous Database (ECPU\/OCPU, storage) unless Always Free\n  &#8211; Data transfer (especially <strong>internet egress<\/strong>)\n  &#8211; Logging retention\/export (if applicable)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free tier (if applicable)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Autonomous Database Always Free may cover a small target DB for labs.<\/li>\n<li>OCI Object Storage is inexpensive at lab scale (a few MB of files), and your account may include a free storage allocation; verify the current limits.<\/li>\n<li>Whether Data Integrator itself has a free tier depends on current Oracle offerings; <strong>verify in official pricing<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (direct)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Total runtime hours of Data Integrator jobs (more frequent schedules, longer runs)<\/li>\n<li>Larger data volumes (longer run times, more resource usage)<\/li>\n<li>Concurrency (multiple pipelines at once)<\/li>\n<li>Complex transformations (which increase runtime)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden or indirect costs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Autonomous 
Database<\/strong> compute and storage (if not Always Free)<\/li>\n<li><strong>Object Storage<\/strong> storage growth + request costs<\/li>\n<li><strong>Data egress<\/strong> if moving data out of OCI<\/li>\n<li>Operational tooling costs if you export logs to third-party systems<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data transfer <strong>within OCI<\/strong> is usually cheaper than internet egress, but pricing depends on path and services.<\/li>\n<li>If your sources\/targets are outside OCI (on-prem or other clouds), plan for:\n<ul class=\"wp-block-list\">\n<li>VPN\/FastConnect costs<\/li>\n<li>Egress\/ingress charges<\/li>\n<li>Latency and throughput constraints<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start with <strong>daily<\/strong> schedules and move to <strong>hourly<\/strong> runs only where the use case requires them.<\/li>\n<li>Minimize unnecessary reprocessing:\n<ul class=\"wp-block-list\">\n<li>Load only new partitions\/files<\/li>\n<li>Use watermarking (if supported) or file naming conventions<\/li>\n<\/ul>\n<\/li>\n<li>Prefer <strong>ELT<\/strong> (push transformations into the database) for large transforms if it reduces integration runtime (validate performance).<\/li>\n<li>Use <strong>small dev\/test workspaces<\/strong> and smaller sample datasets.<\/li>\n<li>Tag resources for chargeback.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated numbers)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A realistic low-cost starter footprint is:\n&#8211; Object Storage bucket with a few MB of CSV files\n&#8211; Autonomous Database Always Free (if available)\n&#8211; Data Integrator running a small daily load that completes in minutes<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To estimate cost precisely:\n1. Check the official Data Integration pricing line items.\n2. Estimate daily runtime as runs\/day \u00d7 average runtime minutes\/run, then convert it to the billed unit.\n3. 
Add storage and DB cost (if not free).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In production, your main cost drivers typically become:\n&#8211; Multiple pipelines running frequently (hourly or near-real-time batches)\n&#8211; Larger data volumes (GB\u2013TB per day)\n&#8211; Higher concurrency and longer run durations\n&#8211; Non-free Autonomous Database compute for larger warehouses<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Official pricing references<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Oracle Cloud price list (official): https:\/\/www.oracle.com\/cloud\/price-list\/<\/li>\n<li>Oracle Cloud cost estimator (official): https:\/\/www.oracle.com\/cloud\/costestimator.html<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For Data Integrator-specific pricing lines, navigate the price list to the relevant service section (often listed as <strong>Data Integration<\/strong>).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">This lab builds a real (small) pipeline:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Load a CSV file from OCI Object Storage into an Autonomous Database table using Data Integrator<\/strong>, then validate the rows in the database.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create a minimal data ingestion workflow in <strong>Oracle Cloud Data Integrator<\/strong><\/li>\n<li>Source: <strong>OCI Object Storage<\/strong> (<code>customers.csv<\/code>)<\/li>\n<li>Target: <strong>Autonomous Database<\/strong> (<code>CUSTOMERS<\/code> table)<\/li>\n<li>Run once manually, validate results, then clean up<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You will:\n1. 
Create an Autonomous Database (Always Free if available) and a target table.\n2. Create an Object Storage bucket and upload a sample CSV.\n3. Create a Data Integrator workspace.\n4. Define data assets and connections (Object Storage + Autonomous Database).\n5. Build a data flow to map CSV columns to table columns.\n6. Create a task and run it.\n7. Validate row counts in Autonomous Database.\n8. Clean up resources to avoid ongoing cost.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Prepare the target Autonomous Database and table<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">1.1 Create an Autonomous Database (Console)<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In OCI Console:\n1. Navigate to <strong>Oracle Database<\/strong> \u2192 <strong>Autonomous Database<\/strong>.\n2. Click <strong>Create Autonomous Database<\/strong>.\n3. Choose:\n   &#8211; <strong>Compartment<\/strong>: your lab compartment\n   &#8211; <strong>Workload type<\/strong>: Autonomous Data Warehouse or Autonomous Transaction Processing (either works for this lab)\n   &#8211; <strong>Always Free<\/strong>: enable if available\n4. 
Set the ADMIN password and create the database.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: You have a running Autonomous Database instance.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1.2 Create a database user and table<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Use <strong>Database Actions \/ SQL Worksheet<\/strong> (available from the Autonomous Database details page), or connect via a SQL client.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Run the following SQL as <code>ADMIN<\/code> (adjust username\/password as needed):<\/p>\n\n\n\n<pre><code class=\"language-sql\">-- Create a least-privileged schema for the lab\nCREATE USER di_lab IDENTIFIED BY \"Use-A-Strong-Password-Here\";\n\nGRANT CREATE SESSION TO di_lab;\nGRANT CREATE TABLE TO di_lab;\nGRANT CREATE SEQUENCE TO di_lab;\nGRANT CREATE PROCEDURE TO di_lab;\n\n-- The load fails with ORA-01950 unless di_lab has a tablespace quota.\n-- DATA is assumed to be the default tablespace on Autonomous Database;\n-- verify the name for your instance.\nALTER USER di_lab QUOTA 100M ON data;\n\n-- Broader alternative for lab work only (avoid in real environments)\n-- GRANT UNLIMITED TABLESPACE TO di_lab;\n\n-- Create the table in the di_lab schema while connected as ADMIN\nALTER SESSION SET CURRENT_SCHEMA = di_lab;\n\nCREATE TABLE customers (\n  customer_id NUMBER PRIMARY KEY,\n  full_name   VARCHAR2(200),\n  email       VARCHAR2(320),\n  country     VARCHAR2(100),\n  created_at  DATE\n);\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Table <code>DI_LAB.CUSTOMERS<\/code> exists.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Verification<\/h4>\n\n\n\n<pre><code class=\"language-sql\">-- Filters on the owner explicitly, so it works as ADMIN or as di_lab\nSELECT owner, table_name FROM all_tables WHERE owner = 'DI_LAB' AND table_name = 'CUSTOMERS';\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an Object Storage bucket and upload a CSV file<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">2.1 Create a bucket (Console)<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Storage<\/strong> \u2192 <strong>Buckets<\/strong> \u2192 <strong>Create Bucket<\/strong><\/li>\n<li>Choose a name, for example: <code>di-lab-bucket-&lt;unique&gt;<\/code><\/li>\n<li>Keep defaults (Standard storage) for the 
lab.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Bucket is created.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2.2 Create a sample CSV file<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Create a local file named <code>customers.csv<\/code> with this content:<\/p>\n\n\n\n<pre><code class=\"language-csv\">customer_id,full_name,email,country,created_at\n1,Ada Lovelace,ada@example.com,UK,2024-01-15\n2,Grace Hopper,grace@example.com,US,2024-02-20\n3,Alan Turing,alan@example.com,UK,2024-03-05\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">2.3 Upload the CSV<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In bucket details:\n&#8211; <strong>Objects<\/strong> \u2192 <strong>Upload<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Upload <code>customers.csv<\/code> at the bucket root (or in a folder like <code>landing\/<\/code>\u2014just remember the path).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: <code>customers.csv<\/code> is visible in bucket objects list.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Verification<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Click the object and confirm:\n&#8211; Name and size look correct\n&#8211; Storage tier is Standard<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create a Data Integrator workspace<\/h3>\n\n\n\n<blockquote>\n<p>Console navigation may appear as <strong>Data Integration<\/strong> in the OCI Console (service naming varies). 
The underlying managed service is what this tutorial calls <strong>Data Integrator<\/strong>.<\/p>\n<\/blockquote>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Navigate to the Data Integration service:\n   &#8211; Search for <strong>Data Integration<\/strong> in the OCI Console search bar<\/li>\n<li>Click <strong>Create workspace<\/strong><\/li>\n<li>Provide:\n   &#8211; Name: <code>di-lab-workspace<\/code>\n   &#8211; Compartment: your lab compartment<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Workspace status becomes <strong>Active<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Verification<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Open the workspace and confirm you can access its design environment.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Create data assets and connections (Object Storage + Autonomous Database)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">You need two endpoints:\n&#8211; Source: Object Storage bucket\/object\n&#8211; Target: Autonomous Database schema<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">4.1 Create an Object Storage data asset + connection<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Inside the workspace (exact UI labels vary):\n1. Go to <strong>Data Assets<\/strong> \u2192 <strong>Create<\/strong>\n2. Choose <strong>Object Storage<\/strong> (or equivalent connector)\n3. Enter required fields (typically):\n   &#8211; Tenancy\/namespace (Object Storage namespace)\n   &#8211; Bucket name\n   &#8211; Region\n4. Create a <strong>Connection<\/strong> for it<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Data asset and connection show as \u201cAvailable\/Active\u201d.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Notes<\/strong>\n&#8211; Access method varies. Some OCI services support service-to-service authentication patterns; others require credentials or policies. 
Follow the connector instructions shown in your workspace UI.\n&#8211; If the connector requires IAM policies, use the official docs for Data Integration IAM and Object Storage policies.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">4.2 Create an Autonomous Database data asset + connection<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Inside the workspace:\n1. <strong>Data Assets<\/strong> \u2192 <strong>Create<\/strong>\n2. Choose <strong>Autonomous Database<\/strong> (or Oracle Database connector appropriate for ADB)\n3. Provide connection properties, typically:\n   &#8211; Database service details (OCID or connection string depending on UI)\n   &#8211; Username: <code>di_lab<\/code>\n   &#8211; Password: the password you set\n   &#8211; Wallet\/TLS settings if required by the connector<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Database connection tests successfully.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Important<\/strong>: Autonomous Database connectivity can require:\n&#8211; Wallet configuration (for some connection methods)\n&#8211; Network allowlist or \u201callow OCI services\u201d options (naming varies)\n&#8211; Public vs private endpoint choices<br\/>\nBecause these specifics vary by region and ADB settings, <strong>follow the connection wizard guidance and verify in official docs<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Verification<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Use the connection \u201cTest\u201d feature (if available) to confirm both connections are valid.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Build a data flow to load <code>customers.csv<\/code> into the <code>CUSTOMERS<\/code> table<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Inside the workspace:\n1. Go to <strong>Projects<\/strong> \u2192 <strong>Create Project<\/strong>\n   &#8211; Name: <code>di_lab_project<\/code>\n2. 
Within the project, create a <strong>Data Flow<\/strong> (or mapping\/data flow artifact)\n3. Configure the <strong>Source<\/strong>:\n   &#8211; Choose Object Storage connection\n   &#8211; Select the file <code>customers.csv<\/code>\n   &#8211; Configure format as CSV\n   &#8211; Confirm the header row is enabled\n4. Configure schema\/columns:\n   &#8211; <code>customer_id<\/code> (number)\n   &#8211; <code>full_name<\/code> (string)\n   &#8211; <code>email<\/code> (string)\n   &#8211; <code>country<\/code> (string)\n   &#8211; <code>created_at<\/code> (date)<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"5\">\n<li>Configure the <strong>Target<\/strong>:\n   &#8211; Choose Autonomous Database connection\n   &#8211; Schema: <code>DI_LAB<\/code>\n   &#8211; Table: <code>CUSTOMERS<\/code><\/li>\n<li>\n<p>Map fields source \u2192 target:\n   &#8211; <code>customer_id<\/code> \u2192 <code>customer_id<\/code>\n   &#8211; <code>full_name<\/code> \u2192 <code>full_name<\/code>\n   &#8211; <code>email<\/code> \u2192 <code>email<\/code>\n   &#8211; <code>country<\/code> \u2192 <code>country<\/code>\n   &#8211; <code>created_at<\/code> \u2192 <code>created_at<\/code><\/p>\n<\/li>\n<li>\n<p>Choose the write disposition:\n   &#8211; For a first run, select <strong>Insert<\/strong> (append) or <strong>Truncate + load<\/strong> depending on your goal.\n   &#8211; For repeatable labs, <strong>Truncate + load<\/strong> is simpler if supported.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Data flow is saved and valid (no validation errors).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Verification<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Use a \u201cValidate\u201d action (if available) on the data flow and confirm no missing mappings or type errors are reported.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Create and run a task<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>From 
the data flow, choose <strong>Create Task<\/strong> (or go to Tasks and create one referencing your flow).<\/li>\n<li>Name: <code>load_customers_once<\/code><\/li>\n<li>Run the task immediately. In OCI Data Integration this typically means publishing the task to an <strong>Application<\/strong> first and running it from there (exact labels vary).<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>: Task run status becomes <strong>Succeeded<\/strong> after a short time. If it fails, use the run logs to troubleshoot.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Connect to Autonomous Database as <code>DI_LAB<\/code> with a SQL client, or as <code>ADMIN<\/code> in SQL Worksheet (signing in to Database Actions as a non-ADMIN user requires that the schema be enabled for web access), and run:<\/p>\n\n\n\n<pre><code class=\"language-sql\">-- Schema-qualified so the queries work as DI_LAB or as ADMIN\nSELECT COUNT(*) AS row_count FROM di_lab.customers;\n\nSELECT * FROM di_lab.customers ORDER BY customer_id;\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Expected outcome<\/strong>:\n&#8211; Row count is <code>3<\/code>\n&#8211; The rows match the CSV content\n&#8211; <code>created_at<\/code> values are parsed as dates (format handling may require adjustment depending on connector settings)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If <code>created_at<\/code> is null or the run reported errors, adjust the CSV date format settings in your source configuration or add a transformation step to parse dates.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Common issues and fixes:<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1) Object Storage access denied (403 \/ permission errors)<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause<\/strong>: Missing Object Storage policies, wrong bucket\/namespace, or connection auth misconfigured.<\/li>\n<li><strong>Fix<\/strong>:<\/li>\n<li>Re-check bucket name and namespace<\/li>\n<li>Confirm the Data Integrator connector\u2019s required IAM policies (official docs)<\/li>\n<li>Confirm the bucket is in the same region (or that cross-region access is 
supported)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">2) Autonomous Database connection fails<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause<\/strong>: Incorrect username\/password, wallet\/TLS requirement, network access restrictions.<\/li>\n<li><strong>Fix<\/strong>:<\/li>\n<li>Test DB login directly via SQL Worksheet using the same credentials<\/li>\n<li>Confirm whether the connector requires a wallet<\/li>\n<li>Check ADB networking settings (public\/private endpoint)<\/li>\n<li>Verify whether ADB has an option to allow access from OCI services (wording varies)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">3) Date parsing errors for <code>created_at<\/code><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause<\/strong>: CSV date format mismatch.<\/li>\n<li><strong>Fix<\/strong>:<\/li>\n<li>Configure the date format in the CSV source settings if available<\/li>\n<li>Or map <code>created_at<\/code> via a transform (e.g., parse <code>YYYY-MM-DD<\/code>) if supported<\/li>\n<li>As a fallback, load into a VARCHAR2 staging column then transform in SQL<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">4) Duplicate key error on <code>customer_id<\/code><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause<\/strong>: Re-running an \u201cInsert\u201d load without truncation.<\/li>\n<li><strong>Fix<\/strong>:<\/li>\n<li>Use \u201cTruncate + load\u201d or<\/li>\n<li>Delete existing rows before load or<\/li>\n<li>Implement upsert\/merge pattern (often done as a pipeline step using SQL on the target)<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">5) Column mapping\/type mismatch<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause<\/strong>: Connector inferred types incorrectly.<\/li>\n<li><strong>Fix<\/strong>:<\/li>\n<li>Explicitly define schema in source settings<\/li>\n<li>Cast\/convert in a transform step<\/li>\n<li>Ensure target columns have compatible types\/lengths<\/li>\n<\/ul>\n\n\n\n<hr 
class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To avoid ongoing cost and clutter, delete lab resources you don\u2019t need:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Data Integrator<\/strong>:\n   &#8211; Delete the task(s), data flow(s), project, and workspace (if not used elsewhere).<\/li>\n<li><strong>Object Storage<\/strong>:\n   &#8211; Delete the object <code>customers.csv<\/code>\n   &#8211; Delete the bucket (must be empty)<\/li>\n<li><strong>Autonomous Database<\/strong>:\n   &#8211; If it was created only for this lab, terminate it (Always Free resources can still be terminated safely).\n   &#8211; Or keep it if you plan more labs; remove the <code>DI_LAB<\/code> schema and objects:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-sql\">-- As ADMIN:\nDROP USER di_lab CASCADE;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use a <strong>landing \u2192 staging \u2192 curated<\/strong> model:<\/li>\n<li>Landing: raw files in Object Storage (immutable)<\/li>\n<li>Staging: load raw tables in database<\/li>\n<li>Curated: transformed, business-ready tables<\/li>\n<li>Prefer <strong>idempotent pipelines<\/strong>:<\/li>\n<li>Re-running a job should not corrupt data<\/li>\n<li>Use partitioning, truncation, or merge patterns<\/li>\n<li>Keep transformations close to where they run best:<\/li>\n<li>Heavy relational transforms often run efficiently in the database (ELT)<\/li>\n<li>Simple standardization can be handled in data flows<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Follow <strong>least privilege<\/strong>:<\/li>\n<li>Separate \u201cbuilders\u201d (design) from \u201coperators\u201d 
(run\/monitor).<\/li>\n<li>Use <strong>separate compartments<\/strong> for dev\/test\/prod.<\/li>\n<li>Restrict who can <strong>view\/edit connections<\/strong> (credentials exposure risk).<\/li>\n<li>Use OCI <strong>Vault<\/strong> for secrets if supported by the connector; otherwise tightly control access and rotation processes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Schedule only as often as needed.<\/li>\n<li>Avoid full reloads when incremental loads are possible.<\/li>\n<li>Archive old landing files to cheaper storage tiers if appropriate.<\/li>\n<li>Monitor runtime duration\u2014optimize the slow steps first.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For file ingestion:<\/li>\n<li>Use appropriately sized files (not too many tiny files; not a single huge file) based on connector guidance.<\/li>\n<li>For database loads:<\/li>\n<li>Load into staging tables then transform with set-based SQL<\/li>\n<li>Use indexing carefully; avoid heavy indexes on staging tables during load<\/li>\n<li>Test concurrency limits and tune scheduling windows.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Implement retry strategy:<\/li>\n<li>Retries for transient network errors<\/li>\n<li>No retries for deterministic schema errors (fix and redeploy)<\/li>\n<li>Build alerting around failures:<\/li>\n<li>Use OCI events\/notifications patterns if supported (verify) or external monitoring integration.<\/li>\n<li>Keep raw landing data immutable for replay.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define runbooks:<\/li>\n<li>Where to check job failures<\/li>\n<li>How to re-run safely<\/li>\n<li>How to backfill data<\/li>\n<li>Tag everything: <code>env<\/code>, 
<code>owner<\/code>, <code>cost-center<\/code>, <code>data-domain<\/code>.<\/li>\n<li>Maintain version control for transformation logic:<\/li>\n<li>If Data Integrator supports export\/import of artifacts, incorporate it into CI\/CD (verify current capabilities).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Naming pattern example:<\/li>\n<li>Workspaces: <code>di-&lt;env&gt;-&lt;region&gt;-&lt;team&gt;<\/code><\/li>\n<li>Projects: <code>&lt;domain&gt;-pipelines<\/code><\/li>\n<li>Tasks: <code>&lt;source&gt;-to-&lt;target&gt;-&lt;frequency&gt;<\/code><\/li>\n<li>Tag with:<\/li>\n<li><code>Environment=Dev|Test|Prod<\/code><\/li>\n<li><code>DataDomain=Finance|Sales|Ops<\/code><\/li>\n<li><code>OwnerEmail=...<\/code><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI IAM<\/strong> controls who can:<\/li>\n<li>Create\/manage workspaces<\/li>\n<li>Create\/edit connections and data flows<\/li>\n<li>Run tasks and view run history<\/li>\n<li>Separate permissions for:<\/li>\n<li>Platform admins<\/li>\n<li>Data engineers<\/li>\n<li>Operators\/analysts (read-only monitoring)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Because exact policy statements are service-specific, use the official Data Integration IAM policy documentation:\n&#8211; https:\/\/docs.oracle.com\/en-us\/iaas\/data-integration\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>At rest<\/strong>:<\/li>\n<li>Object Storage encrypts data at rest (Oracle-managed keys by default; customer-managed keys available with OCI Vault in many cases).<\/li>\n<li>Autonomous Database encrypts data at rest.<\/li>\n<li><strong>In transit<\/strong>:<\/li>\n<li>Use TLS 
connections to databases and HTTPS for Object Storage endpoints.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer <strong>private connectivity<\/strong> where possible:<\/li>\n<li>Private endpoints for Autonomous Database<\/li>\n<li>Service Gateway for Object Storage access (keeps traffic off the public internet)<\/li>\n<li>For labs, public endpoints are acceptable but restrict:<\/li>\n<li>DB network allowlists<\/li>\n<li>Bucket access policies<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid embedding passwords in scripts.<\/li>\n<li>Rotate DB credentials regularly.<\/li>\n<li>Use Vault-backed secrets if Data Integrator supports it; otherwise restrict connection edit permissions and audit changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI <strong>Audit<\/strong> can capture control plane actions (who changed what).<\/li>\n<li>Use Data Integrator run logs\/history for operational traces.<\/li>\n<li>If your compliance program requires centralized logging, verify supported export\/integration methods.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data residency: choose region carefully; workspaces are regional.<\/li>\n<li>PII\/PHI handling:<\/li>\n<li>Mask or tokenize data where required<\/li>\n<li>Restrict access to landing and curated zones<\/li>\n<li>Maintain data retention and deletion policies<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Granting broad IAM permissions to too many users<\/li>\n<li>Allowing public DB access from anywhere<\/li>\n<li>Storing sensitive landing files without lifecycle\/retention controls<\/li>\n<li>Letting many users view\/edit connections containing 
passwords<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use separate prod workspace and compartment with tight IAM.<\/li>\n<li>Use private connectivity for production targets.<\/li>\n<li>Implement tagging and budget alerts for spend governance.<\/li>\n<li>Establish a credential rotation and incident response process.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Because service behavior and limits can change, treat this section as a checklist and confirm details in official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations (typical categories)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Connector limitations<\/strong>: Not all sources\/targets support the same transformations or pushdown optimizations.<\/li>\n<li><strong>File format nuances<\/strong>: CSV parsing rules (quotes, delimiters, date formats) often cause early failures.<\/li>\n<li><strong>Concurrency\/service limits<\/strong>: Maximum concurrent runs per workspace may apply.<\/li>\n<li><strong>Cross-region complexity<\/strong>: Workspaces are regional; cross-region access can add latency and egress costs.<\/li>\n<li><strong>Private networking setup<\/strong>: Private endpoints require careful VCN\/DNS\/routing planning.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Workspaces per region\/compartment<\/li>\n<li>Concurrent task runs<\/li>\n<li>Maximum artifact counts<br\/>\nCheck OCI service limits for Data Integration in your region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Integrator may not be available in all OCI regions.<\/li>\n<li>Some connectors\/features may be region-limited.<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Frequently scheduled jobs that run longer than expected drive up runtime cost.<\/li>\n<li>Repeatedly reprocessing large datasets increases runtime and therefore cost.<\/li>\n<li>Egress charges can apply if data leaves OCI.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Autonomous Database connectivity requirements vary by configuration.<\/li>\n<li>Object Storage access policies must be correct for the connector\u2019s auth method.<\/li>\n<li>Date\/time parsing and character encoding can differ between source files and the target DB.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Re-running \u201cInsert\u201d tasks can cause duplicate keys.<\/li>\n<li>Schema changes in CSV headers can break mappings.<\/li>\n<li>A \u201cSuccess\u201d status may still include rejected rows depending on load mode; validate row counts and error tables if present.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If migrating from ODI or custom ETL:<\/li>\n<li>Some transformation logic may need redesign.<\/li>\n<li>Operational semantics (scheduling, retries, error handling) will differ.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Oracle Cloud\u2019s separation of compartments, regions, and policies is powerful but requires governance discipline.<\/li>\n<li>Always confirm how the runtime authenticates to Object Storage and databases for your chosen connector.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data Integrator is one option among managed integration and ETL tools. 
Below is a practical comparison.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Oracle Cloud Data Integrator (Data Integration service)<\/strong><\/td>\n<td>OCI-native batch ingestion\/orchestration<\/td>\n<td>Managed service; strong fit with Object Storage + Autonomous Database; IAM\/compartment governance<\/td>\n<td>Connector\/feature coverage varies; may not suit heavy distributed compute; verify API\/CI-CD depth<\/td>\n<td>You run analytics on OCI and want managed pipelines<\/td>\n<\/tr>\n<tr>\n<td><strong>Oracle GoldenGate (OCI)<\/strong><\/td>\n<td>Real-time CDC replication<\/td>\n<td>Low-latency replication; operational DB change capture<\/td>\n<td>Not a general ETL tool; can be more complex\/costly<\/td>\n<td>You need near-real-time replication\/CDC<\/td>\n<\/tr>\n<tr>\n<td><strong>Oracle Integration (OIC)<\/strong><\/td>\n<td>Application integration, SaaS integration<\/td>\n<td>Strong SaaS adapters and app workflows<\/td>\n<td>Not primarily for large-scale data ingestion\/ETL<\/td>\n<td>You integrate business apps and events more than bulk data<\/td>\n<\/tr>\n<tr>\n<td><strong>Oracle Data Integrator (ODI) self-managed<\/strong><\/td>\n<td>Enterprises needing full ODI features\/control<\/td>\n<td>Mature ETL tooling; deep enterprise patterns<\/td>\n<td>You operate infrastructure; patching\/upgrades; licensing complexity<\/td>\n<td>You already standardized on ODI and need advanced features<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Glue<\/strong><\/td>\n<td>ETL in AWS<\/td>\n<td>Serverless Spark; strong AWS integrations<\/td>\n<td>Different cloud; migration overhead; cost model differs<\/td>\n<td>Your data platform is in AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Data Factory<\/strong><\/td>\n<td>ETL\/orchestration in Azure<\/td>\n<td>Broad connectors; enterprise 
orchestration<\/td>\n<td>Different cloud; pricing\/ops differences<\/td>\n<td>Your data platform is in Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Data Fusion \/ Dataflow<\/strong><\/td>\n<td>ETL + pipelines in GCP<\/td>\n<td>Strong pipeline processing<\/td>\n<td>Different cloud; learning curve<\/td>\n<td>Your data platform is in GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Apache Airflow (self-managed\/managed)<\/strong><\/td>\n<td>Orchestration-first workflows<\/td>\n<td>Code-first; flexible; huge ecosystem<\/td>\n<td>Requires ops; ETL still needs tools (Spark\/dbt)<\/td>\n<td>You want orchestration framework and already run data tools<\/td>\n<\/tr>\n<tr>\n<td><strong>dbt (core\/cloud)<\/strong><\/td>\n<td>SQL-based transformations in warehouse<\/td>\n<td>Great for ELT; version control friendly<\/td>\n<td>Not an ingestion tool; needs upstream loader<\/td>\n<td>Your data is already in the warehouse and transforms are SQL-first<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. 
Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: governed ingestion into an OCI analytics platform<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A large enterprise has multiple upstream systems delivering daily extracts and needs a governed, repeatable ingestion mechanism into ADW for enterprise reporting.<\/li>\n<li><strong>Proposed architecture<\/strong>:<\/li>\n<li>Upstream exports land in <strong>OCI Object Storage<\/strong> (per domain buckets\/prefixes).<\/li>\n<li>Data Integrator runs <strong>domain pipelines<\/strong>:<ul>\n<li>Load landing files into staging schema<\/li>\n<li>Apply transformations and publish curated tables<\/li>\n<\/ul>\n<\/li>\n<li><strong>Autonomous Data Warehouse<\/strong> stores curated data marts.<\/li>\n<li>IAM policies restrict each domain team to their project artifacts.<\/li>\n<li>Tagging and budgets provide cost governance.<\/li>\n<li><strong>Why Data Integrator was chosen<\/strong>:<\/li>\n<li>OCI-native managed execution<\/li>\n<li>Strong fit with Object Storage + ADW<\/li>\n<li>Built-in scheduling and run history for operations<\/li>\n<li><strong>Expected outcomes<\/strong>:<\/li>\n<li>Faster onboarding of new datasets<\/li>\n<li>Reduced ETL server maintenance<\/li>\n<li>Improved auditability and consistent run operations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: simple analytics ingestion without managing ETL servers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: A startup wants daily analytics from exported app data but doesn\u2019t want to run Airflow\/Spark.<\/li>\n<li><strong>Proposed architecture<\/strong>:<\/li>\n<li>App exports a CSV daily to <strong>Object Storage<\/strong><\/li>\n<li>Data Integrator loads it into <strong>Autonomous Database<\/strong> (Always Free for early stage where possible)<\/li>\n<li>BI connects to Autonomous Database for dashboards<\/li>\n<li><strong>Why Data 
Integrator was chosen<\/strong>:<\/li>\n<li>Quick to implement with minimal ops<\/li>\n<li>Low overhead for scheduling and monitoring<\/li>\n<li><strong>Expected outcomes<\/strong>:<\/li>\n<li>Reliable daily refresh<\/li>\n<li>Simple operational model<\/li>\n<li>Easy path to scale by upgrading DB and increasing pipeline complexity later<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">1) <strong>Is Data Integrator the same as Oracle Data Integrator (ODI)?<\/strong><br\/>\nNo. ODI is a separate product (often self-managed and historically on-prem). In Oracle Cloud, the managed service is commonly documented as <strong>OCI Data Integration<\/strong>. This tutorial uses \u201cData Integrator\u201d to refer to that managed OCI service.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">2) <strong>Is Data Integrator an ETL or ELT tool?<\/strong><br\/>\nIt can support ETL-style transformations in flows and also ELT-style patterns where transformations run in the target database. The best approach depends on workload and connector behavior.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">3) <strong>Where do Data Integrator jobs run?<\/strong><br\/>\nThey run on Oracle-managed runtime infrastructure associated with the service. You don\u2019t manage servers directly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">4) <strong>Can I use Data Integrator with Autonomous Database?<\/strong><br\/>\nYes\u2014this is a common pattern. Connectivity details (wallet, public\/private endpoints) depend on configuration; follow the connector wizard and docs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">5) <strong>Can it load data from Object Storage?<\/strong><br\/>\nYes\u2014Object Storage is a common landing zone. 
Ensure IAM\/policies and bucket access are correctly configured.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">6) <strong>Does it support incremental loads?<\/strong><br\/>\nIncremental patterns are typically implemented using watermark columns, partitions, file naming conventions, or merge steps. Exact built-in support depends on connectors and features\u2014verify in docs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">7) <strong>How do I schedule pipelines?<\/strong><br\/>\nYou create tasks and attach schedules in the workspace. Scheduling frequency\/granularity depends on service capabilities.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">8) <strong>How do I monitor failures?<\/strong><br\/>\nUse task run history and logs in the Data Integrator workspace. For enterprise alerting, verify integrations with OCI Monitoring\/Notifications or Events.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">9) <strong>Can I keep traffic private (no public internet)?<\/strong><br\/>\nOften yes with private endpoints\/service gateways and proper VCN design, but exact support depends on connectors and your database configuration. Verify in OCI docs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">10) <strong>How is access controlled?<\/strong><br\/>\nThrough OCI IAM policies at tenancy\/compartment scope. You can separate design permissions from run\/monitor permissions.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">11) <strong>Does Data Integrator store my database passwords?<\/strong><br\/>\nConnections commonly store credentials. Prefer OCI Vault integration if supported; otherwise tightly control access to connections and rotate credentials.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">12) <strong>Can I promote artifacts from dev to prod?<\/strong><br\/>\nMany teams use export\/import or API-based automation where available. 
Confirm the currently supported promotion mechanisms in the docs for your region.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">13) <strong>What\u2019s the best practice for schema changes in incoming files?<\/strong><br\/>\nUse a staging layer and implement controlled schema evolution: land raw files immutably, load them to staging, then update mappings deliberately and deploy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">14) <strong>Is Data Integrator suitable for real-time streaming?<\/strong><br\/>\nTypically it\u2019s used for batch-oriented integration. For real-time CDC\/replication, OCI GoldenGate is often a better fit.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">15) <strong>How do I estimate cost accurately?<\/strong><br\/>\nMeasure average runtime per job, multiply by schedule frequency, then apply the official Data Integration pricing rate and add the cost of dependent services (Object Storage, DB, data transfer). Use Oracle\u2019s cost estimator to check the total.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn Data Integrator<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Data Integration docs: https:\/\/docs.oracle.com\/en-us\/iaas\/data-integration\/<\/td>\n<td>Primary source for features, connectors, IAM policies, and how-to guides<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Oracle Cloud Price List: https:\/\/www.oracle.com\/cloud\/price-list\/<\/td>\n<td>Official, up-to-date pricing SKUs and units (region\/contract dependent)<\/td>\n<\/tr>\n<tr>\n<td>Official calculator<\/td>\n<td>Oracle Cloud Cost Estimator: https:\/\/www.oracle.com\/cloud\/costestimator.html<\/td>\n<td>Helps estimate total cost across services (DB, storage, data integration runtime)<\/td>\n<\/tr>\n<tr>\n<td>Official OCI docs (IAM)<\/td>\n<td>OCI IAM overview: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Identity\/home.htm<\/td>\n<td>Required for secure policy design and least-privilege access<\/td>\n<\/tr>\n<tr>\n<td>Official Object Storage docs<\/td>\n<td>Object Storage overview: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Object\/Concepts\/objectstorageoverview.htm<\/td>\n<td>Bucket policies, namespaces, lifecycle rules, and access models<\/td>\n<\/tr>\n<tr>\n<td>Official Autonomous Database docs<\/td>\n<td>Autonomous Database docs: https:\/\/docs.oracle.com\/en-us\/iaas\/autonomous-database\/<\/td>\n<td>Connectivity, wallets, network access, users\/schemas for lab and production<\/td>\n<\/tr>\n<tr>\n<td>Architecture center<\/td>\n<td>Oracle Cloud Architecture Center: https:\/\/www.oracle.com\/cloud\/architecture\/<\/td>\n<td>Reference architectures and best practices for OCI data platforms<\/td>\n<\/tr>\n<tr>\n<td>Official tutorials<\/td>\n<td>Oracle Cloud Tutorials landing: https:\/\/docs.oracle.com\/en\/learn\/<\/td>\n<td>Hands-on labs across OCI; search within for data integration 
patterns<\/td>\n<\/tr>\n<tr>\n<td>Videos\/webinars<\/td>\n<td>Oracle Cloud Infrastructure YouTube: https:\/\/www.youtube.com\/@OracleCloudInfrastructure<\/td>\n<td>Product walkthroughs and architecture sessions (search for Data Integration)<\/td>\n<\/tr>\n<tr>\n<td>SDK\/CLI<\/td>\n<td>OCI CLI installation: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/SDKDocs\/cliinstall.htm<\/td>\n<td>Useful for repeatable uploads, automation, and operational scripts<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Cloud\/DevOps engineers, platform teams<\/td>\n<td>OCI fundamentals, DevOps practices, integration and automation foundations<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>SCM\/DevOps toolchains that often support integration delivery<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations teams<\/td>\n<td>Cloud operations practices, monitoring, governance, runbooks<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, operations engineers<\/td>\n<td>Reliability engineering, incident response, operational maturity<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops + data\/AI practitioners<\/td>\n<td>AIOps concepts, operational analytics, monitoring automation<\/td>\n<td>Check 
website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud training and mentoring (verify offerings)<\/td>\n<td>Individuals and teams seeking structured guidance<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training platform (verify course catalog)<\/td>\n<td>Engineers building practical DevOps\/cloud skills<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps services\/training (verify specifics)<\/td>\n<td>Teams needing short-term advisory or coaching<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and learning resources (verify services)<\/td>\n<td>Ops\/DevOps teams needing troubleshooting help<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify portfolio)<\/td>\n<td>Cloud adoption, automation, operations<\/td>\n<td>Designing OCI landing zones; setting up CI\/CD; governance patterns<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps\/cloud consulting and training<\/td>\n<td>Platform engineering and enablement<\/td>\n<td>Data platform ops model; pipeline standards; IAM and policy design workshops<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting services (verify offerings)<\/td>\n<td>DevOps transformation and cloud operations<\/td>\n<td>Build runbooks\/monitoring; release automation; security reviews<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before Data Integrator<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI fundamentals<\/strong>:<\/li>\n<li>Compartments, IAM users\/groups\/policies<\/li>\n<li>Regions, availability domains<\/li>\n<li><strong>Networking basics<\/strong>:<\/li>\n<li>VCNs, subnets, routing, service gateways (for private Object Storage access)<\/li>\n<li><strong>Data basics<\/strong>:<\/li>\n<li>Relational modeling, SQL<\/li>\n<li>File formats (CSV conventions, encoding, delimiters)<\/li>\n<li><strong>Autonomous Database basics<\/strong>:<\/li>\n<li>Schemas, tables, constraints<\/li>\n<li>Loading patterns and data types<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after Data Integrator<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Advanced orchestration<\/strong>:<\/li>\n<li>Multi-step pipelines, backfills, dependency management<\/li>\n<li><strong>Data quality and governance<\/strong>:<\/li>\n<li>Data validation frameworks, data catalogs, lineage concepts<\/li>\n<li><strong>Security and compliance<\/strong>:<\/li>\n<li>Vault\/KMS, private endpoints, audit and logging pipelines<\/li>\n<li><strong>Real-time data movement<\/strong>:<\/li>\n<li>OCI GoldenGate for CDC use cases<\/li>\n<li><strong>Analytics<\/strong>:<\/li>\n<li>Oracle Analytics Cloud, semantic modeling, performance tuning<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Engineer (OCI)<\/li>\n<li>Analytics Engineer<\/li>\n<li>Cloud Data Platform Engineer<\/li>\n<li>Integration Engineer (data-focused)<\/li>\n<li>Platform Engineer supporting data pipelines<\/li>\n<li>Data Operations \/ Data Reliability Engineer<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Oracle certifications change frequently. 
If you want a certification path, start with OCI foundations certifications (if applicable), then look for OCI data platform certifications covering Autonomous Database and analytics services.<br\/>\nVerify current Oracle certification tracks at https:\/\/education.oracle.com\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a landing-to-curated pipeline with:<\/li>\n<li>raw landing files in Object Storage<\/li>\n<li>staging and curated schemas in Autonomous Database<\/li>\n<li>Implement backfill logic:<\/li>\n<li>load all files for a date range<\/li>\n<li>validate counts and enforce idempotency<\/li>\n<li>Add data quality checks:<\/li>\n<li>reject invalid emails to a quarantine table<\/li>\n<li>generate a load summary table per run<\/li>\n<li>Build a cost dashboard:<\/li>\n<li>tag resources<\/li>\n<li>track job runtimes and estimate monthly consumption<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. 
Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI (Oracle Cloud Infrastructure)<\/strong>: Oracle\u2019s public cloud platform offering compute, storage, networking, and managed services.<\/li>\n<li><strong>Integration (category)<\/strong>: Services and patterns used to connect systems, move data, and orchestrate workflows.<\/li>\n<li><strong>Data Integrator<\/strong>: In this tutorial, the OCI-managed <strong>Data Integration<\/strong> service used to design and run data ingestion and transformation pipelines.<\/li>\n<li><strong>Workspace<\/strong>: An isolated environment within Data Integrator where you create and run integration artifacts.<\/li>\n<li><strong>Compartment<\/strong>: OCI governance boundary used to organize resources and apply IAM policies.<\/li>\n<li><strong>Data Asset<\/strong>: A logical definition of a source\/target system (e.g., Object Storage, database).<\/li>\n<li><strong>Connection<\/strong>: The connectivity and credential configuration used to access a data asset.<\/li>\n<li><strong>Data Flow<\/strong>: A mapping\/transformation workflow that defines how data moves from source to target.<\/li>\n<li><strong>Pipeline<\/strong>: An orchestration artifact chaining multiple steps\/tasks.<\/li>\n<li><strong>Task<\/strong>: An executable unit that runs a data flow or pipeline.<\/li>\n<li><strong>Autonomous Database (ATP\/ADW)<\/strong>: Oracle-managed database service with automated operations and built-in security features.<\/li>\n<li><strong>Object Storage<\/strong>: OCI service for storing unstructured data (files\/objects) in buckets.<\/li>\n<li><strong>IAM Policy<\/strong>: OCI access control rules defining who can do what in which compartment.<\/li>\n<li><strong>Service Gateway<\/strong>: OCI networking feature enabling private access to Oracle services like Object Storage from a VCN.<\/li>\n<li><strong>Private Endpoint<\/strong>: Private network access to a managed service without using a public IP (availability 
depends on service\/config).<\/li>\n<li><strong>ETL\/ELT<\/strong>: Extract-Transform-Load \/ Extract-Load-Transform data integration patterns.<\/li>\n<li><strong>CDC (Change Data Capture)<\/strong>: Capturing and replicating data changes (often near-real-time), commonly done with tools like GoldenGate.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Data Integrator in <strong>Oracle Cloud<\/strong> (commonly documented as <strong>OCI Data Integration<\/strong>) is a managed <strong>Integration<\/strong> service for designing and running data ingestion and transformation pipelines\u2014especially strong for patterns like <strong>Object Storage \u2192 Autonomous Database<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It matters because it reduces the operational burden of self-managed ETL tooling, provides scheduling and run history for production operations, and fits naturally into OCI governance via compartments and IAM policies. Cost is primarily driven by <strong>runtime consumption<\/strong> plus dependent services (Object Storage, databases, and any data transfer). Security hinges on least-privilege IAM, careful handling of connection credentials, and private networking where appropriate.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Use Data Integrator when you want managed, repeatable batch ingestion and orchestration in OCI. 
For real-time CDC replication, consider OCI GoldenGate; for application-to-application workflows, consider Oracle Integration.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Next step: read the official OCI Data Integration documentation (https:\/\/docs.oracle.com\/en-us\/iaas\/data-integration\/), then expand this lab into a production-ready pattern with staging\/curated schemas, idempotent loads, and monitored schedules.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Integration<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[48,62],"tags":[],"class_list":["post-918","post","type-post","status-publish","format-standard","hentry","category-integration","category-oracle-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/918","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=918"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/918\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=918"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=918"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=918"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}