Oracle Cloud Data Integrator Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Integration

Category

Integration

1. Introduction

What this service is

Data Integrator in Oracle Cloud is a managed, cloud-native service used to design, run, and operationalize data ingestion and transformation workflows—typically moving data between Oracle and non-Oracle sources, files in Object Storage, and target analytics stores such as Autonomous Database.

One-paragraph simple explanation

If you need to load data from one place to another on a schedule (for example, CSV files in Object Storage into an Autonomous Data Warehouse table), Data Integrator provides a visual, managed way to build that pipeline, run it reliably, and monitor it—without standing up and maintaining your own ETL servers.

One-paragraph technical explanation

Technically, Data Integrator is an OCI-managed data integration runtime with a design-time studio (projects, data assets, connections, data flows/pipelines, tasks, schedules) and an execution engine that runs jobs in Oracle Cloud. It integrates with OCI Identity and Access Management (IAM) for control plane authorization and can connect to data sources/targets via OCI networking (public endpoints and/or private connectivity depending on your configuration). It emits operational telemetry through OCI logging/monitoring capabilities (availability varies by feature; verify in official docs).

What problem it solves

Data Integrator solves several common problems:
  • Building repeatable ETL/ELT pipelines without custom scripts per dataset
  • Reducing operational overhead (patching/maintaining ETL servers)
  • Standardizing ingestion and transformation across teams
  • Scheduling and monitoring data movement jobs in a governed way
  • Integrating with Oracle Cloud data platforms (Object Storage, Autonomous Database, and other OCI services)

Naming note (important): In current Oracle Cloud documentation and Console navigation, the managed service is commonly labeled “Data Integration” (OCI Data Integration). The term “Oracle Data Integrator (ODI)” is also a separate, long-standing product (often on-premises or self-managed). This tutorial uses Data Integrator as the primary name (as requested) and maps it to the OCI-managed Data Integration service. Verify the exact branding in your region/console because Oracle product names can evolve.


2. What is Data Integrator?

Official purpose

Data Integrator’s purpose in Oracle Cloud is to provide a managed data integration service to ingest, transform, and load data across common enterprise sources and targets, with orchestration, scheduling, and monitoring.

Core capabilities

Typical capabilities include:
  • Design-time development of integration logic using a web UI (projects, data flows/pipelines)
  • Connectivity to common sources/targets (Object Storage, Oracle databases, and other supported systems)
  • Data preparation/transformation (mappings, joins, filters, derived columns; exact transforms depend on connector/runtime, so verify in official docs)
  • Orchestration (pipelines/tasks, dependencies, schedules)
  • Operational management (job runs, status, logs/diagnostics)

Major components (conceptual)

While exact names in the UI can vary, the service typically revolves around:
  • Workspace: Top-level environment in a region/compartment where you build and run integrations
  • Projects/Folders: Organize integration artifacts
  • Data Assets: Definitions of external systems (e.g., Object Storage, Autonomous Database)
  • Connections: Credentials and connectivity configuration for a data asset
  • Data Flows / Pipelines: The actual ingestion and transformation logic
  • Tasks / Schedules: Operationalization (run now, run on a schedule, manage dependencies)
  • Application/Runtime: The managed compute/runtimes that execute jobs (capacity/scaling and billing are part of the pricing model)

Service type

  • Type: Managed cloud service (PaaS-style), focused on data integration workloads
  • Operational model: You design in the Oracle Cloud Console (or APIs where available), then the service runs jobs on managed infrastructure.

Scope (regional/global, tenancy/compartment)

  • Tenancy: Resources exist within an OCI tenancy
  • Region: Workspaces are typically regional (you create a workspace in a chosen OCI region)
  • Compartment: Resources are usually created in an OCI compartment for governance and access control
  • Project-scoped artifacts: Projects and integration artifacts live inside the workspace

(Confirm exact resource scoping and supported regions in official docs for your tenancy.)

How it fits into the Oracle Cloud ecosystem

Data Integrator is commonly used alongside:
  • Oracle Cloud Infrastructure (OCI) Object Storage for landing files/data
  • Autonomous Database (ATP/ADW) for analytics and warehousing
  • Oracle Cloud networking (VCN, private endpoints, service gateways) for secure connectivity
  • OCI IAM for access control
  • OCI Logging/Monitoring/Audit for operational governance
  • Optional ecosystem services such as Data Catalog, GoldenGate, Oracle Analytics Cloud, and Oracle Integration, depending on your architecture


3. Why use Data Integrator?

Business reasons

  • Faster time-to-value: Teams can build ingestion pipelines quickly using a managed service.
  • Lower operational burden: No ETL servers to patch/scale manually.
  • Consistency and governance: Standardized patterns for ingestion, transformations, and scheduling.

Technical reasons

  • Managed runtime: Execution is handled by Oracle Cloud; you focus on logic.
  • Native alignment with Oracle data platforms: Particularly strong fit when your targets are Autonomous Database or other Oracle-managed data services.
  • Repeatable workflows: Versioned artifacts, reusable connections, and orchestrated pipelines.

Operational reasons

  • Scheduling: Built-in scheduling and dependency handling (verify exact scheduling options and granularity).
  • Observability: Job run history and diagnostics are available in the service; integration with OCI observability features may apply (verify).
  • Separation of concerns: Workspace/project organization supports multi-team environments.

Security/compliance reasons

  • OCI IAM control plane: Fine-grained policies at compartment level.
  • Network controls: Can be designed for private connectivity patterns within OCI (where supported).
  • Auditability: OCI Audit can capture API actions for governance.

Scalability/performance reasons

  • Elastic managed execution: Suitable for variable workloads and bursty ingestion patterns (exact scaling model depends on service; verify in docs).
  • Parallelization features: May exist for file loads or data movement depending on connector and task configuration.

When teams should choose it

Choose Data Integrator when:
  • You're on Oracle Cloud and need a managed service for data ingestion/orchestration.
  • Your targets include Autonomous Database or you frequently use Object Storage as a landing zone.
  • You need repeatable scheduled pipelines with centralized monitoring and access control.
  • You want to avoid operating an ETL cluster (Airflow/Spark) for moderate-complexity pipelines.

When they should not choose it

Consider alternatives when:
  • You require complex distributed processing (multi-terabyte transformations requiring Spark clusters) and Data Integrator's runtime model doesn't match your needs.
  • You need high-volume, real-time CDC replication, which is often better served by OCI GoldenGate.
  • Your organization has already standardized on another integration platform (e.g., Azure Data Factory, AWS Glue) and multi-cloud friction outweighs the benefits.
  • You need full code-first workflows with deep CI/CD integration and cannot meet that with Data Integrator's current APIs (verify API coverage).


4. Where is Data Integrator used?

Industries

Commonly used in:
  • Finance and insurance (risk reporting, regulatory extracts)
  • Retail and e-commerce (sales, inventory, customer analytics)
  • Healthcare (operational analytics, claims, patient systems; subject to compliance)
  • Telecom (billing analytics, customer churn pipelines)
  • Manufacturing (IoT data landing to analytics stores)
  • Public sector (data consolidation, dashboards, reporting)

Team types

  • Data engineering teams
  • Analytics engineering teams
  • Cloud platform teams supporting data platforms
  • Integration teams consolidating enterprise data
  • App teams that need lightweight ingestion into a warehouse

Workloads

  • Batch ingestion from files (CSV/JSON/Parquet depending on support)
  • Batch ELT/ETL into Oracle analytics targets
  • Scheduled refresh pipelines for BI tools
  • Landing-zone to curated-zone transformations

Architectures

  • Object Storage “data lake landing” → Autonomous Data Warehouse
  • Multi-source ingestion → standardized warehouse model (star/snowflake)
  • Staging schema → curated schema
  • “Extract from operational DB nightly” → reporting DB

Real-world deployment contexts

  • Production: Managed schedules, least-privilege IAM, private networking, tagging, runbooks, alerting
  • Dev/Test: Separate workspaces or separate compartments; smaller schedules; sample datasets

5. Top Use Cases and Scenarios

Below are 10 realistic use cases for Data Integrator in Oracle Cloud.

1) Object Storage CSV to Autonomous Data Warehouse (daily load)

  • Problem: Finance receives daily CSV extracts and needs them loaded into ADW.
  • Why Data Integrator fits: Managed file ingestion, mapping, scheduling, and monitoring.
  • Example: A daily transactions_YYYYMMDD.csv lands in an OCI bucket; Data Integrator loads it to DW.TRANSACTIONS_STAGE then merges into DW.TRANSACTIONS.

2) Multi-file ingestion with schema drift handling (lightweight)

  • Problem: Vendors add columns occasionally; ingestion breaks frequently.
  • Why it fits: Data flow mappings can be updated centrally; some connectors support flexible mappings (verify schema drift capabilities).
  • Example: Vendor adds region_code; you update mapping once and redeploy.

3) Operational DB to reporting DB refresh (nightly batch)

  • Problem: Operational Oracle DB is too busy for BI queries.
  • Why it fits: Scheduled extraction and load into reporting schema.
  • Example: Nightly job extracts orders/customers and loads them into ADW reporting tables.

4) Standardized ingestion framework for multiple departments

  • Problem: Each team writes scripts; no standard monitoring/governance.
  • Why it fits: Central workspace patterns, shared connections, consistent scheduling.
  • Example: Shared “landing-to-staging” templates; each department onboards new datasets quickly.

5) Data quality checkpoints during load (basic validations)

  • Problem: Bad rows cause downstream reporting issues.
  • Why it fits: Transform steps can filter/reject invalid records (capability depends on transformations available; verify).
  • Example: Filter rows where amount < 0, output rejects to a quarantine table.
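
The filter-to-quarantine pattern above can be sketched as a simple partition of rows. This is illustrative Python, not a Data Integrator API; the amount rule is the hypothetical validation from the example.

```python
def split_valid_rejects(rows, is_valid=lambda r: r["amount"] >= 0):
    """Partition rows into (valid, rejects), mirroring a filter step
    that sends invalid records to a quarantine table."""
    valid, rejects = [], []
    for r in rows:
        (valid if is_valid(r) else rejects).append(r)
    return valid, rejects

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -3.5}]
valid, rejects = split_valid_rejects(rows)
print(len(valid), len(rejects))  # 1 1
```

In the managed service the same idea is expressed as a filter/branch in the data flow, with the reject branch writing to a quarantine target.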

6) Orchestrated pipeline: ingest → transform → publish

  • Problem: You need multi-step jobs with dependencies.
  • Why it fits: Pipelines/tasks can enforce ordering and handle failures.
  • Example: Step 1 load staging; step 2 run transform; step 3 refresh aggregate table.

7) Cross-compartment shared data platform (governed)

  • Problem: Platform team owns data services; app teams need controlled access.
  • Why it fits: Compartment-based IAM and policies.
  • Example: Platform compartment hosts Data Integrator; app compartments grant least-privilege access to run specific tasks.

8) Migration from self-managed ETL to managed OCI

  • Problem: Legacy ETL servers are costly and hard to patch.
  • Why it fits: Replace routine batch ETL jobs with managed service.
  • Example: Replace cron + scripts that pull files from SFTP (after landing to OCI) with Data Integrator schedules.

9) Pre-load transformations to standardize reference data

  • Problem: Multiple systems use different code sets.
  • Why it fits: Transform stage can map codes to standardized dimension tables.
  • Example: Map status values (A/ACTIVE/1) into canonical DIM_STATUS.

10) Controlled reprocessing/backfills

  • Problem: Need to re-run loads for a historical date range.
  • Why it fits: Parameterized runs (if supported) and repeatable pipelines.
  • Example: Backfill last 30 days of files after a bug fix, without manual SQL scripting.
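
A backfill is easiest to reason about as a list of dated object names to reprocess. A minimal Python sketch, assuming the hypothetical transactions_YYYYMMDD.csv naming convention from use case 1:

```python
from datetime import date, timedelta

def backfill_object_names(days, end=None):
    """Generate dated file names (newest first) for a backfill window.

    Assumes the hypothetical transactions_YYYYMMDD.csv convention;
    adjust the pattern to your landing-zone layout.
    """
    end = end or date.today()
    return [
        f"transactions_{(end - timedelta(days=n)):%Y%m%d}.csv"
        for n in range(days)
    ]

# Example: the three most recent daily files as of 2024-03-05
print(backfill_object_names(3, date(2024, 3, 5)))
# ['transactions_20240305.csv', 'transactions_20240304.csv', 'transactions_20240303.csv']
```

Each generated name can then drive a parameterized task run (if supported) or a re-run per file.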

6. Core Features

Feature availability can vary by region and by connector type. Always confirm with the official Data Integration documentation for your tenancy.

1) Workspaces (environment boundary)

  • What it does: Provides an isolated environment to manage projects, connections, jobs, and run history.
  • Why it matters: Supports dev/test/prod separation and team organization.
  • Practical benefit: Clear ownership and governance at the workspace level.
  • Caveats: Workspaces are typically regional; cross-region designs require explicit planning.

2) Projects and artifact organization

  • What it does: Organizes data flows/pipelines, connections, and tasks into logical groups.
  • Why it matters: Maintainability for larger estates.
  • Practical benefit: Reusable patterns and consistent naming/tagging.
  • Caveats: Establish conventions early; refactoring later is painful.

3) Data assets (source/target definitions)

  • What it does: Represents a system like Object Storage or a database service.
  • Why it matters: Centralizes system configuration and governance.
  • Practical benefit: Multiple pipelines can reuse the same data asset.
  • Caveats: Connectivity requirements (network, credentials) must be correct for reliable runs.

4) Connections (credentials and connectivity)

  • What it does: Stores connection details used by jobs (endpoints, usernames, passwords/keys).
  • Why it matters: Security and operational consistency.
  • Practical benefit: Update credentials once without rewriting pipelines.
  • Caveats: Secret handling options vary—prefer OCI Vault integration if supported; otherwise tightly control who can view/edit connections.

5) Data flows (mapping and transformations)

  • What it does: Defines how data is read, transformed, and written.
  • Why it matters: This is where the “ETL/ELT logic” lives.
  • Practical benefit: Visual mapping reduces custom code for common transformations.
  • Caveats: Very complex transformations might be better in SQL on the target (ELT) or in a dedicated compute engine; decide based on performance and governance.

6) Pipelines (orchestration)

  • What it does: Chains steps together (ingest, transform, publish), handling dependencies and flow control.
  • Why it matters: Production pipelines usually require multiple steps.
  • Practical benefit: Fewer external schedulers; clearer run lineage.
  • Caveats: Understand failure behavior and retry semantics; verify how retries and partial failures are handled.

7) Tasks and scheduling

  • What it does: Runs a data flow/pipeline on demand or on a schedule.
  • Why it matters: Operationalization is what turns a design into a service.
  • Practical benefit: Predictable refresh cadence for analytics.
  • Caveats: Scheduling granularity, time zone handling, and concurrency limits should be validated in docs.

8) Monitoring and run history

  • What it does: Shows status, run duration, and error details for tasks.
  • Why it matters: Troubleshooting and SLA management.
  • Practical benefit: Faster incident response with centralized run diagnostics.
  • Caveats: For enterprise observability, confirm integration with OCI Logging/Monitoring and export patterns (if required).

9) IAM integration (control plane authorization)

  • What it does: Uses OCI IAM groups/policies to authorize workspace and artifact management.
  • Why it matters: Least privilege and auditability.
  • Practical benefit: Platform teams can delegate safely.
  • Caveats: The exact policy verbs/resource-types must match Data Integrator’s IAM model—use official policy examples.

10) APIs/Automation (where available)

  • What it does: Enables automation via OCI APIs/SDK/CLI (coverage varies).
  • Why it matters: CI/CD and platform operations.
  • Practical benefit: Repeatable provisioning, promotion between environments.
  • Caveats: Verify current API support for the artifacts you need (workspace, tasks, runs, etc.).
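
Where automation APIs exist, a common pattern is to trigger a task run and poll until it reaches a terminal state. The sketch below is service-agnostic Python: get_status stands in for whatever SDK/CLI call your tenancy supports, and the state names are illustrative, not documented Data Integrator values.

```python
import time

TERMINAL = {"SUCCESS", "ERROR", "TERMINATED"}  # illustrative; check the service's actual states

def wait_for_run(get_status, timeout_s=600.0, poll_s=5.0, sleep=time.sleep):
    """Poll a task run until it reaches a terminal state or times out.

    get_status is a placeholder for your SDK/CLI status call.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s

# Usage with a stubbed status source:
states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCESS"])
print(wait_for_run(lambda: next(states), sleep=lambda s: None))  # SUCCESS
```

The injectable sleep function keeps the loop testable; in real automation you would pass the status call from the OCI SDK or CLI once you have verified its coverage.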

7. Architecture and How It Works

High-level architecture

At a high level, Data Integrator has:
  1. A control plane: where you define artifacts (workspaces, connections, flows, tasks).
  2. A runtime plane: a managed execution environment that reads from sources and writes to targets.
  3. Integration points: IAM, networking, Object Storage, databases, logging/monitoring.

Request/data/control flow

  1. User (or automation) creates/updates artifacts in the Data Integrator workspace.
  2. A task is started (manual trigger or schedule).
  3. Runtime retrieves connection details and accesses sources/targets.
  4. Data is extracted, transformed, and loaded.
  5. Runtime emits status and logs; the job is visible in run history.

Integrations with related Oracle Cloud services

Common integrations include:
  • OCI Object Storage: landing zone for files and staging data
  • Autonomous Database: common analytics target
  • OCI IAM: access control to manage and run integration assets
  • OCI Vault (optional): secrets storage (verify connector support)
  • OCI Logging/Monitoring (optional): operational visibility (verify exact integration points)
  • VCN / private networking (optional): private endpoints for databases and private access patterns

Dependency services

Your pipeline usually depends on:
  • Object Storage buckets, objects, and policies
  • Target databases (Autonomous Database or DB systems)
  • A network path between the Data Integrator runtime and the endpoints (public or private)
  • IAM policies for all involved services

Security/authentication model (practical view)

  • Control plane: IAM policies decide who can create/manage/run workspaces and artifacts.
  • Data plane access to sources/targets:
  • Object Storage access can be via OCI IAM + resource principals (service-to-service) in some patterns, or via credentials/config depending on how the connector works (verify).
  • Database access is typically via database credentials and secure connectivity options (TLS; wallet for Autonomous Database patterns).

Networking model (practical view)

Typical patterns:
  • Public endpoints: simplest for labs; ensure you restrict access.
  • Private endpoints: preferred for production; requires VCN planning, DNS, and routing.
  • Service gateway: keeps Object Storage access private within OCI.
  • NAT gateway: for outbound access if needed (avoid it if you can).

Monitoring/logging/governance considerations

  • Use OCI Audit to track who created/modified artifacts.
  • Use task run history for operational checks.
  • Consider exporting logs/metrics into centralized tooling if your org requires it (verify native integration points).
  • Use tagging to separate cost centers, environments, owners.

Simple architecture diagram (Mermaid)

flowchart LR
  U[Engineer / Data Analyst] -->|Design & Run| DI[Data Integrator Workspace]
  OS[(OCI Object Storage Bucket)] -->|Read CSV/Files| DI
  DI -->|Load Tables| ADB[(Autonomous Database)]
  DI --> RH[Run History / Logs]

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph Tenancy["OCI Tenancy"]
    subgraph Net["VCN (Production)"]
      PE1["Private Endpoint / Private Access<br/>(to Autonomous Database)"]
      SG["Service Gateway<br/>(private access to Object Storage)"]
    end

    subgraph DIW["Data Integrator Workspace (Region)"]
      CP["Control Plane:<br/>Projects, Connections, Tasks"]
      RT["Managed Runtime:<br/>Job Execution"]
    end

    subgraph Data["Data Platform"]
      OS[("Object Storage:<br/>Landing + Archive")]
      ADB[("Autonomous Database:<br/>Staging + Curated")]
    end

    IAM["OCI IAM Policies & Groups"]
    AUD["OCI Audit"]
    MON["OCI Monitoring/Logging<br/>(verify integration details)"]
  end

  IAM --> CP
  CP --> RT
  OS --> RT
  RT --> ADB
  AUD --> CP
  RT --> MON
  SG --- OS
  PE1 --- ADB

8. Prerequisites

Tenancy and compartment requirements

  • An active Oracle Cloud (OCI) tenancy
  • A compartment where you can create:
  • Data Integrator workspace
  • Object Storage bucket
  • Autonomous Database (for this lab)

Permissions / IAM roles

You need permissions to:
  • Create/manage Data Integrator workspaces and artifacts
  • Read/write Object Storage objects in a bucket
  • Create/manage Autonomous Database (or at least connect and create tables)

OCI IAM policies for Data Integrator use service-specific resource types and verbs. Because policy syntax can change and differs by feature, use the official IAM policy examples from Oracle docs for Data Integration.
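
For orientation only, compartment-scoped policy statements generally follow the shape below. The dis-workspaces resource type and the group/compartment names are assumptions for illustration; copy the exact resource types and verbs from the official Data Integration policy examples.

```
Allow group DataEngineers to manage dis-workspaces in compartment di-lab
Allow group DataEngineers to read buckets in compartment di-lab
Allow group DataEngineers to manage objects in compartment di-lab
```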

Start here (official docs entry point; navigate to “Policies” / “IAM” sections): – https://docs.oracle.com/en-us/iaas/data-integration/

Billing requirements

  • A paid OCI account or sufficient free-tier capacity.
  • This lab can be designed to be low-cost if you use:
  • Autonomous Database Always Free (if available in your region/tenancy)
  • Small test files (KB/MB scale)
  • You may still incur charges for storage, data egress, or non-free resources.

CLI/SDK/tools needed (optional but useful)

  • OCI Console access (required)
  • Optional:
  • OCI CLI for uploading files and basic checks: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
  • A SQL client (SQL Developer, SQLcl, or Autonomous Database SQL Worksheet in Console)

Region availability

  • Data Integrator (OCI Data Integration) is not necessarily available in every region.
  • Verify region availability in Oracle Cloud documentation or in the Console region selector.

Quotas/limits

  • Service limits exist for:
  • Number of workspaces
  • Concurrent runs
  • Artifact counts
  • Runtime capacity/billing dimensions
  • Check OCI service limits for Data Integration in your region/tenancy. (Limits change; do not rely on blog posts.)

Prerequisite services

For this tutorial: – OCI Object Storage – Autonomous Database (ATP/ADW)


9. Pricing / Cost

Do not rely on static numbers in articles. Oracle Cloud pricing varies by region, currency, and sometimes by contract/commitment. Always confirm via official pricing pages.

Current pricing model (how to think about it)

Data Integrator pricing is typically usage-based: you pay for the data integration runtime consumed to execute data flows/pipelines and operationalize workloads. The exact billing metric may be expressed in:
  • OCPU-hours or similar compute-time units for the integration runtime
  • Additional charges for related resources you use (Object Storage, Autonomous Database, networking)

Because Oracle may adjust SKUs and units, verify the exact meter names and units in the official pricing page for “Data Integration”.

Pricing dimensions to check

When estimating cost, confirm these dimensions in official pricing:
  • Runtime compute consumption per hour (or per run)
  • Any per-connector, per-feature, or per-capacity pricing (if applicable)
  • Additional charges for:
  • Object Storage (GB-month, requests)
  • Autonomous Database (ECPU/OCPU, storage) unless Always Free
  • Data transfer (especially internet egress)
  • Logging retention/export (if applicable)

Free tier (if applicable)

  • Autonomous Database Always Free may cover a small target DB for labs.
  • OCI Object Storage has low cost and sometimes free allocations.
  • Whether Data Integrator itself has a free tier depends on current Oracle offerings—verify in official pricing.

Cost drivers (direct)

  • Total runtime hours of Data Integrator jobs (more frequent schedules, longer runs)
  • Larger data volumes (longer run times, more resource usage)
  • Concurrency (multiple pipelines at once)
  • Complex transformations (increases runtime)

Hidden or indirect costs

  • Autonomous Database compute and storage (if not Always Free)
  • Object Storage storage growth + request costs
  • Data egress if moving data out of OCI
  • Operational tooling costs if you export logs to third-party systems

Network/data transfer implications

  • Data transfer within OCI is usually cheaper than internet egress, but pricing depends on path and services.
  • If your sources/targets are outside OCI (on-prem or other clouds), plan for:
  • VPN/FastConnect costs
  • Egress/ingress charges
  • Latency and throughput constraints

How to optimize cost

  • Start with daily or hourly schedules only where required.
  • Minimize unnecessary reprocessing:
  • Load only new partitions/files
  • Use watermarking (if supported) or file naming conventions
  • Prefer ELT (push transformations into the database) for large transforms if it reduces integration runtime (validate performance).
  • Use small dev/test workspaces and smaller sample datasets.
  • Tag resources for chargeback.

Example low-cost starter estimate (no fabricated numbers)

A realistic low-cost starter footprint is:
  • An Object Storage bucket with a few MB of CSV files
  • Autonomous Database Always Free (if available)
  • Data Integrator running a small daily load that completes in minutes

To estimate cost precisely:
  1. Check the official Data Integration pricing line items.
  2. Estimate runs/day × average runtime minutes/run.
  3. Add storage and DB cost (if not free).
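
The runtime part of that estimate is simple arithmetic. A small sketch (the unit price is a deliberate placeholder, since only the official price list has real numbers):

```python
def monthly_runtime_hours(runs_per_day, avg_minutes_per_run, days_per_month=30):
    """Roll up runs/day x average runtime minutes into monthly runtime hours."""
    return runs_per_day * avg_minutes_per_run * days_per_month / 60.0

def monthly_runtime_cost(hours, price_per_hour):
    """price_per_hour is a placeholder; take the real unit price from the official price list."""
    return hours * price_per_hour

# Example: an hourly job averaging 5 minutes per run
hours = monthly_runtime_hours(runs_per_day=24, avg_minutes_per_run=5)
print(hours)  # 60.0 hours of runtime per month
```

Multiply the resulting hours by the current runtime meter price, then add storage and database line items.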

Example production cost considerations

In production, your main cost drivers typically become:
  • Multiple pipelines running frequently (hourly or near-real-time batches)
  • Larger data volumes (GB–TB per day)
  • Higher concurrency and longer run durations
  • Non-free Autonomous Database compute for larger warehouses

Official pricing references

  • Oracle Cloud price list (official): https://www.oracle.com/cloud/price-list/
  • Oracle Cloud cost estimator (official): https://www.oracle.com/cloud/costestimator.html

For Data Integrator-specific pricing lines, navigate the price list to the relevant service section (often listed as Data Integration).


10. Step-by-Step Hands-On Tutorial

This lab builds a real (small) pipeline:

Load a CSV file from OCI Object Storage into an Autonomous Database table using Data Integrator, then validate the rows in the database.

Objective

  • Create a minimal data ingestion workflow in Oracle Cloud Data Integrator
  • Source: OCI Object Storage (customers.csv)
  • Target: Autonomous Database (CUSTOMERS table)
  • Run once manually, validate results, then clean up

Lab Overview

You will:
  1. Create an Autonomous Database (Always Free if available) and a target table.
  2. Create an Object Storage bucket and upload a sample CSV.
  3. Create a Data Integrator workspace.
  4. Define data assets and connections (Object Storage + Autonomous Database).
  5. Build a data flow to map CSV columns to table columns.
  6. Create a task and run it.
  7. Validate row counts in Autonomous Database.
  8. Clean up resources to avoid ongoing cost.


Step 1: Prepare the target Autonomous Database and table

1.1 Create an Autonomous Database (Console)

In OCI Console:
  1. Navigate to Oracle Database → Autonomous Database.
  2. Click Create Autonomous Database.
  3. Choose:
  • Compartment: your lab compartment
  • Workload type: Autonomous Data Warehouse or Autonomous Transaction Processing (either works for this lab)
  • Always Free: enable if available
  4. Set the admin password and create the database.

Expected outcome: You have a running Autonomous Database instance.

1.2 Create a database user and table

Use Database Actions / SQL Worksheet (available from the Autonomous Database details page), or connect via a SQL client.

Run SQL (adjust username/password as needed):

-- Create a least-privileged schema for the lab
CREATE USER di_lab IDENTIFIED BY "Use-A-Strong-Password-Here";

GRANT CREATE SESSION TO di_lab;
GRANT CREATE TABLE TO di_lab;
GRANT CREATE SEQUENCE TO di_lab;
GRANT CREATE PROCEDURE TO di_lab;

-- Optional for easier lab work (consider restricting in real environments)
-- GRANT UNLIMITED TABLESPACE TO di_lab;

ALTER SESSION SET CURRENT_SCHEMA = di_lab;

CREATE TABLE customers (
  customer_id NUMBER PRIMARY KEY,
  full_name   VARCHAR2(200),
  email       VARCHAR2(320),
  country     VARCHAR2(100),
  created_at  DATE
);

Expected outcome: Table DI_LAB.CUSTOMERS exists.

Verification

SELECT table_name FROM user_tables WHERE table_name = 'CUSTOMERS';

Step 2: Create an Object Storage bucket and upload a CSV file

2.1 Create a bucket (Console)

  1. Go to Storage → Buckets → Create Bucket
  2. Choose a name, for example: di-lab-bucket-<unique>
  3. Keep defaults (Standard storage) for the lab.

Expected outcome: Bucket is created.

2.2 Create a sample CSV file

Create a local file named customers.csv with this content:

customer_id,full_name,email,country,created_at
1,Ada Lovelace,ada@example.com,UK,2024-01-15
2,Grace Hopper,grace@example.com,US,2024-02-20
3,Alan Turing,alan@example.com,UK,2024-03-05
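
Before uploading, it can save a failed run to sanity-check the file locally. A standard-library Python sketch that validates the header, numeric IDs, and the yyyy-mm-dd date format used above:

```python
import csv
import io
from datetime import datetime

EXPECTED_HEADER = ["customer_id", "full_name", "email", "country", "created_at"]

def check_customers_csv(text):
    """Validate the header, numeric customer_id, and ISO dates; return the row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = 0
    for row in reader:
        int(row[0])                            # customer_id must be numeric
        datetime.strptime(row[4], "%Y-%m-%d")  # created_at must be yyyy-mm-dd
        rows += 1
    return rows

sample = """customer_id,full_name,email,country,created_at
1,Ada Lovelace,ada@example.com,UK,2024-01-15
2,Grace Hopper,grace@example.com,US,2024-02-20
3,Alan Turing,alan@example.com,UK,2024-03-05
"""
print(check_customers_csv(sample))  # 3
```

The same checks catch the most common load failures later in the lab (bad header, non-numeric IDs, unparseable dates).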

2.3 Upload the CSV

In bucket details, go to Objects → Upload.

Upload customers.csv at the bucket root (or in a folder like landing/—just remember the path).

Expected outcome: customers.csv is visible in bucket objects list.

Verification

Click the object and confirm: – Name and size look correct – Storage tier is Standard


Step 3: Create a Data Integrator workspace

Console navigation may appear as Data Integration in the OCI Console (service naming varies). The underlying managed service is what this tutorial calls Data Integrator.

  1. Navigate to the Data Integration service: search for Data Integration in the OCI Console search bar.
  2. Click Create workspace.
  3. Provide:
  • Name: di-lab-workspace
  • Compartment: your lab compartment

Expected outcome: Workspace status becomes Active.

Verification

Open the workspace and confirm you can access its design environment.


Step 4: Create data assets and connections (Object Storage + Autonomous Database)

You need two endpoints:
  • Source: Object Storage bucket/object
  • Target: Autonomous Database schema

4.1 Create an Object Storage data asset + connection

Inside the workspace (exact UI labels vary):
  1. Go to Data Assets → Create.
  2. Choose Object Storage (or the equivalent connector).
  3. Enter the required fields (typically):
  • Tenancy/namespace (Object Storage namespace)
  • Bucket name
  • Region
  4. Create a Connection for it.

Expected outcome: Data asset and connection show as “Available/Active”.

Notes:
  • The access method varies. Some OCI services support service-to-service authentication patterns; others require credentials or policies. Follow the connector instructions shown in your workspace UI.
  • If the connector requires IAM policies, use the official docs for Data Integration IAM and Object Storage policies.

4.2 Create an Autonomous Database data asset + connection

Inside the workspace:
  1. Go to Data Assets → Create.
  2. Choose Autonomous Database (or the Oracle Database connector appropriate for ADB).
  3. Provide connection properties, typically:
  • Database service details (OCID or connection string, depending on the UI)
  • Username: di_lab
  • Password: the password you set
  • Wallet/TLS settings if required by the connector

Expected outcome: Database connection tests successfully.

Important: Autonomous Database connectivity can require:
  • Wallet configuration (for some connection methods)
  • A network allowlist or "allow OCI services" options (naming varies)
  • Public vs. private endpoint choices
Because these specifics vary by region and ADB settings, follow the connection wizard guidance and verify in official docs.

Verification

Use the connection “Test” feature (if available) to confirm both connections are valid.


Step 5: Build a data flow to load customers.csv into the CUSTOMERS table

Inside the workspace:
  1. Go to Projects → Create Project.
  • Name: di_lab_project
  2. Within the project, create a Data Flow (or mapping/data flow artifact).
  3. Configure the Source:
  • Choose the Object Storage connection
  • Select the file customers.csv
  • Configure the format as CSV
  • Confirm the header row is enabled
  4. Configure schema/columns:
  • customer_id (number)
  • full_name (string)
  • email (string)
  • country (string)
  • created_at (date)

  5. Configure the Target: – Choose the Autonomous Database connection – Schema: DI_LAB – Table: CUSTOMERS
  6. Map fields source → target: – customer_id → customer_id – full_name → full_name – email → email – country → country – created_at → created_at

  7. Choose the write disposition: – For a first run, select Insert (append) or Truncate + load depending on your goal. – For repeatable labs, Truncate + load is simpler if supported.

Expected outcome: Data flow is saved and valid (no validation errors).

Verification

Use a “Validate” action (if available) on the data flow and confirm no missing mappings or type errors are reported.


Step 6: Create and run a task

  1. From the data flow, choose Create Task (or go to Tasks and create one referencing your flow).
  2. Name: load_customers_once
  3. Run the task immediately.

Expected outcome: Task run status becomes Succeeded after a short time. If it fails, use the run logs to troubleshoot.


Validation

Connect to Autonomous Database (SQL Worksheet) as DI_LAB and run:

SELECT COUNT(*) AS row_count FROM customers;

SELECT * FROM customers ORDER BY customer_id;

Expected outcome: – Row count is 3 – The rows match the CSV content – created_at values are parsed as dates (format handling may require adjustment depending on connector settings)

If created_at is null or errors occurred, adjust the CSV date format settings in your source configuration or add a transformation step to parse dates.
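Before adjusting connector settings, it can help to confirm locally that the dates in the file actually match the format you configured. A minimal pre-flight check (the sample data and format string are assumptions mirroring this lab's schema):

```python
import csv
import io
from datetime import datetime

# Sample data mirroring the lab's customers.csv (values are illustrative).
SAMPLE = """customer_id,full_name,email,country,created_at
1,Ada Lovelace,ada@example.com,UK,2024-01-15
2,Grace Hopper,grace@example.com,US,2024-02-20
3,Alan Turing,alan@example.com,UK,2024-03-05
"""

DATE_FORMAT = "%Y-%m-%d"  # must match the format configured in the source settings

def check_dates(csv_text, column="created_at", fmt=DATE_FORMAT):
    """Return the rows whose date column fails to parse with the given format."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            datetime.strptime(row[column], fmt)
        except ValueError:
            bad.append(row)
    return bad

print(check_dates(SAMPLE))  # → [] when every created_at matches the format
```

If this check reports bad rows, either fix the file or change the format string on the source side to match.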


Troubleshooting

Common issues and fixes:

1) Object Storage access denied (403 / permission errors)

  • Cause: Missing Object Storage policies, wrong bucket/namespace, or connection auth misconfigured.
  • Fix:
  • Re-check bucket name and namespace
  • Confirm the Data Integrator connector’s required IAM policies (official docs)
  • Confirm the bucket is in the same region (or that cross-region access is supported)

2) Autonomous Database connection fails

  • Cause: Incorrect username/password, wallet/TLS requirement, network access restrictions.
  • Fix:
  • Test DB login directly via SQL Worksheet using the same credentials
  • Confirm whether the connector requires a wallet
  • Check ADB networking settings (public/private endpoint)
  • Verify whether ADB has an option to allow access from OCI services (wording varies)

3) Date parsing errors for created_at

  • Cause: CSV date format mismatch.
  • Fix:
  • Configure the date format in the CSV source settings if available
  • Or map created_at via a transform (e.g., parse YYYY-MM-DD) if supported
  • As a fallback, load into a VARCHAR2 staging column then transform in SQL

4) Duplicate key error on customer_id

  • Cause: Re-running an “Insert” load without truncation.
  • Fix:
  • Use “Truncate + load” or
  • Delete existing rows before load or
  • Implement upsert/merge pattern (often done as a pipeline step using SQL on the target)
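The upsert idea can be sketched quickly. The example below uses SQLite purely as a stand-in so it is runnable anywhere; on Autonomous Database the same effect is achieved with Oracle's MERGE statement, and the table shape is the lab's hypothetical CUSTOMERS table:

```python
import sqlite3

# SQLite stands in for the target database here; Oracle uses MERGE instead
# of the ON CONFLICT clause, but the re-run-safe behavior is the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")

def upsert_customer(conn, customer_id, email):
    """Insert the row, or update it if customer_id already exists."""
    conn.execute(
        """
        INSERT INTO customers (customer_id, email) VALUES (?, ?)
        ON CONFLICT(customer_id) DO UPDATE SET email = excluded.email
        """,
        (customer_id, email),
    )

upsert_customer(conn, 1, "ada@example.com")
upsert_customer(conn, 1, "ada@new-domain.com")  # re-run: no duplicate key error
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # → 1
```

The second call updates in place instead of failing, which is exactly why re-running the load is safe under this pattern.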

5) Column mapping/type mismatch

  • Cause: Connector inferred types incorrectly.
  • Fix:
  • Explicitly define schema in source settings
  • Cast/convert in a transform step
  • Ensure target columns have compatible types/lengths

Cleanup

To avoid ongoing cost and clutter, delete lab resources you don’t need:

  1. Data Integrator: – Delete the task(s), data flow(s), project, and workspace (if not used elsewhere).
  2. Object Storage: – Delete the object customers.csv – Delete the bucket (must be empty)
  3. Autonomous Database: – If it was created only for this lab, terminate it (Always Free resources can still be terminated safely). – Or keep it if you plan more labs; remove the DI_LAB schema and objects:
-- As ADMIN:
DROP USER di_lab CASCADE;

11. Best Practices

Architecture best practices

  • Use a landing → staging → curated model:
  • Landing: raw files in Object Storage (immutable)
  • Staging: load raw tables in database
  • Curated: transformed, business-ready tables
  • Prefer idempotent pipelines:
  • Re-running a job should not corrupt data
  • Use partitioning, truncation, or merge patterns
  • Keep transformations close to where they run best:
  • Heavy relational transforms often run efficiently in the database (ELT)
  • Simple standardization can be handled in data flows
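The idempotency point above often comes down to a watermark: each run selects only landing files newer than the last one processed, so a re-run with the same watermark cannot double-load data. A minimal sketch (the file-naming convention <dataset>_<YYYYMMDD>.csv is an assumption for illustration):

```python
# Watermark-based file selection: pick only landing files strictly newer
# than the last file already processed. Names sort lexically by date here.

def files_to_process(landing_files, watermark):
    """Return files strictly newer than the watermark, in load order."""
    return sorted(f for f in landing_files if watermark is None or f > watermark)

landing = ["customers_20240101.csv", "customers_20240102.csv", "customers_20240103.csv"]

print(files_to_process(landing, "customers_20240101.csv"))
# → ['customers_20240102.csv', 'customers_20240103.csv']
```

Because the selection is a pure function of the landing contents and the watermark, retrying a failed run yields the same file set every time.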

IAM/security best practices

  • Follow least privilege:
  • Separate “builders” (design) from “operators” (run/monitor).
  • Use separate compartments for dev/test/prod.
  • Restrict who can view/edit connections (credentials exposure risk).
  • Use OCI Vault for secrets if supported by the connector; otherwise tightly control access and rotation processes.

Cost best practices

  • Schedule only as often as needed.
  • Avoid full reloads when incremental loads are possible.
  • Archive old landing files to cheaper storage tiers if appropriate.
  • Monitor runtime duration—optimize the slow steps first.

Performance best practices

  • For file ingestion:
  • Use appropriately sized files (not too many tiny files; not a single huge file) based on connector guidance.
  • For database loads:
  • Load into staging tables then transform with set-based SQL
  • Use indexing carefully; avoid heavy indexes on staging tables during load
  • Test concurrency limits and tune scheduling windows.

Reliability best practices

  • Implement retry strategy:
  • Retries for transient network errors
  • No retries for deterministic schema errors (fix and redeploy)
  • Build alerting around failures:
  • Use OCI events/notifications patterns if supported (verify) or external monitoring integration.
  • Keep raw landing data immutable for replay.
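The retry split described above (retry transient errors, fail fast on deterministic ones) can be sketched in a few lines. The error classes and delays are illustrative, not a Data Integrator API:

```python
import time

class TransientError(Exception):
    """e.g. a network timeout: worth retrying."""

class SchemaError(Exception):
    """e.g. a missing column: retrying will never help."""

def run_with_retries(job, max_attempts=3, base_delay=0.01):
    """Retry transient failures with exponential backoff; let anything else propagate."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
        # SchemaError (and any other deterministic failure) propagates immediately

calls = {"n": 0}
def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return "loaded"

print(run_with_retries(flaky_job))  # → loaded (after two transient failures)
```

A SchemaError escapes on the first attempt, which is the desired fix-and-redeploy behavior.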

Operations best practices

  • Define runbooks:
  • Where to check job failures
  • How to re-run safely
  • How to backfill data
  • Tag everything: env, owner, cost-center, data-domain.
  • Maintain version control for transformation logic:
  • If Data Integrator supports export/import of artifacts, incorporate it into CI/CD (verify current capabilities).

Governance/tagging/naming best practices

  • Naming pattern example:
  • Workspaces: di-<env>-<region>-<team>
  • Projects: <domain>-pipelines
  • Tasks: <source>-to-<target>-<frequency>
  • Tag with:
  • Environment=Dev|Test|Prod
  • DataDomain=Finance|Sales|Ops
  • OwnerEmail=...
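Naming conventions only help if they are enforced. A small validator like the sketch below (the regexes simply encode the example patterns above and are assumptions, not an Oracle requirement) can run in a review script or CI check:

```python
import re

# Encodes the example conventions: di-<env>-<region>-<team> for workspaces
# and <source>-to-<target>-<frequency> for tasks.
WORKSPACE_RE = re.compile(r"^di-(dev|test|prod)-[a-z0-9-]+-[a-z0-9-]+$")
TASK_RE = re.compile(r"^[a-z0-9_]+-to-[a-z0-9_]+-(hourly|daily|weekly)$")

def valid_workspace(name):
    return bool(WORKSPACE_RE.match(name))

def valid_task(name):
    return bool(TASK_RE.match(name))

print(valid_workspace("di-prod-us-ashburn-1-analytics"))  # → True
print(valid_task("objectstorage-to-adw-daily"))           # → True
```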

12. Security Considerations

Identity and access model

  • OCI IAM controls who can:
  • Create/manage workspaces
  • Create/edit connections and data flows
  • Run tasks and view run history
  • Separate permissions for:
  • Platform admins
  • Data engineers
  • Operators/analysts (read-only monitoring)

Because exact policy statements are service-specific, use the official Data Integration IAM policy documentation: – https://docs.oracle.com/en-us/iaas/data-integration/

Encryption

  • At rest:
  • Object Storage encrypts data at rest (Oracle-managed keys by default; customer-managed keys available with OCI Vault in many cases).
  • Autonomous Database encrypts data at rest.
  • In transit:
  • Use TLS connections to databases and HTTPS for Object Storage endpoints.

Network exposure

  • Prefer private connectivity where possible:
  • Private endpoints for Autonomous Database
  • Service Gateway for Object Storage access (keeps traffic off the public internet)
  • For labs, public endpoints are acceptable but restrict:
  • DB network allowlists
  • Bucket access policies

Secrets handling

  • Avoid embedding passwords in scripts.
  • Rotate DB credentials regularly.
  • Use Vault-backed secrets if Data Integrator supports it; otherwise restrict connection edit permissions and audit changes.

Audit/logging

  • OCI Audit can capture control plane actions (who changed what).
  • Use Data Integrator run logs/history for operational traces.
  • If your compliance program requires centralized logging, verify supported export/integration methods.

Compliance considerations

  • Data residency: choose region carefully; workspaces are regional.
  • PII/PHI handling:
  • Mask or tokenize data where required
  • Restrict access to landing and curated zones
  • Maintain data retention and deletion policies

Common security mistakes

  • Granting broad IAM permissions to too many users
  • Allowing public DB access from anywhere
  • Storing sensitive landing files without lifecycle/retention controls
  • Letting many users view/edit connections containing passwords

Secure deployment recommendations

  • Use separate prod workspace and compartment with tight IAM.
  • Use private connectivity for production targets.
  • Implement tagging and budget alerts for spend governance.
  • Establish a credential rotation and incident response process.

13. Limitations and Gotchas

Because service behavior and limits can change, treat this section as a checklist and confirm details in official docs.

Known limitations (typical categories)

  • Connector limitations: Not all sources/targets support the same transformations or pushdown optimizations.
  • File format nuances: CSV parsing rules (quotes, delimiters, date formats) often cause early failures.
  • Concurrency/service limits: Maximum concurrent runs per workspace may apply.
  • Cross-region complexity: Workspaces are regional; cross-region access can add latency and egress costs.
  • Private networking setup: Private endpoints require careful VCN/DNS/routing planning.

Quotas

  • Workspaces per region/compartment
  • Concurrent task runs
  • Maximum artifact counts
    Check OCI service limits for Data Integration in your region.

Regional constraints

  • Data Integrator may not be available in all OCI regions.
  • Some connectors/features may be region-limited.

Pricing surprises

  • Frequent schedules that run longer than expected drive runtime cost.
  • Reprocessing large datasets repeatedly increases runtime.
  • Egress costs if data leaves OCI.

Compatibility issues

  • Autonomous Database connectivity requirements vary by configuration.
  • Object Storage access policies must be correct for the connector’s auth method.
  • Date/time parsing and character encoding can differ between source files and target DB.

Operational gotchas

  • Re-running “Insert” tasks can cause duplicate keys.
  • Schema changes in CSV headers can break mappings.
  • “Success” status may still include rejected rows depending on load mode—validate row counts and error tables if present.
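The last gotcha above is worth automating: after each run, reconcile the source row count against the target count plus any rejected rows, instead of trusting the run status alone. A minimal sketch (function and field names are illustrative):

```python
# "Success" doesn't always mean every row landed: reconcile source row count
# against target rows plus rejects before declaring the load healthy.

def reconcile(source_rows, target_rows, rejected_rows):
    """Return (ok, message) comparing source vs target + rejected rows."""
    accounted = target_rows + rejected_rows
    if accounted == source_rows:
        return True, f"all {source_rows} rows accounted for ({rejected_rows} rejected)"
    return False, f"mismatch: source={source_rows}, target={target_rows}, rejected={rejected_rows}"

ok, msg = reconcile(source_rows=1000, target_rows=997, rejected_rows=3)
print(ok, msg)  # → True all 1000 rows accounted for (3 rejected)
```

A False result should fail the pipeline (or raise an alert) even when the task itself reports Succeeded.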

Migration challenges

  • If migrating from ODI or custom ETL:
  • Some transformation logic may need redesign.
  • Operational semantics (scheduling, retries, error handling) will differ.

Vendor-specific nuances

  • Oracle Cloud’s separation of compartments, regions, and policies is powerful but requires governance discipline.
  • Always confirm how the runtime authenticates to Object Storage and databases for your chosen connector.

14. Comparison with Alternatives

Data Integrator is one option among managed integration and ETL tools. Below is a practical comparison.

Option | Best For | Strengths | Weaknesses | When to Choose
Oracle Cloud Data Integrator (Data Integration service) | OCI-native batch ingestion/orchestration | Managed service; strong fit with Object Storage + Autonomous Database; IAM/compartment governance | Connector/feature coverage varies; may not suit heavy distributed compute; verify API/CI-CD depth | You run analytics on OCI and want managed pipelines
Oracle GoldenGate (OCI) | Real-time CDC replication | Low-latency replication; operational DB change capture | Not a general ETL tool; can be more complex/costly | You need near-real-time replication/CDC
Oracle Integration (OIC) | Application integration, SaaS integration | Strong SaaS adapters and app workflows | Not primarily for large-scale data ingestion/ETL | You integrate business apps and events more than bulk data
Oracle Data Integrator (ODI), self-managed | Enterprises needing full ODI features/control | Mature ETL tooling; deep enterprise patterns | You operate infrastructure; patching/upgrades; licensing complexity | You already standardized on ODI and need advanced features
AWS Glue | ETL in AWS | Serverless Spark; strong AWS integrations | Different cloud; migration overhead; cost model differs | Your data platform is in AWS
Azure Data Factory | ETL/orchestration in Azure | Broad connectors; enterprise orchestration | Different cloud; pricing/ops differences | Your data platform is in Azure
Google Cloud Data Fusion / Dataflow | ETL + pipelines in GCP | Strong pipeline processing | Different cloud; learning curve | Your data platform is in GCP
Apache Airflow (self-managed/managed) | Orchestration-first workflows | Code-first; flexible; huge ecosystem | Requires ops; ETL still needs tools (Spark/dbt) | You want an orchestration framework and already run data tools
dbt (Core/Cloud) | SQL-based transformations in warehouse | Great for ELT; version control friendly | Not an ingestion tool; needs upstream loader | Your data is already in the warehouse and transforms are SQL-first

15. Real-World Example

Enterprise example: governed ingestion into an OCI analytics platform

  • Problem: A large enterprise has multiple upstream systems delivering daily extracts and needs a governed, repeatable ingestion mechanism into ADW for enterprise reporting.
  • Proposed architecture:
  • Upstream exports land in OCI Object Storage (per domain buckets/prefixes).
  • Data Integrator runs domain pipelines:
    • Load landing files into staging schema
    • Apply transformations and publish curated tables
  • Autonomous Data Warehouse stores curated data marts.
  • IAM policies restrict each domain team to their project artifacts.
  • Tagging and budgets provide cost governance.
  • Why Data Integrator was chosen:
  • OCI-native managed execution
  • Strong fit with Object Storage + ADW
  • Built-in scheduling and run history for operations
  • Expected outcomes:
  • Faster onboarding of new datasets
  • Reduced ETL server maintenance
  • Improved auditability and consistent run operations

Startup/small-team example: simple analytics ingestion without managing ETL servers

  • Problem: A startup wants daily analytics from exported app data but doesn’t want to run Airflow/Spark.
  • Proposed architecture:
  • App exports a CSV daily to Object Storage
  • Data Integrator loads it into Autonomous Database (on the Always Free tier where possible in the early stage)
  • BI connects to Autonomous Database for dashboards
  • Why Data Integrator was chosen:
  • Quick to implement with minimal ops
  • Low overhead for scheduling and monitoring
  • Expected outcomes:
  • Reliable daily refresh
  • Simple operational model
  • Easy path to scale by upgrading DB and increasing pipeline complexity later

16. FAQ

1) Is Data Integrator the same as Oracle Data Integrator (ODI)?
No. ODI is a separate product (often self-managed and historically on-prem). In Oracle Cloud, the managed service is commonly documented as OCI Data Integration. This tutorial uses “Data Integrator” to refer to that managed OCI service.

2) Is Data Integrator an ETL or ELT tool?
It can support ETL-style transformations in flows and also ELT-style patterns where transformations run in the target database. The best approach depends on workload and connector behavior.

3) Where do Data Integrator jobs run?
They run on Oracle-managed runtime infrastructure associated with the service. You don’t manage servers directly.

4) Can I use Data Integrator with Autonomous Database?
Yes—this is a common pattern. Connectivity details (wallet, public/private endpoints) depend on configuration; follow the connector wizard and docs.

5) Can it load data from Object Storage?
Yes—Object Storage is a common landing zone. Ensure IAM/policies and bucket access are correctly configured.

6) Does it support incremental loads?
Incremental patterns are typically implemented using watermark columns, partitions, file naming conventions, or merge steps. Exact built-in support depends on connectors and features—verify in docs.

7) How do I schedule pipelines?
You create tasks and attach schedules in the workspace. Scheduling frequency/granularity depends on service capabilities.

8) How do I monitor failures?
Use task run history and logs in the Data Integrator workspace. For enterprise alerting, verify integrations with OCI Monitoring/Notifications or Events.

9) Can I keep traffic private (no public internet)?
Often yes with private endpoints/service gateways and proper VCN design, but exact support depends on connectors and your database configuration. Verify in OCI docs.

10) How is access controlled?
Through OCI IAM policies at tenancy/compartment scope. You can separate design permissions from run/monitor permissions.

11) Does Data Integrator store my database passwords?
Connections commonly store credentials. Prefer OCI Vault integration if supported; otherwise tightly control access to connections and rotate credentials.

12) Can I promote artifacts from dev to prod?
Many teams use export/import or API-based automation where available. Confirm current supported promotion mechanisms in the docs for your region.

13) What’s the best practice for schema changes in incoming files?
Use a staging layer and implement controlled schema evolution: – land raw files immutably – load to staging – update mappings deliberately and deploy
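A cheap first defense is to compare each incoming file's header against the expected schema before loading, so a renamed or added column fails loudly at the landing step rather than breaking mappings mid-run. A minimal sketch using this lab's column list:

```python
# Compare an incoming CSV header line against the expected schema and
# report drift in both directions (columns missing vs. unexpected).
EXPECTED = ["customer_id", "full_name", "email", "country", "created_at"]

def header_drift(header_line, expected=EXPECTED):
    actual = [c.strip() for c in header_line.split(",")]
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in actual if c not in expected],
    }

print(header_drift("customer_id,full_name,email,country,created_at"))
# → {'missing': [], 'unexpected': []}
```

Any non-empty drift result should halt the load and trigger the deliberate mapping update described above.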

14) Is Data Integrator suitable for real-time streaming?
Typically it’s used for batch-oriented integration. For real-time CDC/replication, OCI GoldenGate is often a better fit.

15) How do I estimate cost accurately?
Measure average runtime per job, multiply by schedule frequency, then apply the official Data Integration pricing meter plus dependent services (Object Storage, DB, data transfer). Use Oracle’s cost estimator.
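That arithmetic is easy to keep in a small helper. The hourly rate below is a deliberate placeholder, not an Oracle price; substitute the real meter from the official price list for your region and contract:

```python
# Back-of-envelope runtime cost model. rate_per_hour is a PLACEHOLDER;
# look up the actual Data Integration meter on the official price list.

def monthly_runtime_hours(avg_run_minutes, runs_per_day, days=30):
    return avg_run_minutes / 60 * runs_per_day * days

def estimate_monthly_cost(avg_run_minutes, runs_per_day, rate_per_hour):
    return monthly_runtime_hours(avg_run_minutes, runs_per_day) * rate_per_hour

# Example: a 10-minute job running hourly (24 runs/day) at a hypothetical $0.30/hour.
print(monthly_runtime_hours(10, 24))                 # → 120.0 hours/month
print(round(estimate_monthly_cost(10, 24, 0.30), 2)) # → 36.0
```

Remember to add the dependent-service costs (Object Storage, database, data transfer) on top of this runtime figure.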


17. Top Online Resources to Learn Data Integrator

Resource Type | Name | Why It Is Useful
Official documentation | OCI Data Integration docs: https://docs.oracle.com/en-us/iaas/data-integration/ | Primary source for features, connectors, IAM policies, and how-to guides
Official pricing | Oracle Cloud Price List: https://www.oracle.com/cloud/price-list/ | Official, up-to-date pricing SKUs and units (region/contract dependent)
Official calculator | Oracle Cloud Cost Estimator: https://www.oracle.com/cloud/costestimator.html | Helps estimate total cost across services (DB, storage, data integration runtime)
Official OCI docs (IAM) | OCI IAM overview: https://docs.oracle.com/en-us/iaas/Content/Identity/home.htm | Required for secure policy design and least-privilege access
Official Object Storage docs | Object Storage overview: https://docs.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm | Bucket policies, namespaces, lifecycle rules, and access models
Official Autonomous Database docs | Autonomous Database docs: https://docs.oracle.com/en-us/iaas/autonomous-database/ | Connectivity, wallets, network access, users/schemas for lab and production
Architecture center | Oracle Cloud Architecture Center: https://www.oracle.com/cloud/architecture/ | Reference architectures and best practices for OCI data platforms
Official tutorials | Oracle Cloud Tutorials landing: https://docs.oracle.com/en/learn/ | Hands-on labs across OCI; search within for data integration patterns
Videos/webinars | Oracle Cloud Infrastructure YouTube: https://www.youtube.com/@OracleCloudInfrastructure | Product walkthroughs and architecture sessions (search for Data Integration)
SDK/CLI | OCI CLI installation: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm | Useful for repeatable uploads, automation, and operational scripts

18. Training and Certification Providers

Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL
DevOpsSchool.com | Cloud/DevOps engineers, platform teams | OCI fundamentals, DevOps practices, integration and automation foundations | Check website | https://www.devopsschool.com/
ScmGalaxy.com | Beginners to intermediate engineers | SCM/DevOps toolchains that often support integration delivery | Check website | https://www.scmgalaxy.com/
CloudOpsNow.in | Cloud operations teams | Cloud operations practices, monitoring, governance, runbooks | Check website | https://www.cloudopsnow.in/
SreSchool.com | SREs, operations engineers | Reliability engineering, incident response, operational maturity | Check website | https://www.sreschool.com/
AiOpsSchool.com | Ops + data/AI practitioners | AIOps concepts, operational analytics, monitoring automation | Check website | https://www.aiopsschool.com/

19. Top Trainers

Platform/Site | Likely Specialization | Suitable Audience | Website URL
RajeshKumar.xyz | DevOps/cloud training and mentoring (verify offerings) | Individuals and teams seeking structured guidance | https://rajeshkumar.xyz/
devopstrainer.in | DevOps training platform (verify course catalog) | Engineers building practical DevOps/cloud skills | https://www.devopstrainer.in/
devopsfreelancer.com | Freelance DevOps services/training (verify specifics) | Teams needing short-term advisory or coaching | https://www.devopsfreelancer.com/
devopssupport.in | DevOps support and learning resources (verify services) | Ops/DevOps teams needing troubleshooting help | https://www.devopssupport.in/

20. Top Consulting Companies

Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL
cotocus.com | Cloud/DevOps consulting (verify portfolio) | Cloud adoption, automation, operations | Designing OCI landing zones; setting up CI/CD; governance patterns | https://cotocus.com/
DevOpsSchool.com | DevOps/cloud consulting and training | Platform engineering and enablement | Data platform ops model; pipeline standards; IAM and policy design workshops | https://www.devopsschool.com/
DEVOPSCONSULTING.IN | DevOps consulting services (verify offerings) | DevOps transformation and cloud operations | Build runbooks/monitoring; release automation; security reviews | https://www.devopsconsulting.in/

21. Career and Learning Roadmap

What to learn before Data Integrator

  • OCI fundamentals:
  • Compartments, IAM users/groups/policies
  • Regions, availability domains
  • Networking basics:
  • VCNs, subnets, routing, service gateways (for private Object Storage access)
  • Data basics:
  • Relational modeling, SQL
  • File formats (CSV conventions, encoding, delimiters)
  • Autonomous Database basics:
  • Schemas, tables, constraints
  • Loading patterns and data types

What to learn after Data Integrator

  • Advanced orchestration:
  • Multi-step pipelines, backfills, dependency management
  • Data quality and governance:
  • Data validation frameworks, data catalogs, lineage concepts
  • Security and compliance:
  • Vault/KMS, private endpoints, audit and logging pipelines
  • Real-time data movement:
  • OCI GoldenGate for CDC use cases
  • Analytics:
  • Oracle Analytics Cloud, semantic modeling, performance tuning

Job roles that use it

  • Data Engineer (OCI)
  • Analytics Engineer
  • Cloud Data Platform Engineer
  • Integration Engineer (data-focused)
  • Platform Engineer supporting data pipelines
  • Data Operations / Data Reliability Engineer

Certification path (if available)

Oracle certifications change frequently. If you want a certification path: – Start with OCI foundations certifications (if applicable) – Look for OCI data platform certifications covering Autonomous Database and analytics services
Verify current Oracle certification tracks here: – https://education.oracle.com/

Project ideas for practice

  • Build a landing-to-curated pipeline with:
  • raw landing files in Object Storage
  • staging and curated schemas in Autonomous Database
  • Implement backfill logic:
  • load all files for a date range
  • validate counts and enforce idempotency
  • Add data quality checks:
  • reject invalid emails to a quarantine table
  • generate a load summary table per run
  • Build a cost dashboard:
  • tag resources
  • track job runtimes and estimate monthly consumption
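The data quality project idea above can start as a few lines of Python run before staging: split rows into clean and quarantined sets based on a validity rule. The regex below is a deliberately simple illustration, not a full RFC 5322 email validator:

```python
import re

# Split rows into clean vs. quarantine based on a simple email shape check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def split_rows(rows):
    """Return (clean, quarantine) lists; quarantine rows go to a reject table."""
    clean, quarantine = [], []
    for row in rows:
        (clean if EMAIL_RE.match(row["email"]) else quarantine).append(row)
    return clean, quarantine

rows = [
    {"customer_id": 1, "email": "ada@example.com"},
    {"customer_id": 2, "email": "not-an-email"},
]
clean, quarantine = split_rows(rows)
print(len(clean), len(quarantine))  # → 1 1
```

Loading the quarantine list into its own table gives you the per-run load summary the project idea calls for.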

22. Glossary

  • OCI (Oracle Cloud Infrastructure): Oracle’s public cloud platform offering compute, storage, networking, and managed services.
  • Integration (category): Services and patterns used to connect systems, move data, and orchestrate workflows.
  • Data Integrator: In this tutorial, the OCI-managed Data Integration service used to design and run data ingestion and transformation pipelines.
  • Workspace: An isolated environment within Data Integrator where you create and run integration artifacts.
  • Compartment: OCI governance boundary used to organize resources and apply IAM policies.
  • Data Asset: A logical definition of a source/target system (e.g., Object Storage, database).
  • Connection: The connectivity and credential configuration used to access a data asset.
  • Data Flow: A mapping/transformation workflow that defines how data moves from source to target.
  • Pipeline: An orchestration artifact chaining multiple steps/tasks.
  • Task: An executable unit that runs a data flow or pipeline.
  • Autonomous Database (ATP/ADW): Oracle-managed database service with automated operations and built-in security features.
  • Object Storage: OCI service for storing unstructured data (files/objects) in buckets.
  • IAM Policy: OCI access control rules defining who can do what in which compartment.
  • Service Gateway: OCI networking feature enabling private access to Oracle services like Object Storage from a VCN.
  • Private Endpoint: Private network access to a managed service without using a public IP (availability depends on service/config).
  • ETL/ELT: Extract-Transform-Load / Extract-Load-Transform data integration patterns.
  • CDC (Change Data Capture): Capturing and replicating data changes (often near-real-time), commonly done with tools like GoldenGate.

23. Summary

Data Integrator in Oracle Cloud (commonly documented as OCI Data Integration) is a managed Integration service for designing and running data ingestion and transformation pipelines—especially strong for patterns like Object Storage → Autonomous Database.

It matters because it reduces the operational burden of self-managed ETL tooling, provides scheduling and run history for production operations, and fits naturally into OCI governance via compartments and IAM policies. Cost is primarily driven by runtime consumption plus dependent services (Object Storage, databases, and any data transfer). Security hinges on least-privilege IAM, careful handling of connection credentials, and private networking where appropriate.

Use Data Integrator when you want managed, repeatable batch ingestion and orchestration in OCI. For real-time CDC replication, consider OCI GoldenGate; for application-to-application workflows, consider Oracle Integration.

Next step: read the official OCI Data Integration documentation and then expand this lab into a production-ready pattern with staging/curated schemas, idempotent loads, and monitored schedules: – https://docs.oracle.com/en-us/iaas/data-integration/