Oracle Cloud Data Integrator Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Integration

Category

Integration

1. Introduction

What this service is

Data Integrator in Oracle Cloud is a managed, cloud-native service used to design, run, and operationalize data ingestion and transformation workflows—typically moving data between Oracle and non-Oracle sources, files in Object Storage, and target analytics stores such as Autonomous Database.

One-paragraph simple explanation

If you need to load data from one place to another on a schedule (for example, CSV files in Object Storage into an Autonomous Data Warehouse table), Data Integrator provides a visual, managed way to build that pipeline, run it reliably, and monitor it—without standing up and maintaining your own ETL servers.

One-paragraph technical explanation

Technically, Data Integrator is an OCI-managed data integration runtime with a design-time studio (projects, data assets, connections, data flows/pipelines, tasks, schedules) and an execution engine that runs jobs in Oracle Cloud. It integrates with OCI Identity and Access Management (IAM) for control plane authorization and can connect to data sources/targets via OCI networking (public endpoints and/or private connectivity depending on your configuration). It emits operational telemetry through OCI logging/monitoring capabilities (availability varies by feature; verify in official docs).

What problem it solves

Data Integrator solves several common problems:
  • Building repeatable ETL/ELT pipelines without custom scripts per dataset
  • Reducing operational overhead (patching/maintaining ETL servers)
  • Standardizing ingestion and transformation across teams
  • Scheduling and monitoring data movement jobs in a governed way
  • Integrating with Oracle Cloud data platforms (Object Storage, Autonomous Database, and other OCI services)

Naming note (important): In current Oracle Cloud documentation and Console navigation, the managed service is commonly labeled “Data Integration” (OCI Data Integration). The term “Oracle Data Integrator (ODI)” is also a separate, long-standing product (often on-premises or self-managed). This tutorial uses Data Integrator as the primary name (as requested) and maps it to the OCI-managed Data Integration service. Verify the exact branding in your region/console because Oracle product names can evolve.


2. What is Data Integrator?

Official purpose

Data Integrator’s purpose in Oracle Cloud is to provide a managed data integration service to ingest, transform, and load data across common enterprise sources and targets, with orchestration, scheduling, and monitoring.

Core capabilities

Typical capabilities include:
  • Design-time development of integration logic using a web UI (projects, data flows/pipelines)
  • Connectivity to common sources/targets (Object Storage, Oracle databases, and other supported systems)
  • Data preparation/transformation (mappings, joins, filters, derived columns; exact transforms depend on connector/runtime, so verify in official docs)
  • Orchestration (pipelines/tasks, dependencies, schedules)
  • Operational management (job runs, status, logs/diagnostics)

Major components (conceptual)

While exact names in the UI can vary, the service typically revolves around:
  • Workspace: Top-level environment in a region/compartment where you build and run integrations
  • Projects/Folders: Organize integration artifacts
  • Data Assets: Definitions of external systems (e.g., Object Storage, Autonomous Database)
  • Connections: Credentials and connectivity configuration for a data asset
  • Data Flows / Pipelines: The actual ingestion and transformation logic
  • Tasks / Schedules: Operationalization (run now, run on a schedule, manage dependencies)
  • Application/Runtime: The managed compute/runtimes that execute jobs (capacity/scaling and billing are part of the pricing model)

Service type

  • Type: Managed cloud service (PaaS-style), focused on data integration workloads
  • Operational model: You design in the Oracle Cloud Console (or APIs where available), then the service runs jobs on managed infrastructure.

Scope (regional/global, tenancy/compartment)

  • Tenancy: Resources exist within an OCI tenancy
  • Region: Workspaces are typically regional (you create a workspace in a chosen OCI region)
  • Compartment: Resources are usually created in an OCI compartment for governance and access control
  • Project-scoped artifacts: Projects and integration artifacts live inside the workspace

(Confirm exact resource scoping and supported regions in official docs for your tenancy.)

How it fits into the Oracle Cloud ecosystem

Data Integrator is commonly used alongside:
  • Oracle Cloud Infrastructure (OCI) Object Storage for landing files/data
  • Autonomous Database (ATP/ADW) for analytics and warehousing
  • Oracle Cloud networking (VCN, private endpoints, service gateways) for secure connectivity
  • OCI IAM for access control
  • OCI Logging/Monitoring/Audit for operational governance
  • Optional ecosystem services such as Data Catalog, GoldenGate, Oracle Analytics Cloud, and Oracle Integration, depending on your architecture


3. Why use Data Integrator?

Business reasons

  • Faster time-to-value: Teams can build ingestion pipelines quickly using a managed service.
  • Lower operational burden: No ETL servers to patch/scale manually.
  • Consistency and governance: Standardized patterns for ingestion, transformations, and scheduling.

Technical reasons

  • Managed runtime: Execution is handled by Oracle Cloud; you focus on logic.
  • Native alignment with Oracle data platforms: Particularly strong fit when your targets are Autonomous Database or other Oracle-managed data services.
  • Repeatable workflows: Versioned artifacts, reusable connections, and orchestrated pipelines.

Operational reasons

  • Scheduling: Built-in scheduling and dependency handling (verify exact scheduling options and granularity).
  • Observability: Job run history and diagnostics are available in the service; integration with OCI observability features may apply (verify).
  • Separation of concerns: Workspace/project organization supports multi-team environments.

Security/compliance reasons

  • OCI IAM control plane: Fine-grained policies at compartment level.
  • Network controls: Can be designed for private connectivity patterns within OCI (where supported).
  • Auditability: OCI Audit can capture API actions for governance.

Scalability/performance reasons

  • Elastic managed execution: Suitable for variable workloads and bursty ingestion patterns (exact scaling model depends on service; verify in docs).
  • Parallelization features: May exist for file loads or data movement depending on connector and task configuration.

When teams should choose it

Choose Data Integrator when:
  • You're on Oracle Cloud and need a managed service for data ingestion/orchestration.
  • Your targets include Autonomous Database or you frequently use Object Storage as a landing zone.
  • You need repeatable scheduled pipelines with centralized monitoring and access control.
  • You want to avoid operating an ETL cluster (Airflow/Spark) for moderate-complexity pipelines.

When they should not choose it

Consider alternatives when:
  • You require complex distributed processing (multi-terabyte transformations requiring Spark clusters) and Data Integrator's runtime model doesn't match your needs.
  • You need high-volume, real-time CDC replication, which is often better served by OCI GoldenGate.
  • Your organization has already standardized on another integration platform (e.g., Azure Data Factory, AWS Glue) and multi-cloud friction outweighs the benefits.
  • You need full code-first workflows with deep CI/CD integration and cannot meet that with Data Integrator's current APIs (verify API coverage).


4. Where is Data Integrator used?

Industries

Commonly used in:
  • Finance and insurance (risk reporting, regulatory extracts)
  • Retail and e-commerce (sales, inventory, customer analytics)
  • Healthcare (operational analytics, claims, patient systems; subject to compliance)
  • Telecom (billing analytics, customer churn pipelines)
  • Manufacturing (IoT data landing to analytics stores)
  • Public sector (data consolidation, dashboards, reporting)

Team types

  • Data engineering teams
  • Analytics engineering teams
  • Cloud platform teams supporting data platforms
  • Integration teams consolidating enterprise data
  • App teams that need lightweight ingestion into a warehouse

Workloads

  • Batch ingestion from files (CSV/JSON/Parquet depending on support)
  • Batch ELT/ETL into Oracle analytics targets
  • Scheduled refresh pipelines for BI tools
  • Landing-zone to curated-zone transformations

Architectures

  • Object Storage “data lake landing” → Autonomous Data Warehouse
  • Multi-source ingestion → standardized warehouse model (star/snowflake)
  • Staging schema → curated schema
  • “Extract from operational DB nightly” → reporting DB

Real-world deployment contexts

  • Production: Managed schedules, least-privilege IAM, private networking, tagging, runbooks, alerting
  • Dev/Test: Separate workspaces or separate compartments; smaller schedules; sample datasets

5. Top Use Cases and Scenarios

Below are 10 realistic use cases for Data Integrator in Oracle Cloud.

1) Object Storage CSV to Autonomous Data Warehouse (daily load)

  • Problem: Finance receives daily CSV extracts and needs them loaded into ADW.
  • Why Data Integrator fits: Managed file ingestion, mapping, scheduling, and monitoring.
  • Example: A daily transactions_YYYYMMDD.csv lands in an OCI bucket; Data Integrator loads it to DW.TRANSACTIONS_STAGE then merges into DW.TRANSACTIONS.

2) Multi-file ingestion with schema drift handling (lightweight)

  • Problem: Vendors add columns occasionally; ingestion breaks frequently.
  • Why it fits: Data flow mappings can be updated centrally; some connectors support flexible mappings (verify schema drift capabilities).
  • Example: Vendor adds region_code; you update mapping once and redeploy.

3) Operational DB to reporting DB refresh (nightly batch)

  • Problem: Operational Oracle DB is too busy for BI queries.
  • Why it fits: Scheduled extraction and load into reporting schema.
  • Example: Nightly job extracts orders/customers and loads them into ADW reporting tables.

4) Standardized ingestion framework for multiple departments

  • Problem: Each team writes scripts; no standard monitoring/governance.
  • Why it fits: Central workspace patterns, shared connections, consistent scheduling.
  • Example: Shared “landing-to-staging” templates; each department onboards new datasets quickly.

5) Data quality checkpoints during load (basic validations)

  • Problem: Bad rows cause downstream reporting issues.
  • Why it fits: Transform steps can filter/reject invalid records (capability depends on transformations available; verify).
  • Example: Filter rows where amount < 0, output rejects to a quarantine table.
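
The filter-to-quarantine pattern above can be sketched as a simple partition of rows. This is illustrative Python, not a Data Integrator API; the amount rule is the hypothetical validation from the example.

```python
def split_valid_rejects(rows, is_valid=lambda r: r["amount"] >= 0):
    """Partition rows into (valid, rejects), mirroring a filter step
    that sends invalid records to a quarantine table."""
    valid, rejects = [], []
    for r in rows:
        (valid if is_valid(r) else rejects).append(r)
    return valid, rejects

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -3.5}]
valid, rejects = split_valid_rejects(rows)
print(len(valid), len(rejects))  # 1 1
```

In the managed service the same idea is expressed as a filter/branch in the data flow, with the reject branch writing to a quarantine target.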

6) Orchestrated pipeline: ingest → transform → publish

  • Problem: You need multi-step jobs with dependencies.
  • Why it fits: Pipelines/tasks can enforce ordering and handle failures.
  • Example: Step 1 load staging; step 2 run transform; step 3 refresh aggregate table.

7) Cross-compartment shared data platform (governed)

  • Problem: Platform team owns data services; app teams need controlled access.
  • Why it fits: Compartment-based IAM and policies.
  • Example: Platform compartment hosts Data Integrator; app compartments grant least-privilege access to run specific tasks.

8) Migration from self-managed ETL to managed OCI

  • Problem: Legacy ETL servers are costly and hard to patch.
  • Why it fits: Replace routine batch ETL jobs with managed service.
  • Example: Replace cron + scripts that pull files from SFTP (after landing to OCI) with Data Integrator schedules.

9) Pre-load transformations to standardize reference data

  • Problem: Multiple systems use different code sets.
  • Why it fits: Transform stage can map codes to standardized dimension tables.
  • Example: Map status values (A/ACTIVE/1) into canonical DIM_STATUS.

10) Controlled reprocessing/backfills

  • Problem: Need to re-run loads for a historical date range.
  • Why it fits: Parameterized runs (if supported) and repeatable pipelines.
  • Example: Backfill last 30 days of files after a bug fix, without manual SQL scripting.
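
A backfill is easiest to reason about as a list of dated object names to reprocess. A minimal Python sketch, assuming the hypothetical transactions_YYYYMMDD.csv naming convention from use case 1:

```python
from datetime import date, timedelta

def backfill_object_names(days, end=None):
    """Generate dated file names (newest first) for a backfill window.

    Assumes the hypothetical transactions_YYYYMMDD.csv convention;
    adjust the pattern to your landing-zone layout.
    """
    end = end or date.today()
    return [
        f"transactions_{(end - timedelta(days=n)):%Y%m%d}.csv"
        for n in range(days)
    ]

# Example: the three most recent daily files as of 2024-03-05
print(backfill_object_names(3, date(2024, 3, 5)))
# ['transactions_20240305.csv', 'transactions_20240304.csv', 'transactions_20240303.csv']
```

Each generated name can then drive a parameterized task run (if supported) or a re-run per file.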

6. Core Features

Feature availability can vary by region and by connector type. Always confirm with the official Data Integration documentation for your tenancy.

1) Workspaces (environment boundary)

  • What it does: Provides an isolated environment to manage projects, connections, jobs, and run history.
  • Why it matters: Supports dev/test/prod separation and team organization.
  • Practical benefit: Clear ownership and governance at the workspace level.
  • Caveats: Workspaces are typically regional; cross-region designs require explicit planning.

2) Projects and artifact organization

  • What it does: Organizes data flows/pipelines, connections, and tasks into logical groups.
  • Why it matters: Maintainability for larger estates.
  • Practical benefit: Reusable patterns and consistent naming/tagging.
  • Caveats: Establish conventions early; refactoring later is painful.

3) Data assets (source/target definitions)

  • What it does: Represents a system like Object Storage or a database service.
  • Why it matters: Centralizes system configuration and governance.
  • Practical benefit: Multiple pipelines can reuse the same data asset.
  • Caveats: Connectivity requirements (network, credentials) must be correct for reliable runs.

4) Connections (credentials and connectivity)

  • What it does: Stores connection details used by jobs (endpoints, usernames, passwords/keys).
  • Why it matters: Security and operational consistency.
  • Practical benefit: Update credentials once without rewriting pipelines.
  • Caveats: Secret handling options vary—prefer OCI Vault integration if supported; otherwise tightly control who can view/edit connections.

5) Data flows (mapping and transformations)

  • What it does: Defines how data is read, transformed, and written.
  • Why it matters: This is where the “ETL/ELT logic” lives.
  • Practical benefit: Visual mapping reduces custom code for common transformations.
  • Caveats: Very complex transformations might be better in SQL on the target (ELT) or in a dedicated compute engine; decide based on performance and governance.

6) Pipelines (orchestration)

  • What it does: Chains steps together (ingest, transform, publish), handling dependencies and flow control.
  • Why it matters: Production pipelines usually require multiple steps.
  • Practical benefit: Fewer external schedulers; clearer run lineage.
  • Caveats: Understand failure behavior and retry semantics; verify how retries and partial failures are handled.

7) Tasks and scheduling

  • What it does: Runs a data flow/pipeline on demand or on a schedule.
  • Why it matters: Operationalization is what turns a design into a service.
  • Practical benefit: Predictable refresh cadence for analytics.
  • Caveats: Scheduling granularity, time zone handling, and concurrency limits should be validated in docs.

8) Monitoring and run history

  • What it does: Shows status, run duration, and error details for tasks.
  • Why it matters: Troubleshooting and SLA management.
  • Practical benefit: Faster incident response with centralized run diagnostics.
  • Caveats: For enterprise observability, confirm integration with OCI Logging/Monitoring and export patterns (if required).

9) IAM integration (control plane authorization)

  • What it does: Uses OCI IAM groups/policies to authorize workspace and artifact management.
  • Why it matters: Least privilege and auditability.
  • Practical benefit: Platform teams can delegate safely.
  • Caveats: The exact policy verbs/resource-types must match Data Integrator’s IAM model—use official policy examples.

10) APIs/Automation (where available)

  • What it does: Enables automation via OCI APIs/SDK/CLI (coverage varies).
  • Why it matters: CI/CD and platform operations.
  • Practical benefit: Repeatable provisioning, promotion between environments.
  • Caveats: Verify current API support for the artifacts you need (workspace, tasks, runs, etc.).
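
Where automation APIs exist, a common pattern is to trigger a task run and poll until it reaches a terminal state. The sketch below is service-agnostic Python: get_status stands in for whatever SDK/CLI call your tenancy supports, and the state names are illustrative, not documented Data Integrator values.

```python
import time

TERMINAL = {"SUCCESS", "ERROR", "TERMINATED"}  # illustrative; check the service's actual states

def wait_for_run(get_status, timeout_s=600.0, poll_s=5.0, sleep=time.sleep):
    """Poll a task run until it reaches a terminal state or times out.

    get_status is a placeholder for your SDK/CLI status call.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s

# Usage with a stubbed status source:
states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCESS"])
print(wait_for_run(lambda: next(states), sleep=lambda s: None))  # SUCCESS
```

The injectable sleep function keeps the loop testable; in real automation you would pass the status call from the OCI SDK or CLI once you have verified its coverage.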

7. Architecture and How It Works

High-level architecture

At a high level, Data Integrator has:
  1. A control plane: where you define artifacts (workspaces, connections, flows, tasks).
  2. A runtime plane: a managed execution environment that reads from sources and writes to targets.
  3. Integration points: IAM, networking, Object Storage, databases, logging/monitoring.

Request/data/control flow

  1. User (or automation) creates/updates artifacts in the Data Integrator workspace.
  2. A task is started (manual trigger or schedule).
  3. Runtime retrieves connection details and accesses sources/targets.
  4. Data is extracted, transformed, and loaded.
  5. Runtime emits status and logs; the job is visible in run history.

Integrations with related Oracle Cloud services

Common integrations include:
  • OCI Object Storage: landing zone for files and staging data
  • Autonomous Database: common analytics target
  • OCI IAM: access control to manage and run integration assets
  • OCI Vault (optional): secrets storage (verify connector support)
  • OCI Logging/Monitoring (optional): operational visibility (verify exact integration points)
  • VCN / private networking (optional): private endpoints for databases and private access patterns

Dependency services

Your pipeline usually depends on:
  • Object Storage buckets, objects, and policies
  • Target databases (Autonomous Database or DB systems)
  • A network path between the Data Integrator runtime and the endpoints (public or private)
  • IAM policies for all involved services

Security/authentication model (practical view)

  • Control plane: IAM policies decide who can create/manage/run workspaces and artifacts.
  • Data plane access to sources/targets:
  • Object Storage access can be via OCI IAM + resource principals (service-to-service) in some patterns, or via credentials/config depending on how the connector works (verify).
  • Database access is typically via database credentials and secure connectivity options (TLS; wallet for Autonomous Database patterns).

Networking model (practical view)

Typical patterns:
  • Public endpoints: simplest for labs; ensure you restrict access.
  • Private endpoints: preferred for production; requires VCN planning, DNS, and routing.
  • Service gateway: keeps Object Storage access private within OCI.
  • NAT gateway: for outbound access if needed (avoid it if you can).

Monitoring/logging/governance considerations

  • Use OCI Audit to track who created/modified artifacts.
  • Use task run history for operational checks.
  • Consider exporting logs/metrics into centralized tooling if your org requires it (verify native integration points).
  • Use tagging to separate cost centers, environments, owners.

Simple architecture diagram (Mermaid)

flowchart LR
  U[Engineer / Data Analyst] -->|Design & Run| DI[Data Integrator Workspace]
  OS[(OCI Object Storage Bucket)] -->|Read CSV/Files| DI
  DI -->|Load Tables| ADB[(Autonomous Database)]
  DI --> RH[Run History / Logs]

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph Tenancy["OCI Tenancy"]
    subgraph Net["VCN (Production)"]
      PE1["Private Endpoint / Private Access<br/>(to Autonomous Database)"]
      SG["Service Gateway<br/>(private access to Object Storage)"]
    end

    subgraph DIW["Data Integrator Workspace (Region)"]
      CP["Control Plane:<br/>Projects, Connections, Tasks"]
      RT["Managed Runtime:<br/>Job Execution"]
    end

    subgraph Data["Data Platform"]
      OS[("Object Storage:<br/>Landing + Archive")]
      ADB[("Autonomous Database:<br/>Staging + Curated")]
    end

    IAM["OCI IAM Policies & Groups"]
    AUD["OCI Audit"]
    MON["OCI Monitoring/Logging<br/>(verify integration details)"]
  end

  IAM --> CP
  CP --> RT
  OS --> RT
  RT --> ADB
  AUD --> CP
  RT --> MON
  SG --- OS
  PE1 --- ADB

8. Prerequisites

Tenancy and compartment requirements

  • An active Oracle Cloud (OCI) tenancy
  • A compartment where you can create:
  • Data Integrator workspace
  • Object Storage bucket
  • Autonomous Database (for this lab)

Permissions / IAM roles

You need permissions to:
  • Create/manage Data Integrator workspaces and artifacts
  • Read/write Object Storage objects in a bucket
  • Create/manage Autonomous Database (or at least connect and create tables)

OCI IAM policies for Data Integrator use service-specific resource types and verbs. Because policy syntax can change and differs by feature, use the official IAM policy examples from Oracle docs for Data Integration.
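
For orientation only, compartment-scoped policy statements generally follow the shape below. The dis-workspaces resource type and the group/compartment names are assumptions for illustration; copy the exact resource types and verbs from the official Data Integration policy examples.

```
Allow group DataEngineers to manage dis-workspaces in compartment di-lab
Allow group DataEngineers to read buckets in compartment di-lab
Allow group DataEngineers to manage objects in compartment di-lab
```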

Start here (official docs entry point; navigate to “Policies” / “IAM” sections): – https://docs.oracle.com/en-us/iaas/data-integration/

Billing requirements

  • A paid OCI account or sufficient free-tier capacity.
  • This lab can be designed to be low-cost if you use:
  • Autonomous Database Always Free (if available in your region/tenancy)
  • Small test files (KB/MB scale)
  • You may still incur charges for storage, data egress, or non-free resources.

CLI/SDK/tools needed (optional but useful)

  • OCI Console access (required)
  • Optional:
  • OCI CLI for uploading files and basic checks: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
  • A SQL client (SQL Developer, SQLcl, or Autonomous Database SQL Worksheet in Console)

Region availability

  • Data Integrator (OCI Data Integration) is not necessarily available in every region.
  • Verify region availability in Oracle Cloud documentation or in the Console region selector.

Quotas/limits

  • Service limits exist for:
  • Number of workspaces
  • Concurrent runs
  • Artifact counts
  • Runtime capacity/billing dimensions
  • Check OCI service limits for Data Integration in your region/tenancy. (Limits change; do not rely on blog posts.)

Prerequisite services

For this tutorial: – OCI Object Storage – Autonomous Database (ATP/ADW)


9. Pricing / Cost

Do not rely on static numbers in articles. Oracle Cloud pricing varies by region, currency, and sometimes by contract/commitment. Always confirm via official pricing pages.

Current pricing model (how to think about it)

Data Integrator pricing is typically usage-based: you pay for the data integration runtime consumed to execute data flows/pipelines and operationalize workloads. The exact billing metric may be expressed in:
  • OCPU-hours or similar compute-time units for the integration runtime
  • Additional charges for related resources you use (Object Storage, Autonomous Database, networking)

Because Oracle may adjust SKUs and units, verify the exact meter names and units in the official pricing page for “Data Integration”.

Pricing dimensions to check

When estimating cost, confirm these dimensions in official pricing:
  • Runtime compute consumption per hour (or per run)
  • Any per-connector, per-feature, or per-capacity pricing (if applicable)
  • Additional charges for:
  • Object Storage (GB-month, requests)
  • Autonomous Database (ECPU/OCPU, storage) unless Always Free
  • Data transfer (especially internet egress)
  • Logging retention/export (if applicable)

Free tier (if applicable)

  • Autonomous Database Always Free may cover a small target DB for labs.
  • OCI Object Storage has low cost and sometimes free allocations.
  • Whether Data Integrator itself has a free tier depends on current Oracle offerings—verify in official pricing.

Cost drivers (direct)

  • Total runtime hours of Data Integrator jobs (more frequent schedules, longer runs)
  • Larger data volumes (longer run times, more resource usage)
  • Concurrency (multiple pipelines at once)
  • Complex transformations (increases runtime)

Hidden or indirect costs

  • Autonomous Database compute and storage (if not Always Free)
  • Object Storage storage growth + request costs
  • Data egress if moving data out of OCI
  • Operational tooling costs if you export logs to third-party systems

Network/data transfer implications

  • Data transfer within OCI is usually cheaper than internet egress, but pricing depends on path and services.
  • If your sources/targets are outside OCI (on-prem or other clouds), plan for:
  • VPN/FastConnect costs
  • Egress/ingress charges
  • Latency and throughput constraints

How to optimize cost

  • Start with daily or hourly schedules only where required.
  • Minimize unnecessary reprocessing:
  • Load only new partitions/files
  • Use watermarking (if supported) or file naming conventions
  • Prefer ELT (push transformations into the database) for large transforms if it reduces integration runtime (validate performance).
  • Use small dev/test workspaces and smaller sample datasets.
  • Tag resources for chargeback.

Example low-cost starter estimate (no fabricated numbers)

A realistic low-cost starter footprint is:
  • An Object Storage bucket with a few MB of CSV files
  • Autonomous Database Always Free (if available)
  • Data Integrator running a small daily load that completes in minutes

To estimate cost precisely:
  1. Check the official Data Integration pricing line items.
  2. Estimate runs/day × average runtime minutes/run.
  3. Add storage and DB cost (if not free).
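
The runtime part of that estimate is simple arithmetic. A small sketch (the unit price is a deliberate placeholder, since only the official price list has real numbers):

```python
def monthly_runtime_hours(runs_per_day, avg_minutes_per_run, days_per_month=30):
    """Roll up runs/day x average runtime minutes into monthly runtime hours."""
    return runs_per_day * avg_minutes_per_run * days_per_month / 60.0

def monthly_runtime_cost(hours, price_per_hour):
    """price_per_hour is a placeholder; take the real unit price from the official price list."""
    return hours * price_per_hour

# Example: an hourly job averaging 5 minutes per run
hours = monthly_runtime_hours(runs_per_day=24, avg_minutes_per_run=5)
print(hours)  # 60.0 hours of runtime per month
```

Multiply the resulting hours by the current runtime meter price, then add storage and database line items.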

Example production cost considerations

In production, your main cost drivers typically become:
  • Multiple pipelines running frequently (hourly or near-real-time batches)
  • Larger data volumes (GB–TB per day)
  • Higher concurrency and longer run durations
  • Non-free Autonomous Database compute for larger warehouses

Official pricing references

  • Oracle Cloud price list (official): https://www.oracle.com/cloud/price-list/
  • Oracle Cloud cost estimator (official): https://www.oracle.com/cloud/costestimator.html

For Data Integrator-specific pricing lines, navigate the price list to the relevant service section (often listed as Data Integration).


10. Step-by-Step Hands-On Tutorial

This lab builds a real (small) pipeline:

Load a CSV file from OCI Object Storage into an Autonomous Database table using Data Integrator, then validate the rows in the database.

Objective

  • Create a minimal data ingestion workflow in Oracle Cloud Data Integrator
  • Source: OCI Object Storage (customers.csv)
  • Target: Autonomous Database (CUSTOMERS table)
  • Run once manually, validate results, then clean up

Lab Overview

You will:
  1. Create an Autonomous Database (Always Free if available) and a target table.
  2. Create an Object Storage bucket and upload a sample CSV.
  3. Create a Data Integrator workspace.
  4. Define data assets and connections (Object Storage + Autonomous Database).
  5. Build a data flow to map CSV columns to table columns.
  6. Create a task and run it.
  7. Validate row counts in Autonomous Database.
  8. Clean up resources to avoid ongoing cost.


Step 1: Prepare the target Autonomous Database and table

1.1 Create an Autonomous Database (Console)

In OCI Console:
  1. Navigate to Oracle Database → Autonomous Database.
  2. Click Create Autonomous Database.
  3. Choose:
  • Compartment: your lab compartment
  • Workload type: Autonomous Data Warehouse or Autonomous Transaction Processing (either works for this lab)
  • Always Free: enable if available
  4. Set the admin password and create the database.

Expected outcome: You have a running Autonomous Database instance.

1.2 Create a database user and table

Use Database Actions / SQL Worksheet (available from the Autonomous Database details page), or connect via a SQL client.

Run SQL (adjust username/password as needed):

-- Create a least-privileged schema for the lab
CREATE USER di_lab IDENTIFIED BY "Use-A-Strong-Password-Here";

GRANT CREATE SESSION TO di_lab;
GRANT CREATE TABLE TO di_lab;
GRANT CREATE SEQUENCE TO di_lab;
GRANT CREATE PROCEDURE TO di_lab;

-- Optional for easier lab work (consider restricting in real environments)
-- GRANT UNLIMITED TABLESPACE TO di_lab;

ALTER SESSION SET CURRENT_SCHEMA = di_lab;

CREATE TABLE customers (
  customer_id NUMBER PRIMARY KEY,
  full_name   VARCHAR2(200),
  email       VARCHAR2(320),
  country     VARCHAR2(100),
  created_at  DATE
);

Expected outcome: Table DI_LAB.CUSTOMERS exists.

Verification

SELECT table_name FROM user_tables WHERE table_name = 'CUSTOMERS';

Step 2: Create an Object Storage bucket and upload a CSV file

2.1 Create a bucket (Console)

  1. Go to Storage → Buckets → Create Bucket
  2. Choose a name, for example: di-lab-bucket-<unique>
  3. Keep defaults (Standard storage) for the lab.

Expected outcome: Bucket is created.

2.2 Create a sample CSV file

Create a local file named customers.csv with this content:

customer_id,full_name,email,country,created_at
1,Ada Lovelace,ada@example.com,UK,2024-01-15
2,Grace Hopper,grace@example.com,US,2024-02-20
3,Alan Turing,alan@example.com,UK,2024-03-05
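
Before uploading, it can save a failed run to sanity-check the file locally. A standard-library Python sketch that validates the header, numeric IDs, and the yyyy-mm-dd date format used above:

```python
import csv
import io
from datetime import datetime

EXPECTED_HEADER = ["customer_id", "full_name", "email", "country", "created_at"]

def check_customers_csv(text):
    """Validate the header, numeric customer_id, and ISO dates; return the row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = 0
    for row in reader:
        int(row[0])                            # customer_id must be numeric
        datetime.strptime(row[4], "%Y-%m-%d")  # created_at must be yyyy-mm-dd
        rows += 1
    return rows

sample = """customer_id,full_name,email,country,created_at
1,Ada Lovelace,ada@example.com,UK,2024-01-15
2,Grace Hopper,grace@example.com,US,2024-02-20
3,Alan Turing,alan@example.com,UK,2024-03-05
"""
print(check_customers_csv(sample))  # 3
```

The same checks catch the most common load failures later in the lab (bad header, non-numeric IDs, unparseable dates).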

2.3 Upload the CSV

In bucket details, go to Objects → Upload.

Upload customers.csv at the bucket root (or in a folder like landing/—just remember the path).

Expected outcome: customers.csv is visible in bucket objects list.

Verification

Click the object and confirm: – Name and size look correct – Storage tier is Standard


Step 3: Create a Data Integrator workspace

Console navigation may appear as Data Integration in the OCI Console (service naming varies). The underlying managed service is what this tutorial calls Data Integrator.

  1. Navigate to the Data Integration service: search for Data Integration in the OCI Console search bar.
  2. Click Create workspace.
  3. Provide:
  • Name: di-lab-workspace
  • Compartment: your lab compartment

Expected outcome: Workspace status becomes Active.

Verification

Open the workspace and confirm you can access its design environment.


Step 4: Create data assets and connections (Object Storage + Autonomous Database)

You need two endpoints:
  • Source: Object Storage bucket/object
  • Target: Autonomous Database schema

4.1 Create an Object Storage data asset + connection

Inside the workspace (exact UI labels vary):
  1. Go to Data Assets → Create.
  2. Choose Object Storage (or the equivalent connector).
  3. Enter the required fields (typically):
  • Tenancy/namespace (Object Storage namespace)
  • Bucket name
  • Region
  4. Create a Connection for it.

Expected outcome: Data asset and connection show as “Available/Active”.

Notes:
  • The access method varies. Some OCI services support service-to-service authentication patterns; others require credentials or policies. Follow the connector instructions shown in your workspace UI.
  • If the connector requires IAM policies, use the official docs for Data Integration IAM and Object Storage policies.

4.2 Create an Autonomous Database data asset + connection

Inside the workspace:
  1. Go to Data Assets → Create.
  2. Choose Autonomous Database (or the Oracle Database connector appropriate for ADB).
  3. Provide connection properties, typically:
  • Database service details (OCID or connection string, depending on the UI)
  • Username: di_lab
  • Password: the password you set
  • Wallet/TLS settings if required by the connector

Expected outcome: Database connection tests successfully.

Important: Autonomous Database connectivity can require:
  • Wallet configuration (for some connection methods)
  • A network allowlist or "allow OCI services" options (naming varies)
  • Public vs. private endpoint choices
Because these specifics vary by region and ADB settings, follow the connection wizard guidance and verify in official docs.

Verification

Use the connection “Test” feature (if available) to confirm both connections are valid.


Step 5: Build a data flow to load customers.csv into the CUSTOMERS table

Inside the workspace:
  1. Go to Projects → Create Project.
  • Name: di_lab_project
  2. Within the project, create a Data Flow (or mapping/data flow artifact).
  3. Configure the Source:
  • Choose the Object Storage connection
  • Select the file customers.csv
  • Configure the format as CSV
  • Confirm the header row is enabled
  4. Configure schema/columns:
  • customer_id (number)
  • full_name (string)
  • email (string)
  • country (string)
  • created_at (date)

  5. Configure the Target: – Choose the Autonomous Database connection – Schema: DI_LAB – Table: CUSTOMERS
  6. Map fields source → target: – customer_id → customer_id – full_name → full_name – email → email – country → country – created_at → created_at

  7. Choose the write disposition: – For a first run, select Insert (append) or Truncate + load depending on your goal. – For repeatable labs, Truncate + load is simpler if supported.

Expected outcome: Data flow is saved and valid (no validation errors).

Verification

Use a “Validate” action (if available) on the data flow and confirm no missing mappings or type errors are reported.


Step 6: Create and run a task

  1. From the data flow, choose Create Task (or go to Tasks and create one referencing your flow).
  2. Name: load_customers_once
  3. Run the task immediately.

Expected outcome: Task run status becomes Succeeded after a short time. If it fails, use the run logs to troubleshoot.


Validation

Connect to Autonomous Database (SQL Worksheet) as DI_LAB and run:

SELECT COUNT(*) AS row_count FROM customers;

SELECT * FROM customers ORDER BY customer_id;

Expected outcome: – Row count is 3 – The rows match the CSV content – created_at values are parsed as dates (format handling may require adjustment depending on connector settings)

If created_at is null or errors occurred, adjust the CSV date format settings in your source configuration or add a transformation step to parse dates.
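Before adjusting connector settings, it can help to confirm locally that the dates in the file actually match the format you configured. A minimal pre-flight check (the sample data and format string are assumptions mirroring this lab's schema):

```python
import csv
import io
from datetime import datetime

# Sample data mirroring the lab's customers.csv (values are illustrative).
SAMPLE = """customer_id,full_name,email,country,created_at
1,Ada Lovelace,ada@example.com,UK,2024-01-15
2,Grace Hopper,grace@example.com,US,2024-02-20
3,Alan Turing,alan@example.com,UK,2024-03-05
"""

DATE_FORMAT = "%Y-%m-%d"  # must match the format configured in the source settings

def check_dates(csv_text, column="created_at", fmt=DATE_FORMAT):
    """Return the rows whose date column fails to parse with the given format."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            datetime.strptime(row[column], fmt)
        except ValueError:
            bad.append(row)
    return bad

print(check_dates(SAMPLE))  # → [] when every created_at matches the format
```

If this check reports bad rows, either fix the file or change the format string on the source side to match.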


Troubleshooting

Common issues and fixes:

1) Object Storage access denied (403 / permission errors)

  • Cause: Missing Object Storage policies, wrong bucket/namespace, or connection auth misconfigured.
  • Fix:
  • Re-check bucket name and namespace
  • Confirm the Data Integrator connector’s required IAM policies (official docs)
  • Confirm the bucket is in the same region (or that cross-region access is supported)

2) Autonomous Database connection fails

  • Cause: Incorrect username/password, wallet/TLS requirement, network access restrictions.
  • Fix:
  • Test DB login directly via SQL Worksheet using the same credentials
  • Confirm whether the connector requires a wallet
  • Check ADB networking settings (public/private endpoint)
  • Verify whether ADB has an option to allow access from OCI services (wording varies)

3) Date parsing errors for created_at

  • Cause: CSV date format mismatch.
  • Fix:
  • Configure the date format in the CSV source settings if available
  • Or map created_at via a transform (e.g., parse YYYY-MM-DD) if supported
  • As a fallback, load into a VARCHAR2 staging column then transform in SQL

4) Duplicate key error on customer_id

  • Cause: Re-running an “Insert” load without truncation.
  • Fix:
  • Use “Truncate + load” or
  • Delete existing rows before load or
  • Implement upsert/merge pattern (often done as a pipeline step using SQL on the target)
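The upsert idea can be sketched quickly. The example below uses SQLite purely as a stand-in so it is runnable anywhere; on Autonomous Database the same effect is achieved with Oracle's MERGE statement, and the table shape is the lab's hypothetical CUSTOMERS table:

```python
import sqlite3

# SQLite stands in for the target database here; Oracle uses MERGE instead
# of the ON CONFLICT clause, but the re-run-safe behavior is the same idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")

def upsert_customer(conn, customer_id, email):
    """Insert the row, or update it if customer_id already exists."""
    conn.execute(
        """
        INSERT INTO customers (customer_id, email) VALUES (?, ?)
        ON CONFLICT(customer_id) DO UPDATE SET email = excluded.email
        """,
        (customer_id, email),
    )

upsert_customer(conn, 1, "ada@example.com")
upsert_customer(conn, 1, "ada@new-domain.com")  # re-run: no duplicate key error
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # → 1
```

The second call updates in place instead of failing, which is exactly why re-running the load is safe under this pattern.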

5) Column mapping/type mismatch

  • Cause: Connector inferred types incorrectly.
  • Fix:
  • Explicitly define schema in source settings
  • Cast/convert in a transform step
  • Ensure target columns have compatible types/lengths

Cleanup

To avoid ongoing cost and clutter, delete lab resources you don’t need:

  1. Data Integrator: – Delete the task(s), data flow(s), project, and workspace (if not used elsewhere).
  2. Object Storage: – Delete the object customers.csv – Delete the bucket (must be empty)
  3. Autonomous Database: – If it was created only for this lab, terminate it (Always Free resources can still be terminated safely). – Or keep it if you plan more labs; remove the DI_LAB schema and objects:
-- As ADMIN:
DROP USER di_lab CASCADE;

11. Best Practices

Architecture best practices

  • Use a landing → staging → curated model:
  • Landing: raw files in Object Storage (immutable)
  • Staging: load raw tables in database
  • Curated: transformed, business-ready tables
  • Prefer idempotent pipelines:
  • Re-running a job should not corrupt data
  • Use partitioning, truncation, or merge patterns
  • Keep transformations close to where they run best:
  • Heavy relational transforms often run efficiently in the database (ELT)
  • Simple standardization can be handled in data flows
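The idempotency point above often comes down to a watermark: each run selects only landing files newer than the last one processed, so a re-run with the same watermark cannot double-load data. A minimal sketch (the file-naming convention <dataset>_<YYYYMMDD>.csv is an assumption for illustration):

```python
# Watermark-based file selection: pick only landing files strictly newer
# than the last file already processed. Names sort lexically by date here.

def files_to_process(landing_files, watermark):
    """Return files strictly newer than the watermark, in load order."""
    return sorted(f for f in landing_files if watermark is None or f > watermark)

landing = ["customers_20240101.csv", "customers_20240102.csv", "customers_20240103.csv"]

print(files_to_process(landing, "customers_20240101.csv"))
# → ['customers_20240102.csv', 'customers_20240103.csv']
```

Because the selection is a pure function of the landing contents and the watermark, retrying a failed run yields the same file set every time.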

IAM/security best practices

  • Follow least privilege:
  • Separate “builders” (design) from “operators” (run/monitor).
  • Use separate compartments for dev/test/prod.
  • Restrict who can view/edit connections (credentials exposure risk).
  • Use OCI Vault for secrets if supported by the connector; otherwise tightly control access and rotation processes.

Cost best practices

  • Schedule only as often as needed.
  • Avoid full reloads when incremental loads are possible.
  • Archive old landing files to cheaper storage tiers if appropriate.
  • Monitor runtime duration—optimize the slow steps first.

Performance best practices

  • For file ingestion:
  • Use appropriately sized files (not too many tiny files; not a single huge file) based on connector guidance.
  • For database loads:
  • Load into staging tables then transform with set-based SQL
  • Use indexing carefully; avoid heavy indexes on staging tables during load
  • Test concurrency limits and tune scheduling windows.

Reliability best practices

  • Implement retry strategy:
  • Retries for transient network errors
  • No retries for deterministic schema errors (fix and redeploy)
  • Build alerting around failures:
  • Use OCI events/notifications patterns if supported (verify) or external monitoring integration.
  • Keep raw landing data immutable for replay.
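The retry split described above (retry transient errors, fail fast on deterministic ones) can be sketched in a few lines. The error classes and delays are illustrative, not a Data Integrator API:

```python
import time

class TransientError(Exception):
    """e.g. a network timeout: worth retrying."""

class SchemaError(Exception):
    """e.g. a missing column: retrying will never help."""

def run_with_retries(job, max_attempts=3, base_delay=0.01):
    """Retry transient failures with exponential backoff; let anything else propagate."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
        # SchemaError (and any other deterministic failure) propagates immediately

calls = {"n": 0}
def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timeout")
    return "loaded"

print(run_with_retries(flaky_job))  # → loaded (after two transient failures)
```

A SchemaError escapes on the first attempt, which is the desired fix-and-redeploy behavior.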

Operations best practices

  • Define runbooks:
  • Where to check job failures
  • How to re-run safely
  • How to backfill data
  • Tag everything: env, owner, cost-center, data-domain.
  • Maintain version control for transformation logic:
  • If Data Integrator supports export/import of artifacts, incorporate it into CI/CD (verify current capabilities).

Governance/tagging/naming best practices

  • Naming pattern example:
  • Workspaces: di-<env>-<region>-<team>
  • Projects: <domain>-pipelines
  • Tasks: <source>-to-<target>-<frequency>
  • Tag with:
  • Environment=Dev|Test|Prod
  • DataDomain=Finance|Sales|Ops
  • OwnerEmail=...
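Naming conventions only help if they are enforced. A small validator like the sketch below (the regexes simply encode the example patterns above and are assumptions, not an Oracle requirement) can run in a review script or CI check:

```python
import re

# Encodes the example conventions: di-<env>-<region>-<team> for workspaces
# and <source>-to-<target>-<frequency> for tasks.
WORKSPACE_RE = re.compile(r"^di-(dev|test|prod)-[a-z0-9-]+-[a-z0-9-]+$")
TASK_RE = re.compile(r"^[a-z0-9_]+-to-[a-z0-9_]+-(hourly|daily|weekly)$")

def valid_workspace(name):
    return bool(WORKSPACE_RE.match(name))

def valid_task(name):
    return bool(TASK_RE.match(name))

print(valid_workspace("di-prod-us-ashburn-1-analytics"))  # → True
print(valid_task("objectstorage-to-adw-daily"))           # → True
```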

12. Security Considerations

Identity and access model

  • OCI IAM controls who can:
  • Create/manage workspaces
  • Create/edit connections and data flows
  • Run tasks and view run history
  • Separate permissions for:
  • Platform admins
  • Data engineers
  • Operators/analysts (read-only monitoring)

Because exact policy statements are service-specific, use the official Data Integration IAM policy documentation: – https://docs.oracle.com/en-us/iaas/data-integration/

Encryption

  • At rest:
  • Object Storage encrypts data at rest (Oracle-managed keys by default; customer-managed keys available with OCI Vault in many cases).
  • Autonomous Database encrypts data at rest.
  • In transit:
  • Use TLS connections to databases and HTTPS for Object Storage endpoints.

Network exposure

  • Prefer private connectivity where possible:
  • Private endpoints for Autonomous Database
  • Service Gateway for Object Storage access (keeps traffic off the public internet)
  • For labs, public endpoints are acceptable but restrict:
  • DB network allowlists
  • Bucket access policies

Secrets handling

  • Avoid embedding passwords in scripts.
  • Rotate DB credentials regularly.
  • Use Vault-backed secrets if Data Integrator supports it; otherwise restrict connection edit permissions and audit changes.

Audit/logging

  • OCI Audit can capture control plane actions (who changed what).
  • Use Data Integrator run logs/history for operational traces.
  • If your compliance program requires centralized logging, verify supported export/integration methods.

Compliance considerations

  • Data residency: choose region carefully; workspaces are regional.
  • PII/PHI handling:
  • Mask or tokenize data where required
  • Restrict access to landing and curated zones
  • Maintain data retention and deletion policies

Common security mistakes

  • Granting broad IAM permissions to too many users
  • Allowing public DB access from anywhere
  • Storing sensitive landing files without lifecycle/retention controls
  • Letting many users view/edit connections containing passwords

Secure deployment recommendations

  • Use separate prod workspace and compartment with tight IAM.
  • Use private connectivity for production targets.
  • Implement tagging and budget alerts for spend governance.
  • Establish a credential rotation and incident response process.

13. Limitations and Gotchas

Because service behavior and limits can change, treat this section as a checklist and confirm details in official docs.

Known limitations (typical categories)

  • Connector limitations: Not all sources/targets support the same transformations or pushdown optimizations.
  • File format nuances: CSV parsing rules (quotes, delimiters, date formats) often cause early failures.
  • Concurrency/service limits: Maximum concurrent runs per workspace may apply.
  • Cross-region complexity: Workspaces are regional; cross-region access can add latency and egress costs.
  • Private networking setup: Private endpoints require careful VCN/DNS/routing planning.

Quotas

  • Workspaces per region/compartment
  • Concurrent task runs
  • Maximum artifact counts
    Check OCI service limits for Data Integration in your region.

Regional constraints

  • Data Integrator may not be available in all OCI regions.
  • Some connectors/features may be region-limited.

Pricing surprises

  • Frequent schedules that run longer than expected drive runtime cost.
  • Reprocessing large datasets repeatedly increases runtime.
  • Egress costs if data leaves OCI.

Compatibility issues

  • Autonomous Database connectivity requirements vary by configuration.
  • Object Storage access policies must be correct for the connector’s auth method.
  • Date/time parsing and character encoding can differ between source files and target DB.

Operational gotchas

  • Re-running “Insert” tasks can cause duplicate keys.
  • Schema changes in CSV headers can break mappings.
  • “Success” status may still include rejected rows depending on load mode—validate row counts and error tables if present.
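The last gotcha above is worth automating: after each run, reconcile the source row count against the target count plus any rejected rows, instead of trusting the run status alone. A minimal sketch (function and field names are illustrative):

```python
# "Success" doesn't always mean every row landed: reconcile source row count
# against target rows plus rejects before declaring the load healthy.

def reconcile(source_rows, target_rows, rejected_rows):
    """Return (ok, message) comparing source vs target + rejected rows."""
    accounted = target_rows + rejected_rows
    if accounted == source_rows:
        return True, f"all {source_rows} rows accounted for ({rejected_rows} rejected)"
    return False, f"mismatch: source={source_rows}, target={target_rows}, rejected={rejected_rows}"

ok, msg = reconcile(source_rows=1000, target_rows=997, rejected_rows=3)
print(ok, msg)  # → True all 1000 rows accounted for (3 rejected)
```

A False result should fail the pipeline (or raise an alert) even when the task itself reports Succeeded.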

Migration challenges

  • If migrating from ODI or custom ETL:
  • Some transformation logic may need redesign.
  • Operational semantics (scheduling, retries, error handling) will differ.

Vendor-specific nuances

  • Oracle Cloud’s separation of compartments, regions, and policies is powerful but requires governance discipline.
  • Always confirm how the runtime authenticates to Object Storage and databases for your chosen connector.

14. Comparison with Alternatives

Data Integrator is one option among managed integration and ETL tools. Below is a practical comparison.

Option | Best For | Strengths | Weaknesses | When to Choose
Oracle Cloud Data Integrator (Data Integration service) | OCI-native batch ingestion/orchestration | Managed service; strong fit with Object Storage + Autonomous Database; IAM/compartment governance | Connector/feature coverage varies; may not suit heavy distributed compute; verify API/CI-CD depth | You run analytics on OCI and want managed pipelines
Oracle GoldenGate (OCI) | Real-time CDC replication | Low-latency replication; operational DB change capture | Not a general ETL tool; can be more complex/costly | You need near-real-time replication/CDC
Oracle Integration (OIC) | Application integration, SaaS integration | Strong SaaS adapters and app workflows | Not primarily for large-scale data ingestion/ETL | You integrate business apps and events more than bulk data
Oracle Data Integrator (ODI), self-managed | Enterprises needing full ODI features/control | Mature ETL tooling; deep enterprise patterns | You operate infrastructure; patching/upgrades; licensing complexity | You already standardized on ODI and need advanced features
AWS Glue | ETL in AWS | Serverless Spark; strong AWS integrations | Different cloud; migration overhead; cost model differs | Your data platform is in AWS
Azure Data Factory | ETL/orchestration in Azure | Broad connectors; enterprise orchestration | Different cloud; pricing/ops differences | Your data platform is in Azure
Google Cloud Data Fusion / Dataflow | ETL + pipelines in GCP | Strong pipeline processing | Different cloud; learning curve | Your data platform is in GCP
Apache Airflow (self-managed/managed) | Orchestration-first workflows | Code-first; flexible; huge ecosystem | Requires ops; ETL still needs tools (Spark/dbt) | You want an orchestration framework and already run data tools
dbt (Core/Cloud) | SQL-based transformations in warehouse | Great for ELT; version control friendly | Not an ingestion tool; needs upstream loader | Your data is already in the warehouse and transforms are SQL-first

15. Real-World Example

Enterprise example: governed ingestion into an OCI analytics platform

  • Problem: A large enterprise has multiple upstream systems delivering daily extracts and needs a governed, repeatable ingestion mechanism into ADW for enterprise reporting.
  • Proposed architecture:
  • Upstream exports land in OCI Object Storage (per domain buckets/prefixes).
  • Data Integrator runs domain pipelines:
    • Load landing files into staging schema
    • Apply transformations and publish curated tables
  • Autonomous Data Warehouse stores curated data marts.
  • IAM policies restrict each domain team to their project artifacts.
  • Tagging and budgets provide cost governance.
  • Why Data Integrator was chosen:
  • OCI-native managed execution
  • Strong fit with Object Storage + ADW
  • Built-in scheduling and run history for operations
  • Expected outcomes:
  • Faster onboarding of new datasets
  • Reduced ETL server maintenance
  • Improved auditability and consistent run operations

Startup/small-team example: simple analytics ingestion without managing ETL servers

  • Problem: A startup wants daily analytics from exported app data but doesn’t want to run Airflow/Spark.
  • Proposed architecture:
  • App exports a CSV daily to Object Storage
  • Data Integrator loads it into Autonomous Database (on the Always Free tier where possible in the early stage)
  • BI connects to Autonomous Database for dashboards
  • Why Data Integrator was chosen:
  • Quick to implement with minimal ops
  • Low overhead for scheduling and monitoring
  • Expected outcomes:
  • Reliable daily refresh
  • Simple operational model
  • Easy path to scale by upgrading DB and increasing pipeline complexity later

16. FAQ

1) Is Data Integrator the same as Oracle Data Integrator (ODI)?
No. ODI is a separate product (often self-managed and historically on-prem). In Oracle Cloud, the managed service is commonly documented as OCI Data Integration. This tutorial uses “Data Integrator” to refer to that managed OCI service.

2) Is Data Integrator an ETL or ELT tool?
It can support ETL-style transformations in flows and also ELT-style patterns where transformations run in the target database. The best approach depends on workload and connector behavior.

3) Where do Data Integrator jobs run?
They run on Oracle-managed runtime infrastructure associated with the service. You don’t manage servers directly.

4) Can I use Data Integrator with Autonomous Database?
Yes—this is a common pattern. Connectivity details (wallet, public/private endpoints) depend on configuration; follow the connector wizard and docs.

5) Can it load data from Object Storage?
Yes—Object Storage is a common landing zone. Ensure IAM/policies and bucket access are correctly configured.

6) Does it support incremental loads?
Incremental patterns are typically implemented using watermark columns, partitions, file naming conventions, or merge steps. Exact built-in support depends on connectors and features—verify in docs.

7) How do I schedule pipelines?
You create tasks and attach schedules in the workspace. Scheduling frequency/granularity depends on service capabilities.

8) How do I monitor failures?
Use task run history and logs in the Data Integrator workspace. For enterprise alerting, verify integrations with OCI Monitoring/Notifications or Events.

9) Can I keep traffic private (no public internet)?
Often yes with private endpoints/service gateways and proper VCN design, but exact support depends on connectors and your database configuration. Verify in OCI docs.

10) How is access controlled?
Through OCI IAM policies at tenancy/compartment scope. You can separate design permissions from run/monitor permissions.

11) Does Data Integrator store my database passwords?
Connections commonly store credentials. Prefer OCI Vault integration if supported; otherwise tightly control access to connections and rotate credentials.

12) Can I promote artifacts from dev to prod?
Many teams use export/import or API-based automation where available. Confirm current supported promotion mechanisms in the docs for your region.

13) What’s the best practice for schema changes in incoming files?
Use a staging layer and implement controlled schema evolution: – land raw files immutably – load to staging – update mappings deliberately and deploy
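A cheap first defense is to compare each incoming file's header against the expected schema before loading, so a renamed or added column fails loudly at the landing step rather than breaking mappings mid-run. A minimal sketch using this lab's column list:

```python
# Compare an incoming CSV header line against the expected schema and
# report drift in both directions (columns missing vs. unexpected).
EXPECTED = ["customer_id", "full_name", "email", "country", "created_at"]

def header_drift(header_line, expected=EXPECTED):
    actual = [c.strip() for c in header_line.split(",")]
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in actual if c not in expected],
    }

print(header_drift("customer_id,full_name,email,country,created_at"))
# → {'missing': [], 'unexpected': []}
```

Any non-empty drift result should halt the load and trigger the deliberate mapping update described above.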

14) Is Data Integrator suitable for real-time streaming?
Typically it’s used for batch-oriented integration. For real-time CDC/replication, OCI GoldenGate is often a better fit.

15) How do I estimate cost accurately?
Measure average runtime per job, multiply by schedule frequency, then apply the official Data Integration pricing meter plus dependent services (Object Storage, DB, data transfer). Use Oracle’s cost estimator.
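That arithmetic is easy to keep in a small helper. The hourly rate below is a deliberate placeholder, not an Oracle price; substitute the real meter from the official price list for your region and contract:

```python
# Back-of-envelope runtime cost model. rate_per_hour is a PLACEHOLDER;
# look up the actual Data Integration meter on the official price list.

def monthly_runtime_hours(avg_run_minutes, runs_per_day, days=30):
    return avg_run_minutes / 60 * runs_per_day * days

def estimate_monthly_cost(avg_run_minutes, runs_per_day, rate_per_hour):
    return monthly_runtime_hours(avg_run_minutes, runs_per_day) * rate_per_hour

# Example: a 10-minute job running hourly (24 runs/day) at a hypothetical $0.30/hour.
print(monthly_runtime_hours(10, 24))                 # → 120.0 hours/month
print(round(estimate_monthly_cost(10, 24, 0.30), 2)) # → 36.0
```

Remember to add the dependent-service costs (Object Storage, database, data transfer) on top of this runtime figure.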


17. Top Online Resources to Learn Data Integrator

Resource Type | Name | Why It Is Useful
Official documentation | OCI Data Integration docs: https://docs.oracle.com/en-us/iaas/data-integration/ | Primary source for features, connectors, IAM policies, and how-to guides
Official pricing | Oracle Cloud Price List: https://www.oracle.com/cloud/price-list/ | Official, up-to-date pricing SKUs and units (region/contract dependent)
Official calculator | Oracle Cloud Cost Estimator: https://www.oracle.com/cloud/costestimator.html | Helps estimate total cost across services (DB, storage, data integration runtime)
Official OCI docs (IAM) | OCI IAM overview: https://docs.oracle.com/en-us/iaas/Content/Identity/home.htm | Required for secure policy design and least-privilege access
Official Object Storage docs | Object Storage overview: https://docs.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm | Bucket policies, namespaces, lifecycle rules, and access models
Official Autonomous Database docs | Autonomous Database docs: https://docs.oracle.com/en-us/iaas/autonomous-database/ | Connectivity, wallets, network access, users/schemas for lab and production
Architecture center | Oracle Cloud Architecture Center: https://www.oracle.com/cloud/architecture/ | Reference architectures and best practices for OCI data platforms
Official tutorials | Oracle Cloud Tutorials landing: https://docs.oracle.com/en/learn/ | Hands-on labs across OCI; search within for data integration patterns
Videos/webinars | Oracle Cloud Infrastructure YouTube: https://www.youtube.com/@OracleCloudInfrastructure | Product walkthroughs and architecture sessions (search for Data Integration)
SDK/CLI | OCI CLI installation: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm | Useful for repeatable uploads, automation, and operational scripts

18. Training and Certification Providers

Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL
DevOpsSchool.com | Cloud/DevOps engineers, platform teams | OCI fundamentals, DevOps practices, integration and automation foundations | Check website | https://www.devopsschool.com/
ScmGalaxy.com | Beginners to intermediate engineers | SCM/DevOps toolchains that often support integration delivery | Check website | https://www.scmgalaxy.com/
CloudOpsNow.in | Cloud operations teams | Cloud operations practices, monitoring, governance, runbooks | Check website | https://www.cloudopsnow.in/
SreSchool.com | SREs, operations engineers | Reliability engineering, incident response, operational maturity | Check website | https://www.sreschool.com/
AiOpsSchool.com | Ops + data/AI practitioners | AIOps concepts, operational analytics, monitoring automation | Check website | https://www.aiopsschool.com/

19. Top Trainers

Platform/Site | Likely Specialization | Suitable Audience | Website URL
RajeshKumar.xyz | DevOps/cloud training and mentoring (verify offerings) | Individuals and teams seeking structured guidance | https://rajeshkumar.xyz/
devopstrainer.in | DevOps training platform (verify course catalog) | Engineers building practical DevOps/cloud skills | https://www.devopstrainer.in/
devopsfreelancer.com | Freelance DevOps services/training (verify specifics) | Teams needing short-term advisory or coaching | https://www.devopsfreelancer.com/
devopssupport.in | DevOps support and learning resources (verify services) | Ops/DevOps teams needing troubleshooting help | https://www.devopssupport.in/

20. Top Consulting Companies

Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL
cotocus.com | Cloud/DevOps consulting (verify portfolio) | Cloud adoption, automation, operations | Designing OCI landing zones; setting up CI/CD; governance patterns | https://cotocus.com/
DevOpsSchool.com | DevOps/cloud consulting and training | Platform engineering and enablement | Data platform ops model; pipeline standards; IAM and policy design workshops | https://www.devopsschool.com/
DEVOPSCONSULTING.IN | DevOps consulting services (verify offerings) | DevOps transformation and cloud operations | Build runbooks/monitoring; release automation; security reviews | https://www.devopsconsulting.in/

21. Career and Learning Roadmap

What to learn before Data Integrator

  • OCI fundamentals:
  • Compartments, IAM users/groups/policies
  • Regions, availability domains
  • Networking basics:
  • VCNs, subnets, routing, service gateways (for private Object Storage access)
  • Data basics:
  • Relational modeling, SQL
  • File formats (CSV conventions, encoding, delimiters)
  • Autonomous Database basics:
  • Schemas, tables, constraints
  • Loading patterns and data types

What to learn after Data Integrator

  • Advanced orchestration:
  • Multi-step pipelines, backfills, dependency management
  • Data quality and governance:
  • Data validation frameworks, data catalogs, lineage concepts
  • Security and compliance:
  • Vault/KMS, private endpoints, audit and logging pipelines
  • Real-time data movement:
  • OCI GoldenGate for CDC use cases
  • Analytics:
  • Oracle Analytics Cloud, semantic modeling, performance tuning

Job roles that use it

  • Data Engineer (OCI)
  • Analytics Engineer
  • Cloud Data Platform Engineer
  • Integration Engineer (data-focused)
  • Platform Engineer supporting data pipelines
  • Data Operations / Data Reliability Engineer

Certification path (if available)

Oracle certifications change frequently. If you want a certification path: – Start with OCI foundations certifications (if applicable) – Look for OCI data platform certifications covering Autonomous Database and analytics services
Verify current Oracle certification tracks here: – https://education.oracle.com/

Project ideas for practice

  • Build a landing-to-curated pipeline with:
  • raw landing files in Object Storage
  • staging and curated schemas in Autonomous Database
  • Implement backfill logic:
  • load all files for a date range
  • validate counts and enforce idempotency
  • Add data quality checks:
  • reject invalid emails to a quarantine table
  • generate a load summary table per run
  • Build a cost dashboard:
  • tag resources
  • track job runtimes and estimate monthly consumption
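The data quality project idea above can start as a few lines of Python run before staging: split rows into clean and quarantined sets based on a validity rule. The regex below is a deliberately simple illustration, not a full RFC 5322 email validator:

```python
import re

# Split rows into clean vs. quarantine based on a simple email shape check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def split_rows(rows):
    """Return (clean, quarantine) lists; quarantine rows go to a reject table."""
    clean, quarantine = [], []
    for row in rows:
        (clean if EMAIL_RE.match(row["email"]) else quarantine).append(row)
    return clean, quarantine

rows = [
    {"customer_id": 1, "email": "ada@example.com"},
    {"customer_id": 2, "email": "not-an-email"},
]
clean, quarantine = split_rows(rows)
print(len(clean), len(quarantine))  # → 1 1
```

Loading the quarantine list into its own table gives you the per-run load summary the project idea calls for.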

22. Glossary

  • OCI (Oracle Cloud Infrastructure): Oracle’s public cloud platform offering compute, storage, networking, and managed services.
  • Integration (category): Services and patterns used to connect systems, move data, and orchestrate workflows.
  • Data Integrator: In this tutorial, the OCI-managed Data Integration service used to design and run data ingestion and transformation pipelines.
  • Workspace: An isolated environment within Data Integrator where you create and run integration artifacts.
  • Compartment: OCI governance boundary used to organize resources and apply IAM policies.
  • Data Asset: A logical definition of a source/target system (e.g., Object Storage, database).
  • Connection: The connectivity and credential configuration used to access a data asset.
  • Data Flow: A mapping/transformation workflow that defines how data moves from source to target.
  • Pipeline: An orchestration artifact chaining multiple steps/tasks.
  • Task: An executable unit that runs a data flow or pipeline.
  • Autonomous Database (ATP/ADW): Oracle-managed database service with automated operations and built-in security features.
  • Object Storage: OCI service for storing unstructured data (files/objects) in buckets.
  • IAM Policy: OCI access control rules defining who can do what in which compartment.
  • Service Gateway: OCI networking feature enabling private access to Oracle services like Object Storage from a VCN.
  • Private Endpoint: Private network access to a managed service without using a public IP (availability depends on service/config).
  • ETL/ELT: Extract-Transform-Load / Extract-Load-Transform data integration patterns.
  • CDC (Change Data Capture): Capturing and replicating data changes (often near-real-time), commonly done with tools like GoldenGate.

23. Summary

Data Integrator in Oracle Cloud (commonly documented as OCI Data Integration) is a managed Integration service for designing and running data ingestion and transformation pipelines—especially strong for patterns like Object Storage → Autonomous Database.

It matters because it reduces the operational burden of self-managed ETL tooling, provides scheduling and run history for production operations, and fits naturally into OCI governance via compartments and IAM policies. Cost is primarily driven by runtime consumption plus dependent services (Object Storage, databases, and any data transfer). Security hinges on least-privilege IAM, careful handling of connection credentials, and private networking where appropriate.

Use Data Integrator when you want managed, repeatable batch ingestion and orchestration in OCI. For real-time CDC replication, consider OCI GoldenGate; for application-to-application workflows, consider Oracle Integration.

Next step: read the official OCI Data Integration documentation and then expand this lab into a production-ready pattern with staging/curated schemas, idempotent loads, and monitored schedules: – https://docs.oracle.com/en-us/iaas/data-integration/