Category
Integration
1. Introduction
Oracle Cloud Data Integration is a managed service in Oracle Cloud Infrastructure (OCI) for building, running, and monitoring data movement and transformation workflows—commonly called ETL/ELT—across Oracle and non-Oracle systems.
In simple terms: Data Integration helps you pull data from one place, clean/transform it, and load it into another place, using a visual designer and managed execution so you don’t have to run your own ETL servers.
Technically, Data Integration provides a workspace-based design environment (projects, folders, data assets/connections, tasks, and pipelines) plus a managed runtime for executing data flows and orchestration. It integrates with OCI Identity and Access Management (IAM), compartments, policies, and OCI governance services such as Audit. You typically use it to implement ingestion into analytics platforms (like Autonomous Data Warehouse), operational reporting stores, or curated data lakes.
The problem it solves: teams need repeatable, secure, observable, cost-controlled ways to integrate data across applications and databases—without building a patchwork of scripts, cron jobs, and long-lived ETL servers.
Naming check: The service is commonly referred to as OCI Data Integration in official Oracle documentation. It is distinct from Oracle Data Integrator (ODI) (a separate product) and from Oracle Integration (application integration/iPaaS). This tutorial focuses only on Oracle Cloud (OCI) Data Integration.
2. What is Data Integration?
Official purpose (what Oracle positions it to do)
Oracle Cloud Data Integration is a fully managed cloud service for designing and running data pipelines that ingest, transform, and load data between heterogeneous sources and targets. It is intended to support common data engineering patterns—batch ingestion, transformations, incremental loads (where supported by source/target patterns), and orchestration—using a visual, metadata-driven approach.
For the canonical definition and current scope, verify in the official docs:
https://docs.oracle.com/en-us/iaas/data-integration/home.htm
Core capabilities (high level)
- Design-time tooling in the OCI Console: create projects, define sources/targets, build data flows and pipelines.
- Connections to data systems via “data assets” (connectors vary by environment and Oracle updates; verify supported connectors in docs for your region).
- Transformations using data-flow steps (select, filter, join, aggregate, derive columns, mapping, etc.—exact transformation set depends on current release).
- Orchestration with pipelines: chain tasks, manage dependencies, handle failures and retries (capabilities vary; verify current pipeline controls in docs).
- Operational execution and monitoring: run tasks, view runs, check statuses, troubleshoot failures.
- OCI-native governance: IAM policies, compartments, tagging, Audit integration.
Major components (how you work with the service)
While exact UI labels evolve, the core concepts in Data Integration generally include:
- Workspace: the top-level container where you design and operate. Usually created per environment (dev/test/prod) and per domain/team.
- Projects and folders: organize integration assets by subject area (finance, customer, telemetry, etc.).
- Data assets / connections: represent sources/targets and how to connect (credentials, endpoints, wallets, etc.).
- Tasks:
- Data flows: transformation logic (mapping and shaping data).
- Pipelines: orchestration logic (sequence, dependency, branching where supported).
- Other task types may exist depending on current release; verify in docs.
- Applications / publications (if present in your tenancy): promote or package artifacts for deployment between environments. Verify the current lifecycle model in official docs.
- Work requests / runs: execution records you monitor for success/failure and runtime metrics.
Service type
- Managed cloud service (serverless-style from the user perspective): you design and trigger jobs; Oracle operates the underlying service components.
- Strongly aligned with the Integration category, but focused specifically on data integration rather than application/event integration.
Scope: regional vs global; tenancy/compartment model
- Data Integration is an OCI regional service: a workspace exists in a specific OCI region.
- Resources are governed using tenancy, compartments, and IAM policies.
- You usually design separate workspaces per region and per environment.
Always validate current regional availability in OCI documentation and the OCI Console region selector.
How it fits into the Oracle Cloud ecosystem
Data Integration often sits in the middle of these OCI building blocks:
- Sources/targets: Autonomous Database (ATP/ADW), Oracle Database on OCI, Object Storage, and potentially other supported systems/connectors.
- Data lake and analytics: Object Storage (raw/curated zones), Autonomous Data Warehouse, Oracle Analytics Cloud (downstream).
- Governance: IAM policies, compartments, tagging, Audit.
- Operations: OCI Monitoring/Logging (where supported), Notifications/Alarms around job states (often via integration patterns; verify supported hooks).
3. Why use Data Integration?
Business reasons
- Faster delivery of data pipelines: visual development and reusable assets reduce time-to-value.
- Lower operational overhead: less infrastructure to manage compared to self-hosted ETL servers.
- Standardization: consistent patterns for ingestion and transformations across teams.
- Auditability: better traceability than scattered scripts.
Technical reasons
- Metadata-driven development: organize connections, schemas, and tasks as managed artifacts.
- Repeatable orchestration: schedule/trigger workflows (depending on available scheduling features and your orchestration approach).
- OCI-native integration: works naturally with compartments, IAM, and OCI database services.
- Separation of design and execution: build once, run reliably.
Operational reasons
- Central monitoring: view execution status, runs, failures, and (where available) logs.
- Environment separation: manage dev/test/prod with compartments and workspaces.
- Governance: tagging, access control, and audit trails.
Security/compliance reasons
- IAM-based access: least-privilege policies per compartment/team.
- Audit events: OCI Audit captures relevant API activity.
- Network control patterns: can be paired with private endpoints and VCN designs depending on sources/targets (verify per connector).
Scalability/performance reasons
- Managed scaling: avoids fixed-capacity ETL servers.
- Parallelism patterns: data flows typically support distributed processing patterns for transformations (verify current runtime details and limits).
When teams should choose Data Integration
Choose Oracle Cloud Data Integration when:
– You need batch ingestion + transformation in OCI.
– Your primary targets are Autonomous Data Warehouse or other OCI data platforms.
– You want OCI-governed pipelines managed via compartments/IAM.
– You want to reduce custom scripting and improve reliability.
When teams should not choose it
Consider alternatives when:
– You need real-time CDC replication with low latency (evaluate Oracle GoldenGate for OCI).
– You need event-driven application integration and SaaS connectors at the application workflow level (evaluate Oracle Integration).
– You need fully custom Spark control, notebooks, or bespoke code-first pipelines (evaluate OCI Data Flow, or code-first orchestration like Airflow on Kubernetes/Compute).
– You need complex cross-cloud networking patterns that aren’t supported by the connectors/runtime model (validate connector and networking support first).
4. Where is Data Integration used?
Industries
- Financial services: daily regulatory reports, risk aggregation, customer 360.
- Retail/e-commerce: sales/returns analytics, inventory reconciliation, clickstream batch ingestion.
- Healthcare/life sciences: claims data normalization, batch de-identification staging (with strict governance).
- Telecom: CDR aggregation, churn analytics.
- Manufacturing/IoT: batch ingestion of telemetry files, quality metrics.
- Public sector: data consolidation for reporting, data lake standardization.
Team types
- Data engineering teams building ingestion/transform pipelines.
- Platform teams standardizing integration patterns.
- Analytics engineering teams curating dimensional models.
- DevOps/SRE teams supporting reliability and cost governance.
- Security teams enforcing IAM, encryption, and auditing.
Workloads
- Batch ELT/ETL: nightly loads, hourly loads, backfills.
- Data lake zone processing: raw → staged → curated.
- Warehouse loading: star schema, slowly changing dimensions (implementation depends on design patterns).
- Operational reporting extracts.
Architectures
- Lakehouse-style: Object Storage as lake + curated ADW marts.
- Hub-and-spoke integration: standardize ingestion into a central curated store.
- Multi-compartment enterprise governance: separate domains with shared platform.
Real-world deployment contexts
- Production: scheduled and monitored, strict IAM, alarms, runbooks, cost controls.
- Dev/test: smaller datasets, sandbox workspaces, experimental transformations.
- Migration: moving from ODI/Informatica/Talend-style on-prem ETL to managed OCI patterns.
5. Top Use Cases and Scenarios
Below are realistic scenarios where Oracle Cloud Data Integration is commonly a good fit.
1) Load CSV files from Object Storage into Autonomous Data Warehouse
- Problem: Analysts drop files into a bucket; the warehouse needs structured tables.
- Why Data Integration fits: Visual mapping + managed runs; integrates naturally with OCI.
- Example: A daily orders_YYYYMMDD.csv lands in Object Storage → Data Integration loads it into DW_ORDERS_STAGE.
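To make the file-per-day convention concrete, here is a small Python sketch. The helper name and the input/ prefix are illustrative assumptions, not part of the service; in Data Integration itself the matching object name would come from the source configuration.

```python
from datetime import date

def daily_object_name(run_date: date, prefix: str = "input/") -> str:
    """Build the Object Storage key for a given run date, following the
    orders_YYYYMMDD.csv naming convention from the example above."""
    return f"{prefix}orders_{run_date.strftime('%Y%m%d')}.csv"

print(daily_object_name(date(2024, 3, 5)))  # input/orders_20240305.csv
```

The same convention is what makes backfills easy later: one object name per calendar day.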
2) Curate a raw data lake into a “silver” zone
- Problem: Raw files are messy (types, missing fields, inconsistent formats).
- Why it fits: Data flow transformations can standardize and validate data.
- Example: Raw JSON exports → normalized Parquet-like structures (format support varies; verify) → curated bucket prefix.
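The kind of standardization a curation step performs can be sketched in plain Python. The field names and rules below are illustrative assumptions, not a real schema; in Data Integration this logic would live in data-flow transformation steps.

```python
import json

def normalize(record: dict) -> dict:
    """Coerce a raw export record into the curated ('silver') shape:
    enforce types, normalize case/whitespace, and default missing fields."""
    return {
        "customer_id": int(record["customer_id"]),                      # enforce numeric type
        "email": (record.get("email") or "").strip().lower() or None,   # blank/missing -> NULL
        "country": (record.get("country") or "UNKNOWN").upper(),        # default missing values
    }

raw = json.loads('{"customer_id": "42", "email": " Ana@Example.COM ", "country": "us"}')
print(normalize(raw))
```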
3) Join multiple source tables into a reporting mart
- Problem: Reporting needs denormalized tables for BI.
- Why it fits: Visual join/aggregate transformations.
- Example: Join customers + orders + payments → MART_CUSTOMER_REVENUE.
4) Standardize dimensions (conformed dimensions)
- Problem: Different systems represent “product” differently.
- Why it fits: Central transformation logic with reusable components.
- Example: ERP products + e-commerce products mapped to a single DIM_PRODUCT.
5) Batch ingestion from Oracle Database into ADW
- Problem: Operational Oracle DB data must be copied nightly to analytics.
- Why it fits: Strong Oracle-to-Oracle integration patterns; managed credentials and execution.
- Example: Nightly extract of SALES_TXN → transform → load into DW_SALES_FACT.
6) Mask or tokenize data before analytics (basic patterns)
- Problem: Sensitive fields must not reach general analytics tables.
- Why it fits: Transformations can remove/hash fields; governance via compartments and IAM.
- Example: Hash email, truncate addresses, remove SSNs before loading curated tables. (For strong masking, evaluate Oracle Data Safe and database-native controls too.)
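A minimal sketch of the masking idea, with illustrative field names; note that an unsalted hash is still linkable across datasets, which is one reason the stronger database-native controls mentioned above matter.

```python
import hashlib

def mask_row(row: dict) -> dict:
    """Basic pre-analytics masking: hash the email, truncate the address
    to a coarse prefix, and drop the SSN entirely."""
    out = dict(row)
    out["email"] = hashlib.sha256(row["email"].lower().encode()).hexdigest()
    out["address"] = row["address"][:10]   # keep only a coarse prefix
    out.pop("ssn", None)                   # never load SSNs downstream
    return out
```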
7) Build a parameterized pipeline for multiple regions/business units
- Problem: Same pipeline must run for BU=A, BU=B with different source paths.
- Why it fits: Parameterization patterns reduce duplication (verify exact parameter features).
- Example: A source_prefix=/raw/bu=${BU}/ parameter drives the ingestion path.
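The substitution mechanics can be illustrated with Python's standard-library Template, which happens to use the same ${BU}-style placeholders as the example above; how parameters are actually declared and bound in Data Integration must be verified in the docs.

```python
from string import Template

def resolve_prefix(template: str, **params) -> str:
    """Substitute run-time parameters (e.g. BU) into a source path template."""
    return Template(template).substitute(**params)

for bu in ("A", "B"):
    print(resolve_prefix("/raw/bu=${BU}/", BU=bu))
```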
8) Backfill historical data with controlled runs
- Problem: Need to load 2 years of historical files without breaking production.
- Why it fits: Managed runs + organized projects; easier run tracking and retry.
- Example: Run pipeline per month partition, validate counts, then proceed.
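Enumerating the month partitions for a controlled backfill is simple enough to script; this sketch (helper name is an assumption) generates one (year, month) pair per run, oldest first:

```python
from datetime import date

def month_partitions(start: date, end: date):
    """Yield (year, month) pairs from start to end inclusive, one per backfill run."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield y, m
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)

# Two years of history -> 24 controlled runs.
parts = list(month_partitions(date(2022, 1, 1), date(2023, 12, 1)))
print(len(parts), parts[0], parts[-1])
```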
9) Data quality checks as part of pipeline (basic validation)
- Problem: Downstream BI breaks when null rates spike or schema changes.
- Why it fits: Add validation steps and fail-fast patterns (implementation depends on supported transforms).
- Example: If the order_id null rate > 0, stop the pipeline and notify.
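The fail-fast logic itself is trivial, which is the point: this hedged Python sketch shows the check; in Data Integration you would express it with whatever validation transforms and failure handling the current release supports.

```python
def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def check_or_fail(rows: list[dict], column: str, threshold: float = 0.0) -> float:
    """Fail fast: raise so the pipeline stops and downstream BI is never fed bad data."""
    rate = null_rate(rows, column)
    if rate > threshold:
        raise ValueError(f"{column} null rate {rate:.1%} exceeds threshold {threshold:.1%}")
    return rate
```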
10) Replace cron + SQL scripts with governed orchestration
- Problem: “Works on my VM” scripts are hard to maintain and audit.
- Why it fits: Centralized jobs, IAM access, run tracking, and repeatability.
- Example: Replace shell scripts that call SQL*Plus with a pipeline that runs consistently.
6. Core Features
Feature availability can change by region and over time. For the most accurate list, verify in official docs: https://docs.oracle.com/en-us/iaas/data-integration/home.htm
Workspaces (design + operations boundary)
- What it does: Provides an isolated environment to create and operate integration artifacts.
- Why it matters: Enables clean separation between teams and environments.
- Practical benefit: Easier governance (IAM/tagging), predictable organization.
- Caveats: Workspaces are regional; cross-region designs need explicit data movement patterns.
Projects and folders (asset organization)
- What it does: Lets you group pipelines, flows, connections, and related artifacts.
- Why it matters: Keeps large integration estates manageable.
- Practical benefit: Teams can align projects to domains (Finance, HR, Sales).
- Caveats: Organization doesn’t replace IAM; use compartments and policies for access control.
Data assets / connections (source/target definitions)
- What it does: Stores metadata and connection details for sources/targets.
- Why it matters: Reuse connection definitions and manage credentials centrally.
- Practical benefit: Faster onboarding; fewer hardcoded secrets in scripts.
- Caveats: Supported connectors vary; validate that your exact source/target and auth method are supported.
Data flows (transformations)
- What it does: Implements transformation logic—mapping columns, filtering, joining, aggregating, deriving fields.
- Why it matters: Converts raw data into analytics-ready datasets.
- Practical benefit: Visual logic is easier to review and maintain than ad-hoc scripts for many teams.
- Caveats: Not every transformation pattern is available visually; complex logic might require database-side SQL transformations or alternative services.
Pipelines (orchestration)
- What it does: Orchestrates multiple tasks with dependencies.
- Why it matters: Real pipelines need steps: ingest → transform → load → validate → publish.
- Practical benefit: One place to manage execution order and outcomes.
- Caveats: Advanced branching/looping patterns may be limited; verify current orchestration capabilities.
Parameterization and reusability (where supported)
- What it does: Allows using parameters for environment-specific or run-specific values (paths, table names, dates).
- Why it matters: Promotes reuse and reduces duplication.
- Practical benefit: Same pipeline can run for different partitions or BUs.
- Caveats: Parameter scoping rules and supported parameter types vary—confirm in docs.
Execution management and run history
- What it does: Provides a history of runs (status, timing, failures).
- Why it matters: Troubleshooting depends on visibility.
- Practical benefit: Faster root cause analysis than searching through VM logs.
- Caveats: Log detail and retention may vary; confirm how to export logs and what is retained.
IAM integration (compartment-based governance)
- What it does: Uses OCI IAM for authentication/authorization to create/manage DI resources.
- Why it matters: Least privilege and separation of duties.
- Practical benefit: Platform teams can restrict production changes.
- Caveats: Access to external data sources also requires correct policies and networking patterns.
OCI Audit integration
- What it does: Records relevant API events for governance and compliance.
- Why it matters: You need a trail of who changed what.
- Practical benefit: Supports compliance controls and investigations.
- Caveats: Audit captures control-plane events, not necessarily every row-level data operation.
7. Architecture and How It Works
High-level service architecture
Data Integration typically separates concerns into:
– Control plane: UI/API actions (create workspace, define assets, run tasks). Governed by IAM, recorded by Audit.
– Data plane (runtime): executes flows/pipelines and reads/writes data to configured systems.
You design integrations as artifacts in a workspace. When you trigger a run, the managed runtime connects to your sources/targets using the configuration and credentials, performs transformations, and writes results. Operational metadata (run status) is tracked for monitoring and troubleshooting.
Request / data / control flow (conceptual)
- User (or automation) calls OCI APIs / Console to create and configure DI artifacts.
- User triggers a task run (manual, scheduled, or programmatic—verify scheduling and APIs).
- DI runtime reads from source(s), transforms, writes to target(s).
- Run status and logs are stored for inspection; Audit logs capture changes.
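The run lifecycle above lends itself to simple automation. Below is a minimal polling sketch; the stubbed get_status callable stands in for the real Data Integration status API/SDK call, and the state names are plausible assumptions to verify against the docs.

```python
import time

TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED"}

def wait_for_run(get_status, poll_seconds: float = 0.0, max_polls: int = 100) -> str:
    """Poll a task-run status function until it reaches a terminal state."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within the polling budget")

# Stubbed status sequence for illustration: two in-flight checks, then success.
statuses = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
print(wait_for_run(lambda: next(statuses)))  # SUCCEEDED
```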
Integrations with related OCI services (common patterns)
- Object Storage: staging and lake storage for raw/curated files.
- Autonomous Database (ATP/ADW): common targets for analytics and marts.
- Oracle Database on Compute/Exadata Cloud Service: operational sources/targets.
- OCI Vault: store secrets/keys (where supported by the connection model and your design).
- IAM/Compartments/Tags: governance and access control.
- VCN / Private Endpoints: private connectivity patterns for databases (depends on target configuration and service capabilities—verify in docs).
Dependency services (what you still need)
Data Integration does not replace:
– Your storage (Object Storage buckets) and lifecycle policies.
– Your database (Autonomous or DB on OCI) and its scaling/backups.
– Your network architecture (VCNs, subnets, routing, DNS).
– Your operations (alerting, runbooks, on-call).
Security/authentication model (practical view)
- Access to Data Integration resources is controlled by OCI IAM policies.
- Access from the runtime to sources/targets depends on:
- How the connector authenticates (user/password, wallet, token, etc.).
- Whether the runtime can reach the endpoint (public vs private networking).
- Policies that allow required OCI operations (for example, reading objects from a bucket).
Because IAM policy statements and connector auth differ by scenario, always confirm the exact policy examples for Data Integration in official docs.
Networking model (practical view)
- If your source/target is public (public Object Storage endpoint, public database endpoint), connectivity is simpler—but may not be acceptable for production security.
- For production, many teams prefer private endpoints and VCN-only access to databases and services. Confirm whether and how Data Integration supports private connectivity in your region and for your connector types.
Monitoring/logging/governance considerations
- OCI Audit: captures API changes and access patterns.
- Run monitoring: check task run states and errors in the Data Integration UI.
- OCI Logging/Monitoring: integration points vary; if you need centralized logs/metrics, verify current capabilities and consider exporting run outcomes to a monitoring system.
Simple architecture diagram (conceptual)
flowchart LR
U[Engineer / Analyst] -->|Console/API| DI[OCI Data Integration Workspace]
DI -->|Run Data Flow / Pipeline| RT[Managed Runtime]
RT --> OS[(OCI Object Storage)]
RT --> ADW[(Autonomous Data Warehouse)]
DI --> AUD[OCI Audit]
Production-style architecture diagram (more realistic)
flowchart TB
subgraph Tenancy[OCI Tenancy]
subgraph Net[VCN / Networking]
DBP[(Private ADW / DB Endpoint)]
NAT[NAT Gateway or Service Gateway]
end
subgraph Gov[Governance]
IAM[IAM Policies & Compartments]
AUD[OCI Audit]
TAG[Tags / Cost Tracking]
end
subgraph Data[Data Layer]
OSRAW[(Object Storage - Raw Zone)]
OSCUR[(Object Storage - Curated Zone)]
DW[(Autonomous Data Warehouse - Marts)]
end
subgraph DI[Data Integration]
WS["Workspace (Dev/Test/Prod)"]
PJ[Projects / Folders]
DF[Data Flows]
PL[Pipelines]
RUN[Runs / Work Requests]
end
end
IAM --> WS
WS --> DF
WS --> PL
DF --> RUN
PL --> RUN
RUN --> OSRAW
RUN --> OSCUR
RUN --> DBP
DBP --> DW
WS --> AUD
TAG --> WS
NAT --> DBP
8. Prerequisites
Tenancy/account requirements
- An Oracle Cloud (OCI) tenancy with permission to use Data Integration in a region where it is available.
- A compartment strategy (at minimum: one compartment for this lab).
Permissions / IAM roles
You need permissions to:
– Create/manage Data Integration workspaces and artifacts.
– Read/write to Object Storage (for source/target files).
– Connect to and create objects in the target database (Autonomous Database recommended for the lab).
OCI policies for Data Integration use specific resource types and verbs. Because policy syntax can evolve, use Oracle’s official policy examples as the source of truth and adapt to your compartments and groups:
- Data Integration documentation home: https://docs.oracle.com/en-us/iaas/data-integration/home.htm
- OCI IAM policy reference: https://docs.oracle.com/en-us/iaas/Content/Identity/home.htm
Billing requirements
- Data Integration is a paid OCI service (unless covered by specific promotions). You need a valid billing setup.
- If you use Autonomous Database Always Free, that can reduce costs for the target, but Data Integration usage may still generate charges depending on tenancy and region. Verify current Free Tier eligibility: https://www.oracle.com/cloud/free/
Tools needed
- OCI Console access (web browser).
- Optional but useful:
- OCI CLI (for Object Storage operations and automation): https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
- A SQL client for Autonomous Database:
- Database Actions (web UI) or SQL Developer. Database Actions is typically easiest.
Region availability
- Choose an OCI region where Data Integration is available.
- Confirm via the OCI Console service list or the service documentation.
Quotas/limits
- OCI enforces service limits/quotas (workspaces, runs, concurrency, etc.).
- Check in Console: Governance & Administration → Limits, Quotas and Usage.
- If you hit limits, request an increase via OCI support (process depends on your account).
Prerequisite services for this lab
- Object Storage bucket (for source CSV file).
- Autonomous Database (ATP or ADW; Always Free works well for learning).
- Data Integration workspace.
9. Pricing / Cost
Do not rely on any blog for pricing numbers. OCI pricing is region-specific and may change. Use official pages.
Current pricing model (how costs are typically measured)
Oracle Cloud Data Integration pricing is usage-based. In practice, your bill is driven by:
– Data Integration job execution consumption (often measured in compute/time units for the managed runtime). Verify the exact billing metric and unit names (for example, OCPU-hours or equivalent) on the official pricing page for your region.
– Underlying services you use:
  – Object Storage capacity and requests
  – Autonomous Database compute and storage (if not Always Free)
  – Data transfer (cross-region, internet egress)
  – Logging/Monitoring ingestion (if exporting logs)
Official pricing sources
- OCI price list (search for “Data Integration”): https://www.oracle.com/cloud/price-list/
- OCI Cost Estimator: https://www.oracle.com/cloud/costestimator.html
- OCI Free Tier overview (to reduce lab cost): https://www.oracle.com/cloud/free/
Pricing dimensions to understand (cost drivers)
- Number of runs and runtime duration – More frequent pipelines (e.g., every 5 minutes) cost more than nightly batch.
- Data volume processed – Larger datasets typically increase runtime and consumption.
- Transformation complexity – Joins, aggregations, and wide transformations usually cost more than simple copies.
- Concurrency – Running many pipelines simultaneously can increase consumption and hit limits.
- Network placement – Moving data across regions or out to the internet can add data transfer charges.
- Source/target performance – Slow databases can increase runtime and therefore cost.
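To reason about the first two drivers, a back-of-envelope calculation helps. The $1/hour unit rate below is a deliberately made-up placeholder, not an Oracle price; look up the real billing metric and regional rate on the official price list.

```python
def monthly_runtime_cost(runs_per_day: int, minutes_per_run: float,
                         unit_cost_per_hour: float, days: int = 30) -> float:
    """Back-of-envelope runtime cost: total billed hours x hypothetical unit rate."""
    hours = runs_per_day * days * minutes_per_run / 60
    return round(hours * unit_cost_per_hour, 2)

# Nightly batch vs. every-5-minutes micro-batching, same hypothetical $1/hour rate.
print(monthly_runtime_cost(1, 20, 1.0))    # 10.0
print(monthly_runtime_cost(288, 2, 1.0))   # 288.0
```

Even with made-up rates, the ratio makes the point: micro-batching can cost an order of magnitude more than a nightly window.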
Free tier considerations
- OCI Free Tier is mainly about compute/database/storage products. Data Integration may not be Always Free in your region/tenancy.
Verify current Free Tier eligibility for Data Integration specifically in official pages or your tenancy’s subscription details.
Hidden/indirect costs (common surprises)
- Autonomous Database scaling: if you auto-scale or choose higher CPU/storage, DI loads may trigger more DB usage.
- Object Storage request costs: frequent small-file processing increases request counts.
- Logging ingestion/retention: exporting logs at high volume can cost money.
- Cross-region traffic: replicating data or reading across regions can add transfer fees.
Network/data transfer implications
- In OCI, egress to the internet and some inter-region transfers are charged.
- Keep your Data Integration workspace, Object Storage, and database in the same region for cost and performance unless you have a strong reason not to.
How to optimize cost (practical checklist)
- Prefer batch windows over continuous micro-batching unless you truly need it.
- Consolidate small files into fewer larger files (where your pipeline supports it).
- Push heavy transformations into the database when it’s cheaper/faster and fits governance.
- Use partitioned loads (date partitions) and incremental patterns where feasible.
- Separate dev/test/prod and turn off non-production schedules.
- Tag resources for cost tracking (Cost Analysis works best with consistent tags).
Example low-cost starter estimate (how to think about it)
A learning lab typically includes:
– 1 Data Integration workspace
– A few small runs (MBs of CSV)
– Always Free Autonomous Database (if eligible)
– Minimal Object Storage
Your cost will depend on the minimum billable runtime units and the billing metric for Data Integration in your region. Use the Cost Estimator and run a small test, then check Billing → Cost Analysis.
Example production cost considerations (what to plan for)
In production, plan for:
– Daily/hourly schedules across multiple domains (runs/day)
– Backfills (temporary cost spikes)
– Separate environments (dev/test/prod)
– Higher log retention and monitoring exports
– Stronger networking (private endpoints) and possible added network components
10. Step-by-Step Hands-On Tutorial
This lab builds a simple but real pipeline: load a CSV file from OCI Object Storage into an Autonomous Database table using Oracle Cloud Data Integration.
Objective
Create an OCI Data Integration workspace and a basic data flow that:
1. Reads a CSV file from an Object Storage bucket
2. Maps columns to a target table
3. Loads the data into an Autonomous Database table
4. Verifies row counts and cleans up
Lab Overview
You will:
1. Create/prepare an Autonomous Database table
2. Create an Object Storage bucket and upload a sample CSV
3. Create a Data Integration workspace and project
4. Create connections (data assets) to Object Storage and Autonomous Database
5. Build and run a Data Flow (or equivalent task) to load data
6. Validate results
7. Clean up resources to avoid ongoing cost
Notes before you start:
– UI labels can vary slightly with OCI Console updates.
– If any option differs in your tenancy, follow the closest equivalent and verify with the official docs: https://docs.oracle.com/en-us/iaas/data-integration/home.htm
Step 1: Create (or choose) a compartment for the lab
- In the OCI Console, open Identity & Security → Compartments.
- Create a compartment such as lab-data-integration (or reuse an existing lab compartment).
- Record:
  – Compartment name
  – Compartment OCID (optional but useful)
Expected outcome: You have a compartment where you will create the bucket, database, and Data Integration workspace.
Step 2: Create an Autonomous Database (ATP or ADW) and a target table
If you already have an Autonomous Database, you can reuse it.
- Go to Oracle Database → Autonomous Database.
- Click Create Autonomous Database.
- For low cost, choose an Always Free option if available in your region/tenancy.
- Set:
– Display name: adb-di-lab
– Database name: something short, like DILAB
– Admin password: store securely
- Create the database and wait for it to become Available.
Now create a table using Database Actions:
1. Open the Autonomous Database details page.
2. Click Database Actions → SQL.
3. Run:
CREATE TABLE DI_CUSTOMERS (
CUSTOMER_ID NUMBER,
FIRST_NAME VARCHAR2(100),
LAST_NAME VARCHAR2(100),
EMAIL VARCHAR2(200),
SIGNUP_DATE DATE
);
Expected outcome: Autonomous Database is running and has an empty DI_CUSTOMERS table.
Verification:
SELECT COUNT(*) FROM DI_CUSTOMERS;
Should return 0.
Step 3: Create an Object Storage bucket and upload a sample CSV
- Go to Storage → Buckets.
- Ensure you are in the same region and compartment.
- Click Create Bucket:
– Name: di-lab-bucket-<unique-suffix>
– Default storage tier is fine for a lab.
- Open the bucket → Upload.
Create a local file named customers.csv with this content:
CUSTOMER_ID,FIRST_NAME,LAST_NAME,EMAIL,SIGNUP_DATE
1,Ana,Gomez,ana.gomez@example.com,2024-01-15
2,Sam,Lee,sam.lee@example.com,2024-02-20
3,Priya,Shah,priya.shah@example.com,2024-03-05
4,Noah,Kim,noah.kim@example.com,2024-03-18
Upload it to the bucket (root or a prefix like input/).
Expected outcome: The bucket contains customers.csv.
Verification: Click the object and confirm size and last modified timestamp.
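If you prefer to script the sample file rather than hand-edit it, this Python sketch writes an identical customers.csv locally; you could then upload it with the OCI CLI (for example, oci os object put --bucket-name <your-bucket> --file customers.csv, adjusting names to your tenancy).

```python
import csv
import pathlib

ROWS = [
    (1, "Ana", "Gomez", "ana.gomez@example.com", "2024-01-15"),
    (2, "Sam", "Lee", "sam.lee@example.com", "2024-02-20"),
    (3, "Priya", "Shah", "priya.shah@example.com", "2024-03-05"),
    (4, "Noah", "Kim", "noah.kim@example.com", "2024-03-18"),
]

# Write the sample file with the exact header the lab's DI_CUSTOMERS table expects.
path = pathlib.Path("customers.csv")
with path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["CUSTOMER_ID", "FIRST_NAME", "LAST_NAME", "EMAIL", "SIGNUP_DATE"])
    writer.writerows(ROWS)

print(path.read_text().splitlines()[0])
```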
Step 4: Create a Data Integration workspace
- In the OCI Console, go to Data Integration.
- Click Create workspace.
- Choose the lab compartment.
- Name: di-workspace-lab
- Click Create.
Wait until the workspace is active.
Expected outcome: Workspace exists and you can open it.
Step 5: Create a Data Integration project
- Open the workspace.
- Create a Project:
– Name: customer-load-lab
- Optionally create folders such as:
  – connections
  – dataflows
  – pipelines
Expected outcome: You have a project where you’ll build assets.
Step 6: Configure access (IAM and policies) for Object Storage and Autonomous Database
Data Integration needs permission to interact with OCI resources (like Object Storage), and it needs valid database credentials/connectivity for Autonomous Database.
Because the exact policy statements and resource types can vary, follow Oracle’s official policy examples for Data Integration and apply least privilege in your compartment.
Start here:
– Data Integration docs: https://docs.oracle.com/en-us/iaas/data-integration/home.htm
– IAM policy reference: https://docs.oracle.com/en-us/iaas/Content/Identity/home.htm
Common pattern to validate in docs (do not copy blindly):
– Allow your admin group to manage Data Integration resources in the lab compartment.
– Allow the Data Integration service/workspace to read objects from the specific bucket (or bucket compartment).
– Ensure database connectivity and credentials are available for the connector method used.
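As one illustrative shape only: the statements below reflect the kind of examples Oracle's Data Integration policy documentation shows, but the resource-type names (dis-workspaces) and the disworkspace principal pattern must be checked against the current policy reference before use, and the group/compartment names here are this lab's.

```
Allow group di-lab-admins to manage dis-workspaces in compartment lab-data-integration
Allow any-user to read objects in compartment lab-data-integration
  where ALL {request.principal.type = 'disworkspace'}
```

Scope the second statement further (specific bucket, specific workspace OCID) before using anything like it in production.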
Expected outcome: Policies are in place; no authorization errors when testing connections later.
Verification: You should be able to create data assets and browse/select the Object Storage object from within Data Integration (or at least run a task that reads it).
Step 7: Create a connection (data asset) to Object Storage
In the Data Integration workspace (inside your project):
- Navigate to Data Assets or Connections (terminology may vary).
- Create a new data asset for Object Storage.
- Provide:
  – Compartment/bucket details
  – Namespace (the Object Storage namespace from your tenancy)
  – Bucket name
  – Authentication method per the UI (often OCI-native/IAM-based)
Expected outcome: Object Storage data asset is created.
Verification: Use any available Test Connection or browse feature (if provided) to confirm you can locate customers.csv.
If you cannot browse but creation succeeds, proceed; the real test is running the flow.
Step 8: Create a connection (data asset) to Autonomous Database
- Create a new data asset for Autonomous Database (or Oracle Database).
- Provide connection details:
– Database OCID or connection string (depending on the UI)
– Username (e.g., `ADMIN` or a dedicated ETL user)
– Password (store securely)
– Wallet/SSL settings if required by the connector
Recommended for production: create a dedicated database user with least privileges (create session + insert/select on target schema), not ADMIN.
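A minimal sketch of such a dedicated user, assuming the lab table lives in the `ADMIN` schema (the user name and password are placeholders — adapt them to your environment):

```sql
-- Run as ADMIN in Database Actions -> SQL. DI_ETL and the password are placeholders.
CREATE USER di_etl IDENTIFIED BY "ChangeMe#Lab2024";
GRANT CREATE SESSION TO di_etl;

-- Least privilege: grant only what the load needs, on the target table only.
GRANT SELECT, INSERT, DELETE ON admin.di_customers TO di_etl;
```

Then use `di_etl` (not `ADMIN`) in the Data Integration connection.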
Expected outcome: Autonomous Database data asset is created.
Verification: Use Test Connection if available.
Step 9: Create a Data Flow to load customers.csv into DI_CUSTOMERS
- Create a new Data Flow in your project.
- Add a Source:
– Source type: Object Storage
– Select the bucket and the object `customers.csv`
– Configure the CSV format:
- Header row present: yes
- Delimiter: comma
- Date format: `YYYY-MM-DD` (or configure a parsing rule if the UI requires it)
- Add transformations as needed:
– Ensure column names map correctly:
- `CUSTOMER_ID` → number
- `FIRST_NAME`, `LAST_NAME`, `EMAIL` → strings
- `SIGNUP_DATE` → date (parse from string)
- Add a Target:
– Target type: Autonomous Database
– Target table: `DI_CUSTOMERS`
– Write mode: for a lab, choose a safe mode:
- If you want repeatable runs: TRUNCATE then INSERT (if supported), or delete rows before loading.
- If you want append-only: INSERT.
- Save the Data Flow.
Expected outcome: A saved Data Flow that reads from Object Storage and writes to the database.
Verification: Validate the data flow graph (most UIs provide a validation step). Resolve schema/type mapping warnings.
Step 10: Run the Data Flow (or create a Task and run it)
Depending on your UI, you may:
– Run the Data Flow directly, or
– Create a Task from the Data Flow and run the task.
- Click Run.
- Observe the run status (Submitted → Running → Succeeded/Failed).
- Open run details if available.
Expected outcome: Run completes successfully.
Validation
Validate in Autonomous Database
In Database Actions → SQL:
```sql
SELECT COUNT(*) AS row_count FROM DI_CUSTOMERS;
```
Expected: 4
Check the data:
```sql
SELECT
  CUSTOMER_ID, FIRST_NAME, LAST_NAME, EMAIL,
  TO_CHAR(SIGNUP_DATE, 'YYYY-MM-DD') AS SIGNUP_DATE
FROM DI_CUSTOMERS
ORDER BY CUSTOMER_ID;
```
Expected: rows 1–4 with correct values.
Validate in Data Integration
- The run should show Succeeded.
- If a run history is available, confirm runtime and any warnings.
Troubleshooting
Error: Authorization failed / NotAuthorizedOrNotFound
- Cause: missing IAM policy for Data Integration to access the bucket or DI resources.
- Fix:
- Confirm you are in the correct compartment.
- Review policies using official policy examples for Data Integration.
- Verify the bucket is in the same compartment you granted access to.
Error: Cannot connect to Autonomous Database
- Cause: wrong credentials, missing wallet/SSL config, network restrictions (private endpoint).
- Fix:
- Re-test with Database Actions using the same user.
- If the DB is private, confirm Data Integration supports the required private connectivity pattern and that your VCN/security lists/NSGs allow it.
- Confirm the connector’s required connection string/wallet details in the docs.
Error: Date parsing / invalid month
- Cause: CSV date format doesn’t match parsing rule.
- Fix:
- Ensure the date format is `YYYY-MM-DD`.
- Add an explicit cast/parse transformation (if available).
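To confirm the format mask matches the CSV's string representation, you can run a standalone parse check in Database Actions:

```sql
-- Succeeds if the mask matches; raises ORA-01861/ORA-01843-style errors if not.
SELECT TO_DATE('2024-03-07', 'YYYY-MM-DD') AS parsed FROM dual;
```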
Error: Column mapping mismatch
- Cause: CSV headers don’t match target columns, or inferred types differ.
- Fix:
- Ensure `customers.csv` header names match the expected mappings.
- Add an explicit mapping step and cast types.
Error: Duplicate rows on re-run
- Cause: using INSERT append mode.
- Fix:
- Use a truncate + load pattern (if supported), or run `TRUNCATE TABLE DI_CUSTOMERS` before re-running.
Cleanup
To avoid ongoing cost and clutter, delete lab resources:
- Data Integration
– Delete the task(s), data flows, and project (optional).
– Delete the workspace `di-workspace-lab` (if not needed).
- Object Storage
– Delete `customers.csv`.
– Delete the bucket.
- Autonomous Database
– Drop the table (optional):
```sql
DROP TABLE DI_CUSTOMERS PURGE;
```
– Terminate the Autonomous Database if it was created only for this lab (unless it is Always Free and you want to keep it).
- IAM policies
– Remove any lab-only policies you created (keep least privilege).
11. Best Practices
Architecture best practices
- Separate dev/test/prod using compartments and separate workspaces.
- Adopt a layered data architecture (raw → staged → curated → marts).
- Keep data close to compute: same region for workspace, buckets, and DB targets.
- Prefer idempotent designs:
- Partitioned loads (by date)
- Merge/upsert patterns (when supported and appropriate)
- Staging + swap for stable publishing
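As an illustration of the merge/upsert pattern in the database — `DI_CUSTOMERS_STG` is an assumed staging table loaded by the data flow; the target and columns follow this tutorial's lab schema:

```sql
-- Upsert staged rows into the target, keyed by the business key.
-- Re-running this after reloading the stage produces the same end state.
MERGE INTO di_customers t
USING di_customers_stg s
  ON (t.customer_id = s.customer_id)
WHEN MATCHED THEN UPDATE SET
  t.first_name  = s.first_name,
  t.last_name   = s.last_name,
  t.email       = s.email,
  t.signup_date = s.signup_date
WHEN NOT MATCHED THEN INSERT
  (customer_id, first_name, last_name, email, signup_date)
VALUES
  (s.customer_id, s.first_name, s.last_name, s.email, s.signup_date);
```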
IAM/security best practices
- Use least privilege:
- Separate “designers” (create/update flows) from “operators” (run/monitor).
- Use dedicated DB users for Data Integration with minimal privileges.
- Store secrets appropriately:
- Prefer OCI Vault patterns where supported; otherwise restrict who can view/edit connection assets.
- Apply tagging consistently (environment, cost center, owner, data domain).
Cost best practices
- Avoid high-frequency schedules for batch workloads.
- Reduce small-file overhead by consolidating files upstream.
- Monitor job runtimes and tune transformations.
- Use Cost Analysis with tags to detect runaway costs early.
Performance best practices
- Push down transformations to the database when that is faster/cheaper and aligns with governance.
- Use partitioned reads/writes where supported.
- Avoid unnecessary wide joins; pre-filter data early in the flow.
Reliability best practices
- Build pipelines with:
- Clear failure handling (stop on critical step failure)
- Retries for transient errors (where supported)
- Validation steps (row counts, null checks)
- Maintain runbooks:
- What to do on failure
- How to replay/backfill safely
- Escalation path
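A validation step of the kind listed above can be as simple as one query whose results you compare against thresholds (column names follow this tutorial's lab table):

```sql
-- Post-load sanity check: total rows and nulls in key columns.
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_ids,
       SUM(CASE WHEN email       IS NULL THEN 1 ELSE 0 END) AS null_emails
FROM   di_customers;
```

Fail the pipeline (or raise an alert) when `total_rows` is outside the expected range or any null count is nonzero.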
Operations best practices
- Standardize naming: `di-<env>-<domain>-<purpose>`.
- Keep an asset inventory per workspace (projects, connections, schedules).
- Establish change management:
- Peer reviews of flows/pipelines
- Controlled promotion to production (verify DI’s promotion model in your tenancy)
Governance best practices
- Use compartments to model ownership and data domains.
- Tag everything for cost and ownership.
- Document data lineage externally if you need full lineage (Data Integration alone may not cover enterprise lineage requirements; consider OCI Data Catalog patterns where appropriate).
12. Security Considerations
Identity and access model
- Data Integration uses OCI IAM for:
- User authentication to the Console/API
- Authorization to manage workspaces and artifacts
- Use groups and policies rather than individual user grants.
- Separate duties:
- Data engineers: design assets
- Operators: run/monitor
- Security/admin: manage policies
Encryption
- OCI services typically encrypt data at rest by default (service-dependent). Confirm encryption behavior for:
- Object Storage buckets
- Autonomous Database
- For sensitive workloads, use customer-managed keys where required (OCI Vault + KMS), and verify Data Integration compatibility with CMEK scenarios for each dependent service.
Network exposure
- Prefer private connectivity for production databases where possible.
- If your DB is publicly accessible, restrict with:
- IP allowlists (if applicable)
- Strong credentials
- Minimal privileges
- Keep the Data Integration workspace and data sources in the same region to minimize exposure and transfer.
Secrets handling
- Avoid embedding secrets in scripts; store them in managed connection objects with restricted access.
- Rotate DB passwords and update connection assets as part of your security hygiene.
- Consider using database auth patterns that reduce static secrets (availability varies; verify in docs).
Audit/logging
- OCI Audit captures relevant administrative operations.
- Ensure Audit logs are retained per compliance requirements.
- If you need centralized observability, integrate with OCI Logging/Monitoring where supported and define alert rules around job failures.
Compliance considerations
- Map controls to:
- Access control (IAM policies)
- Change management (who can modify flows)
- Data protection (encryption, masking)
- Logging and retention (Audit, run history)
- For regulated data, ensure the entire path (source, transport, target, backups) meets compliance requirements.
Common security mistakes
- Using `ADMIN` for database loads in production.
- Overly broad IAM policies at the tenancy root.
- Leaving public endpoints open without strong restrictions.
- Allowing all developers to edit production connections and credentials.
- No audit review process.
Secure deployment recommendations
- Use compartment isolation per environment.
- Create a dedicated “integration runtime” DB user per pipeline domain.
- Apply tagging and resource naming conventions.
- Regularly review policies and connection assets permissions.
13. Limitations and Gotchas
Limits and capabilities vary by region and release. Always verify in official docs and your tenancy’s service limits page.
Known limitations (categories)
- Connector availability: not every data source is supported natively; some require staging via Object Storage or database links. Verify connector list.
- Private networking: private endpoint support depends on the connector and service capabilities; validate before committing to architecture.
- Advanced orchestration: complex branching/looping and event triggers may be limited compared to dedicated orchestrators.
- Real-time CDC: Data Integration is generally a batch integration tool; for CDC replication, evaluate GoldenGate.
Quotas and concurrency
- Workspaces, projects, tasks, and concurrent runs may have limits.
- Concurrency spikes during backfills can hit limits and increase costs.
Regional constraints
- Workspaces are regional; cross-region pipelines require explicit patterns and may incur data transfer charges.
Pricing surprises
- Backfills can run for hours/days, driving consumption.
- Many small files can inflate processing overhead and Object Storage request costs.
- Storing extensive logs externally can add Logging costs.
Compatibility issues
- CSV/JSON schema drift: header changes can break mappings.
- Date/time parsing differences between source formats and database types.
- Character set issues (UTF-8 vs other encodings) if files originate from legacy systems.
Operational gotchas
- Re-runs can cause duplicates without idempotent design.
- Credential rotations can silently break scheduled loads if not updated.
- Lack of standardized naming makes incident response slower.
Migration challenges
- Migrating from ODI/Informatica/Talend may require redesign:
- Different transformation semantics
- Different operational model (managed vs self-hosted)
- Different scheduling/orchestration patterns
Vendor-specific nuances
- Oracle database targets can be very fast, but you must still design:
- Load strategy (append vs merge)
- Index maintenance timing
- Constraints handling
14. Comparison with Alternatives
Data Integration is one option in a broader integration and data engineering toolbox.
Alternatives in Oracle Cloud (OCI)
- Oracle GoldenGate (OCI): best for real-time CDC replication.
- OCI Data Flow: serverless Apache Spark jobs for code-first transformations.
- Oracle Integration: iPaaS for application/SaaS integration and process automation (not a data engineering ETL tool first).
- OCI Data Catalog: metadata management and governance (complements DI; not an ETL runtime).
Alternatives in other clouds
- AWS Glue: managed ETL + data catalog integration.
- Azure Data Factory: orchestration + connectors + mapping data flows.
- Google Cloud Data Fusion / Dataflow: visual pipeline (Data Fusion) and managed stream/batch processing (Dataflow).
Open-source / self-managed
- Apache Airflow (self-managed or managed elsewhere): orchestration (not ETL itself).
- Apache NiFi: flow-based ingestion.
- dbt: SQL-based transformations in the warehouse (often complements ingestion tools).
- Spark on Kubernetes: maximum control, maximum ops overhead.
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| OCI Data Integration | Batch ingestion + transformation in OCI | Managed design/runtime, OCI-native governance, good fit with ADW/Object Storage | Connector/networking constraints, not CDC-first, orchestration depth may be limited | You want OCI-governed ETL/ELT without running servers |
| Oracle GoldenGate (OCI) | Real-time replication / CDC | Low-latency change capture, replication patterns | More specialized, can be costlier/complex | You need near real-time data movement with CDC |
| OCI Data Flow | Code-first Spark processing | Flexible, scalable Spark jobs | More engineering/ops than visual ETL | You need custom Spark logic beyond visual transforms |
| Oracle Integration | App/SaaS integration | SaaS adapters, process automation | Not designed primarily for large-scale data engineering | You integrate applications and events, not bulk analytics loads |
| AWS Glue | ETL on AWS | Strong AWS ecosystem integration | Different cloud; migration overhead | Your platform is AWS-first |
| Azure Data Factory | Data integration on Azure | Mature orchestration + connectors | Different cloud | Your platform is Azure-first |
| Airflow (self-managed) | Orchestration across tools | Very flexible DAG orchestration | You manage infra and reliability | You need multi-tool orchestration and have ops maturity |
15. Real-World Example
Enterprise example: retail analytics modernization
Problem: A retail enterprise has:
– Oracle E-Business Suite/ERP on Oracle Database
– Daily store sales files landing in Object Storage
– A mandate to build a governed analytics platform on Autonomous Data Warehouse
They need consistent, auditable pipelines with environment separation and access controls.
Proposed architecture
– Object Storage:
– /raw/pos/ for store extracts
– /raw/erp/ for exports
– /curated/ for standardized datasets
– OCI Data Integration:
– Separate workspaces per environment (dev/test/prod)
– Projects per domain: sales, inventory, customer
– Pipelines orchestrating: ingest → transform → load ADW → validate
– Autonomous Data Warehouse:
– Staging schema + curated marts
– Governance:
– IAM policies per team
– Tags for cost allocation
– Audit reviews for production changes
Why Data Integration was chosen:
– Visual development accelerates delivery across multiple teams.
– OCI-native IAM and compartments align with enterprise governance.
– Managed runtime reduces operational burden versus self-hosted ETL servers.
Expected outcomes:
– Reduced pipeline failures via standardized orchestration and monitoring
– Faster onboarding for new subject areas
– Improved auditability and controlled promotion to production
– Predictable costs through tagging and run discipline
Startup/small-team example: SaaS product usage analytics
Problem: A startup collects daily usage exports (CSV) and wants to build KPIs in a warehouse without hiring a full-time platform engineer to manage ETL servers.
Proposed architecture:
– Object Storage bucket receives daily exports from the application.
– OCI Data Integration:
– Single workspace for staging + transformations
– A small set of data flows loading into ADW
– Autonomous Database (Always Free initially; scale up later):
– Simple schema for dashboards
Why Data Integration was chosen:
– Minimal infrastructure management.
– Quick to build and change transformations.
– Strong fit with the OCI-native services already used by the startup.
Expected outcomes:
– Working dashboards in days rather than weeks
– Low operational overhead
– Smooth scaling path as data volume grows
16. FAQ
1) Is Oracle Cloud Data Integration the same as Oracle Data Integrator (ODI)?
No. OCI Data Integration is a managed OCI service. ODI is a separate product (often on-prem or self-managed on cloud). Validate product scope in Oracle docs for your exact environment.
2) Is Data Integration the same as Oracle Integration?
No. Oracle Integration is an iPaaS focused on application integration and process automation. Data Integration is focused on data ingestion and transformation pipelines.
3) Do I need to run servers or clusters for Data Integration?
Typically no—Data Integration is managed. You design and run jobs; Oracle manages service infrastructure. Verify runtime characteristics and limits in official docs.
4) Is Data Integration regional?
Yes, workspaces are created in a specific OCI region. Keep sources/targets in-region when possible.
5) Can Data Integration load into Autonomous Data Warehouse?
Yes, ADW is a common target. You configure a connection and load into tables.
6) Can Data Integration read files from Object Storage?
Yes, Object Storage is a common source/landing zone for CSV and other file-based ingestion patterns (format support depends on connector features).
7) How do I schedule pipelines?
Scheduling options depend on current service features and your chosen approach. If native scheduling is limited for your needs, orchestrate runs externally (for example, with OCI services or CI/CD). Verify current scheduling features in docs.
8) How do I implement incremental loads?
Common patterns include:
– Partitioned loads by date
– Change-tracking columns in source tables
– Staging + merge/upsert in the database
Exact implementation depends on connectors and transformation features.
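A common watermark-based incremental read can be sketched in SQL — `LAST_MODIFIED`, `SOURCE_CUSTOMERS`, and the `ETL_WATERMARKS` control table are assumptions for illustration, not Data Integration features:

```sql
-- Read only rows changed since the last successful run.
SELECT c.*
FROM   source_customers c
WHERE  c.last_modified >
       (SELECT w.last_loaded_at
        FROM   etl_watermarks w
        WHERE  w.table_name = 'SOURCE_CUSTOMERS');

-- After a successful load, advance the watermark:
UPDATE etl_watermarks
SET    last_loaded_at = SYSTIMESTAMP
WHERE  table_name = 'SOURCE_CUSTOMERS';
```

Advancing the watermark only after a verified load keeps the pattern safe to re-run.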
9) Does Data Integration support CDC?
Data Integration is generally batch-oriented. For CDC replication, evaluate Oracle GoldenGate.
10) Can I deploy the same pipeline to dev/test/prod?
Yes, typically by using separate workspaces and consistent naming/parameters. Confirm current promotion/export/import capabilities in your tenancy.
11) How do I secure database credentials used by Data Integration?
Use dedicated DB users with least privileges and restrict access to connection assets. Use OCI Vault patterns where supported by your connector model.
12) What’s the best way to avoid duplicates when re-running?
Use idempotent patterns:
– Truncate-and-load for full refresh tables
– Partition overwrite
– Merge/upsert keyed by business key and effective dates
13) How do I monitor failures?
Use Data Integration run history and error details. For production, integrate job outcomes with your alerting process (Notifications/alarms patterns vary—verify available integrations).
14) How do I estimate costs?
Use the official pricing page and the OCI Cost Estimator:
– https://www.oracle.com/cloud/price-list/
– https://www.oracle.com/cloud/costestimator.html
Then validate by running a small workload and reviewing Billing → Cost Analysis.
15) What’s the easiest beginner lab?
Load a small CSV from Object Storage into an Always Free Autonomous Database table (the lab in this tutorial).
17. Top Online Resources to Learn Data Integration
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | OCI Data Integration Docs | Source of truth for concepts, features, limits, and how-to steps. https://docs.oracle.com/en-us/iaas/data-integration/home.htm |
| Official documentation | OCI IAM Docs | Required for correct policies, compartments, dynamic groups, and security model. https://docs.oracle.com/en-us/iaas/Content/Identity/home.htm |
| Official pricing | OCI Price List | Find Data Integration pricing dimensions for your region. https://www.oracle.com/cloud/price-list/ |
| Official calculator | OCI Cost Estimator | Model Data Integration + Object Storage + DB costs. https://www.oracle.com/cloud/costestimator.html |
| Official free tier | OCI Free Tier | Reduce lab cost; check what is Always Free. https://www.oracle.com/cloud/free/ |
| Architecture guidance | OCI Architecture Center | Reference architectures and best practices (search for data integration and analytics patterns). https://docs.oracle.com/en/solutions/ |
| Tutorials | OCI Tutorials (Oracle) | Step-by-step labs for OCI services; search for Data Integration. https://docs.oracle.com/en/learn/ |
| Videos | Oracle Cloud YouTube channel | Product overviews and demos; verify freshness by date. https://www.youtube.com/@OracleCloudInfrastructure |
| Samples | Oracle GitHub (official org) | Some OCI services provide samples; search repositories for “data integration”. https://github.com/oracle |
| Community (reputable) | Oracle Cloud Customer Connect | Practical discussions and Q&A; validate answers against the docs. https://cloudcustomerconnect.oracle.com/ |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | Engineers, DevOps, cloud practitioners | Cloud/DevOps training; may include OCI and integration fundamentals | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate IT professionals | DevOps, SCM, automation foundations that support integration operations | Check website | https://www.scmgalaxy.com/ |
| CloudOpsNow.in | Cloud ops and platform teams | Cloud operations practices, monitoring, governance | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, operations, reliability engineers | Reliability engineering practices for running production pipelines | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops and platform teams exploring AIOps | Monitoring/automation concepts that can complement data pipeline ops | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content (verify current offerings) | Beginners to intermediate | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training and coaching (verify OCI coverage) | DevOps engineers and students | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps/platform help (verify services) | Teams needing short-term guidance | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support/training style services (verify scope) | Ops teams and learners | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify service catalog) | Architecture, implementation support, operations setup | Landing zone setup, CI/CD for data pipelines, monitoring runbooks | https://cotocus.com/ |
| DevOpsSchool.com | Training + consulting (verify offerings) | Enablement + implementation guidance | Platform standardization, governance/tagging strategy, operational maturity | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify service catalog) | Delivery assistance and ops processes | Automation, infrastructure-as-code support, operational playbooks | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before Data Integration
- OCI fundamentals:
- Tenancy, compartments, IAM users/groups/policies
- VCN basics (subnets, routing, security lists/NSGs)
- Object Storage concepts (buckets, namespaces, lifecycle)
- Data fundamentals:
- Relational modeling, SQL basics
- CSV/file formats and schema basics
- ETL vs ELT patterns
- Security basics:
- Least privilege
- Secret management patterns
What to learn after Data Integration
- Advanced analytics architecture:
- Data lakehouse patterns on OCI
- Dimensional modeling (Kimball) and data vault concepts
- Orchestration and platform engineering:
- CI/CD for data pipelines (Git-based workflows)
- Testing strategies for data (unit tests, reconciliation)
- Specialized tools:
- Oracle GoldenGate for CDC
- OCI Data Flow for advanced Spark workloads
- Data governance tools (OCI Data Catalog) for lineage and discovery
Job roles that use it
- Data Engineer (OCI)
- Analytics Engineer
- Cloud Engineer / Platform Engineer (data platform)
- DevOps/SRE supporting data pipelines
- Solution Architect (data and analytics)
Certification path (if available)
Oracle certification offerings change over time. For current OCI certification paths, verify on Oracle University: https://education.oracle.com/
Look for OCI-focused tracks related to data management, integration, and analytics.
Project ideas for practice
- Build a raw-to-curated pipeline with partitioned loads (daily folders).
- Implement an idempotent load pattern (staging + merge).
- Add validation steps (row counts, null checks) and a failure notification pattern.
- Create separate dev/prod workspaces and practice promoting artifacts.
- Cost governance: tag everything and produce a weekly cost report by tag.
22. Glossary
- ADW (Autonomous Data Warehouse): Oracle’s managed analytics database service on OCI.
- ATP (Autonomous Transaction Processing): Oracle’s managed transactional database service on OCI.
- Bucket: Object Storage container for objects (files).
- Compartment: OCI logical container for resources and access control.
- Control plane: Management layer (create/update/run configuration).
- Data asset / connection: Definition of a source/target system and how to connect to it.
- Data flow: A transformation pipeline that reads, transforms, and writes data.
- Data plane/runtime: Execution layer that moves/transforms data.
- ETL/ELT: Extract-Transform-Load / Extract-Load-Transform integration patterns.
- IAM policy: Rules that define who can do what to which resources.
- Idempotent load: A load that can be rerun without creating duplicates or incorrect results.
- Object Storage namespace: Tenancy-level identifier used in Object Storage endpoints.
- Pipeline: Orchestration of tasks with dependencies and run order.
- Run / work request: Execution record of a task/pipeline.
- VCN: Virtual Cloud Network—your private network in OCI.
23. Summary
Oracle Cloud Data Integration is OCI’s managed service in the Integration category for building and operating batch-oriented data ingestion and transformation pipelines. It fits best when you want OCI-native governance (IAM/compartments/tags), a visual development experience, and a managed runtime—especially for common patterns like Object Storage to Autonomous Database loads.
Cost is primarily driven by job execution consumption (verify exact billing units on the official pricing page) and by dependent services like Object Storage and Autonomous Database. Security and compliance depend on least-privilege IAM policies, careful credential handling, network design (public vs private endpoints), and using Audit/run history for traceability.
Use Data Integration when you need governed ETL/ELT in OCI; consider GoldenGate for CDC and Data Flow for code-first Spark. Next, deepen skills by implementing idempotent patterns, environment promotion, and operational monitoring runbooks—then validate everything against the official docs: https://docs.oracle.com/en-us/iaas/data-integration/home.htm