Oracle Cloud Data Catalog Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Data Management

1. Introduction

Oracle Cloud Data Catalog is Oracle Cloud Infrastructure’s managed service for discovering, organizing, and governing metadata about the data your organization stores across databases, data lakes, and analytics platforms.

In simple terms: Data Catalog helps you answer “What data do we have, where is it, who owns it, and how should it be used?”—without moving or copying the underlying data.

Technically, Data Catalog is a metadata management and data discovery service. You create a catalog, register data sources (called data assets), run harvest jobs to extract technical metadata (schemas, tables, files, columns, etc.), and enrich that metadata with business context such as glossary terms, tags, and custom properties. Consumers then use search and browsing to find trusted datasets faster.

It solves common data-management problems such as:
– Lack of visibility into what data exists across teams and clouds
– Inconsistent definitions (e.g., “customer”, “revenue”, “active user”)
– Difficulty finding the right dataset and its owner/steward
– Governance needs for audits and compliance (knowing what exists, where, and how it’s classified)

Service name check: The service is commonly documented as Oracle Cloud Infrastructure (OCI) Data Catalog; this tutorial calls it Data Catalog throughout. If Oracle renames any UI labels or endpoints in your region, verify in official docs.

2. What is Data Catalog?

Official purpose

In Oracle Cloud’s Data Management portfolio, Data Catalog is intended to provide a centralized place to:
– Collect technical metadata from supported data sources
– Organize and curate that metadata for discoverability
– Add business context using glossary, tags, and properties
– Support governance by making ownership and definitions explicit

Core capabilities (what it does)

Data Catalog typically supports the following capability areas (exact source coverage depends on your region and connectors; verify supported data assets in official docs):
– Metadata harvesting from registered data assets
– Search and discovery across harvested entities (tables, views, files, columns, etc.)
– Business glossary for definitions and standard terminology
– Curation and enrichment via tags, custom properties, and relationships
– Access control using Oracle Cloud IAM and compartments

Major components (mental model)

Catalog: The top-level container for metadata. Created in a specific Oracle Cloud region and compartment.
Data asset: A registered data source (for example, Object Storage, Autonomous Database, or other supported sources). Think of it as “this is where metadata can be harvested from.”
Connection / credential: How Data Catalog authenticates to the data asset (varies by source type; may use IAM/service access for OCI-native services or credentials for databases).
Harvest: A job (manual or scheduled) that extracts metadata from a data asset into the catalog.
Entities: The harvested objects (schemas, tables, columns, files, etc.) represented in the catalog.
Glossary / terms: Business definitions linked to harvested entities to clarify meaning and intended use.
Tags and custom properties: Lightweight governance controls (classification, sensitivity, owner, SLA tier, domain, etc.)
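
The containment relationships in this mental model can be sketched as plain Python dataclasses. This is only an illustration of how the pieces nest; the class and field names are hypothetical and do not mirror the OCI API or SDK models.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A column or field inside a harvested entity."""
    name: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Entity:
    """A harvested object such as a table or file."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)
    glossary_terms: set[str] = field(default_factory=set)
    custom_properties: dict[str, str] = field(default_factory=dict)


@dataclass
class DataAsset:
    """A registered source that harvest jobs read metadata from."""
    name: str
    asset_type: str
    entities: list[Entity] = field(default_factory=list)


@dataclass
class Catalog:
    """Top-level container, scoped to one region and compartment."""
    name: str
    region: str
    compartment: str
    data_assets: list[DataAsset] = field(default_factory=list)
    glossary: dict[str, str] = field(default_factory=dict)  # term -> definition


# Illustrative values only; in the real service a harvest populates entities.
customers = Entity(
    name="customers.csv",
    attributes=[Attribute("customer_id"), Attribute("email", tags={"PII"})],
    tags={"Domain:Lab"},
    glossary_terms={"Customer"},
    custom_properties={"Owner": "data-platform"},
)
asset = DataAsset(name="lab-os-asset", asset_type="object_storage", entities=[customers])
catalog = Catalog(
    name="lab-catalog",
    region="us-ashburn-1",
    compartment="lab-datacatalog",
    data_assets=[asset],
    glossary={"Customer": "A person or organization that has signed up for our service."},
)
```

Note that nothing in the model holds data rows: every object describes the data, which matches the service's role as a metadata system.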

Service type

Managed Oracle Cloud service (control plane managed by Oracle)
Metadata system (stores metadata and governance context, not the underlying data)

Scope: regional vs global

Data Catalog is created in a specific Oracle Cloud region and a compartment within your tenancy. You can catalog sources across compartments if IAM policies allow it. Cross-region cataloging patterns exist, but the catalog itself is regional; plan accordingly and verify current cross-region support in official docs.

How it fits into the Oracle Cloud ecosystem

Data Catalog sits at the center of a typical Oracle Cloud Data Management and analytics environment:
– Data producers store data in Object Storage, Autonomous Database, and other platforms.
– Data engineers transform data using services such as OCI Data Integration, OCI Data Flow, and other processing engines.
– Data Catalog provides the “system of record” for metadata, helping analysts and engineers find and interpret datasets.
– Security and governance rely on OCI IAM, Audit, and tagging strategies.

3. Why use Data Catalog?

Business reasons

Faster time-to-data: Teams spend less time searching and re-creating datasets.
Better decision-making: Shared definitions reduce reporting conflicts.
Reduced risk: Easier to identify sensitive data locations for compliance initiatives.
Increased reuse: Analysts find trusted datasets instead of building shadow copies.

Technical reasons

Central metadata index for multiple sources
Searchable inventory of tables/files/columns and their attributes
Standardization via glossary and curated metadata
Extensibility through tags and custom properties

Operational reasons

Repeatable harvesting (manual/scheduled) to keep metadata current
Ownership and stewardship captured alongside metadata
Better handoffs between engineering, analytics, and governance teams

Security/compliance reasons

Supports governance patterns like:
“Know where PII might exist”
“Who owns this dataset?”
“What’s the approved definition of a metric?”
Integrates with IAM for access control and with auditing capabilities in Oracle Cloud.

Scalability/performance reasons

Data Catalog is designed to handle the metadata volumes and user access patterns typical of medium-to-large enterprises. Because the underlying data stays in place, you manage only metadata, which is far lighter than copying datasets.

When teams should choose Data Catalog

Choose Data Catalog when:
– You have multiple data sources and need a single discovery experience
– You need a business glossary tied to real datasets
– You want to operationalize data governance without building a custom metadata system
– You want an Oracle-managed metadata catalog integrated with Oracle Cloud IAM

When teams should not choose it

Data Catalog may not be the right fit if:
– You only have one small data store and discovery is trivial
– You need a full data-quality rules engine or master data management (a different tool category)
– You require capabilities not currently supported by Data Catalog connectors in your region (verify first)
– You want a fully open-source/self-managed solution with deep customization and are willing to operate it

4. Where is Data Catalog used?

Industries

Financial services (regulatory reporting, audit readiness)
Healthcare/life sciences (data sensitivity classification)
Retail/e-commerce (product/customer analytics definitions)
Telecom (large-scale data platforms with many producers)
Government/public sector (data inventories and stewardship)
SaaS companies (internal analytics governance)

Team types

Data platform teams
Data engineering and ETL teams
Analytics engineering teams
BI teams and data analysts
Security and compliance teams
Enterprise architecture and governance teams

Workloads

Data lake discovery (Object Storage)
Data warehouse cataloging (Autonomous Data Warehouse and other supported DBs)
Cross-domain metrics standardization (glossary-driven analytics)
Migration governance (inventory before moving data)
Audit response (identify datasets and owners)

Architectures

Central lakehouse with multiple pipelines
Multi-compartment data mesh-like layouts (domain-based compartments)
Hybrid environments (OCI plus external sources where supported; verify connector coverage)

Real-world deployment contexts

Production: catalog is used by analysts and governance daily; harvesting is scheduled and monitored.
Dev/test: used to validate metadata extraction and glossary structure before scaling.

5. Top Use Cases and Scenarios

Below are realistic scenarios where Oracle Cloud Data Catalog commonly fits.

1) Data lake discovery for Object Storage

Problem: Hundreds of buckets and folders; nobody knows what’s inside.
Why it fits: Data Catalog can harvest and index metadata for supported Object Storage structures (verify exact capabilities for file formats and depth).
Scenario: A data platform team catalogs curated datasets in Object Storage so analysts can search for “orders” and find the canonical dataset plus owner.

2) Cataloging a data warehouse for self-service analytics

Problem: Analysts can query the warehouse but don’t know table meanings.
Why it fits: Harvest tables/columns and enrich them with glossary terms and curated descriptions.
Scenario: Finance defines “Net Revenue” as a glossary term and links it to the correct column(s) in the warehouse.

3) Standardizing business definitions across departments

Problem: “Customer” means different things in Sales vs Support.
Why it fits: Glossary provides a shared vocabulary with stewarded definitions.
Scenario: Governance team defines “Customer (Bill-to)” and “Customer (User)” as separate terms and maps datasets accordingly.

4) Ownership and stewardship mapping (operational governance)

Problem: No one knows who to contact about a dataset.
Why it fits: Use custom properties/tags to record owner, steward, support channel, SLA tier.
Scenario: Every curated dataset includes Owner, Steward, SlackChannel, and RefreshFrequency.

5) Sensitive data discovery support (classification workflow)

Problem: Compliance asks where PII exists; teams respond manually.
Why it fits: Tag entities/attributes with classifications; create views of sensitive datasets.
Scenario: A quarterly review exports a list of entities tagged as PII for follow-up controls (actual export/reporting methods depend on UI/API; verify).

6) Pre-migration inventory and rationalization

Problem: Before migrating to OCI, you need an inventory of sources and schemas.
Why it fits: Data Catalog becomes a landing place for harvested metadata, highlighting duplicates and unused datasets.
Scenario: During warehouse modernization, teams catalog legacy schemas, then mark deprecated datasets with tags.

7) Data product catalog for a platform team (data mesh-ish)

Problem: Domain teams publish data products but discovery is fragmented.
Why it fits: Central catalog with domain-based tags and glossary.
Scenario: Marketing and Supply Chain publish certified datasets; Data Catalog becomes the discovery portal.

8) Faster onboarding for new engineers and analysts

Problem: New hires take weeks to learn the data landscape.
Why it fits: Search, browse, and glossary shorten ramp-up time.
Scenario: A new analyst searches “returns” and quickly finds the curated returns dataset and its definition.

9) Pipeline change impact analysis (metadata-based)

Problem: Schema changes break dashboards; teams don’t see dependencies.
Why it fits: Metadata and relationships can help document dependencies; if lineage integrations are available in your setup, it’s even stronger (verify lineage support/integration).
Scenario: Data engineers annotate downstream consumers in custom properties and use consistent tags for impacted domains.

10) Audit response and evidence collection

Problem: Auditors ask for data inventory, ownership, and definitions.
Why it fits: Catalog provides centralized metadata, ownership, and governance artifacts.
Scenario: Security exports a list of datasets tagged Confidential and shows the steward approvals recorded in its approval process (the process tooling is external; the catalog supplies the metadata).

11) Shared KPI metric governance for BI

Problem: Multiple dashboards calculate metrics differently.
Why it fits: Glossary defines metrics and points to canonical datasets/columns.
Scenario: “Active Subscriber” is defined once, used across BI reports.

12) Cross-team dataset certification

Problem: Users can’t tell trusted datasets from experimental ones.
Why it fits: Tag datasets as Certified, Bronze/Silver/Gold, or Trusted.
Scenario: Platform team certifies “Gold” tables after validation; analysts filter search to only certified assets.

6. Core Features

Feature availability can vary by region, permissions, and connector type. Confirm exact UI labels and supported source types in the official documentation.

1) Catalogs (metadata containers)

What it does: Provides a top-level container to store metadata, glossary, tags, and enrichment.
Why it matters: Separates environments or domains (e.g., “Prod Catalog” vs “Sandbox Catalog”).
Practical benefit: Cleaner governance boundaries and access control.
Caveats: Catalog is regional; plan for multi-region architectures.

2) Data assets (source registration)

What it does: Registers a data source for harvesting.
Why it matters: Establishes the “where” for metadata.
Practical benefit: Standardized onboarding process for new sources.
Caveats: Each asset type has distinct connection requirements.

3) Harvesting (metadata extraction jobs)

What it does: Extracts and updates technical metadata from a data asset into the catalog.
Why it matters: Keeps metadata current as schemas/files evolve.
Practical benefit: Repeatable scheduled refresh reduces manual documentation.
Caveats: Requires correct IAM/credentials and network access; harvesting can fail if policies are missing.

4) Search and browse

What it does: Lets users find entities using keywords, filters, and navigation.
Why it matters: Discovery is the core value of a catalog.
Practical benefit: Reduces tribal knowledge dependency.
Caveats: Search quality depends on metadata quality; add descriptions, glossary terms, tags.

5) Business glossary

What it does: Stores business terms, definitions, and associations to technical assets.
Why it matters: Aligns teams on consistent definitions.
Practical benefit: BI and analytics become more reliable.
Caveats: Glossary governance is a people/process challenge; needs steward ownership.

6) Tags (classification and organization)

What it does: Apply labels to assets/entities/attributes.
Why it matters: Enables filtering, governance, and lifecycle management.
Practical benefit: Common tags: PII, Confidential, Certified, Domain:Marketing.
Caveats: Without naming conventions, tags become messy and duplicated.

7) Custom properties (metadata enrichment)

What it does: Adds organization-specific fields (owner, SLA, refresh frequency, cost center).
Why it matters: Most governance needs are organization-specific.
Practical benefit: Convert tribal knowledge into structured metadata.
Caveats: Over-customization can reduce usability; keep a controlled list.
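
One way to keep custom properties controlled is to check each entity's properties against a small allow-list before publishing. The sketch below is a standalone illustration, not a Data Catalog feature; the property names (Owner, Steward, RefreshFrequency, SlackChannel, SLATier, CostCenter) are hypothetical examples of an organization's own standard.

```python
# Hypothetical controlled list; adapt to your own governance standard.
REQUIRED = {"Owner", "Steward", "RefreshFrequency"}
ALLOWED = REQUIRED | {"SlackChannel", "SLATier", "CostCenter"}


def check_properties(props: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the entity passes."""
    present = set(props)
    problems = []
    for key in sorted(REQUIRED - present):
        problems.append(f"missing required property: {key}")
    for key in sorted(present - ALLOWED):
        problems.append(f"property not in controlled list: {key}")
    for key in sorted(present & ALLOWED):
        if not props[key].strip():
            problems.append(f"empty value for: {key}")
    return problems


if __name__ == "__main__":
    sample = {"Owner": "finance-data", "Steward": "", "Colour": "blue"}
    for problem in check_properties(sample):
        print(problem)
```

A check like this can run in a review step so that stewards see gaps before consumers do.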

8) IAM integration (access control)

What it does: Uses Oracle Cloud IAM policies and compartments to control who can manage catalogs, assets, harvest, and metadata.
Why it matters: Governance requires role-based access.
Practical benefit: Separate duties between admins, stewards, and consumers.
Caveats: Harvesting access to source systems often requires additional policies/credentials.

9) Auditability (via Oracle Cloud auditing capabilities)

What it does: Administrative actions can be audited via OCI Audit (exact event coverage: verify).
Why it matters: Compliance needs traceability.
Practical benefit: Investigate who changed glossary definitions or asset registrations.
Caveats: You must enable and retain logs per policy and compliance requirements.

10) API/SDK/CLI support (automation)

What it does: Enables automation of catalog lifecycle, asset creation, harvesting, and metadata operations via APIs (verify the set of operations you need).
Why it matters: Scales onboarding and governance workflows.
Practical benefit: “Catalog as code” patterns for enterprise consistency.
Caveats: IAM and rate limits apply; build idempotent automation.

7. Architecture and How It Works

High-level architecture

Data Catalog sits between:
– Metadata producers (data sources such as Object Storage and databases)
– Metadata consumers (analysts, engineers, governance users)
– Governance controls (IAM, tagging standards, auditing)

Key principle: Data Catalog stores metadata, not the data itself. Harvesting reads source metadata and indexes it in the catalog.

Request/data/control flow (typical)

An administrator creates a catalog in a compartment and region.
They register a data asset and configure access (IAM policies and/or credentials).
They run a harvest job:
– The service connects to the source
– Reads technical metadata (schemas, tables, files, columns)
– Stores metadata objects in the catalog
Stewards enrich metadata with glossary terms, tags, and custom properties.
Consumers search/browse to find datasets and interpret them correctly.
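
The harvest, enrich, and search steps above can be imitated in a few lines of plain Python. This is a toy simulation under stated assumptions, not how the service is implemented: the "source" is an in-memory dictionary of CSV text, and the function names are hypothetical.

```python
import csv
import io


def harvest(objects: dict[str, str]) -> list[dict]:
    """Build one metadata record per object; data rows are never stored."""
    entities = []
    for path, content in objects.items():
        header = next(csv.reader(io.StringIO(content)), [])
        entities.append({"name": path, "columns": header, "tags": set(), "terms": set()})
    return entities


def enrich(entity: dict, tags=(), terms=()) -> None:
    """Attach business context (tags and glossary terms) to an entity."""
    entity["tags"].update(tags)
    entity["terms"].update(terms)


def search(entities: list[dict], keyword: str) -> list[dict]:
    """Case-insensitive match on name, column names, tags, and terms."""
    kw = keyword.lower()
    return [
        e for e in entities
        if kw in e["name"].lower()
        or any(kw in c.lower() for c in e["columns"])
        or any(kw in t.lower() for t in e["tags"] | e["terms"])
    ]


bucket = {"customers.csv": "customer_id,full_name,email\n1001,Alice Johnson,alice@example.com\n"}
entities = harvest(bucket)
enrich(entities[0], tags={"Domain:Lab"}, terms={"Customer"})
hits = search(entities, "email")
```

The record kept for `customers.csv` contains only its name and column names, which reflects the key principle that the catalog stores metadata rather than the data itself.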

Integrations with related Oracle Cloud services (common patterns)

Object Storage: catalog data lake buckets and curated datasets.
Autonomous Database / Autonomous Data Warehouse: catalog tables/views (connector support varies; verify).
OCI Vault: store database credentials/secrets (pattern depends on connector; verify).
OCI Events + Notifications: notify teams when harvest jobs fail or complete (pattern depends on available events; verify).
OCI Logging / Audit: operational traceability and compliance evidence.
OCI Data Integration / Data Flow: data pipelines; catalog provides metadata context. (Lineage availability depends on integration; verify.)

Dependency services

OCI IAM: policies, compartments, groups (mandatory)
Networking (VCN): required when harvesting private data sources (if supported via private endpoints; verify)
Source services: Object Storage, databases, etc.

Security/authentication model

User access to Data Catalog is governed by OCI IAM.
Service access (Data Catalog reading metadata from sources) typically requires:
OCI-native access policies for OCI resources (Object Storage, etc.)
Credentials for database sources (stored securely; exact method depends on connector—verify in docs)
Prefer least privilege: only allow read access required for metadata extraction.

Networking model

Access to Data Catalog is via Oracle Cloud endpoints in the region.
Harvesting network path depends on the source:
For OCI public endpoints (like Object Storage), IAM permission is often the primary gate.
For private databases, you may need private connectivity (VCN/private endpoint patterns—verify what Data Catalog supports in your region).

Monitoring/logging/governance considerations

Treat harvesting as an operational workload:
Schedule harvest windows
Monitor job outcomes
Track changes to glossary and tags
Use IAM and compartments to separate:
Platform admins
Data stewards
Read-only consumers
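
Monitoring job outcomes often comes down to polling a status until it reaches a terminal state or a timeout expires. The sketch below shows that loop in isolation. The `get_status` callable is a hypothetical wrapper around whichever API call returns the job state, and the state names in `TERMINAL` are assumptions to check against the official API reference.

```python
import time
from typing import Callable

# Assumed terminal state names; verify the real lifecycle states in the docs.
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED"}


def wait_for_job(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reaches a terminal state; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} seconds")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated job that finishes on the third poll; no real waiting.
    states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
    print(wait_for_job(lambda: next(states), sleep=lambda _: None))
```

Injecting `sleep` and `clock` keeps the loop easy to test and lets a scheduler replace the waiting strategy without touching the logic.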

Simple architecture diagram (Mermaid)

flowchart LR
  U[User: Admin/Steward/Analyst] -->|Console/API| DC[Oracle Cloud Data Catalog]
  DC -->|Harvest metadata| OS[OCI Object Storage Bucket]
  DC --> M[("Metadata Index<br/>Entities/Attributes/Tags/Glossary")]
  U -->|Search/Browse| DC

Production-style architecture diagram (Mermaid)

flowchart TB
  subgraph Tenancy[Oracle Cloud Tenancy]
    subgraph IAM[OCI IAM]
      G[Groups/Roles]
      P[Policies]
    end

    subgraph Region["Region (e.g., us-ashburn-1)"]
      DC["Data Catalog (Regional)"]
      AUD[Audit]
      LOG[Logging/Monitoring]
      EVT[Events/Notifications]
      VAULT["OCI Vault (Secrets)"]

      subgraph DataLake[Data Lake Compartment]
        OS1[Object Storage: Raw Bucket]
        OS2[Object Storage: Curated Bucket]
      end

      subgraph Warehouse[Analytics Compartment]
        ADB[Autonomous Database / ADW]
      end

      subgraph Network["VCN (if needed)"]
        PE["Private Connectivity / Endpoint<br/>(verify Data Catalog support)"]
      end
    end
  end

  G --> P
  U1[Admins/Stewards/Consumers] -->|IAM AuthZ| DC
  DC -->|Harvest| OS2
  DC -->|"Harvest (if supported)"| ADB
  DC -->|"Read secrets (pattern)"| VAULT
  DC --> AUD
  DC --> LOG
  DC --> EVT
  ADB --- PE

8. Prerequisites

Tenancy and billing

An active Oracle Cloud tenancy
Ability to create resources in the chosen region and compartment
Billing/credits as required by your account (Data Catalog may be metered; verify pricing and free tier eligibility)

Permissions / IAM roles

You need permissions to:
– Create and manage Data Catalog resources in a compartment
– Create and manage Object Storage resources for the lab (bucket + objects)
– Grant Data Catalog (as a service) permission to read metadata from the target source (policy requirements vary)

Because IAM policies are security-critical and can change, use the official doc patterns for:
– Data Catalog administrators
– Data Catalog users
– Service access to Object Storage or databases
Verify in official docs: https://docs.oracle.com/en-us/iaas/data-catalog/home.htm

Tools

Oracle Cloud Console access (browser)
Optional:
OCI CLI (if you want automation): https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
SDKs (Python/Java/Go) if integrating with pipelines (verify Data Catalog SDK coverage)

Region availability

Data Catalog is not necessarily available in every region. Confirm availability for your region in the Console or on the official service availability pages.

Quotas/limits

Catalog count, harvest frequency, and metadata volume may be governed by service limits.
Check Service Limits in the OCI Console for Data Catalog and related services.

Prerequisite services for this lab

Object Storage bucket in the same tenancy (and ideally same region)
A compartment to contain the lab resources

9. Pricing / Cost

Pricing changes over time and can be region-dependent. Do not rely on blog posts for exact numbers.

Current pricing model (how to confirm)

Oracle publishes OCI pricing on the official price list and pricing pages. Confirm Data Catalog pricing here:
– OCI Pricing / Price List: https://www.oracle.com/cloud/price-list/
– OCI Cost Estimator (calculator): https://www.oracle.com/cloud/costestimator.html (if redirected, use the OCI cost estimator from the Oracle Cloud site)

Look for Data Management → Data Catalog in the price list. If the pricing page breaks out billable dimensions (for example, per catalog, per metadata volume, per user, per harvest, etc.), treat that as the source of truth.

Typical pricing dimensions to look for (verify)

Depending on Oracle’s current SKU model, pricing can be based on items such as:
– Number of catalogs or capacity units
– Amount of metadata stored/indexed
– Number of users or requests
– Harvest operations or scheduling frequency

Because these dimensions can change, verify in the official pricing entry for Data Catalog.

Cost drivers (direct and indirect)

Direct or near-direct drivers:
– Number of catalogs (dev/test/prod separation can multiply costs)
– Number of data assets and the metadata volume harvested
– Frequency of harvest jobs (daily vs hourly)
– Number of active users (if user-based pricing applies in your current SKU model)

Indirect drivers:
– Object Storage cost for storing sample/curated datasets (your underlying data)
– Network egress (generally avoid cross-region data access patterns if they cause additional cost)
– Operational overhead: governance workflows and stewardship time
– If private connectivity is required for sources, networking components may have cost

Network/data transfer implications

Harvesting reads metadata; for OCI-native services in the same region, data transfer charges are typically lower than cross-region or internet egress scenarios.
If cataloging sources across regions or through complex network paths, validate whether any data transfer fees apply.

How to optimize cost

Start with one catalog and a small number of assets; expand after standards are proven.
Harvest only what you need (avoid cataloging every raw bucket if it’s not useful).
Use tags/properties to identify “curated” vs “raw” datasets and prioritize harvesting curated zones.
Schedule harvesting at a reasonable cadence (nightly for many warehouses is enough; hourly harvesting can increase cost and operational noise).
Enforce lifecycle: retire/deprecate obsolete assets rather than leaving them searchable forever.
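
To see why cadence matters, it helps to count harvest runs rather than guess. The arithmetic below compares nightly and hourly schedules over a 30-day month. It deliberately uses no prices, since those must come from the official price list; the asset count of 3 is an illustrative assumption.

```python
def runs_per_month(interval_hours: float, days: int = 30) -> int:
    """Number of scheduled harvest runs in a month for one data asset."""
    return int(days * 24 / interval_hours)


ASSETS = 3  # illustrative assumption: three registered data assets

nightly_runs = runs_per_month(24) * ASSETS   # 30 runs per asset
hourly_runs = runs_per_month(1) * ASSETS     # 720 runs per asset
ratio = hourly_runs / nightly_runs

print(f"nightly: {nightly_runs} runs/month")  # nightly: 90 runs/month
print(f"hourly:  {hourly_runs} runs/month")   # hourly:  2160 runs/month
print(f"hourly is {ratio:.0f}x nightly")      # hourly is 24x nightly
```

Whatever the billable dimension turns out to be, an hourly schedule produces 24 times as many job runs to pay for, monitor, and troubleshoot.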

Example low-cost starter estimate (conceptual)

A low-cost starter typically looks like:
– 1 catalog (single region)
– 1–3 data assets (Object Storage curated bucket + one warehouse)
– Harvest run manually during setup, then scheduled nightly
– Limited steward group (2–5 users)

Use the OCI Cost Estimator and the Data Catalog pricing entry to compute your estimate. Do not assume “free” unless the official pricing explicitly states a free tier for your tenancy/region.

Example production cost considerations

In production, the cost shape is driven by:
– Many assets across domains (Marketing, Finance, Ops)
– Higher metadata object counts (tables, columns, partitions, files)
– Frequent harvest schedules and governance workflows
– Potential multi-region requirements (which can imply multiple catalogs)

10. Step-by-Step Hands-On Tutorial

Objective

Create an Oracle Cloud Data Catalog, catalog an Object Storage bucket by harvesting metadata, and enrich one discovered dataset with tags and a glossary term—all using a safe, beginner-friendly workflow.

Lab Overview

You will:
1. Create a compartment and an Object Storage bucket with a small sample dataset.
2. Create a Data Catalog in Oracle Cloud.
3. Configure IAM access so Data Catalog can read Object Storage metadata (policy statements vary; you will validate using official docs).
4. Register the bucket as a data asset and run a harvest job.
5. Search for the harvested dataset and enrich it with tags and glossary.

Expected end state:
– A catalog exists and contains harvested metadata for a bucket/object path.
– You can search and find an entity representing your dataset.
– The entity is tagged and linked to a glossary term.

Step 1: Create a compartment for the lab

In the Oracle Cloud Console, open the navigation menu.
Go to Identity & Security → Compartments.
Click Create Compartment.
Name it: lab-datacatalog
(Optional) Description: Hands-on lab for Data Catalog tutorial
Click Create.

Expected outcome: A new compartment appears and becomes available within seconds (sometimes minutes).

Step 2: Create an Object Storage bucket and upload a sample file

Go to Storage → Object Storage & Archive Storage → Buckets.
Ensure you’re in the correct region and compartment (lab-datacatalog).
Click Create Bucket.
Bucket name: lab-dc-bucket-<unique-suffix>
Defaults are usually fine for a lab. Click Create.

Now create a small CSV file locally named customers.csv:

customer_id,full_name,email,country,signup_date
1001,Alice Johnson,alice@example.com,US,2024-01-12
1002,Bob Smith,bob@example.com,GB,2024-02-03
1003,Chandra Patel,chandra@example.com,IN,2024-02-19
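
If you prefer to generate the file rather than type it, a short script along these lines writes the same rows. It is a convenience sketch using only the Python standard library; the function name `write_sample` is arbitrary.

```python
import csv
from pathlib import Path

# Same header and rows as the sample shown in this step.
ROWS = [
    ("customer_id", "full_name", "email", "country", "signup_date"),
    (1001, "Alice Johnson", "alice@example.com", "US", "2024-01-12"),
    (1002, "Bob Smith", "bob@example.com", "GB", "2024-02-03"),
    (1003, "Chandra Patel", "chandra@example.com", "IN", "2024-02-19"),
]


def write_sample(path: str = "customers.csv") -> Path:
    """Write the sample dataset as UTF-8 CSV and return the file path."""
    target = Path(path)
    with target.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, lineterminator="\n").writerows(ROWS)
    return target


if __name__ == "__main__":
    print(f"wrote {write_sample()}")
```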

Upload it:
1. Open your bucket.
2. Click Upload.
3. Select customers.csv.
4. Click Upload.

Expected outcome: The bucket contains customers.csv.

Verification: You can click the object name and view details (size, last modified).
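
The upload can also be scripted instead of done in the Console. The helper below is a sketch that assumes `client` behaves like the OCI Python SDK's `ObjectStorageClient`, whose `put_object` takes the namespace, bucket name, object name, and a body. Creating that client (config file, authentication) and looking up your namespace are left out; verify both in the SDK documentation before relying on this.

```python
from pathlib import Path


def upload_file(client, namespace: str, bucket: str, path: str) -> str:
    """Upload a local file under its own file name and return the object name.

    `client` is assumed to expose put_object(namespace, bucket, object_name, body),
    as the OCI Python SDK ObjectStorageClient does.
    """
    source = Path(path)
    with source.open("rb") as body:
        client.put_object(namespace, bucket, source.name, body)
    return source.name
```

Passing the client in as a parameter keeps the helper free of credentials and makes it easy to exercise with a stand-in object before pointing it at a real bucket.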

Step 3: Create (or confirm) IAM permissions for Data Catalog and for your user

3A) Ensure your user/group can manage Data Catalog

If you’re in a training tenancy you might already be an admin. If not, you need IAM policies allowing your group to manage Data Catalog in the compartment.

Because policy naming and required verbs must be exact, use the official documentation’s IAM policy examples for Data Catalog:
– Docs home (navigate to IAM/policies section): https://docs.oracle.com/en-us/iaas/data-catalog/home.htm

Create policies in: Identity & Security → Policies

Common pattern (example only; verify exact service names, resource-types, and verbs in docs):
– Allow a group to manage Data Catalog resources in compartment lab-datacatalog.

3B) Allow Data Catalog to read Object Storage metadata

Harvesting needs permission to read Object Storage (at least bucket/object metadata, possibly object listings).

Use the official Data Catalog documentation for Object Storage harvesting IAM policy statements. Create them in a policy attached to the compartment containing the bucket.

Important: Do not over-permission. Grant read-only access and scope it to the lab compartment where possible.
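
To make the shape of these policies concrete, the helper below renders two statements from parameters: one for the admin group (3A) and one for harvesting (3B). Treat every name in it as an assumption. The resource-type names `data-catalog-family` and `object-family`, the use of a dynamic group for the catalog, and the group names are illustrative and must be checked against the official IAM policy examples before use.

```python
def lab_policy_statements(admin_group: str, dynamic_group: str, compartment: str) -> list[str]:
    """Render candidate policy statements scoped to a single compartment.

    Assumed resource types (verify in official docs):
      data-catalog-family - Data Catalog resources
      object-family       - Object Storage buckets and objects
    """
    return [
        # 3A: humans who administer the catalog
        f"Allow group {admin_group} to manage data-catalog-family in compartment {compartment}",
        # 3B: read-only access for harvesting, never "manage"
        f"Allow dynamic-group {dynamic_group} to read object-family in compartment {compartment}",
    ]


if __name__ == "__main__":
    # Hypothetical group names for the lab.
    for statement in lab_policy_statements("lab-dc-admins", "lab-dc-harvest-dg", "lab-datacatalog"):
        print(statement)
```

Generating statements from one function keeps the compartment scope consistent and makes the read-only intent of the harvesting statement easy to review.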

Expected outcome: Policies exist and are attached to the correct compartment.

Verification: IAM policy changes can take a short time to propagate. If harvest fails with authorization errors, wait a few minutes and retry after confirming policies.

Step 4: Create a Data Catalog

Go to Analytics & AI (or search for Data Catalog in the console search bar).
Open Data Catalog.
Select compartment: lab-datacatalog.
Click Create Catalog.
Name: lab-catalog
(Optional) Description: Catalog for Object Storage metadata harvesting lab
Create.

Expected outcome: Catalog is created and appears as Active.

Verification: Open the catalog and confirm you can see catalog details and navigation items (Data Assets, Glossary, etc.).

Step 5: Register Object Storage as a Data Asset

Inside your catalog, go to Data Assets.
Click Create Data Asset.
Choose the data asset type for Object Storage (label can vary; select the OCI Object Storage option).
Provide:
– Name: lab-os-asset
– Description: Object Storage bucket for lab dataset
– Bucket details: select/enter your bucket and namespace as required by the UI
Save/Create.

Expected outcome: A data asset representing your bucket exists in the catalog.

Verification: The data asset appears in the list and shows connection details (where configured).

Step 6: Run a harvest job to ingest metadata

Open the data asset lab-os-asset.
Locate Harvest (or “Harvesting”) in the asset actions.
Create a harvest job (or run a harvest immediately):
– Harvest type: choose the default “metadata harvest” option shown
– Scope: optionally limit to a prefix/path if your UI supports it (useful for large buckets)
Start the harvest.

Expected outcome: Harvest job starts and then completes successfully.

Verification:
– Check harvest job status: Succeeded/Completed.
– If the UI provides a job run log, review it for counts of discovered entities.

Step 7: Search for the harvested dataset and enrich metadata

7A) Find the dataset

In the catalog, use Search.
Search for: customers (or customers.csv depending on how the entity is represented).
Open the entity representing your dataset.

Expected outcome: You can view metadata such as name, location/path, and possibly inferred schema/columns (exact metadata depends on connector support).

7B) Add tags

In the entity details, find Tags (or classification).
Add tags such as:
– Domain:Lab
– Sensitivity:Internal
– Lifecycle:Demo

Expected outcome: Tags appear on the entity and become searchable filters.

7C) Create a glossary term and link it

Go to Glossary.
Create a term:
– Term: Customer
– Definition: A person or organization that has signed up for our service.
Return to the customers entity and associate/link the glossary term (UI wording varies).

Expected outcome: The entity now shows an associated glossary term, improving business clarity.

Validation

Use this checklist:

Catalog exists and is Active.
Data asset exists for Object Storage bucket.
Harvest job succeeded.
Searching for customers returns at least one entity.
Entity shows your tags and linked glossary term.

If any item fails, use the troubleshooting section below.

Troubleshooting

Issue: Harvest fails with authorization/403 errors

Cause: Missing or incorrect IAM policy allowing Data Catalog service to read Object Storage.
Fix:
Re-check the official Data Catalog Object Storage harvesting policy examples.
Confirm policy is in the correct compartment (where the bucket resides).
Wait for IAM propagation (a few minutes) and retry harvest.

Issue: Bucket or namespace not found

Cause: Wrong region/compartment selected, or incorrect namespace.
Fix: Confirm region at the top right and the compartment selector in Object Storage and Data Catalog.

Issue: No entities found after harvest

Cause: Harvest scope/prefix excludes the object, or connector doesn’t infer metadata from the file type.
Fix:
Confirm customers.csv exists in the bucket.
Re-run harvest without prefix filters.
Check whether file-level metadata vs schema inference is supported for your connector/version (verify in docs).
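To see why a prefix filter can hide customers.csv, it helps to reason about the filter as a plain string-prefix match on object names. The sketch below assumes that behavior (verify how your harvest scope is actually evaluated); objects_in_scope and the curated/ prefix are illustrative names, not part of Data Catalog.

```python
def objects_in_scope(object_names, prefix=""):
    """Return the object names a prefix-scoped harvest would consider.

    Assumption: the scope is a simple "name starts with prefix" match.
    """
    return [name for name in object_names if name.startswith(prefix)]


bucket_objects = ["customers.csv", "curated/orders.csv"]

# With a prefix, the top-level customers.csv is excluded.
print(objects_in_scope(bucket_objects, prefix="curated/"))

# Without a prefix, every object is in scope.
print(objects_in_scope(bucket_objects))
```

If the object you expect is missing from the first result but present in the second, the prefix filter is the likely cause.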

Issue: Can’t see Data Catalog in console

Cause: Service not enabled/available in your region, or you lack IAM permissions.
Fix: Switch regions and confirm service availability; request access from your tenancy administrator.

Cleanup

To avoid ongoing costs and clutter, remove lab resources:

In Data Catalog: – Delete harvest jobs (if required by the UI) – Delete the data asset lab-os-asset – Delete the catalog lab-catalog
In Object Storage: – Delete customers.csv – Delete the bucket lab-dc-bucket-...
In IAM: – Remove lab-specific policies if they were created only for this exercise
Delete the compartment lab-datacatalog (only after all resources inside are deleted)

11. Best Practices

Architecture best practices

Start with a curated zone: Catalog your “silver/gold” datasets before raw ingestion zones.
Design for domains: Use consistent tagging like Domain:<name> and map assets to domain ownership.
Separate environments: Use separate catalogs or clear naming (and separate compartments) for dev/test/prod depending on governance needs and pricing.

IAM/security best practices

Use least privilege for both:
Human users (stewards vs consumers)
Service access for harvesting (read-only where possible)
Keep catalog administration limited to a small group.
Use compartments to enforce boundaries between domains or business units.

Cost best practices

Avoid harvesting everything. Harvesting should be intentional and tied to discovery value.
Set harvest schedules carefully; nightly is often enough.
Periodically deprecate/remove assets no longer needed.

Performance best practices

Use a naming standard for assets and entities to improve search quality.
Enforce required metadata fields (owner, description) through governance processes.
Keep tags controlled (avoid dozens of near-duplicates like PII, Pii, pii).
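Near-duplicate tags are easy to detect once you have a list of the tags in use. The following sketch assumes you have exported that list by some means (the export step is not shown and depends on your tooling); find_near_duplicate_tags is a hypothetical helper that groups tags differing only by case or surrounding whitespace.

```python
from collections import defaultdict


def find_near_duplicate_tags(tags):
    """Group tags that differ only by case or surrounding whitespace."""
    groups = defaultdict(set)
    for tag in tags:
        groups[tag.strip().casefold()].add(tag)
    # Keep only the groups that contain more than one spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


print(find_near_duplicate_tags(["PII", "Pii", "pii", "Domain:Lab"]))
# prints {'pii': ['PII', 'Pii', 'pii']}
```

Each reported group is a candidate for consolidation into a single approved spelling.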

Reliability best practices

Treat harvest as a production job:
Define RACI for failures
Add alerts/notifications (where supported)
Document rollback/mitigation (e.g., last-known-good metadata)

Operations best practices

Create an operational runbook:
Harvest cadence
Failure handling
Change management for glossary
Use Audit and logging to track administrative activity.

Governance/tagging/naming best practices

Tag strategy examples:
Sensitivity:Public|Internal|Confidential|Restricted
Certification:Bronze|Silver|Gold
Domain:<DomainName>
OwnerTeam:<TeamName>
Name catalogs and assets with predictable prefixes:
prod-, nonprod-, sandbox-
Require a short description for every data asset and key entity.
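A tagging and naming standard is easier to enforce when it is checked automatically, for example in a review script. This is a minimal sketch under stated assumptions: the vocabulary mirrors the examples above, Domain and OwnerTeam are treated as free-form keys, and validate_tag and validate_name are hypothetical helpers, not Data Catalog features.

```python
ALLOWED_TAG_VALUES = {
    "Sensitivity": {"Public", "Internal", "Confidential", "Restricted"},
    "Certification": {"Bronze", "Silver", "Gold"},
}
FREE_FORM_KEYS = {"Domain", "OwnerTeam"}  # any non-empty value is accepted
NAME_PREFIXES = ("prod-", "nonprod-", "sandbox-")


def validate_tag(tag):
    """Return None if the tag follows the standard, else a short reason."""
    key, sep, value = tag.partition(":")
    if not sep or not value:
        return f"{tag!r}: expected Key:Value"
    if key in ALLOWED_TAG_VALUES:
        if value not in ALLOWED_TAG_VALUES[key]:
            return f"{tag!r}: {value!r} is not an allowed {key} value"
        return None
    if key in FREE_FORM_KEYS:
        return None
    return f"{tag!r}: unknown tag key {key!r}"


def validate_name(name):
    """Return None if the name has an approved prefix, else a short reason."""
    if name.startswith(NAME_PREFIXES):
        return None
    return f"{name!r}: expected one of the prefixes {NAME_PREFIXES}"
```

For example, validate_tag("Sensitivity:Internal") returns None, while validate_tag("Sensitivity:Secret") and validate_name("lab-catalog") each return a reason string that can be surfaced to the steward.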

12. Security Considerations

Identity and access model

Data Catalog uses OCI IAM for authentication and authorization.
Use groups and policies to separate:
Catalog administrators (create/manage catalogs, assets, harvest)
Data stewards (edit glossary, curation fields)
Consumers (read-only search/browse)
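The role separation above can be expressed as one policy statement per group. The sketch below only renders statement text in the general OCI form "Allow group <group> to <verb> <resource-type> in compartment <compartment>". Everything specific in it is an assumption to verify against the official Data Catalog policy reference: the group names are hypothetical, data-catalog-family is assumed to be the aggregate resource type, and the verb chosen for each role (especially "use" for stewards) may not match the permissions your stewards actually need.

```python
# Hypothetical group names mapped to an assumed verb for each role.
ROLE_VERBS = {
    "CatalogAdmins": "manage",    # create/manage catalogs, assets, harvest
    "DataStewards": "use",        # assumption: enough for curation edits
    "CatalogConsumers": "read",   # read-only search/browse
}


def policy_statements(compartment, resource_type="data-catalog-family"):
    """Render one policy statement per role for the given compartment."""
    return [
        f"Allow group {group} to {verb} {resource_type} in compartment {compartment}"
        for group, verb in ROLE_VERBS.items()
    ]


for statement in policy_statements("lab-datacatalog"):
    print(statement)
```

Generating the statements from one mapping keeps the three roles visibly distinct and makes it harder for a broad "manage" grant to slip in for a consumer group.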

Encryption

Oracle Cloud services typically encrypt data at rest and in transit. Confirm Data Catalog’s encryption specifics and key management options (Oracle-managed keys vs customer-managed keys, if available) in official docs.

Network exposure

Console/API access uses Oracle Cloud endpoints.
Harvest connectivity to private sources may require private networking patterns (verify private endpoint support and requirements).

Secrets handling

If harvesting requires credentials (common for databases), store secrets securely:
Prefer Oracle Cloud Vault where supported by the connector pattern (verify).
Restrict who can view/rotate credentials.
Rotate secrets regularly and on staff changes.

Audit/logging

Use OCI Audit to record administrative events.
Retain logs per compliance requirements.
Monitor harvest activity and unexpected changes to glossary terms/tags.

Compliance considerations

Data Catalog helps with: – Data inventory visibility – Ownership and stewardship traceability – Classification tagging workflows
But it does not replace: – DLP tooling – Full data access monitoring on underlying stores – Data retention enforcement (that remains with the storage/database systems)

Common security mistakes

Granting overly broad Object Storage permissions to the Data Catalog service or to users
Using shared personal credentials for database harvesting
Allowing anyone to edit glossary terms (definitions become untrusted)
Not tracking who changed sensitive classification tags

Secure deployment recommendations

Use compartment boundaries and least privilege.
Centralize naming/tagging standards.
Restrict write privileges to curated metadata fields.
Establish a review workflow for high-impact glossary changes.

13. Limitations and Gotchas

Treat this section as a checklist to validate early; details vary by region and connector.

Connector coverage varies: Not all data sources are supported everywhere. Confirm supported data asset types in your region.
Regional service: Catalogs are regional. Multi-region organizations may need multiple catalogs and governance alignment.
IAM complexity: Harvesting often fails due to missing service permissions to source systems.
Metadata ≠ data: Data Catalog doesn’t grant access to the underlying data; it only indexes metadata.
Glossary success depends on process: Without stewardship and standards, glossary becomes stale.
Tag sprawl risk: Without controlled vocabulary, tags become inconsistent and reduce search value.
Private network sources: Harvesting private databases can require networking setup; validate what Data Catalog supports (private endpoints/connectivity).
Operational visibility: If you need detailed metrics/alerts, verify what native monitoring and events exist; you may need process tooling around it.
Deletion dependencies: You may need to delete harvest jobs or assets before deleting catalogs, depending on UI rules.

14. Comparison with Alternatives

Data Catalog is one component of a broader Data Management stack. Here’s how it compares to nearby options.

Comparison table

Option	Best For	Strengths	Weaknesses	When to Choose
Oracle Cloud Data Catalog	OCI-centric metadata discovery and governance	Managed service, integrates with OCI IAM/compartments, glossary + enrichment	Connector coverage and regional scope must be validated; governance requires process	You run data platforms on Oracle Cloud and want a managed metadata catalog
OCI Data Integration (metadata features)	ETL/ELT pipeline building with some metadata context	Strong for building pipelines; can complement a catalog	Not a dedicated enterprise catalog by itself	You need data pipelines first, and cataloging is a secondary need
Custom metadata in a database/wiki	Very small environments	Simple, cheap at tiny scale	Not searchable at enterprise scale; not governed; becomes stale	Small team with limited sources and minimal compliance requirements
AWS Glue Data Catalog	AWS data lake and analytics	Tight AWS integration; common in AWS ecosystems	AWS-specific; different IAM model	Your platform is primarily on AWS
Microsoft Purview	Microsoft-centric governance and cataloging	Broad governance suite, integrations across Microsoft stack	Complexity and licensing can be significant	Your ecosystem is Microsoft/Azure-first and you need a broad governance suite
Google Cloud Dataplex Catalog (and related GCP governance tools)	GCP data governance	Integrates with GCP data services	GCP-specific	You are GCP-first and need native governance/catalog
Apache Atlas (self-managed)	Highly customizable governance	Open-source, extensible	Operational burden; scaling and UX depend on your implementation	You need deep customization and can operate the platform
DataHub / Amundsen (self-managed)	Modern metadata platforms	Strong community, flexible ingestion	You run/scale it; integrations vary	You want open ecosystem control and can invest in operations

15. Real-World Example

Enterprise example (regulated industry)

Problem: A financial services company runs multiple analytics domains on Oracle Cloud. Auditors request a repeatable inventory of datasets used for regulatory reporting, including definitions and owners. Teams also struggle with inconsistent KPI definitions across departments.

Proposed architecture:
– One regional Data Catalog per primary region
– Data assets for:
  – Curated Object Storage buckets (domain-based)
  – Autonomous Data Warehouse (core reporting)
– Governance model:
  – Data stewards manage glossary and certification tags
  – Platform admins manage assets/harvesting
  – Consumers get read-only access
– Operational integration:
  – Scheduled nightly harvest for curated sources
  – Audit log retention aligned to compliance policy

Why Data Catalog was chosen: – Native integration with Oracle Cloud IAM and compartments – Central business glossary connected to technical assets – Managed service reduces operational overhead vs self-hosting

Expected outcomes: – Faster audit response (inventory + ownership in one place) – Reduced KPI disputes due to glossary-driven definitions – Improved analyst productivity via search and certified datasets

Startup/small-team example

Problem: A SaaS startup stores product analytics events in Object Storage and a small warehouse. New team members don’t know which datasets are safe to use, and dashboards are inconsistent.

Proposed architecture:
– Single Data Catalog in the team’s region
– Catalog only curated datasets:
  – analytics_curated bucket paths
  – Warehouse schema BI_MART
– Simple glossary:
  – “Active User”, “Conversion”, “Churn”
– Tagging:
  – Certification:Gold for tables used in executive dashboards

Why Data Catalog was chosen: – Quick setup without building a custom system – Glossary + tags provide immediate value for a small team – Scales as the startup adds data sources

Expected outcomes: – New hires onboard faster – Fewer broken dashboards from misunderstanding data meaning – Better reuse of curated datasets

16. FAQ

1) Does Data Catalog store my actual data?
No. Data Catalog stores metadata (information about data). The underlying data remains in Object Storage, databases, or other systems.

2) Is Data Catalog a data governance platform?
It supports governance workflows (glossary, tags, ownership metadata), but full governance often requires processes and potentially additional tools.

3) Can Data Catalog catalog Object Storage buckets?
Commonly yes, through a data asset and harvest job for Object Storage. Confirm exact connector behavior and supported formats in the official docs.

4) Can I catalog Autonomous Data Warehouse or Autonomous Database?
Often yes, depending on connector support and your configuration. Verify supported sources and required credentials/networking.

5) How do users access Data Catalog?
Through the Oracle Cloud Console and APIs, controlled by OCI IAM policies.

6) How do I keep metadata up to date?
Use scheduled harvests (if supported in your UI) or run harvest jobs periodically. Also operationalize steward updates for business context.

7) What’s the difference between a catalog and a data asset?
A catalog is the container. A data asset is a registered source inside the catalog.

8) What’s a harvest job?
A harvest job connects to a data asset and extracts technical metadata into the catalog.

9) Can I restrict who can edit glossary terms?
Yes—use IAM policies and role separation so only stewards/admins can modify governed fields.

10) Will Data Catalog improve query performance?
No. It’s not a query engine. It improves discovery and understanding, not execution speed.

11) How do I classify sensitive fields (like email)?
Apply tags and/or custom properties at the entity/attribute level as supported. The exact tagging granularity depends on the harvested metadata model.

12) Does Data Catalog automatically detect PII?
Do not assume so. Some catalog products offer automated classification; verify whether OCI Data Catalog includes it in your current version/region, and consider complementary tooling if needed.

13) Can I automate onboarding of new datasets?
Yes, using APIs/CLI/SDK where supported. Many teams implement “catalog as code” patterns plus standard tags/properties.
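One way to picture a "catalog as code" pattern is a declarative list of desired data assets that is compared with what already exists, producing a plan of changes. The sketch below covers only that comparison step. AssetSpec and plan_changes are hypothetical names, and fetching the existing assets or applying the plan through the API/CLI/SDK is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class AssetSpec:
    """Hypothetical declarative description of one data asset."""
    name: str
    source_type: str
    tags: FrozenSet[str] = frozenset()


def plan_changes(
    desired: Iterable[AssetSpec], existing: Iterable[AssetSpec]
) -> Dict[str, List[str]]:
    """Compare desired specs with existing ones, keyed by asset name."""
    want = {spec.name: spec for spec in desired}
    have = {spec.name: spec for spec in existing}
    return {
        # In the spec but not yet registered.
        "create": sorted(want.keys() - have.keys()),
        # Registered, but type or tags differ from the spec.
        "update": sorted(
            name for name in want.keys() & have.keys() if want[name] != have[name]
        ),
        # Registered but absent from the spec; report, do not delete blindly.
        "unmanaged": sorted(have.keys() - want.keys()),
    }
```

Keeping the plan separate from its execution lets a pipeline show the intended changes for review before anything is created or modified.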

14) What’s the best way to design tags?
Use controlled vocabularies and a small number of standardized dimensions (Sensitivity, Certification, Domain, OwnerTeam).

15) How do I estimate cost?
Use the official price list entry for Data Catalog and the OCI Cost Estimator. Costs depend on the pricing dimensions Oracle currently uses for this service—verify before scaling.

16) Should I create one catalog or many?
Start with one per environment or region, then scale only if governance boundaries require it. Multiple catalogs increase operational overhead and may increase cost.

17) Can I integrate Data Catalog with CI/CD?
Yes, by calling APIs in pipelines to create assets, apply tags, or trigger harvest. Ensure policies and secrets management are handled securely.

17. Top Online Resources to Learn Data Catalog

Resource Type	Name	Why It Is Useful
Official Documentation	OCI Data Catalog Documentation	Primary source for concepts, connectors, IAM policies, and API references: https://docs.oracle.com/en-us/iaas/data-catalog/home.htm
Official Pricing	Oracle Cloud Price List	Find Data Catalog under Data Management and confirm current billable dimensions: https://www.oracle.com/cloud/price-list/
Pricing Calculator	OCI Cost Estimator	Build scenario estimates using current SKUs: https://www.oracle.com/cloud/costestimator.html
Official Console	Oracle Cloud Console	Hands-on creation of catalogs, data assets, harvest jobs: https://cloud.oracle.com/
Architecture Center	Oracle Architecture Center	Reference architectures for data platforms that commonly include cataloging/governance patterns (search within): https://docs.oracle.com/en/solutions/
Tutorials / Workshops	Oracle LiveLabs	Hands-on labs (search for “Data Catalog” and verify lab availability): https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/home
API/CLI Docs	OCI CLI Installation and Usage	If you automate Data Catalog operations: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
Community Learning	Oracle Cloud Customer Connect / Community	Practical troubleshooting and patterns (validate against docs): https://community.oracle.com/customerconnect/categories/oracle-cloud-infrastructure

18. Training and Certification Providers

Institute Name	Suitable Audience	Likely Learning Focus	Mode	Website URL
DevOpsSchool.com	DevOps engineers, platform teams, cloud engineers	OCI fundamentals, DevOps practices, cloud operations (verify course specifics)	check website	https://www.devopsschool.com/
ScmGalaxy.com	Beginners to intermediate engineers	SCM/DevOps foundations, automation practices (verify OCI coverage)	check website	https://www.scmgalaxy.com/
CLoudOpsNow.in	Cloud operations, SRE, platform operations	Cloud ops practices, monitoring, reliability (verify OCI content)	check website	https://www.cloudopsnow.in/
SreSchool.com	SREs, production operations teams	Reliability engineering, incident response, observability (verify cloud modules)	check website	https://www.sreschool.com/
AiOpsSchool.com	Ops teams adopting AIOps	AIOps concepts, operations analytics (verify integrations)	check website	https://www.aiopsschool.com/

19. Top Trainers

Platform/Site Name	Likely Specialization	Suitable Audience	Website URL
RajeshKumar.xyz	DevOps/cloud training resources (verify specific offerings)	Students and working engineers	https://www.rajeshkumar.xyz/
devopstrainer.in	DevOps training and mentoring (verify course catalog)	Beginners to intermediate DevOps engineers	https://www.devopstrainer.in/
devopsfreelancer.com	Freelance DevOps guidance/training resources (verify offerings)	Teams needing short-term enablement	https://www.devopsfreelancer.com/
devopssupport.in	DevOps support and enablement resources (verify services)	Ops and DevOps teams	https://www.devopssupport.in/

20. Top Consulting Companies

Company Name	Likely Service Area	Where They May Help	Consulting Use Case Examples	Website URL
cotocus.com	Cloud/DevOps consulting (verify current portfolio)	Platform engineering, cloud adoption, operations	Standing up governance-friendly cloud landing zones; automation and operational readiness	https://www.cotocus.com/
DevOpsSchool.com	Training + consulting (verify service catalog)	Enablement, DevOps transformation, cloud best practices	Designing IAM and operational runbooks; implementing CI/CD and automation around data platforms	https://www.devopsschool.com/
DEVOPSCONSULTING.IN	DevOps consulting (verify offerings)	DevOps tooling, reliability improvements	Building monitoring/alerting and incident processes; automation for cloud resource provisioning	https://www.devopsconsulting.in/

21. Career and Learning Roadmap

What to learn before Data Catalog

Oracle Cloud fundamentals:
Tenancy, compartments, IAM policies, groups
Regions and availability
Object Storage basics (buckets, objects, namespaces)
Basic data concepts:
Schemas, tables, partitions, file formats
Data lake vs data warehouse
Governance foundations:
Data ownership, stewardship, classification

What to learn after Data Catalog

Data pipelines and processing:
OCI Data Integration, OCI Data Flow (or your preferred tools)
Security hardening:
OCI Vault, key management, network segmentation
Observability and operations:
OCI Logging, Monitoring, Audit, and alerting patterns
Advanced governance:
Data-quality checks, access reviews, retention policies (implemented in source systems)

Job roles that use it

Data Engineer (metadata-aware pipelines)
Analytics Engineer (semantic definitions, curated marts)
Data Steward / Governance Analyst (glossary, classification)
Cloud Engineer / Platform Engineer (IAM, compartments, automation)
Security Engineer (classification workflows, audit readiness)
Solution Architect (data platform design)

Certification path (if available)

Oracle’s certification catalog changes over time. Look for OCI architect and data-related certifications on the official Oracle University pages.
Verify current paths here: https://education.oracle.com/

Project ideas for practice

Curated dataset certification workflow: Tag assets as Bronze/Silver/Gold and document steward review steps.
Glossary-driven metrics: Build a glossary for 20 key KPIs and link them to warehouse columns.
Automated asset onboarding: Script creation of data assets and harvesting (API/CLI), then auto-apply tags.
Compliance inventory: Maintain a list of datasets tagged Confidential and perform quarterly owner reviews.
Multi-compartment domain model: Organize assets by domain compartments and implement least-privilege access.
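As a starting point for the compliance inventory project, the quarterly review check can be sketched over an exported list of datasets. The record shape (name, tags, last_review), the Sensitivity:Confidential tag, and the 90-day window are assumptions for illustration; reviews_due is a hypothetical helper, and how you export the list is up to your tooling.

```python
from datetime import date, timedelta
from typing import Dict, List


def reviews_due(
    datasets: List[Dict],
    today: date,
    max_age_days: int = 90,
    tag: str = "Sensitivity:Confidential",
) -> List[str]:
    """Return names of tagged datasets whose last review is older than the window."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        d["name"]
        for d in datasets
        if tag in d["tags"] and d["last_review"] < cutoff
    )


inventory = [
    {"name": "prod-payroll", "tags": {"Sensitivity:Confidential"},
     "last_review": date(2024, 1, 10)},
    {"name": "prod-weblogs", "tags": {"Sensitivity:Internal"},
     "last_review": date(2023, 6, 1)},
]
print(reviews_due(inventory, today=date(2024, 6, 1)))
# prints ['prod-payroll']
```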

22. Glossary

Catalog: A regional container in Oracle Cloud Data Catalog that stores harvested metadata and business context.
Data Asset: A registered data source (Object Storage, database, etc.) that can be harvested.
Harvest: The process/job that extracts technical metadata from a data asset into the catalog.
Entity: A metadata object in the catalog that represents a collection of data (table, view, file, etc.); its columns or fields are represented as attributes.
Business Glossary: A curated set of business terms and definitions linked to technical metadata.
Tag: A label applied to catalog objects for classification and discovery.
Custom Property: An organization-defined metadata field added to catalog objects (owner, SLA, domain, etc.).
Compartment: OCI logical container for organizing resources and applying IAM access control.
IAM Policy: A statement that grants permissions to groups/users/services for OCI resources.
Steward: A role responsible for maintaining business definitions and governance metadata.
Certified Dataset: A dataset that has been reviewed and approved for broad use (implemented via tags/process).
Metadata: Data about data—schema, structure, definitions, location, and governance annotations.
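The relationships between these terms can be sketched as a small data model. This is a teaching aid, not the service's actual schema: the class and field names are assumptions, and harvest here is a toy function that only records discovered object names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    name: str
    definition: str


@dataclass
class Entity:
    name: str
    tags: Set[str] = field(default_factory=set)
    terms: List[Term] = field(default_factory=list)
    custom_properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataAsset:
    name: str
    source_type: str
    entities: List[Entity] = field(default_factory=list)


@dataclass
class Catalog:
    name: str
    region: str
    assets: List[DataAsset] = field(default_factory=list)
    glossary: List[Term] = field(default_factory=list)


def harvest(asset: DataAsset, discovered_names: List[str]) -> None:
    """Toy harvest: add an Entity for each discovered name not already present."""
    known = {entity.name for entity in asset.entities}
    for name in discovered_names:
        if name not in known:
            asset.entities.append(Entity(name))
            known.add(name)


# Mirror the lab: one catalog, one asset, one harvested file, then enrichment.
catalog = Catalog("lab-catalog", region="example-region")
asset = DataAsset("lab-os-asset", source_type="object_storage")
catalog.assets.append(asset)
harvest(asset, ["customers.csv"])

customer = Term("Customer", "A person or organization that has signed up for our service.")
catalog.glossary.append(customer)
asset.entities[0].tags.add("Sensitivity:Internal")
asset.entities[0].terms.append(customer)
```

The model makes one point explicit: harvesting fills in technical metadata (entities), while tags, terms, and custom properties are added afterward by people or automation.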

23. Summary

Oracle Cloud Data Catalog is a managed metadata discovery and governance service in the Data Management category. It helps organizations find datasets faster, standardize definitions with a business glossary, and operationalize stewardship through tags and custom properties—without moving the underlying data.

Architecturally, it works by registering data assets and running harvest jobs to ingest technical metadata, then enabling users to search and enrich that metadata securely using OCI IAM controls. Cost depends on Oracle’s current pricing dimensions for Data Catalog (confirm in the official price list), and indirect costs are mostly driven by how broadly and frequently you harvest.

Use Data Catalog when you need reliable data discovery and shared definitions across multiple teams and sources in Oracle Cloud. Next step: expand from the lab by cataloging one curated production domain, establishing a minimal glossary, and implementing a controlled tagging standard backed by IAM role separation.

rajeshkumar
