{"id":884,"date":"2026-04-16T13:26:11","date_gmt":"2026-04-16T13:26:11","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-data-catalog-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-data-management\/"},"modified":"2026-04-16T13:26:11","modified_gmt":"2026-04-16T13:26:11","slug":"oracle-cloud-data-catalog-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-data-management","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-data-catalog-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-data-management\/","title":{"rendered":"Oracle Cloud Data Catalog Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Data Management"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Data Management<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>Oracle Cloud <strong>Data Catalog<\/strong> is Oracle Cloud Infrastructure\u2019s managed service for <strong>discovering, organizing, and governing metadata<\/strong> about the data your organization stores across databases, data lakes, and analytics platforms.<\/p>\n\n\n\n<p>In simple terms: <strong>Data Catalog helps you answer \u201cWhat data do we have, where is it, who owns it, and how should it be used?\u201d<\/strong>\u2014without moving or copying the underlying data.<\/p>\n\n\n\n<p>Technically, Data Catalog is a <strong>metadata management<\/strong> and <strong>data discovery<\/strong> service. You create a catalog, register data sources (called <strong>data assets<\/strong>), run <strong>harvest<\/strong> jobs to extract technical metadata (schemas, tables, files, columns, etc.), and enrich that metadata with business context such as <strong>glossary terms<\/strong>, <strong>tags<\/strong>, and <strong>custom properties<\/strong>. 
Consumers then use <strong>search<\/strong> and <strong>browsing<\/strong> to find trusted datasets faster.<\/p>\n\n\n\n<p>It solves common data-management problems such as:\n&#8211; Lack of visibility into what data exists across teams and clouds\n&#8211; Inconsistent definitions (e.g., \u201ccustomer\u201d, \u201crevenue\u201d, \u201cactive user\u201d)\n&#8211; Difficulty finding the right dataset and its owner\/steward\n&#8211; Governance needs for audits and compliance (knowing what exists, where, and how it\u2019s classified)<\/p>\n\n\n\n<blockquote>\n<p>Service name check: The service is commonly documented as <strong>Oracle Cloud Infrastructure (OCI) Data Catalog<\/strong>. This tutorial refers to it simply as <strong>Data Catalog<\/strong>, within Oracle Cloud\u2019s <strong>Data Management<\/strong> category. If UI labels or endpoints differ in your region, <strong>verify them in the official docs<\/strong>.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is Data Catalog?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p>In Oracle Cloud\u2019s Data Management portfolio, <strong>Data Catalog<\/strong> is intended to provide a centralized place to:\n&#8211; <strong>Collect technical metadata<\/strong> from supported data sources\n&#8211; <strong>Organize and curate<\/strong> that metadata for discoverability\n&#8211; <strong>Add business context<\/strong> using glossary, tags, and properties\n&#8211; <strong>Support governance<\/strong> by making ownership and definitions explicit<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities (what it does)<\/h3>\n\n\n\n<p>Data Catalog typically supports the following capability areas (exact source coverage depends on your region and connectors; <strong>verify supported data assets in official docs<\/strong>):\n&#8211; <strong>Metadata harvesting<\/strong> from registered data assets\n&#8211; <strong>Search and discovery<\/strong> across harvested entities (tables, views, files, columns, etc.)\n&#8211; <strong>Business glossary<\/strong> for definitions and standard terminology\n&#8211; <strong>Curation and enrichment<\/strong> via tags, custom properties, and relationships\n&#8211; <strong>Access control<\/strong> using Oracle Cloud IAM and compartments<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (mental model)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Catalog<\/strong>: The top-level container for metadata. Created in a specific Oracle Cloud region and compartment.<\/li>\n<li><strong>Data asset<\/strong>: A registered data source (for example, Object Storage, Autonomous Database, or other supported sources). 
Think of it as \u201cthis is where metadata can be harvested from.\u201d<\/li>\n<li><strong>Connection \/ credential<\/strong>: How Data Catalog authenticates to the data asset (varies by source type; may use IAM\/service access for OCI-native services or credentials for databases).<\/li>\n<li><strong>Harvest<\/strong>: A job (manual or scheduled) that extracts metadata from a data asset into the catalog.<\/li>\n<li><strong>Entities<\/strong>: The harvested objects (schemas, tables, columns, files, etc.) represented in the catalog.<\/li>\n<li><strong>Glossary \/ terms<\/strong>: Business definitions linked to harvested entities to clarify meaning and intended use.<\/li>\n<li><strong>Tags and custom properties<\/strong>: Lightweight governance controls (classification, sensitivity, owner, SLA tier, domain, etc.)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed Oracle Cloud service<\/strong> (control plane managed by Oracle)<\/li>\n<li><strong>Metadata system<\/strong> (stores metadata and governance context, not the underlying data)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional vs global<\/h3>\n\n\n\n<p>Data Catalog is created in a <strong>specific Oracle Cloud region<\/strong> and a <strong>compartment<\/strong> within your tenancy. You can catalog sources across compartments if IAM policies allow it. 
Cross-region cataloging patterns exist, but the catalog itself is regional; plan accordingly and <strong>verify current cross-region support<\/strong> in official docs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Oracle Cloud ecosystem<\/h3>\n\n\n\n<p>Data Catalog sits at the center of a typical Oracle Cloud Data Management and analytics environment:\n&#8211; Data producers store data in <strong>Object Storage<\/strong>, <strong>Autonomous Database<\/strong>, and other platforms.\n&#8211; Data engineers transform data using services such as <strong>OCI Data Integration<\/strong>, <strong>OCI Data Flow<\/strong>, and other processing engines.\n&#8211; Data Catalog provides the <strong>\u201csystem of record\u201d for metadata<\/strong>, helping analysts and engineers find and interpret datasets.\n&#8211; Security and governance rely on <strong>OCI IAM<\/strong>, <strong>Audit<\/strong>, and tagging strategies.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. Why use Data Catalog?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster time-to-data<\/strong>: Teams spend less time searching and re-creating datasets.<\/li>\n<li><strong>Better decision-making<\/strong>: Shared definitions reduce reporting conflicts.<\/li>\n<li><strong>Reduced risk<\/strong>: Easier to identify sensitive data locations for compliance initiatives.<\/li>\n<li><strong>Increased reuse<\/strong>: Analysts find trusted datasets instead of building shadow copies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Central metadata index<\/strong> for multiple sources<\/li>\n<li><strong>Searchable inventory<\/strong> of tables\/files\/columns and their attributes<\/li>\n<li><strong>Standardization<\/strong> via glossary and curated metadata<\/li>\n<li><strong>Extensibility<\/strong> through tags and custom 
properties<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Repeatable harvesting<\/strong> (manual\/scheduled) to keep metadata current<\/li>\n<li><strong>Ownership and stewardship<\/strong> captured alongside metadata<\/li>\n<li>Better handoffs between engineering, analytics, and governance teams<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Supports governance patterns like:<\/li>\n<li>\u201cKnow where PII might exist\u201d<\/li>\n<li>\u201cWho owns this dataset?\u201d<\/li>\n<li>\u201cWhat\u2019s the approved definition of a metric?\u201d<\/li>\n<li>Integrates with IAM for access control and with auditing capabilities in Oracle Cloud.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<p>Data Catalog is designed to scale in metadata volume and user access patterns typical of medium-to-large enterprises. 
The underlying data stays in place; you manage <strong>metadata<\/strong>, which is far lighter than copying datasets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose Data Catalog<\/h3>\n\n\n\n<p>Choose Data Catalog when:\n&#8211; You have multiple data sources and need a <strong>single discovery experience<\/strong>\n&#8211; You need a <strong>business glossary<\/strong> tied to real datasets\n&#8211; You want to operationalize data governance without building a custom metadata system\n&#8211; You want an Oracle-managed metadata catalog integrated with Oracle Cloud IAM<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Data Catalog may not be the right fit if:\n&#8211; You only have one small data store and discovery is trivial\n&#8211; You need a full <strong>data-quality rules engine<\/strong> or <strong>master data management<\/strong> (a different tool category)\n&#8211; You require capabilities not currently supported by Data Catalog connectors in your region (verify first)\n&#8211; You want a fully open-source\/self-managed solution with deep customization and are willing to operate it<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Data Catalog used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Financial services (regulatory reporting, audit readiness)<\/li>\n<li>Healthcare\/life sciences (data sensitivity classification)<\/li>\n<li>Retail\/e-commerce (product\/customer analytics definitions)<\/li>\n<li>Telecom (large-scale data platforms with many producers)<\/li>\n<li>Government\/public sector (data inventories and stewardship)<\/li>\n<li>SaaS companies (internal analytics governance)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data platform teams<\/li>\n<li>Data engineering and ETL teams<\/li>\n<li>Analytics engineering teams<\/li>\n<li>BI teams and data analysts<\/li>\n<li>Security and compliance teams<\/li>\n<li>Enterprise architecture and governance teams<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data lake discovery (Object Storage)<\/li>\n<li>Data warehouse cataloging (Autonomous Data Warehouse and other supported DBs)<\/li>\n<li>Cross-domain metrics standardization (glossary-driven analytics)<\/li>\n<li>Migration governance (inventory before moving data)<\/li>\n<li>Audit response (identify datasets and owners)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Central lakehouse with multiple pipelines<\/li>\n<li>Multi-compartment data mesh-like layouts (domain-based compartments)<\/li>\n<li>Hybrid environments (OCI plus external sources where supported; verify connector coverage)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Production: catalog is used by analysts and governance daily; harvesting is scheduled and monitored.<\/li>\n<li>Dev\/test: used to validate metadata extraction and glossary structure before scaling.<\/li>\n<\/ul>\n\n\n\n<h2 
class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Oracle Cloud <strong>Data Catalog<\/strong> commonly fits.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Data lake discovery for Object Storage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Hundreds of buckets and folders; nobody knows what\u2019s inside.<\/li>\n<li><strong>Why it fits:<\/strong> Data Catalog can harvest and index metadata for supported Object Storage structures (verify exact capabilities for file formats and depth).<\/li>\n<li><strong>Scenario:<\/strong> A data platform team catalogs curated datasets in Object Storage so analysts can search for \u201corders\u201d and find the canonical dataset plus owner.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Cataloging a data warehouse for self-service analytics<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Analysts can query the warehouse but don\u2019t know table meanings.<\/li>\n<li><strong>Why it fits:<\/strong> Harvest tables\/columns and enrich them with glossary terms and curated descriptions.<\/li>\n<li><strong>Scenario:<\/strong> Finance defines \u201cNet Revenue\u201d as a glossary term and links it to the correct column(s) in the warehouse.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Standardizing business definitions across departments<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> \u201cCustomer\u201d means different things in Sales vs Support.<\/li>\n<li><strong>Why it fits:<\/strong> Glossary provides a shared vocabulary with stewarded definitions.<\/li>\n<li><strong>Scenario:<\/strong> Governance team defines \u201cCustomer (Bill-to)\u201d and \u201cCustomer (User)\u201d as separate terms and maps datasets accordingly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Ownership and stewardship mapping (operational governance)<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> No one knows who to contact about a dataset.<\/li>\n<li><strong>Why it fits:<\/strong> Use custom properties\/tags to record owner, steward, support channel, SLA tier.<\/li>\n<li><strong>Scenario:<\/strong> Every curated dataset includes <code>Owner<\/code>, <code>Steward<\/code>, <code>SlackChannel<\/code>, and <code>RefreshFrequency<\/code>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Sensitive data discovery support (classification workflow)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Compliance asks where PII exists; teams respond manually.<\/li>\n<li><strong>Why it fits:<\/strong> Tag entities\/attributes with classifications; create views of sensitive datasets.<\/li>\n<li><strong>Scenario:<\/strong> A quarterly review exports a list of entities tagged as <code>PII<\/code> for follow-up controls (actual export\/reporting methods depend on UI\/API; verify).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Pre-migration inventory and rationalization<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Before migrating to OCI, you need an inventory of sources and schemas.<\/li>\n<li><strong>Why it fits:<\/strong> Data Catalog becomes a landing place for harvested metadata, highlighting duplicates and unused datasets.<\/li>\n<li><strong>Scenario:<\/strong> During warehouse modernization, teams catalog legacy schemas, then mark deprecated datasets with tags.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Data product catalog for a platform team (data mesh-ish)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Domain teams publish data products but discovery is fragmented.<\/li>\n<li><strong>Why it fits:<\/strong> Central catalog with domain-based tags and glossary.<\/li>\n<li><strong>Scenario:<\/strong> Marketing and Supply Chain publish certified datasets; Data Catalog becomes the discovery 
portal.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Faster onboarding for new engineers and analysts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> New hires take weeks to learn the data landscape.<\/li>\n<li><strong>Why it fits:<\/strong> Search, browse, and glossary shorten ramp-up time.<\/li>\n<li><strong>Scenario:<\/strong> A new analyst searches \u201creturns\u201d and quickly finds the curated returns dataset and its definition.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Pipeline change impact analysis (metadata-based)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Schema changes break dashboards; teams don\u2019t see dependencies.<\/li>\n<li><strong>Why it fits:<\/strong> Metadata and relationships can help document dependencies; if lineage integrations are available in your setup, the fit is even stronger (verify lineage support\/integration).<\/li>\n<li><strong>Scenario:<\/strong> Data engineers annotate downstream consumers in custom properties and use consistent tags for impacted domains.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Audit response and evidence collection<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Auditors ask for data inventory, ownership, and definitions.<\/li>\n<li><strong>Why it fits:<\/strong> Catalog provides centralized metadata, ownership, and governance artifacts.<\/li>\n<li><strong>Scenario:<\/strong> Security exports a list of datasets tagged <code>Confidential<\/code> and shows the steward approvals recorded in the governance process (the process tooling is external; the catalog supplies the metadata).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Shared KPI metric governance for BI<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Multiple dashboards calculate metrics differently.<\/li>\n<li><strong>Why it fits:<\/strong> Glossary defines metrics and points to canonical 
datasets\/columns.<\/li>\n<li><strong>Scenario:<\/strong> \u201cActive Subscriber\u201d is defined once, used across BI reports.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Cross-team dataset certification<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem:<\/strong> Users can\u2019t tell trusted datasets from experimental ones.<\/li>\n<li><strong>Why it fits:<\/strong> Tag datasets as <code>Certified<\/code>, <code>Bronze\/Silver\/Gold<\/code>, or <code>Trusted<\/code>.<\/li>\n<li><strong>Scenario:<\/strong> Platform team certifies \u201cGold\u201d tables after validation; analysts filter search to only certified assets.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability can vary by region, permissions, and connector type. Confirm exact UI labels and supported source types in the official documentation.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">1) Catalogs (metadata containers)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Provides a top-level container to store metadata, glossary, tags, and enrichment.<\/li>\n<li><strong>Why it matters:<\/strong> Separates environments or domains (e.g., \u201cProd Catalog\u201d vs \u201cSandbox Catalog\u201d).<\/li>\n<li><strong>Practical benefit:<\/strong> Cleaner governance boundaries and access control.<\/li>\n<li><strong>Caveats:<\/strong> Catalog is regional; plan for multi-region architectures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Data assets (source registration)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Registers a data source for harvesting.<\/li>\n<li><strong>Why it matters:<\/strong> Establishes the \u201cwhere\u201d for metadata.<\/li>\n<li><strong>Practical benefit:<\/strong> Standardized onboarding process for new sources.<\/li>\n<li><strong>Caveats:<\/strong> Each asset type has distinct connection 
requirements.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Harvesting (metadata extraction jobs)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Extracts and updates technical metadata from a data asset into the catalog.<\/li>\n<li><strong>Why it matters:<\/strong> Keeps metadata current as schemas\/files evolve.<\/li>\n<li><strong>Practical benefit:<\/strong> Repeatable scheduled refresh reduces manual documentation.<\/li>\n<li><strong>Caveats:<\/strong> Requires correct IAM\/credentials and network access; harvesting can fail if policies are missing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Search and browse<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Lets users find entities using keywords, filters, and navigation.<\/li>\n<li><strong>Why it matters:<\/strong> Discovery is the core value of a catalog.<\/li>\n<li><strong>Practical benefit:<\/strong> Reduces tribal knowledge dependency.<\/li>\n<li><strong>Caveats:<\/strong> Search quality depends on metadata quality; add descriptions, glossary terms, tags.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Business glossary<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Stores business terms, definitions, and associations to technical assets.<\/li>\n<li><strong>Why it matters:<\/strong> Aligns teams on consistent definitions.<\/li>\n<li><strong>Practical benefit:<\/strong> BI and analytics become more reliable.<\/li>\n<li><strong>Caveats:<\/strong> Glossary governance is a people\/process challenge; needs steward ownership.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Tags (classification and organization)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Apply labels to assets\/entities\/attributes.<\/li>\n<li><strong>Why it matters:<\/strong> Enables filtering, governance, and lifecycle management.<\/li>\n<li><strong>Practical benefit:<\/strong> 
Common tags: <code>PII<\/code>, <code>Confidential<\/code>, <code>Certified<\/code>, <code>Domain:Marketing<\/code>.<\/li>\n<li><strong>Caveats:<\/strong> Without naming conventions, tags become messy and duplicated.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Custom properties (metadata enrichment)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Adds organization-specific fields (owner, SLA, refresh frequency, cost center).<\/li>\n<li><strong>Why it matters:<\/strong> Most governance needs are organization-specific.<\/li>\n<li><strong>Practical benefit:<\/strong> Convert tribal knowledge into structured metadata.<\/li>\n<li><strong>Caveats:<\/strong> Over-customization can reduce usability; keep a controlled list.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) IAM integration (access control)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Uses Oracle Cloud IAM policies and compartments to control who can manage catalogs, assets, harvests, and metadata.<\/li>\n<li><strong>Why it matters:<\/strong> Governance requires role-based access.<\/li>\n<li><strong>Practical benefit:<\/strong> Separate duties between admins, stewards, and consumers.<\/li>\n<li><strong>Caveats:<\/strong> Harvesting access to source systems often requires additional policies\/credentials.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Auditability (via Oracle Cloud auditing capabilities)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Administrative actions can be audited via OCI Audit (exact event coverage: verify).<\/li>\n<li><strong>Why it matters:<\/strong> Compliance needs traceability.<\/li>\n<li><strong>Practical benefit:<\/strong> Investigate who changed glossary definitions or asset registrations.<\/li>\n<li><strong>Caveats:<\/strong> OCI Audit records events by default; confirm that its retention period meets your policy and compliance requirements, and export events if you need to keep them longer (verify current retention in the Audit docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) 
API\/SDK\/CLI support (automation)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does:<\/strong> Enables automation of catalog lifecycle, asset creation, harvesting, and metadata operations via APIs (verify the set of operations you need).<\/li>\n<li><strong>Why it matters:<\/strong> Scales onboarding and governance workflows.<\/li>\n<li><strong>Practical benefit:<\/strong> \u201cCatalog as code\u201d patterns for enterprise consistency.<\/li>\n<li><strong>Caveats:<\/strong> IAM and rate limits apply; build idempotent automation.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level architecture<\/h3>\n\n\n\n<p>Data Catalog sits between:\n&#8211; <strong>Metadata producers<\/strong> (data sources such as Object Storage and databases)\n&#8211; <strong>Metadata consumers<\/strong> (analysts, engineers, governance users)\n&#8211; <strong>Governance controls<\/strong> (IAM, tagging standards, auditing)<\/p>\n\n\n\n<p>Key principle: <strong>Data Catalog stores metadata, not the data itself.<\/strong> Harvesting reads source metadata and indexes it in the catalog.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Request\/data\/control flow (typical)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>An administrator creates a <strong>catalog<\/strong> in a compartment and region.<\/li>\n<li>They register a <strong>data asset<\/strong> and configure access (IAM policies and\/or credentials).<\/li>\n<li>They run a <strong>harvest job<\/strong>:\n   &#8211; The service connects to the source\n   &#8211; Reads technical metadata (schemas, tables, files, columns)\n   &#8211; Stores metadata objects in the catalog<\/li>\n<li>Stewards enrich metadata with <strong>glossary terms<\/strong>, <strong>tags<\/strong>, and <strong>custom properties<\/strong>.<\/li>\n<li>Consumers search\/browse to find datasets and interpret them correctly.<\/li>\n<\/ol>\n\n\n\n<h3 
class=\"wp-block-heading\">Integrations with related Oracle Cloud services (common patterns)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Object Storage<\/strong>: catalog data lake buckets and curated datasets.<\/li>\n<li><strong>Autonomous Database \/ Autonomous Data Warehouse<\/strong>: catalog tables\/views (connector support varies; verify).<\/li>\n<li><strong>OCI Vault<\/strong>: store database credentials\/secrets (pattern depends on connector; verify).<\/li>\n<li><strong>OCI Events + Notifications<\/strong>: notify teams when harvest jobs fail or complete (pattern depends on available events; verify).<\/li>\n<li><strong>OCI Logging \/ Audit<\/strong>: operational traceability and compliance evidence.<\/li>\n<li><strong>OCI Data Integration \/ Data Flow<\/strong>: data pipelines; catalog provides metadata context. (Lineage availability depends on integration; verify.)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>OCI IAM<\/strong>: policies, compartments, groups (mandatory)<\/li>\n<li><strong>Networking (VCN)<\/strong>: required when harvesting private data sources (if supported via private endpoints; verify)<\/li>\n<li><strong>Source services<\/strong>: Object Storage, databases, etc.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>User access<\/strong> to Data Catalog is governed by <strong>OCI IAM<\/strong>.<\/li>\n<li><strong>Service access<\/strong> (Data Catalog reading metadata from sources) typically requires:<\/li>\n<li>OCI-native access policies for OCI resources (Object Storage, etc.)<\/li>\n<li>Credentials for database sources (stored securely; exact method depends on connector\u2014verify in docs)<\/li>\n<li>Prefer least privilege: only allow read access required for metadata extraction.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Networking 
model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Access to Data Catalog is via Oracle Cloud endpoints in the region.<\/li>\n<li>Harvesting network path depends on the source:<\/li>\n<li>For OCI public endpoints (like Object Storage), IAM permission is often the primary gate.<\/li>\n<li>For private databases, you may need private connectivity (VCN\/private endpoint patterns; verify what Data Catalog supports in your region).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat harvesting as an operational workload:<\/li>\n<li>Schedule harvest windows<\/li>\n<li>Monitor job outcomes<\/li>\n<li>Track changes to glossary and tags<\/li>\n<li>Use IAM and compartments to separate:<\/li>\n<li>Platform admins<\/li>\n<li>Data stewards<\/li>\n<li>Read-only consumers<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[User: Admin\/Steward\/Analyst] --&gt;|Console\/API| DC[Oracle Cloud Data Catalog]\n  DC --&gt;|Harvest metadata| OS[OCI Object Storage Bucket]\n  DC --&gt; M[(Metadata Index\\nEntities\/Attributes\/Tags\/Glossary)]\n  U --&gt;|Search\/Browse| DC\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Tenancy[Oracle Cloud Tenancy]\n    subgraph IAM[OCI IAM]\n      G[Groups\/Roles]\n      P[Policies]\n    end\n\n    subgraph Region[&quot;Region (e.g., us-ashburn-1)&quot;]\n      DC[&quot;Data Catalog (Regional)&quot;]\n      AUD[Audit]\n      LOG[Logging\/Monitoring]\n      EVT[Events\/Notifications]\n      VAULT[&quot;OCI Vault (Secrets)&quot;]\n\n      subgraph DataLake[Data Lake Compartment]\n        OS1[Object Storage: Raw Bucket]\n        OS2[Object Storage: Curated Bucket]\n      end\n\n      subgraph Warehouse[Analytics Compartment]\n        ADB[Autonomous 
Database \/ ADW]\n      end\n\n      subgraph Network[&quot;VCN (if needed)&quot;]\n        PE[&quot;Private Connectivity \/ Endpoint\\n(verify Data Catalog support)&quot;]\n      end\n    end\n  end\n\n  G --&gt; P\n  U1[Admins\/Stewards\/Consumers] --&gt;|IAM AuthZ| DC\n  DC --&gt;|Harvest| OS2\n  DC --&gt;|&quot;Harvest (if supported)&quot;| ADB\n  DC --&gt;|&quot;Read secrets (pattern)&quot;| VAULT\n  DC --&gt; AUD\n  DC --&gt; LOG\n  DC --&gt; EVT\n  ADB --- PE\n<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">8. Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tenancy and billing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Oracle Cloud tenancy<\/strong><\/li>\n<li>Ability to create resources in the chosen region and compartment<\/li>\n<li>Billing\/credits as required by your account (Data Catalog may be metered; <strong>verify pricing and free tier eligibility<\/strong>)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p>You need permissions to:\n&#8211; Create and manage <strong>Data Catalog<\/strong> resources in a compartment\n&#8211; Create and manage <strong>Object Storage<\/strong> resources for the lab (bucket + objects)\n&#8211; Grant Data Catalog (as a service) permission to read metadata from the target source (policy requirements vary)<\/p>\n\n\n\n<p>Because IAM policies are security-critical and can change, use the official doc patterns for:\n&#8211; Data Catalog administrators\n&#8211; Data Catalog users\n&#8211; Service access to Object Storage or databases<br\/>\n<strong>Verify in official docs<\/strong>: https:\/\/docs.oracle.com\/en-us\/iaas\/data-catalog\/home.htm<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Oracle Cloud Console access (browser)<\/li>\n<li>Optional:<\/li>\n<li>OCI CLI (if you want automation): https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/SDKDocs\/cliinstall.htm<\/li>\n<li>SDKs (Python\/Java\/Go) if integrating with pipelines (verify Data 
Catalog SDK coverage)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Catalog is not necessarily available in every region. Confirm in your region in the Console or official docs\/service availability pages.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Catalog count, harvest frequency, and metadata volume may be governed by service limits.<\/li>\n<li>Check <strong>Service Limits<\/strong> in the OCI Console for Data Catalog and related services.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services for this lab<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Object Storage<\/strong> bucket in the same tenancy (and ideally same region)<\/li>\n<li>A compartment to contain the lab resources<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<blockquote>\n<p>Pricing changes over time and can be region-dependent. Do not rely on blog posts for exact numbers.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (how to confirm)<\/h3>\n\n\n\n<p>Oracle publishes OCI pricing on the official price list and pricing pages. Confirm Data Catalog pricing here:\n&#8211; OCI Pricing \/ Price List: https:\/\/www.oracle.com\/cloud\/price-list\/\n&#8211; OCI Cost Estimator (calculator): https:\/\/www.oracle.com\/cloud\/costestimator.html (if redirected, use the OCI cost estimator from the Oracle Cloud site)<\/p>\n\n\n\n<p>Look for <strong>Data Management \u2192 Data Catalog<\/strong> in the price list. 
If the pricing page breaks out billable dimensions (for example, per catalog, per metadata volume, per user, per harvest, etc.), treat that as the source of truth.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Typical pricing dimensions to look for (verify)<\/h3>\n\n\n\n<p>Depending on Oracle\u2019s current SKU model, pricing can be based on items such as:\n&#8211; Number of catalogs or capacity units\n&#8211; Amount of metadata stored\/indexed\n&#8211; Number of users or requests\n&#8211; Harvest operations or scheduling frequency<\/p>\n\n\n\n<p>Because these dimensions can change, <strong>verify in the official pricing entry<\/strong> for Data Catalog.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (direct and indirect)<\/h3>\n\n\n\n<p>Direct or near-direct drivers:\n&#8211; Number of catalogs (dev\/test\/prod separation can multiply costs)\n&#8211; Number of data assets and the metadata volume harvested\n&#8211; Frequency of harvest jobs (daily vs hourly)\n&#8211; Number of active users (if user-based pricing applies in your current SKU model)<\/p>\n\n\n\n<p>Indirect drivers:\n&#8211; Object Storage cost for storing sample\/curated datasets (your underlying data)\n&#8211; Network egress (generally avoid cross-region data access patterns if they cause additional cost)\n&#8211; Operational overhead: governance workflows and stewardship time\n&#8211; If private connectivity is required for sources, networking components may have cost<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Harvesting reads metadata; for OCI-native services in the same region, data transfer charges are typically lower than cross-region or internet egress scenarios.<\/li>\n<li>If cataloging sources across regions or through complex network paths, validate whether any data transfer fees apply.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Start with one catalog and a small number of assets; expand after standards are proven.<\/li>\n<li>Harvest only what you need (avoid cataloging every raw bucket if it\u2019s not useful).<\/li>\n<li>Use tags\/properties to identify \u201ccurated\u201d vs \u201craw\u201d datasets and prioritize harvesting curated zones.<\/li>\n<li>Schedule harvesting at a reasonable cadence (nightly for many warehouses is enough; hourly harvesting can increase cost and operational noise).<\/li>\n<li>Enforce lifecycle: retire\/deprecate obsolete assets rather than leaving them searchable forever.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (conceptual)<\/h3>\n\n\n\n<p>A low-cost starter typically looks like:\n&#8211; 1 catalog (single region)\n&#8211; 1\u20133 data assets (Object Storage curated bucket + one warehouse)\n&#8211; Harvest run manually during setup, then scheduled nightly\n&#8211; Limited steward group (2\u20135 users)<\/p>\n\n\n\n<p>Use the OCI Cost Estimator and the Data Catalog pricing entry to compute your estimate. <strong>Do not assume \u201cfree\u201d unless the official pricing explicitly states a free tier for your tenancy\/region.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>In production, the cost shape is driven by:\n&#8211; Many assets across domains (Marketing, Finance, Ops)\n&#8211; Higher metadata object counts (tables, columns, partitions, files)\n&#8211; Frequent harvest schedules and governance workflows\n&#8211; Potential multi-region requirements (which can imply multiple catalogs)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">10. 
Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Create an Oracle Cloud <strong>Data Catalog<\/strong>, catalog an <strong>Object Storage<\/strong> bucket by harvesting metadata, and enrich one discovered dataset with <strong>tags<\/strong> and a <strong>glossary term<\/strong>\u2014all using a safe, beginner-friendly workflow.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Create a compartment and an Object Storage bucket with a small sample dataset.\n2. Create a Data Catalog in Oracle Cloud.\n3. Configure IAM access so Data Catalog can read Object Storage metadata (policy statements vary; you will validate using official docs).\n4. Register the bucket as a <strong>data asset<\/strong> and run a <strong>harvest<\/strong> job.\n5. Search for the harvested dataset and enrich it with tags and glossary.<\/p>\n\n\n\n<p><strong>Expected end state:<\/strong>\n&#8211; A catalog exists and contains harvested metadata for a bucket\/object path.\n&#8211; You can search and find an entity representing your dataset.\n&#8211; The entity is tagged and linked to a glossary term.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a compartment for the lab<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the Oracle Cloud Console, open the navigation menu.<\/li>\n<li>Go to <strong>Identity &amp; Security \u2192 Compartments<\/strong>.<\/li>\n<li>Click <strong>Create Compartment<\/strong>.<\/li>\n<li>Name it: <code>lab-datacatalog<\/code><\/li>\n<li>(Optional) Description: <code>Hands-on lab for Data Catalog tutorial<\/code><\/li>\n<li>Click <strong>Create<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> A new compartment appears and becomes available within seconds (sometimes minutes).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create an Object Storage bucket and upload 
a sample file<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Storage \u2192 Object Storage &amp; Archive Storage \u2192 Buckets<\/strong>.<\/li>\n<li>Ensure you\u2019re in the correct <strong>region<\/strong> and <strong>compartment<\/strong> (<code>lab-datacatalog<\/code>).<\/li>\n<li>Click <strong>Create Bucket<\/strong>.<\/li>\n<li>Bucket name: <code>lab-dc-bucket-&lt;unique-suffix&gt;<\/code><\/li>\n<li>Defaults are usually fine for a lab. Click <strong>Create<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>Now create a small CSV file locally named <code>customers.csv<\/code>:<\/p>\n\n\n\n<pre><code class=\"language-csv\">customer_id,full_name,email,country,signup_date\n1001,Alice Johnson,alice@example.com,US,2024-01-12\n1002,Bob Smith,bob@example.com,GB,2024-02-03\n1003,Chandra Patel,chandra@example.com,IN,2024-02-19\n<\/code><\/pre>\n\n\n\n<p>Upload it:\n1. Open your bucket.\n2. Click <strong>Upload<\/strong>.\n3. Select <code>customers.csv<\/code>.\n4. Click <strong>Upload<\/strong>.<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> The bucket contains <code>customers.csv<\/code>.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> You can click the object name and view details (size, last modified).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create (or confirm) IAM permissions for Data Catalog and for your user<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">3A) Ensure your user\/group can manage Data Catalog<\/h4>\n\n\n\n<p>If you\u2019re in a training tenancy you might already be an admin. 
If not, you need IAM policies allowing your group to manage Data Catalog in the compartment.<\/p>\n\n\n\n<p>Because policy naming and required verbs must be exact, use the official documentation\u2019s IAM policy examples for Data Catalog:\n&#8211; Docs home (navigate to IAM\/policies section): https:\/\/docs.oracle.com\/en-us\/iaas\/data-catalog\/home.htm<\/p>\n\n\n\n<p>Create policies in:\n<strong>Identity &amp; Security \u2192 Policies<\/strong><\/p>\n\n\n\n<p>Common pattern (example only\u2014<strong>verify<\/strong> exact service names, resource-types, and verbs in docs):\n&#8211; Allow a group to manage Data Catalog resources in compartment <code>lab-datacatalog<\/code>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">3B) Allow Data Catalog to read Object Storage metadata<\/h4>\n\n\n\n<p>Harvesting needs permission to read Object Storage (at least bucket\/object metadata, possibly object listings).<\/p>\n\n\n\n<p>Use the official Data Catalog documentation for Object Storage harvesting IAM policy statements. Create them in a policy attached to the <strong>compartment containing the bucket<\/strong>.<\/p>\n\n\n\n<p><strong>Important:<\/strong> Do not over-permission. Grant read-only access and scope it to the lab compartment where possible.<\/p>\n\n\n\n<p><strong>Expected outcome:<\/strong> Policies exist and are attached to the correct compartment.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> IAM policy changes can take a short time to propagate. 
If harvest fails with authorization errors, wait a few minutes and retry after confirming policies.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Create a Data Catalog<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Analytics &amp; AI<\/strong> (or search for <strong>Data Catalog<\/strong> in the console search bar).<\/li>\n<li>Open <strong>Data Catalog<\/strong>.<\/li>\n<li>Select compartment: <code>lab-datacatalog<\/code>.<\/li>\n<li>Click <strong>Create Catalog<\/strong>.<\/li>\n<li>Name: <code>lab-catalog<\/code><\/li>\n<li>(Optional) Description: <code>Catalog for Object Storage metadata harvesting lab<\/code><\/li>\n<li>Create.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> Catalog is created and appears as <strong>Active<\/strong>.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> Open the catalog and confirm you can see catalog details and navigation items (Data Assets, Glossary, etc.).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Register Object Storage as a Data Asset<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Inside your catalog, go to <strong>Data Assets<\/strong>.<\/li>\n<li>Click <strong>Create Data Asset<\/strong>.<\/li>\n<li>Choose the data asset type for <strong>Object Storage<\/strong> (label can vary; select the OCI Object Storage option).<\/li>\n<li>Provide:\n   &#8211; Name: <code>lab-os-asset<\/code>\n   &#8211; Description: <code>Object Storage bucket for lab dataset<\/code>\n   &#8211; Bucket details: select\/enter your bucket and namespace as required by the UI<\/li>\n<li>Save\/Create.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> A data asset representing your bucket exists in the catalog.<\/p>\n\n\n\n<p><strong>Verification:<\/strong> The data asset appears in the list and shows connection details (where configured).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 
class=\"wp-block-heading\">Step 6: Run a harvest job to ingest metadata<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open the data asset <code>lab-os-asset<\/code>.<\/li>\n<li>Locate <strong>Harvest<\/strong> (or \u201cHarvesting\u201d) in the asset actions.<\/li>\n<li>Create a harvest job (or run a harvest immediately):\n   &#8211; Harvest type: choose the default \u201cmetadata harvest\u201d option shown\n   &#8211; Scope: optionally limit to a prefix\/path if your UI supports it (useful for large buckets)<\/li>\n<li>Start the harvest.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> Harvest job starts and then completes successfully.<\/p>\n\n\n\n<p><strong>Verification:<\/strong>\n&#8211; Check harvest job status: <strong>Succeeded\/Completed<\/strong>.\n&#8211; If the UI provides a job run log, review it for counts of discovered entities.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Search for the harvested dataset and enrich metadata<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">7A) Find the dataset<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the catalog, use <strong>Search<\/strong>.<\/li>\n<li>Search for: <code>customers<\/code> (or <code>customers.csv<\/code> depending on how the entity is represented).<\/li>\n<li>Open the entity representing your dataset.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> You can view metadata such as name, location\/path, and possibly inferred schema\/columns (exact metadata depends on connector support).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">7B) Add tags<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the entity details, find <strong>Tags<\/strong> (or classification).<\/li>\n<li>Add tags such as:\n   &#8211; <code>Domain:Lab<\/code>\n   &#8211; <code>Sensitivity:Internal<\/code>\n   &#8211; <code>Lifecycle:Demo<\/code><\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> Tags appear on the entity and become searchable 
filters.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">7C) Create a glossary term and link it<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Glossary<\/strong>.<\/li>\n<li>Create a term:\n<ul class=\"wp-block-list\">\n<li>Term: <code>Customer<\/code><\/li>\n<li>Definition: <code>A person or organization that has signed up for our service.<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Return to the <code>customers<\/code> entity and associate\/link the glossary term (UI wording varies).<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome:<\/strong> The entity now shows the associated glossary term, so consumers can see its business definition.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Catalog exists<\/strong> and is Active.<\/li>\n<li><strong>Data asset exists<\/strong> for the Object Storage bucket.<\/li>\n<li><strong>Harvest job succeeded<\/strong>.<\/li>\n<li>Searching for <code>customers<\/code> returns at least one entity.<\/li>\n<li>Entity shows your <strong>tags<\/strong> and linked <strong>glossary term<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>If any item fails, use the troubleshooting section below.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Harvest fails with authorization\/403 errors<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Missing or incorrect IAM policy allowing the Data Catalog service to read Object Storage.<\/li>\n<li><strong>Fix:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Re-check the official Data Catalog Object Storage harvesting policy examples.<\/li>\n<li>Confirm the policy is in the correct compartment (where the bucket resides).<\/li>\n<li>Wait for IAM propagation (a few minutes) and retry the harvest.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Bucket or namespace not found<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> 
Wrong region\/compartment selected, or incorrect namespace.<\/li>\n<li><strong>Fix:<\/strong> Confirm the region selector at the top right and the compartment selector in both Object Storage and Data Catalog.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: No entities found after harvest<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Harvest scope\/prefix excludes the object, or the connector doesn\u2019t infer metadata from the file type.<\/li>\n<li><strong>Fix:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Confirm <code>customers.csv<\/code> exists in the bucket.<\/li>\n<li>Re-run the harvest without prefix filters.<\/li>\n<li>Check whether your connector\/version supports schema inference or only file-level metadata (verify in docs).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Can\u2019t see Data Catalog in console<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cause:<\/strong> Service not enabled\/available in your region, or you lack IAM permissions.<\/li>\n<li><strong>Fix:<\/strong> Switch regions and confirm service availability, or request access from your tenancy administrator.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing costs and clutter, remove lab resources:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p>In <strong>Data Catalog<\/strong>:<\/p>\n<ul class=\"wp-block-list\">\n<li>Delete harvest jobs (if required by the UI)<\/li>\n<li>Delete the data asset <code>lab-os-asset<\/code><\/li>\n<li>Delete the catalog <code>lab-catalog<\/code><\/li>\n<\/ul>\n<\/li>\n<li>\n<p>In <strong>Object Storage<\/strong>:<\/p>\n<ul class=\"wp-block-list\">\n<li>Delete <code>customers.csv<\/code><\/li>\n<li>Delete the bucket <code>lab-dc-bucket-...<\/code><\/li>\n<\/ul>\n<\/li>\n<li>\n<p>In <strong>IAM<\/strong>:<\/p>\n<ul class=\"wp-block-list\">\n<li>Remove lab-specific policies if they were created only for this exercise<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Delete the compartment <code>lab-datacatalog<\/code> (only after all resources inside are 
deleted)<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Start with a curated zone<\/strong>: Catalog your \u201csilver\/gold\u201d datasets before raw ingestion zones.<\/li>\n<li><strong>Design for domains<\/strong>: Use consistent tagging like <code>Domain:&lt;name&gt;<\/code> and map assets to domain ownership.<\/li>\n<li><strong>Separate environments<\/strong>: Use separate catalogs or clear naming (and separate compartments) for dev\/test\/prod depending on governance needs and pricing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>least privilege<\/strong> for both:\n<ul class=\"wp-block-list\">\n<li>Human users (stewards vs consumers)<\/li>\n<li>Service access for harvesting (read-only where possible)<\/li>\n<\/ul>\n<\/li>\n<li>Keep catalog administration limited to a small group.<\/li>\n<li>Use compartments to enforce boundaries between domains or business units.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid harvesting everything. 
Harvesting should be intentional and tied to discovery value.<\/li>\n<li>Set harvest schedules carefully; nightly is often enough.<\/li>\n<li>Periodically deprecate\/remove assets no longer needed.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use a naming standard for assets and entities to improve search quality.<\/li>\n<li>Enforce required metadata fields (owner, description) through governance processes.<\/li>\n<li>Keep tags controlled (avoid dozens of near-duplicates like <code>PII<\/code>, <code>Pii<\/code>, <code>pii<\/code>).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat harvest as a production job:\n<ul class=\"wp-block-list\">\n<li>Define RACI for failures<\/li>\n<li>Add alerts\/notifications (where supported)<\/li>\n<li>Document rollback\/mitigation (e.g., last-known-good metadata)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Create an operational runbook:\n<ul class=\"wp-block-list\">\n<li>Harvest cadence<\/li>\n<li>Failure handling<\/li>\n<li>Change management for glossary<\/li>\n<\/ul>\n<\/li>\n<li>Use Audit and logging to track administrative activity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tag strategy examples:\n<ul class=\"wp-block-list\">\n<li><code>Sensitivity:Public|Internal|Confidential|Restricted<\/code><\/li>\n<li><code>Certification:Bronze|Silver|Gold<\/code><\/li>\n<li><code>Domain:&lt;DomainName&gt;<\/code><\/li>\n<li><code>OwnerTeam:&lt;TeamName&gt;<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Name catalogs and assets with predictable prefixes:\n<ul class=\"wp-block-list\">\n<li><code>prod-<\/code>, <code>nonprod-<\/code>, <code>sandbox-<\/code><\/li>\n<\/ul>\n<\/li>\n<li>Require a short description for every data asset and key entity.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Catalog uses <strong>OCI IAM<\/strong> for authentication and authorization.<\/li>\n<li>Use groups and policies to separate:<\/li>\n<li>Catalog administrators (create\/manage catalogs, assets, harvest)<\/li>\n<li>Data stewards (edit glossary, curation fields)<\/li>\n<li>Consumers (read-only search\/browse)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Oracle Cloud services typically encrypt data at rest and in transit. Confirm Data Catalog\u2019s encryption specifics and key management options (Oracle-managed keys vs customer-managed keys, if available) in official docs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Console\/API access uses Oracle Cloud endpoints.<\/li>\n<li>Harvest connectivity to private sources may require private networking patterns (verify private endpoint support and requirements).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If harvesting requires credentials (common for databases), store secrets securely:<\/li>\n<li>Prefer Oracle Cloud <strong>Vault<\/strong> where supported by the connector pattern (verify).<\/li>\n<li>Restrict who can view\/rotate credentials.<\/li>\n<li>Rotate secrets regularly and on staff changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>OCI Audit<\/strong> to record administrative events.<\/li>\n<li>Retain logs per compliance requirements.<\/li>\n<li>Monitor harvest activity and unexpected changes to glossary terms\/tags.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<p>Data Catalog helps with:\n&#8211; Data inventory visibility\n&#8211; 
Ownership and stewardship traceability\n&#8211; Classification tagging workflows<br\/>\nBut it does not replace:\n&#8211; DLP tooling\n&#8211; Full data access monitoring on underlying stores\n&#8211; Data retention enforcement (that remains with the storage\/database systems)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Granting overly broad Object Storage permissions to the Data Catalog service or to users<\/li>\n<li>Using shared personal credentials for database harvesting<\/li>\n<li>Allowing anyone to edit glossary terms (definitions become untrusted)<\/li>\n<li>Not tracking who changed sensitive classification tags<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use compartment boundaries and least privilege.<\/li>\n<li>Centralize naming\/tagging standards.<\/li>\n<li>Restrict write privileges to curated metadata fields.<\/li>\n<li>Establish a review workflow for high-impact glossary changes.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>Treat this section as a checklist to validate early; details vary by region and connector.<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Connector coverage varies:<\/strong> Not all data sources are supported everywhere. Confirm supported data asset types in your region.<\/li>\n<li><strong>Regional service:<\/strong> Catalogs are regional. 
Multi-region organizations may need multiple catalogs and governance alignment.<\/li>\n<li><strong>IAM complexity:<\/strong> Harvesting often fails due to missing service permissions to source systems.<\/li>\n<li><strong>Metadata \u2260 data:<\/strong> Data Catalog doesn\u2019t grant access to the underlying data; it only indexes metadata.<\/li>\n<li><strong>Glossary success depends on process:<\/strong> Without stewardship and standards, glossary becomes stale.<\/li>\n<li><strong>Tag sprawl risk:<\/strong> Without controlled vocabulary, tags become inconsistent and reduce search value.<\/li>\n<li><strong>Private network sources:<\/strong> Harvesting private databases can require networking setup; validate what Data Catalog supports (private endpoints\/connectivity).<\/li>\n<li><strong>Operational visibility:<\/strong> If you need detailed metrics\/alerts, verify what native monitoring and events exist; you may need process tooling around it.<\/li>\n<li><strong>Deletion dependencies:<\/strong> You may need to delete harvest jobs or assets before deleting catalogs, depending on UI rules.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>Data Catalog is one component of a broader Data Management stack. 
Here\u2019s how it compares to nearby options.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Oracle Cloud Data Catalog<\/strong><\/td>\n<td>OCI-centric metadata discovery and governance<\/td>\n<td>Managed service, integrates with OCI IAM\/compartments, glossary + enrichment<\/td>\n<td>Connector coverage and regional scope must be validated; governance requires process<\/td>\n<td>You run data platforms on Oracle Cloud and want a managed metadata catalog<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI Data Integration (metadata features)<\/strong><\/td>\n<td>ETL\/ELT pipeline building with some metadata context<\/td>\n<td>Strong for building pipelines; can complement a catalog<\/td>\n<td>Not a dedicated enterprise catalog by itself<\/td>\n<td>You need data pipelines first, and cataloging is a secondary need<\/td>\n<\/tr>\n<tr>\n<td><strong>Custom metadata in a database\/wiki<\/strong><\/td>\n<td>Very small environments<\/td>\n<td>Simple, cheap at tiny scale<\/td>\n<td>Not searchable at enterprise scale; not governed; becomes stale<\/td>\n<td>Small team with limited sources and minimal compliance requirements<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Glue Data Catalog<\/strong><\/td>\n<td>AWS data lake and analytics<\/td>\n<td>Tight AWS integration; common in AWS ecosystems<\/td>\n<td>AWS-specific; different IAM model<\/td>\n<td>Your platform is primarily on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Microsoft Purview<\/strong><\/td>\n<td>Microsoft-centric governance and cataloging<\/td>\n<td>Broad governance suite, integrations across the Microsoft stack<\/td>\n<td>Complexity and licensing can be significant<\/td>\n<td>Your ecosystem is Microsoft\/Azure-first and you need a broad governance suite<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Dataplex Catalog 
(and related GCP governance tools)<\/strong><\/td>\n<td>GCP data governance<\/td>\n<td>Integrates with GCP data services<\/td>\n<td>GCP-specific<\/td>\n<td>You are GCP-first and need native governance\/catalog<\/td>\n<\/tr>\n<tr>\n<td><strong>Apache Atlas (self-managed)<\/strong><\/td>\n<td>Highly customizable governance<\/td>\n<td>Open-source, extensible<\/td>\n<td>Operational burden; scaling and UX depend on your implementation<\/td>\n<td>You need deep customization and can operate the platform<\/td>\n<\/tr>\n<tr>\n<td><strong>DataHub \/ Amundsen (self-managed)<\/strong><\/td>\n<td>Modern metadata platforms<\/td>\n<td>Strong community, flexible ingestion<\/td>\n<td>You run\/scale it; integrations vary<\/td>\n<td>You want open ecosystem control and can invest in operations<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example (regulated industry)<\/h3>\n\n\n\n<p><strong>Problem:<\/strong> A financial services company runs multiple analytics domains on Oracle Cloud. Auditors request a repeatable inventory of datasets used for regulatory reporting, including definitions and owners. 
Teams also struggle with inconsistent KPI definitions across departments.<\/p>\n\n\n\n<p><strong>Proposed architecture:<\/strong>\n&#8211; One regional <strong>Data Catalog<\/strong> per primary region\n&#8211; Data assets for:\n  &#8211; Curated Object Storage buckets (domain-based)\n  &#8211; Autonomous Data Warehouse (core reporting)\n&#8211; Governance model:\n  &#8211; Data stewards manage glossary and certification tags\n  &#8211; Platform admins manage assets\/harvesting\n  &#8211; Consumers get read-only access\n&#8211; Operational integration:\n  &#8211; Scheduled nightly harvest for curated sources\n  &#8211; Audit log retention aligned to compliance policy<\/p>\n\n\n\n<p><strong>Why Data Catalog was chosen:<\/strong>\n&#8211; Native integration with Oracle Cloud IAM and compartments\n&#8211; Central business glossary connected to technical assets\n&#8211; Managed service reduces operational overhead vs self-hosting<\/p>\n\n\n\n<p><strong>Expected outcomes:<\/strong>\n&#8211; Faster audit response (inventory + ownership in one place)\n&#8211; Reduced KPI disputes due to glossary-driven definitions\n&#8211; Improved analyst productivity via search and certified datasets<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example<\/h3>\n\n\n\n<p><strong>Problem:<\/strong> A SaaS startup stores product analytics events in Object Storage and a small warehouse. 
New team members don\u2019t know which datasets are safe to use, and dashboards are inconsistent.<\/p>\n\n\n\n<p><strong>Proposed architecture:<\/strong>\n&#8211; Single <strong>Data Catalog<\/strong> in the team\u2019s region\n&#8211; Catalog only curated datasets:\n  &#8211; <code>analytics_curated<\/code> bucket paths\n  &#8211; Warehouse schema <code>BI_MART<\/code>\n&#8211; Simple glossary:\n  &#8211; \u201cActive User\u201d, \u201cConversion\u201d, \u201cChurn\u201d\n&#8211; Tagging:\n  &#8211; <code>Certified:Gold<\/code> for tables used in executive dashboards<\/p>\n\n\n\n<p><strong>Why Data Catalog was chosen:<\/strong>\n&#8211; Quick setup without building a custom system\n&#8211; Glossary + tags provide immediate value for a small team\n&#8211; Scales as the startup adds data sources<\/p>\n\n\n\n<p><strong>Expected outcomes:<\/strong>\n&#8211; New hires onboard faster\n&#8211; Fewer broken dashboards from misunderstanding data meaning\n&#8211; Better reuse of curated datasets<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Does Data Catalog store my actual data?<\/strong><br\/>\nNo. Data Catalog stores <strong>metadata<\/strong> (information about data). The underlying data remains in Object Storage, databases, or other systems.<\/p>\n\n\n\n<p>2) <strong>Is Data Catalog a data governance platform?<\/strong><br\/>\nIt supports governance workflows (glossary, tags, ownership metadata), but full governance often requires processes and potentially additional tools.<\/p>\n\n\n\n<p>3) <strong>Can Data Catalog catalog Object Storage buckets?<\/strong><br\/>\nCommonly yes, through a data asset and harvest job for Object Storage. Confirm exact connector behavior and supported formats in the official docs.<\/p>\n\n\n\n<p>4) <strong>Can I catalog Autonomous Data Warehouse or Autonomous Database?<\/strong><br\/>\nOften yes, depending on connector support and your configuration. 
Verify supported sources and required credentials\/networking.<\/p>\n\n\n\n<p>5) <strong>How do users access Data Catalog?<\/strong><br\/>\nThrough the Oracle Cloud Console and APIs, controlled by OCI IAM policies.<\/p>\n\n\n\n<p>6) <strong>How do I keep metadata up to date?<\/strong><br\/>\nUse scheduled harvests (if supported in your UI) or run harvest jobs periodically. Also operationalize steward updates for business context.<\/p>\n\n\n\n<p>7) <strong>What\u2019s the difference between a catalog and a data asset?<\/strong><br\/>\nA <strong>catalog<\/strong> is the container. A <strong>data asset<\/strong> is a registered source inside the catalog.<\/p>\n\n\n\n<p>8) <strong>What\u2019s a harvest job?<\/strong><br\/>\nA harvest job connects to a data asset and extracts technical metadata into the catalog.<\/p>\n\n\n\n<p>9) <strong>Can I restrict who can edit glossary terms?<\/strong><br\/>\nYes\u2014use IAM policies and role separation so only stewards\/admins can modify governed fields.<\/p>\n\n\n\n<p>10) <strong>Will Data Catalog improve query performance?<\/strong><br\/>\nNo. It\u2019s not a query engine. It improves discovery and understanding, not execution speed.<\/p>\n\n\n\n<p>11) <strong>How do I classify sensitive fields (like email)?<\/strong><br\/>\nApply tags and\/or custom properties at the entity\/attribute level as supported. The exact tagging granularity depends on the harvested metadata model.<\/p>\n\n\n\n<p>12) <strong>Does Data Catalog automatically detect PII?<\/strong><br\/>\nSome catalogs provide classification features; do not assume automatic detection. Verify whether OCI Data Catalog includes automated classification in your current version\/region, and consider complementary tooling if needed.<\/p>\n\n\n\n<p>13) <strong>Can I automate onboarding of new datasets?<\/strong><br\/>\nYes, using APIs\/CLI\/SDK where supported. 
Many teams implement \u201ccatalog as code\u201d patterns plus standard tags\/properties.<\/p>\n\n\n\n<p>14) <strong>What\u2019s the best way to design tags?<\/strong><br\/>\nUse controlled vocabularies and a small number of standardized dimensions (Sensitivity, Certification, Domain, OwnerTeam).<\/p>\n\n\n\n<p>15) <strong>How do I estimate cost?<\/strong><br\/>\nUse the official price list entry for Data Catalog and the OCI Cost Estimator. Costs depend on the pricing dimensions Oracle currently uses for this service\u2014verify before scaling.<\/p>\n\n\n\n<p>16) <strong>Should I create one catalog or many?<\/strong><br\/>\nStart with one per environment or region, then scale only if governance boundaries require it. Multiple catalogs increase operational overhead and may increase cost.<\/p>\n\n\n\n<p>17) <strong>Can I integrate Data Catalog with CI\/CD?<\/strong><br\/>\nYes, by calling APIs in pipelines to create assets, apply tags, or trigger harvest. Ensure policies and secrets management are handled securely.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn Data Catalog<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official Documentation<\/td>\n<td>OCI Data Catalog Documentation<\/td>\n<td>Primary source for concepts, connectors, IAM policies, and API references: https:\/\/docs.oracle.com\/en-us\/iaas\/data-catalog\/home.htm<\/td>\n<\/tr>\n<tr>\n<td>Official Pricing<\/td>\n<td>Oracle Cloud Price List<\/td>\n<td>Find Data Catalog under Data Management and confirm current billable dimensions: https:\/\/www.oracle.com\/cloud\/price-list\/<\/td>\n<\/tr>\n<tr>\n<td>Pricing Calculator<\/td>\n<td>OCI Cost Estimator<\/td>\n<td>Build scenario estimates using current SKUs: https:\/\/www.oracle.com\/cloud\/costestimator.html<\/td>\n<\/tr>\n<tr>\n<td>Official Console<\/td>\n<td>Oracle Cloud Console<\/td>\n<td>Hands-on creation of catalogs, data assets, harvest jobs: https:\/\/cloud.oracle.com\/<\/td>\n<\/tr>\n<tr>\n<td>Architecture Center<\/td>\n<td>Oracle Architecture Center<\/td>\n<td>Reference architectures for data platforms that commonly include cataloging\/governance patterns (search within): https:\/\/docs.oracle.com\/en\/solutions\/<\/td>\n<\/tr>\n<tr>\n<td>Tutorials \/ Workshops<\/td>\n<td>Oracle LiveLabs<\/td>\n<td>Hands-on labs (search for \u201cData Catalog\u201d and verify lab availability): https:\/\/apexapps.oracle.com\/pls\/apex\/r\/dbpm\/livelabs\/home<\/td>\n<\/tr>\n<tr>\n<td>API\/CLI Docs<\/td>\n<td>OCI CLI Installation and Usage<\/td>\n<td>If you automate Data Catalog operations: https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/API\/SDKDocs\/cliinstall.htm<\/td>\n<\/tr>\n<tr>\n<td>Community Learning<\/td>\n<td>Oracle Cloud Customer Connect \/ Community<\/td>\n<td>Practical troubleshooting and patterns (validate against docs): 
https:\/\/community.oracle.com\/customerconnect\/categories\/oracle-cloud-infrastructure<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute Name<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, platform teams, cloud engineers<\/td>\n<td>OCI fundamentals, DevOps practices, cloud operations (verify course specifics)<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>SCM\/DevOps foundations, automation practices (verify OCI coverage)<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations, SRE, platform operations<\/td>\n<td>Cloud ops practices, monitoring, reliability (verify OCI content)<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, production operations teams<\/td>\n<td>Reliability engineering, incident response, observability (verify cloud modules)<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops teams adopting AIOps<\/td>\n<td>AIOps concepts, operations analytics (verify integrations)<\/td>\n<td>check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site Name<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud training resources (verify specific offerings)<\/td>\n<td>Students and working engineers<\/td>\n<td>https:\/\/www.rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training and mentoring (verify course catalog)<\/td>\n<td>Beginners to intermediate DevOps engineers<\/td>\n<td>https:\/\/www.devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps guidance\/training resources (verify offerings)<\/td>\n<td>Teams needing short-term enablement<\/td>\n<td>https:\/\/www.devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and enablement resources (verify services)<\/td>\n<td>Ops and DevOps teams<\/td>\n<td>https:\/\/www.devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify current portfolio)<\/td>\n<td>Platform engineering, cloud adoption, operations<\/td>\n<td>Standing up governance-friendly cloud landing zones; automation and operational readiness<\/td>\n<td>https:\/\/www.cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Training + consulting (verify service catalog)<\/td>\n<td>Enablement, DevOps transformation, cloud best practices<\/td>\n<td>Designing IAM and operational runbooks; implementing CI\/CD and automation around data platforms<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify offerings)<\/td>\n<td>DevOps tooling, reliability improvements<\/td>\n<td>Building monitoring\/alerting and incident processes; automation for cloud resource provisioning<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before Data Catalog<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Oracle Cloud fundamentals:\n<ul class=\"wp-block-list\">\n<li>Tenancy, compartments, IAM policies, groups<\/li>\n<li>Regions and availability<\/li>\n<li>Object Storage basics (buckets, objects, namespaces)<\/li>\n<\/ul>\n<\/li>\n<li>Basic data concepts:\n<ul class=\"wp-block-list\">\n<li>Schemas, tables, partitions, file formats<\/li>\n<li>Data lake vs data warehouse<\/li>\n<\/ul>\n<\/li>\n<li>Governance foundations:\n<ul class=\"wp-block-list\">\n<li>Data ownership, stewardship, classification<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after Data Catalog<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data pipelines and processing:\n<ul class=\"wp-block-list\">\n<li>OCI Data Integration, OCI Data Flow (or your preferred tools)<\/li>\n<\/ul>\n<\/li>\n<li>Security hardening:\n<ul class=\"wp-block-list\">\n<li>OCI Vault, key management, network segmentation<\/li>\n<\/ul>\n<\/li>\n<li>Observability and operations:\n<ul class=\"wp-block-list\">\n<li>OCI Logging, Monitoring, Audit, and alerting patterns<\/li>\n<\/ul>\n<\/li>\n<li>Advanced governance:\n<ul class=\"wp-block-list\">\n<li>Data-quality checks, access reviews, retention policies (implemented in source systems)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Engineer (metadata-aware pipelines)<\/li>\n<li>Analytics Engineer (semantic definitions, curated marts)<\/li>\n<li>Data Steward \/ Governance Analyst (glossary, classification)<\/li>\n<li>Cloud Engineer \/ Platform Engineer (IAM, compartments, automation)<\/li>\n<li>Security Engineer (classification workflows, audit readiness)<\/li>\n<li>Solution Architect (data platform design)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Oracle\u2019s certification catalog changes over time. 
Look for:\n&#8211; OCI architect and data-related certifications on the official Oracle University pages.<br\/>\nVerify current paths here: https:\/\/education.oracle.com\/<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Curated dataset certification workflow<\/strong>: Tag assets as Bronze\/Silver\/Gold and document steward review steps.<\/li>\n<li><strong>Glossary-driven metrics<\/strong>: Build a glossary for 20 key KPIs and link them to warehouse columns.<\/li>\n<li><strong>Automated asset onboarding<\/strong>: Script creation of data assets and harvesting (API\/CLI), then auto-apply tags.<\/li>\n<li><strong>Compliance inventory<\/strong>: Maintain a list of datasets tagged <code>Confidential<\/code> and perform quarterly owner reviews.<\/li>\n<li><strong>Multi-compartment domain model<\/strong>: Organize assets by domain compartments and implement least-privilege access.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Catalog:<\/strong> A regional container in Oracle Cloud Data Catalog that stores harvested metadata and business context.<\/li>\n<li><strong>Data Asset:<\/strong> A registered data source (Object Storage, database, etc.) 
that can be harvested.<\/li>\n<li><strong>Harvest:<\/strong> The process\/job that extracts technical metadata from a data asset into the catalog.<\/li>\n<li><strong>Entity:<\/strong> A harvested metadata object in the catalog that represents a dataset, such as a table, view, or file; its columns or fields are recorded as attributes of the entity.<\/li>\n<li><strong>Business Glossary:<\/strong> A curated set of business terms and definitions linked to technical metadata.<\/li>\n<li><strong>Tag:<\/strong> A label applied to catalog objects for classification and discovery.<\/li>\n<li><strong>Custom Property:<\/strong> An organization-defined metadata field added to catalog objects (owner, SLA, domain, etc.).<\/li>\n<li><strong>Compartment:<\/strong> An OCI logical container for organizing resources and applying IAM access control.<\/li>\n<li><strong>IAM Policy:<\/strong> A statement that grants permissions to groups\/users\/services for OCI resources.<\/li>\n<li><strong>Steward:<\/strong> A role responsible for maintaining business definitions and governance metadata.<\/li>\n<li><strong>Certified Dataset:<\/strong> A dataset that has been reviewed and approved for broad use (implemented via tags\/process).<\/li>\n<li><strong>Metadata:<\/strong> Data about data\u2014schema, structure, definitions, location, and governance annotations.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>Oracle Cloud <strong>Data Catalog<\/strong> is a managed <strong>metadata discovery and governance<\/strong> service in the <strong>Data Management<\/strong> category. It helps organizations find datasets faster, standardize definitions with a business glossary, and operationalize stewardship through tags and custom properties\u2014without moving the underlying data.<\/p>\n\n\n\n<p>Architecturally, it works by registering <strong>data assets<\/strong> and running <strong>harvest<\/strong> jobs to ingest technical metadata, then enabling users to search and enrich that metadata securely using <strong>OCI IAM<\/strong> controls. 
Cost depends on Oracle\u2019s current pricing dimensions for Data Catalog (confirm in the official price list), and indirect costs are mostly driven by how broadly and frequently you harvest.<\/p>\n\n\n\n<p>Use Data Catalog when you need reliable data discovery and shared definitions across multiple teams and sources in Oracle Cloud. Next step: expand from the lab by cataloging one curated production domain, establishing a minimal glossary, and implementing a controlled tagging standard backed by IAM role separation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Management<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[68,62],"tags":[],"class_list":["post-884","post","type-post","status-publish","format-standard","hentry","category-data-management","category-oracle-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/884","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=884"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/884\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=884"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=884"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=884"}],"curies":[{"name":"wp",
"href":"https:\/\/api.w.org\/{rel}","templated":true}]}}