Category
Analytics and AI
1. Introduction
Oracle Cloud Data Labeling is a managed service in Oracle Cloud Infrastructure (OCI) that helps teams create labeled datasets for supervised machine learning (ML). In simple terms, it provides tools and workflows to tag raw data—such as images or text—with the “correct answers” (labels) that ML models need for training and evaluation.
From a technical perspective, Data Labeling provides dataset management, label set definition, and a labeling workflow/UI and APIs that connect strongly with OCI foundational services like Object Storage (for raw and exported labeled data), IAM (for access control), and Audit (for governance and traceability). It is designed to support repeatable labeling operations that can be integrated into MLOps pipelines.
The problem it solves is practical and common: most ML projects fail or stall because training data is messy, unlabeled, inconsistently labeled, or difficult to manage across teams. Data Labeling provides structure, access control, and export mechanisms so that labeled datasets can reliably feed downstream training in services like OCI Data Science (or other training platforms).
Naming/status note: The service is commonly referred to as OCI Data Labeling or Data Labeling service in Oracle documentation. Verify the exact current console placement and terminology in your region/tenancy because OCI console navigation can change over time.
2. What is Data Labeling?
Official purpose (OCI-aligned):
Data Labeling helps you create, manage, and export labeled datasets so you can train and evaluate machine learning models.
Core capabilities
- Create and manage datasets for labeling
- Define label sets (the categories/tags you want labelers to apply)
- Assign and perform labeling work (often via labeling jobs/workflows—verify exact current UI terms in official docs)
- Track labeling progress and dataset state
- Export labeled data/annotations for ML training
Major components (conceptual model)
While exact resource names can vary by UI/API version, the service typically revolves around:
- Dataset: A collection of records (examples) to label.
- Record: An individual item (for example, an image file or text file stored in Object Storage).
- Label set / labels: The controlled vocabulary of labels (e.g., positive, negative).
- Annotations: The labeling output attached to each record.
- Work assignment / labeling job (if exposed in your tenancy): A workflow that assigns work to one or more labelers and tracks completion.
If any of these terms differ in your tenancy, treat them as conceptual equivalents and verify in official docs.
Service type
- Managed cloud service for human-in-the-loop dataset labeling and export.
- Accessed through the OCI Console, REST APIs, and typically the OCI CLI/SDKs (verify current CLI command group availability in your installed CLI version).
Scope: regional and compartment-scoped
In OCI, services are generally:
– Region-specific for resource creation and operations (datasets and related resources typically exist in a region).
– Compartment-scoped for access control and organization.
Data itself generally resides in OCI Object Storage buckets in a specific region, and the Data Labeling service references that data and writes exports back to Object Storage.
How it fits into the Oracle Cloud ecosystem
Data Labeling sits in the Analytics and AI category and commonly supports:
– OCI Data Science model training pipelines
– OCI AI Services projects that require custom training data (where applicable)
– Enterprise governance via IAM, Audit, Tagging, and Compartments
– Storage and lifecycle via Object Storage (and potentially Archive Storage for long-term retention)
3. Why use Data Labeling?
Business reasons
- Faster time-to-model: standardized workflows reduce delays caused by ad hoc labeling tools.
- Better model outcomes: consistent labeling improves training signal and reduces rework.
- Cross-team collaboration: shared datasets and controlled access reduce duplication and confusion.
- Traceability: labeling artifacts can be governed like other cloud assets.
Technical reasons
- Tight integration with OCI IAM and Object Storage means your data stays inside your OCI environment.
- Exported labels can feed training workflows in OCI Data Science or external training systems.
- Programmatic control through APIs (and often CLI/SDK) supports automation.
Operational reasons
- Centralized management of datasets and progress tracking (as exposed by the service).
- Uses OCI standard constructs—compartments, policies, tags, Audit logs—which most platform teams already operate.
Security/compliance reasons
- Access controlled by least privilege using IAM policies.
- Data typically remains in Object Storage, enabling encryption, retention policies, and access logs.
- API calls are captured by OCI Audit.
Scalability/performance reasons
- Object Storage scales for large datasets without you managing capacity.
- Multiple labelers can work in parallel (subject to your workflow design and tenancy setup).
When teams should choose Data Labeling
Choose Oracle Cloud Data Labeling when:
– You already store data in OCI and want labeling to remain in the same cloud boundary.
– You need strong tenancy governance (IAM, Audit, tagging, compartments).
– You want a managed approach instead of running and patching your own labeling platform.
– You need an auditable, repeatable process to produce training datasets.
When teams should not choose it
Avoid (or reconsider) Data Labeling when:
– You need specialized annotation types not supported by the service in your region (verify supported data/annotation types).
– You require a built-in external workforce/managed labeling workforce. OCI Data Labeling is commonly used with your own labelers (employees/contractors) rather than providing a marketplace workforce—verify if your Oracle offering includes any workforce options.
– You are already heavily invested in a different labeling ecosystem (e.g., Label Studio/CVAT) with established pipelines and integrations.
4. Where is Data Labeling used?
Industries
- Healthcare: imaging classification, clinical text categorization (with appropriate compliance controls)
- Manufacturing: defect detection datasets for computer vision
- Retail/e-commerce: product categorization and moderation datasets
- Financial services: document classification, text categorization, fraud-related training sets
- Telecom: ticket classification, NER for network incident descriptions
- Media: content tagging, policy compliance datasets
Team types
- Data science and ML engineering teams
- Platform/Cloud engineering teams (setting up secure workflows)
- Data governance teams (access controls, audit requirements)
- Product teams building AI features
- Annotation teams/operations teams performing labeling
Workloads
- Supervised learning training dataset creation
- Human-in-the-loop dataset cleanup and normalization
- Ongoing labeling for model retraining and drift response
Architectures
- “Object Storage → Data Labeling → Export → Data Science training”
- Hybrid pipelines where training happens outside OCI but labeling and storage stay in OCI
- Secure multi-compartment separation: raw data in one compartment, labeled exports in another
Production vs dev/test usage
- Dev/test: small datasets, quick iteration on label definitions, sampling strategies, QA rules.
- Production: controlled label taxonomy, review workflows, audit requirements, and reproducible exports integrated into CI/CD or MLOps processes.
5. Top Use Cases and Scenarios
Below are realistic scenarios where Oracle Cloud Data Labeling is a good fit. Each use case assumes your raw data is stored in OCI Object Storage and you need consistent labels for ML training.
1) Customer support ticket classification
- Problem: Thousands of tickets need category labels to train an auto-routing classifier.
- Why this service fits: Managed dataset organization + controlled label set; labelers can tag text records.
- Example scenario: Support ops label 20,000 historical tickets into categories like billing, outage, account, and feature_request.
2) Product review sentiment labeling
- Problem: Train a sentiment model but reviews are unlabeled.
- Why this service fits: Simple label sets like positive/neutral/negative; easy export.
- Example scenario: E-commerce team labels 5,000 reviews and exports annotations for training.
3) Image classification for quality inspection
- Problem: Determine whether a product photo indicates pass or fail.
- Why this service fits: Human labeling workflow over image records.
- Example scenario: Manufacturing QA labels 10,000 assembly-line images as ok or defect.
4) Object detection dataset preparation (if supported)
- Problem: Need bounding boxes for objects in images.
- Why this service fits: Some labeling services support bounding box annotation; verify OCI Data Labeling annotation types in official docs for your region.
- Example scenario: Logistics team labels pallets/forklifts in warehouse images for a safety model.
5) Content moderation categorization
- Problem: Train a classifier to detect policy-violating content.
- Why this service fits: Strong governance (IAM/Audit), consistent label taxonomy.
- Example scenario: Trust & Safety labels text snippets into spam, hate, safe, and adult.
6) Document classification (if document datasets are supported)
- Problem: Label inbound PDFs into document types.
- Why this service fits: Central dataset management and export; verify document support and annotation features.
- Example scenario: Finance team labels invoices vs receipts vs statements.
7) Named entity recognition (NER) training data (if supported)
- Problem: Extract entities like customer_name and account_id from text.
- Why this service fits: If NER annotation is supported; otherwise you may need a specialized tool—verify.
- Example scenario: Telecom team labels service notes for entity extraction.
8) Retraining dataset for model drift response
- Problem: Model accuracy drops due to new data patterns.
- Why this service fits: Add new records, label them, export incremental dataset for retraining.
- Example scenario: Monthly labeling batches feed retraining pipeline.
9) Human QA pass on weakly labeled data
- Problem: Labels produced by heuristics are noisy.
- Why this service fits: Use human labelers to correct and validate a subset; export gold dataset.
- Example scenario: Start with rule-based tagging, then correct 10% sample.
10) Multilingual text categorization
- Problem: Need labeled data across multiple languages.
- Why this service fits: Central management and separate datasets per language; labelers assigned by language skill.
- Example scenario: Create datasets tickets-en, tickets-es, and tickets-fr with a shared label taxonomy.
11) Model evaluation holdout dataset labeling
- Problem: Need a trusted test set to measure model performance.
- Why this service fits: Controlled process; tighter access to avoid leakage.
- Example scenario: Security team labels a test dataset only accessible to evaluators.
12) Data governance and auditability for regulated labeling
- Problem: Must prove who labeled what and when.
- Why this service fits: OCI Audit captures API actions; IAM enforces access.
- Example scenario: Healthcare org labels imaging metadata with strict compartment access and audit retention.
6. Core Features
Important: OCI capabilities can vary by region and over time. For exact supported data types and annotation modes, verify in the official Data Labeling documentation for your tenancy/region.
Feature 1: Dataset management (create, organize, lifecycle)
- What it does: Lets you create and manage datasets within compartments.
- Why it matters: Provides structure for labeling projects; reduces ad hoc sprawl.
- Practical benefit: Consistent dataset naming, tagging, and lifecycle controls.
- Caveats: Dataset operations are typically regional; plan for data locality.
Feature 2: Object Storage integration (source and export)
- What it does: Uses OCI Object Storage as the durable store for input records and exported labeled output.
- Why it matters: Object Storage is scalable and supports encryption and lifecycle policies.
- Practical benefit: Easy to integrate exports into ML training pipelines.
- Caveats: Ensure buckets and policies are in the correct region/compartment; watch for egress if exporting across regions.
Feature 3: Label sets and controlled taxonomy
- What it does: Define allowed labels/categories for a dataset or project.
- Why it matters: Prevents label drift and inconsistent categories.
- Practical benefit: Higher-quality training data and cleaner evaluation metrics.
- Caveats: Changing label sets mid-stream can complicate versioning; plan label governance.
Feature 4: Human labeling workflow (UI-based labeling)
- What it does: Provides a console experience for labelers to open records and apply labels/annotations.
- Why it matters: Human-in-the-loop labeling remains essential for many datasets.
- Practical benefit: Reduces need for third-party labeling tools when your workflow fits OCI capabilities.
- Caveats: Complex annotation types may not be supported; validate before committing.
Feature 5: Collaboration through OCI IAM users and groups
- What it does: Enables multiple labelers and reviewers via OCI identity and policy.
- Why it matters: Enterprise governance and least privilege are easier when integrated with OCI IAM.
- Practical benefit: You can separate duties (admins vs labelers vs export operators).
- Caveats: Requires thoughtful policy design to avoid over-broad access to buckets.
Feature 6: Export labeled datasets for training
- What it does: Exports labels/annotations to Object Storage.
- Why it matters: Training systems generally consume files, not labeling UI state.
- Practical benefit: Repeatable training runs from exported artifacts (store them immutably if needed).
- Caveats: Export format and schema must match your training pipeline; verify supported export formats.
Feature 7: API-driven operations (automation-ready)
- What it does: Supports programmatic dataset operations through OCI APIs (and typically SDKs/CLI).
- Why it matters: Enables integration into MLOps pipelines and CI/CD.
- Practical benefit: Automate dataset creation, record import, export, and reporting.
- Caveats: API feature coverage can differ from UI; confirm in API reference.
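As an illustration of API-driven operations, the OCI CLI exposes a Data Labeling command group. This is a sketch: the group and option names (data-labeling-service, dataset list) reflect a recent CLI version and should be confirmed with "oci data-labeling-service --help" in your installed CLI.

```shell
# Sketch: list Data Labeling datasets in a compartment with the OCI CLI.
# Verify the command group name in your CLI version before relying on it.
COMPARTMENT_OCID="ocid1.compartment.oc1..replace-me"   # placeholder OCID

oci data-labeling-service dataset list \
  --compartment-id "$COMPARTMENT_OCID" \
  --output table
```

The same listing is available through the SDKs, which is usually the better fit for MLOps pipelines.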
Feature 8: Governance via compartments, tags, and Audit
- What it does: Uses OCI resource organization and logging primitives.
- Why it matters: Regulated teams need traceability and access boundaries.
- Practical benefit: Standard OCI governance model; easy to align with landing zone patterns.
- Caveats: Audit captures API calls but may not capture every user action detail within a labeling UI—verify what is logged.
7. Architecture and How It Works
High-level service architecture
At a high level:
1. You store raw data (images/text/docs) in OCI Object Storage.
2. You create a Data Labeling dataset that references those objects.
3. Labelers authenticate with OCI IAM and label records in the Console UI.
4. The service stores label state/metadata and can export labeled results back to Object Storage.
5. You feed exported annotations into training (e.g., OCI Data Science jobs/notebooks) and deploy models.
Request/data/control flow
- Control plane: Dataset creation, label set creation, user permissions, export operations.
- Data plane: Object Storage objects are the raw inputs and exported outputs; Data Labeling references them.
- Identity flow: Users authenticate via OCI IAM; access determined by IAM policies.
- Audit flow: OCI Audit records API operations (create/update/delete/export, etc.).
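To make the audit flow concrete, here is a hedged OCI CLI sketch that pulls Audit events for a compartment over a time window. The compartment OCID and dates are placeholders; confirm option names with "oci audit event list --help" in your installed CLI.

```shell
# Sketch: list recent Audit events for the labeling compartment to answer
# "who did what, when". Times are RFC3339; adjust the window as needed.
COMPARTMENT_OCID="ocid1.compartment.oc1..replace-me"   # placeholder OCID

oci audit event list \
  --compartment-id "$COMPARTMENT_OCID" \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z
```

This requires a configured CLI and an authorized identity; it cannot run outside your tenancy.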
Integrations with related OCI services
Common integrations include:
– OCI Object Storage: input records + export target
– OCI IAM: users/groups/policies for labelers and admins
– OCI Audit: governance and activity trails
– OCI Events / Notifications: optionally trigger automation when exports complete (verify available event types)
– OCI Data Science: training pipelines consume exported data
– OCI Vault / KMS: encryption key management for Object Storage (and potentially other integrated components)
Dependency services (typical)
- Object Storage bucket(s)
- IAM policies
- Network access to OCI Console endpoints (for labelers)
Security/authentication model
- OCI IAM user authentication (console) and OCI API request signing (SDK/CLI)
- Authorization via IAM policies scoped to compartments and resource families
- Object Storage access controlled via IAM policies and bucket policies (as configured)
Networking model
- Labelers typically use the public OCI Console over HTTPS.
- Data stays in Object Storage; network egress charges can apply if you download/export outside the region or cloud boundary.
- For enterprise environments, consider:
- OCI Cloud Guard and security zones (where applicable)
- Private access patterns for Object Storage (e.g., via Service Gateway in VCN) for compute-based pipelines—labeling UI itself is console-based.
Monitoring/logging/governance
- Audit is the baseline for “who did what” at the API/resource level.
- Operational monitoring is often indirect: track export objects created, dataset status, and downstream training success.
- Use tagging (cost-center, project, data-classification) for cost allocation and governance.
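As a hedged sketch of that tagging suggestion, freeform tags can be applied to the export bucket from the CLI (bucket name and tag values are lab examples; many organizations use defined tags instead, and the --freeform-tags option should be confirmed in your CLI version):

```shell
# Sketch: apply freeform tags to the export bucket for cost allocation.
# Tag keys mirror the ones suggested above; adapt to your tagging standard.
oci os bucket update \
  --bucket-name dl-lab-export \
  --freeform-tags '{"project":"dl-lab","cost-center":"ml-platform","data-classification":"internal"}'
```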
Simple architecture diagram
flowchart LR
U[Labelers<br/>OCI Users] -->|Console (HTTPS)| DL[OCI Data Labeling]
DL -->|Reads raw records| OS[(OCI Object Storage<br/>Raw Data Bucket)]
DL -->|Exports annotations| OS2[(OCI Object Storage<br/>Labeled Export Bucket)]
OS2 --> DS[OCI Data Science<br/>Training/Notebooks]
IAM[(OCI IAM)] --> DL
AUD[(OCI Audit)] --> DL
Production-style architecture diagram
flowchart TB
subgraph Tenancy[OCI Tenancy]
subgraph CompartmentA[Compartment: ml-raw]
OSRAW[(Object Storage Bucket<br/>raw-data)]
end
subgraph CompartmentB[Compartment: ml-labeling]
DL[Data Labeling Datasets<br/>Label Sets / Jobs]
TAGS[Tagging & Cost Tracking]
end
subgraph CompartmentC[Compartment: ml-train]
OSLBL[(Object Storage Bucket<br/>labeled-exports)]
DS[OCI Data Science<br/>Projects/Jobs]
ART[(Model Artifacts<br/>Object Storage)]
end
IAM[(OCI IAM<br/>Groups/Policies)]
AUD[(OCI Audit)]
EVT[OCI Events]
NOTIF[OCI Notifications]
end
OSRAW --> DL
DL --> OSLBL
OSLBL --> DS
DS --> ART
IAM --> DL
IAM --> OSRAW
IAM --> OSLBL
AUD --> DL
AUD --> OSRAW
EVT --> NOTIF
DL -.optional events.-> EVT
8. Prerequisites
Tenancy/account requirements
- An active Oracle Cloud (OCI) tenancy with permissions to use Analytics and AI services.
- Access to a region where Data Labeling is available. Availability varies—verify in official docs and in your Console service list.
Permissions / IAM roles
You typically need:
– Permission to create and manage Data Labeling resources in a compartment.
– Permission to read input objects from Object Storage.
– Permission to write export objects to Object Storage.
OCI IAM is policy-based; exact policy statements depend on your compartment structure. The resource family name for Data Labeling in IAM policies must match the official documentation—verify in official docs.
Example policy patterns to validate and adapt (do not paste blindly without verification):
– Manage Data Labeling resources in a compartment
– Read objects from a specific bucket/prefix
– Write objects to an export bucket
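Those policy patterns can be sketched as CLI-created policy statements. Treat every name here as an assumption to verify: in particular, datalabeling-family is a guess at the Data Labeling resource-family name, and the where-clauses assume the lab bucket names.

```shell
# Sketch only. Verify the exact resource-family and verb names in the
# official IAM policy reference before creating this policy.
cat > statements.json << 'EOF'
[
  "Allow group DataLabelers to use datalabeling-family in compartment dl-lab",
  "Allow group DataLabelers to read objects in compartment dl-lab where target.bucket.name='dl-lab-raw'",
  "Allow group DataLabelers to manage objects in compartment dl-lab where target.bucket.name='dl-lab-export'"
]
EOF

oci iam policy create \
  --compartment-id "$COMPARTMENT_OCID" \
  --name dl-lab-labeler-policy \
  --description "Least-privilege access for labelers (verify statements)" \
  --statements file://statements.json
```

Note that manage-objects on the export bucket includes delete; tighten the verbs if your governance model requires write-only exports.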
Billing requirements
- You need billing enabled for the tenancy (even if the service itself is no-charge, dependent services like storage or egress are billable).
- You need a budget owner/cost center tag strategy for tracking.
CLI/SDK/tools needed
For this tutorial:
– OCI Console access (required for interactive labeling UI)
– Optional: OCI CLI for creating buckets and uploading sample records
Install/verify OCI CLI: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm
Region availability
- Data Labeling might not be in every region. Check:
- OCI Services availability in your region (Console service list)
- Official docs for region support (service documentation)
Quotas/limits
- OCI enforces tenancy and compartment quotas for many services.
- Object Storage has practical limits (object count, request rates) and service limits (see official docs).
- Data Labeling may have dataset/job limits—verify in official docs.
Prerequisite services
- OCI Object Storage for input and export buckets
- OCI IAM for groups/policies
- Recommended: OCI Audit (enabled by default in OCI)
9. Pricing / Cost
Current pricing model (what to verify)
Oracle Cloud pricing changes over time and can be region- or contract-dependent. For Data Labeling, the direct service charge may be:
– No additional charge (in some OCI services, labeling tools are provided without separate metering), or
– Metered by usage dimensions (less common for basic labeling tools), or
– Included as part of broader AI/ML offerings.
Because billing should never be based on guesses, do this first:
– Check the official OCI pricing pages and the Data Labeling docs “Pricing” section (if present).
– Use the official OCI cost estimator.
Official pricing entry points:
– OCI Pricing overview: https://www.oracle.com/cloud/pricing/
– OCI Cost Estimator: https://www.oracle.com/cloud/costestimator.html
– OCI price list (for SKU-level detail): https://www.oracle.com/cloud/price-list/
Pricing dimensions to consider (most common in practice)
Even if Data Labeling itself is low-cost/no-charge, you still pay for dependencies:
- Object Storage
  – Storage capacity (GB-month)
  – Requests (PUT/GET/LIST)
  – Retrieval (if using Archive tier)
- Network egress
  – Downloading exported datasets outside OCI/region can incur egress charges.
- Compute for training
  – OCI Data Science job runs/notebooks and GPU usage (separate from labeling).
- People cost
  – Human labeling time is often the biggest cost driver. This is not an OCI bill, but it’s real.
Free tier (if applicable)
OCI has an Always Free tier for certain services. Whether Data Labeling is included or free in your tenancy depends on Oracle’s current program—verify at https://www.oracle.com/cloud/free/
Cost drivers (direct and indirect)
- Dataset size (number of objects/records)
- Export frequency (daily/weekly exports increase storage + request counts)
- Duplicate datasets and poor lifecycle policies (extra storage)
- Cross-region movement of exports
- Labeling team size and throughput (people cost, operations overhead)
- QA/review cycles (re-labeling increases time and complexity)
Hidden or indirect costs
- Data duplication: exporting multiple versions of annotations without lifecycle rules.
- Large objects: high-resolution images drive storage and slower human workflows.
- Egress surprise: downloading labeled exports to on-prem or other cloud.
- Operational overhead: IAM policy mistakes causing delays.
How to optimize cost
- Keep raw data in the most cost-effective tier (but don’t use Archive if you need frequent access).
- Use lifecycle rules for old exports (e.g., move to Archive after 30–90 days).
- Export only what you need (e.g., incremental exports, if supported and appropriate).
- Reduce object size where it doesn’t harm training signal (resize images, compress).
- Use sampling strategies to label fewer, higher-value examples first.
Example low-cost starter estimate (no fabricated numbers)
A small pilot typically includes:
– 10–100 MB of text files or compressed images in Object Storage
– 1–2 exports
– 1–3 labelers for a few hours
Costs will usually be dominated by human time, while OCI costs are mainly Object Storage requests and storage. Use the OCI Cost Estimator to model your region and expected storage and requests.
Example production cost considerations
In production, typical cost planning includes:
– Raw data: 1–10+ TB in Object Storage
– Exported versions: multiple TB over time
– High request rates due to frequent dataset refresh
– Substantial people costs for labeling and QA
– Downstream training GPU costs (often much larger than storage)
10. Step-by-Step Hands-On Tutorial
This lab is designed to be safe, low-cost, and beginner-friendly, while still reflecting a real workflow: store raw data in Object Storage, create a dataset in Oracle Cloud Data Labeling, label records, export labeled data, and clean up.
Objective
Create a small text classification dataset in Oracle Cloud Data Labeling, label a handful of records, export the labeled dataset to Object Storage, and verify the export.
Lab Overview
You will:
1. Create two Object Storage buckets: one for raw records and one for exports.
2. Upload a few small .txt files that represent records to be labeled.
3. Create a Data Labeling dataset and a label set (e.g., positive, negative).
4. Label records in the OCI Console labeling UI.
5. Export the labeled dataset to an export bucket.
6. Validate the exported objects exist in Object Storage.
7. Clean up all resources.
Note on UI navigation: The Console location for Data Labeling can change. If you don’t see it under “Analytics and AI,” use the Console search bar for Data Labeling.
Step 1: Create a compartment (recommended)
Why: Keeps lab resources isolated for cleanup and least privilege.
Console
1. Open OCI Console.
2. Go to Identity & Security → Compartments.
3. Click Create Compartment.
4. Name: dl-lab
5. Click Create.
Expected outcome: A new compartment dl-lab exists.
Step 2: Create Object Storage buckets
You will create:
– dl-lab-raw (raw text records)
– dl-lab-export (exported labeled data)
Option A: Console
1. Go to Storage → Object Storage → Buckets
2. Select compartment: dl-lab
3. Click Create Bucket
4. Bucket name: dl-lab-raw
Default storage tier is fine for a lab.
5. Create another bucket: dl-lab-export
Option B: OCI CLI (optional)
Prereqs:
– OCI CLI configured (oci setup config)
– You know your namespace
Get namespace:
oci os ns get
Create buckets (replace <COMPARTMENT_OCID>):
oci os bucket create \
--compartment-id <COMPARTMENT_OCID> \
--name dl-lab-raw
oci os bucket create \
--compartment-id <COMPARTMENT_OCID> \
--name dl-lab-export
Expected outcome: Two buckets exist in your selected region.
Step 3: Upload sample text records to the raw bucket
Create a local folder and add a few .txt files.
On your machine
mkdir -p dl-lab-records
cat > dl-lab-records/001.txt << 'EOF'
I love how fast the delivery was. Great experience.
EOF
cat > dl-lab-records/002.txt << 'EOF'
The item arrived broken and support was unhelpful.
EOF
cat > dl-lab-records/003.txt << 'EOF'
It is okay. Not bad, not great—just average.
EOF
Upload to Object Storage.
Option A: Console
1. Open Storage → Object Storage → Buckets → dl-lab-raw
2. Click Upload
3. Upload 001.txt, 002.txt, 003.txt
Option B: OCI CLI
Replace <NAMESPACE>:
oci os object put --namespace-name <NAMESPACE> \
--bucket-name dl-lab-raw \
--name records/001.txt --file dl-lab-records/001.txt
oci os object put --namespace-name <NAMESPACE> \
--bucket-name dl-lab-raw \
--name records/002.txt --file dl-lab-records/002.txt
oci os object put --namespace-name <NAMESPACE> \
--bucket-name dl-lab-raw \
--name records/003.txt --file dl-lab-records/003.txt
Expected outcome: You can see three objects in the bucket under prefix records/.
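You can confirm the upload with the same CLI used above (replace <NAMESPACE> as before):

```shell
# Verify the upload by listing objects under the records/ prefix.
oci os object list --namespace-name <NAMESPACE> \
  --bucket-name dl-lab-raw \
  --prefix records/
```

The output should include records/001.txt, records/002.txt, and records/003.txt.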
Step 4: Create IAM access for labelers (minimum required)
If you are doing this lab as an administrator in your tenancy, you might already have access. For real teams, create a group (e.g., DataLabelers) and grant the minimum necessary permissions.
Because IAM policy syntax and resource families must be exact, use this step as a checklist and verify in official docs:
– Data labelers need to:
– use/manage Data Labeling resources (dataset operations and labeling)
– read input objects from dl-lab-raw
– write export objects to dl-lab-export
Expected outcome: Your user (or group) can create datasets and read/write the relevant buckets.
Verification step (practical):
– In the Console, confirm you can:
– list objects in dl-lab-raw
– create a Data Labeling dataset
If either fails, troubleshoot IAM before proceeding.
Step 5: Create a Data Labeling dataset (text)
Console
1. Navigate to Data Labeling in the OCI Console (use search if needed).
2. Select compartment: dl-lab
3. Click Create dataset
4. Name: sentiment-lab
5. Choose dataset type: Text (or the closest equivalent shown)
6. Create the dataset.
Expected outcome: Dataset sentiment-lab exists and is empty (no records yet) or ready for record import.
Step 6: Add/import records from Object Storage
Console
1. Open dataset sentiment-lab
2. Find the option to Add records / Import data (exact wording varies)
3. Select:
– Bucket: dl-lab-raw
– Prefix: records/
4. Start the import.
Expected outcome: Dataset shows 3 records available for labeling.
Verification:
– The dataset record list displays 001.txt, 002.txt, 003.txt (or equivalent record identifiers).
Step 7: Create a label set for sentiment
Console
1. In dataset settings (or label configuration), create a label set with labels:
– positive
– negative
– neutral
2. Save the label set.
Expected outcome: Labelers can choose only these labels, improving consistency.
Step 8: Label the records in the labeling UI
Console
1. Open the dataset and choose Start labeling / Label (exact wording varies)
2. For each record:
– 001.txt → positive
– 002.txt → negative
– 003.txt → neutral
3. Save/submit labels.
Expected outcome: Each record shows as labeled, and dataset progress indicates 3/3 labeled (or similar).
Step 9: Export the labeled dataset to Object Storage
Console
1. In the dataset, find Export (or “Export annotations”)
2. Choose target:
– Bucket: dl-lab-export
– Prefix: exports/sentiment-lab/ (recommended)
3. Choose export format (if prompted).
If multiple formats exist, choose the one best aligned with your training toolchain. If unsure, choose the default and inspect the output.
4. Start export.
Expected outcome: Export completes successfully and objects appear in dl-lab-export under exports/sentiment-lab/.
Validation
Validate from Object Storage.
Console
1. Go to Storage → Object Storage → Buckets → dl-lab-export
2. Open exports/sentiment-lab/
3. Confirm one or more export files exist (e.g., manifest/annotation files).
Optional CLI validation
oci os object list --namespace-name <NAMESPACE> \
--bucket-name dl-lab-export \
--prefix exports/sentiment-lab/
You should see exported objects listed.
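If you only want the object names, the CLI's global --query option (JMESPath) trims the output; this is optional and uses the same placeholders:

```shell
# List only the names of exported objects under the export prefix.
oci os object list --namespace-name <NAMESPACE> \
  --bucket-name dl-lab-export \
  --prefix exports/sentiment-lab/ \
  --query 'data[].name' \
  --all
```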
Troubleshooting
Issue: “Not authorized” when creating dataset or importing records
Likely cause: Missing IAM policy for Data Labeling and/or Object Storage access.
Fix:
– Confirm your user/group has permission for Data Labeling resources in the correct compartment.
– Confirm read access to dl-lab-raw objects and write access to dl-lab-export.
Issue: Records import fails or shows zero records
Likely causes:
– Wrong bucket/prefix
– Objects are not in the expected region
– Unsupported file types for the chosen dataset type
Fix:
– Confirm objects exist under the prefix.
– Try importing without a prefix to validate visibility.
– Verify supported input file formats in official docs.
Issue: Export completes but you can’t find output files
Likely causes:
– Exported to a different bucket/prefix
– You lack permission to list objects in export bucket
Fix:
– Re-check export configuration (bucket and prefix).
– Verify IAM permissions on dl-lab-export.
Issue: Labeling UI is slow or errors in browser
Likely causes: Browser extensions, network restrictions, or session timeouts.
Fix:
– Try an incognito/private window.
– Use a supported browser per OCI Console requirements.
– Ensure corporate proxy rules allow OCI Console domains.
Cleanup
To avoid ongoing cost and clutter:
- Delete the Data Labeling dataset
  – Go to Data Labeling → dataset sentiment-lab → Delete
- Delete exported objects
  – Empty the dl-lab-export bucket (delete objects)
- Delete raw objects
  – Empty the dl-lab-raw bucket (delete objects)
- Delete buckets
  – Delete dl-lab-export
  – Delete dl-lab-raw
- Delete compartment (optional)
  – If you created dl-lab, delete it after confirming it contains no resources.
Expected outcome: No lab resources remain.
11. Best Practices
Architecture best practices
- Keep data close: store raw data and exports in the same region as Data Labeling to reduce latency and avoid cross-region transfer.
- Separate raw vs labeled buckets: use distinct buckets or prefixes and separate permissions.
- Version your exports: export to a versioned prefix (`exports/<dataset>/<YYYY-MM-DD>/`) so training runs are reproducible.
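The versioned-prefix convention above is easy to enforce in pipeline code. A minimal sketch (the function name `export_prefix` and the `exports/<dataset>/<date>/` layout are illustrative choices, not part of the service):

```python
from datetime import date, datetime, timezone

def export_prefix(dataset_name, run_date=None):
    """Build a date-versioned export prefix like exports/<dataset>/<YYYY-MM-DD>/."""
    run_date = run_date or datetime.now(timezone.utc).date()
    return f"exports/{dataset_name}/{run_date.isoformat()}/"

print(export_prefix("sentiment-lab", date(2024, 5, 1)))
# → exports/sentiment-lab/2024-05-01/
```

Passing the generated prefix to each export (and never reusing an old one) keeps every training run pointed at an immutable snapshot.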
IAM/security best practices
- Use least privilege:
  - Labelers often need read access to raw records and write access only to exports (not delete).
  - Admins manage datasets, label sets, and export configuration.
- Use groups (`DataLabelers`, `DataLabelAdmins`) rather than individual user policies.
- Use compartment boundaries:
  - Keep raw data in a compartment with stricter access.
  - Keep labeling projects in a separate compartment.
- Require MFA for labeler accounts where possible.
Cost best practices
- Use Object Storage lifecycle policies for old exports.
- Avoid repeatedly exporting full datasets if incremental export strategies work for your pipeline (verify what export options exist).
- Keep objects compressed and reasonably sized.
Performance best practices
- Organize objects with sensible prefixes (`records/`, `exports/`) for manageable listing and operations.
- Avoid extremely large single objects that are slow for labelers to open.
Reliability best practices
- Treat export artifacts as immutable training inputs: don’t overwrite; write new versions.
- Keep a backup of label taxonomy and labeling guidelines outside the tool (e.g., a controlled doc) to prevent drift.
Operations best practices
- Standardize naming:
  - Dataset names: `<project>-<datatype>-<purpose>`
  - Buckets: `<env>-<team>-<purpose>`
- Use tags: `project`, `environment`, `owner`, `cost-center`, `data-classification`
- Establish a review process (human QA) for label quality:
  - Sampling-based review
  - Inter-annotator agreement checks (if your workflow supports multiple labelers)
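If your workflow does produce multiple annotations per record, a simple agreement check is cheap to run over exported labels. This sketch computes pairwise percent agreement; the input shape (a dict of per-annotator label maps) is an assumption for illustration, not an export format:

```python
from itertools import combinations

def percent_agreement(labels_by_annotator):
    """Fraction of shared-record comparisons where two annotators agree.

    labels_by_annotator: dict mapping annotator name -> {record_id: label}.
    Only records labeled by both annotators in a pair are compared.
    """
    matches = total = 0
    for a, b in combinations(labels_by_annotator.values(), 2):
        for record_id in a.keys() & b.keys():  # records both annotators labeled
            total += 1
            matches += a[record_id] == b[record_id]
    return matches / total if total else 0.0

labels = {
    "annotator1": {"r1": "positive", "r2": "neutral", "r3": "negative"},
    "annotator2": {"r1": "positive", "r2": "negative", "r3": "negative"},
}
print(percent_agreement(labels))  # 2 of 3 shared records agree
```

For production QA you would likely replace raw percent agreement with a chance-corrected statistic such as Cohen's kappa, but the plumbing (joining annotations by record ID across labelers) is the same.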
Governance best practices
- Define data classification and retention rules:
- Are you labeling PII? If yes, enforce access controls and minimize exposure.
- Keep audit logs retention aligned with compliance policies.
12. Security Considerations
Identity and access model
- OCI IAM governs access using:
- Users, groups, policies
- Compartments as authorization boundaries
- Use separate groups for:
- Labelers: can label, view records, export if needed
- Admins: can create datasets, manage label sets, manage exports
Encryption
- Object Storage supports encryption at rest (Oracle-managed keys by default).
- For stricter control, consider customer-managed keys with OCI Vault/KMS (verify supported configurations and organizational requirements).
Network exposure
- Console-based labeling uses HTTPS to OCI endpoints.
- For data processing pipelines that run in a VCN (e.g., training jobs), use a Service Gateway for private access to Object Storage where appropriate.
Secrets handling
- Prefer OCI-native auth (IAM principals, instance principals, resource principals) over embedding API keys in scripts.
- If using API keys for OCI CLI/SDK, store and rotate them securely.
Audit/logging
- OCI Audit records API calls for supported services.
- Use Audit to track dataset creation, deletion, and export actions.
- If you need additional operational observability, log and monitor:
- Export object creation in Object Storage
- Downstream training pipeline results
Compliance considerations
- If labeling data includes PII/PHI or regulated content:
- Minimize access to raw data (need-to-know)
- Consider redaction or anonymization before labeling
- Establish retention and deletion policies
- Document your labeling SOPs (standard operating procedures)
Common security mistakes
- Granting labelers broad `manage object-family` permissions across the tenancy.
- Storing raw sensitive data and exports in the same bucket with permissive policies.
- Exporting labeled datasets to public buckets or generating pre-authenticated requests without controls.
Secure deployment recommendations
- Use dedicated compartments for raw, labeling, and training.
- Use strict bucket policies; limit to specific buckets and prefixes.
- Apply consistent tagging and ownership.
- Periodically review IAM policies and group membership.
13. Limitations and Gotchas
These are common limitations/pitfalls seen in managed labeling workflows. For service-specific hard limits, verify in official Data Labeling docs.
Known limitations (verify specifics)
- Region availability may be limited compared to core OCI services.
- Supported data types and annotation types may not cover all needs (e.g., advanced polygon segmentation, 3D point clouds).
- Export formats may require transformation before training.
Quotas and service limits
- Dataset count, record count, or concurrency limits may apply.
- Object Storage request rate limits can be hit during bulk operations.
Regional constraints
- Buckets are regional. Keep raw and export buckets in the same region as the dataset for simpler operations.
Pricing surprises
- Even if Data Labeling has minimal direct cost, you can be charged for:
- Object Storage capacity and requests
- Network egress for downloads
- Downstream compute training (often the major cloud cost)
Compatibility issues
- Your ML training framework expects a particular annotation schema; you may need a conversion step.
- File naming conventions and character sets can cause import issues.
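A conversion step between the export and your training framework is often a few lines of code. The sketch below flattens hypothetical JSONL export records into `(source_path, label)` pairs — the field names (`sourceDetails`, `annotations`, `label_name`) are placeholders; verify the real schema from a small pilot export before relying on this:

```python
import json

# Hypothetical exported record shape -- verify the real schema from a pilot export.
exported_jsonl = """\
{"sourceDetails": {"path": "records/review-001.txt"}, "annotations": [{"entities": [{"labels": [{"label_name": "positive"}]}]}]}
{"sourceDetails": {"path": "records/review-002.txt"}, "annotations": [{"entities": [{"labels": [{"label_name": "negative"}]}]}]}
"""

def to_training_rows(jsonl_text):
    """Flatten each exported record to a (source_path, label) pair."""
    rows = []
    for line in jsonl_text.splitlines():
        record = json.loads(line)
        path = record["sourceDetails"]["path"]
        label = record["annotations"][0]["entities"][0]["labels"][0]["label_name"]
        rows.append((path, label))
    return rows

print(to_training_rows(exported_jsonl))
# → [('records/review-001.txt', 'positive'), ('records/review-002.txt', 'negative')]
```

Keeping this conversion in version control alongside the training code makes schema changes in future exports visible as test failures rather than silent data corruption.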
Operational gotchas
- Changing label definitions mid-project creates dataset versioning challenges.
- Mixed labeling standards across labelers reduce model accuracy; invest in guidelines and QA.
- Browser-based labeling can be impacted by session timeouts; plan work accordingly.
Migration challenges
- Migrating from Label Studio/CVAT/doccano requires mapping label schemas and export formats.
- Ensure consistent class names, IDs, and annotation coordinate conventions.
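Label-schema mapping during a migration is another place where a small, strict script pays off. This sketch (the `LABEL_MAP` contents and the tuple record shape are hypothetical examples) renames classes from a source tool's taxonomy and fails loudly on anything unmapped, so silent class drift can't slip through:

```python
# Hypothetical mapping from a source tool's taxonomy to the target label set.
LABEL_MAP = {"POS": "positive", "NEU": "neutral", "NEG": "negative"}

def remap_labels(records, label_map=LABEL_MAP, strict=True):
    """Rename class labels in (record_id, label) pairs.

    With strict=True, an unmapped label raises instead of being dropped,
    which surfaces taxonomy mismatches early in a migration.
    """
    out = []
    for record_id, label in records:
        if label not in label_map:
            if strict:
                raise KeyError(f"Unmapped label {label!r} for record {record_id}")
            continue  # non-strict mode: skip unmapped records
        out.append((record_id, label_map[label]))
    return out

print(remap_labels([("r1", "POS"), ("r2", "NEG")]))
# → [('r1', 'positive'), ('r2', 'negative')]
```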
Vendor-specific nuances
- OCI resource organization via compartments is powerful but can confuse new teams; document your compartment strategy early.
14. Comparison with Alternatives
Data labeling exists across clouds and in open-source tools. The best choice depends on annotation complexity, governance needs, workforce model, and integration targets.
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Oracle Cloud Data Labeling | Teams already on OCI needing governed labeling workflows | Tight OCI IAM/compartment integration, Object Storage-native, good for governed environments | Feature set and annotation types may be narrower than specialized tools; region availability can vary | You want labeling inside OCI with standard governance and easy export to OCI Data Science |
| OCI Data Science (adjacent) | End-to-end OCI ML lifecycle | Training and MLOps features; integrates with Object Storage | Not a labeling tool by itself | Use alongside Data Labeling for training and deployment |
| AWS SageMaker Ground Truth | AWS-native labeling with managed workforce options | Mature ecosystem, workforce options, strong integrations | AWS lock-in; pricing and workforce features vary | You’re AWS-first and want built-in workforce and tight SageMaker integration |
| Google Cloud Data Labeling Service / Vertex AI labeling | GCP-native ML pipelines | Vertex AI integration, managed workflows | GCP lock-in; feature availability varies | You’re on Vertex AI and want cloud-native labeling |
| Azure Machine Learning Data Labeling | Azure ML users | Integrated with Azure ML pipelines | Azure lock-in; workflow complexity can vary | You’re Azure-first and training in Azure ML |
| Label Studio (open-source / self-managed) | Custom workflows, extensibility | Highly flexible, plugin ecosystem | You manage hosting, scaling, security, upgrades | You need custom annotation types and can operate the platform |
| CVAT (open-source) | Computer vision annotation (boxes, polygons, etc.) | Strong CV annotation capabilities | Self-managed ops burden | You need advanced vision annotation beyond managed service capabilities |
| doccano (open-source) | NLP labeling (classification/NER) | Good for text workflows | Self-managed; limited enterprise governance out-of-box | You need NLP-focused annotation with customization |
15. Real-World Example
Enterprise example: Regulated customer communications classification
- Problem: A financial services enterprise needs labeled text data from customer communications (emails/tickets) to train a classifier for routing and compliance flagging. Data includes sensitive content and requires strict audit trails.
- Proposed architecture:
  - Raw communications stored in OCI Object Storage in a restricted compartment (`fin-raw`).
  - Data Labeling datasets in the `fin-labeling` compartment.
  - Exported labeled datasets written to the `fin-exports` compartment/bucket with stricter write controls and immutable versioning.
  - Training in OCI Data Science using exported datasets; model artifacts stored in Object Storage.
  - Governance with IAM groups (`Labelers`, `Reviewers`, `MLAdmins`), tagging, and Audit retention.
- Why Data Labeling was chosen:
  - OCI IAM/compartment governance, centralized workflows, and auditability.
  - Keeps data inside Oracle Cloud boundaries.
- Expected outcomes:
  - Faster dataset creation with controlled label taxonomy.
  - Improved model performance due to consistent labels.
  - Audit-ready processes demonstrating controlled access and traceable exports.
Startup/small-team example: Quick sentiment model MVP
- Problem: A startup wants to launch a sentiment feature in 2 weeks. They have 2,000 reviews but no labels.
- Proposed architecture:
  - Reviews stored as `.txt` objects in a single Object Storage bucket.
  - One Data Labeling dataset with three labels (`positive`/`neutral`/`negative`).
  - Weekly export to Object Storage; training done in a small OCI Data Science notebook/job.
- Why Data Labeling was chosen:
  - Minimal infrastructure; no need to host an annotation platform.
  - Simple workflow with fast iteration.
- Expected outcomes:
  - MVP dataset labeled quickly.
  - Repeatable export and retraining loop.
  - Clear path to add QA and multi-labeler review later.
16. FAQ
- Is Oracle Cloud Data Labeling a separate product or part of OCI?
  It is an OCI service commonly referred to as Data Labeling (or Data Labeling service). It integrates with OCI services like Object Storage and IAM.
- Do I need OCI Object Storage to use Data Labeling?
  In most practical workflows, yes—raw records and exported labels are commonly stored in Object Storage.
- Does Data Labeling provide human labelers (a workforce)?
  Typically, you use your own OCI users (employees/contractors). If you need a managed workforce, verify current Oracle offerings and your contract terms.
- What data types can I label?
  Commonly images and text, and sometimes documents depending on the service version and region. Verify supported data types in official docs.
- Can I automate dataset creation and export?
  Usually yes via OCI APIs (and often CLI/SDK). Confirm API coverage in the Data Labeling API reference.
- How do I control who can label data?
  Use OCI IAM groups and policies scoped to the dataset compartment and Object Storage buckets.
- How do I prevent label taxonomy drift?
  Lock down who can edit label sets, document labeling guidelines, and version your exports.
- Where are labels stored?
  Labels/annotations are managed by the service and can be exported to Object Storage for training and archiving.
- What export formats are supported?
  Export formats vary by dataset type and service version. Verify supported formats and validate with a small pilot export.
- Can I use the labeled output with OCI Data Science?
  Yes—export to Object Storage and consume the exported artifacts in OCI Data Science jobs/notebooks.
- How do I track labeling progress?
  The Console typically shows dataset/job progress and labeled counts. For automation, use APIs to query status (verify exact endpoints).
- Is Data Labeling suitable for large-scale annotation (millions of records)?
  Potentially, but operational planning is needed: quotas, throughput, object organization, and QA processes. Validate limits in official docs.
- How do I handle sensitive data like PII?
  Restrict access, consider redaction/anonymization, enforce encryption and audit retention, and follow compliance requirements.
- Can multiple labelers label the same record for agreement checks?
  Some labeling systems support multiple annotations per record; verify whether OCI Data Labeling supports this natively in your tenancy.
- What’s the fastest way to start?
  Store a small set of records in Object Storage, create a dataset, define labels, label a few records, export, and confirm your training pipeline can read the export.
17. Top Online Resources to Learn Data Labeling
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | OCI Data Labeling Documentation: https://docs.oracle.com/en-us/iaas/data-labeling/ | Primary source for supported data types, workflows, APIs, and limits |
| Official API reference | OCI APIs (start here, then navigate to Data Labeling): https://docs.oracle.com/en-us/iaas/api/ | Authoritative API operations and schemas for automation |
| Official CLI install | OCI CLI Installation: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm | Helps you automate Object Storage + OCI operations |
| Official Object Storage docs | Object Storage Overview: https://docs.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm | Required to manage input/output data and lifecycle policies |
| Official IAM docs | IAM Overview: https://docs.oracle.com/en-us/iaas/Content/Identity/Concepts/overview.htm | Required for secure access design for labelers/admins |
| Official Audit docs | Audit Overview: https://docs.oracle.com/en-us/iaas/Content/Audit/Concepts/auditoverview.htm | Understand audit trails and governance |
| Official pricing | OCI Pricing: https://www.oracle.com/cloud/pricing/ | Understand cost model and billing dimensions |
| Official cost estimator | OCI Cost Estimator: https://www.oracle.com/cloud/costestimator.html | Create region-specific estimates without guessing |
| Architecture center | OCI Architecture Center: https://docs.oracle.com/solutions/ | Reference architectures that help design production ML platforms |
| Training (official) | Oracle Cloud training portal: https://education.oracle.com/ | Look for OCI Data Science / AI learning paths that reference labeling workflows |
| Community learning | Oracle Cloud Infrastructure blog: https://blogs.oracle.com/cloud-infrastructure/ | Practical posts and updates (verify against docs) |
| Samples (check availability) | Oracle GitHub (search OCI AI/Data Science samples): https://github.com/oracle | May include reference code; validate compatibility and recency |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, cloud engineers, platform teams | OCI fundamentals, MLOps/DevOps practices, automation basics | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Students, engineers moving into DevOps/Cloud | CI/CD, SCM, cloud fundamentals | Check website | https://www.scmgalaxy.com/ |
| CloudOpsNow.in | Cloud operations and engineering teams | Cloud ops, monitoring, governance | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, reliability engineers, platform teams | SRE practices, production ops, reliability patterns | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops + data teams | AIOps concepts, monitoring automation | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content (verify specific offerings) | Beginners to intermediate engineers | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training and mentoring (verify OCI coverage) | Engineers seeking structured training | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps support/training (verify scope) | Teams needing hands-on guidance | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support services and learning resources (verify scope) | Ops teams and practitioners | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify OCI specialization) | Architecture, automation, operationalization | Designing secure OCI compartments; setting up Object Storage governance; pipeline automation | https://cotocus.com/ |
| DevOpsSchool.com | DevOps & cloud consulting/training | Platform enablement, CI/CD, cloud ops | Building MLOps-ready landing zones; IAM best practices; operational playbooks | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting services | Delivery acceleration, DevOps toolchains | Implementing CI/CD; infrastructure automation; operational readiness reviews | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before this service
- OCI basics: regions, compartments, VCN concepts
- OCI IAM: users, groups, policies, dynamic groups
- OCI Object Storage: buckets, prefixes, lifecycle policies
- ML fundamentals: supervised learning, train/validation/test splits
- Data governance basics: labeling guidelines, QA processes
What to learn after this service
- OCI Data Science: projects, notebooks, jobs, model deployment
- MLOps practices: versioning datasets, reproducible training runs, pipeline automation
- Monitoring ML systems: data drift detection concepts, evaluation pipelines
- Security deep dives: Vault/KMS, security zones, Cloud Guard (where applicable)
Job roles that use it
- Data Scientist
- ML Engineer
- MLOps Engineer
- Cloud/Platform Engineer supporting AI platforms
- Data/AI Program Manager (for governance and throughput planning)
- Security Engineer (reviewing access and compliance controls)
Certification path (if available)
Oracle certifications change over time. Start with:
- OCI foundations certifications
- OCI Data Science/AI learning paths (if offered)
Check the official Oracle training portal: https://education.oracle.com/
Project ideas for practice
- Build a sentiment classifier using labeled text exported from Data Labeling and trained in OCI Data Science.
- Create an end-to-end pipeline: upload new raw records daily → label weekly → export → retrain monthly.
- Implement dataset governance: compartment design + IAM least privilege + tagging + lifecycle rules.
- Create a conversion script that transforms exported labels into the exact format required by your ML framework.
22. Glossary
- Annotation: The label information applied to a record (e.g., class label for text, bounding box for image).
- Compartment (OCI): A logical container for organizing and isolating OCI resources for access control and billing.
- Dataset: A managed collection of records to be labeled.
- Export: The process of writing labeled annotations to a file/object format in Object Storage for training use.
- IAM Policy (OCI): A statement defining who can do what on which resources in OCI.
- Label set: The defined list of allowed labels/categories used for consistent tagging.
- Object Storage: OCI service for storing unstructured data as objects in buckets.
- Record: A single data item to label (e.g., one text file or image object).
- Supervised learning: ML training method where the model learns from labeled examples.
- Tenancy (OCI): Your OCI account boundary containing compartments, IAM, and resources.
23. Summary
Oracle Cloud Data Labeling (Analytics and AI) is a managed OCI service for creating and exporting labeled datasets used in supervised machine learning. It fits naturally into OCI architectures by integrating with Object Storage for data, IAM for access control, and Audit for governance.
Cost planning should focus on the real drivers: Object Storage usage, export/versioning strategy, network egress if data leaves the region, downstream training compute, and—most importantly—human labeling time. Security should be built around least-privilege IAM, compartment separation, encryption, and auditability.
Use Data Labeling when you want a governed, OCI-native workflow for labeling data that will feed training pipelines such as OCI Data Science. The best next step is to run a small pilot (like the lab above), inspect export formats, and then formalize labeling guidelines, QA checks, and dataset versioning for production.