Category
Analytics and AI
1. Introduction
Vision is an Oracle Cloud Infrastructure (OCI) Analytics and AI service for analyzing images (and, in some configurations, documents) using machine learning. It is designed to help you extract meaning from visual content, such as identifying objects, reading text (OCR), and classifying images, without having to build and train deep learning models from scratch.
In simple terms: you send an image to Vision, and it returns structured results (labels, bounding boxes, confidence scores, detected text, and related metadata). You can use those results to automate business processes (e.g., content moderation, inventory checks), enrich analytics, or drive downstream workflows (e.g., routing, alerting, tagging).
Technically, Vision is delivered as a managed API service within Oracle Cloud Infrastructure. You call the service through the OCI Console, REST APIs, or OCI SDKs. Vision supports prebuilt models for common computer vision tasks and (depending on the current product capabilities in your region/tenancy) may support custom model training for tasks like image classification and object detection. Always confirm exact feature availability and limits in the official docs for your region.
What problem it solves: turning unstructured visual data (images, scans, photos, screenshots) into structured information that systems can search, analyze, and act on—without operating GPU infrastructure or managing model lifecycles end-to-end yourself.
2. What is Vision?
Official purpose (what Vision is for)
Vision is Oracle Cloud’s managed computer vision service under the Analytics and AI portfolio. Its purpose is to provide API-driven image understanding so teams can add visual intelligence to applications and workflows.
Because Oracle product pages and console labels can evolve, you may see the service referred to as “Vision”, “OCI Vision”, or “AI Vision” in various places. The core service is the same; verify the latest naming and feature set in the official documentation:
- Docs home (Vision): https://docs.oracle.com/en-us/iaas/vision/vision/
Core capabilities (high level)
Common Vision capabilities include (confirm current availability in your region):
- Image classification (categorize an image into labels)
- Object detection (find and locate objects with bounding boxes)
- Text detection / OCR (extract printed text from images)
- Asynchronous jobs for longer-running analyses (where supported)
- Custom model workflows (project/dataset/model lifecycle), where supported
Major components (what you interact with)
Depending on your workflow, you’ll typically use:
- Vision API / SDK – The runtime endpoint you call to analyze images (and possibly documents).
- OCI Identity and Access Management (IAM) – Policies controlling who can call Vision, create projects, and access datasets.
- Object Storage (common companion service) – Frequently used to store images and training datasets.
- Projects / Datasets / Models (for custom vision, where available) – Resource containers and lifecycle objects for training and managing custom models.
Service type
- Managed AI service (API-driven): You don’t manage model servers, scaling groups, or GPU drivers for prebuilt inference.
- If you use custom training/hosting features (where available), you still consume them as managed workflows rather than building the entire ML platform yourself.
Scope: regional vs global
Vision is an OCI service that is typically regional (most OCI services are). That means:
- You select a region in the Console.
- Vision resources and endpoints are associated with that region.
- Data residency, latency, and service availability vary by region.
Always confirm region support:
- OCI Regions: https://www.oracle.com/cloud/public-cloud-regions/
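Because endpoints are regional, client code often derives the API hostname from the region name. The sketch below assumes the usual OCI AI-services endpoint convention (`vision.aiservice.<region>.oci.oraclecloud.com`); treat the pattern as an assumption and confirm the exact endpoint for your region in the Vision docs.

```python
def vision_endpoint(region: str) -> str:
    # Hostname pattern follows the common OCI AI-services convention;
    # verify against the official endpoint list before relying on it.
    return f"https://vision.aiservice.{region}.oci.oraclecloud.com"

print(vision_endpoint("us-ashburn-1"))
```

In practice the OCI SDKs resolve this for you from the region in your config, so you rarely need to build the URL by hand.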
How Vision fits into the Oracle Cloud ecosystem
Vision commonly integrates with:
- Object Storage for image storage and dataset management
- Functions for event-driven automation
- Events (and/or Notifications) for pipeline orchestration
- API Gateway for exposing controlled endpoints to external callers
- Logging & Audit for governance and traceability
- Data Science when you need deeper customization than managed Vision features provide
3. Why use Vision?
Business reasons
- Automate manual review: reduce human effort for tagging, sorting, and reading information from images.
- Speed up operations: process large image volumes consistently (e.g., quality checks, documentation intake).
- Improve search and discovery: turn image content into searchable metadata.
- Enable new product features: visual search, automated compliance checks, and intelligent routing.
Technical reasons
- API-first: integrate into apps with REST/SDK calls.
- No GPU ops for prebuilt inference: you avoid managing scaling, patching, and inference servers.
- Structured outputs: bounding boxes, labels, confidence scores support deterministic downstream logic.
- Composable with OCI: pairs well with Object Storage, Functions, Streaming, and Observability.
Operational reasons
- Managed service: reduces operational burden compared to self-hosting OpenCV + OCR + deep learning stacks.
- IAM-driven access: centralized access control through OCI policies.
- Auditability: API calls can be captured via OCI Audit (verify exact event coverage in your tenancy).
Security/compliance reasons
- OCI IAM policies: enforce least privilege.
- Encryption: OCI services generally support encryption at rest and in transit; verify service-specific details.
- Data residency: regional service behavior helps meet locality requirements (confirm per region).
Scalability/performance reasons
- Elastic scaling for API workloads (within service limits).
- Batch patterns using asynchronous jobs and Object Storage can scale better than synchronous single-image calls in high-throughput pipelines.
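The batch point above can be sketched as a bounded-concurrency worker pool. In this sketch, `analyze` is a placeholder for your real Vision call (not an SDK function); capping `max_workers` keeps a high-throughput pipeline under per-tenancy rate limits.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze(object_name: str) -> dict:
    # Stand-in for a real Vision call, e.g. client.analyze_image(...).
    return {"object": object_name, "labels": []}

def analyze_batch(object_names, max_workers=4):
    # Bounded concurrency: better throughput than serial calls, without
    # the throttling risk of unbounded parallel requests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, object_names))

results = analyze_batch(["a.jpg", "b.jpg", "c.jpg"])
```

`pool.map` preserves input order, which makes it easy to join results back to the source objects.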
When teams should choose Vision
Choose Vision when you want:
- Quick integration of computer vision features
- Managed inference for common tasks (classification/detection/OCR)
- Strong alignment with OCI-native pipelines and governance
When teams should not choose Vision
Consider alternatives when:
- You need very specialized models (medical imaging, niche industrial vision) and managed features aren’t sufficient
- You require full control over model architectures, training regimes, and inference runtimes
- You must run fully on-prem or in an environment without OCI connectivity
- Your workload depends on a feature that isn’t available in your region/edition (verify in docs)
4. Where is Vision used?
Industries
- Retail and e-commerce (catalog tagging, counterfeit detection support)
- Manufacturing (visual QA, defect detection—often custom)
- Logistics (label reading, package inspection images)
- Financial services (document intake augmentation—often combined with specialized document services)
- Media and advertising (content tagging, brand detection workflows)
- Public sector (archival digitization, compliance processing)
Team types
- Application developers adding visual intelligence
- Data/ML teams wanting a managed baseline before custom ML
- Platform teams building shared AI services for internal consumers
- Security teams building content review pipelines
- Operations teams automating inspection workflows
Workloads
- Real-time API calls from web/mobile apps (with careful latency considerations)
- Batch processing for large image sets (preferred for cost and throughput predictability)
- Event-driven pipelines triggered by object creation in Object Storage
Architectures
- Serverless pipelines: Object Storage → Events → Functions → Vision → results to DB/Search
- Microservices: API service calls Vision and enriches metadata stored in a database
- Data lake enrichment: Vision output stored as JSON alongside images for analytics
Production vs dev/test usage
- Dev/test: validate accuracy, tune thresholds, design workflows, and estimate cost drivers.
- Production: implement retry logic, backpressure controls, audit logging, and clear data retention policies.
5. Top Use Cases and Scenarios
Below are realistic scenarios where Vision typically fits well. Exact outputs and supported feature sets depend on the model type and your region—verify with the official docs.
1) Product catalog auto-tagging (retail)
- Problem: Thousands of product photos need consistent tags for search and recommendations.
- Why Vision fits: Image classification/object detection can produce metadata at scale.
- Example: Upload new product images to Object Storage; a function calls Vision and stores labels in a product DB.
2) Warehouse inventory spot checks
- Problem: Manual auditing of shelf images is slow and inconsistent.
- Why Vision fits: Object detection can detect items and count approximate instances (with caution).
- Example: Daily camera snapshots are analyzed; discrepancies trigger a ticket.
3) OCR for labels and signage
- Problem: Extract text from shipping labels, shelf tags, or safety signage.
- Why Vision fits: Text detection (OCR) returns extracted strings and locations.
- Example: A mobile app captures a label photo; the backend calls Vision OCR and validates formatting.
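The backend validation step in this example can be sketched with a simple format check. The label pattern below (three letters, a dash, six digits) is a made-up example; substitute the real format of your labels or tracking numbers.

```python
import re

# Hypothetical label format for illustration only.
LABEL_PATTERN = re.compile(r"^[A-Z]{3}-\d{6}$")

def extract_valid_labels(ocr_lines):
    """Keep only OCR'd lines that match the expected label format."""
    return [line.strip() for line in ocr_lines if LABEL_PATTERN.match(line.strip())]

print(extract_valid_labels(["ABC-123456", "smudged text", "XYZ-000042"]))
```

Validating OCR output this way catches misreads early, before they propagate into downstream systems.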
4) Content moderation support (basic)
- Problem: You need to reduce exposure to disallowed or sensitive imagery.
- Why Vision fits: Classification can assist moderation workflows (often as a first pass).
- Example: User uploads go to a review queue if Vision assigns certain risk labels (ensure human review for edge cases).
5) Brand/logo detection workflows (marketing ops)
- Problem: Find brand presence across media assets.
- Why Vision fits: Object detection or custom models can help identify specific logos (custom often required).
- Example: Marketing scans event photos and tags those containing a sponsor’s logo.
6) Insurance claim intake enrichment
- Problem: Photos submitted for claims need triage and categorization.
- Why Vision fits: Classification + object detection can route claims to the right team.
- Example: Damage photos are classified by type; a rules engine assigns a claim category.
7) Manufacturing visual inspection (custom model path)
- Problem: Defect patterns are specific to your production line.
- Why Vision fits: If custom model workflows are available, you can train for your defect classes.
- Example: Operators capture images; Vision detects defect regions and returns bounding boxes for QA review.
8) Document scan pre-processing (paired with document services)
- Problem: You receive mixed scans (forms, IDs, receipts) and want to route them.
- Why Vision fits: Basic OCR and classification can act as a router before deeper document extraction.
- Example: Vision extracts key text; routing logic selects a specialized document understanding step (verify OCI service fit).
9) Security operations: camera snapshot triage
- Problem: Review too many camera snapshots and false alarms.
- Why Vision fits: Object detection can identify people/vehicles in snapshots (accuracy varies; validate thoroughly).
- Example: Only images with detected person/vehicle objects are escalated.
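A minimal sketch of that escalation filter follows. The detection dicts mirror the general shape of Vision object-detection output (a name plus a confidence score), but verify the exact field names against the actual API response for your SDK version.

```python
# Labels worth escalating; tune for your environment.
ESCALATE_LABELS = {"Person", "Car", "Truck", "Motorcycle"}

def should_escalate(detections, threshold=0.7):
    """True if any detection is an escalation label above the threshold."""
    return any(
        d["name"] in ESCALATE_LABELS and d["confidence"] >= threshold
        for d in detections
    )

print(should_escalate([{"name": "Tree", "confidence": 0.95},
                       {"name": "Person", "confidence": 0.82}]))
```

Tuning `threshold` against a labeled sample of your own snapshots is essential; accuracy varies by camera angle, lighting, and scene.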
10) Accessibility and alt-text generation support
- Problem: Need baseline descriptions/tags to help generate alt-text or metadata.
- Why Vision fits: Classification results can seed human-reviewed alt-text.
- Example: Vision tags images; a content team approves and publishes descriptive metadata.
11) Duplicate and near-duplicate media triage (supporting feature)
- Problem: Storing many similar images drives cost and complicates discovery.
- Why Vision fits: Vision isn’t primarily a dedup tool, but consistent tagging can help cluster assets for review.
- Example: Combine Vision tags with perceptual hashing in your app to find candidates for deduplication.
12) Compliance archiving metadata enrichment
- Problem: Large archives of images are unsearchable.
- Why Vision fits: Batch classification/OCR produces metadata to power search and retrieval.
- Example: Nightly job processes new archives, storing tags and extracted text in a searchable index.
6. Core Features
Note: Vision capabilities can vary by region, model type, and service updates. Confirm exact capabilities and quotas in the official docs: https://docs.oracle.com/en-us/iaas/vision/vision/
Feature: Prebuilt image classification
- What it does: Assigns one or more labels/categories to an image.
- Why it matters: Useful for tagging, routing, and search metadata.
- Practical benefit: You can build auto-tagging and content organization quickly.
- Caveats: Labels are only as good as the prebuilt taxonomy; validate accuracy on your domain data.
Feature: Prebuilt object detection
- What it does: Detects objects and returns bounding boxes and confidence scores.
- Why it matters: Enables locating items within images, not just labeling the image.
- Practical benefit: Supports inspection, counting (approximate), and UI overlays.
- Caveats: Small/occluded objects and domain-specific objects may perform poorly without customization.
Feature: Text detection (OCR)
- What it does: Detects and extracts text from images; often returns text locations.
- Why it matters: Converts printed text into machine-readable content.
- Practical benefit: Automates data entry and indexing of scanned images.
- Caveats: Low-quality images, handwriting, rotated text, or unusual fonts can reduce accuracy. Verify language support.
Feature: Asynchronous processing (where supported)
- What it does: Submits longer-running analysis as a job and retrieves results later.
- Why it matters: Better for batch and high-volume workflows; avoids client timeouts.
- Practical benefit: Supports pipelines that process thousands of images reliably.
- Caveats: Requires job orchestration and result polling/notification.
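The polling caveat can be handled with a small helper like the one below. `get_status` is a placeholder callable standing in for your real status check (for example, a Vision get-job request); it is not an SDK method.

```python
import time

def wait_for_job(get_status, timeout_s=600, initial_delay_s=2.0, max_delay_s=30.0):
    """Poll a job-status callable with capped exponential backoff.

    `get_status` should return a lifecycle state string such as
    "ACCEPTED", "IN_PROGRESS", "SUCCEEDED", "FAILED", or "CANCELED".
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while time.monotonic() < deadline:
        state = get_status()
        if state in ("SUCCEEDED", "FAILED", "CANCELED"):
            return state
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, but cap the wait
    raise TimeoutError("job did not finish within the timeout")
```

Where Events/Notifications are available, prefer event-driven completion handling over polling; this helper is a fallback for simple scripts.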
Feature: Custom model training (where supported)
- What it does: Lets you train models on your labeled images for classification or detection.
- Why it matters: Improves accuracy for domain-specific objects and classes.
- Practical benefit: Tailors the model to your products, defects, logos, or specialized categories.
- Caveats: Requires labeled data, evaluation discipline, and governance; costs and limits differ from prebuilt inference.
Feature: Project/dataset/model resource management (custom workflows)
- What it does: Organizes training data, model versions, and lifecycle states.
- Why it matters: Enables repeatable ML operations with traceability.
- Practical benefit: Supports promotion from dev → staging → production models.
- Caveats: Ensure IAM and compartments are structured for separation of duties.
Feature: REST API + OCI SDK support
- What it does: Provides programmatic access using signed requests and official SDKs.
- Why it matters: Enables automation and integration into CI/CD and microservices.
- Practical benefit: You can build reliable pipelines using standard OCI auth mechanisms.
- Caveats: Keep SDK versions current; watch for API version changes (check release notes).
Feature: OCI IAM integration
- What it does: Uses OCI policies for authorization and compartment scoping.
- Why it matters: Centralized governance and least privilege.
- Practical benefit: You can restrict access by group, compartment, and verb (read/use/manage).
- Caveats: Policy resource types must match Vision’s current IAM vocabulary—verify in docs.
Feature: OCI Audit visibility (governance)
- What it does: Captures control-plane API events for compliance and investigations.
- Why it matters: Helps answer “who did what, when.”
- Practical benefit: Supports regulated environments.
- Caveats: Confirm which events are logged and retention requirements in your org.
7. Architecture and How It Works
High-level service architecture
At a high level:
1. Your application (or script) authenticates with OCI (user principal, instance principal, or resource principal).
2. You call Vision’s API endpoint in your region.
3. Vision runs the selected model(s) and returns structured results.
4. Your application stores results (JSON) and triggers downstream actions.
Request/data/control flow (typical patterns)
- Synchronous (simple, low-volume):
- Client → Vision API → immediate response
- Asynchronous/batch (preferred for scale):
- Upload to Object Storage → submit job → poll job status / receive event → fetch results → store outputs
Integrations with related OCI services
Common integrations include:
- Object Storage: store images and datasets; control access via IAM.
- Functions: run event-driven analysis when a new object is uploaded.
- Events + Notifications: route job completions and operational alerts.
- API Gateway: secure frontend for external callers (rate limiting, auth).
- Streaming: decouple ingestion and processing for high throughput.
- Autonomous Database / Oracle Database: store metadata and results.
- OpenSearch (self-managed) or another search layer: index extracted labels/text.
Dependency services (most common)
- OCI IAM (users, groups, policies)
- Object Storage (optional but common)
- Observability (Audit; optionally Logging/Monitoring where applicable)
Security/authentication model
Vision uses standard OCI request authentication:
- API signing keys (users)
- Instance principals (workloads running on OCI compute)
- Resource principals (some managed services)
- Dynamic groups + policies to authorize workloads without long-lived keys
Networking model
- Vision is consumed via OCI regional endpoints over HTTPS.
- If you need private network egress control, you typically route calls through controlled NAT/proxy patterns from private subnets.
- For strict environments, verify whether Vision supports private endpoints or service gateway patterns in your region. If not certain, verify in official docs and plan accordingly.
Monitoring/logging/governance considerations
- Audit: capture API calls for governance.
- Application logs: log request IDs, latency, and error codes.
- Retries: implement exponential backoff for throttling and transient errors.
- Tagging: apply cost-center and environment tags to supporting resources (buckets, functions, etc.).
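The retry bullet above can be implemented as a small wrapper. The `status` attribute checked here mirrors the OCI Python SDK's `ServiceError`, and the retryable status list is a reasonable starting point rather than an official policy.

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay_s=1.0,
                 retryable=(429, 500, 502, 503, 504)):
    """Retry `call` on throttling/transient errors with exponential backoff.

    `call` is a stand-in for your Vision request. It should raise an
    exception carrying a `status` attribute (as oci.exceptions.ServiceError
    does) when the service returns an HTTP error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status", None)
            if status not in retryable or attempt == max_attempts:
                raise
            # Full jitter: random sleep up to the exponential cap, which
            # avoids synchronized retry storms across many workers.
            time.sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
```

Usage: `with_retries(lambda: client.analyze_image(analyze_image_details=details))`.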
Simple architecture diagram (Mermaid)
flowchart LR
A[User / App] -->|HTTPS + OCI Auth| B["Vision API (Regional)"]
B --> C["Analysis Result (JSON)"]
C --> D[(App DB / Metadata Store)]
Production-style reference architecture (Mermaid)
flowchart TB
subgraph Ingestion
U[Users / Devices] --> APIGW[API Gateway]
APIGW --> UP[Upload Service]
UP --> OS[(Object Storage Bucket)]
end
subgraph Processing
EV[Events] --> FN[Functions: Vision Orchestrator]
OS --> EV
FN -->|Analyze Image| VSN["Vision API (Regional)"]
VSN --> FN
FN --> META[(Database for Results)]
FN --> IDX[(Search Index)]
end
subgraph Ops_and_Gov
AUD[OCI Audit]
LOG[App Logging]
MON[Monitoring/Alarms]
end
APIGW -.-> AUD
FN -.-> LOG
FN -.-> MON
VSN -.-> AUD
8. Prerequisites
Tenancy/account requirements
- An active Oracle Cloud tenancy with permission to use Analytics and AI services.
- Access to a compartment where you can create and manage required resources.
Permissions / IAM roles
You typically need:
- Permission to call Vision APIs and (optionally) manage Vision resources (projects/datasets/models).
- Permission to read images from Object Storage if you use Object Storage as input.
Policy syntax and resource types can evolve; verify the exact IAM policy statements in the official Vision documentation. As a starting point to discuss with your OCI admin, policies often look like:
Allow group <group-name> to use ai-service-vision-family in compartment <compartment-name>
Allow group <group-name> to read object-family in compartment <compartment-name>
If you will create buckets/objects:
Allow group <group-name> to manage object-family in compartment <compartment-name>
Billing requirements
- Vision is generally a paid OCI service with usage-based charges.
- Even if you use an Always Free tenancy or trial credits, confirm whether Vision is included in free allocations in your region. Verify in official pricing.
Tools
Any of the following works:
- OCI Console (browser)
- OCI Cloud Shell (recommended for labs; includes OCI CLI and common tooling)
- OCI CLI (optional)
- OCI SDK for Python/Java/Go/Node/.NET (for automation)
OCI Cloud Shell docs:
- https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cloudshellintro.htm
Region availability
Vision may not be available in every OCI region. Check service availability and endpoints in the Vision docs and in the region selector of the OCI Console.
Quotas/limits
Expect limits around:
- Requests per second (throttling)
- Max image size / resolution
- Supported file types
- Job concurrency (for asynchronous workflows)
Do not assume numbers—verify in official docs for your region and tenancy.
Prerequisite services (for this tutorial)
- Object Storage (bucket + an uploaded image)
- Cloud Shell (or local Python environment) to call Vision programmatically
9. Pricing / Cost
Pricing changes over time and can be region- and contract-dependent. Use official sources to confirm current SKUs and rates.
Current pricing model (typical dimensions)
Vision is typically priced based on one or more of:
- Number of images analyzed (per image or per 1,000 images)
- Type of analysis (classification vs detection vs OCR; these may have different SKUs)
- Asynchronous/batch job usage (if priced differently)
- Custom model training (often billed by training time/compute usage, where supported)
- Model hosting/inference for custom models (may have separate charges, where applicable)
Confirm the current Vision pricing here:
- Oracle Cloud price list (AI services section): https://www.oracle.com/cloud/price-list/
- Oracle Cloud Pricing Calculator: https://www.oracle.com/cloud/costestimator.html
If the pricing page lists “AI Services” and then “Vision” SKUs, use those entries. If you cannot find Vision listed, search within the price list for “Vision” and “AI Services,” and verify in official docs.
Free tier
- OCI free tier offerings vary. Vision may or may not include a free allotment.
- Do not assume free usage. Verify on:
- https://www.oracle.com/cloud/free/
Primary cost drivers
- Volume of images analyzed per day/month
- Whether you run multiple features per image (e.g., OCR + object detection)
- Re-processing frequency (retries, re-analysis after model updates)
- Data preparation and storage (Object Storage size + requests)
- Egress/data transfer (if you move images/results out of OCI regions)
Hidden/indirect costs to plan for
- Object Storage costs (stored images, lifecycle policies, replication)
- Requests to Object Storage (PUT/LIST/GET can add up at scale)
- Outbound network egress if results are exported to external systems
- Operational tooling (Logging storage, alarms, SIEM export)
- Serverless/compute costs if you orchestrate via Functions or Compute instances
Network/data transfer implications
- Calls to Vision stay within OCI endpoints, but your client location matters:
- On-OCI workloads usually have lower latency and avoid some egress patterns.
- Off-OCI callers may incur internet egress on their side and higher latency.
- Moving large images across regions can increase costs; prefer regional locality.
How to optimize cost
- Use batch/asynchronous patterns for large volumes (reduces retries/timeouts).
- Avoid calling multiple Vision features unless needed (e.g., don’t run OCR if you only need object detection).
- Downscale images responsibly if your use case allows (while maintaining accuracy).
- Cache results; don’t re-process unchanged images.
- Apply Object Storage lifecycle rules (archive or delete old inputs).
- Use tags and budgets to attribute and control spend.
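The caching bullet above can be sketched with a content-addressed key, so a re-uploaded but unchanged image is never re-analyzed. Here `analyze` and `cache` are placeholders: in production, the cache might be a database table or an Object Storage prefix.

```python
import hashlib

def content_key(image_bytes: bytes) -> str:
    """Content-addressed key: identical bytes always map to the same key."""
    return hashlib.sha256(image_bytes).hexdigest()

def analyze_with_cache(image_bytes, cache, analyze):
    """Return cached analysis if the exact same image was seen before."""
    key = content_key(image_bytes)
    if key not in cache:
        cache[key] = analyze(image_bytes)  # only pay for new content
    return cache[key]
```

This pattern also makes retries safe: a retried upload hits the cache instead of generating a second billable call.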
Example low-cost starter estimate (no fabricated numbers)
A realistic starter design:
- Store a handful of test images in Object Storage.
- Run a small number of Vision analyses per day during development.
- Use Cloud Shell and a simple script (no always-on compute).
To estimate accurately:
1. Determine expected images/day and features/image.
2. Look up the Vision SKU rates on the official price list for your region.
3. Add Object Storage monthly cost for stored test images (usually minimal for small volumes).
4. Add any orchestration compute (Functions) if used.
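These estimation steps can be captured in a small helper. No rates are hardcoded here, because Vision pricing varies by region and changes over time; you pass in the numbers you look up on the official price list.

```python
def estimate_monthly_cost(images_per_day, features_per_image,
                          rate_per_1k_calls, storage_gb, storage_rate_per_gb,
                          days=30):
    """Rough monthly estimate from price-list inputs you supply yourself."""
    calls = images_per_day * features_per_image * days
    analysis_cost = calls / 1000 * rate_per_1k_calls
    storage_cost = storage_gb * storage_rate_per_gb
    return round(analysis_cost + storage_cost, 2)
```

For example, `estimate_monthly_cost(100, 1, rate_per_1k_calls=R, storage_gb=10, storage_rate_per_gb=S)` models 3,000 calls/month at whatever rates R and S you found on the price list.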
Example production cost considerations (what to model)
For production, model:
- Peak and average image ingestion rate
- Error rate and retry overhead
- Multi-feature calls per image
- Storage retention period and replication
- Cross-region or cross-cloud export
- Monitoring/log retention
10. Step-by-Step Hands-On Tutorial
This lab uses Object Storage + Vision to analyze an image with a prebuilt feature (such as object detection or text detection). It is designed to be safe, beginner-friendly, and relatively low cost.
Because SDK class names and API versions can change, this tutorial prioritizes OCI Cloud Shell + OCI Python SDK and includes guidance on how to validate against official docs.
Objective
Upload an image to Oracle Cloud Object Storage and call Vision to analyze that image, then review structured results and clean up resources.
Lab Overview
You will:
1. Create an Object Storage bucket and upload an image.
2. Use Cloud Shell to authenticate and run a Python script.
3. Call Vision to analyze the image using its Object Storage location.
4. Validate the output.
5. Clean up the bucket and object.
Step 1: Create a bucket and upload a test image (Console)
- Sign in to the Oracle Cloud Console.
- Select the region where Vision is available for your tenancy.
- Go to Storage → Object Storage & Archive Storage → Buckets.
- Choose your compartment.
- Click Create Bucket.
  - Bucket name: vision-lab-bucket-<unique>
  - Default settings are fine for a lab (unless your org requires encryption keys or specific policies).
- Open the bucket and click Upload.
- Upload a small test image:
  - For OCR testing: a screenshot containing clear printed text.
  - For object detection: a photo with common objects.
Expected outcome: You have a bucket with one uploaded image object.
Step 2: Capture the Object Storage details you’ll need
You need:
- Namespace
- Bucket name
- Object name
- Region
- Compartment OCID (sometimes required by service calls)
How to get them:
1. In Object Storage, find the Namespace in the console (often shown in bucket details or tenancy details).
2. Copy the bucket name and the object name (including any prefix/folder path).
Expected outcome: You have values for namespace, bucket, object, and region.
Step 3: Open Cloud Shell and confirm authentication
- In the OCI Console, click the Cloud Shell icon.
- In Cloud Shell, confirm the region and identity context.
Run:
oci iam region-subscription list --output table
Then check your current CLI setup:
oci session validate
If oci session validate is not available in your CLI version, run a simple command such as:
oci iam availability-domain list --output table
Expected outcome: CLI commands work without configuring API keys manually (Cloud Shell is pre-authenticated to your user session).
Step 4: Install/upgrade the OCI Python SDK (Cloud Shell)
Cloud Shell often has Python and SDK available, but versions differ. Upgrade in your user environment:
python3 -m pip install --upgrade --user oci
Confirm:
python3 -c "import oci; print(oci.__version__)"
Expected outcome: The OCI Python SDK imports successfully.
Step 5: Run a Vision analysis script (Python)
Create a file named vision_analyze.py:
cat > vision_analyze.py <<'PY'
import sys
import json
import oci
# -------------------------
# User inputs (edit these)
# -------------------------
NAMESPACE = sys.argv[1]
BUCKET = sys.argv[2]
OBJECT_NAME = sys.argv[3]
# Optional: feature selection (choose one)
FEATURE = sys.argv[4] if len(sys.argv) > 4 else "OBJECT_DETECTION"
# Other common values you may try (verify in docs): IMAGE_CLASSIFICATION, TEXT_DETECTION
# -------------------------
# OCI config and client
# -------------------------
config = oci.config.from_file()  # In Cloud Shell, this typically works.

# The Vision client module/class naming can evolve across SDK versions.
# In recent OCI SDK releases, Vision lives under oci.ai_vision with
# AIServiceVisionClient, and the feature models are named Image*Feature.
# If these imports fail, verify the latest SDK docs for Vision.
from oci.ai_vision import AIServiceVisionClient
from oci.ai_vision.models import (
    AnalyzeImageDetails,
    ObjectStorageImageDetails,
    ImageClassificationFeature,
    ImageObjectDetectionFeature,
    ImageTextDetectionFeature,
)

# The client derives a request signer from the config file itself;
# constructing a Signer manually is unnecessary for config-file auth.
client = AIServiceVisionClient(config)

# -------------------------
# Build request payload
# -------------------------
image = ObjectStorageImageDetails(
    namespace_name=NAMESPACE,
    bucket_name=BUCKET,
    object_name=OBJECT_NAME,
)

if FEATURE == "IMAGE_CLASSIFICATION":
    features = [ImageClassificationFeature()]
elif FEATURE == "TEXT_DETECTION":
    features = [ImageTextDetectionFeature()]
else:
    features = [ImageObjectDetectionFeature()]

details = AnalyzeImageDetails(
    image=image,
    features=features,
)
# -------------------------
# Call Vision
# -------------------------
resp = client.analyze_image(analyze_image_details=details)
print(json.dumps(oci.util.to_dict(resp.data), indent=2))
PY
Now run it (replace placeholders):
python3 vision_analyze.py <namespace> <bucket_name> <object_name> OBJECT_DETECTION
Examples:
- Object detection:
  python3 vision_analyze.py mynamespace vision-lab-bucket-123 photo.jpg OBJECT_DETECTION
- OCR/text detection:
  python3 vision_analyze.py mynamespace vision-lab-bucket-123 screenshot.png TEXT_DETECTION
Expected outcome: You receive a JSON response containing detected objects or detected text with confidence scores and (for detection) bounding box coordinates.
Step 6: Interpret results and store them (optional)
For a quick lab, you can save the JSON:
python3 vision_analyze.py <namespace> <bucket_name> <object_name> TEXT_DETECTION > vision_result.json
ls -lh vision_result.json
head -n 40 vision_result.json
Expected outcome: You have a local vision_result.json artifact you can use in downstream steps.
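To sanity-check the saved JSON, a short summarizer helps. The field names in the sample below (`image_objects`, `name`, `confidence`) are illustrative of object-detection output; adjust them to match the schema you actually received, which differs per feature and API version.

```python
# In the lab you would load the saved file, e.g.:
#   import json
#   result = json.load(open("vision_result.json"))
# Here we use an inline sample with an illustrative schema.
sample = {
    "image_objects": [
        {"name": "Car", "confidence": 0.91},
        {"name": "Person", "confidence": 0.47},
    ]
}

def summarize(result, min_confidence=0.5):
    """List (label, confidence) pairs above a confidence threshold."""
    return [
        (obj["name"], obj["confidence"])
        for obj in result.get("image_objects", [])
        if obj["confidence"] >= min_confidence
    ]

print(summarize(sample))
```

Thresholding like this is usually the first downstream step: low-confidence detections are either dropped or routed to human review.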
Validation
Use this checklist:
- API call succeeded (no authentication or authorization errors).
- JSON output includes:
  - A list of detections/labels/text lines (structure depends on feature).
  - Confidence scores (typically floats).
- Results make sense for your test image.
  - For OCR: confirm expected text appears.
  - For detection: confirm detected object labels and rough bounding locations.

If you need deeper validation:
- Repeat with a second image.
- Compare OCR results against known ground truth.
- Measure false positives and tune confidence thresholds.
Troubleshooting
Common issues and fixes:
- ModuleNotFoundError: No module named 'oci.ai_vision'
  - Fix: upgrade the SDK again: python3 -m pip install --upgrade --user oci
  - If it still fails, your Cloud Shell image may be pinned. Use a virtual environment or consult the current SDK docs: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/overview.htm
- 401 Unauthorized / signing errors
  - In Cloud Shell, auth should work out of the box. If you run from a local machine:
    - Ensure ~/.oci/config and your API key are configured.
    - Ensure the key file path is correct and readable.
  - Consider using instance principals for workloads running on OCI compute.
- 403 Forbidden
  - IAM policy issue. Ensure you have permissions to use Vision and to read the object in Object Storage.
  - Work with your admin to validate policies for your compartment.
- 404 Object not found
  - Check namespace, bucket, and object_name exactly (case-sensitive).
  - Confirm the object exists and is in the same region.
- Throttling / rate limit errors
  - Implement retries with exponential backoff.
  - Reduce concurrency.
  - Use async/batch patterns for high volume (verify job APIs in docs).
- Unexpected/low-quality detection
  - Use a clearer image (higher resolution, less blur).
  - Try a different feature (classification vs detection).
  - Consider custom training (if supported and justified).
Cleanup
To avoid ongoing storage charges:
- In the OCI Console, go to your bucket.
- Delete the uploaded image object.
- Delete the bucket.
If you created any additional resources (functions, policies, or databases), remove them according to your organization’s change process.
Expected outcome: No lab resources remain that could incur ongoing costs.
11. Best Practices
Architecture best practices
- Prefer event-driven pipelines (Object Storage → Events → Functions) for scalable ingestion.
- Separate concerns:
- Ingestion service
- Vision analysis worker
- Results store + downstream actions
- Store raw images and analysis results separately; retain only what you need.
IAM/security best practices
- Use least privilege policies:
  - Allow only required compartments.
  - Use `use` rather than `manage` where possible.
- Prefer dynamic groups + instance principals for OCI workloads instead of long-lived user keys.
- Control Object Storage access tightly; treat images as potentially sensitive data.
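The least-privilege guidance above can be expressed as OCI policy statements. A hedged sketch: the group, dynamic-group, and compartment names are hypothetical, and you should verify the exact Vision resource family (shown here as `ai-service-vision-family`) in the Vision IAM documentation for your tenancy:

```
Allow group vision-users to use ai-service-vision-family in compartment analytics-dev
Allow group vision-users to read objects in compartment analytics-dev
Allow dynamic-group vision-workers to use ai-service-vision-family in compartment analytics-dev
```

Note the verbs: `use` for calling the service and `read` for input images, rather than blanket `manage` grants.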
Cost best practices
- Don’t run multiple Vision features unless required.
- Batch and cache results; avoid re-processing.
- Use Object Storage lifecycle rules to expire or archive old images.
Performance best practices
- Keep processing regional to minimize latency.
- Use asynchronous patterns for throughput and resilience.
- Apply reasonable image preprocessing (crop/resize) if it improves accuracy and reduces payload sizes.
Reliability best practices
- Add retries with exponential backoff.
- Implement idempotency in your pipeline (same object shouldn’t produce duplicate DB writes).
- Use dead-letter patterns for failures (queue or table of failed objects).
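The idempotency bullet can be made concrete: key each processed object on its path plus a version marker (such as the Object Storage ETag) so re-delivered events do not produce duplicate writes. The in-memory set below stands in for a durable store such as a database table with a unique constraint:

```python
def process_once(object_key: str, etag: str, seen: set, handler):
    """Idempotent processing: skip (object, etag) pairs already handled.

    'seen' is a stand-in for durable state; in production this would be
    a unique-keyed table or a conditional write, not process memory.
    """
    dedupe_key = (object_key, etag)
    if dedupe_key in seen:
        return "skipped"
    handler(object_key)          # do the actual Vision call / DB write
    seen.add(dedupe_key)         # record success only after the work is done
    return "processed"

seen = set()
# Simulate the same event delivered twice (common with at-least-once delivery)
results = [process_once("images/cat.jpg", "v1", seen, lambda k: None)
           for _ in range(2)]
print(results)  # first delivery processes, the redelivery is skipped
```

Failed objects that exhaust retries would go to the dead-letter queue or table mentioned above instead of being silently dropped.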
Operations best practices
- Log request IDs and correlation IDs for support.
- Monitor error rates, latency, and queue depth (if using streaming/queues).
- Track model/version changes (for custom workflows) and re-validate accuracy after updates.
Governance/tagging/naming best practices
- Use consistent resource naming, e.g. `env-app-purpose-region`.
- Apply tags:
  - `costCenter`
  - `environment` (dev/test/prod)
  - `dataClassification`
12. Security Considerations
Identity and access model
- Vision uses OCI IAM for authentication/authorization.
- Prefer workload identities (instance/resource principals) over user keys in production.
- Ensure Object Storage access is scoped:
- Only the bucket(s) needed
- Only read access when possible
Encryption
- Data in transit: HTTPS.
- Data at rest: Object Storage supports encryption at rest; for Vision’s internal handling, confirm service-specific statements in official docs.
Network exposure
- If calling Vision from private subnets:
- Use controlled egress (NAT gateway, firewalls/proxies).
- If exposing an API to the internet:
- Put API Gateway in front, use auth (JWT/OAuth/custom authorizers), and rate limiting.
Secrets handling
- Do not store OCI keys in code repositories.
- Use OCI Vault for secrets if you must store credentials (though instance principals are better).
Audit/logging
- Enable and monitor OCI Audit.
- Forward logs to a SIEM if required by your compliance program.
Compliance considerations
- Images may contain PII/PHI.
- Implement:
- Data minimization
- Retention limits
- Access logging
- Encryption controls
- Confirm regional processing requirements with legal/compliance.
Common security mistakes
- Overbroad IAM policies (e.g., tenancy-wide manage permissions)
- Public buckets or overly permissive pre-authenticated requests
- Storing extracted text (OCR) without classification/retention controls
- Shipping images out of region without approval
Secure deployment recommendations
- Use private buckets.
- Apply least privilege IAM.
- Keep processing in-region.
- Store only required outputs; redact sensitive extracted text where possible.
13. Limitations and Gotchas
Because limits evolve, treat these as categories to validate in official docs:
- Image limits: maximum file size, resolution, and supported formats (verify exact values).
- Language support (OCR): not all languages/scripts may be supported equally.
- Accuracy variability: performance depends on lighting, angle, blur, occlusion, and domain similarity.
- Throttling: API rate limits can impact high-concurrency designs.
- Regional availability: not all OCI regions support Vision features equally.
- Cost surprises:
- Reprocessing images
- Running multiple features per image
- Storing large image archives long-term
- Custom model constraints (if used):
- Data labeling effort
- Training time and evaluation requirements
- Model lifecycle governance
14. Comparison with Alternatives
In Oracle Cloud (nearest alternatives)
- OCI Data Science: when you want full control over model building/training and custom pipelines.
- Specialized document services (if your primary need is form/receipt extraction rather than general OCR): verify current OCI service lineup in your region.
- Custom self-managed inference on OCI GPU compute: for highly specialized workloads.
In other clouds
- AWS Rekognition
- Google Cloud Vision API
- Azure AI Vision (Computer Vision)
Open-source / self-managed
- OpenCV for classical CV and preprocessing
- Tesseract OCR for text extraction
- YOLO / Detectron2 / Segment Anything (self-managed) for detection/segmentation
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Oracle Cloud Vision | OCI-native vision APIs with IAM/governance | Managed service, integrates with OCI, reduces ops burden | Feature availability varies by region; less control than self-managed ML | You’re on OCI and want managed classification/detection/OCR quickly |
| OCI Data Science | Custom ML end-to-end | Full flexibility, notebooks, pipelines | More engineering/ops; you manage model serving choices | You need custom architectures, bespoke training, and full control |
| AWS Rekognition | Vision features in AWS ecosystems | Mature integrations, broad adoption | Tied to AWS; cost model differs | Your platform is primarily AWS |
| Google Cloud Vision API | OCR and vision with Google ecosystem | Strong OCR reputation (validate for your use case) | Tied to GCP; pricing and limits vary | You’re on GCP and need deep OCR/vision features |
| Azure AI Vision | Microsoft ecosystem integration | Good integration with Azure services | Tied to Azure | Your apps/data are on Azure |
| OpenCV + Tesseract + self-hosted models | Maximum control / offline environments | Full control, no vendor lock-in | High ops burden, scaling and maintenance | You need on-prem/offline or custom models with full control |
15. Real-World Example
Enterprise example: Insurance claims photo triage
- Problem: An insurer receives thousands of claim photos daily and needs to route them quickly.
- Proposed architecture:
- Mobile/web uploads → Object Storage
- Events → Functions orchestrator
- Functions call Vision (classification/detection/OCR as needed)
- Results stored in a database and indexed for adjusters
- Audit + logging for governance
- Why Vision was chosen: Managed inference reduces time-to-market and operational complexity; OCI IAM aligns with enterprise governance.
- Expected outcomes:
- Faster routing and reduced manual triage workload
- Consistent metadata tagging
- Better search and reporting over claims imagery
Startup/small-team example: Marketplace auto-moderation support
- Problem: A small marketplace team needs to identify risky uploads without a large moderation staff.
- Proposed architecture:
- Upload API → Object Storage
- Serverless function calls Vision classification
- Items above a threshold go to manual review; others proceed
- Why Vision was chosen: Fast integration, minimal ops, pay-per-use economics.
- Expected outcomes:
- Reduced moderation queue volume
- Faster listing approvals
- Clear audit trail for moderation decisions (when combined with app logs)
16. FAQ
1) Is Vision the same as “OCI Vision” or “AI Vision”?
They typically refer to the same Oracle Cloud service for computer vision. Naming can differ across console, docs, and SDK modules. Always confirm the current service naming in Oracle’s official documentation.
2) Does Vision require me to manage GPUs?
For prebuilt inference APIs, you generally do not manage GPUs or inference servers. If you use custom training/hosting features (where available), the service may manage infrastructure behind the scenes while charging for the associated usage.
3) Can Vision read text from images (OCR)?
Vision commonly supports text detection/OCR. Language coverage, accuracy, and limits vary—verify supported languages, image constraints, and output schema in official docs.
4) Can I analyze images stored in Object Storage?
Yes, Object Storage is a common pattern. You provide the bucket/object location (subject to IAM permissions) or upload image bytes inline, depending on the API.
5) What IAM permissions do I need?
You need permissions to call Vision and to read input images (often in Object Storage). The exact IAM policy resource types should be verified in Vision IAM documentation for your tenancy.
6) How do I keep image data private?
Use private buckets, least privilege IAM, and avoid public access mechanisms unless required. For external ingestion, use authenticated uploads and short-lived access patterns.
7) Does Vision support asynchronous/batch jobs?
Many vision services provide async options for scale. If you need that, confirm the current Vision APIs for job submission and result retrieval in the official docs.
8) How do I estimate costs?
Identify your monthly image volume and which features you’ll call per image, then use the official AI services price list and OCI cost estimator. Add Object Storage and data transfer costs.
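As a back-of-envelope illustration of that arithmetic (all prices below are placeholders, not Oracle's; take real per-transaction and storage SKUs from the official price list and estimator):

```python
def estimate_monthly_cost(images_per_month, features_per_image,
                          price_per_1000_calls, storage_gb, storage_price_per_gb):
    """Rough monthly estimate: API calls plus storage.

    Placeholder pricing model; real Vision pricing may use different
    units/tiers, so treat this as scaffolding for your own numbers.
    """
    api_calls = images_per_month * features_per_image
    api_cost = api_calls / 1000 * price_per_1000_calls
    storage_cost = storage_gb * storage_price_per_gb
    return round(api_cost + storage_cost, 2)

# Hypothetical: 100k images/month, 2 features each, $1.50 per 1k calls,
# 500 GB stored at $0.025/GB-month
print(estimate_monthly_cost(100_000, 2, 1.50, 500, 0.025))  # → 312.5
```

Running a few feature-count scenarios through this quickly shows why "don't run multiple Vision features unless required" is the top cost practice.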
9) What’s the difference between image classification and object detection?
Classification assigns labels to the overall image; object detection finds and localizes objects with bounding boxes. Detection is usually more informative but may be more computationally intensive.
10) Is Vision suitable for real-time mobile apps?
It can be, but you must test latency, payload sizes, and error handling. For mobile, you may want a backend proxy (API Gateway + service) rather than calling Vision directly from the device.
11) Can I train a custom model?
Vision may support custom model training in some regions/tenancies. This typically requires labeled datasets and a project/model lifecycle. Verify the current custom model feature set in official docs.
12) How do I handle throttling?
Implement retries with exponential backoff, respect service limits, and use queues/async jobs for high-volume pipelines. Monitor error rates and request patterns.
13) Where should I store analysis results?
Store results as JSON in a database or object store. For search, index key fields (labels, confidence thresholds, extracted text) in a search engine or database text index.
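As an illustration of that pattern, here is a minimal SQLite sketch that flattens a hypothetical labels response into queryable rows; the UNIQUE constraint also gives cheap idempotency on re-runs. The response shape is an assumption you should adapt to the actual JSON your Vision feature returns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real DB in production
conn.execute("""CREATE TABLE IF NOT EXISTS image_labels (
    object_name TEXT, label TEXT, confidence REAL,
    UNIQUE(object_name, label))""")

def store_result(conn, object_name, result):
    """Flatten a labels-style response into rows keyed by object + label."""
    rows = [(object_name, item["name"], float(item["confidence"]))
            for item in result.get("labels", [])]
    # INSERT OR IGNORE: reprocessing the same object is a no-op
    conn.executemany("INSERT OR IGNORE INTO image_labels VALUES (?, ?, ?)", rows)
    conn.commit()

result = {"labels": [{"name": "Invoice", "confidence": 0.93}]}
store_result(conn, "docs/scan1.jpg", result)
store_result(conn, "docs/scan1.jpg", result)  # duplicate run, no extra rows
count = conn.execute("SELECT COUNT(*) FROM image_labels").fetchone()[0]
print(count)  # → 1
```

From rows like these, indexing labels or extracted text for search is a standard database problem rather than a Vision-specific one.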
14) How do I troubleshoot low accuracy?
Use better-quality images, refine preprocessing (crop/resize), adjust confidence thresholds, and evaluate whether your domain requires custom models. Always measure against a labeled test set.
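Measuring against a labeled test set can be as simple as computing precision and recall at candidate confidence thresholds. A toy sketch with made-up data, useful for deciding where to set your threshold:

```python
def precision_recall(predictions, ground_truth, threshold):
    """Set-based precision/recall for label predictions at a threshold.

    predictions:  {image_id: [(label, confidence), ...]}
    ground_truth: {image_id: {true labels}}
    """
    tp = fp = fn = 0
    for image_id, truth in ground_truth.items():
        predicted = {lbl for lbl, conf in predictions.get(image_id, [])
                     if conf >= threshold}
        tp += len(predicted & truth)   # correct labels kept
        fp += len(predicted - truth)   # spurious labels kept
        fn += len(truth - predicted)   # true labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

preds = {"img1": [("cat", 0.9), ("dog", 0.4)], "img2": [("car", 0.8)]}
truth = {"img1": {"cat"}, "img2": {"car", "truck"}}
print(precision_recall(preds, truth, 0.5))
```

Sweeping the threshold over your real labeled set shows the precision/recall trade-off directly, which is far more reliable than tuning by inspection.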
15) Does Vision integrate with OCI Observability?
Control-plane actions are typically visible in OCI Audit. For metrics and logs, you’ll usually rely on application-level instrumentation; verify whether Vision publishes service metrics in your region.
17. Top Online Resources to Learn Vision
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official documentation | Vision docs home: https://docs.oracle.com/en-us/iaas/vision/vision/ | Canonical feature set, limits, tutorials, and API details |
| API reference | OCI API Reference (Vision): https://docs.oracle.com/en-us/iaas/api/#/en/vision/ | Endpoint schemas, request/response models, auth requirements |
| Pricing | Oracle Cloud Price List: https://www.oracle.com/cloud/price-list/ | Official SKU-based pricing (region/contract dependent) |
| Cost estimation | OCI Cost Estimator: https://www.oracle.com/cloud/costestimator.html | Build a scenario-based estimate using official inputs |
| Free tier | Oracle Cloud Free Tier: https://www.oracle.com/cloud/free/ | Check whether Vision or related services have free allocations |
| SDKs | OCI SDK docs: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/overview.htm | Language SDK usage patterns and authentication guidance |
| Cloud Shell | Cloud Shell intro: https://docs.oracle.com/en-us/iaas/Content/API/Concepts/cloudshellintro.htm | Pre-authenticated environment for running labs quickly |
| Architecture references | OCI Solutions / Architecture Center: https://docs.oracle.com/en/solutions/ | Patterns for event-driven pipelines, storage, and governance |
| Videos (official) | Oracle Cloud Infrastructure YouTube: https://www.youtube.com/@OracleCloudInfrastructure | Product walkthroughs and service updates (search for Vision/AI services) |
| Samples | Oracle OCI GitHub org: https://github.com/oracle | Look for official SDK samples and reference implementations (verify Vision-specific repos) |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, cloud engineers, architects | OCI fundamentals, automation, platform practices around cloud services | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate engineers | DevOps/SCM foundations, tooling, and delivery practices that complement OCI workloads | Check website | https://www.scmgalaxy.com/ |
| CloudOpsNow.in | Cloud operations and platform teams | Cloud ops, reliability, operationalizing services | Check website | https://cloudopsnow.in/ |
| SreSchool.com | SREs, operations, platform engineers | Reliability engineering, monitoring, incident response patterns applicable to OCI services | Check website | https://sreschool.com/ |
| AiOpsSchool.com | Ops + AI/automation practitioners | AIOps concepts, automation, operational analytics | Check website | https://aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content | Engineers seeking practical training resources | https://rajeshkumar.xyz/ |
| devopstrainer.in | DevOps tooling and practices | Beginners to advanced DevOps learners | https://devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps services/training platform | Teams seeking hands-on guidance | https://devopsfreelancer.com/ |
| devopssupport.in | DevOps support and enablement | Ops/DevOps teams needing implementation support | https://devopssupport.in/ |
20. Top Consulting Companies
| Company | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting | Architecture, implementation, CI/CD, operational readiness | Building event-driven pipelines, IaC rollout, governance setup | https://cotocus.com/ |
| DevOpsSchool.com | Training + consulting | Delivery enablement, automation, DevOps transformation | Setting up secure OCI automation, building platform playbooks | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting | Deployment automation, observability, cloud operations | Production hardening, monitoring strategy, incident process | https://devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before Vision
- OCI fundamentals:
- Compartments, IAM users/groups/policies
- Regions and networking basics
- Object Storage basics:
- Buckets, objects, lifecycle policies
- API fundamentals:
- REST concepts, authentication, request/response handling
- Basic ML concepts:
- Classification vs detection vs OCR
- Precision/recall and confidence thresholds
What to learn after Vision
- Event-driven architectures:
- OCI Events, Functions, Notifications
- Data persistence and search:
- Databases for metadata and JSON
- Indexing strategies for extracted text
- MLOps (if you move into custom models):
- Dataset management, labeling, evaluation
- Model versioning and rollout strategies
- Security:
- OCI Vault
- Security zones (if used in your org)
- Audit and compliance reporting
Job roles that use Vision
- Cloud engineer (building pipelines and integrations)
- Solutions architect (designing AI-enriched systems)
- DevOps/SRE (operationalizing AI services)
- Data engineer (metadata enrichment pipelines)
- ML engineer (when using custom training workflows)
Certification path (if available)
Oracle certification programs change over time. Use Oracle University and OCI certification listings to find current tracks that cover AI services:
- Oracle University: https://education.oracle.com/
- OCI certifications overview (verify current page): https://education.oracle.com/oracle-cloud-infrastructure-certification
Project ideas for practice
- Serverless pipeline: Object Storage → Function → Vision → DB
- OCR indexer: extract text and index into a searchable store
- Moderation workflow: threshold-based review queue with audit logging
- Cost dashboard: track images processed and estimate monthly spend by tags
22. Glossary
- Compartment (OCI): A logical container for organizing and isolating cloud resources with IAM policies.
- IAM Policy: Text-based rules that define who can do what in OCI (e.g., “Allow group X to use Y in compartment Z”).
- Object Storage Namespace: A tenancy-scoped identifier used in Object Storage addressing.
- Image Classification: Assigning labels to an entire image.
- Object Detection: Identifying and locating objects in an image, typically with bounding boxes.
- OCR (Optical Character Recognition): Extracting text from images.
- Confidence Score: A numeric value indicating the model’s estimated likelihood that a prediction is correct.
- Bounding Box: Coordinates defining the rectangle around a detected object or text region.
- Synchronous Inference: A request/response call where results return immediately.
- Asynchronous Job: A submitted task that completes later; results are retrieved after completion.
- Least Privilege: Security principle of granting only the minimum permissions necessary.
- Instance Principal: OCI authentication method for workloads running on OCI compute without embedding user keys.
- Dynamic Group: A set of OCI resources (instances, functions) grouped for IAM policies.
- Egress: Outbound network traffic leaving a region or cloud environment.
23. Summary
Vision on Oracle Cloud is a managed Analytics and AI service that analyzes images via APIs to return structured results such as labels, detected objects, and extracted text. It fits well when you want to add computer vision capabilities quickly without operating your own inference infrastructure, and when you want tight integration with OCI services like Object Storage, Functions, and IAM.
Cost is primarily driven by how many images you analyze and which features you run per image, plus storage and data transfer. Security depends on least-privilege IAM, private storage, careful handling of sensitive images/OCR outputs, and governance via Audit and logging.
Use Vision for OCI-native image understanding and scalable pipelines; consider OCI Data Science or self-managed alternatives when you need deep customization and full control. Next, deepen your skills by productionizing the lab into an event-driven pipeline and validating accuracy/cost with real workload data.