{"id":841,"date":"2026-04-16T08:59:33","date_gmt":"2026-04-16T08:59:33","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-speech-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai\/"},"modified":"2026-04-16T08:59:33","modified_gmt":"2026-04-16T08:59:33","slug":"oracle-cloud-speech-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/oracle-cloud-speech-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-and-ai\/","title":{"rendered":"Oracle Cloud Speech Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics and AI"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Analytics and AI<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What this service is<\/h3>\n\n\n\n<p>Speech in Oracle Cloud (OCI) is a managed AI service that converts spoken audio into text (speech-to-text) and can convert text into natural-sounding audio (text-to-speech), using Oracle-managed models and APIs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph simple explanation<\/h3>\n\n\n\n<p>If you have call recordings, meetings, voice notes, or voice-enabled apps, Speech helps you turn audio into searchable text and summaries (when paired with other services), or generate spoken audio from text for IVR systems and voice assistants\u2014without building your own speech models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">One-paragraph technical explanation<\/h3>\n\n\n\n<p>Speech is part of <strong>Oracle Cloud Infrastructure (OCI) AI Services<\/strong> in the <strong>Analytics and AI<\/strong> portfolio. It exposes regional HTTPS API endpoints and console workflows for speech recognition and speech synthesis. 
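<\/p>\n\n\n\n<p>In practice, clients interact with these endpoints asynchronously: submit a job, poll until it reaches a terminal state, then collect the output. The sketch below illustrates that pattern with a stand-in client object; the actual class, method, and field names belong to the OCI SDK and should be verified against the official SDK reference.<\/p>\n\n\n\n

```python
# Submit-and-poll pattern for an asynchronous speech service.
# FakeSpeechClient is a stand-in for illustration only; real integrations
# use the OCI SDK or signed HTTPS calls (verify exact names in the OCI docs).
import time

class FakeSpeechClient:
    def __init__(self):
        self._jobs = {}

    def create_transcription_job(self, audio_uri):
        # Real deployments reference audio stored in Object Storage.
        job_id = 'job-0001'
        self._jobs[job_id] = {'state': 'SUCCEEDED',
                              'transcript': 'transcript of ' + audio_uri}
        return job_id

    def get_transcription_job(self, job_id):
        return self._jobs[job_id]

def transcribe(client, audio_uri, poll_interval=0.0):
    # Submit an asynchronous job, then poll until a terminal state is reached.
    job_id = client.create_transcription_job(audio_uri)
    while True:
        job = client.get_transcription_job(job_id)
        if job['state'] in ('SUCCEEDED', 'FAILED'):
            return job
        time.sleep(poll_interval)

result = transcribe(FakeSpeechClient(), 'call-0042.wav')
print(result['state'])  # prints SUCCEEDED
```

\n\n\n\n<p>Polling keeps the client simple; the event-driven patterns covered later in this tutorial replace polling with notifications via Events and Functions.<\/p>\n\n\n\n<p>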
Typical implementations combine Speech with <strong>Object Storage<\/strong> for audio input\/output, <strong>IAM<\/strong> for access control, and optionally <strong>Functions<\/strong>, <strong>Events<\/strong>, <strong>Streaming<\/strong>, and <strong>API Gateway<\/strong> to build event-driven pipelines and production APIs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What problem it solves<\/h3>\n\n\n\n<p>Speech solves the operational and engineering burden of implementing speech recognition and synthesis at scale: model training, language support, audio format handling, performance tuning, scaling, and secure API operations. It provides a managed, policy-controlled way to add speech capabilities to applications and analytics pipelines.<\/p>\n\n\n\n<blockquote>\n<p>Service name note: In OCI documentation, this service is typically listed as <strong>Speech<\/strong> under <strong>AI Services<\/strong>. Some SDKs or APIs may refer to it as <strong>AI Speech<\/strong>. Verify the exact naming in your OCI Console in your region.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. What is Speech?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p>Speech is an Oracle Cloud AI Service intended to:\n&#8211; <strong>Transcribe audio into text<\/strong> (speech-to-text) for analytics, search, compliance, and automation.\n&#8211; <strong>Synthesize speech from text<\/strong> (text-to-speech) for voice experiences such as IVR prompts, notifications, and voice-enabled applications.<\/p>\n\n\n\n<p>(Confirm the exact set of supported operations\u2014speech-to-text vs. 
text-to-speech\u2014in the official Speech documentation for your region and service version.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities (what it generally includes)<\/h3>\n\n\n\n<p>Common capabilities you will encounter in managed speech services on OCI include:\n&#8211; <strong>Speech-to-text transcription<\/strong> from audio sources (often via asynchronous jobs for files in Object Storage).\n&#8211; <strong>Text-to-speech synthesis<\/strong> to generate an audio file\/stream from input text.\n&#8211; <strong>Language\/locale selection<\/strong> for recognition or synthesis.\n&#8211; <strong>Structured outputs<\/strong> such as transcripts, timestamps, confidence, and metadata (availability varies\u2014verify in official docs).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Major components<\/h3>\n\n\n\n<p>Speech solutions on Oracle Cloud typically involve these components:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Speech API \/ Console workflows<\/strong><\/li>\n<li>\n<p>The managed endpoints and console pages where you submit audio\/text and receive transcripts\/audio.<\/p>\n<\/li>\n<li>\n<p><strong>OCI Identity and Access Management (IAM)<\/strong><\/p>\n<\/li>\n<li>\n<p>Policies and authentication (user principals, instance principals, dynamic groups) that control who can call Speech and which compartments can be used.<\/p>\n<\/li>\n<li>\n<p><strong>OCI Object Storage (commonly used)<\/strong><\/p>\n<\/li>\n<li>\n<p>Frequently used as the source of audio files and the destination for transcription outputs or synthesized audio artifacts.<\/p>\n<\/li>\n<li>\n<p><strong>Optional orchestration services<\/strong><\/p>\n<\/li>\n<li><strong>Events<\/strong> and <strong>Functions<\/strong> for event-driven pipelines<\/li>\n<li><strong>API Gateway<\/strong> for controlled public API exposure<\/li>\n<li><strong>Streaming<\/strong> for ingestion pipelines (verify if your Speech workflow supports streaming recognition in your 
region)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Managed AI API service<\/strong> (platform service)<\/li>\n<li>Accessed over <strong>HTTPS<\/strong> via console and APIs<\/li>\n<li>Integrated with OCI IAM and compartments<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional vs. global<\/h3>\n\n\n\n<p>Speech is typically a <strong>regional service<\/strong> in OCI:\n&#8211; You select a <strong>region<\/strong> (for example, <code>us-ashburn-1<\/code>) where you call the service endpoint.\n&#8211; Resources such as jobs (if the service uses job resources) are generally <strong>compartment-scoped<\/strong> and <strong>region-scoped<\/strong>.<\/p>\n\n\n\n<p>Always confirm:\n&#8211; Region availability for Speech\n&#8211; Supported languages\/locales by region<br\/>\nin the official documentation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Oracle Cloud ecosystem<\/h3>\n\n\n\n<p>Speech commonly sits in the middle of OCI data and application architectures:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Inbound audio<\/strong>: Object Storage, Streaming, application uploads<\/li>\n<li><strong>Processing<\/strong>: Speech for transcription or synthesis<\/li>\n<li><strong>Downstream<\/strong>:<\/li>\n<li>Search and analytics in Autonomous Database, OpenSearch (if used), or data lakes<\/li>\n<li>Automation with Functions and integration services<\/li>\n<li>Governance through IAM, Audit, Logging, and tagging<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Why use Speech?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Faster insights from audio<\/strong>: Turn recorded calls and meetings into searchable text for discovery and analytics.<\/li>\n<li><strong>Better customer experience<\/strong>: Use text-to-speech for consistent IVR prompts and multilingual experiences (where supported).<\/li>\n<li><strong>Compliance and audit readiness<\/strong>: Transcripts support review workflows, QA, and evidence retention.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>No model hosting<\/strong>: You avoid running GPU infrastructure and model lifecycle management for speech models.<\/li>\n<li><strong>API-first integration<\/strong>: Integrate directly into apps, pipelines, and microservices.<\/li>\n<li><strong>Works with Object Storage<\/strong>: Enables batch workflows for large audio corpora.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Scales without capacity planning<\/strong>: Managed scaling (within service limits\/quotas).<\/li>\n<li><strong>Standard OCI governance<\/strong>: Compartments, IAM policies, tagging, Audit logs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>IAM-controlled access<\/strong>: Least-privilege access to Speech operations and related storage locations.<\/li>\n<li><strong>Data encryption<\/strong>: TLS in transit; Object Storage encryption at rest (and optional customer-managed keys in Vault for Object Storage\u2014verify configuration).<\/li>\n<li><strong>Auditability<\/strong>: OCI Audit records service API activity at the tenancy level.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Batch 
processing patterns<\/strong>: Suitable for large-scale transcription of many recordings.<\/li>\n<li><strong>Decoupled pipelines<\/strong>: Combine with Events\/Functions\/Queues to handle bursts and retries.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose it<\/h3>\n\n\n\n<p>Choose Speech on Oracle Cloud when:\n&#8211; You already use OCI and want speech capabilities with native IAM and compartment governance.\n&#8211; You need a managed solution for transcription or speech synthesis without operating ML infrastructure.\n&#8211; Your architecture already stores audio in Object Storage and benefits from asynchronous processing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose it<\/h3>\n\n\n\n<p>Speech may not be the best fit when:\n&#8211; You require <strong>on-prem only<\/strong> processing with no cloud egress.\n&#8211; You require <strong>fully custom speech models<\/strong>, specialized domain tuning, or guaranteed features not documented for OCI Speech (verify customization support).\n&#8211; You need strict <strong>data residency<\/strong> in a region where Speech is not available.\n&#8211; You need ultra-low-latency streaming recognition and Speech in your region does not support it (verify).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is Speech used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Contact centers and customer support<\/strong><\/li>\n<li><strong>Healthcare<\/strong> (dictation, patient call routing) \u2014 subject to compliance requirements<\/li>\n<li><strong>Financial services<\/strong> (call monitoring, trading desk recordings) \u2014 compliance-heavy<\/li>\n<li><strong>Media and entertainment<\/strong> (captioning, indexing)<\/li>\n<li><strong>Education<\/strong> (lecture transcription, accessibility)<\/li>\n<li><strong>Retail<\/strong> (voice-based assistants, IVR)<\/li>\n<li><strong>Public sector<\/strong> (meeting minutes, case recordings) \u2014 check regional compliance<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Application engineering teams building voice features<\/li>\n<li>Data engineering teams building transcription pipelines<\/li>\n<li>Security\/compliance teams supporting discovery and review<\/li>\n<li>DevOps\/SRE teams operating event-driven workloads<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Batch transcription of stored audio<\/li>\n<li>On-demand transcription for uploaded voice notes<\/li>\n<li>Text-to-speech generation for IVR prompt libraries<\/li>\n<li>Analytics enrichment: transcription \u2192 NLP \u2192 reporting<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Event-driven: Object Storage upload \u2192 Event \u2192 Function \u2192 Speech job \u2192 results in Object Storage\/DB<\/li>\n<li>Pipeline-driven: Streaming ingestion \u2192 processing workers \u2192 storage<\/li>\n<li>API-based: API Gateway \u2192 Function \u2192 Speech \u2192 response to client<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Production<\/strong>: governed compartments, strict IAM, logging\/audit, encryption, lifecycle rules for audio retention<\/li>\n<li><strong>Dev\/Test<\/strong>: smaller sample sets, cost controls, shorter retention policies, synthetic audio for testing<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where Oracle Cloud Speech commonly fits. (Confirm exact feature availability\u2014speech-to-text vs text-to-speech and streaming support\u2014in official docs.)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Contact center call transcription<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Thousands of daily call recordings are hard to search and analyze.<\/li>\n<li><strong>Why Speech fits<\/strong>: Batch transcription converts audio files into text artifacts you can index and analyze.<\/li>\n<li><strong>Example<\/strong>: Nightly pipeline transcribes new call recordings in Object Storage and stores transcripts for QA review.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) Compliance keyword search across recorded calls<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Compliance teams need to find calls that mention restricted phrases.<\/li>\n<li><strong>Why Speech fits<\/strong>: Transcripts enable keyword search; pair with NLP for entity extraction.<\/li>\n<li><strong>Example<\/strong>: Transcribe \u2192 run OCI Language (if used) \u2192 flag calls with sensitive terms.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Meeting minutes and action items pipeline<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Teams lose decisions and action items from meetings.<\/li>\n<li><strong>Why Speech fits<\/strong>: Speech-to-text provides the raw transcript; downstream services can summarize (outside 
Speech).<\/li>\n<li><strong>Example<\/strong>: Upload meeting audio \u2192 transcript stored \u2192 summarization performed by another tool\/service.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) Voice notes to searchable tickets (IT\/Field operations)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Field staff record voice notes; back office needs structured tickets.<\/li>\n<li><strong>Why Speech fits<\/strong>: Transcribe short notes and push text into ticket systems.<\/li>\n<li><strong>Example<\/strong>: Mobile app uploads audio \u2192 transcript \u2192 create a service ticket automatically.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) IVR prompt generation (text-to-speech)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Manually recording prompts is slow and inconsistent.<\/li>\n<li><strong>Why Speech fits<\/strong>: Text-to-speech can generate consistent voice prompts at scale.<\/li>\n<li><strong>Example<\/strong>: Generate an audio library for \u201cYour balance is\u2026\u201d prompts in supported locales.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Accessibility: captions for internal training videos<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Training videos need captions for accessibility and search.<\/li>\n<li><strong>Why Speech fits<\/strong>: Transcribe audio tracks to create caption text.<\/li>\n<li><strong>Example<\/strong>: Store video audio track \u2192 transcribe \u2192 create captions downstream.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Podcast indexing and chaptering<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Long audio content is difficult to navigate.<\/li>\n<li><strong>Why Speech fits<\/strong>: Transcripts with timestamps (if supported) can power chapter markers.<\/li>\n<li><strong>Example<\/strong>: Transcribe podcast episodes; build a search UI over transcript 
segments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Voice-driven app features (dictation)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Users want to dictate messages instead of typing.<\/li>\n<li><strong>Why Speech fits<\/strong>: Speech-to-text converts dictated audio to text.<\/li>\n<li><strong>Example<\/strong>: App sends audio clip to backend; backend calls Speech and returns text.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Multilingual audio processing (where supported)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Calls arrive in multiple languages and need routing.<\/li>\n<li><strong>Why Speech fits<\/strong>: Select recognition language\/locale when transcribing; combine with language detection downstream.<\/li>\n<li><strong>Example<\/strong>: Detect\/choose locale \u2192 transcribe \u2192 route transcript to appropriate team.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Audio evidence processing for investigations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Investigators need fast review of audio evidence.<\/li>\n<li><strong>Why Speech fits<\/strong>: Transcripts reduce manual listen time; enable search.<\/li>\n<li><strong>Example<\/strong>: Transcribe recordings, store transcript with chain-of-custody metadata and access controls.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Quality assurance scoring support<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: QA needs context for scoring calls.<\/li>\n<li><strong>Why Speech fits<\/strong>: Transcripts make it easy to review and annotate.<\/li>\n<li><strong>Example<\/strong>: QA tool loads transcript and highlights required script segments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Voice notifications (text-to-speech)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Systems need to place automated voice calls with 
dynamic text.<\/li>\n<li><strong>Why Speech fits<\/strong>: TTS can generate audio from templates.<\/li>\n<li><strong>Example<\/strong>: \u201cYour appointment is confirmed for\u2026\u201d audio generated per customer (verify usage constraints and telephony integration).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>The exact feature set can vary by region and service release. Use the official Speech docs to confirm supported operations, languages\/locales, audio formats, and limits.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 1: Speech-to-text transcription (batch\/job-based)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Converts audio recordings into text, typically using an asynchronous job workflow.<\/li>\n<li><strong>Why it matters<\/strong>: Most enterprise audio workloads are file-based (call recordings, meeting captures).<\/li>\n<li><strong>Practical benefit<\/strong>: Decouples processing from your app; you can submit many jobs and retrieve results later.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Supported audio formats, max duration, and file size limits apply (verify in official docs).<\/li>\n<li>Accuracy depends on audio quality and language\/locale selection.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 2: Text-to-speech (speech synthesis)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Generates spoken audio from input text, using selectable voices\/locales (where available).<\/li>\n<li><strong>Why it matters<\/strong>: Avoids manual recording; supports consistent UX.<\/li>\n<li><strong>Practical benefit<\/strong>: Generate prompt libraries and dynamic messages programmatically.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Voice inventory and locales vary (verify).<\/li>\n<li>Output audio format options and character 
limits apply (verify).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 3: OCI Console integration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Lets you create and monitor speech jobs (where job resources exist) from the OCI Console.<\/li>\n<li><strong>Why it matters<\/strong>: Useful for first-time setup, debugging, and operational visibility.<\/li>\n<li><strong>Practical benefit<\/strong>: Faster onboarding than building everything via API first.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Some advanced automation still requires API\/SDK.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 4: Object Storage as input\/output (common pattern)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Uses OCI Object Storage buckets as a durable source for audio and destination for transcripts\/audio.<\/li>\n<li><strong>Why it matters<\/strong>: Enables repeatable, auditable pipelines and retention controls.<\/li>\n<li><strong>Practical benefit<\/strong>: Works well with lifecycle policies, encryption, and events.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Ensure IAM policies allow Speech workflows to access required buckets (model varies\u2014verify docs for how Speech accesses objects).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 5: SDK and REST API support<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Enables programmatic integration from applications and pipelines.<\/li>\n<li><strong>Why it matters<\/strong>: Production systems need automation.<\/li>\n<li><strong>Practical benefit<\/strong>: Integrate with CI\/CD, functions, microservices, and data workflows.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>API shapes evolve; pin SDK versions and track release notes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 6: IAM-based access control 
(compartments\/policies)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Controls who can create jobs, read results, and access related resources.<\/li>\n<li><strong>Why it matters<\/strong>: Audio and transcripts may be sensitive.<\/li>\n<li><strong>Practical benefit<\/strong>: Least-privilege access and separation by compartment\/environment.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Mis-scoped policies are a top cause of failed jobs or \u201cnot authorized\u201d errors.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 7: Auditability via OCI Audit<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Records API calls against OCI services (including Speech operations) at the tenancy level.<\/li>\n<li><strong>Why it matters<\/strong>: Governance, forensics, and compliance.<\/li>\n<li><strong>Practical benefit<\/strong>: Central investigation trail: who did what, when, from where.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Audit records API calls, not necessarily content-level details.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 8: Tagging and compartment governance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Enables cost tracking and operational organization.<\/li>\n<li><strong>Why it matters<\/strong>: Speech workloads can scale quickly, so governance prevents surprises.<\/li>\n<li><strong>Practical benefit<\/strong>: Track by app\/team\/environment; implement budget alerts.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Tag enforcement requires governance setup (Tag Defaults \/ IAM).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 9: Integration patterns with Functions\/Events (architecture feature)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Enables event-driven orchestration, for example on object upload.<\/li>\n<li><strong>Why it 
matters<\/strong>: Automates transcription without manual triggers.<\/li>\n<li><strong>Practical benefit<\/strong>: Serverless pipeline that scales with ingestion.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Ensure idempotency and retries; handle duplicate events.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Feature 10: (Verify) Streaming transcription support<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Real-time speech recognition for live audio streams.<\/li>\n<li><strong>Why it matters<\/strong>: Some apps need low latency.<\/li>\n<li><strong>Practical benefit<\/strong>: Enables near-real-time captions and assistants.<\/li>\n<li><strong>Limitations\/caveats<\/strong>:<\/li>\n<li>Not guaranteed in all regions\/service versions. <strong>Verify in official docs<\/strong> before designing for streaming.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">High-level service architecture<\/h3>\n\n\n\n<p>Speech sits behind regional OCI endpoints. Your client (app, function, pipeline) authenticates with OCI IAM and sends requests over HTTPS. For file-based speech-to-text, you typically:\n1. Store audio in Object Storage.\n2. Create a transcription job referencing the audio object.\n3. Speech processes the job asynchronously.\n4. Results are stored back to Object Storage or returned via API (depending on workflow).<\/p>\n\n\n\n<p>For text-to-speech, you typically:\n1. Send text and synthesis parameters to Speech.\n2. 
Receive audio output (immediate response or stored artifact, depending on API).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Request, data, and control flow (typical batch transcription)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control plane<\/strong>: Create job, check status, list jobs.<\/li>\n<li><strong>Data plane<\/strong>: Audio input (Object Storage), transcript output (Object Storage).<\/li>\n<li><strong>Identity<\/strong>: AuthN via OCI request signing (user\/API key, instance principal, resource principal), AuthZ via IAM policies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common integrations with related OCI services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Object Storage<\/strong>: audio source + transcript destination<\/li>\n<li><strong>Functions<\/strong>: orchestration and glue code<\/li>\n<li><strong>Events<\/strong>: trigger when new audio arrives<\/li>\n<li><strong>API Gateway<\/strong>: expose a controlled endpoint to clients<\/li>\n<li><strong>Vault<\/strong>: store secrets (if you must store API keys; prefer principals over long-lived keys)<\/li>\n<li><strong>Logging\/Audit\/Monitoring<\/strong>: operations and governance<\/li>\n<li><strong>Data integration targets<\/strong>: Autonomous Database, Data Lake patterns, OCI Search\/OpenSearch (if used), or downstream analytics tools<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Dependency services<\/h3>\n\n\n\n<p>Speech itself is managed, but production designs almost always depend on:\n&#8211; IAM (policies, groups\/dynamic groups)\n&#8211; Object Storage (durable artifacts)\n&#8211; Optional serverless\/compute for orchestration<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/authentication model<\/h3>\n\n\n\n<p>OCI Speech requests are authenticated using standard OCI mechanisms:\n&#8211; <strong>User principal<\/strong> (API keys) for development\n&#8211; <strong>Instance principal<\/strong> (recommended for compute in OCI)\n&#8211; <strong>Resource 
principal<\/strong> (recommended for Functions)\nAuthorization is enforced by <strong>IAM policies<\/strong> in compartments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Networking model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Speech is accessed through <strong>regional public HTTPS endpoints<\/strong>.<\/li>\n<li>From inside a VCN, you may need:<\/li>\n<li><strong>NAT Gateway<\/strong> (for private subnets) or<\/li>\n<li><strong>Internet Gateway<\/strong> (for public subnets)<\/li>\n<li>Some OCI services can be accessed via <strong>Service Gateway<\/strong> (private access to Oracle Services Network). Whether Speech is supported via Service Gateway can vary\u2014<strong>verify in official docs and your region\u2019s Service Gateway service list<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monitoring, logging, governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Audit<\/strong>: enable tenancy-wide tracking of API calls.<\/li>\n<li><strong>Logging<\/strong>: log your orchestrator (Functions\/Compute) application logs; store Speech job IDs for traceability.<\/li>\n<li><strong>Metrics<\/strong>: monitor job throughput, error rates, backlog, and downstream storage operations. 
Native Speech metrics availability may vary\u2014verify.<\/li>\n<li><strong>Tagging<\/strong>: tag buckets, functions, and any Speech resources (jobs) if supported.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Simple architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[User \/ App] --&gt;|Upload audio| OS[(OCI Object Storage)]\n  U --&gt;|Create transcription job| SP[Speech]\n  SP --&gt;|Read audio| OS\n  SP --&gt;|Write transcript| OS\n  U --&gt;|Download transcript| OS\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Production-style architecture diagram (Mermaid)<\/h3>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph VCN[Customer VCN]\n    subgraph Subnet[Private Subnet]\n      FN[OCI Functions \/ Worker]\n      DB[(Autonomous DB \/ Data Store)]\n    end\n    GW[NAT or Internet Gateway]\n  end\n\n  OS[(\"Object Storage: audio + transcripts\")]\n  EV[Events]\n  APIG[API Gateway]\n  SP[\"Speech (Regional AI Service)\"]\n  AUD[OCI Audit]\n  LOG[OCI Logging]\n  VAULT[OCI Vault]\n\n  Client[Client Apps] --&gt;|\"Upload audio (pre-auth URL or API)\"| OS\n  OS --&gt; EV --&gt; FN\n  FN --&gt;|Create job \/ poll status| SP\n  SP --&gt;|Read audio| OS\n  SP --&gt;|Write transcript| OS\n  FN --&gt;|Persist metadata, pointers| DB\n  APIG --&gt; FN\n  FN --&gt; LOG\n  SP --&gt; AUD\n  FN --&gt; VAULT\n  FN --&gt; GW\n<\/code><\/pre>\n\n\n\n<p>Notes:\n&#8211; Use <strong>pre-authenticated requests (PARs)<\/strong> for upload flows if appropriate, but govern carefully.\n&#8211; Store only pointers\/metadata in databases; keep large artifacts in Object Storage with lifecycle policies.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account\/tenancy requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An active <strong>Oracle Cloud (OCI) tenancy<\/strong><\/li>\n<li>Ability to create resources in a compartment (or access to an existing compartment)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM roles<\/h3>\n\n\n\n<p>You need permissions to:\n&#8211; Use Speech\n&#8211; Use Object Storage (create bucket, upload objects, read objects, write objects)<\/p>\n\n\n\n<p>OCI uses policy statements. Exact verbs\/resource types for Speech can differ by service API group and naming. Start from official docs and adapt.<\/p>\n\n\n\n<p>A typical policy approach (example pattern; <strong>verify exact policy syntax in official Speech docs<\/strong>):\n&#8211; Allow a group to manage Speech resources in a compartment\n&#8211; Allow the same group to manage Object Storage buckets\/objects used for input\/output<\/p>\n\n\n\n<p>Also plan for:\n&#8211; <strong>Dynamic Group + policy<\/strong> if using Functions or Compute instance principals.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Speech is a paid OCI service under usage-based pricing (unless your tenancy has free allowances).<\/li>\n<li>Ensure your tenancy has a valid payment method or Oracle Universal Credits.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">CLI\/SDK\/tools needed (for the lab below)<\/h3>\n\n\n\n<p>For a console-first lab, tooling is minimal:\n&#8211; Web browser access to OCI Console\n&#8211; Optional but useful:\n  &#8211; <strong>OCI CLI<\/strong> (for automation) \u2014 verify Speech command groups in your installed CLI version\n  &#8211; <strong>Python<\/strong> + OCI SDK (for integration)\n  &#8211; <code>ffmpeg<\/code> or any audio tool (only if you need to convert audio formats)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Speech may not be available in all OCI regions.<\/li>\n<li>Choose a region where Speech is listed in the Console under <strong>Analytics &amp; AI \u2192 AI Services \u2192 Speech<\/strong> (or equivalent).<\/li>\n<li>Verify supported languages\/locales in that region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits<\/h3>\n\n\n\n<p>Expect limits such as:\n&#8211; Maximum audio duration per request\/job\n&#8211; Maximum file size\n&#8211; Concurrency (jobs in progress)\n&#8211; Rate limits (API calls)<\/p>\n\n\n\n<p>Exact limits are service- and region-specific. <strong>Verify in official docs<\/strong> and request quota increases via OCI support if needed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Object Storage<\/strong> bucket(s) for input and output (recommended)<\/li>\n<li><strong>IAM policies<\/strong> for users or dynamic groups<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<blockquote>\n<p>Do not rely on memorized numbers for pricing. Speech pricing can change, and rates can vary by region or contract. 
Use official pricing pages and your tenancy\u2019s cost tools for accurate estimates.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Current pricing model (typical dimensions)<\/h3>\n\n\n\n<p>Speech services are usually priced by one or more of the following:\n&#8211; <strong>Speech-to-text<\/strong>: priced by <strong>audio duration processed<\/strong> (for example, per minute\/hour of audio)\n&#8211; <strong>Text-to-speech<\/strong>: priced by <strong>characters synthesized<\/strong> (for example, per million characters)\n&#8211; Potential additional dimensions (verify):\n  &#8211; Model tier\/quality\n  &#8211; Real-time vs batch processing\n  &#8211; Customization features (if offered)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Free Tier (if applicable)<\/h3>\n\n\n\n<p>OCI often offers limited <strong>Free Tier<\/strong> usage for some services, but it varies by service and time.\n&#8211; Check the official Oracle Cloud Free Tier and Speech pricing pages.\n&#8211; If a free allowance exists, validate:\n  &#8211; Whether it applies to your tenancy type\n  &#8211; Which regions it applies to\n  &#8211; Monthly caps and overage behavior<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers<\/h3>\n\n\n\n<p>Direct cost drivers:\n&#8211; Total hours\/minutes of audio transcribed\n&#8211; Total characters synthesized\n&#8211; Number of jobs\/requests (if pricing includes request charges)<\/p>\n\n\n\n<p>Indirect\/hidden cost drivers:\n&#8211; <strong>Object Storage<\/strong>:\n  &#8211; Storage of raw audio and transcripts\n  &#8211; Requests (PUT\/GET\/LIST) at scale\n&#8211; <strong>Network egress<\/strong>:\n  &#8211; Downloading large transcripts\/audio from OCI to the internet or another cloud\n&#8211; <strong>Orchestration compute<\/strong>:\n  &#8211; Functions invocations or Compute instances used for automation\n&#8211; <strong>Observability<\/strong>:\n  &#8211; Logging ingestion and retention costs (depending on OCI Logging 
configuration)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ingress into OCI is generally not billed, but <strong>egress<\/strong> (data leaving OCI) can be.<\/li>\n<li>If your workflow downloads audio\/transcripts to on-prem or another cloud, model egress costs.<\/li>\n<li>Prefer processing within OCI and only exporting derived results when necessary.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Transcribe only what you need<\/strong>: Skip silence-heavy segments (pre-processing) if your workflow allows it.<\/li>\n<li><strong>Choose appropriate audio quality<\/strong>: Very high bitrates may increase storage and transfer without improving accuracy.<\/li>\n<li><strong>Lifecycle policies<\/strong>: Expire raw audio after retention requirements are met, and transition objects to lower-cost tiers if appropriate (verify Object Storage tiers in your region).<\/li>\n<li><strong>Batch and automate<\/strong>: Reduce human-in-the-loop steps and reprocessing.<\/li>\n<li><strong>Tag resources<\/strong>: Use cost tracking by project\/environment; set budgets and alerts in OCI.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (no fabricated prices)<\/h3>\n\n\n\n<p>A starter pilot typically includes:\n&#8211; A few hours of audio transcription (development\/testing)\n&#8211; Object Storage for a few GB of audio and transcripts\n&#8211; Minimal orchestration (manual console job creation or a small Function)<\/p>\n\n\n\n<p>To estimate:\n1. Measure total audio minutes for your pilot.\n2. Multiply by the regional Speech-to-text unit rate (from official pricing).\n3. Add Object Storage monthly storage + request costs.\n4. 
Add any logging and egress costs.<\/p>\n\n\n\n<p>Use:\n&#8211; Official pricing page: https:\/\/www.oracle.com\/cloud\/price-list\/ (navigate to AI Services \/ Speech)\n&#8211; OCI cost estimator: https:\/\/www.oracle.com\/cloud\/costestimator.html\n&#8211; OCI Cost Analysis (in Console): <strong>Billing &amp; Cost Management \u2192 Cost Analysis<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>In production, the largest cost multipliers are:\n&#8211; Audio volume growth (daily call recordings, meeting archives)\n&#8211; Reprocessing (bug fixes that trigger retranscription)\n&#8211; Egress to external systems\n&#8211; Long retention of raw audio<\/p>\n\n\n\n<p>Recommended production practice:\n&#8211; Track \u201ccost per hour of audio processed\u201d\n&#8211; Monitor daily ingestion volume\n&#8211; Enforce retention and reprocessing controls<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<p>This lab uses a <strong>console-first<\/strong> workflow to avoid SDK\/CLI mismatches and make it executable for beginners. You will upload an audio file to Object Storage and create a Speech transcription job (speech-to-text). The job will write transcript output back to Object Storage.<\/p>\n\n\n\n<blockquote>\n<p>If your region\u2019s Speech console UI differs or the job workflow is not available, use the official Speech API\/SDK docs for the equivalent steps.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Transcribe an audio file stored in OCI Object Storage using <strong>Speech<\/strong> (Oracle Cloud, Analytics and AI) and retrieve the resulting transcript from Object Storage.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Create (or choose) a compartment for the lab.\n2. Create an Object Storage bucket for input and output.\n3. 
Upload a short audio file.\n4. Create a Speech transcription job referencing that audio.\n5. Monitor job status until completion.\n6. Download and review the transcript output.\n7. Clean up resources to avoid ongoing costs.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Choose a compartment and confirm region<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Sign in to the OCI Console.<\/li>\n<li>In the top region selector, choose a region where <strong>Speech<\/strong> is available (look for it under <strong>Analytics &amp; AI<\/strong>).<\/li>\n<li>Open the navigation menu and go to <strong>Identity &amp; Security \u2192 Compartments<\/strong>.<\/li>\n<li>Either:\n   &#8211; Select an existing compartment for labs, or\n   &#8211; Create a new compartment (for example: <code>lab-speech-dev<\/code>).<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You have a compartment selected for all resources in this lab.\n&#8211; You have confirmed a region where Speech appears in the Console.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; In the Console, ensure the compartment is visible in the compartment selector.\n&#8211; Under <strong>Analytics &amp; AI<\/strong>, verify you can find <strong>AI Services \u2192 Speech<\/strong> (naming can vary slightly).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Ensure you have required IAM permissions<\/h3>\n\n\n\n<p>If you are a tenancy admin, you likely already have access. If not, coordinate with your admin.<\/p>\n\n\n\n<p>You typically need permissions to:\n&#8211; Manage Speech resources (jobs\/requests)\n&#8211; Manage Object Storage buckets and objects in the lab compartment<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You can create buckets and access the Speech service pages.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Try to create a bucket (next step). 
If you get \u201cnot authorized,\u201d you need an IAM policy adjustment.<\/p>\n\n\n\n<p><strong>Common fix<\/strong>\n&#8211; Ask your OCI admin to grant least-privilege access for:\n  &#8211; Speech in the target compartment\n  &#8211; Object Storage in the target compartment<\/p>\n\n\n\n<p>Because policy syntax for Speech may be service-specific, use the official Speech IAM documentation to construct the exact statements.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Create an Object Storage bucket<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Storage \u2192 Object Storage &amp; Archive Storage \u2192 Buckets<\/strong>.<\/li>\n<li>Select your lab compartment.<\/li>\n<li>Click <strong>Create Bucket<\/strong>.<\/li>\n<li>Provide:\n   &#8211; Bucket name: <code>speech-lab-bucket<\/code> (must be unique within the namespace)\n   &#8211; Default storage tier: choose <strong>Standard<\/strong> for simplicity (optimize later)<\/li>\n<li>Create the bucket.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A bucket exists for audio input and transcript output.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Open the bucket and confirm it appears with status \u201cAvailable.\u201d<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Upload a sample audio file<\/h3>\n\n\n\n<p>You need an audio file that Speech supports (format\/codec constraints apply\u2014verify the supported formats in Speech docs). 
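<\/p>\n\n\n\n<p>If you want to sanity-check a clip locally before uploading, the following minimal sketch uses only the Python standard library <code>wave<\/code> module to read the container-level properties of a PCM WAV file. The file name <code>sample.wav<\/code> simply matches this lab, and the values noted in the comments are common speech-engine conventions, not confirmed OCI Speech requirements; compare the output against the official format list.<\/p>

```python
import wave

def describe_wav(path):
    # Report container-level properties of a PCM WAV file before upload.
    with wave.open(path, 'rb') as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            'channels': w.getnchannels(),        # mono (1) is common for speech input
            'sample_rate_hz': rate,              # e.g. 8000 or 16000 for call audio
            'sample_width_bytes': w.getsampwidth(),
            'duration_seconds': frames / float(rate),
        }
```

<p>Check the reported sample rate, channel count, and duration against the documented limits before creating a job; non-WAV containers need a conversion step first.<\/p>\n\n\n\n<p>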
Use a short clip (10\u201360 seconds) for fast turnaround.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open your bucket: <code>speech-lab-bucket<\/code>.<\/li>\n<li>Click <strong>Upload<\/strong>.<\/li>\n<li>Upload your file, for example:\n   &#8211; <code>sample.wav<\/code> (or another supported format)<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; The audio object is stored in Object Storage.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; You can see the object listed in the bucket.\n&#8211; Note the object name exactly (case-sensitive).<\/p>\n\n\n\n<p><strong>Tip (optional)<\/strong>\n&#8211; Keep your audio clear: minimal background noise, single speaker, normal speaking pace.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Create a Speech transcription job (speech-to-text)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Go to <strong>Analytics &amp; AI \u2192 AI Services \u2192 Speech<\/strong>.<\/li>\n<li>Find <strong>Transcription Jobs<\/strong> (or equivalent page).<\/li>\n<li>Click <strong>Create transcription job<\/strong>.<\/li>\n<\/ol>\n\n\n\n<p>Fill in the job details (names may vary by console version):\n&#8211; <strong>Compartment<\/strong>: your lab compartment\n&#8211; <strong>Input location<\/strong>: Object Storage\n  &#8211; Namespace: your tenancy namespace (shown in Object Storage)\n  &#8211; Bucket: <code>speech-lab-bucket<\/code>\n  &#8211; Object: <code>sample.wav<\/code> (your uploaded file)\n&#8211; <strong>Output location<\/strong>: Object Storage\n  &#8211; Bucket: <code>speech-lab-bucket<\/code>\n  &#8211; Output prefix\/folder: <code>transcripts\/<\/code> (recommended)\n&#8211; <strong>Language\/locale<\/strong>: choose the correct language for your audio (accuracy depends on this)\n&#8211; Optional settings (only if present in your UI and you understand them):\n  &#8211; Punctuation\/capitalization\n  &#8211; Time offsets \/ word timestamps\n  &#8211; 
Profanity filtering<\/p>\n\n\n\n<p>Submit\/create the job.<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A transcription job resource is created and enters a queued\/running state.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; You can see the job in the job list with a status such as <strong>ACCEPTED<\/strong>, <strong>IN_PROGRESS<\/strong>, or <strong>RUNNING<\/strong> (exact terms vary).\n&#8211; You can open the job details and see the referenced Object Storage input.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Monitor job status and retrieve output<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the Speech job list, click the job to open details.<\/li>\n<li>Monitor status until it changes to <strong>SUCCEEDED<\/strong> (or equivalent).<\/li>\n<li>Once complete, go back to <strong>Object Storage \u2192 Buckets \u2192 speech-lab-bucket<\/strong>.<\/li>\n<li>Navigate to the output prefix (for example <code>transcripts\/<\/code>).<\/li>\n<li>Download the output file(s).<\/li>\n<\/ol>\n\n\n\n<p>Depending on the service output format, you might receive:\n&#8211; Plain text transcript\n&#8211; JSON containing transcript text and metadata (timestamps, confidence, segments)<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You can download and read the transcript output.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; The transcript content matches the spoken content in your audio clip.\n&#8211; The job details show a completed status without errors.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7 (Optional): Basic operationalization (repeatable structure)<\/h3>\n\n\n\n<p>If you plan to process many files, establish conventions now:\n&#8211; Input prefix: <code>audio\/incoming\/<\/code>\n&#8211; Output prefix: <code>audio\/transcripts\/<\/code>\n&#8211; Add metadata tags to objects (caller ID, date, language) if your governance allows 
it\n&#8211; Use Object Storage lifecycle rules for retention<\/p>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Your bucket layout supports scaling from a manual lab to a pipeline.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Use this checklist:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>[ ] You can see the uploaded audio object in Object Storage.<\/li>\n<li>[ ] A Speech transcription job was created successfully.<\/li>\n<li>[ ] Job status reached <strong>SUCCEEDED<\/strong>.<\/li>\n<li>[ ] Transcript output exists in the output prefix.<\/li>\n<li>[ ] Transcript content is readable and correct for the audio.<\/li>\n<\/ul>\n\n\n\n<p>If any item fails, proceed to Troubleshooting.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Error: \u201cNot authorized\u201d when creating job or accessing objects<\/h4>\n\n\n\n<p><strong>Cause<\/strong>\n&#8211; Missing IAM permissions for Speech and\/or Object Storage in the compartment.<\/p>\n\n\n\n<p><strong>Fix<\/strong>\n&#8211; Confirm:\n  &#8211; You are operating in the correct compartment and region.\n  &#8211; Your group has permissions for Speech operations.\n  &#8211; Your group can read the input object and write to the output location.\n&#8211; Use OCI policy builder guidance and Speech IAM docs to craft least-privilege policies.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Error: Job fails quickly with invalid input \/ unsupported format<\/h4>\n\n\n\n<p><strong>Cause<\/strong>\n&#8211; Audio file codec\/container not supported, or file is corrupted.<\/p>\n\n\n\n<p><strong>Fix<\/strong>\n&#8211; Convert audio to a supported format (verify required format in docs).\n&#8211; Use a short WAV\/FLAC (commonly supported in many speech engines, but <strong>verify for OCI Speech<\/strong>).\n&#8211; Re-upload and retry.<\/p>\n\n\n\n<h4 
class=\"wp-block-heading\">Error: Output not found in Object Storage<\/h4>\n\n\n\n<p><strong>Cause<\/strong>\n&#8211; Output prefix\/bucket mismatch, or job wrote output under a different name.<\/p>\n\n\n\n<p><strong>Fix<\/strong>\n&#8211; Re-check job details for the output location and exact object names.\n&#8211; Search the bucket by prefix or last modified time.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Error: Job stuck in running\/queued for a long time<\/h4>\n\n\n\n<p><strong>Cause<\/strong>\n&#8211; Service-side backlog, large input file, or quota limits.<\/p>\n\n\n\n<p><strong>Fix<\/strong>\n&#8211; Try a smaller file to validate the workflow.\n&#8211; Check quotas\/limits for concurrency.\n&#8211; If persistent, open an OCI support request.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Low transcription accuracy<\/h4>\n\n\n\n<p><strong>Cause<\/strong>\n&#8211; Wrong language\/locale selected, noisy audio, multiple speakers, crosstalk.<\/p>\n\n\n\n<p><strong>Fix<\/strong>\n&#8211; Choose correct locale.\n&#8211; Use higher quality microphone input.\n&#8211; Split channels\/speakers if your upstream pipeline supports it.\n&#8211; Consider post-processing with domain dictionaries (if supported\u2014verify).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing costs, clean up:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Delete transcription job(s)<\/strong> (if the console retains job resources):\n   &#8211; Speech \u2192 Transcription Jobs \u2192 select job \u2192 delete (if supported)<\/li>\n<li><strong>Delete transcript output objects<\/strong>:\n   &#8211; Object Storage bucket \u2192 delete objects under <code>transcripts\/<\/code><\/li>\n<li><strong>Delete the uploaded audio object<\/strong>:\n   &#8211; Delete <code>sample.wav<\/code><\/li>\n<li><strong>Delete the bucket<\/strong>:\n   &#8211; Delete <code>speech-lab-bucket<\/code> (must be empty)<\/li>\n<li>Optionally 
delete the <strong>compartment<\/strong> if you created a dedicated one (only if you\u2019re sure nothing else is inside).<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Prefer asynchronous batch for recorded audio<\/strong>: It\u2019s resilient and cost-predictable.<\/li>\n<li><strong>Use Object Storage as the system of record<\/strong>: Store raw audio and derived transcripts with clear prefixes and lifecycle rules.<\/li>\n<li><strong>Design idempotent pipelines<\/strong>: If an event triggers twice, your function should not create duplicate jobs or overwrite outputs unexpectedly.<\/li>\n<li><strong>Store metadata separately<\/strong>: Keep job IDs, status, and pointers to objects in a database table for traceability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Least privilege<\/strong>: Separate \u201csubmit jobs\u201d from \u201cread results\u201d if teams differ.<\/li>\n<li><strong>Use dynamic groups for automation<\/strong>: Prefer instance principals\/resource principals over embedding API keys.<\/li>\n<li><strong>Separate environments<\/strong>: Keep dev\/test\/prod in separate compartments (or tenancies), with separate buckets and policies.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Control reprocessing<\/strong>: Use object metadata or a manifest DB to avoid retranscribing the same file.<\/li>\n<li><strong>Retention rules<\/strong>: Keep transcripts longer than raw audio if allowed; raw audio is heavier and may be more sensitive.<\/li>\n<li><strong>Budget alerts<\/strong>: Set OCI budgets\/alerts tied to tags or compartments.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pre-process audio<\/strong>: Normalize volume, remove long silences, and ensure correct sample rate\/encoding (within supported formats).<\/li>\n<li><strong>Parallelism with care<\/strong>: Throttle job submission to avoid API rate limits and quota breaches.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Retry with backoff<\/strong>: If you automate via API, handle transient errors.<\/li>\n<li><strong>Dead-letter patterns<\/strong>: Store failed job inputs in a <code>failed\/<\/code> prefix for manual review.<\/li>\n<li><strong>Track job lifecycle<\/strong>: Store status and timestamps; alert on long-running jobs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Central logging<\/strong>: Log job creation, job IDs, object names, and correlation IDs.<\/li>\n<li><strong>Audit reviews<\/strong>: Periodically review access to sensitive buckets and speech operations.<\/li>\n<li><strong>Runbooks<\/strong>: Document how to re-run jobs, rotate credentials (if used), and validate outputs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Naming conventions<\/strong>: Buckets <code>speech-&lt;env&gt;-&lt;team&gt;<\/code>; prefixes <code>audio\/incoming\/<\/code>, <code>audio\/processed\/<\/code>, <code>audio\/transcripts\/<\/code>.<\/li>\n<li><strong>Tags<\/strong>: <code>CostCenter<\/code>, <code>App<\/code>, <code>Environment<\/code>, <code>DataClassification<\/code><\/li>\n<li><strong>Data classification<\/strong>: Treat transcripts as 
sensitive; they often contain more searchable sensitive content than audio.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Authentication<\/strong>: OCI request signing via user\/API key, instance principal, or resource principal.<\/li>\n<li><strong>Authorization<\/strong>: IAM policies scoped to compartments and resource types.<\/li>\n<\/ul>\n\n\n\n<p>Recommendations:\n&#8211; For production automation, use <strong>resource principals (Functions)<\/strong> or <strong>instance principals (Compute)<\/strong>.\n&#8211; Avoid long-lived user API keys in apps.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>In transit<\/strong>: HTTPS\/TLS to Speech endpoints.<\/li>\n<li><strong>At rest<\/strong>: Object Storage encrypts objects at rest by default; for tighter control, use <strong>customer-managed keys<\/strong> with OCI Vault for Object Storage encryption (verify your bucket encryption configuration options).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Speech endpoints are typically public regional endpoints.<\/li>\n<li>If your workloads run in private subnets, plan for controlled egress via NAT Gateway and egress rules.<\/li>\n<li>If you require private connectivity, check whether Speech supports access via Service Gateway\/Oracle Services Network in your region\u2014<strong>verify<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer principals (resource\/instance) rather than storing API keys.<\/li>\n<li>If secrets are unavoidable, store them in <strong>OCI Vault<\/strong>, rotate them regularly, and restrict access with IAM.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable and use <strong>OCI Audit<\/strong> for governance.<\/li>\n<li>Log job submission events, object names and output locations, and error payloads (careful: do not log sensitive transcript content).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<p>Speech workloads often process regulated data (PII\/PHI\/PCI) in voice form.\n&#8211; Confirm:\n  &#8211; Data residency requirements (region)\n  &#8211; Retention and deletion policies\n  &#8211; Access logging and least-privilege enforcement\n  &#8211; Any organizational requirements for AI processing<\/p>\n\n\n\n<p>If you have strict regulatory constraints, involve your security\/compliance team early.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Buckets with overly broad access (public buckets or wide group policies)<\/li>\n<li>Storing raw audio longer than required<\/li>\n<li>Logging transcript content in application logs<\/li>\n<li>Using a shared \u201cadmin\u201d API key in production code<\/li>\n<li>No separation between dev\/test\/prod data<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compartment separation + least-privilege policies<\/li>\n<li>Dedicated buckets per environment<\/li>\n<li>Lifecycle rules for deletion\/archival<\/li>\n<li>Vault for secrets + principal-based auth<\/li>\n<li>Continuous review of Audit logs and IAM policies<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. 
Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>Treat this section as a checklist to validate against the official Speech documentation for your region.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Known limitations to verify<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Supported languages\/locales<\/strong>: Not all languages may be supported; availability can vary by region.<\/li>\n<li><strong>Supported audio formats\/codecs<\/strong>: Container (WAV\/MP3\/etc.) and codec constraints apply.<\/li>\n<li><strong>Maximum audio duration\/file size<\/strong>: Large files may fail or require splitting.<\/li>\n<li><strong>Concurrency limits<\/strong>: Number of jobs in progress.<\/li>\n<li><strong>Rate limits<\/strong>: API calls per second\/minute.<\/li>\n<li><strong>Output formats<\/strong>: Plain text vs JSON; timestamp granularity (segment-level vs word-level).<\/li>\n<li><strong>Streaming support<\/strong>: If you need real-time transcription, confirm the service supports it in your region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Speech may not be enabled in every OCI region.<\/li>\n<li>DR planning must consider region pairs and service availability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Retranscribing the same audio repeatedly due to pipeline bugs.<\/li>\n<li>Large egress when exporting transcripts\/audio out of OCI.<\/li>\n<li>Storing raw audio indefinitely.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Audio recorded from phones can have variable sample rates and noisy backgrounds.<\/li>\n<li>Stereo vs mono channel handling can impact results; you may need pre-processing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>IAM 
misconfiguration is the #1 cause of failures.<\/li>\n<li>Object naming\/prefix mistakes lead to \u201coutput not found\u201d confusion.<\/li>\n<li>Without a metadata store, it\u2019s hard to track which files were processed and when.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<p>If migrating from another speech provider, plan for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Transcript formatting differences (timestamps, punctuation)<\/li>\n<li>Confidence scoring differences<\/li>\n<li>Locale\/voice availability differences<\/li>\n<li>Re-validation effort for compliance workflows<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Vendor-specific nuances (OCI)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OCI governance is compartment-centric; design your folder\/prefix strategy accordingly.<\/li>\n<li>Service gateway\/private access support for Speech must be verified per region.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>Speech is one component in the Oracle Cloud Analytics and AI stack. 
Alternatives include other OCI services for adjacent tasks, other cloud providers\u2019 speech services, and self-managed open-source stacks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Oracle Cloud Speech<\/strong><\/td>\n<td>OCI-native speech-to-text \/ text-to-speech workloads<\/td>\n<td>Native IAM\/compartments, integrates well with Object Storage and OCI ops tooling<\/td>\n<td>Feature\/language availability varies by region; may not match specialized needs<\/td>\n<td>You want managed speech in OCI with strong governance<\/td>\n<\/tr>\n<tr>\n<td><strong>Oracle Digital Assistant (ODA)<\/strong><\/td>\n<td>Conversational experiences and chat\/voice bots<\/td>\n<td>Higher-level bot platform; integrates with channels and enterprise apps<\/td>\n<td>Not a drop-in replacement for raw transcription pipelines<\/td>\n<td>You need end-to-end conversational flows, not just transcription<\/td>\n<\/tr>\n<tr>\n<td><strong>OCI Language \/ Vision (adjacent services)<\/strong><\/td>\n<td>Post-processing transcripts (NLP) or analyzing related media<\/td>\n<td>Complements Speech for entity extraction, classification, etc.<\/td>\n<td>Not speech services; require pipeline integration<\/td>\n<td>You want to enrich transcripts after Speech<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Transcribe \/ Polly<\/strong><\/td>\n<td>Broad feature set and deep AWS integration<\/td>\n<td>Mature ecosystem; many integrations<\/td>\n<td>Different IAM and data gravity; cross-cloud egress<\/td>\n<td>Your stack is primarily on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Speech<\/strong><\/td>\n<td>Microsoft ecosystem integration<\/td>\n<td>Strong enterprise integrations<\/td>\n<td>Cross-cloud egress; different ops model<\/td>\n<td>Your stack is primarily on 
Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>Google Cloud Speech-to-Text \/ TTS<\/strong><\/td>\n<td>Google AI ecosystem<\/td>\n<td>Strong model quality for many languages<\/td>\n<td>Cross-cloud egress; different governance model<\/td>\n<td>Your stack is primarily on GCP<\/td>\n<\/tr>\n<tr>\n<td><strong>Open-source (Whisper, Vosk, etc.) self-managed<\/strong><\/td>\n<td>Full control and offline\/on-prem processing<\/td>\n<td>Customizable, on-prem capable<\/td>\n<td>You run\/scale\/secure GPUs; operational burden<\/td>\n<td>You need on-prem\/offline, or custom control outweighs ops cost<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Regulated contact center transcription pipeline<\/h3>\n\n\n\n<p><strong>Problem<\/strong>\nA financial services company records customer support calls. Compliance teams need:\n&#8211; Searchable transcripts\n&#8211; Retention controls\n&#8211; Audit trails and least-privilege access\n&#8211; Predictable processing at scale<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong>\n&#8211; Store call recordings in <strong>Object Storage<\/strong> with a strict prefix scheme by date and line of business.\n&#8211; Use <strong>Events<\/strong> to trigger an <strong>OCI Function<\/strong> when a new audio file is uploaded.\n&#8211; The Function submits a <strong>Speech<\/strong> transcription job and writes job metadata to a database.\n&#8211; Speech outputs transcripts to an output prefix with restricted access.\n&#8211; Downstream analytics uses transcripts (not raw audio) for dashboards and compliance review.<\/p>\n\n\n\n<p><strong>Why Speech was chosen<\/strong>\n&#8211; OCI-native governance with compartments and IAM\n&#8211; Integrates with Object Storage and event-driven serverless patterns\n&#8211; Reduces ML infrastructure operations and accelerates 
time-to-value<\/p>\n\n\n\n<p><strong>Expected outcomes<\/strong>\n&#8211; Reduced manual review time\n&#8211; Faster compliance investigations via search\n&#8211; Centralized audit trail through OCI Audit + application logs\n&#8211; Controlled retention and cost visibility with tags and lifecycle policies<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: Meeting transcription feature for a SaaS app<\/h3>\n\n\n\n<p><strong>Problem<\/strong>\nA small SaaS team wants to add \u201cupload meeting audio \u2192 get transcript\u201d functionality with minimal ops overhead.<\/p>\n\n\n\n<p><strong>Proposed architecture<\/strong>\n&#8211; Web app uploads audio to Object Storage (optionally via pre-authenticated request).\n&#8211; Backend submits Speech transcription job and returns a job ID to the user.\n&#8211; User checks status; when complete, transcript is shown in-app.\n&#8211; Data is retained for 30 days via lifecycle policies.<\/p>\n\n\n\n<p><strong>Why Speech was chosen<\/strong>\n&#8211; Managed service reduces engineering time\n&#8211; Good fit for asynchronous processing\n&#8211; Works well with a small team\u2019s operational capacity<\/p>\n\n\n\n<p><strong>Expected outcomes<\/strong>\n&#8211; Faster feature delivery\n&#8211; Low operational overhead\n&#8211; Straightforward cost scaling with usage (audio minutes)<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is Speech the same as Oracle Digital Assistant?<\/strong><br\/>\nNo. Speech provides speech recognition and\/or speech synthesis capabilities at the API\/service level. Oracle Digital Assistant is a higher-level conversational platform. 
They can be used together in some architectures, but they solve different problems.<\/p>\n\n\n\n<p>2) <strong>Does Speech support both speech-to-text and text-to-speech?<\/strong><br\/>\nIn many OCI deployments, Speech refers to a service that includes speech-to-text and text-to-speech capabilities, but availability can vary. Verify supported operations in the official Speech documentation for your region.<\/p>\n\n\n\n<p>3) <strong>Is Speech regional or global?<\/strong><br\/>\nSpeech is typically a regional OCI service. You select the region where you submit jobs and store data.<\/p>\n\n\n\n<p>4) <strong>Do I need to store audio in Object Storage?<\/strong><br\/>\nFor many batch workflows, yes\u2014Object Storage is the simplest and most common integration. Some APIs may accept direct audio payloads or streaming audio; verify in the official docs.<\/p>\n\n\n\n<p>5) <strong>How do I secure access to transcripts?<\/strong><br\/>\nUse compartment-based IAM policies and separate buckets\/prefixes. Treat transcripts as sensitive data and restrict read access. Avoid logging transcript content.<\/p>\n\n\n\n<p>6) <strong>What audio formats are supported?<\/strong><br\/>\nSupported formats\/codecs are service-specific. Check the official Speech documentation for supported containers, codecs, and sample rates.<\/p>\n\n\n\n<p>7) <strong>How accurate is transcription?<\/strong><br\/>\nAccuracy depends on language\/locale selection, audio quality, background noise, speaker overlap, and domain-specific vocabulary. Pilot with real samples and measure word error rate (WER) for your domain.<\/p>\n\n\n\n<p>8) <strong>Can I process very large audio files?<\/strong><br\/>\nThere are usually limits for duration and file size. 
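As an illustration of planning such a split before submitting jobs, here is a stdlib-only Python sketch; the one-hour per-job limit and five-second overlap are assumed values for the example, not documented Speech limits:

```python
# Hypothetical planning helper: given a recording's length and an assumed
# per-job duration limit (verify the real limit in the Speech docs), compute
# segment boundaries with a small overlap so words at cut points are not lost.

def plan_segments(total_seconds, max_segment_seconds=3600, overlap_seconds=5):
    """Return (start, end) second offsets covering the whole recording."""
    if total_seconds <= max_segment_seconds:
        return [(0, total_seconds)]
    segments = []
    start = 0
    while start < total_seconds:
        end = min(start + max_segment_seconds, total_seconds)
        segments.append((start, end))
        if end == total_seconds:
            break
        start = end - overlap_seconds  # back up slightly to avoid clipping words
    return segments

# A 2.5-hour recording split under the assumed 1-hour limit:
print(plan_segments(9000))  # -> [(0, 3600), (3595, 7195), (7190, 9000)]
```

Each planned segment can then be cut with an audio tool, submitted as its own transcription job, and the resulting transcripts merged in offset order.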
For long recordings, split audio into segments and process in parallel, then merge transcripts (with timestamps if available).<\/p>\n\n\n\n<p>9) <strong>How do I handle multiple speakers?<\/strong><br\/>\nSome services provide speaker diarization (speaker labeling). If this is critical, verify diarization support in OCI Speech before committing to an architecture.<\/p>\n\n\n\n<p>10) <strong>How do I estimate costs?<\/strong><br\/>\nMeasure total audio minutes to be transcribed and apply the regional unit rate from Oracle\u2019s pricing page. Add Object Storage, orchestration, logging, and egress costs.<\/p>\n\n\n\n<p>11) <strong>Does Speech integrate with Functions and Events?<\/strong><br\/>\nYes, as an architecture pattern: Events can trigger Functions on new audio uploads, and the Function can call Speech APIs to create jobs.<\/p>\n\n\n\n<p>12) <strong>How do I monitor Speech in production?<\/strong><br\/>\nTrack job throughput, failures, and latencies via your orchestration layer logs and metrics. Use OCI Audit for API-level tracking. Verify whether native service metrics are available for Speech in your region.<\/p>\n\n\n\n<p>13) <strong>Can I keep processing private within a VCN without internet?<\/strong><br\/>\nSpeech endpoints are generally public HTTPS endpoints. Some OCI services can be reached privately via Service Gateway; verify whether Speech supports it in your region. Otherwise use NAT with strict egress controls.<\/p>\n\n\n\n<p>14) <strong>What\u2019s the best way to avoid duplicate processing?<\/strong><br\/>\nMaintain a manifest table (object name \u2192 processed status \u2192 job ID \u2192 output object). Make your pipeline idempotent and handle duplicate events safely.<\/p>\n\n\n\n<p>15) <strong>Can Speech outputs be used for analytics and search?<\/strong><br\/>\nYes. Store transcripts in Object Storage and load\/index them into your preferred analytics or search system. 
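As a toy sketch of that indexing step, the following stdlib-only Python builds an inverted index over transcript text; the transcript IDs and text are invented for illustration, and a production system would hand this off to a real search engine or database:

```python
# Map each lowercased token to the set of transcript IDs that contain it,
# so keyword lookups return the matching calls.
import re
from collections import defaultdict

def build_index(transcripts):
    """transcripts: {doc_id: transcript_text} -> {token: {doc_id, ...}}"""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            index[token].add(doc_id)
    return index

# Invented sample transcripts:
transcripts = {
    "call-001": "Customer asked about wire transfer limits.",
    "call-002": "Agent explained the refund policy and transfer fees.",
}
index = build_index(transcripts)
print(sorted(index["transfer"]))  # -> ['call-001', 'call-002']
```

The same token-to-document mapping pattern extends to loading transcripts into a database or a dedicated search service.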
Apply data classification and retention controls.<\/p>\n\n\n\n<p>16) <strong>How do I delete sensitive audio and transcripts?<\/strong><br\/>\nUse Object Storage lifecycle policies for automatic deletion and apply legal hold policies where necessary. Ensure access policies prevent unauthorized reads during retention.<\/p>\n\n\n\n<p>17) <strong>Can I use customer-managed keys (CMK) for audio storage?<\/strong><br\/>\nYou can use OCI Vault keys with Object Storage for encryption. Confirm the configuration steps in Object Storage documentation and validate compliance requirements.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. Top Online Resources to Learn Speech<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>OCI Speech documentation (home) \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/ai-speech\/home.htm<\/td>\n<td>Primary source for features, workflows, limits, and regional notes<\/td>\n<\/tr>\n<tr>\n<td>Official API reference<\/td>\n<td>OCI API docs (AI Speech endpoints\/models) \u2013 start from https:\/\/docs.oracle.com\/en-us\/iaas\/api\/<\/td>\n<td>Authoritative request\/response schemas and endpoint behavior<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>Oracle Cloud Price List (AI Services) \u2013 https:\/\/www.oracle.com\/cloud\/price-list\/<\/td>\n<td>Current pricing dimensions and regional rates<\/td>\n<\/tr>\n<tr>\n<td>Cost estimation<\/td>\n<td>Oracle Cloud Cost Estimator \u2013 https:\/\/www.oracle.com\/cloud\/costestimator.html<\/td>\n<td>Build scenario-based estimates without guessing<\/td>\n<\/tr>\n<tr>\n<td>Billing governance<\/td>\n<td>OCI Billing &amp; Cost Management docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Billing\/home.htm<\/td>\n<td>Budgets, cost analysis, and governance 
practices<\/td>\n<\/tr>\n<tr>\n<td>IAM fundamentals<\/td>\n<td>OCI IAM docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Identity\/home.htm<\/td>\n<td>Policies, dynamic groups, and secure authentication patterns<\/td>\n<\/tr>\n<tr>\n<td>Object Storage<\/td>\n<td>OCI Object Storage docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Object\/home.htm<\/td>\n<td>Buckets, PARs, lifecycle policies, encryption, events<\/td>\n<\/tr>\n<tr>\n<td>Events<\/td>\n<td>OCI Events docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Events\/home.htm<\/td>\n<td>Trigger pipelines from Object Storage uploads<\/td>\n<\/tr>\n<tr>\n<td>Functions<\/td>\n<td>OCI Functions docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Functions\/home.htm<\/td>\n<td>Serverless orchestration for speech pipelines<\/td>\n<\/tr>\n<tr>\n<td>Observability<\/td>\n<td>OCI Logging &amp; Monitoring docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Logging\/home.htm and https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Monitoring\/home.htm<\/td>\n<td>Central logging, metrics, alarms, and operational runbooks<\/td>\n<\/tr>\n<tr>\n<td>Audit<\/td>\n<td>OCI Audit docs \u2013 https:\/\/docs.oracle.com\/en-us\/iaas\/Content\/Audit\/home.htm<\/td>\n<td>Governance trail of API actions<\/td>\n<\/tr>\n<tr>\n<td>Architecture center<\/td>\n<td>Oracle Architecture Center \u2013 https:\/\/www.oracle.com\/cloud\/architecture-center\/<\/td>\n<td>Reference architectures and best practices patterns<\/td>\n<\/tr>\n<tr>\n<td>Tutorials\/labs<\/td>\n<td>Oracle LiveLabs \u2013 https:\/\/apexapps.oracle.com\/pls\/apex\/r\/dbpm\/livelabs\/home<\/td>\n<td>Hands-on labs across OCI services (search for AI Services\/Speech topics)<\/td>\n<\/tr>\n<tr>\n<td>Developer portal<\/td>\n<td>Oracle Developers \u2013 https:\/\/developer.oracle.com\/<\/td>\n<td>Tutorials, SDK guidance, and integration patterns<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" 
\/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps engineers, cloud engineers, architects<\/td>\n<td>OCI fundamentals, automation, DevOps practices around OCI services<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate engineers<\/td>\n<td>DevOps and cloud basics, process and tooling<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations teams, SREs<\/td>\n<td>Cloud operations, monitoring, reliability practices<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, platform engineers<\/td>\n<td>SRE principles, incident response, reliability engineering<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops teams adopting AI-driven operations<\/td>\n<td>AIOps concepts, monitoring automation, operational analytics<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. 
Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>DevOps\/cloud training and mentoring (verify current offerings)<\/td>\n<td>Engineers seeking guided learning<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps tooling and cloud practices (verify focus)<\/td>\n<td>Beginners to intermediate DevOps engineers<\/td>\n<td>https:\/\/devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps services\/training platform (verify offerings)<\/td>\n<td>Teams needing short-term help or training<\/td>\n<td>https:\/\/devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and enablement (verify offerings)<\/td>\n<td>Ops\/DevOps teams needing practical support<\/td>\n<td>https:\/\/devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company Name<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify service catalog)<\/td>\n<td>Architecture reviews, implementation help, ops enablement<\/td>\n<td>Designing event-driven transcription pipelines; setting up IAM and governance<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>DevOps and cloud consulting\/training<\/td>\n<td>Platform engineering, CI\/CD, cloud adoption<\/td>\n<td>Automating Speech workflows with Functions; cost governance and monitoring<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting (verify service catalog)<\/td>\n<td>DevOps process, tooling, cloud operations<\/td>\n<td>Building secure OCI landing zones; setting up observability for AI pipelines<\/td>\n<td>https:\/\/devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before this service<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>OCI fundamentals<\/strong>\n   &#8211; Regions, compartments, VCN basics<\/li>\n<li><strong>IAM<\/strong>\n   &#8211; Groups, policies, dynamic groups, principals<\/li>\n<li><strong>Object Storage<\/strong>\n   &#8211; Buckets, prefixes, lifecycle policies, encryption, PARs<\/li>\n<li><strong>Basic API concepts<\/strong>\n   &#8211; REST, authentication, request\/response patterns<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after this service<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Event-driven pipelines<\/strong>\n   &#8211; OCI Events + Functions + (optional) Queues\/Streaming patterns<\/li>\n<li><strong>Observability<\/strong>\n   &#8211; Logging, Monitoring, alarms, audit investigations<\/li>\n<li><strong>Data engineering<\/strong>\n   &#8211; Loading transcripts into data stores; indexing and search<\/li>\n<li><strong>NLP enrichment<\/strong>\n   &#8211; Using OCI Language (or other tools) to extract entities, topics, and classifications from transcripts<\/li>\n<li><strong>Governance<\/strong>\n   &#8211; Tagging strategy, budgets, retention, and data classification<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use it<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud Engineer (OCI)<\/li>\n<li>Solutions Architect<\/li>\n<li>DevOps Engineer \/ SRE<\/li>\n<li>Data Engineer (audio-to-text pipelines)<\/li>\n<li>Application Developer (voice-enabled features)<\/li>\n<li>Security Engineer (governance and controls)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Oracle certification offerings change over time. 
For current OCI certifications:\n&#8211; Start at Oracle Cloud Infrastructure foundations\/associate tracks.\n&#8211; Map AI Services skills to architect\/developer paths as appropriate.<\/p>\n\n\n\n<p>Verify current certifications here:\n&#8211; https:\/\/education.oracle.com\/ (search OCI certifications)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build an event-driven transcription pipeline: Object Storage upload \u2192 Event \u2192 Function \u2192 Speech job \u2192 transcript output<\/li>\n<li>Build a transcript search demo: store transcripts \u2192 index in a search engine or database \u2192 simple UI<\/li>\n<li>Build a TTS prompt generator: text templates \u2192 generate audio \u2192 store in Object Storage with versioning<\/li>\n<li>Add governance: tags, budgets, lifecycle policies, least-privilege IAM policies<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AI Services (OCI)<\/strong>: Oracle-managed AI APIs (Speech, Language, Vision, etc.) 
under Analytics and AI.<\/li>\n<li><strong>Compartment<\/strong>: OCI\u2019s logical isolation boundary for organizing and controlling access to resources.<\/li>\n<li><strong>IAM Policy<\/strong>: A set of statements that grant permissions to groups\/dynamic groups in OCI.<\/li>\n<li><strong>Object Storage<\/strong>: OCI service for storing unstructured data (audio files, transcripts).<\/li>\n<li><strong>Namespace (Object Storage)<\/strong>: A tenancy-level unique identifier used in Object Storage.<\/li>\n<li><strong>Transcription<\/strong>: Converting spoken audio into text.<\/li>\n<li><strong>Speech-to-text (STT)<\/strong>: Automated conversion of audio speech into text.<\/li>\n<li><strong>Text-to-speech (TTS)<\/strong>: Automated generation of spoken audio from text.<\/li>\n<li><strong>Locale<\/strong>: Language + regional variant (for example, English US vs English UK).<\/li>\n<li><strong>Dynamic Group<\/strong>: OCI identity construct for grouping resources (instances\/functions) so they can be granted IAM permissions.<\/li>\n<li><strong>Resource Principal<\/strong>: Authentication method used by OCI Functions to call OCI APIs without storing keys.<\/li>\n<li><strong>Instance Principal<\/strong>: Authentication method used by OCI Compute instances to call OCI APIs without storing keys.<\/li>\n<li><strong>Lifecycle Policy (Object Storage)<\/strong>: Rules that automatically archive or delete objects after a defined time.<\/li>\n<li><strong>Egress<\/strong>: Data leaving OCI to the internet or external networks; can incur costs.<\/li>\n<li><strong>Audit Log<\/strong>: OCI-recorded log of API calls for governance and security investigations.<\/li>\n<li><strong>Idempotency<\/strong>: The property where repeating an operation produces the same result (important for event retries).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. 
Summary<\/h2>\n\n\n\n<p>Speech in <strong>Oracle Cloud<\/strong> (under <strong>Analytics and AI<\/strong>) is a managed service for <strong>speech-to-text transcription<\/strong> and (where supported) <strong>text-to-speech synthesis<\/strong>, designed to plug into OCI-native architectures with IAM, compartments, and Object Storage.<\/p>\n\n\n\n<p>It matters because it lets teams turn audio into usable text (and text into voice) without operating ML infrastructure, while still meeting enterprise needs for governance, auditability, and scalable processing. Cost is primarily driven by <strong>audio duration processed<\/strong> (for transcription) and <strong>characters synthesized<\/strong> (for TTS), plus indirect costs like Object Storage, logging retention, and egress.<\/p>\n\n\n\n<p>Use Speech when you want OCI-native speech capabilities with secure access controls and repeatable pipelines; avoid it when you require features not documented for your region, strict offline processing, or streaming behaviors that the service doesn\u2019t support in your deployment.<\/p>\n\n\n\n<p>Next step: follow the official Speech documentation for your region, then operationalize the lab into an event-driven pipeline using Object Storage + Events + Functions with least-privilege IAM and cost governance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Analytics and 
AI<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[66,62],"tags":[],"class_list":["post-841","post","type-post","status-publish","format-standard","hentry","category-analytics-and-ai","category-oracle-cloud"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/841","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=841"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/841\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=841"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=841"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=841"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}