{"id":81,"date":"2026-04-12T18:27:22","date_gmt":"2026-04-12T18:27:22","guid":{"rendered":"https:\/\/www.devopsschool.com\/tutorials\/alibaba-cloud-maxcompute-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-computing\/"},"modified":"2026-04-12T18:27:22","modified_gmt":"2026-04-12T18:27:22","slug":"alibaba-cloud-maxcompute-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-computing","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/tutorials\/alibaba-cloud-maxcompute-tutorial-architecture-pricing-use-cases-and-hands-on-guide-for-analytics-computing\/","title":{"rendered":"Alibaba Cloud MaxCompute Tutorial: Architecture, Pricing, Use Cases, and Hands-On Guide for Analytics Computing"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Category<\/h2>\n\n\n\n<p>Analytics Computing<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Introduction<\/h2>\n\n\n\n<p>MaxCompute is Alibaba Cloud\u2019s fully managed, distributed data warehousing and big data computing service in the <strong>Analytics Computing<\/strong> category. It is designed for large-scale batch processing, SQL-based analytics, ETL\/ELT pipelines, and offline data warehousing on very large datasets.<\/p>\n\n\n\n<p>In simple terms: <strong>you store data in MaxCompute tables and run SQL (and other batch jobs) to transform and analyze that data at scale<\/strong>, without managing servers, clusters, or distributed storage.<\/p>\n\n\n\n<p>Technically, MaxCompute provides a project-scoped, multi-tenant big data platform with managed storage, a distributed execution engine, and multiple development\/ingestion interfaces (SQL, SDKs, command-line tools, and integration with Alibaba Cloud data services). 
It is commonly used as the \u201coffline warehouse\u201d layer in Alibaba Cloud analytics stacks, often paired with services like <strong>DataWorks<\/strong> (data development\/scheduling\/governance), <strong>Object Storage Service (OSS)<\/strong> (data lake storage), <strong>Data Transmission Service (DTS)<\/strong> (ingestion), <strong>Log Service (SLS)<\/strong> (log analytics ingestion), and BI\/serving engines (for example <strong>Quick BI<\/strong>, or low-latency analytic engines such as <strong>Hologres<\/strong> depending on use case).<\/p>\n\n\n\n<p>MaxCompute solves the problem of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Storing and processing very large datasets reliably and cost-effectively<\/li>\n<li>Running scalable batch analytics and ETL without operating Hadoop\/Spark clusters<\/li>\n<li>Enforcing project-level isolation and access control for enterprise data warehousing<\/li>\n<li>Integrating ingestion, governance, and analytics workflows in the Alibaba Cloud ecosystem<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>Naming note: MaxCompute was historically known as <strong>ODPS (Open Data Processing Service)<\/strong>. Today, the official product name is <strong>MaxCompute<\/strong>. ODPS may still appear in tool names, endpoints, or legacy documentation references. Verify in official docs if you see ODPS in your environment.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
What is MaxCompute?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Official purpose<\/h3>\n\n\n\n<p>MaxCompute is Alibaba Cloud\u2019s managed big data computing platform for <strong>data warehousing and large-scale batch computing<\/strong>, typically accessed via SQL and used for offline analytics workloads.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Core capabilities (high-level)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Managed storage for structured datasets (tables with schema, partitions)<\/li>\n<li>Distributed batch compute for:\n<ul>\n<li>SQL queries and transformations<\/li>\n<li>ETL\/ELT processing<\/li>\n<li>Custom functions (UDFs) and batch jobs (depending on enabled capabilities)<\/li>\n<\/ul>\n<\/li>\n<li>Data ingestion and export via supported tools\/APIs (commonly via \u201cTunnel\u201d tooling\/interfaces and ecosystem integrations)<\/li>\n<li>Project-based isolation, permissions, and governance hooks<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Major components (conceptual)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>MaxCompute Project<\/strong>: The primary isolation boundary for data, users, permissions, quotas, and billing attribution.<\/li>\n<li><strong>Tables \/ Partitions<\/strong>: Structured storage objects (often partitioned for performance and cost control).<\/li>\n<li><strong>SQL Engine (MaxCompute SQL)<\/strong>: The primary interface for querying and transformation.<\/li>\n<li><strong>Access Interfaces<\/strong>:\n<ul>\n<li>Web console (management)<\/li>\n<li>Command-line client (commonly <code>odpscmd<\/code>, verify latest tooling in docs)<\/li>\n<li>SDKs\/APIs (language-specific; verify current supported SDKs)<\/li>\n<li>Integration via DataWorks and other Alibaba Cloud services<\/li>\n<\/ul>\n<\/li>\n<li><strong>Data Transfer (Tunnel)<\/strong>: A commonly used ingestion\/export mechanism in MaxCompute ecosystems (tooling and endpoints vary by region; verify in official docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Service 
type<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Fully managed analytics computing \/ data warehouse compute service<\/strong> (batch-oriented).<\/li>\n<li>You manage schemas, SQL, and permissions; Alibaba Cloud manages the underlying infrastructure and scaling.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scope: regional\/global\/zonal and tenancy<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MaxCompute is typically <strong>regional<\/strong>: you create resources in a specific Alibaba Cloud region.<\/li>\n<li>Operational and security isolation is typically <strong>project-scoped<\/strong> within your Alibaba Cloud account\/tenant.<\/li>\n<li>Billing is usage-based (and\/or capacity-based depending on your purchase model). Exact billing dimensions vary by edition\/region and should be confirmed in the official pricing pages.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How it fits into the Alibaba Cloud ecosystem<\/h3>\n\n\n\n<p>MaxCompute often sits in the center of an Alibaba Cloud analytics platform:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ingestion<\/strong>: DTS (databases), Data Integration (DataWorks), SLS (logs), OSS (files), or application exports<\/li>\n<li><strong>Processing<\/strong>: MaxCompute SQL (transformations, aggregations), scheduled workflows (DataWorks), batch jobs<\/li>\n<li><strong>Serving<\/strong>: BI tools (Quick BI), downstream data marts, low-latency OLAP engines (e.g., Hologres\/AnalyticDB depending on requirements), or export to OSS for sharing<\/li>\n<\/ul>\n\n\n\n<p>MaxCompute is most commonly used for <strong>offline<\/strong> (batch) analytics. If you need sub-second interactive queries or high concurrency serving, you often complement it with a serving\/OLAP engine rather than forcing MaxCompute to behave like an OLTP database.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">3. 
Why use MaxCompute?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Business reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Lower operational burden<\/strong>: No cluster provisioning, patching, or capacity planning like self-managed Hadoop\/Spark.<\/li>\n<li><strong>Scales for large datasets<\/strong>: Designed for data warehouse-scale storage and compute.<\/li>\n<li><strong>Ecosystem integration<\/strong>: Works naturally with Alibaba Cloud data services (DataWorks, OSS, DTS, etc.).<\/li>\n<li><strong>Governance and isolation<\/strong>: Project-scoped boundaries help align with business domains and organizational structures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Technical reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>SQL-centric analytics<\/strong>: Many analytics workloads can be expressed in SQL, reducing custom code.<\/li>\n<li><strong>Partitioned tables<\/strong>: Enables efficient incremental processing and cost control.<\/li>\n<li><strong>Batch compute patterns<\/strong>: Suitable for nightly jobs, periodic pipelines, large joins, aggregations, and feature computation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Project-level management<\/strong>: Clear boundaries for quotas, users, permissions, and lifecycle policies.<\/li>\n<li><strong>Automation via orchestration<\/strong>: Often paired with DataWorks for scheduling, dependency management, and release workflows.<\/li>\n<li><strong>Repeatable workflows<\/strong>: Mature pattern for \u201craw \u2192 cleaned \u2192 curated \u2192 marts\u201d layered data warehouse design.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security\/compliance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Access control<\/strong>: Fine-grained permissions can be applied at project\/object level (exact granularity depends on configuration and features; verify in official 
docs).<\/li>\n<li><strong>Auditability<\/strong>: Alibaba Cloud provides logs and audit trails across account activities; MaxCompute also has operational metadata and job history mechanisms (verify exact logging integration patterns).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scalability\/performance reasons<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Massively parallel batch execution<\/strong>: Designed for large-scale transformations and aggregations.<\/li>\n<li><strong>Works well with partition pruning<\/strong>: Proper partitioning dramatically improves performance and reduces scanned data.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should choose MaxCompute<\/h3>\n\n\n\n<p>Choose MaxCompute when you need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A managed batch data warehouse and compute engine<\/li>\n<li>Centralized offline analytics with large datasets<\/li>\n<li>ETL\/ELT pipelines and scheduled transformations<\/li>\n<li>A strong \u201cwarehouse core\u201d integrated with Alibaba Cloud analytics services<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">When teams should not choose MaxCompute<\/h3>\n\n\n\n<p>Avoid (or complement) MaxCompute when you need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Low-latency interactive analytics<\/strong> with very high concurrency (consider an OLAP serving engine)<\/li>\n<li><strong>OLTP workloads<\/strong> (transactions, row-level updates at high frequency)<\/li>\n<li><strong>Streaming-first processing<\/strong> (consider Realtime Compute for Apache Flink, then land results into MaxCompute\/OSS)<\/li>\n<li><strong>Strict ANSI SQL compatibility<\/strong> (dialect differences may require adaptation; verify supported syntax)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">4. 
Where is MaxCompute used?<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Industries<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>E-commerce and retail (sales analytics, inventory, customer segmentation)<\/li>\n<li>FinTech and insurance (risk analytics, compliance reporting, fraud analysis)<\/li>\n<li>Gaming and media (behavior analytics, retention cohorts, recommendation features)<\/li>\n<li>Logistics and transportation (route optimization analytics, demand forecasting features)<\/li>\n<li>Manufacturing\/IoT (batch analytics on telemetry, quality analytics)<\/li>\n<li>SaaS companies (product analytics, billing analytics, data marts for BI)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team types<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data engineering teams building batch pipelines<\/li>\n<li>Analytics engineering teams building curated models and marts<\/li>\n<li>BI teams consuming curated datasets<\/li>\n<li>Platform teams operating shared data infrastructure and governance<\/li>\n<li>Security\/compliance teams enforcing access boundaries and auditing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Workloads<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data warehouse layer transformations (raw \u2192 ODS \u2192 DWD \u2192 DWS \u2192 ADS patterns are common in practice)<\/li>\n<li>Large-scale joins, deduplication, and aggregations<\/li>\n<li>Feature engineering for ML (offline features)<\/li>\n<li>Periodic reporting datasets for dashboards<\/li>\n<li>Backfills and historical recomputation<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Architectures<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Warehouse-centric analytics: MaxCompute as the central store and compute<\/li>\n<li>Lakehouse-style: OSS as the lake, MaxCompute as a curated compute\/warehouse layer (implementation details vary; verify current integration patterns)<\/li>\n<li>Hybrid serving: MaxCompute for offline processing + OLAP engine for serving + OSS for 
sharing\/archival<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Real-world deployment contexts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multi-project design per business domain (marketing, finance, supply chain)<\/li>\n<li>Central platform project for shared reference data<\/li>\n<li>Dev\/test projects for CI-like workflows and safe experiments<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Production vs dev\/test usage<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Production<\/strong>: strict permissions, audited changes, workflow orchestration, lifecycle policies, cost controls<\/li>\n<li><strong>Dev\/test<\/strong>: smaller quotas, separate projects, sample data, shorter retention<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">5. Top Use Cases and Scenarios<\/h2>\n\n\n\n<p>Below are realistic scenarios where MaxCompute is a strong fit.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1) Enterprise data warehouse (offline)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Centralize data from multiple systems and run consistent reporting.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Managed warehouse storage + scalable batch SQL transformations.<\/li>\n<li><strong>Example<\/strong>: Nightly loads from CRM + order DB \u2192 standardized fact\/dimension tables \u2192 finance dashboards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2) ETL\/ELT pipelines for BI marts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Transform raw ingestion tables into curated, BI-ready datasets.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Partitioned transformations, repeatable SQL models, integration with schedulers (commonly DataWorks).<\/li>\n<li><strong>Example<\/strong>: Build a \u201cdaily_sales_mart\u201d dataset partitioned by date for dashboards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">3) Large-scale log analytics 
(batch)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Analyze large volumes of application logs for trends and anomaly baselines.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Batch aggregation on big datasets; ingest via SLS\/OSS then process.<\/li>\n<li><strong>Example<\/strong>: Compute daily error-rate aggregates and top error signatures.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4) User behavior analytics and cohorts (offline)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Build retention, funnel, and cohort metrics on event data.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: SQL-based sessionization and cohort computations on partitioned event tables.<\/li>\n<li><strong>Example<\/strong>: Weekly retention by acquisition channel for last 12 months.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5) Feature engineering for machine learning (offline features)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Generate training datasets and offline features from historical data.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Large joins\/aggregations; reproducible training snapshots.<\/li>\n<li><strong>Example<\/strong>: Build user-level features (30\/60\/90-day windows) for churn prediction.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6) Periodic compliance reporting and auditing datasets<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Generate regulatory reports requiring large-scale reconciliation.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Batch compute, repeatability, and project-based isolation.<\/li>\n<li><strong>Example<\/strong>: Monthly transaction reconciliation report with cross-system matching.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7) Data quality checks at scale<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Detect schema drift, null spikes, duplicate keys, 
out-of-range values.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: SQL-based profiling on large partitions; integrate results into governance workflows.<\/li>\n<li><strong>Example<\/strong>: Daily job computes null-rate and uniqueness metrics per critical table.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">8) Backfill and historical recomputation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Recompute historical metrics after logic changes or bug fixes.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Designed for long-running batch compute and large scans (with cost awareness).<\/li>\n<li><strong>Example<\/strong>: Recompute 2 years of daily metrics after changing attribution logic.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">9) Multi-tenant analytics platform (project-per-tenant)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Provide analytics compute\/storage isolation per tenant or business unit.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Project boundaries for access control, quotas, and cost allocation.<\/li>\n<li><strong>Example<\/strong>: Separate MaxCompute projects for each subsidiary.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">10) Offline aggregation for low-latency serving systems<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Serving system needs pre-aggregated tables to keep latency low.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Efficient batch pre-aggregation and export to serving stores.<\/li>\n<li><strong>Example<\/strong>: Precompute product ranking features nightly and export results for an API.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">11) Data lake to warehouse curation (OSS \u2192 MaxCompute)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Raw files in OSS need standardization and structured querying.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Create structured tables from raw 
data, apply partitions, enforce schemas.<\/li>\n<li><strong>Example<\/strong>: Convert daily CSV\/Parquet drops into partitioned curated tables.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12) Cross-system reconciliation and anomaly detection (batch)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong>: Compare metrics across multiple data sources and flag anomalies.<\/li>\n<li><strong>Why MaxCompute fits<\/strong>: Large joins, window functions (if supported), and statistical aggregations.<\/li>\n<li><strong>Example<\/strong>: Compare payment gateway totals vs internal ledger totals daily.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">6. Core Features<\/h2>\n\n\n\n<blockquote>\n<p>Feature availability can vary by region\/edition and by what is enabled in your MaxCompute project. Always confirm in the official MaxCompute documentation for your region.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Project-based resource and security isolation<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Organizes datasets, permissions, quotas, and billing context into \u201cprojects.\u201d<\/li>\n<li><strong>Why it matters<\/strong>: Projects are the primary boundary for multi-team and multi-domain governance.<\/li>\n<li><strong>Practical benefit<\/strong>: Safer separation of dev\/test\/prod and business units.<\/li>\n<li><strong>Caveats<\/strong>: Cross-project sharing requires explicit configuration and governance.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.2 Managed table storage with schema<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Stores structured data in tables with defined columns and types.<\/li>\n<li><strong>Why it matters<\/strong>: Enforces consistency and supports SQL analytics.<\/li>\n<li><strong>Practical benefit<\/strong>: Clear data contracts and predictable query 
behavior.<\/li>\n<li><strong>Caveats<\/strong>: Schema evolution and data type changes require careful handling (verify supported DDL operations).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.3 Partitioned tables (often essential)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Physically\/logically organizes table data by partition keys (commonly date).<\/li>\n<li><strong>Why it matters<\/strong>: Partition pruning reduces scanned data and improves performance\/cost.<\/li>\n<li><strong>Practical benefit<\/strong>: Efficient daily incremental processing and retention control.<\/li>\n<li><strong>Caveats<\/strong>: Poor partition design (too many partitions, wrong keys) can hurt performance and manageability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.4 MaxCompute SQL (batch analytics)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Provides SQL-based query and transformation on large datasets.<\/li>\n<li><strong>Why it matters<\/strong>: SQL is widely understood; reduces custom code.<\/li>\n<li><strong>Practical benefit<\/strong>: Faster development for ETL and analytics.<\/li>\n<li><strong>Caveats<\/strong>: SQL dialect and supported functions can differ from other databases; test portability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.5 UDF\/UDTF and extensibility (project-dependent)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Extends SQL with custom logic (user-defined functions).<\/li>\n<li><strong>Why it matters<\/strong>: Enables reuse of business logic not available in built-in functions.<\/li>\n<li><strong>Practical benefit<\/strong>: Standardize transformations such as parsing, classification, hashing, masking.<\/li>\n<li><strong>Caveats<\/strong>: Operational overhead for deployment\/versioning; performance impacts; language\/runtime constraints (verify current supported runtimes).<\/li>\n<\/ul>\n\n\n\n<h3 
class=\"wp-block-heading\">6.6 Data ingestion and export (commonly via Tunnel + integrations)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Moves data into\/out of MaxCompute using supported ingestion methods and integrations.<\/li>\n<li><strong>Why it matters<\/strong>: Warehouses are only useful if data movement is reliable and governed.<\/li>\n<li><strong>Practical benefit<\/strong>: Supports building repeatable pipelines from databases, logs, and OSS.<\/li>\n<li><strong>Caveats<\/strong>: Throughput limits, quotas, and region endpoints apply. Cross-region transfer may add cost and latency.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.7 Lifecycle and data retention controls<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Helps manage data retention\/expiration (for example, partition lifecycle policies).<\/li>\n<li><strong>Why it matters<\/strong>: Prevents uncontrolled storage growth and supports compliance.<\/li>\n<li><strong>Practical benefit<\/strong>: Lower storage costs and reduced risk of keeping data longer than allowed.<\/li>\n<li><strong>Caveats<\/strong>: Misconfigured lifecycle can delete needed data; implement safeguards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.8 Job management, history, and operational metadata<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Tracks executed jobs\/queries and outcomes (exact UX depends on console\/tools).<\/li>\n<li><strong>Why it matters<\/strong>: Debugging, auditability, performance tuning.<\/li>\n<li><strong>Practical benefit<\/strong>: Identify expensive queries, failures, and long runtimes.<\/li>\n<li><strong>Caveats<\/strong>: Retention of job history and depth of metrics may vary; integrate with broader observability practices.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.9 Ecosystem integration (DataWorks, OSS, DTS, SLS, PAI, BI)<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>What it does<\/strong>: Connects MaxCompute to ingestion, governance, ML, and BI workflows.<\/li>\n<li><strong>Why it matters<\/strong>: Most production systems need orchestration and governance around the warehouse.<\/li>\n<li><strong>Practical benefit<\/strong>: End-to-end data platform rather than isolated compute.<\/li>\n<li><strong>Caveats<\/strong>: Some integrations are separate paid products (for example DataWorks); design costs accordingly.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">7. Architecture and How It Works<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">7.1 High-level architecture<\/h3>\n\n\n\n<p>At a high level, MaxCompute is a managed service where:\n&#8211; Data is stored in MaxCompute-managed storage (tables\/partitions).\n&#8211; Users and services submit SQL or batch jobs to an execution engine.\n&#8211; The engine schedules distributed tasks internally and returns results.\n&#8211; External services (DataWorks, DTS, OSS, SLS, BI tools) integrate through connectors, APIs, or export pipelines.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7.2 Request\/data\/control flow (typical)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Authentication\/authorization<\/strong>: Caller (user, RAM role, or service integration) authenticates to Alibaba Cloud and is authorized at MaxCompute project\/object level.<\/li>\n<li><strong>Job submission<\/strong>: SQL or job definition is submitted via console, client, or integration.<\/li>\n<li><strong>Planning and execution<\/strong>: MaxCompute plans the query\/job and runs it across distributed resources.<\/li>\n<li><strong>Storage access<\/strong>: The job reads partitions\/objects and writes results to target tables\/partitions.<\/li>\n<li><strong>Results retrieval<\/strong>: Results are saved to tables or returned as query output (interactive result size limits may apply; verify in 
docs).<\/li>\n<li><strong>Governance\/ops<\/strong>: Job metadata and logs are available for monitoring, auditing, and troubleshooting.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">7.3 Common integrations with related Alibaba Cloud services<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>DataWorks<\/strong>: Data development, scheduling, dependency management, data quality, governance (often the primary \u201ccontrol plane\u201d for pipelines).<\/li>\n<li><strong>OSS (Object Storage Service)<\/strong>: Landing zone for files; archival; data lake patterns; import\/export.<\/li>\n<li><strong>DTS (Data Transmission Service)<\/strong>: Database CDC\/replication into analytics stores (confirm supported targets and patterns).<\/li>\n<li><strong>SLS (Log Service)<\/strong>: Collect logs, store, and export for batch analytics.<\/li>\n<li><strong>PAI (Machine Learning Platform for AI)<\/strong>: Build training datasets and features from MaxCompute; run ML pipelines (integration details vary).<\/li>\n<li><strong>Quick BI<\/strong>: BI dashboards and reporting (connectivity and performance patterns vary).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7.4 Dependency services (practical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>RAM<\/strong> (Resource Access Management): identities, policies, AccessKey management, role-based access.<\/li>\n<li><strong>VPC\/networking<\/strong>: Some access patterns use VPC endpoints or private connectivity; verify current options for your region.<\/li>\n<li><strong>KMS<\/strong> (Key Management Service): If encryption with customer-managed keys is used (verify exact MaxCompute encryption options in docs).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7.5 Security\/authentication model (overview)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identity is handled through <strong>Alibaba Cloud RAM<\/strong>.<\/li>\n<li>Access to MaxCompute is controlled through a combination of:\n<ul>\n<li>Project-level membership\/roles<\/li>\n<li>Object-level privileges (tables, resources, functions), depending on enabled access control model<\/li>\n<\/ul>\n<\/li>\n<li>For service-to-service access, prefer short-lived credentials (for example via STS) where supported by your workflow.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7.6 Networking model (overview)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MaxCompute is a managed service accessed via service endpoints.<\/li>\n<li>Connectivity may be via public endpoints and\/or private networking options depending on region and account configuration.<\/li>\n<li>Data movement tools (like Tunnel) have specific endpoints per region. Always use the endpoint patterns documented for your region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7.7 Monitoring\/logging\/governance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Track:\n<ul>\n<li>Query\/job failures and reasons<\/li>\n<li>Runtime and resource consumption (to manage cost and SLAs)<\/li>\n<li>Data growth and partition counts<\/li>\n<li>Permissions changes and project membership changes<\/li>\n<\/ul>\n<\/li>\n<li>For enterprise operations:\n<ul>\n<li>Standardize naming conventions for projects\/tables\/partitions<\/li>\n<li>Define retention policies<\/li>\n<li>Control who can run large scans or cross-join type workloads<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7.8 Architecture diagrams<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Simple learning architecture<\/h4>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart LR\n  U[Engineer \/ Analyst] --&gt;|SQL \/ Client| MC[MaxCompute Project]\n  MC --&gt; T[(Tables &amp; Partitions)]\n  U --&gt;|Upload\/Download| TN[Tunnel \/ Ingestion Tooling]\n  TN --&gt; MC\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Production-style reference architecture (common pattern)<\/h4>\n\n\n\n<pre><code class=\"language-mermaid\">flowchart TB\n  subgraph Sources\n    OLTP[(RDS \/ Self-managed DBs)]\n    LOGS[(Apps \/ Logs)]\n    
FILES[(Files in OSS)]\n  end\n\n  subgraph Ingestion\n    DTS[DTS \/ CDC]\n    SLS[SLS Log Service]\n    DI[\"Data Integration (DataWorks) \/ ETL Connectors\"]\n  end\n\n  subgraph Warehouse[\"MaxCompute (Regional)\"]\n    P1[Project: raw\/ods]\n    P2[Project: dwd\/dws\/ads]\n    TBLS[(Partitioned Tables)]\n    JOBS[SQL Jobs \/ Batch Compute]\n  end\n\n  subgraph GovernanceOps\n    DW[DataWorks: Dev+Scheduler+Governance]\n    RAM[RAM: IAM\/Policies]\n    AUDIT[\"Audit\/Logs (account-level + job history)\"]\n  end\n\n  subgraph Serving\n    BI[Quick BI \/ BI Tools]\n    OLAP[\"Serving OLAP Engine&lt;br\/&gt;(e.g., Hologres\/AnalyticDB - choose per needs)\"]\n    EXP[Export to OSS \/ API consumers]\n  end\n\n  OLTP --&gt; DTS --&gt; P1\n  LOGS --&gt; SLS --&gt; FILES\n  FILES --&gt; DI --&gt; P1\n\n  P1 --&gt; JOBS --&gt; P2\n  P2 --&gt; TBLS\n\n  TBLS --&gt; BI\n  TBLS --&gt; OLAP\n  TBLS --&gt; EXP\n\n  DW --&gt; P1\n  DW --&gt; P2\n  RAM --&gt; Warehouse\n  AUDIT --&gt; GovernanceOps\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Prerequisites<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Account \/ project requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An <strong>Alibaba Cloud account<\/strong> with billing enabled.<\/li>\n<li>A <strong>MaxCompute project<\/strong> in a chosen region (you will create one in the lab).<\/li>\n<li>Optional but common in production: <strong>DataWorks<\/strong> workspace associated with the MaxCompute project.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Permissions \/ IAM (RAM)<\/h3>\n\n\n\n<p>You typically need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Permission to create or manage MaxCompute projects (often account-level administrative capability).<\/li>\n<li>A <strong>RAM user<\/strong> or <strong>RAM role<\/strong> to operate MaxCompute with least privilege.<\/li>\n<li>Ability to create AccessKeys if you plan to use command-line tools (follow your organization\u2019s security policy).<\/li>\n<\/ul>\n\n\n\n<blockquote>\n<p>In enterprises, avoid using the root account for daily operations. Use RAM users\/roles and least privilege.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Billing requirements<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A payment method attached to your account.<\/li>\n<li>Ensure your account can purchase\/activate MaxCompute in the selected region.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tools<\/h3>\n\n\n\n<p>Choose at least one interface:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Alibaba Cloud Console<\/strong> (web UI) for project creation and basic management.<\/li>\n<li><strong>Command-line client<\/strong> (commonly <code>odpscmd<\/code>) for SQL execution and scripting. Download links and latest instructions are in official docs.<br\/>Official docs landing: https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/<\/li>\n<li>Optional: <strong>DataWorks<\/strong> for a notebook-like development experience and scheduling.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Region availability<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>MaxCompute is region-based. 
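<\/li>\n<\/ul>\n\n\n\n<p>For example, <code>odpscmd<\/code> reads its region-specific endpoint from <code>conf\/odps_config.ini<\/code>. A minimal sketch (the project name, placeholder keys, and example region are assumptions; confirm the exact endpoint pattern for your region and network path in the official docs):<\/p>\n\n\n\n<pre><code class=\"language-ini\">project_name=my_demo_project\naccess_id=&lt;YOUR_ACCESS_KEY_ID&gt;\naccess_key=&lt;YOUR_ACCESS_KEY_SECRET&gt;\n# Public endpoint pattern for an example region; VPC endpoints differ\nend_point=http:\/\/service.cn-hangzhou.maxcompute.aliyun.com\/api\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>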
Choose a region near your data sources and consumers to reduce latency and transfer costs.<\/li>\n<li>Confirm region availability and endpoints in official documentation for your account type.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Quotas\/limits (examples to plan for)<\/h3>\n\n\n\n<p>Exact quotas vary by account\/region\/edition; verify in official docs:\n&#8211; Max concurrent jobs\/queries\n&#8211; Storage limits per project\n&#8211; Partition count best practices\/limits\n&#8211; Upload\/download throughput via ingestion tools\n&#8211; SQL result size limits in interactive consoles\/clients<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prerequisite services (optional, depending on your workflow)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>OSS (for file-based data exchange)<\/li>\n<li>DataWorks (for orchestration and governance)<\/li>\n<li>DTS (for database ingestion)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">9. Pricing \/ Cost<\/h2>\n\n\n\n<p>MaxCompute pricing can be <strong>multi-dimensional<\/strong> and can vary by <strong>region<\/strong>, <strong>billing mode<\/strong>, and potentially by <strong>edition\/SKU<\/strong> or negotiated enterprise agreements. Do not rely on fixed numbers\u2014use official pricing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Official pricing resources (start here)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Product page (global): https:\/\/www.alibabacloud.com\/product\/maxcompute  <\/li>\n<li>Help Center (docs hub): https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/  <\/li>\n<li>Alibaba Cloud pricing pages differ by locale and account type. 
If you use the China site, pricing is often listed under the Aliyun pricing center (verify current URL for MaxCompute pricing in your locale).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common pricing dimensions (verify exact model for your region)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Compute<\/strong>\n   &#8211; Often billed by usage of compute resources (for example, CU-based consumption, job execution resources, or reserved capacity models depending on your purchase options).\n   &#8211; Some organizations buy reserved\/exclusive resources for predictable performance and budgeting (availability depends on region\/contract).<\/p>\n<\/li>\n<li>\n<p><strong>Storage<\/strong>\n   &#8211; Billed by data stored (GB-month) for tables and related storage.\n   &#8211; Costs depend on retention and the number\/size of partitions.<\/p>\n<\/li>\n<li>\n<p><strong>Data movement<\/strong>\n   &#8211; Upload\/download and inter-service transfer may incur costs (especially cross-region).\n   &#8211; Network egress from Alibaba Cloud regions is typically chargeable; intra-region transfers may be cheaper (verify).<\/p>\n<\/li>\n<li>\n<p><strong>Ecosystem services<\/strong>\n   &#8211; <strong>DataWorks<\/strong>, <strong>DTS<\/strong>, <strong>SLS<\/strong>, and BI tools are priced separately.\n   &#8211; The \u201ctrue cost\u201d of a warehouse platform is often dominated by orchestration + ingestion + serving tools, not only the warehouse compute.<\/p>\n<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Cost drivers (what usually makes bills spike)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Large scans due to missing partition filters<\/li>\n<li>Backfills across long history without staged rollouts<\/li>\n<li>Excessive intermediate tables and duplicated datasets<\/li>\n<li>High-frequency ETL jobs producing many small partitions<\/li>\n<li>Exporting large datasets out of region or to the public internet<\/li>\n<li>Keeping raw data forever 
without lifecycle policies<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Hidden\/indirect costs to plan for<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DataWorks scheduling and development features (if used)<\/li>\n<li>OSS storage for staging\/raw\/lake layers<\/li>\n<li>DTS ongoing replication costs (if used)<\/li>\n<li>Cross-region replication\/backup<\/li>\n<li>Human costs: data modeling, governance, and operational readiness<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network\/data transfer implications<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prefer same-region placement for sources (DTS target), OSS, and MaxCompute to reduce transfer costs.<\/li>\n<li>If BI tools or consumers are outside Alibaba Cloud or in other regions, egress charges may apply.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to optimize cost (practical checklist)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Partition by date (and sometimes by region\/tenant) and always filter partitions in queries.<\/li>\n<li>Implement <strong>lifecycle policies<\/strong> for raw\/temporary tables and old partitions.<\/li>\n<li>Use incremental processing instead of full reloads.<\/li>\n<li>Avoid storing the same dataset in multiple forms unless there is a clear serving requirement.<\/li>\n<li>Monitor top expensive queries\/jobs and optimize them (join order, filters, pre-aggregation).<\/li>\n<li>Use dev\/test projects with smaller quotas and shorter retention.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example low-cost starter estimate (conceptual)<\/h3>\n\n\n\n<p>A minimal learning setup typically includes:\n&#8211; A small MaxCompute project\n&#8211; One or two small tables (MBs to a few GB)\n&#8211; Occasional SQL queries<\/p>\n\n\n\n<p>Cost depends on:\n&#8211; Your region\u2019s minimum billing increments for compute\n&#8211; Storage size and retention\n&#8211; Whether you use paid orchestration tools (DataWorks)<\/p>\n\n\n\n<p>Because exact prices vary, 
<strong>use the official pricing page\/calculator<\/strong> for your region and assume:\n&#8211; Storage costs scale with GB-month\n&#8211; Compute costs scale with the number and complexity of jobs and how often they run<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example production cost considerations<\/h3>\n\n\n\n<p>For production, model costs across:\n&#8211; Daily ingest volume (GB\/day)\n&#8211; Number of transformations (jobs\/day) and their expected scan sizes\n&#8211; Retention (days\/months\/years)\n&#8211; Backfill frequency\n&#8211; Serving exports (GB\/day) and where data is consumed<\/p>\n\n\n\n<p>A common practice is to run a <strong>30-day proof<\/strong>:\n&#8211; Implement one pipeline end-to-end\n&#8211; Measure compute consumption per job and per day\n&#8211; Validate that partitioning reduces scanned data as expected\n&#8211; Set budgets\/alerts (where available in your billing tools) based on observed spend<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">10. Step-by-Step Hands-On Tutorial<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Objective<\/h3>\n\n\n\n<p>Create a MaxCompute project, define a partitioned table, load a small sample dataset using SQL inserts, run analytical queries, and apply basic operational hygiene (verification, troubleshooting, cleanup).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Lab Overview<\/h3>\n\n\n\n<p>You will:\n1. Create a MaxCompute project in Alibaba Cloud.\n2. Create a RAM user (or use an existing least-privilege identity) and grant access to the project.\n3. Connect to MaxCompute using a supported SQL interface (console SQL editor or <code>odpscmd<\/code>, depending on what is available in your account\/region).\n4. Create a partitioned table (<code>events<\/code>) and insert sample data.\n5. Run queries that demonstrate partition pruning and aggregation.\n6. 
Drop the objects to avoid ongoing storage costs.<\/p>\n\n\n\n<blockquote>\n<p>Notes before you start<br\/>\n&#8211; The Alibaba Cloud UI and available \u201cSQL editor\u201d experiences can differ by region and account type. If the MaxCompute console in your region does not provide an in-browser SQL editor, use the official command-line tool (<code>odpscmd<\/code>) as described below.<br\/>\n&#8211; Replace placeholders like <code>&lt;region&gt;<\/code> and <code>&lt;project_name&gt;<\/code> with your values.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Create a MaxCompute project (Console)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Sign in to Alibaba Cloud Console: https:\/\/home.console.alibabacloud.com\/<\/li>\n<li>Search for <strong>MaxCompute<\/strong> and open the MaxCompute console.<\/li>\n<li>Choose the target <strong>Region<\/strong> (keep it consistent with your data sources).<\/li>\n<li>Create a <strong>Project<\/strong>:\n   &#8211; Project name example: <code>mc_lab_project<\/code>\n   &#8211; Set the necessary project parameters (billing mode\/options shown in your console).\n   &#8211; Confirm creation.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A new MaxCompute project exists and appears in the MaxCompute console under your selected region.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; In the MaxCompute console, you can see the project and basic project info (region, status).<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Create\/prepare an IAM identity (RAM) and grant project access<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Open RAM console: https:\/\/ram.console.aliyun.com\/ (or from the console search bar \u201cRAM\u201d).<\/li>\n<li>Create a <strong>RAM user<\/strong> for the lab (recommended) or select an existing one.<\/li>\n<li>(Optional, for CLI use) Create an <strong>AccessKey<\/strong> 
for the RAM user. Store it securely.<\/li>\n<li>Grant the user permission to access MaxCompute:\n   &#8211; At minimum, the user must be able to connect to the project and create tables\/run SQL for this lab.\n   &#8211; In many organizations, you add the user to the MaxCompute project and grant appropriate project roles\/privileges.<\/li>\n<\/ol>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A RAM user can authenticate and has permissions to work inside the <code>mc_lab_project<\/code> project.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\n&#8211; Sign in as the RAM user and confirm you can open the MaxCompute project (or run a simple SQL statement later).<\/p>\n\n\n\n<blockquote>\n<p>Security note: Prefer least privilege. After the lab, disable\/delete AccessKeys you created for training.<\/p>\n<\/blockquote>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Choose your SQL execution method (Console SQL editor or odpscmd)<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Option A: Use a console-based SQL editor (if available)<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In the MaxCompute console, open your project.<\/li>\n<li>Find a feature like <strong>SQL<\/strong>, <strong>Query<\/strong>, <strong>SQL Editor<\/strong>, or similar.<\/li>\n<li>Confirm you can run a trivial statement (for example <code>SHOW TABLES;<\/code> or <code>SELECT 1;<\/code> if supported).<\/li>\n<\/ol>\n\n\n\n<p>If this is not available, use Option B.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Option B: Use the official command-line client (<code>odpscmd<\/code>) (works in most environments)<\/h4>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In official docs, locate the latest download\/setup guide for the MaxCompute client (<code>odpscmd<\/code>):<br\/>\n   https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/<\/li>\n<li>Install it on your machine (Windows\/macOS\/Linux supported options may differ).<\/li>\n<li>Create\/update the configuration 
file with:\n   &#8211; <strong>Project name<\/strong>\n   &#8211; <strong>AccessKey ID\/Secret<\/strong> (or a more secure credential mechanism if your organization mandates it)\n   &#8211; <strong>Endpoint<\/strong> for MaxCompute in your region<\/li>\n<\/ol>\n\n\n\n<p>Example configuration (illustrative \u2014 <strong>verify exact keys and endpoint format in official docs<\/strong>):<\/p>\n\n\n\n<pre><code class=\"language-ini\"># odps_config.ini (example only; verify with official docs)\nproject_name=mc_lab_project\naccess_id=&lt;your_accesskey_id&gt;\naccess_key=&lt;your_accesskey_secret&gt;\nend_point=http:\/\/service.&lt;region&gt;.maxcompute.aliyun.com\/api\n# Optional tunnel endpoint if required by your workflow:\n# tunnel_endpoint=http:\/\/dt.&lt;region&gt;.maxcompute.aliyun.com\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"4\">\n<li>Start the CLI (exact command depends on your installation; verify in docs). Common pattern:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-bash\">odpscmd\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; You can open an interactive session connected to your MaxCompute project.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\nRun:<\/p>\n\n\n\n<pre><code class=\"language-sql\">SHOW TABLES;\n<\/code><\/pre>\n\n\n\n<p>Expected: either an empty list (new project) or a list of existing tables if the project already has data.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Create a partitioned table for events<\/h3>\n\n\n\n<p>Run the following SQL in your chosen SQL interface:<\/p>\n\n\n\n<pre><code class=\"language-sql\">-- Create a simple partitioned table for event analytics\nCREATE TABLE IF NOT EXISTS events (\n  user_id     BIGINT,\n  event_name  STRING,\n  event_ts    STRING,\n  amount      DOUBLE\n)\nPARTITIONED BY (\n  dt STRING\n);\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A table named <code>events<\/code> 
exists.<\/p>\n\n\n\n<p><strong>Verification<\/strong><\/p>\n\n\n\n<pre><code class=\"language-sql\">DESC events;\n<\/code><\/pre>\n\n\n\n<p>You should see columns plus the partition column <code>dt<\/code>.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Insert sample data into two partitions (two days)<\/h3>\n\n\n\n<pre><code class=\"language-sql\">-- Insert sample data into dt=2026-04-10\nINSERT INTO TABLE events PARTITION (dt='2026-04-10')\nVALUES\n  (101, 'view',     '2026-04-10T10:00:00Z', 0.0),\n  (101, 'purchase', '2026-04-10T10:05:00Z', 39.9),\n  (102, 'view',     '2026-04-10T11:00:00Z', 0.0),\n  (103, 'purchase', '2026-04-10T12:00:00Z', 15.0);\n\n-- Insert sample data into dt=2026-04-11\nINSERT INTO TABLE events PARTITION (dt='2026-04-11')\nVALUES\n  (101, 'view',     '2026-04-11T09:00:00Z', 0.0),\n  (104, 'view',     '2026-04-11T09:10:00Z', 0.0),\n  (104, 'purchase', '2026-04-11T09:20:00Z', 120.0),\n  (102, 'purchase', '2026-04-11T14:00:00Z', 9.9);\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Two partitions now exist with sample rows.<\/p>\n\n\n\n<p><strong>Verification<\/strong>\nList partitions (syntax can vary; try the following and adjust if needed per your SQL dialect\/version):<\/p>\n\n\n\n<pre><code class=\"language-sql\">SHOW PARTITIONS events;\n<\/code><\/pre>\n\n\n\n<p>And validate row counts:<\/p>\n\n\n\n<pre><code class=\"language-sql\">SELECT dt, COUNT(*) AS cnt\nFROM events\nGROUP BY dt\nORDER BY dt;\n<\/code><\/pre>\n\n\n\n<p>Expected output:\n&#8211; <code>2026-04-10<\/code> \u2192 4\n&#8211; <code>2026-04-11<\/code> \u2192 4<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Run analytics queries (demonstrate partition pruning)<\/h3>\n\n\n\n<p><strong>Query A: Daily revenue<\/strong><\/p>\n\n\n\n<pre><code class=\"language-sql\">SELECT\n  dt,\n  SUM(CASE WHEN event_name = 'purchase' THEN amount ELSE 0.0 END) AS 
revenue\nFROM events\nGROUP BY dt\nORDER BY dt;\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; <code>2026-04-10<\/code> revenue = 54.9\n&#8211; <code>2026-04-11<\/code> revenue = 129.9<\/p>\n\n\n\n<p><strong>Query B: Purchases for one day only (partition filter)<\/strong><\/p>\n\n\n\n<pre><code class=\"language-sql\">SELECT user_id, amount, event_ts\nFROM events\nWHERE dt = '2026-04-11'\n  AND event_name = 'purchase'\nORDER BY event_ts;\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; Rows for user 104 and 102 purchases on <code>2026-04-11<\/code>.<\/p>\n\n\n\n<p><strong>Why this matters<\/strong>\n&#8211; In production, always filter by partition (<code>dt<\/code>) when possible. It\u2019s one of the biggest performance and cost levers in MaxCompute batch SQL.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 7: Create a simple view for BI-style consumption (optional)<\/h3>\n\n\n\n<pre><code class=\"language-sql\">CREATE VIEW IF NOT EXISTS v_daily_revenue AS\nSELECT\n  dt,\n  SUM(CASE WHEN event_name = 'purchase' THEN amount ELSE 0.0 END) AS revenue\nFROM events\nGROUP BY dt;\n<\/code><\/pre>\n\n\n\n<p><strong>Expected outcome<\/strong>\n&#8211; A view exists and can be queried.<\/p>\n\n\n\n<p><strong>Verification<\/strong><\/p>\n\n\n\n<pre><code class=\"language-sql\">SELECT * FROM v_daily_revenue ORDER BY dt;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Validation<\/h3>\n\n\n\n<p>Run this checklist:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Table exists:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-sql\">SHOW TABLES LIKE 'events';\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li>Partitions exist:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-sql\">SHOW PARTITIONS events;\n<\/code><\/pre>\n\n\n\n<ol class=\"wp-block-list\" start=\"3\">\n<li>Revenue matches expected 
results:<\/li>\n<\/ol>\n\n\n\n<pre><code class=\"language-sql\">SELECT * FROM v_daily_revenue ORDER BY dt;\n<\/code><\/pre>\n\n\n\n<p>If your numbers match, the lab is complete.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Troubleshooting<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: \u201cAccess denied\u201d \/ permission errors<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Confirm the RAM user is added to the MaxCompute project and has the required privileges to:<\/li>\n<li>Create tables\/views<\/li>\n<li>Insert data<\/li>\n<li>Execute SQL<\/li>\n<li>Re-check whether you\u2019re using the correct project name and endpoint.<\/li>\n<li>If using <code>odpscmd<\/code>, verify the AccessKey belongs to the intended RAM user.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Cannot connect \/ endpoint errors<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ensure you used the correct regional endpoint format from official docs for your region.<\/li>\n<li>Check if your network requires a proxy or if outbound HTTP(S) is restricted.<\/li>\n<li>If private networking is required in your environment, confirm VPC\/VPN connectivity requirements (verify with your organization and official docs).<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: <code>SHOW PARTITIONS<\/code> syntax not recognized<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SQL dialect support can vary. Use the console UI metadata browser if available, or consult the MaxCompute SQL reference in official docs.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Issue: Insert statements fail<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Confirm data types match (for example <code>BIGINT<\/code> vs string).<\/li>\n<li>Some SQL engines require a different insert syntax or settings. 
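<\/li>\n<\/ul>\n\n\n\n<p>As a hedged fallback sketch: when <code>VALUES<\/code>-based inserts are rejected, an <code>INSERT ... SELECT<\/code> with explicit casts is often accepted (illustrative only; some versions may also require a <code>FROM<\/code> clause):<\/p>\n\n\n\n<pre><code class=\"language-sql\">-- Alternative insert pattern with explicit casts (adjust to your SQL version).\nINSERT INTO TABLE events PARTITION (dt='2026-04-10')\nSELECT CAST(101 AS BIGINT), 'view', '2026-04-10T10:00:00Z', CAST(0.0 AS DOUBLE);\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>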
Consult MaxCompute SQL documentation and adjust accordingly.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Cleanup<\/h3>\n\n\n\n<p>To avoid ongoing storage costs, drop the created objects:<\/p>\n\n\n\n<pre><code class=\"language-sql\">DROP VIEW IF EXISTS v_daily_revenue;\nDROP TABLE IF EXISTS events;\n<\/code><\/pre>\n\n\n\n<p>If this project was created solely for training and you are sure nothing else is needed, delete the MaxCompute project from the console (project deletion may be restricted and irreversible\u2014follow your organization\u2019s change process).<\/p>\n\n\n\n<p>Also:\n&#8211; Delete\/disable any AccessKeys created for the lab if not needed.\n&#8211; Remove temporary RAM permissions.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">11. Best Practices<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Architecture best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Design a layered model<\/strong>: raw\/ODS \u2192 cleaned \u2192 curated \u2192 marts. 
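<\/li>\n<\/ul>\n\n\n\n<p>A layer hop is typically a plain partitioned SQL job. A minimal sketch, assuming illustrative table names (<code>events<\/code> as the raw\/ODS source and <code>dwd_user_events<\/code> as the cleaned target):<\/p>\n\n\n\n<pre><code class=\"language-sql\">-- Illustrative ODS-to-DWD hop: cleanse one day and overwrite that partition\n-- so the job stays idempotent on re-runs.\nINSERT OVERWRITE TABLE dwd_user_events PARTITION (dt='2026-04-11')\nSELECT user_id, event_name, event_ts, amount\nFROM events\nWHERE dt = '2026-04-11'\n  AND user_id IS NOT NULL;\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>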
Keep contracts clear at each layer.<\/li>\n<li><strong>Use project boundaries intentionally<\/strong>: separate prod and non-prod; consider domain-based projects for access isolation.<\/li>\n<li><strong>Keep data close<\/strong>: place MaxCompute in the same region as OSS\/DTS sources and serving tools to reduce transfer costs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">IAM\/security best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use <strong>RAM roles\/users<\/strong> with least privilege.<\/li>\n<li>Separate duties:<\/li>\n<li>Data developers (create\/modify tables, write jobs)<\/li>\n<li>Operators (manage scheduling and releases)<\/li>\n<li>Analysts (read curated marts only)<\/li>\n<li>Avoid long-lived AccessKeys on laptops; prefer controlled environments and short-lived credentials where possible.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Cost best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Partition by date and enforce partition filters in code review.<\/li>\n<li>Apply <strong>lifecycle\/retention policies<\/strong> to raw and temporary datasets.<\/li>\n<li>Build incremental pipelines; avoid full refresh where possible.<\/li>\n<li>Track \u201ctop expensive jobs\u201d and optimize them monthly.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Partition for pruning (date is typical).<\/li>\n<li>Avoid data skew:<\/li>\n<li>Watch out for joins on highly skewed keys<\/li>\n<li>Consider pre-aggregation or salting strategies (implementation depends on supported SQL features)<\/li>\n<li>Prefer column selection over <code>SELECT *<\/code> in large transformations.<\/li>\n<li>Use appropriate data types (avoid storing numbers as strings).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Reliability best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build idempotent jobs:<\/li>\n<li>Re-running a job for a partition should produce the 
same output.<\/li>\n<li>Use atomic partition overwrite patterns if supported in your workflow.<\/li>\n<li>Validate inputs (row counts, null rates) before publishing downstream.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operations best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Standardize naming:<\/li>\n<li>Projects: <code>company_domain_env<\/code> (e.g., <code>retail_ads_prod<\/code>)<\/li>\n<li>Tables: <code>layer_subject_entity<\/code> (e.g., <code>dwd_user_events<\/code>)<\/li>\n<li>Partitions: <code>dt=YYYY-MM-DD<\/code> and consistent timezone definition<\/li>\n<li>Maintain runbooks for:<\/li>\n<li>Job failures<\/li>\n<li>Backfills<\/li>\n<li>Schema changes<\/li>\n<li>Establish a release process for SQL changes (DataWorks commonly used here).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Governance\/tagging\/naming best practices<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use consistent ownership metadata (team, system, sensitivity).<\/li>\n<li>Track PII fields and apply masking\/tokenization at the appropriate layer.<\/li>\n<li>Maintain a data catalog (DataWorks governance features or another catalog tool).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">12. 
Security Considerations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Identity and access model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Alibaba Cloud RAM<\/strong> controls identity.<\/li>\n<li>MaxCompute permissions are enforced at the project and object levels (exact granularity depends on configuration and features; verify in official docs).<\/li>\n<li>Recommended patterns:<\/li>\n<li>Use groups\/roles rather than granting privileges to individual users.<\/li>\n<li>Restrict write access to curated layers.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Encryption<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>In transit<\/strong>: Access to service endpoints uses secure transport mechanisms (verify your client configuration and endpoint scheme; prefer HTTPS where supported).<\/li>\n<li><strong>At rest<\/strong>: Managed services typically encrypt storage; customer-managed keys may be available via KMS depending on service support and region. <strong>Verify MaxCompute encryption options in official docs<\/strong> for your region and compliance needs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Network exposure<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If using public endpoints, protect access with:<\/li>\n<li>Strong IAM<\/li>\n<li>IP allowlists where applicable (service capability varies)<\/li>\n<li>Controlled egress from corporate networks<\/li>\n<li>For sensitive environments, evaluate private connectivity options supported by Alibaba Cloud in your region (verify).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secrets handling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avoid embedding AccessKey secrets in code repositories.<\/li>\n<li>Use secret management practices:<\/li>\n<li>Store secrets in a secret manager (if used in your org)<\/li>\n<li>Rotate keys regularly<\/li>\n<li>Prefer role-based access for automation where possible<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Audit\/logging<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Use Alibaba Cloud account-level auditing features (where available in your account) for:<\/li>\n<li>RAM user changes<\/li>\n<li>AccessKey usage<\/li>\n<li>Resource changes<\/li>\n<li>Within MaxCompute:<\/li>\n<li>Retain job execution history and query logs as required (verify retention and export options).<\/li>\n<li>Implement alerting on suspicious patterns (e.g., unusual data exports).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compliance considerations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Classify data (PII, PCI, financial).<\/li>\n<li>Apply:<\/li>\n<li>Least privilege<\/li>\n<li>Masking\/tokenization in curated layers<\/li>\n<li>Retention\/lifecycle controls<\/li>\n<li>Confirm residency requirements by selecting appropriate regions and controlling cross-region replication.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common security mistakes<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Using the root account for daily work<\/li>\n<li>Sharing AccessKeys among users<\/li>\n<li>Granting broad \u201cadmin\u201d privileges for convenience<\/li>\n<li>Allowing analysts to read raw PII tables directly<\/li>\n<li>Exporting sensitive datasets to OSS buckets without strict bucket policies<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Secure deployment recommendations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Separate projects by environment (dev\/test\/prod).<\/li>\n<li>Keep raw ingestion in a restricted project; publish curated datasets to broader-read projects.<\/li>\n<li>Enforce review for:<\/li>\n<li>New external exports<\/li>\n<li>Cross-project sharing<\/li>\n<li>Schema changes to sensitive datasets<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">13. Limitations and Gotchas<\/h2>\n\n\n\n<blockquote>\n<p>Limits and behaviors can change by region and product updates. 
Validate against official MaxCompute docs for your environment.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Common limitations \/ constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Not an OLTP database<\/strong>: not designed for high-frequency row-level updates\/transactions.<\/li>\n<li><strong>SQL dialect differences<\/strong>: queries may require adaptation from ANSI SQL or other warehouses.<\/li>\n<li><strong>Interactive result limits<\/strong>: console\/CLI result sets can be limited; write outputs to tables for large results.<\/li>\n<li><strong>Concurrency and quotas<\/strong>: projects can have concurrency\/throughput quotas that impact peak times.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Performance gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Missing partition filters leads to large scans and higher cost.<\/li>\n<li>Data skew causes long runtimes; watch joins on skewed keys.<\/li>\n<li>Too many small partitions (or too fine-grained partitioning) increases overhead.<\/li>\n<li>Overuse of intermediate tables can inflate storage.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Operational gotchas<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Schema changes must be managed carefully; downstream jobs can break.<\/li>\n<li>Backfills can dominate costs if not controlled (do in batches, validate per range).<\/li>\n<li>Cross-project data access can become a governance problem without clear ownership.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regional constraints<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Some features\/integrations may be region-dependent.<\/li>\n<li>Endpoint formats differ by region; always use the official endpoint reference.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Pricing surprises<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Large backfills and full-table scans.<\/li>\n<li>Exporting data cross-region or out to the internet.<\/li>\n<li>Additional paid products 
used in the pipeline (DataWorks, DTS, BI).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Compatibility issues<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tools (IDE plugins, clients) may lag behind service capabilities; keep versions aligned with official recommendations.<\/li>\n<li>Some community connectors may not support all MaxCompute features; validate in staging.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Migration challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Porting SQL from other warehouses (function differences, partition semantics).<\/li>\n<li>Rebuilding governance patterns (roles, data catalog).<\/li>\n<li>Rewriting ingestion\/export workflows.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">14. Comparison with Alternatives<\/h2>\n\n\n\n<p>MaxCompute is best compared to:\n&#8211; Other Alibaba Cloud analytics stores and engines (serving OLAP, managed Hadoop\/Spark)\n&#8211; Other cloud data warehouses (BigQuery, Redshift, Synapse)\n&#8211; Open-source self-managed stacks (Hive\/Trino\/Spark on object storage)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Comparison table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Option<\/th>\n<th>Best For<\/th>\n<th>Strengths<\/th>\n<th>Weaknesses<\/th>\n<th>When to Choose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Alibaba Cloud MaxCompute<\/strong><\/td>\n<td>Large-scale offline data warehousing and batch analytics<\/td>\n<td>Fully managed; strong Alibaba Cloud ecosystem; project isolation; scalable batch SQL<\/td>\n<td>Not OLTP; interactive low-latency serving may require complement; SQL portability differences<\/td>\n<td>Choose for offline warehouse core and batch ETL\/analytics in Alibaba Cloud<\/td>\n<\/tr>\n<tr>\n<td><strong>Alibaba Cloud E-MapReduce (EMR)<\/strong><\/td>\n<td>Managed Hadoop\/Spark ecosystems, custom big data stacks<\/td>\n<td>Flexibility; open-source compatibility; cluster-level 
control<\/td>\n<td>More ops overhead than MaxCompute; capacity planning<\/td>\n<td>Choose when you need Spark\/Hadoop ecosystem control or custom frameworks<\/td>\n<\/tr>\n<tr>\n<td><strong>Alibaba Cloud Hologres<\/strong> (verify positioning in your region)<\/td>\n<td>Low-latency interactive analytics\/serving<\/td>\n<td>Fast interactive queries; serving workloads<\/td>\n<td>Different cost\/perf model; not a replacement for offline ETL<\/td>\n<td>Choose to serve curated data with low latency alongside MaxCompute<\/td>\n<\/tr>\n<tr>\n<td><strong>Alibaba Cloud AnalyticDB<\/strong> (MySQL\/PG variants)<\/td>\n<td>Managed MPP\/OLAP databases<\/td>\n<td>SQL OLAP patterns; serving and concurrency use cases<\/td>\n<td>Not the same as offline warehouse; ingestion and storage patterns differ<\/td>\n<td>Choose when you need an OLAP database experience and interactive workloads<\/td>\n<\/tr>\n<tr>\n<td><strong>Google BigQuery<\/strong><\/td>\n<td>Serverless analytics warehouse<\/td>\n<td>Strong serverless UX; broad ecosystem<\/td>\n<td>Different cloud; egress\/migration costs<\/td>\n<td>Choose if you\u2019re on GCP and want serverless warehouse<\/td>\n<\/tr>\n<tr>\n<td><strong>AWS Redshift \/ Athena<\/strong><\/td>\n<td>Warehouse (Redshift) and query-on-lake (Athena)<\/td>\n<td>Mature AWS ecosystem<\/td>\n<td>Ops\/cost tradeoffs vary; different governance model<\/td>\n<td>Choose if you\u2019re standardized on AWS<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure Synapse<\/strong><\/td>\n<td>Warehouse + data integration (Azure)<\/td>\n<td>Integrated Azure analytics suite<\/td>\n<td>Complexity; cost management<\/td>\n<td>Choose if you\u2019re standardized on Azure<\/td>\n<\/tr>\n<tr>\n<td><strong>Self-managed Hive\/Trino\/Spark on OSS\/S3<\/strong><\/td>\n<td>Full control, open-source portability<\/td>\n<td>Maximum flexibility; avoid vendor lock-in<\/td>\n<td>High ops burden; reliability and governance are on you<\/td>\n<td>Choose if you must self-host or need deep 
customization<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">15. Real-World Example<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Enterprise example: Retail group offline warehouse + governed marts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong><\/li>\n<li>Multiple business units ingest data from order systems, loyalty platform, and marketing events.<\/li>\n<li>Need consistent KPIs (revenue, conversion, retention) with strict access control and auditability.<\/li>\n<li><strong>Proposed architecture<\/strong><\/li>\n<li>DTS replicates core OLTP tables into a restricted <strong>raw\/ODS<\/strong> MaxCompute project.<\/li>\n<li>DataWorks orchestrates nightly transformations into a curated <strong>DWD\/DWS<\/strong> project.<\/li>\n<li>Curated marts are published to a <strong>BI<\/strong> project with read-only access for analysts.<\/li>\n<li>Sensitive attributes are masked\/tokenized before reaching BI layers.<\/li>\n<li><strong>Why MaxCompute was chosen<\/strong><\/li>\n<li>Strong batch warehousing fit, scalable SQL transformations, and project-based isolation.<\/li>\n<li>Integration with Alibaba Cloud ingestion and governance tooling.<\/li>\n<li><strong>Expected outcomes<\/strong><\/li>\n<li>Standardized KPIs across subsidiaries.<\/li>\n<li>Reduced time to produce monthly\/weekly reports.<\/li>\n<li>Better security posture via least privilege and controlled data publishing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup\/small-team example: Product analytics on event data<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Problem<\/strong><\/li>\n<li>Small team needs weekly product analytics (funnel, cohorts, conversion) without running clusters.<\/li>\n<li><strong>Proposed architecture<\/strong><\/li>\n<li>Events land in OSS daily (application export).<\/li>\n<li>A MaxCompute project stores curated event tables partitioned by 
<code>dt<\/code>.<\/li>\n<li>A simple scheduled pipeline (DataWorks or cron-triggered jobs using client tooling) builds weekly cohort tables.<\/li>\n<li>Quick BI dashboards read curated outputs.<\/li>\n<li><strong>Why MaxCompute was chosen<\/strong><\/li>\n<li>Managed batch SQL analytics with minimal operational overhead.<\/li>\n<li>Cost can be controlled by partitioning and lifecycle policies.<\/li>\n<li><strong>Expected outcomes<\/strong><\/li>\n<li>Reliable weekly metrics and cohort tables.<\/li>\n<li>Low operational burden for a small engineering team.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">16. FAQ<\/h2>\n\n\n\n<p>1) <strong>Is MaxCompute the same as ODPS?<\/strong><br\/>\nMaxCompute is the current product name. ODPS is the historical name and may appear in tools, endpoints, or legacy references. Use \u201cMaxCompute\u201d for current documentation and product discussions.<\/p>\n\n\n\n<p>2) <strong>Is MaxCompute a database?<\/strong><br\/>\nIt behaves like a data warehouse with SQL and tables, but it is designed primarily for <strong>batch analytics<\/strong>, not transactional OLTP workloads.<\/p>\n\n\n\n<p>3) <strong>Do I need DataWorks to use MaxCompute?<\/strong><br\/>\nNot strictly. You can run SQL via supported clients and consoles. DataWorks is commonly used for scheduling, orchestration, governance, and collaborative development.<\/p>\n\n\n\n<p>4) <strong>What\u2019s the most important design choice for performance?<\/strong><br\/>\nPartitioning strategy\u2014usually partition by date (<code>dt<\/code>)\u2014and consistently filtering partitions in queries.<\/p>\n\n\n\n<p>5) <strong>How do I load data into MaxCompute?<\/strong><br\/>\nCommon approaches include SQL inserts for small data, ingestion tools\/APIs (often referred to as Tunnel), and integrations via DataWorks, DTS, OSS, and SLS. 
Confirm the recommended method in official docs for your data type and volume.<\/p>\n\n\n\n<p>6) <strong>Can MaxCompute query data directly in OSS without loading it?<\/strong><br\/>\nMaxCompute supports integration patterns with OSS (for example external table-like approaches) in some configurations. Capabilities and best practices can vary\u2014verify in official docs for your region and file formats.<\/p>\n\n\n\n<p>7) <strong>How is MaxCompute billed?<\/strong><br\/>\nTypically through a combination of compute usage and storage, with additional costs for data transfer and integrated services. Exact billing dimensions vary by region and purchase model\u2014use the official pricing page.<\/p>\n\n\n\n<p>8) <strong>How do I control costs quickly?<\/strong><br\/>\nEnforce partition filters, implement lifecycle policies, and monitor top expensive jobs. Avoid large backfills without staged execution.<\/p>\n\n\n\n<p>9) <strong>Can I use MaxCompute for real-time analytics?<\/strong><br\/>\nMaxCompute is mainly for offline\/batch. For streaming ingestion and real-time compute, use a streaming engine (e.g., Realtime Compute for Apache Flink) and land results into serving stores or MaxCompute for batch consolidation.<\/p>\n\n\n\n<p>10) <strong>What are MaxCompute \u201cprojects\u201d?<\/strong><br\/>\nProjects are the primary isolation unit for data, permissions, quotas, and operations. Treat projects like \u201caccounts within the warehouse.\u201d<\/p>\n\n\n\n<p>11) <strong>How do I separate dev\/test\/prod?<\/strong><br\/>\nUse separate MaxCompute projects and separate orchestration\/workspaces. Avoid sharing write permissions from dev to prod.<\/p>\n\n\n\n<p>12) <strong>Is encryption supported?<\/strong><br\/>\nManaged services typically provide encryption in transit and at rest. Customer-managed keys may be available through KMS depending on region and configuration. 
Verify MaxCompute encryption options in official docs.<\/p>\n\n\n\n<p>13) <strong>How do I share data across teams?<\/strong><br\/>\nPreferred pattern is publishing curated datasets to a shared project with controlled read permissions, rather than granting broad access to raw tables.<\/p>\n\n\n\n<p>14) <strong>What\u2019s a common reason queries are slow or expensive?<\/strong><br\/>\nFull scans from missing partition predicates, and joins on skewed keys.<\/p>\n\n\n\n<p>15) <strong>Can I export query results for downstream systems?<\/strong><br\/>\nYes\u2014commonly by writing results to tables\/partitions and exporting via supported tools or by pushing curated datasets to OSS\/serving engines. Confirm the recommended export approach for your use case.<\/p>\n\n\n\n<p>16) <strong>Does MaxCompute support UDFs?<\/strong><br\/>\nMaxCompute supports extensibility via UDFs in many configurations, but supported runtimes and deployment mechanisms can vary. Verify in official docs.<\/p>\n\n\n\n<p>17) <strong>How do I monitor usage and troubleshoot failures?<\/strong><br\/>\nUse job history\/query logs in MaxCompute tooling and integrate with your organization\u2019s operational monitoring. Also track billing reports to detect cost anomalies.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">17. 
Top Online Resources to Learn MaxCompute<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Resource Type<\/th>\n<th>Name<\/th>\n<th>Why It Is Useful<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Official documentation<\/td>\n<td>MaxCompute Help Center<\/td>\n<td>Primary source for features, SQL reference, security, tools, and best practices: https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/<\/td>\n<\/tr>\n<tr>\n<td>Official product page<\/td>\n<td>MaxCompute Product Page<\/td>\n<td>Overview, positioning, and entry points to docs: https:\/\/www.alibabacloud.com\/product\/maxcompute<\/td>\n<\/tr>\n<tr>\n<td>Official getting started<\/td>\n<td>MaxCompute Getting Started (in docs)<\/td>\n<td>Step-by-step onboarding flows and first queries (navigate within docs hub): https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/<\/td>\n<\/tr>\n<tr>\n<td>Official pricing<\/td>\n<td>MaxCompute Pricing (region\/locale dependent)<\/td>\n<td>Confirm billing dimensions and current rates (start from product page and follow pricing links): https:\/\/www.alibabacloud.com\/product\/maxcompute<\/td>\n<\/tr>\n<tr>\n<td>Official architecture resources<\/td>\n<td>Alibaba Cloud Architecture Center<\/td>\n<td>Reference architectures and patterns (search for MaxCompute\/analytics): https:\/\/www.alibabacloud.com\/architecture<\/td>\n<\/tr>\n<tr>\n<td>Official tutorials<\/td>\n<td>Alibaba Cloud tutorials (varies)<\/td>\n<td>Practical walkthroughs across Alibaba Cloud ecosystem: https:\/\/www.alibabacloud.com\/getting-started<\/td>\n<\/tr>\n<tr>\n<td>Tooling documentation<\/td>\n<td>MaxCompute client \/ odpscmd docs<\/td>\n<td>Installation and usage for CLI-based workflows (within docs hub): https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/<\/td>\n<\/tr>\n<tr>\n<td>Ecosystem integration<\/td>\n<td>DataWorks documentation<\/td>\n<td>MaxCompute is frequently used with DataWorks for orchestration\/governance: 
https:\/\/www.alibabacloud.com\/help\/en\/dataworks\/<\/td>\n<\/tr>\n<tr>\n<td>Community learning<\/td>\n<td>Alibaba Cloud community blog<\/td>\n<td>Practical posts and examples; validate against official docs: https:\/\/www.alibabacloud.com\/blog<\/td>\n<\/tr>\n<tr>\n<td>Code samples<\/td>\n<td>GitHub (official Alibaba Cloud orgs)<\/td>\n<td>Look for MaxCompute\/DataWorks\/DTS examples; verify repository authenticity and recency: https:\/\/github.com\/alibabacloud<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">18. Training and Certification Providers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Institute<\/th>\n<th>Suitable Audience<\/th>\n<th>Likely Learning Focus<\/th>\n<th>Mode<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Engineers, DevOps, platform teams, cloud learners<\/td>\n<td>Cloud + DevOps practices; may include data platform operations (verify course catalog)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>ScmGalaxy.com<\/td>\n<td>Beginners to intermediate IT professionals<\/td>\n<td>SCM\/DevOps and tooling foundations; may offer cloud-adjacent training (verify)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.scmgalaxy.com\/<\/td>\n<\/tr>\n<tr>\n<td>CloudOpsNow.in<\/td>\n<td>Cloud operations learners<\/td>\n<td>Cloud operations and reliability practices (verify MaxCompute-specific coverage)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.cloudopsnow.in\/<\/td>\n<\/tr>\n<tr>\n<td>SreSchool.com<\/td>\n<td>SREs, ops engineers, reliability-focused teams<\/td>\n<td>SRE practices, monitoring, incident response applied to cloud systems<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.sreschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>AiOpsSchool.com<\/td>\n<td>Ops\/DevOps teams exploring automation<\/td>\n<td>AIOps concepts, automation, operations analytics 
(verify course scope)<\/td>\n<td>Check website<\/td>\n<td>https:\/\/www.aiopsschool.com\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">19. Top Trainers<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Platform\/Site<\/th>\n<th>Likely Specialization<\/th>\n<th>Suitable Audience<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>RajeshKumar.xyz<\/td>\n<td>Cloud\/DevOps training content (verify specific offerings)<\/td>\n<td>Learners seeking instructor-led guidance<\/td>\n<td>https:\/\/rajeshkumar.xyz\/<\/td>\n<\/tr>\n<tr>\n<td>devopstrainer.in<\/td>\n<td>DevOps training and mentorship (verify MaxCompute coverage)<\/td>\n<td>DevOps engineers and cloud practitioners<\/td>\n<td>https:\/\/devopstrainer.in\/<\/td>\n<\/tr>\n<tr>\n<td>devopsfreelancer.com<\/td>\n<td>Freelance DevOps consulting\/training platform (verify offerings)<\/td>\n<td>Teams needing short-term training\/support<\/td>\n<td>https:\/\/devopsfreelancer.com\/<\/td>\n<\/tr>\n<tr>\n<td>devopssupport.in<\/td>\n<td>DevOps support and learning resources (verify services)<\/td>\n<td>Ops\/DevOps teams needing practical support<\/td>\n<td>https:\/\/devopssupport.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">20. 
Top Consulting Companies<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Company<\/th>\n<th>Likely Service Area<\/th>\n<th>Where They May Help<\/th>\n<th>Consulting Use Case Examples<\/th>\n<th>Website URL<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>cotocus.com<\/td>\n<td>Cloud\/DevOps consulting (verify portfolio)<\/td>\n<td>Architecture, platform engineering, operations enablement<\/td>\n<td>Standing up CI\/CD and infrastructure automation around data platforms; operational runbooks<\/td>\n<td>https:\/\/cotocus.com\/<\/td>\n<\/tr>\n<tr>\n<td>DevOpsSchool.com<\/td>\n<td>Training + consulting (verify offerings)<\/td>\n<td>Upskilling teams and implementing DevOps\/cloud practices<\/td>\n<td>Designing operational practices for analytics platforms; security\/IAM workshops<\/td>\n<td>https:\/\/www.devopsschool.com\/<\/td>\n<\/tr>\n<tr>\n<td>DEVOPSCONSULTING.IN<\/td>\n<td>DevOps consulting services (verify offerings)<\/td>\n<td>DevOps transformations, automation, and support<\/td>\n<td>Automating deployments, monitoring integrations, cost governance processes<\/td>\n<td>https:\/\/www.devopsconsulting.in\/<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">21. 
Career and Learning Roadmap<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn before MaxCompute<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>SQL fundamentals (joins, aggregation, window functions conceptually)<\/li>\n<li>Data warehousing basics:<\/li>\n<li>Fact\/dimension modeling<\/li>\n<li>Partitioning concepts<\/li>\n<li>ETL vs ELT<\/li>\n<li>Alibaba Cloud fundamentals:<\/li>\n<li>RAM (users, roles, policies)<\/li>\n<li>Regions\/VPC basics<\/li>\n<li>OSS basics<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What to learn after MaxCompute<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DataWorks (recommended next step for real production pipelines)<\/li>\n<li>Data governance practices:<\/li>\n<li>Data cataloging, lineage, data quality<\/li>\n<li>Serving\/BI layer design:<\/li>\n<li>Quick BI connectivity patterns<\/li>\n<li>When to use Hologres\/AnalyticDB for interactive workloads<\/li>\n<li>Streaming analytics:<\/li>\n<li>Realtime Compute for Apache Flink (streaming transforms)<\/li>\n<li>Security specialization:<\/li>\n<li>KMS, key rotation, audit trails, least privilege enforcement<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Job roles that use MaxCompute<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data Engineer<\/li>\n<li>Analytics Engineer<\/li>\n<li>BI Engineer<\/li>\n<li>Cloud\/Data Platform Engineer<\/li>\n<li>Solutions Architect (data\/analytics)<\/li>\n<li>Security Engineer (data governance)<\/li>\n<li>SRE\/Operations (platform reliability and cost governance)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certification path (if available)<\/h3>\n\n\n\n<p>Alibaba Cloud certification programs evolve. 
Check current Alibaba Cloud certification listings and whether MaxCompute is explicitly included:\n&#8211; https:\/\/www.alibabacloud.com\/certification<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Project ideas for practice<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build a mini-warehouse:<\/li>\n<li><code>events_raw<\/code> \u2192 <code>events_clean<\/code> \u2192 <code>daily_metrics<\/code><\/li>\n<li>Implement retention:<\/li>\n<li>Drop partitions older than N days (test safely)<\/li>\n<li>Cost\/performance exercise:<\/li>\n<li>Compare query runtime and scanned data with\/without partition filters<\/li>\n<li>Governance mini-project:<\/li>\n<li>Separate projects for dev\/prod and publish curated tables to a read-only project<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">22. Glossary<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Alibaba Cloud<\/strong>: Cloud provider offering MaxCompute and related analytics services.<\/li>\n<li><strong>Analytics Computing<\/strong>: Service category focused on large-scale data processing and analytics.<\/li>\n<li><strong>MaxCompute<\/strong>: Managed batch analytics and data warehousing service on Alibaba Cloud.<\/li>\n<li><strong>ODPS<\/strong>: Historical name (\u201cOpen Data Processing Service\u201d) for MaxCompute; may appear in legacy tooling.<\/li>\n<li><strong>Project (MaxCompute Project)<\/strong>: Isolation boundary for data, permissions, quotas, and operations.<\/li>\n<li><strong>Table<\/strong>: Structured dataset with schema stored in MaxCompute.<\/li>\n<li><strong>Partition<\/strong>: Subdivision of a table (commonly by date) used for performance and manageability.<\/li>\n<li><strong>Partition pruning<\/strong>: Optimization where queries scan only needed partitions based on filters.<\/li>\n<li><strong>ETL\/ELT<\/strong>: Extract-Transform-Load \/ Extract-Load-Transform; common pipeline patterns.<\/li>\n<li><strong>RAM<\/strong>: Resource Access 
Management; Alibaba Cloud identity and access management service.<\/li>\n<li><strong>AccessKey<\/strong>: Long-lived credential pair for programmatic access (handle carefully).<\/li>\n<li><strong>STS<\/strong>: Security Token Service; commonly used for short-lived credentials (verify usage patterns for your tools).<\/li>\n<li><strong>OSS<\/strong>: Object Storage Service; used for file storage, staging, and data lake patterns.<\/li>\n<li><strong>DTS<\/strong>: Data Transmission Service; used for replicating\/migrating data into analytics stores.<\/li>\n<li><strong>SLS<\/strong>: Log Service; used for log collection and analytics pipelines.<\/li>\n<li><strong>DataWorks<\/strong>: Alibaba Cloud data development and governance platform often used to orchestrate MaxCompute jobs.<\/li>\n<li><strong>UDF<\/strong>: User-defined function; custom function callable from SQL (availability and runtimes vary).<\/li>\n<li><strong>Lifecycle\/Retention policy<\/strong>: Rules to expire\/delete old data to control cost and meet compliance.<\/li>\n<li><strong>CU (Compute Unit)<\/strong>: A unit used in some Alibaba Cloud analytics billing models (verify MaxCompute\u2019s current compute billing units for your region).<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">23. Summary<\/h2>\n\n\n\n<p>MaxCompute is Alibaba Cloud\u2019s managed <strong>Analytics Computing<\/strong> service for large-scale <strong>offline data warehousing and batch analytics<\/strong>. It provides project-based isolation, managed table storage, and scalable SQL execution that fits well at the center of an Alibaba Cloud analytics ecosystem.<\/p>\n\n\n\n<p>It matters because it lets teams build reliable, governed batch pipelines and warehouse models without operating clusters\u2014while still scaling to large datasets. 
The key cost and performance levers are <strong>partitioning<\/strong>, <strong>incremental processing<\/strong>, <strong>lifecycle policies<\/strong>, and <strong>monitoring expensive jobs<\/strong>. The key security levers are <strong>least-privilege RAM access<\/strong>, controlled project boundaries, careful handling of credentials, and governed data publishing from raw to curated layers.<\/p>\n\n\n\n<p>Use MaxCompute when you need an offline warehouse core and batch compute at scale in Alibaba Cloud. Complement it (rather than replace it) with streaming and low-latency serving engines when your use case requires real-time or interactive performance.<\/p>\n\n\n\n<p>Next step: read the official MaxCompute docs for your region, then learn DataWorks orchestration patterns to move from ad-hoc SQL into production-grade pipelines: https:\/\/www.alibabacloud.com\/help\/en\/maxcompute\/<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Analytics Computing<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,4],"tags":[],"class_list":["post-81","post","type-post","status-publish","format-standard","hentry","category-alibaba-cloud","category-analytics-computing"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/81","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/comments?post=81"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/posts\/81\/revisions"}],"wp:attachment":[{"hr
ef":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/media?parent=81"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/categories?post=81"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/tutorials\/wp-json\/wp\/v2\/tags?post=81"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}