Category
AI + Machine Learning
1. Introduction
What this service is
Microsoft Planetary Computer Pro is an Azure-aligned offering designed to help organizations operationalize geospatial and Earth observation (EO) analytics by standing up a “planetary computer” capability for their own data: a searchable catalog (commonly STAC-based), governed access, and workflows that make large geospatial datasets easier to discover, query, and use for analytics and AI.
Simple explanation (one paragraph)
If your team works with satellite imagery, climate rasters, vector layers, or derived geospatial products, Microsoft Planetary Computer Pro helps you turn scattered files and metadata into a consistent, queryable catalog and access pattern—so developers, analysts, and ML engineers can find the right data quickly and use it in notebooks, pipelines, and applications without reinventing data discovery and geospatial plumbing each time.
Technical explanation (one paragraph)
At a technical level, Microsoft Planetary Computer Pro centers around a data catalog pattern commonly implemented with STAC (SpatioTemporal Asset Catalog) concepts—Collections and Items—paired with hosted endpoints and Azure integrations so you can index geospatial metadata, reference assets stored in Azure (for example, Cloud Optimized GeoTIFFs in Azure Storage), and support repeatable access from tools like pystac-client, GDAL/rasterio, Spark, or Azure ML. The exact deployment model and included components can vary by release; verify the current architecture in the official docs.
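As a concrete sketch of that access pattern, the snippet below builds a STAC API `/search` request body (bbox plus datetime interval) and shows where a client would POST it. The endpoint URL is a placeholder assumption, not a real service address; in practice a tool like pystac-client handles this for you.

```python
import json
import urllib.request

# Hypothetical catalog endpoint -- replace with your deployment's URL.
CATALOG_URL = "https://<your-mpcp-endpoint>/search"

def build_search_payload(bbox, start, end, collections=None, limit=100):
    """Build a STAC API /search request body for a bbox and time range."""
    payload = {
        "bbox": list(bbox),            # [west, south, east, north]
        "datetime": f"{start}/{end}",  # ISO 8601 interval
        "limit": limit,
    }
    if collections:
        payload["collections"] = list(collections)
    return payload

def search(payload):
    """POST the search payload (requires a reachable, authenticated endpoint)."""
    req = urllib.request.Request(
        CATALOG_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call; not run here
        return json.load(resp)

if __name__ == "__main__":
    print(build_search_payload((-122.6, 47.4, -122.2, 47.8),
                               "2024-06-01T00:00:00Z", "2024-06-30T23:59:59Z",
                               collections=["mpcp-lab-collection"]))
```

The payload shape follows the STAC API Item Search specification; authentication headers would be added per your deployment's requirements.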
What problem it solves
Teams building geospatial AI + Machine Learning solutions often hit the same obstacles:
- Data is massive (TB–PB), multi-format, and hard to move.
- Metadata is inconsistent and stored separately from assets.
- Discoverability is poor (“where is the raster for this place/date?”).
- Access and governance are fragmented (SAS tokens in spreadsheets, ad-hoc copies, unmanaged sharing).
- Every project rebuilds the same ingestion, indexing, and access patterns.
Microsoft Planetary Computer Pro is aimed at standardizing those patterns on Azure so teams can focus on analytics and models rather than catalog mechanics.
Naming/status note: Microsoft has long operated the public-facing Microsoft Planetary Computer experience. Microsoft Planetary Computer Pro refers to the “for your organization/for your data” capability. Product status (GA/preview), exact resource type name, and availability can change—verify in the official Microsoft documentation and/or Azure Marketplace listing for the most current details.
2. What is Microsoft Planetary Computer Pro?
Official purpose
Microsoft Planetary Computer Pro is intended to help organizations manage, catalog, and serve geospatial/EO data at scale on Azure using standardized geospatial catalog concepts (commonly STAC) and Azure-native security and operations.
Core capabilities
Commonly documented and expected capabilities for a service of this kind include:
- Cataloging geospatial assets (rasters/vectors) with searchable metadata (space/time attributes, cloud cover, sensors, processing level, etc.).
- Standards-based discovery for applications and notebooks (for example, STAC API patterns).
- Governed access aligned with Azure identity and authorization.
- Integration with Azure storage and compute so analytics can run close to the data.
- Operational controls (monitoring, logging, scaling) consistent with Azure practices.
Because component details can evolve, confirm the “what’s included” list in the current official docs for your tenant and region.
Major components (conceptual)
Even when underlying implementations vary, you can think of Microsoft Planetary Computer Pro as consisting of:
- A metadata catalog: stores and serves metadata about datasets (Collections) and individual assets (Items).
- A data layer: where the actual geospatial assets live—often Azure Storage (Blob Storage / ADLS Gen2). Assets may be optimized formats like COG (Cloud Optimized GeoTIFF) and Zarr.
- APIs and access patterns: endpoints that allow searching the catalog and retrieving asset references for downstream processing.
- Identity, governance, and operations: authentication/authorization model, audit logs, monitoring, and lifecycle management.
Service type
Microsoft Planetary Computer Pro should be treated as a managed geospatial catalog + access capability aligned to Azure. Whether it is delivered as:
- a fully managed Azure service,
- a managed application deployed into your subscription, or
- a Marketplace offer that creates Azure resources in your tenant
…depends on the current release. Verify in official docs for the exact delivery model and responsibilities (what Microsoft manages vs what you manage).
Scope (subscription/region)
In practice, the scope and availability typically depend on:
- Azure subscription/tenant where you create the resource(s)
- Region availability for supporting services
- Networking requirements (public endpoint vs private endpoint support)
Treat this as subscription-scoped in terms of billing and governance, and region-dependent in terms of deployment. Verify region support in official documentation.
How it fits into the Azure ecosystem
Microsoft Planetary Computer Pro is commonly used as a data discovery and access layer in Azure geospatial and AI + Machine Learning architectures, integrating with:
- Azure Storage (Blob Storage / ADLS Gen2) for assets
- Azure Kubernetes Service (AKS) / App Service / other compute for API hosting (depending on the offering)
- Azure Machine Learning, Azure Databricks, or Synapse for analytics/model training
- Microsoft Entra ID (Azure AD) for identity
- Azure Monitor and Log Analytics for operations
3. Why use Microsoft Planetary Computer Pro?
Business reasons
- Faster time-to-insight: Less time hunting data; more time analyzing.
- Reuse across teams: A centralized catalog avoids duplicate data onboarding.
- Better data governance: Control who can discover and read datasets.
- Reduced project risk: Standard patterns reduce custom one-off solutions.
Technical reasons
- Standards-based discovery: STAC-like patterns are broadly supported by open tooling (pystac-client, STAC Browser, GDAL workflows).
- Works with cloud-optimized formats: When paired with COG/Zarr, enables efficient range requests and partial reads.
- Decouples metadata from compute: Any compliant client can query the catalog; compute can be swapped (Azure ML, Databricks, containers).
Operational reasons
- Repeatable onboarding: Establish a consistent way to publish new collections.
- Observability: Integrate logs/metrics with Azure Monitor patterns (depending on offering).
- Lifecycle management: Manage datasets and metadata versions more systematically.
Security/compliance reasons
- Identity-aligned access: Prefer Entra ID-based auth and role-based authorization.
- Auditability: Centralized access patterns are easier to audit than ad-hoc file sharing.
- Network controls: In many Azure architectures you can use private endpoints, VNet integration, and restrict public access (verify support in the current offering).
Scalability/performance reasons
- Search at scale: Index metadata for fast spatial/temporal queries rather than scanning folders.
- Efficient data access: Optimized formats reduce bandwidth and speed up analytics.
When teams should choose it
Choose Microsoft Planetary Computer Pro when:
- You have multiple geospatial datasets that must be discoverable by many users/systems.
- You want to build geospatial AI + Machine Learning pipelines on Azure with governed access.
- You need standardized metadata, consistent publishing, and reliable search.
- You want to avoid building and operating your own STAC API and catalog stack from scratch.
When teams should not choose it
Consider alternatives when:
- You only have a single small dataset and can manage with basic storage + documentation.
- Your architecture requires features the offering does not support (for example, strict air-gapped deployment, specialized geospatial tiling, or a nonstandard catalog schema).
- You already have an enterprise geospatial platform (for example, a mature Esri/GeoServer stack) and Planetary Computer Pro would duplicate capabilities.
- You cannot meet data residency, compliance, or networking constraints with the current offering (verify constraints with official docs).
4. Where is Microsoft Planetary Computer Pro used?
Industries
- Environmental monitoring and climate risk
- Agriculture and food supply
- Energy (renewables siting, grid resilience, vegetation management)
- Insurance and catastrophe modeling
- Public sector and smart cities
- Mining and natural resources
- Forestry and land management
- Water resources and hydrology
- Transportation and infrastructure planning
- Research institutions and NGOs
Team types
- Data engineering teams publishing curated geospatial layers
- Data science / ML teams training segmentation/regression models
- GIS teams integrating with enterprise spatial workflows
- Platform engineering teams standardizing data access
- Security and compliance teams governing access to sensitive location data
Workloads
- Dataset onboarding and metadata standardization
- Spatial/temporal search (AOI/ROI + date range)
- Batch analytics and feature extraction
- Model training (e.g., land cover classification)
- Data product publishing (internal “data marketplace” for geospatial)
- Application backends serving map tiles or analytics outputs (often alongside a separate tiling service)
Architectures
- Lakehouse architectures with geospatial layers in ADLS Gen2
- ML pipelines (Azure ML) that query a catalog for training data
- Event-driven ingestion (new imagery triggers indexing and QC)
- Multi-tenant data platforms with RBAC and per-project datasets
Real-world deployment contexts
- Production: Central catalog for multiple datasets, strict RBAC, private networking, automated ingestion, monitored APIs.
- Dev/test: Sandbox catalog with a subset of datasets for experimentation, lower-cost compute, and relaxed retention policies.
5. Top Use Cases and Scenarios
Below are realistic scenarios where Microsoft Planetary Computer Pro typically fits. Each includes the problem, why this service fits, and a short scenario.
1) Enterprise geospatial data catalog for internal reuse
- Problem: Datasets are stored in multiple storage accounts and shared via tribal knowledge.
- Why this fits: A centralized catalog makes datasets searchable and consistently documented.
- Example: A utilities company publishes vegetation indices, LiDAR footprints, and outage polygons as Collections so analytics teams can self-serve.
2) Satellite imagery discovery for ML training pipelines
- Problem: Model training needs repeatable selection of imagery by AOI/time/cloud cover.
- Why this fits: A queryable catalog standardizes how training data is discovered.
- Example: A flood segmentation model queries Items for a basin and month, then pipelines download only required windows.
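A minimal sketch of that deterministic selection step, filtering STAC-style item dicts by cloud cover and acquisition window. The `eo:cloud_cover` property comes from the standard STAC EO extension; the sample items here are illustrative, not real catalog records.

```python
from datetime import datetime, timezone

def select_training_items(items, max_cloud_cover, start, end):
    """Filter STAC-style item dicts by cloud cover and acquisition time.

    Items missing `eo:cloud_cover` or `datetime` are excluded, which is
    the conservative choice for training data.
    """
    selected = []
    for item in items:
        props = item.get("properties", {})
        cloud = props.get("eo:cloud_cover")
        when = props.get("datetime")
        if cloud is None or when is None:
            continue
        ts = datetime.fromisoformat(when.replace("Z", "+00:00"))
        if cloud <= max_cloud_cover and start <= ts <= end:
            selected.append(item["id"])
    return sorted(selected)  # sorted for repeatable pipeline runs

items = [
    {"id": "scene-a", "properties": {"datetime": "2024-06-03T10:00:00Z", "eo:cloud_cover": 5.0}},
    {"id": "scene-b", "properties": {"datetime": "2024-06-10T10:00:00Z", "eo:cloud_cover": 62.0}},
    {"id": "scene-c", "properties": {"datetime": "2024-07-01T10:00:00Z", "eo:cloud_cover": 2.0}},
]
june = (datetime(2024, 6, 1, tzinfo=timezone.utc), datetime(2024, 6, 30, tzinfo=timezone.utc))
print(select_training_items(items, 10.0, *june))  # ['scene-a']
```

In a real pipeline the server would do most of this filtering via search parameters; client-side filtering like this is a fallback for properties the API does not index.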
3) Climate risk analytics across portfolios
- Problem: Analysts need consistent layers (temperature anomalies, drought indices) across time.
- Why this fits: Cataloging ensures consistent versions and metadata lineage.
- Example: An insurer catalogs derived rasters by year/version; actuaries can reproduce reports.
4) Onboarding third-party geospatial datasets with governance
- Problem: Third-party datasets have license restrictions; access must be controlled.
- Why this fits: Central access layer can enforce authorization and auditing.
- Example: A team catalogs licensed imagery footprints while restricting asset access to approved groups.
5) Cross-team standardization on STAC for interoperability
- Problem: Different teams use different metadata schemas and tooling.
- Why this fits: STAC patterns allow broad tool compatibility.
- Example: GIS publishes Items compatible with pystac-client; ML uses the same catalog in notebooks.
6) Data productization for internal “geospatial marketplace”
- Problem: Users don’t know what datasets exist or how to use them.
- Why this fits: Catalog + descriptions + examples become a discoverability layer.
- Example: A city platform team provides curated Collections for zoning, tree canopy, and impervious surfaces.
7) Near-real-time ingestion and indexing of new scenes
- Problem: New imagery arrives daily; manual indexing is slow and error-prone.
- Why this fits: Standard ingestion/indexing pipeline can be automated.
- Example: A wildfire monitoring team indexes daily scenes and triggers alerts when thresholds are exceeded.
8) Auditable access to sensitive location datasets
- Problem: Some datasets reveal sensitive infrastructure locations.
- Why this fits: Align catalog and asset access to Entra ID groups, log access.
- Example: A government agency restricts high-resolution layers to cleared personnel.
9) Data lineage and versioning for derived products
- Problem: Derived layers change; users need to know which version was used.
- Why this fits: Metadata can record processing version, source datasets, and timestamps.
- Example: A deforestation index is reprocessed quarterly; each version is a new Collection with clear lineage.
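The lineage pattern above can be sketched as a small helper that stamps version metadata onto item records. The `processing:version` and `processing:software` names follow the STAC processing extension, and `derived_from` is the standard link relation for lineage; treat the exact fields your catalog indexes as something to confirm in its docs.

```python
def versioned_collection_id(base_id, version):
    """Derive a collection id that encodes the processing version."""
    return f"{base_id}-v{version}"

def add_lineage(item, version, source_hrefs, software="deforestation-pipeline"):
    """Annotate a STAC-style item dict with version and source lineage."""
    item = dict(item)
    props = dict(item.get("properties", {}))
    props["processing:version"] = version
    props["processing:software"] = {software: version}
    item["properties"] = props
    links = list(item.get("links", []))
    # One `derived_from` link per source dataset used to build this product.
    links += [{"rel": "derived_from", "href": h} for h in source_hrefs]
    item["links"] = links
    return item

item = add_lineage({"id": "defor-2024-q2", "properties": {}, "links": []},
                   "2024.2", ["https://example/source-item.json"])
print(versioned_collection_id("deforestation-index", "2024.2"))
print(item["properties"]["processing:version"])
```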
10) Hybrid analytics: query in one place, compute anywhere in Azure
- Problem: Teams use different compute platforms (Databricks vs Azure ML).
- Why this fits: A common discovery layer decouples compute choice.
- Example: Data engineering curates Collections; DS uses Azure ML, BI uses Synapse—both pull from the same catalog.
11) Standardizing ROI-based subsetting and efficient reads
- Problem: Analysts download entire rasters because they don’t have range-read-friendly formats.
- Why this fits: With cloud-optimized formats and consistent access, clients read only needed windows.
- Example: A hydrology team reads only watershed polygons’ intersecting windows from COGs.
12) Multi-tenant geospatial platform in a regulated enterprise
- Problem: Different business units require separation and governance.
- Why this fits: Organize Collections per BU/project, enforce RBAC, and centralize operations.
- Example: A global company creates per-region Collections with group-based access and consistent naming/tagging.
6. Core Features
Note: Feature availability can differ by release, region, or whether the offering is preview/GA. Verify the current Microsoft Planetary Computer Pro documentation for the authoritative feature list.
1) Geospatial catalog with STAC concepts (Collections and Items)
- What it does: Provides a structured metadata model for datasets and individual assets.
- Why it matters: Enables consistent discovery across teams and tooling.
- Practical benefit: Analysts can query by time range, bounding box, and properties instead of browsing folders.
- Caveats: Your metadata quality determines search quality; you need strong publishing standards.
2) Search APIs for spatial/temporal discovery
- What it does: Exposes endpoints to search Items by AOI/ROI, date range, and attributes (for example, cloud cover).
- Why it matters: Search is the core of scalable geospatial analytics.
- Practical benefit: ML pipelines become deterministic and repeatable.
- Caveats: Complex queries and large result sets need pagination and careful client design.
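STAC APIs typically page large result sets via `next` links in each response. A sketch of a client-side pager, with the page-fetching function injected so the paging logic stays testable (pystac-client does this for you via its item iterators; the fake pages below are illustrative):

```python
def iter_items(fetch_page, first_url):
    """Yield items across pages, following STAC-style `next` links.

    `fetch_page` is any callable that takes a URL and returns a parsed
    FeatureCollection dict -- in real use it would perform an
    authenticated HTTP GET.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("features", [])
        # Follow the `next` link if the server provided one; stop otherwise.
        url = next((l["href"] for l in page.get("links", [])
                    if l.get("rel") == "next"), None)

# Fake two-page response to demonstrate the loop.
PAGES = {
    "p1": {"features": [{"id": "i1"}, {"id": "i2"}],
           "links": [{"rel": "next", "href": "p2"}]},
    "p2": {"features": [{"id": "i3"}], "links": []},
}
print([f["id"] for f in iter_items(PAGES.__getitem__, "p1")])  # ['i1', 'i2', 'i3']
```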
3) Integration with Azure-hosted assets (e.g., Azure Storage)
- What it does: References asset URLs stored in Azure, enabling “analyze in place”.
- Why it matters: Moving TB-scale data is expensive and slow.
- Practical benefit: Read only needed portions of COG/Zarr with range requests.
- Caveats: Performance depends on format, layout, and networking path.
4) Identity and access alignment with Azure (Entra ID)
- What it does: Uses Azure identity patterns for user/service authentication (implementation varies).
- Why it matters: Central governance and least privilege are easier with Entra ID.
- Practical benefit: You can map catalog access to groups and roles.
- Caveats: Ensure service-to-service auth (managed identity/service principal) is supported for your use case.
5) Operationalization: monitoring, logging, and governance hooks
- What it does: Enables operational visibility into API usage, errors, and performance (exact signals vary).
- Why it matters: Catalog becomes a production dependency for multiple teams.
- Practical benefit: Faster incident response and better capacity planning.
- Caveats: You must decide SLOs, alert thresholds, and retention policies.
6) Support for cloud-optimized geospatial formats (pattern-level)
- What it does: Encourages formats like COG and Zarr that support efficient HTTP range reads.
- Why it matters: Greatly reduces data transfer and speeds up analysis.
- Practical benefit: Many workloads can stream subsets rather than download whole files.
- Caveats: Converting to COG/Zarr is a pipeline responsibility; not automatic.
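Windowed reads work because the client maps an AOI to pixel offsets and then issues byte-range requests for only those tiles. The arithmetic for a simple north-up raster is sketched below; with rasterio you would normally use `rasterio.windows.from_bounds` rather than hand-rolling this, so treat the pure-Python version as illustrative.

```python
import math

def bbox_to_window(bbox, origin, pixel_size, width, height):
    """Map a geographic bbox to a pixel window for a north-up raster.

    origin = (x of left edge, y of top edge); pixel_size = (xres, yres),
    with yres given as a positive number even though rows run top-down.
    Returns (col_off, row_off, win_width, win_height), clamped to the raster.
    """
    west, south, east, north = bbox
    ox, oy = origin
    xres, yres = pixel_size
    col_off = max(0, math.floor((west - ox) / xres))
    row_off = max(0, math.floor((oy - north) / yres))
    col_end = min(width, math.ceil((east - ox) / xres))
    row_end = min(height, math.ceil((oy - south) / yres))
    return (col_off, row_off, col_end - col_off, row_end - row_off)

# 200x200 raster covering x:[100, 200], y:[0, 100] at 0.5 units/pixel.
print(bbox_to_window((150.0, 40.0, 160.0, 50.0),
                     origin=(100.0, 100.0), pixel_size=(0.5, 0.5),
                     width=200, height=200))  # (100, 100, 20, 20)
```

Reading only that window instead of the whole file is what turns a multi-GB download into a few range requests.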
7) Dataset publishing workflows (ingestion/indexing)
- What it does: Provides a mechanism to register Collections/Items and their assets into the catalog.
- Why it matters: Publishing must be repeatable and auditable.
- Practical benefit: New datasets can be onboarded with consistent checks and metadata rules.
- Caveats: The exact ingestion tooling (portal/API/CLI) must be confirmed in official docs.
8) Ecosystem compatibility (STAC tooling)
- What it does: Works with common STAC clients and geospatial Python tools.
- Why it matters: Reduces lock-in and speeds development.
- Practical benefit: Use pystac-client, pystac, rasterio, stackstac, etc., with minimal glue.
- Caveats: Client compatibility depends on the exact STAC API version and extensions supported.
7. Architecture and How It Works
High-level architecture
At a high level, Microsoft Planetary Computer Pro typically sits between your data lake (assets) and your consumers (apps, notebooks, pipelines):
- Assets land in Azure Storage (or other supported storage).
- Metadata is produced (STAC Collections/Items) by ingestion pipelines.
- Catalog indexes metadata and exposes a search endpoint.
- Consumers query the catalog for Items matching AOI/time/attributes.
- Consumers access assets (directly via URLs, possibly with controlled access) and run analytics in Azure compute.
Request/data/control flow (conceptual)
- Control plane: Provisioning, configuration, RBAC/roles, network settings.
- Data plane:
- Publish metadata (Collections/Items).
- Query/search metadata.
- Retrieve asset URLs and read data.
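The data-plane steps above can be strung together in a small orchestration function. The search and read functions are injected, so the flow is independent of the concrete catalog client or storage SDK; the stubs below are stand-ins, not real APIs.

```python
def data_plane_flow(search, read_asset, bbox, datetime_range, asset_key="data"):
    """Conceptual consumer flow: search the catalog, then read matched assets.

    `search` returns STAC-style item dicts for the query; `read_asset`
    takes an asset href and returns data. In a real pipeline these would
    wrap pystac-client and a COG reader such as rasterio.
    """
    results = {}
    for item in search(bbox=bbox, datetime=datetime_range):
        href = item["assets"][asset_key]["href"]
        results[item["id"]] = read_asset(href)
    return results

# Stubs standing in for a real catalog and storage access.
def fake_search(bbox, datetime):
    return [{"id": "item-1", "assets": {"data": {"href": "https://example/a.tif"}}}]

def fake_read(href):
    return f"bytes-from:{href}"

print(data_plane_flow(fake_search, fake_read, (0, 0, 1, 1), "2024-06"))
```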
Integrations with related Azure services
Common integration patterns include:
- Azure Storage: durable asset storage; lifecycle policies; tiering.
- Azure Machine Learning: training pipelines that query catalog and read assets.
- Azure Databricks / Spark: large-scale feature extraction and ETL.
- Azure Functions / Logic Apps: event-driven ingestion and metadata generation.
- Azure Container Apps / AKS: hosting custom processing services or STAC tooling.
- Azure Key Vault: secrets/certificates if needed for custom components.
- Azure Monitor: logs, metrics, dashboards, alerts.
Dependency note: The exact dependencies used by Microsoft Planetary Computer Pro can differ from the list above based on release and deployment model. Use the official docs to identify which resources are created and which are required.
Security/authentication model (conceptual)
- User auth: typically via Microsoft Entra ID.
- Service-to-service: managed identities or service principals.
- Authorization: role-based access to the catalog and to underlying storage.
- Auditing: logs for API access and storage access (Storage analytics, Azure Monitor).
Networking model (conceptual)
- Public endpoint: simplest to start; requires strong auth and IP restrictions where possible.
- Private endpoint / VNet integration: preferred for regulated environments; often paired with private access to Storage.
- Outbound access: ingestion pipelines may need outbound to fetch third-party datasets (control with Firewall/NAT).
Monitoring/logging/governance considerations
- Define SLIs such as:
- Search latency (p95)
- Error rate (4xx/5xx)
- Ingestion latency
- Request volume by tenant/team
- Centralize logs in Log Analytics.
- Apply Azure Policy and tagging:
- Data classification tags
- Owner/team
- Cost center
- Use budgets and alerts to prevent unexpected costs.
Simple architecture diagram (Mermaid)
flowchart LR
A[Producers / Ingestion Pipeline] -->|"STAC metadata"| B[Microsoft Planetary Computer Pro Catalog]
A -->|"Assets (COG/Zarr/GeoParquet)"| S[Azure Storage]
U[Users / Notebooks / Apps] -->|"Search (AOI+time)"| B
U -->|"Read asset URLs"| S
U --> C[Azure ML / Databricks Compute]
C -->|"Read subsets"| S
Production-style architecture diagram (Mermaid)
flowchart TB
subgraph Ingestion
D1["Data Landing Zone<br/>(ADLS Gen2/Blob)"] --> P1["Metadata Generator<br/>(Container/Function/Spark)"]
P1 --> Q1["Quality Checks<br/>Schema/Projection/Overviews"]
Q1 --> C1["Catalog Publish<br/>Collections/Items"]
end
subgraph Catalog
MPCP["Microsoft Planetary Computer Pro<br/>(Search/API)"]
end
subgraph Security
ID["Microsoft Entra ID<br/>Users/Groups/Apps"]
KV["Azure Key Vault<br/>(if custom secrets needed)"]
end
subgraph Consumers
N1["Analyst Notebook<br/>Python STAC Client"] --> MPCP
M1[Azure ML Pipeline] --> MPCP
DBX[Azure Databricks Jobs] --> MPCP
APP[Internal App/API] --> MPCP
end
C1 --> MPCP
MPCP --> D1
ID --> MPCP
ID --> M1
ID --> DBX
subgraph Observability
MON[Azure Monitor + Log Analytics]
end
MPCP --> MON
M1 --> MON
DBX --> MON
8. Prerequisites
Account/subscription/tenant requirements
- An Azure subscription where you can deploy and pay for resources.
- A Microsoft Entra ID tenant associated with the subscription.
- Access to Microsoft Planetary Computer Pro in your tenant (may require enrollment or specific availability). Verify in official docs and/or Azure Marketplace.
Permissions / IAM roles
At minimum (exact roles depend on the offering and deployment model):
- Subscription Contributor (or equivalent) to create resources, or a combination of:
- Resource Group Contributor
- Role assignments admin privileges (User Access Administrator) if you must grant RBAC
- Permissions to create and configure:
- Resource groups
- Networking (if using VNets/private endpoints)
- Storage accounts/containers
- Application registrations (if needed for client auth)
For least privilege, separate roles:
- Platform team: resource creation + policies
- Data publishers: publish collections/items and write to storage containers
- Data consumers: read catalog and read approved storage paths
Billing requirements
- A subscription with an active payment method.
- If Microsoft Planetary Computer Pro is obtained via Azure Marketplace, confirm procurement policies allow marketplace purchases.
CLI/SDK/tools needed
For the lab in this tutorial:
- Azure CLI: https://learn.microsoft.com/cli/azure/install-azure-cli
- Python 3.10+ (or current supported): https://www.python.org/downloads/
- Python packages: pystac-client, pystac, requests, python-dotenv (optional), rasterio (optional, for reading COGs)
- curl for API testing
Install example:
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows PowerShell
pip install pystac-client pystac requests python-dotenv
Region availability
- Microsoft Planetary Computer Pro region availability is not universal. Verify in official docs and the Azure portal “Create resource” experience.
Quotas/limits
Potential quotas that commonly matter:
- API request rate limits (service-defined)
- Storage account limits (throughput, request rate)
- Network egress (cost and throttling)
- Identity token limits/timeouts
Always check:
- The service’s quota/limit documentation (verify in official docs)
- Azure Storage scalability targets: https://learn.microsoft.com/azure/storage/common/scalability-targets-standard-account
Prerequisite services
You will almost always need:
- Azure Storage for assets
- Entra ID configuration for authentication
- Azure Monitor for logs/metrics (recommended)
9. Pricing / Cost
Pricing for Microsoft Planetary Computer Pro can depend on how it is delivered (managed service vs marketplace deployment) and what underlying Azure resources it uses. Do not rely on third-party summaries—verify with Microsoft’s official pricing or marketplace listing.
Current pricing model (how to think about it)
Use this approach:
- Service charge (if any)
If Microsoft Planetary Computer Pro is offered as a managed Azure service/SaaS, pricing may include:
- A base monthly/hourly charge per instance
- Usage-based charges (requests, indexed items, storage for metadata)
Verify in official docs or the Azure Marketplace offer.
- Underlying Azure resource costs (nearly always)
Even if the service is “managed,” your architecture commonly incurs:
- Azure Storage (assets; transactions; capacity)
- Networking (egress, NAT, private endpoints)
- Compute for ingestion and analytics (Functions/AKS/Databricks/Azure ML)
- Monitoring (Log Analytics ingestion/retention)
Pricing dimensions to expect
Depending on implementation and your workloads:
- Number of Collections/Items indexed (metadata volume)
- Search request volume
- Asset storage size (GB/TB) and access frequency
- Data transfer (especially egress out of Azure or across regions)
- Ingestion compute hours (conversion to COG/Zarr, tiling, validations)
- Log volume (API logs, diagnostics)
Free tier
- The public Microsoft Planetary Computer experience is separate from Planetary Computer Pro.
- Any free tier for Microsoft Planetary Computer Pro (if available) must be confirmed in official sources. Verify in official docs.
Cost drivers (direct + indirect)
Direct:
- Storage capacity and transaction costs
- Compute used for conversions/processing
- Search/API hosting charges (if applicable)
- Monitoring ingestion/retention costs
Indirect/hidden:
- Cross-region access (latency + data transfer)
- Excessively verbose logging
- Multiple copies of the same dataset (raw + processed + cached)
- Public access patterns that force large downloads instead of range reads
Network/data transfer implications
- Keep compute in the same region as storage to reduce egress and latency.
- Prefer COG/Zarr and windowed reads to avoid full-file downloads.
- Be mindful of:
- Egress to the public internet
- Cross-AZ and cross-region transfers
- Private endpoint and DNS configuration overhead
How to optimize cost
- Convert large rasters into COG with internal tiling/overviews so clients read subsets.
- Use lifecycle policies in Azure Storage (hot → cool → archive when appropriate).
- Separate raw and curated zones; minimize duplication.
- Use budgets + alerts at resource group level.
- Avoid indexing low-value metadata fields that explode index size (confirm indexing model in docs).
- Control log verbosity; set appropriate retention in Log Analytics.
Example low-cost starter estimate (model, not numbers)
A low-cost sandbox typically includes:
- 1 storage account with a small container of sample COGs (a few GB)
- A small amount of ingestion compute (local laptop or a small Azure compute job)
- A small catalog instance (if supported)
- Minimal logging retention (for example, 7–30 days)
Because actual pricing varies by region and offer, use:
- Azure Pricing Calculator: https://azure.microsoft.com/pricing/calculator/
- Azure Storage pricing: https://azure.microsoft.com/pricing/details/storage/blobs/
- If applicable, the Microsoft Planetary Computer Pro marketplace/pricing page (verify availability)
Example production cost considerations (what usually dominates)
In production, costs are commonly dominated by:
- Data storage (TB–PB)
- Data processing to curate/convert datasets (COG/Zarr creation)
- Network egress when sharing data externally
- Analytics compute (Databricks/Spark, Azure ML training)
- Monitoring at scale (high request volume)
10. Step-by-Step Hands-On Tutorial
This lab focuses on a practical, low-risk workflow: create a small Azure Storage container with one sample geospatial asset, create minimal STAC metadata, publish it to your Microsoft Planetary Computer Pro catalog endpoint (or compatible STAC endpoint provided by the service), and query it from Python.
Important: The exact UI steps and publish API for Microsoft Planetary Computer Pro can differ by release. Where your tenant’s flow differs, follow the official docs for publishing and adapt the metadata and query steps below.
Objective
Publish a small geospatial asset into a Microsoft Planetary Computer Pro catalog and query it via a STAC client.
Lab Overview
You will:
- Create a resource group and storage account.
- Upload a sample Cloud Optimized GeoTIFF (COG) (or any small file if you don’t have a COG yet).
- Create STAC Collection and Item JSON metadata referencing the uploaded asset.
- Publish the metadata to your Microsoft Planetary Computer Pro endpoint (method varies—portal or API).
- Query the catalog using pystac-client.
- Clean up Azure resources.
Step 1: Create a resource group and storage account (Azure CLI)
Action
az login
az account show
# Set your subscription if needed:
# az account set --subscription "<SUBSCRIPTION_ID>"
RG="rg-mpcp-lab"
LOC="eastus" # choose a region supported by your org and the service
STG="stmpcplab$RANDOM" # must be globally unique, lowercase, 3-24 chars
az group create -n "$RG" -l "$LOC"
az storage account create \
-n "$STG" \
-g "$RG" \
-l "$LOC" \
--sku Standard_LRS \
--kind StorageV2
Expected outcome
- Resource group exists.
- Storage account exists.
Verify
az storage account show -n "$STG" -g "$RG" --query "name"
Step 2: Create a private container and upload a sample asset
For a realistic pattern, keep the container private. For the lab, you can still access the blob using a SAS URL.
Action
CONTAINER="assets"
az storage container create \
--account-name "$STG" \
--name "$CONTAINER" \
--auth-mode login
Upload a small file. If you have a small COG, use it. If not, upload any small file to validate the catalog flow (the catalog won’t validate the raster format unless configured to do so).
# Example: upload a local file (replace path)
FILE_PATH="./sample.tif" # or ./sample.txt for a basic test
BLOB_NAME="sample.tif"
az storage blob upload \
--account-name "$STG" \
--container-name "$CONTAINER" \
--name "$BLOB_NAME" \
--file "$FILE_PATH" \
--auth-mode login
Create a SAS URL for the blob (read-only, short expiry) so clients can fetch it during the lab.
EXPIRY=$(date -u -d "2 hours" '+%Y-%m-%dT%H:%MZ' 2>/dev/null || date -u -v+2H '+%Y-%m-%dT%H:%MZ')
SAS=$(az storage blob generate-sas \
--account-name "$STG" \
--container-name "$CONTAINER" \
--name "$BLOB_NAME" \
--permissions r \
--expiry "$EXPIRY" \
--auth-mode login \
-o tsv)
URL="https://${STG}.blob.core.windows.net/${CONTAINER}/${BLOB_NAME}?${SAS}"
echo "$URL"
Expected outcome
- Blob exists and can be read using the SAS URL.
Verify
curl -I "$URL"
You should see HTTP/1.1 200 OK (or 206 Partial Content for range-enabled reads).
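The same check can be done from Python with an explicit Range header, which is also how windowed COG reads work under the hood. The header-building helper is pure; the fetch function needs the SAS URL from above and a network connection, so it only runs when you execute the script directly.

```python
import urllib.request

def range_header(start, end):
    """Build an HTTP Range header for bytes [start, end] inclusive."""
    return {"Range": f"bytes={start}-{end}"}

def read_first_bytes(url, n=100):
    """Fetch the first n bytes of a blob via an HTTP range request.

    A 206 Partial Content status confirms the endpoint honors range
    reads (Azure Blob Storage does).
    """
    req = urllib.request.Request(url, headers=range_header(0, n - 1))
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()

if __name__ == "__main__":
    # Replace with the SAS URL printed in Step 2 before running read_first_bytes.
    print(range_header(0, 99))
```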
Step 3: Prepare STAC metadata (Collection and Item)
Create a folder stac/ and add:
3.1 Create a STAC Collection JSON
stac/collection.json:
{
"type": "Collection",
"stac_version": "1.0.0",
"id": "mpcp-lab-collection",
"title": "MPCP Lab Collection",
"description": "A minimal STAC Collection for a Microsoft Planetary Computer Pro lab.",
"license": "proprietary",
"extent": {
"spatial": { "bbox": [[-180.0, -90.0, 180.0, 90.0]] },
"temporal": { "interval": [["2024-01-01T00:00:00Z", null]] }
},
"links": [],
"keywords": ["azure", "stac", "planetary-computer-pro", "lab"]
}
3.2 Create a STAC Item JSON referencing your asset URL
stac/item.json (replace <ASSET_URL> with the SAS URL you printed earlier):
{
"type": "Feature",
"stac_version": "1.0.0",
"id": "mpcp-lab-item-001",
"collection": "mpcp-lab-collection",
"geometry": {
"type": "Polygon",
"coordinates": [[[0, 0],[0, 1],[1, 1],[1, 0],[0, 0]]]
},
"bbox": [0, 0, 1, 1],
"properties": {
"datetime": "2024-06-01T00:00:00Z"
},
"assets": {
"data": {
"href": "<ASSET_URL>",
"type": "image/tiff; application=geotiff",
"roles": ["data"]
}
},
"links": []
}
Expected outcome – You have valid JSON files that conform to basic STAC 1.0 structures.
Verify locally (optional) – If you want quick validation:
python -c "import json; json.load(open('stac/collection.json')); json.load(open('stac/item.json')); print('JSON OK')"
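For a slightly stronger check than "does the JSON parse," a small stdlib script can assert that the basic STAC 1.0 Item fields are present. This is a structural sanity check only, not a substitute for full JSON Schema validation (pystac can do that); the field list is taken from the STAC 1.0 Item spec.

```python
import json

# Required top-level fields for a STAC 1.0 Item (per the Item spec).
REQUIRED_ITEM_FIELDS = {
    "type", "stac_version", "id", "geometry",
    "bbox", "properties", "assets", "links",
}

def check_stac_item(item: dict) -> list:
    """Return a list of structural problems; an empty list means the basics look OK."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_ITEM_FIELDS - set(item))]
    if item.get("type") != "Feature":
        problems.append("type must be 'Feature'")
    if "datetime" not in item.get("properties", {}):
        problems.append("properties.datetime is required (null only with start/end_datetime)")
    return problems
```

Run it against the lab file with, for example, `print(check_stac_item(json.load(open("stac/item.json"))) or "Item structure OK")`.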
Step 4: Create or access your Microsoft Planetary Computer Pro endpoint
This step depends on your tenant’s onboarding.
Option A (common): Create Microsoft Planetary Computer Pro from Azure portal
Action
1. In the Azure portal, search for Microsoft Planetary Computer Pro.
2. Select Create.
3. Choose subscription, resource group, and region.
4. Configure authentication (Entra ID) per your organization's standard.
5. Create the resource.
Expected outcome – You have a service endpoint (often a URL) for catalog/search and (if supported) publishing.
Verify – Locate endpoint URLs and auth requirements in the resource’s Overview blade or docs.
Option B: You were provided an endpoint by your platform team
Action
– Obtain:
– Catalog base URL (e.g., https://<your-endpoint>/)
– Authentication method (Entra ID / OAuth2)
– Publishing method (portal upload / API endpoint)
Expected outcome – You can authenticate and reach the catalog.
Step 5: Publish the Collection and Item to Microsoft Planetary Computer Pro
Because publishing workflows vary, use one of these patterns:
Pattern 1: Publish via portal UI (if available)
Action
1. Open Microsoft Planetary Computer Pro in Azure portal.
2. Find the Catalog / Collections area.
3. Create/import a collection using stac/collection.json.
4. Add/import an item using stac/item.json.
Expected outcome – Collection and Item appear in the catalog UI.
Pattern 2: Publish via HTTP API (if your deployment exposes it)
Some STAC API implementations accept:
– POST /collections for collections
– POST /collections/{collectionId}/items for items
Action (example) Set:
CATALOG_BASE="https://<YOUR_MPCP_ENDPOINT>"
Then (no auth headers shown here because auth varies):
curl -X POST "${CATALOG_BASE}/collections" \
-H "Content-Type: application/json" \
--data-binary @stac/collection.json
curl -X POST "${CATALOG_BASE}/collections/mpcp-lab-collection/items" \
-H "Content-Type: application/json" \
--data-binary @stac/item.json
Expected outcome – HTTP 200/201 responses (depends on implementation). – The collection and item become searchable.
Important caveat
If your Microsoft Planetary Computer Pro deployment requires OAuth2 tokens (likely), you must add Authorization: Bearer <token>. Obtain tokens using your organization’s documented flow. Verify in official docs for the supported auth method.
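As a sketch of what the authenticated variant looks like, the stdlib `urllib` helper below attaches a Bearer token to a publish request. The endpoint paths and token value are placeholders; obtain a real token per your organization's documented flow (for Azure CLI users, `az account get-access-token` with the resource/scope your deployment documents is a common starting point).

```python
import json
import urllib.request

def build_publish_request(catalog_base: str, path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build an authenticated POST for a STAC publish endpoint (does not send it)."""
    return urllib.request.Request(
        url=catalog_base.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # token acquisition is deployment-specific
        },
        method="POST",
    )

# Example (hypothetical endpoint; sending is left to the caller):
# req = build_publish_request("https://<YOUR_MPCP_ENDPOINT>", "/collections",
#                             json.load(open("stac/collection.json")), token)
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```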
Step 6: Query the catalog using Python (pystac-client)
Create query.py:
from pystac_client import Client
CATALOG_BASE = "https://<YOUR_MPCP_ENDPOINT>" # e.g., https://... (STAC API root)
client = Client.open(CATALOG_BASE)
# Search in the collection for the item we published
search = client.search(
collections=["mpcp-lab-collection"],
bbox=[0, 0, 1, 1],
datetime="2024-06-01T00:00:00Z/2024-06-02T00:00:00Z",
max_items=10,
)
items = list(search.items())  # get_items() is deprecated in recent pystac-client versions
print(f"Items found: {len(items)}")
for it in items:
print(it.id, it.properties.get("datetime"))
print("Assets:", list(it.assets.keys()))
print("Asset href:", it.assets["data"].href)
Run it:
python query.py
Expected outcome – The script prints 1 item found, with the asset URL.
If authentication is required – Some deployments require authenticated requests even for search. In that case:
– Use a token-aware HTTP session supported by your STAC client, or
– Use the official SDK or recommended authentication approach in the Microsoft Planetary Computer Pro docs (verify in official docs).
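If your STAC client cannot attach your auth flow, you can call the search endpoint directly. The helper below only builds the JSON body for a `POST /search` request; the field names follow the STAC API spec, while the `/search` path itself is an assumption to verify for your deployment.

```python
def build_search_body(collections, bbox=None, datetime_range=None, limit=10):
    """Build a STAC API POST /search body; filters that are not set are omitted."""
    body = {"collections": list(collections), "limit": limit}
    if bbox is not None:
        body["bbox"] = list(bbox)          # [minLon, minLat, maxLon, maxLat]
    if datetime_range is not None:
        body["datetime"] = datetime_range  # e.g. "2024-06-01T00:00:00Z/2024-06-02T00:00:00Z"
    return body
```

POST this body to `${CATALOG_BASE}/search` with `Content-Type: application/json` and whatever `Authorization` header your deployment requires.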
Validation
You have successfully completed the lab if:
- You can list the collection in the catalog (via UI or API).
- A search query returns the item you published.
- The asset URL is retrievable (SAS URL works) and returns HTTP 200/206.
Quick checks:
# Example: list collections (if supported)
curl "${CATALOG_BASE}/collections" | head
# Example: search endpoint (if supported)
curl -X POST "${CATALOG_BASE}/search" \
-H "Content-Type: application/json" \
-d '{"collections":["mpcp-lab-collection"],"bbox":[0,0,1,1],"limit":5}'
Troubleshooting
Issue: 403 Forbidden when querying or publishing
Causes
– Missing/invalid auth token
– Insufficient RBAC role
– Private endpoint requires being on VNet/VPN
Fix
– Confirm Entra ID app permissions and roles.
– Acquire a valid OAuth2 token (per official docs).
– Test from a network location with access (VNet, jumpbox, or VPN).
Issue: 404 Not Found for /search or /collections
Cause – Endpoint is not STAC API root, or path differs.
Fix – Confirm the base URL from the Microsoft Planetary Computer Pro resource overview/docs. – Try the documented endpoints for your deployment.
Issue: Item publishes but search returns nothing
Causes
– Wrong collection ID in the item
– Search filter mismatch (datetime format, bbox)
– Indexing delay
Fix
– Recheck collection matches exactly.
– Query without filters first, then add constraints.
– Wait a few minutes and retry if indexing is asynchronous.
Issue: Asset URL fails to download
Causes
– SAS expired
– Wrong blob name/container
– Storage firewall restrictions
Fix
– Regenerate SAS with longer expiry.
– Confirm blob exists.
– Adjust Storage networking/firewall for your test environment.
Cleanup
Delete the resource group to remove storage and associated charges:
az group delete -n "$RG" --yes --no-wait
If you created a Microsoft Planetary Computer Pro resource, delete it as well (if it’s in the same resource group, it will be deleted automatically). Confirm no additional resources remain (VNets, private endpoints, log analytics workspaces) if created separately.
11. Best Practices
Architecture best practices
- Separate zones: raw landing zone vs curated/optimized zone (COG/Zarr/GeoParquet).
- Standardize metadata: enforce required fields and consistent property naming across Collections.
- Design for range reads: COG internal tiling + overviews; avoid “monolithic GeoTIFFs.”
- Keep compute close to data: same region whenever possible.
- Treat the catalog as a platform dependency: define SLOs and scaling strategy.
IAM/security best practices
- Use least privilege:
- Publishers: write to specific containers + publish metadata
- Consumers: read only approved Collections/assets
- Prefer managed identities for pipelines and services.
- Avoid long-lived SAS in production; prefer identity-based access where supported.
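To make the publisher/consumer split concrete, the Azure CLI commands below grant a consumer read-only data access and a publisher write access at container scope, using built-in storage data-plane roles. The principal IDs and resource names are placeholders; confirm the role assignments your organization standardizes on.

```shell
# Consumer: read-only access to blobs in one container (built-in role).
az role assignment create \
  --assignee "<consumer-object-id>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<container>"

# Publisher: read/write access to the same container.
az role assignment create \
  --assignee "<publisher-object-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<container>"
```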
Cost best practices
- Use Storage lifecycle management (hot/cool/archive).
- Minimize duplicate copies of assets.
- Keep logging at an intentional level; avoid high-volume debug logging in production.
- Add budgets and cost alerts per resource group/data product.
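A lifecycle policy is one concrete cost lever. The fragment below uses the Azure Storage management-policy schema to tier blobs under a hypothetical raw/ prefix to cool after 30 days and archive after 180; apply it with `az storage account management-policy create --policy @policy.json`. The prefix and thresholds are illustrative.

```json
{
  "rules": [
    {
      "enabled": true,
      "name": "tier-raw-landing-zone",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["raw/"] },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 180 }
          }
        }
      }
    }
  ]
}
```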
Performance best practices
- Index only what you need for search.
- Ensure assets are cloud-optimized and compressed appropriately.
- Tune clients:
- Use pagination
- Use bounding boxes and date ranges to limit results
- Avoid wide-open searches in interactive apps
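Pagination in STAC APIs is typically driven by rel="next" links in each response. The stdlib sketch below follows them with a caller-supplied fetch function (so it works with whatever auth you use) and a page cap to avoid accidental wide-open crawls.

```python
def iter_search_pages(first_url, fetch, max_pages=10):
    """Yield successive STAC search result pages by following rel='next' links.

    fetch(url) must return the parsed JSON body (a dict) for that URL.
    """
    url = first_url
    for _ in range(max_pages):
        page = fetch(url)
        yield page
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
        if url is None:
            return
```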
Reliability best practices
- Automate ingestion with retries and idempotency.
- Validate metadata before publish (schema checks).
- Use staged rollouts for new collections/versions.
- Define backup/restore approach for metadata (verify what Microsoft manages vs you manage).
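A minimal sketch of the retry and idempotency ideas, with helper names of my choosing: item IDs derived deterministically from the source asset (so re-running ingestion upserts rather than duplicates) and a bounded retry wrapper for transient publish failures.

```python
import hashlib
import time

def deterministic_item_id(collection_id: str, source_uri: str) -> str:
    """Same source asset -> same Item ID, so re-ingestion is idempotent."""
    digest = hashlib.sha256(source_uri.encode("utf-8")).hexdigest()[:16]
    return f"{collection_id}-{digest}"

def publish_with_retries(publish, item, attempts=3, base_delay=1.0):
    """Call publish(item), retrying with exponential backoff on exceptions."""
    for attempt in range(attempts):
        try:
            return publish(item)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```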
Operations best practices
- Centralize logs in Log Analytics.
- Monitor:
- request rates
- latency
- errors
- ingestion success/failure
- Use runbooks for common failures (token issues, storage access issues, indexing backlogs).
Governance/tagging/naming best practices
- Standard naming pattern: mpcp-<org>-<env>-<region>
- Tag resources: Owner, CostCenter, DataClassification, Environment, Product
- Maintain a dataset registry document with:
- source system
- license
- retention policy
- publishing SLA
12. Security Considerations
Identity and access model
- Prefer Microsoft Entra ID for:
- interactive user access
- service principals / managed identities for pipelines
- Use RBAC for:
- catalog operations (publish, update, delete)
- underlying storage access (read/write)
- If your architecture spans subscriptions, plan for:
- cross-tenant restrictions
- guest users
- managed identity scope boundaries
Encryption
- Azure Storage encryption at rest is enabled by default (Microsoft-managed keys).
- For higher control, use customer-managed keys (CMK) via Azure Key Vault (if required).
- Encrypt data in transit with HTTPS.
- Verify whether the Microsoft Planetary Computer Pro endpoint supports TLS configuration standards required by your org.
Network exposure
- Prefer private access patterns when available:
- private endpoints to Storage
- private access to catalog endpoint (if supported)
- If public endpoints are used:
- enforce strong auth
- consider IP allowlists (where supported)
- protect endpoints with WAF patterns if fronted by an application gateway (architecture-specific)
Secrets handling
- Avoid embedding SAS tokens in code repos.
- Store secrets (if unavoidable) in Azure Key Vault.
- Rotate credentials; use short-lived tokens.
Audit/logging
- Enable and retain:
- catalog access logs (if available)
- storage access logs (where appropriate)
- Azure Activity Log for control-plane actions
- Export logs to SIEM if needed (Microsoft Sentinel).
Compliance considerations
Compliance depends on:
– data residency
– encryption requirements
– access logging retention
– whether the service is GA/preview and what compliance attestations exist
For regulated workloads, confirm:
– compliance documentation for the offering
– whether preview features are permitted by policy
Common security mistakes
- Public blob containers for convenience (avoid in production).
- Long-lived SAS tokens.
- No separation between publisher and consumer roles.
- No audit trail for dataset publishing.
- Cross-region data movement without review.
Secure deployment recommendations
- Use private networking where possible.
- Use managed identity for ingestion pipelines.
- Require code review for metadata schema changes.
- Implement data classification tags and enforce via Azure Policy.
13. Limitations and Gotchas
Confirm current limitations in official docs; the list below focuses on common real-world pitfalls for geospatial catalog platforms.
Known limitations (typical categories)
- Region availability: not all Azure regions may be supported.
- Preview constraints: some features may be preview-only with limited SLA.
- Auth complexity: integrating STAC clients with enterprise auth can require extra work.
- Metadata quality: poor metadata leads to poor search; you must invest in standards.
- Large catalogs: indexing millions of items requires careful design (pagination, query patterns).
- Asset access mismatch: catalog may return URLs, but consumers still need proper authorization to read blobs.
Quotas
Potential quotas to watch (verify in official docs):
– Max items per search response
– Rate limits per client
– Max properties indexed
– Max payload sizes for publish requests
Regional constraints
- Cross-region access can increase latency and cost.
- If assets and catalog are in different regions, expect:
- slower reads
- higher transfer costs
- more complex networking/security
Pricing surprises
- Storage transactions and egress can dominate.
- Log ingestion can become significant at high request volumes.
- Reprocessing datasets into COG/Zarr can consume large compute.
Compatibility issues
- STAC extensions: clients may expect certain extensions; the service may support only a subset.
- Datetime formatting and timezone assumptions can break queries.
- Coordinate reference systems: STAC geometry is generally WGS84 (EPSG:4326) for bbox/geometry; ensure consistent metadata.
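To keep bbox and geometry consistent (both in WGS84 lon/lat order per the STAC spec), it is safer to compute the bbox from the geometry than to type it by hand. A stdlib sketch for a GeoJSON Polygon:

```python
def bbox_from_polygon(coordinates):
    """Compute [minLon, minLat, maxLon, maxLat] from GeoJSON Polygon coordinates."""
    lons = [lon for ring in coordinates for lon, lat in ring]
    lats = [lat for ring in coordinates for lon, lat in ring]
    return [min(lons), min(lats), max(lons), max(lats)]
```

For the lab item's unit square, `bbox_from_polygon([[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]])` returns `[0, 0, 1, 1]`, matching the Item's bbox field.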
Operational gotchas
- Token expiration causing intermittent failures.
- Indexing delays (publish is accepted, item isn’t searchable immediately).
- Inconsistent item IDs and collection IDs causing duplication.
Migration challenges
- Migrating from a folder-based lake to a catalog requires:
- metadata backfill
- format conversion
- governance decisions about ownership and lifecycle
Vendor-specific nuances
- If the service is delivered as a managed application, patching and scaling responsibilities may be shared—clarify the support model in official docs.
14. Comparison with Alternatives
Microsoft Planetary Computer Pro sits at the intersection of geospatial cataloging, data discovery, and Azure-based analytics enablement. Alternatives include other Azure services, other clouds’ geospatial platforms, and self-managed open source stacks.
Comparison table
| Option | Best For | Strengths | Weaknesses | When to Choose |
|---|---|---|---|---|
| Microsoft Planetary Computer Pro (Azure) | Organizations wanting a standardized geospatial catalog pattern aligned with Azure | Azure-aligned identity/governance; STAC ecosystem compatibility; integrates with Azure storage/compute | Availability/feature set depends on release; may require learning STAC; specifics can vary | You want a managed/standard approach rather than building your own |
| Build your own STAC API on Azure (self-managed) | Teams needing full control and customization | Maximum flexibility; choose your DB/indexing; tailored auth | Higher ops burden; scaling/search tuning is on you | You have specialized requirements or existing platform engineering capacity |
| Azure Data Explorer (Kusto) + custom geospatial indexing | Fast query over structured telemetry and some geospatial operations | Great for time-series; powerful query language | Not a STAC-native catalog; raster asset access is separate | You primarily query metadata/telemetry and only secondarily reference assets |
| Azure Databricks + Delta Lake + custom metadata tables | Lakehouse-first orgs with Spark expertise | Strong ETL/ML; unified governance with Unity Catalog (Databricks features) | Not a STAC-native interface; discoverability depends on custom tables | Your org standard is Spark/lakehouse and you can implement the catalog layer |
| Google Earth Engine (other cloud) | Planet-scale geospatial analytics with built-in datasets | Huge dataset catalog; powerful analysis environment | Different cloud ecosystem; governance and enterprise integration differ | You need Earth Engine’s built-in datasets/algorithms and can accept platform constraints |
| AWS (self-managed STAC + S3) | AWS-first teams wanting STAC patterns | Mature object storage ecosystem | You build/operate the catalog; separate from Azure tooling | Your platform is standardized on AWS |
| Open-source catalogs (STAC API + Postgres/Elasticsearch) | Portable, standards-based solutions | Avoid vendor lock-in; large ecosystem | Ops burden; auth integration is on you | You want multi-cloud portability and can operate the stack |
15. Real-World Example
Enterprise example: Climate risk and asset exposure platform
- Problem: A global insurer needs consistent access to climate hazard layers (heat, flood, wildfire) and must reproduce portfolio risk calculations over time with strict auditability.
- Proposed architecture
- Azure Storage as curated layer store (COG/Zarr + vector layers)
- Microsoft Planetary Computer Pro as the discovery/catalog layer (Collections per hazard model and version)
- Azure Databricks for batch feature extraction at scale
- Azure ML for training hazard impact models
- Entra ID groups for role separation (underwriters vs data engineers)
- Azure Monitor + Log Analytics for operational visibility
- Why this service was chosen
- Standard, queryable catalog interface reduces friction across teams.
- Strong alignment with Azure security/governance patterns.
- Enables consistent dataset versioning as Collections.
- Expected outcomes
- Faster analytics onboarding (days instead of weeks).
- Better reproducibility (explicit dataset versions).
- Stronger governance (audited access and controlled publishing).
Startup/small-team example: Agricultural remote sensing product
- Problem: A startup provides field-level crop health metrics from satellite imagery, but struggles with dataset discovery, repeatability, and internal dataset sharing.
- Proposed architecture
- Azure Storage with COG tiles per date/sensor
- Microsoft Planetary Computer Pro as a lightweight internal catalog for scenes and derived indices
- Containerized ingestion pipeline that creates STAC Items per new acquisition
- Azure ML endpoints for inference on new imagery
- Why this service was chosen
- Avoids building a custom catalog service from scratch.
- Uses standard tooling (STAC clients) for rapid development.
- Expected outcomes
- Repeatable training datasets and faster debugging.
- Reduced duplication of datasets and clearer ownership.
16. FAQ
1) Is Microsoft Planetary Computer Pro the same as the public Microsoft Planetary Computer website?
No. The public Microsoft Planetary Computer provides a curated catalog of open datasets. Microsoft Planetary Computer Pro is intended for organizations to enable similar catalog/discovery patterns for their own data. Verify exact positioning and capabilities in official docs.
2) Is Microsoft Planetary Computer Pro a database?
It is better thought of as a geospatial catalog and discovery layer that indexes metadata and points to assets stored elsewhere (often Azure Storage).
3) Do I need to store data in Azure to use it?
Typically, the most efficient pattern is storing assets in Azure Storage close to the catalog and compute. Some architectures may reference external URLs, but performance, governance, and costs can be problematic. Verify supported asset locations in official docs.
4) Does it support STAC API?
Microsoft Planetary Computer is strongly associated with STAC patterns. Confirm the supported STAC API version, endpoints, and extensions in official docs for Microsoft Planetary Computer Pro.
5) Can I use pystac-client with Microsoft Planetary Computer Pro?
If the service exposes a STAC API-compatible endpoint, yes. If the endpoint requires enterprise auth, you may need additional client configuration.
6) How do I authenticate?
Commonly via Microsoft Entra ID. Exact flows (interactive vs client credentials) and token audience/scopes are service-specific—verify in official documentation.
7) How do I publish data—do I upload files into the service?
In most catalog patterns, you store assets in Azure Storage and publish metadata (Collections/Items) that reference those assets. Publishing workflows vary—portal-based or API-based—verify in official docs.
8) Does it generate map tiles?
Planetary Computer Pro is primarily about catalog/discovery. Tile serving may require separate services (for example, a dedicated tiling service). Verify if tiling is included in your version.
9) Does it support raster formats like COG and Zarr?
These formats are commonly used with planetary-computer-style architectures because they enable efficient reads. Support can be “by convention” (assets referenced in those formats). Verify any validation or special handling in the service.
10) Can I keep my data private and still allow search?
Yes, typical designs allow restricted search and restricted asset access. You must configure authorization carefully so users only see what they should.
11) What’s the difference between catalog access and storage access?
Catalog access controls whether you can query metadata. Storage access controls whether you can read the actual raster/vector files. You must secure both.
12) How do I handle versioning of datasets?
A common approach is:
– New Collection ID per major version, or
– Maintain one Collection with version properties and clear deprecation policy
Choose a strategy and enforce it.
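As a sketch of the second strategy, assuming each Collection carries `version` and `deprecated` properties (names of my choosing, not a documented service schema), a client can resolve the latest active version like this:

```python
def latest_active(collections):
    """Pick the highest-version collection that is not marked deprecated."""
    active = [c for c in collections if not c.get("deprecated", False)]
    return max(active, key=lambda c: c.get("version", 0), default=None)
```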
13) How many items can it handle?
Capacity depends on the underlying implementation and quotas. Verify scaling and limits in official docs, and test with realistic load.
14) Can I run it in a VNet with private endpoints only?
Many Azure services support private networking patterns, but you must verify Microsoft Planetary Computer Pro’s private connectivity support in official docs.
15) Is it suitable for regulated industries?
Potentially, if it supports required security controls and has appropriate compliance posture. Confirm compliance documentation, region availability, and GA vs preview status.
16) What are the biggest cost risks?
Usually: storage growth, repeated reprocessing, data egress, and logging volume.
17) What skills do teams need to be successful with it?
STAC concepts, geospatial formats (COG/Zarr/GeoParquet), Azure identity/security, and data engineering for ingestion pipelines.
17. Top Online Resources to Learn Microsoft Planetary Computer Pro
| Resource Type | Name | Why It Is Useful |
|---|---|---|
| Official site (Planetary Computer) | https://planetarycomputer.microsoft.com/ | Background on the Planetary Computer ecosystem, datasets, and tooling patterns |
| Official documentation (Azure Learn) | https://learn.microsoft.com/ (search “Microsoft Planetary Computer Pro”) | Canonical docs for setup, security, endpoints, and supported features (verify current URL/path) |
| Official Azure Marketplace | https://azuremarketplace.microsoft.com/ (search “Planetary Computer Pro”) | If offered via Marketplace, the listing provides pricing, terms, and deployment model |
| STAC specification | https://stacspec.org/ | Core concepts (Collections/Items/assets) that underpin many planetary-computer patterns |
| pystac-client docs | https://pystac-client.readthedocs.io/ | Practical client usage for searching STAC APIs |
| pystac docs | https://pystac.readthedocs.io/ | Build and validate STAC metadata in Python |
| Microsoft Planetary Computer GitHub (ecosystem) | https://github.com/microsoft/planetary-computer | Reference implementations, docs, and examples (confirm repos/paths) |
| Azure Storage docs | https://learn.microsoft.com/azure/storage/ | Best practices for storing and securing geospatial assets |
| Azure Architecture Center | https://learn.microsoft.com/azure/architecture/ | Patterns for security, networking, monitoring, and data platforms on Azure |
| Cloud Optimized GeoTIFF (COG) | https://www.cogeo.org/ | Why COG matters and how to structure rasters for efficient cloud access |
18. Training and Certification Providers
| Institute | Suitable Audience | Likely Learning Focus | Mode | Website URL |
|---|---|---|---|---|
| DevOpsSchool.com | DevOps engineers, platform teams, cloud engineers | Azure DevOps, cloud operations, CI/CD, fundamentals that support production geospatial/AI platforms | Check website | https://www.devopsschool.com/ |
| ScmGalaxy.com | Beginners to intermediate IT professionals | DevOps/SCM concepts, automation basics, cloud foundations | Check website | https://www.scmgalaxy.com/ |
| CLoudOpsNow.in | Cloud engineers, SRE/ops teams | Cloud operations practices, monitoring, operational readiness | Check website | https://www.cloudopsnow.in/ |
| SreSchool.com | SREs, reliability-focused engineers | SRE practices, SLIs/SLOs, incident response, reliability engineering | Check website | https://www.sreschool.com/ |
| AiOpsSchool.com | Ops and platform teams adopting AI operations | AIOps concepts, monitoring/automation with AI tooling | Check website | https://www.aiopsschool.com/ |
19. Top Trainers
| Platform/Site | Likely Specialization | Suitable Audience | Website URL |
|---|---|---|---|
| RajeshKumar.xyz | DevOps/cloud training content (verify scope on site) | Engineers seeking practical training resources | https://www.rajeshkumar.xyz/ |
| devopstrainer.in | DevOps training (verify course catalog on site) | Individuals/teams looking for DevOps coaching | https://www.devopstrainer.in/ |
| devopsfreelancer.com | Freelance DevOps services/training resources (verify offerings) | Teams needing short-term help or advisory | https://www.devopsfreelancer.com/ |
| devopssupport.in | DevOps support/training resources (verify scope) | Ops teams and engineers needing support guidance | https://www.devopssupport.in/ |
20. Top Consulting Companies
| Company Name | Likely Service Area | Where They May Help | Consulting Use Case Examples | Website URL |
|---|---|---|---|---|
| cotocus.com | Cloud/DevOps consulting (verify service catalog) | Platform setup, automation, operational readiness | Azure landing zone setup; CI/CD pipeline standardization; monitoring strategy | https://www.cotocus.com/ |
| DevOpsSchool.com | DevOps and cloud consulting/training (verify offerings) | Delivery enablement, DevOps practices, skills uplift | Build deployment pipelines; implement IaC; train teams on ops practices | https://www.devopsschool.com/ |
| DEVOPSCONSULTING.IN | DevOps consulting (verify service catalog) | CI/CD, automation, operational practices | Toolchain implementation; release process improvements; cloud cost governance | https://www.devopsconsulting.in/ |
21. Career and Learning Roadmap
What to learn before Microsoft Planetary Computer Pro
- Azure fundamentals – Resource groups, identity, networking basics
- Azure Storage – Blob/ADLS Gen2, SAS vs RBAC, lifecycle management
- Geospatial fundamentals – CRS/projections, raster vs vector, bounding boxes
- Cloud-optimized formats – COG, Zarr, GeoParquet concepts
- STAC basics – Collections, Items, Assets, search patterns
What to learn after
- Geospatial data engineering at scale
- Spark-based ETL, tiling/subsetting strategies
- Azure ML productionization
- pipelines, model registry, deployment, monitoring
- Data governance
- catalog stewardship, data contracts, lineage, access reviews
- Advanced networking/security
- private endpoints, hub-spoke VNets, firewall egress control
- Observability/SRE
- SLOs for APIs, capacity planning, incident response
Job roles that use it
- Cloud solution architect (data/AI)
- Geospatial data engineer
- ML engineer (geospatial)
- GIS platform engineer
- DevOps/SRE supporting data/AI platforms
- Security engineer for data platforms
Certification path (if available)
There is no widely known standalone certification specifically for Microsoft Planetary Computer Pro. A practical Azure certification path for teams working in AI + Machine Learning and data platforms includes:
– Azure Fundamentals (AZ-900)
– Azure Data fundamentals (DP-900)
– Azure Data Engineer (DP-203)
– Azure AI Engineer (AI-102)
Verify current certification lineup on Microsoft Learn:
https://learn.microsoft.com/credentials/
Project ideas for practice
- Build an ingestion pipeline that converts GeoTIFF → COG and publishes STAC Items.
- Create a “dataset contract” template for Collections (required properties, naming, lineage).
- Implement access tiers: public metadata vs restricted asset access.
- Benchmark query performance for large item counts and refine indexing strategy (within service limits).
22. Glossary
- AOI (Area of Interest): The geographic area you want data for (often a bounding box or polygon).
- Asset: A file referenced by a STAC Item (e.g., a COG, Zarr store, GeoJSON, GeoParquet).
- Azure Storage (Blob/ADLS Gen2): Object storage commonly used to store geospatial assets.
- Bounding box (bbox): [minLon, minLat, maxLon, maxLat], used for spatial filtering.
- COG (Cloud Optimized GeoTIFF): A GeoTIFF structured for efficient HTTP range reads with internal tiling and overviews.
- Collection (STAC): A dataset definition grouping Items with shared properties and extent.
- Datetime filter: A STAC search parameter restricting results by time or time range.
- Entra ID: Microsoft identity platform (formerly Azure Active Directory).
- GeoParquet: Columnar Parquet files with geospatial metadata; efficient for analytics.
- Item (STAC): A single spatiotemporal entity (e.g., one satellite scene or one derived raster for a date).
- ML (Machine Learning): Training models that learn patterns from data; geospatial ML includes segmentation/classification/regression over raster/vector features.
- RBAC: Role-based access control, used to grant permissions.
- SAS (Shared Access Signature): A token appended to Storage URLs granting time-limited permissions.
- STAC: SpatioTemporal Asset Catalog specification for cataloging geospatial assets.
- STAC API: A web API convention for searching and retrieving STAC metadata.
- Zarr: Chunked, compressed array storage format suitable for cloud object storage and parallel reads.
23. Summary
Microsoft Planetary Computer Pro (Azure) is a geospatial catalog and discovery capability designed to help organizations standardize how they publish, find, and use Earth observation and geospatial datasets for AI + Machine Learning and analytics on Azure. It fits as a central metadata/search layer in front of Azure-hosted assets (often in Azure Storage), enabling repeatable spatial/temporal discovery and governed access patterns that support notebooks, pipelines, and production applications.
Key points to keep in mind:
- Cost is usually driven by storage, processing/format conversion, egress, and logging, not just the catalog itself.
- Security requires controlling both catalog access and asset access, ideally with Entra ID, least privilege, and private networking where supported.
- When to use it: multiple teams, multiple datasets, and a need for standardized geospatial discovery and governance on Azure.
- Next step: confirm your tenant's Microsoft Planetary Computer Pro documentation (endpoints, publishing workflow, auth model), then expand the lab into an automated ingestion pipeline that converts data to COG/Zarr and publishes STAC metadata consistently.