{"id":72531,"date":"2026-04-12T22:41:29","date_gmt":"2026-04-12T22:41:29","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/senior-capacity-planning-analyst-role-blueprint-responsibilities-skills-kpis-and-career-path\/"},"modified":"2026-04-12T22:41:29","modified_gmt":"2026-04-12T22:41:29","slug":"senior-capacity-planning-analyst-role-blueprint-responsibilities-skills-kpis-and-career-path","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/senior-capacity-planning-analyst-role-blueprint-responsibilities-skills-kpis-and-career-path\/","title":{"rendered":"Senior Capacity Planning Analyst: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">1) Role Summary<\/h2>\n\n\n\n<p>The Senior Capacity Planning Analyst ensures that cloud and infrastructure platforms have the right amount of compute, storage, network, and supporting service capacity to meet product demand\u2014without chronic overprovisioning or capacity-related outages. This role translates business and engineering signals into quantitative forecasts, actionable capacity plans, and prioritized investments across cloud spend, reservations\/commitments, and infrastructure roadmaps.<\/p>\n\n\n\n<p>This role exists in a software or IT organization because modern cloud and hybrid infrastructure environments are dynamic: demand fluctuates, services scale unevenly, and cost\/performance tradeoffs must be managed continuously. 
The Senior Capacity Planning Analyst creates business value by improving availability and performance under growth, reducing capacity-driven incidents, and optimizing cost through better forecasting, right-sizing, and commitment strategies.<\/p>\n\n\n\n<p>Role horizon: <strong>Current<\/strong> (core to operating reliable, cost-effective cloud platforms today).<\/p>\n\n\n\n<p>Typical teams and functions this role interacts with include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud Infrastructure Engineering and Platform Engineering<\/li>\n<li>SRE \/ Reliability Engineering and Operations<\/li>\n<li>FinOps \/ Cloud Cost Management<\/li>\n<li>Engineering leadership (VP Eng \/ Directors), Product leadership (Product Ops)<\/li>\n<li>Data Engineering \/ Analytics Enablement<\/li>\n<li>Procurement \/ Vendor Management (for contracts and commitments)<\/li>\n<li>Security and Risk (where capacity intersects with resilience and compliance)<\/li>\n<li>ITSM \/ Service Management (Change, Incident, Problem, and Availability)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">2) Role Mission<\/h2>\n\n\n\n<p><strong>Core mission:<\/strong><br\/>\nDeliver accurate, decision-grade capacity forecasts and plans that ensure infrastructure and cloud platforms meet performance and reliability requirements while optimizing cost and enabling predictable growth.<\/p>\n\n\n\n<p><strong>Strategic importance to the company:<\/strong><br\/>\nCapacity is one of the largest controllable cost and reliability levers in Cloud &amp; Infrastructure. Under-capacity drives outages and SLA breaches; over-capacity drives waste and margin erosion. 
This role operationalizes the \u201cright capacity at the right time\u201d discipline across multiple platforms and teams, connecting product growth to infrastructure investment.<\/p>\n\n\n\n<p><strong>Primary business outcomes expected:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduced capacity-related incidents and performance regressions<\/li>\n<li>Improved forecast accuracy for demand and spend<\/li>\n<li>Transparent, repeatable capacity planning process integrated into quarterly planning<\/li>\n<li>Cost avoidance through right-sizing and commitment optimization<\/li>\n<li>Clear decision support for engineering and finance (what to buy\/build\/commit, when, and why)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">3) Core Responsibilities<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Strategic responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Own multi-horizon capacity forecasting<\/strong> (weekly, monthly, quarterly, annual) across compute, storage, database, network, and critical managed services; align forecasts to product roadmaps, customer growth, and seasonality.<\/li>\n<li><strong>Translate business plans into infrastructure demand signals<\/strong> by partnering with Product, Sales\/RevOps (where applicable), and Engineering to quantify growth drivers and scenario assumptions.<\/li>\n<li><strong>Define target headroom and scaling strategies<\/strong> (e.g., N+1, regional redundancy, burst capacity) for key platforms based on SLOs\/SLAs, risk tolerance, and resilience requirements.<\/li>\n<li><strong>Build capacity investment narratives<\/strong> to support annual and quarterly planning: what capacity is needed, the cost, the risk of not investing, and alternatives (optimization, architectural changes, throttling).<\/li>\n<li><strong>Establish capacity planning standards<\/strong> across teams: forecasting methodology, reporting cadence, definitions (utilization, saturation, headroom), and how capacity risks are escalated and tracked.<\/li>\n<\/ol>\n\n\n\n<h3 
class=\"wp-block-heading\">Operational responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"6\">\n<li><strong>Run the monthly\/quarterly capacity planning cycle<\/strong>: collect inputs, refresh models, validate assumptions, publish forecasts, and drive decision meetings to approve actions.<\/li>\n<li><strong>Maintain a capacity risk register<\/strong> for critical services: identify constraints, lead times, quota risks, and operational mitigations; ensure owners and due dates are assigned.<\/li>\n<li><strong>Partner with SRE\/Operations to reduce capacity-driven incidents<\/strong> by identifying early warning indicators and implementing proactive actions (right-sizing, autoscaling tuning, pre-warming).<\/li>\n<li><strong>Support peak readiness and seasonal events<\/strong> (launches, marketing campaigns, annual renewal cycles) with pre-event capacity checks, stress-test reviews, and rollback contingencies.<\/li>\n<li><strong>Track capacity execution<\/strong>: validate that approved actions (reservations, quota increases, cluster expansions, DB scaling) were implemented and produced expected outcomes.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Technical responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"11\">\n<li><strong>Develop quantitative models<\/strong> (time-series, regression, cohort-based growth, workload attribution) to forecast demand; document assumptions and limitations for auditability.<\/li>\n<li><strong>Instrument and analyze utilization and saturation<\/strong> using observability data (CPU, memory, IOPS, latency, queue depth, network throughput, pod scheduling pressure).<\/li>\n<li><strong>Perform workload segmentation and attribution<\/strong> (by service, team, environment, region, tenant\/customer tier) to identify key drivers and improve forecast explainability.<\/li>\n<li><strong>Evaluate scaling constraints and lead times<\/strong>: cloud quotas, managed service limits, data migration windows, 
procurement lead times (if any), and change-control requirements.<\/li>\n<li><strong>Partner on commitment strategy<\/strong> (reserved instances\/savings plans\/committed use discounts) using demand baselines and risk-adjusted forecasts.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Cross-functional or stakeholder responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"16\">\n<li><strong>Provide decision support to Engineering and Finance<\/strong> by reconciling capacity forecasts with budgets, cost allocation, and unit economics (cost per request, per tenant, per GB).<\/li>\n<li><strong>Facilitate cross-team alignment<\/strong> on capacity plans: ensure platform teams, service owners, and FinOps agree on assumptions, prioritization, and timelines.<\/li>\n<li><strong>Create executive-ready reporting<\/strong> that connects capacity health to business outcomes: risk, cost, delivery impact, and operational readiness.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Governance, compliance, or quality responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"19\">\n<li><strong>Ensure quality and reproducibility of reporting<\/strong>: metric definitions, data lineage, versioned models, and documented methodology; support internal audits or operational reviews.<\/li>\n<li><strong>Embed capacity considerations into change governance<\/strong> (where applicable): ensure high-risk changes consider capacity impacts and rollback requirements; track post-change capacity effects.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Leadership responsibilities (Senior IC scope)<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"21\">\n<li><strong>Mentor analysts\/junior planners<\/strong> (if present) on modeling methods, stakeholder management, and analytical rigor; review outputs for quality and coherence.<\/li>\n<li><strong>Lead capacity-related working groups<\/strong> and act as the de facto SME for forecasting, headroom policy, and capacity risk 
communication.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">4) Day-to-Day Activities<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Daily activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitor key capacity health signals and exception alerts (hot clusters, quota thresholds, storage growth spikes, sustained high latency correlated with resource saturation).<\/li>\n<li>Review anomaly reports and utilization dashboards; identify \u201csilent saturation\u201d patterns (e.g., memory pressure, disk I\/O wait, noisy neighbor behavior).<\/li>\n<li>Respond to ad hoc questions:<\/li>\n<li>\u201cCan we support this launch in 3 weeks?\u201d<\/li>\n<li>\u201cWhat is our runway on database storage in region X?\u201d<\/li>\n<li>\u201cIs our autoscaling policy causing waste or risk?\u201d<\/li>\n<li>Partner with service owners to validate demand shifts and confirm whether changes are organic growth, feature-driven, or incident-related.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Weekly activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Update rolling forecasts (4\u201312 weeks) for critical platforms; refresh assumptions with latest traffic, backlog, and release plans.<\/li>\n<li>Capacity risk review with SRE\/Platform: top constraints, actions in flight, and escalating items.<\/li>\n<li>Cost and commitment check-ins with FinOps: coverage, utilization, and potential rebalancing opportunities.<\/li>\n<li>Perform deep dives on 1\u20132 focus areas (e.g., Kubernetes node pool saturation, database read-replica scaling, CDN egress trends).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monthly or quarterly activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Run the formal capacity planning cycle:<\/li>\n<li>Intake roadmap\/demand signals<\/li>\n<li>Update models and scenario ranges (base\/upside\/downside)<\/li>\n<li>Publish forecasts and required actions<\/li>\n<li>Present and obtain approvals in capacity review boards<\/li>\n<li>Headroom 
policy validation: confirm target buffers still match SLOs and business risk profile.<\/li>\n<li>Reservations\/commitments planning (cloud-dependent): propose buys\/renewals aligned to stable baselines.<\/li>\n<li>Quarterly readiness reviews for peak events; coordinate load tests and capacity validation.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recurring meetings or rituals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Weekly Capacity &amp; Performance Standup (SRE + Platform + service owners)<\/li>\n<li>Monthly Capacity Review Board (Cloud\/Infra leadership + FinOps + key engineering leads)<\/li>\n<li>Quarterly Planning sync (Finance\/FP&amp;A, Infra leadership, Product\/Engineering leadership)<\/li>\n<li>Incident review and Problem Management sessions (for capacity-related or performance incidents)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Incident, escalation, or emergency work (when relevant)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Participate in major incident bridges when capacity is suspected (resource exhaustion, scaling limits, quota breaches).<\/li>\n<li>Rapid triage: determine whether the bottleneck is compute, storage, network, DB, or third-party limits; quantify time-to-exhaustion and recommend mitigations.<\/li>\n<li>Post-incident capacity analysis: identify leading indicators, missed forecasts, monitoring gaps, and required model or process improvements.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5) Key Deliverables<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Rolling Capacity Forecast<\/strong> (weekly\/monthly): multi-horizon demand projections for compute, storage, network, and managed services; includes confidence bands and assumptions.<\/li>\n<li><strong>Quarterly Capacity Plan<\/strong>: prioritized actions, timelines, owners, dependencies, and risk reduction impact.<\/li>\n<li><strong>Capacity Headroom Policy &amp; Targets<\/strong>: defined per platform\/tier (e.g., critical path services) with 
rationale tied to SLOs and failure domains.<\/li>\n<li><strong>Capacity Health Dashboards<\/strong>: standardized views for utilization, saturation, headroom, forecast vs actual, and constraint lead times.<\/li>\n<li><strong>Capacity Risk Register<\/strong>: constraints, quota risks, lead times, mitigation actions, and escalation status.<\/li>\n<li><strong>Commitment \/ Reservation Recommendations<\/strong> (cloud context): buy\/renewal proposals with break-even analysis, risk assessment, and sensitivity to demand variance.<\/li>\n<li><strong>Unit Economics &amp; Efficiency Reports<\/strong>: cost-to-serve indicators (cost per request, per tenant, per GB stored, per build minute) and trends.<\/li>\n<li><strong>Peak Readiness Pack<\/strong>: pre-event capacity checks, scaling validation results, backout plans, and stakeholder sign-off artifacts.<\/li>\n<li><strong>Post-Incident Capacity Findings<\/strong>: documented root cause contributions, forecast misses, and corrective actions (instrumentation, modeling, process).<\/li>\n<li><strong>Data Model \/ Metric Definition Documentation<\/strong>: lineage, transformations, metric definitions, and governance notes.<\/li>\n<li><strong>Automation Scripts \/ Notebooks<\/strong> (where applicable): data ingestion, forecast runs, anomaly detection, and report generation.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">6) Goals, Objectives, and Milestones<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">30-day goals (onboarding and baseline)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Understand infrastructure landscape: key platforms, top workloads, scaling mechanisms, and known constraints.<\/li>\n<li>Gain access to observability, billing, and data sources; validate data quality and gaps.<\/li>\n<li>Shadow existing capacity planning rituals and identify immediate pain points (manual steps, unclear ownership, inconsistent metrics).<\/li>\n<li>Deliver a \u201ccurrent state\u201d readout:<\/li>\n<li>Top 10 capacity 
risks<\/li>\n<li>Current forecasting approach maturity<\/li>\n<li>Quick wins for visibility and process<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">60-day goals (first measurable improvements)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Publish a standardized capacity dashboard for at least 2\u20133 critical platforms (e.g., Kubernetes, databases, storage).<\/li>\n<li>Implement a rolling 8\u201312 week forecast for top drivers with documented assumptions and error tracking.<\/li>\n<li>Establish a repeatable capacity risk review cadence and create an actionable risk register with owners.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">90-day goals (operational ownership)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Run a full monthly capacity planning cycle end-to-end with stakeholder sign-off.<\/li>\n<li>Demonstrate improved forecast accuracy (baseline vs new model) for at least one critical resource domain.<\/li>\n<li>Deliver a commitment strategy recommendation (if applicable) with Finance alignment.<\/li>\n<li>Document headroom targets for critical tiers and secure agreement from SRE\/Infra leadership.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6-month milestones (scaling and institutionalization)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Expand coverage to most critical platforms\/services; standardize metric definitions and reporting across teams.<\/li>\n<li>Integrate capacity planning into quarterly planning and budgeting (FP&amp;A \/ FinOps cadence).<\/li>\n<li>Reduce capacity-driven incidents or near-misses through proactive interventions and better alert thresholds.<\/li>\n<li>Implement scenario planning:<\/li>\n<li>Base growth<\/li>\n<li>Upside growth<\/li>\n<li>Stress scenario (regional failover, sudden traffic spike)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12-month objectives (business impact)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Achieve sustained forecasting performance (tracked MAPE\/WMAPE) within 
agreed thresholds for primary drivers.<\/li>\n<li>Demonstrate measurable cost avoidance or efficiency improvement through right-sizing and commitment tuning.<\/li>\n<li>Mature capacity governance:<\/li>\n<li>Clear decision rights<\/li>\n<li>Auditable methodology<\/li>\n<li>Consistent executive reporting<\/li>\n<li>Establish capacity planning as a trusted, low-friction process used by engineering teams for launch readiness and roadmap decisions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Long-term impact goals (strategic)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Enable predictable infrastructure scalability with minimal \u201cfirefighting\u201d and fewer emergency procurements or quota escalations.<\/li>\n<li>Provide data-driven guardrails that improve both reliability and margins as the company scales.<\/li>\n<li>Contribute to platform strategy decisions (e.g., build vs buy, multi-region expansion, managed service adoption) with quantified capacity\/cost implications.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Role success definition<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Capacity risks are visible early, mitigations are executed on time, and leadership can make timely tradeoffs.<\/li>\n<li>Forecasts are trusted enough to drive commitments, budgeting, and engineering prioritization.<\/li>\n<li>Capacity is no longer a recurring root cause category for major incidents.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">What high performance looks like<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Forecasts consistently explain \u201cwhy\u201d demand changed (driver-based, not just trend lines).<\/li>\n<li>Stakeholders proactively seek input before launches and architectural changes.<\/li>\n<li>The role reduces toil by automating data pipelines and standardizing metrics, freeing time for higher-value analysis.<\/li>\n<li>Recommendations are pragmatic and actionable (clear owners, lead times, and measurable 
outcomes).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">7) KPIs and Productivity Metrics<\/h2>\n\n\n\n<p>The measurement framework below balances outputs (what the role produces) with outcomes (what the business gets), including quality, efficiency, reliability, and stakeholder trust.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Metric name<\/th>\n<th>What it measures<\/th>\n<th>Why it matters<\/th>\n<th>Example target \/ benchmark<\/th>\n<th>Frequency<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Forecast accuracy (WMAPE\/MAPE) \u2013 compute<\/td>\n<td>Error between forecasted vs actual compute demand<\/td>\n<td>Determines trust and planning effectiveness<\/td>\n<td>WMAPE \u2264 10\u201315% for 4\u20138 week horizon<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Forecast accuracy \u2013 storage growth<\/td>\n<td>Error for storage consumption and growth rate<\/td>\n<td>Prevents sudden \u201cout of space\u201d events and costly emergency actions<\/td>\n<td>WMAPE \u2264 10% (workload-dependent)<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Forecast accuracy \u2013 key managed services<\/td>\n<td>Error for DB\/queue\/cache throughput or instance sizing<\/td>\n<td>These are common bottlenecks with hard limits<\/td>\n<td>WMAPE \u2264 15%<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Capacity-related incident rate<\/td>\n<td>Number of incidents where capacity was a primary or contributing factor<\/td>\n<td>Direct reliability indicator<\/td>\n<td>&lt; 1 P1\/P2 per quarter attributable to capacity (context-dependent)<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Capacity near-miss rate<\/td>\n<td>Count of events where thresholds were nearly breached (with mitigation)<\/td>\n<td>Indicates early detection and prevention<\/td>\n<td>Trend downward quarter over quarter<\/td>\n<td>Monthly\/Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Headroom compliance (critical platforms)<\/td>\n<td>% time headroom stays above minimum buffer<\/td>\n<td>Measures resilience and 
readiness<\/td>\n<td>\u2265 99% compliance for Tier-0\/Tier-1 services<\/td>\n<td>Weekly\/Monthly<\/td>\n<\/tr>\n<tr>\n<td>Time-to-identify capacity constraint (TTIC)<\/td>\n<td>Time from signal to confirmed constraint diagnosis<\/td>\n<td>Reduces outage duration and scramble<\/td>\n<td>&lt; 1 business day for top platforms<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Time-to-mitigate capacity constraint (TTMC)<\/td>\n<td>Time from constraint identification to executed mitigation<\/td>\n<td>Indicates execution effectiveness (often shared)<\/td>\n<td>2\u20134 weeks typical; faster for quota\/autoscaling<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Cost avoidance from right-sizing<\/td>\n<td>Estimated savings from removing waste without harming SLOs<\/td>\n<td>Shows financial value<\/td>\n<td>Target set with FinOps (e.g., 3\u20138% of infra spend\/yr)<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Commitment coverage and utilization (cloud)<\/td>\n<td>% of eligible usage covered and effectively utilized<\/td>\n<td>Optimizes spend while limiting risk<\/td>\n<td>Coverage 70\u201390% with utilization &gt; 90% (context-specific)<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Budget variance explained by capacity drivers<\/td>\n<td>% of variance attributable to defined drivers<\/td>\n<td>Builds credibility with Finance<\/td>\n<td>&gt; 80% variance explained for major shifts<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Dashboard adoption<\/td>\n<td>Active stakeholder usage \/ views \/ subscriptions<\/td>\n<td>Indicates usefulness and transparency<\/td>\n<td>Adopted by all platform leads; consistent weekly usage<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Data quality score for capacity inputs<\/td>\n<td>Completeness, freshness, and correctness of key datasets<\/td>\n<td>Bad data leads to bad decisions<\/td>\n<td>\u2265 95% freshness and completeness on key tables<\/td>\n<td>Weekly<\/td>\n<\/tr>\n<tr>\n<td>Model reproducibility \/ versioning compliance<\/td>\n<td>% of forecasts 
produced via versioned pipelines with documented assumptions<\/td>\n<td>Reduces fragility and audit risk<\/td>\n<td>\u2265 90% automated\/standard runs<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Action plan completion rate<\/td>\n<td>% of capacity actions completed by due date<\/td>\n<td>Converts planning into outcomes<\/td>\n<td>\u2265 85% on-time completion<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Launch readiness on-time sign-off<\/td>\n<td>% of priority launches with capacity sign-off before release<\/td>\n<td>Prevents late surprises<\/td>\n<td>\u2265 95%<\/td>\n<td>Monthly\/Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Stakeholder satisfaction score<\/td>\n<td>Surveyed satisfaction of engineering\/finance stakeholders<\/td>\n<td>Measures trust and service quality<\/td>\n<td>\u2265 4.2\/5 average<\/td>\n<td>Biannual<\/td>\n<\/tr>\n<tr>\n<td>Cross-team alignment rate<\/td>\n<td>% of plans agreed without escalation due to inconsistent assumptions<\/td>\n<td>Indicates clarity and facilitation<\/td>\n<td>Increasing trend; target set locally<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Mentoring \/ enablement impact (if applicable)<\/td>\n<td>Number of enablement sessions, playbooks, improvements adopted by teams<\/td>\n<td>Scales capacity discipline beyond one person<\/td>\n<td>1\u20132 enablement artifacts per quarter<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<p>Notes on targets:\n&#8211; Targets vary by maturity, workload volatility, and platform autoscaling sophistication. 
For early-stage processes, start with looser accuracy targets and tighten as data quality and driver attribution improve.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">8) Technical Skills Required<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Must-have technical skills<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Capacity planning fundamentals (Critical)<\/strong> <\/li>\n<li><em>Description:<\/em> Concepts of utilization vs saturation, headroom, bottlenecks, queueing effects, and lead-time planning.  <\/li>\n<li><em>Use:<\/em> Setting targets, identifying constraints, building decision-ready plans.<\/li>\n<li><strong>Cloud infrastructure concepts (Critical)<\/strong> <\/li>\n<li><em>Description:<\/em> Understanding of compute, storage, network, and managed services; scaling patterns and quotas\/limits.  <\/li>\n<li><em>Use:<\/em> Mapping forecasts to real scaling actions and constraints across cloud services.<\/li>\n<li><strong>Observability\/monitoring analytics (Critical)<\/strong> <\/li>\n<li><em>Description:<\/em> Ability to interpret metrics (CPU, memory, I\/O, latency, p95\/p99, error rates), and relate them to saturation and user impact.  <\/li>\n<li><em>Use:<\/em> Detecting early capacity risk and validating whether provisioning is adequate.<\/li>\n<li><strong>SQL and data analysis (Critical)<\/strong> <\/li>\n<li><em>Description:<\/em> Querying time-series and billing datasets; joining telemetry with service metadata and cost allocation.  <\/li>\n<li><em>Use:<\/em> Building reproducible analyses and driver-based reporting.<\/li>\n<li><strong>Spreadsheet and financial modeling (Important)<\/strong> <\/li>\n<li><em>Description:<\/em> Scenario modeling, sensitivity analysis, break-even calculations for commitments\/reservations.  
<\/li>\n<li><em>Use:<\/em> Communicating options and tradeoffs clearly to finance and leadership.<\/li>\n<li><strong>Forecasting methods (Important)<\/strong> <\/li>\n<li><em>Description:<\/em> Time-series basics, decomposition (trend\/seasonality), regression, and error measurement (MAPE\/WMAPE).  <\/li>\n<li><em>Use:<\/em> Creating defensible forecasts with confidence intervals and backtesting.<\/li>\n<li><strong>Systems thinking for distributed platforms (Important)<\/strong> <\/li>\n<li><em>Description:<\/em> Understanding how bottlenecks propagate (DB contention, cache hit rate impact, queue backlog).  <\/li>\n<li><em>Use:<\/em> Avoiding simplistic CPU-only planning; identifying true limiting resources.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Good-to-have technical skills<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Python (or equivalent) for analytics (Important)<\/strong> <\/li>\n<li><em>Use:<\/em> Notebooks, pipelines, automation, statistical models, reproducible reporting.<\/li>\n<li><strong>FinOps and cloud billing semantics (Important)<\/strong> <\/li>\n<li><em>Use:<\/em> Translating demand to spend, understanding pricing dimensions (instance families, storage tiers, egress).<\/li>\n<li><strong>Kubernetes capacity concepts (Important)<\/strong> <\/li>\n<li><em>Use:<\/em> Node\/pod scheduling, requests\/limits, HPA\/VPA behavior, cluster autoscaler constraints.<\/li>\n<li><strong>Data visualization tools (Important)<\/strong> <\/li>\n<li><em>Use:<\/em> Building dashboards that stakeholders actually use (trendlines, burn rates, forecast vs actual).<\/li>\n<li><strong>ITSM familiarity (Optional)<\/strong> <\/li>\n<li><em>Use:<\/em> Linking capacity actions to change management, incident\/problem processes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Advanced or expert-level technical skills<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Driver-based forecasting and attribution (Important to Critical in mature 
orgs)<\/strong> <\/li>\n<li><em>Description:<\/em> Forecasting demand based on causal drivers (active tenants, requests per user, ingestion volume), not only historical trends.  <\/li>\n<li><em>Use:<\/em> Explaining growth and improving forecast stability through business-aligned metrics.<\/li>\n<li><strong>Advanced performance and saturation analysis (Important)<\/strong> <\/li>\n<li><em>Description:<\/em> Understanding p95\/p99 latency behavior, tail amplification, and saturation signals (queue depth, disk wait, GC pressure).  <\/li>\n<li><em>Use:<\/em> Preventing performance regressions by identifying bottlenecks early.<\/li>\n<li><strong>Scenario planning under uncertainty (Important)<\/strong> <\/li>\n<li><em>Description:<\/em> Monte Carlo or probabilistic ranges; modeling upside\/downside; risk-adjusted commitment decisions.  <\/li>\n<li><em>Use:<\/em> Avoiding brittle \u201csingle-number\u201d plans.<\/li>\n<li><strong>Data pipeline design for metric reliability (Optional to Important)<\/strong> <\/li>\n<li><em>Description:<\/em> Building robust ingestion and transformation for metrics\/cost data, with validation checks.  
<\/li>\n<li><em>Use:<\/em> Reducing manual effort and increasing trust in reporting.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Emerging future skills for this role (next 2\u20135 years)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Automated anomaly detection and forecasting augmentation (Optional today; likely Important)<\/strong> <\/li>\n<li><em>Use:<\/em> Using ML-assisted detection to catch leading indicators and improve forecast iteration speed.<\/li>\n<li><strong>Policy-as-code for capacity guardrails (Context-specific)<\/strong> <\/li>\n<li><em>Use:<\/em> Enforcing headroom\/limits via templates and deployment policies (e.g., resource requests\/limits standards).<\/li>\n<li><strong>Advanced unit-economics analytics (Important)<\/strong> <\/li>\n<li><em>Use:<\/em> Connecting cost-to-serve to product levers and platform optimizations as margins come under scrutiny.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">9) Soft Skills and Behavioral Capabilities<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Structured problem solving<\/strong> <\/li>\n<li><em>Why it matters:<\/em> Capacity issues are multi-factor; stakeholders often jump to \u201cadd servers\u201d without diagnosing the bottleneck.  <\/li>\n<li><em>Shows up as:<\/em> Hypothesis-driven analysis, clear root cause articulation, prioritizing the most constraining resource.  <\/li>\n<li>\n<p><em>Strong performance:<\/em> Delivers concise problem statements and options with quantified impacts and risks.<\/p>\n<\/li>\n<li>\n<p><strong>Stakeholder management and influence without authority<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Execution often depends on multiple engineering teams; the analyst rarely \u201cowns\u201d implementation.  <\/li>\n<li><em>Shows up as:<\/em> Facilitating agreement on assumptions, negotiating timelines, and tracking commitments.  
<\/li>\n<li>\n<p><em>Strong performance:<\/em> Teams adopt plans because they are credible and practical, not because they are mandated.<\/p>\n<\/li>\n<li>\n<p><strong>Business and financial acumen<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Capacity decisions directly affect cloud spend and margin; finance partners need explainable drivers.  <\/li>\n<li><em>Shows up as:<\/em> Cost-impact framing, sensitivity analysis, budget variance explanations.  <\/li>\n<li>\n<p><em>Strong performance:<\/em> Produces recommendations that balance risk and cost; reduces surprise spend.<\/p>\n<\/li>\n<li>\n<p><strong>Communication clarity (written and verbal)<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Capacity plans must be understandable by executives and implementable by engineers.  <\/li>\n<li><em>Shows up as:<\/em> Clean narratives, well-labeled charts, explicit assumptions, and crisp action lists.  <\/li>\n<li>\n<p><em>Strong performance:<\/em> Execs can make a decision in minutes; engineers can implement without reinterpreting.<\/p>\n<\/li>\n<li>\n<p><strong>Analytical rigor and skepticism<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Telemetry and billing data can be misleading; false confidence causes bad commitments.  <\/li>\n<li><em>Shows up as:<\/em> Backtesting, validation checks, triangulation across sources, documenting uncertainty.  <\/li>\n<li>\n<p><em>Strong performance:<\/em> Avoids overfitting and catches data quality issues early.<\/p>\n<\/li>\n<li>\n<p><strong>Operational mindset and urgency<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Capacity constraints can become incidents quickly; timeliness matters.  <\/li>\n<li><em>Shows up as:<\/em> Clear escalation, time-to-exhaustion estimates, and fast mitigation pathways.  
<\/li>\n<li>\n<p><em>Strong performance:<\/em> Prevents incidents by acting early and aligning owners.<\/p>\n<\/li>\n<li>\n<p><strong>Facilitation and conflict resolution<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Different teams may disagree on drivers, ownership, or risk tolerance.  <\/li>\n<li><em>Shows up as:<\/em> Running effective meetings, capturing decisions, making tradeoffs explicit.  <\/li>\n<li>\n<p><em>Strong performance:<\/em> Meetings result in decisions and actions, not debate loops.<\/p>\n<\/li>\n<li>\n<p><strong>Continuous improvement orientation<\/strong> <\/p>\n<\/li>\n<li><em>Why it matters:<\/em> Capacity planning maturity improves through iteration, automation, and better driver signals.  <\/li>\n<li><em>Shows up as:<\/em> Automating repetitive reporting, refining models, improving metric definitions.  <\/li>\n<li><em>Strong performance:<\/em> Planning becomes faster, more accurate, and less dependent on heroics.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">10) Tools, Platforms, and Software<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Tool \/ platform \/ software<\/th>\n<th>Primary use<\/th>\n<th>Common \/ Optional \/ Context-specific<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Cloud platforms<\/td>\n<td>AWS \/ Azure \/ GCP<\/td>\n<td>Capacity signals (quotas, service limits), cost drivers, reservations\/commitments<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Cloud cost management<\/td>\n<td>Native billing tools (AWS Cost Explorer, Azure Cost Mgmt, GCP Billing)<\/td>\n<td>Spend analysis, commitment coverage, chargeback\/showback<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>FinOps platforms<\/td>\n<td>Apptio Cloudability, VMware Aria Cost, Harness CCM<\/td>\n<td>Advanced allocation, anomaly detection, forecasting<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Monitoring \/ metrics<\/td>\n<td>Prometheus<\/td>\n<td>Time-series metrics for infrastructure and 
services<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Monitoring \/ visualization<\/td>\n<td>Grafana<\/td>\n<td>Dashboards for utilization, saturation, headroom<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Observability suites<\/td>\n<td>Datadog, New Relic, Dynatrace<\/td>\n<td>Infra + APM correlation; capacity risk signals<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Logging \/ SIEM<\/td>\n<td>Splunk, Elastic Stack<\/td>\n<td>Incident investigation, trend analysis<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Cloud-native monitoring<\/td>\n<td>CloudWatch \/ Azure Monitor \/ Google Cloud Monitoring<\/td>\n<td>Service-specific metrics and alarms<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Incident \/ ITSM<\/td>\n<td>ServiceNow, Jira Service Management<\/td>\n<td>Linking capacity risks to incidents\/changes<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Ticketing \/ work management<\/td>\n<td>Jira<\/td>\n<td>Tracking capacity actions and ownership<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Documentation<\/td>\n<td>Confluence, Notion<\/td>\n<td>Methodology, runbooks, planning docs<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Data warehouse<\/td>\n<td>Snowflake, BigQuery, Redshift, Databricks<\/td>\n<td>Centralizing telemetry + billing + metadata<\/td>\n<td>Common (varies by org)<\/td>\n<\/tr>\n<tr>\n<td>Data transformation<\/td>\n<td>dbt<\/td>\n<td>Reproducible metric models, lineage<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Query tools<\/td>\n<td>SQL editors, Athena, Trino\/Presto<\/td>\n<td>Ad hoc and scheduled queries<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>BI \/ dashboards<\/td>\n<td>Tableau, Power BI, Looker<\/td>\n<td>Exec-ready reporting and self-serve analytics<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Spreadsheet tools<\/td>\n<td>Excel, Google Sheets<\/td>\n<td>Scenario modeling, quick sensitivity checks<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Scripting \/ analytics<\/td>\n<td>Python (pandas, statsmodels), R<\/td>\n<td>Forecast models, automation, 
notebooks<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Notebook platforms<\/td>\n<td>Jupyter, Google Colab, Databricks notebooks<\/td>\n<td>Reproducible analysis, sharing<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Infrastructure as code<\/td>\n<td>Terraform<\/td>\n<td>Understanding infra changes, capacity additions<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Config management<\/td>\n<td>Ansible<\/td>\n<td>Capacity-related automation (rare)<\/td>\n<td>Optional<\/td>\n<\/tr>\n<tr>\n<td>Container orchestration<\/td>\n<td>Kubernetes<\/td>\n<td>Cluster capacity, scheduling constraints<\/td>\n<td>Common (in many cloud orgs)<\/td>\n<\/tr>\n<tr>\n<td>CI\/CD visibility<\/td>\n<td>GitHub Actions, GitLab CI, Jenkins<\/td>\n<td>Build infrastructure capacity and usage drivers<\/td>\n<td>Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Source control<\/td>\n<td>GitHub \/ GitLab<\/td>\n<td>Versioning forecasting notebooks\/pipelines<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Collaboration<\/td>\n<td>Slack \/ Microsoft Teams<\/td>\n<td>Stakeholder comms, incident coordination<\/td>\n<td>Common<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">11) Typical Tech Stack \/ Environment<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Infrastructure environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Predominantly <strong>public cloud<\/strong> (AWS\/Azure\/GCP) or hybrid, with:<\/li>\n<li>Kubernetes clusters (managed or self-managed)<\/li>\n<li>VM fleets (autoscaling groups\/scale sets) for legacy or specialized workloads<\/li>\n<li>Managed databases (relational + NoSQL), caches, queues\/streams<\/li>\n<li>Object storage and block storage, with lifecycle policies<\/li>\n<li>Common capacity constraints:<\/li>\n<li>Cloud quotas and regional service limits<\/li>\n<li>Cluster\/node capacity fragmentation<\/li>\n<li>Storage IOPS limits and growth runways<\/li>\n<li>Network egress costs and throughput ceilings<\/li>\n<li>Database connections, IOPS, and 
replication lag<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Application environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Microservices and APIs with variable traffic patterns<\/li>\n<li>Batch\/ETL workloads (data processing) that create periodic demand spikes<\/li>\n<li>Multi-tenant SaaS patterns where customer growth changes workload shape<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Data environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Telemetry sources: metrics, traces, logs<\/li>\n<li>Billing and cost allocation datasets<\/li>\n<li>Service catalogs\/CMDB-like metadata (service ownership, tiering, environments)<\/li>\n<li>Warehousing and BI layer for executive reporting<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Least-privilege access to telemetry and billing data<\/li>\n<li>Governance around data retention (logs\/metrics), especially in regulated contexts<\/li>\n<li>Audit expectations for cost allocations and decision documentation (varies by company)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Delivery model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Agile delivery with monthly\/quarterly planning and frequent releases<\/li>\n<li>Infrastructure changes handled via platform teams; capacity planning provides inputs and guardrails<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scale or complexity context<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Typically supports:<\/li>\n<li>Multiple environments (prod, staging, dev)<\/li>\n<li>Multi-region deployments<\/li>\n<li>Hundreds to thousands of services or workloads (in larger orgs)<\/li>\n<li>Complexity is often driven by heterogeneity (different teams, stacks, maturity levels) and volatile demand.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team topology<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The role usually sits in <strong>Cloud &amp; Infrastructure<\/strong> 
under:<\/li>\n<li>Capacity &amp; Performance Engineering, OR<\/li>\n<li>Infrastructure Operations \/ SRE enablement, OR<\/li>\n<li>FinOps\/Cloud Economics with strong technical alignment (less common for \u201cSenior Capacity Planning Analyst\u201d in Infra)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">12) Stakeholders and Collaboration Map<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Internal stakeholders<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Director\/Manager, Cloud Infrastructure Operations \/ Platform Engineering (manager)<\/strong> <\/li>\n<li>Collaboration: priorities, escalation, decision-making forums, resource alignment.<\/li>\n<li><strong>SRE \/ Reliability Engineering<\/strong> <\/li>\n<li>Collaboration: capacity risks, headroom targets, incident prevention, peak readiness.<\/li>\n<li><strong>Platform Engineering \/ Kubernetes team<\/strong> <\/li>\n<li>Collaboration: cluster sizing, autoscaling strategies, scheduling constraints, quota planning.<\/li>\n<li><strong>Service owners \/ Application engineering leads<\/strong> <\/li>\n<li>Collaboration: demand drivers, launch plans, performance bottlenecks, scaling patterns.<\/li>\n<li><strong>FinOps \/ Cloud Cost Management<\/strong> <\/li>\n<li>Collaboration: forecast-to-budget alignment, commitment strategy, cost anomaly explanation.<\/li>\n<li><strong>FP&amp;A (in larger orgs)<\/strong> <\/li>\n<li>Collaboration: budget cycles, variance explanations, multi-quarter outlooks.<\/li>\n<li><strong>Data Engineering \/ Analytics Engineering<\/strong> <\/li>\n<li>Collaboration: telemetry pipelines, data quality, metric definitions, dashboards.<\/li>\n<li><strong>Security \/ Risk \/ Compliance (context-specific)<\/strong> <\/li>\n<li>Collaboration: resilience requirements, retention policies, auditability.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">External stakeholders (if applicable)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cloud vendor account teams<\/strong> 
(AWS\/Azure\/GCP)  <\/li>\n<li>Collaboration: quota increases, roadmap constraints, commitment negotiations (often via Procurement\/FinOps).<\/li>\n<li><strong>Key third-party providers<\/strong> (CDN, observability, DB vendors)  <\/li>\n<li>Collaboration: usage forecasts and contract sizing.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Peer roles<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>FinOps Analyst \/ Cloud Economist<\/li>\n<li>SRE (Senior) \/ Performance Engineer<\/li>\n<li>Infrastructure Data Analyst \/ BI Analyst<\/li>\n<li>Capacity Planner (non-senior) or Resource Manager (in some orgs)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Upstream dependencies<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Product roadmap and launch calendars<\/li>\n<li>Service ownership metadata and tiering definitions<\/li>\n<li>Telemetry pipeline quality and availability<\/li>\n<li>Cost allocation models (tagging, account\/project structure)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Downstream consumers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Infrastructure leadership (investment decisions)<\/li>\n<li>Platform teams (implementation)<\/li>\n<li>Finance (budgeting, commitments)<\/li>\n<li>Service teams (launch readiness, performance planning)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Nature of collaboration and decision-making authority<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The Senior Capacity Planning Analyst typically <strong>recommends<\/strong> actions and <strong>drives alignment<\/strong>, while engineering leaders <strong>approve and execute<\/strong> platform changes.<\/li>\n<li>Authority is often strongest in:<\/li>\n<li>Defining the planning process and reporting standards<\/li>\n<li>Establishing risk visibility and escalation<\/li>\n<li>Setting analytical \u201csingle source of truth\u201d metrics (with stakeholder buy-in)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Escalation points<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Capacity risk that threatens SLOs or launch dates escalates to:<\/li>\n<li>SRE\/Infra on-call lead (immediate operational risk)<\/li>\n<li>Platform Engineering Manager\/Director (execution prioritization)<\/li>\n<li>FinOps\/Finance partner (if cost\/commitment decisions needed)<\/li>\n<li>Engineering leadership (if product tradeoffs required)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">13) Decision Rights and Scope of Authority<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Can decide independently<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Analytical methodologies, models, and forecast techniques (within agreed standards)<\/li>\n<li>Dashboard definitions and reporting formats (aligned to stakeholders)<\/li>\n<li>Prioritization of analytical deep dives and investigation focus<\/li>\n<li>When to flag and escalate capacity risks based on defined thresholds<\/li>\n<li>Recommendations for headroom targets (subject to leadership approval)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Requires team approval (SRE\/Platform\/FinOps alignment)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Changes to shared metric definitions and tiering that affect multiple teams<\/li>\n<li>Capacity planning cadence and governance changes (meeting structure, required inputs)<\/li>\n<li>Adoption of new dashboards as \u201cofficial\u201d reporting<\/li>\n<li>Commitment strategy proposals (coverage targets, risk posture) before exec\/finance sign-off<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Requires manager\/director\/executive approval<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Major spend decisions or changes in commitment strategy beyond thresholds (e.g., large savings plan purchases)<\/li>\n<li>Infrastructure expansion roadmaps requiring material investment or staffing<\/li>\n<li>Changes to reliability posture that affect customer SLAs (e.g., lowering headroom in critical tiers)<\/li>\n<li>Vendor contract impacts and procurement 
commitments<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Budget, architecture, vendor, delivery, hiring, compliance authority (typical)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Budget:<\/strong> Influences and recommends; typically not the final approver.<\/li>\n<li><strong>Architecture:<\/strong> Advises through constraints and scaling economics; final decisions rest with architects\/platform leads.<\/li>\n<li><strong>Vendor:<\/strong> Provides usage forecasts and sizing; Procurement\/FinOps leads negotiations.<\/li>\n<li><strong>Delivery:<\/strong> Drives planning milestones; execution ownership remains with engineering teams.<\/li>\n<li><strong>Hiring:<\/strong> May interview and assess analytical talent; not typically a hiring manager.<\/li>\n<li><strong>Compliance:<\/strong> Ensures documentation and auditability for capacity\/cost decisions where required; compliance teams own policy.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">14) Required Experience and Qualifications<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Typical years of experience<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>6\u201310 years<\/strong> in analytics, infrastructure operations, SRE support, performance engineering, or FinOps-adjacent capacity roles.<\/li>\n<li>Seniority indicates the ability to operate independently, manage stakeholders, and run planning cycles\u2014not necessarily people management.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Education expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bachelor\u2019s degree commonly in:<\/li>\n<li>Computer Science, Information Systems, Engineering, Mathematics, Statistics, Economics  <\/li>\n<li>Equivalent experience is often acceptable if the candidate demonstrates strong technical analytics and infrastructure understanding.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certifications (relevant but not mandatory)<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><strong>Common\/Optional:<\/strong><\/li>\n<li>FinOps Certified Practitioner (helpful in cloud spend and commitment decisions)<\/li>\n<li>Cloud certifications (AWS\/Azure\/GCP Associate level) for platform familiarity<\/li>\n<li><strong>Context-specific:<\/strong><\/li>\n<li>ITIL Foundation (if operating heavily within ITSM governance)<\/li>\n<li>Kubernetes fundamentals (CKA\/CKAD) if deeply involved in k8s capacity constraints<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prior role backgrounds commonly seen<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Capacity Planning Analyst \/ Resource Planning Analyst (infra)<\/li>\n<li>SRE Analyst \/ Operations Analyst with strong data skills<\/li>\n<li>FinOps Analyst with strong technical depth in infrastructure metrics<\/li>\n<li>Performance\/Load Testing Analyst (with expansion into planning)<\/li>\n<li>Data Analyst embedded in Platform\/Infrastructure org<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Domain knowledge expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cloud service scaling patterns, quotas, and performance drivers<\/li>\n<li>Observability metrics interpretation and reliability concepts (SLOs\/SLIs)<\/li>\n<li>Cost drivers and pricing dimensions (compute families, storage tiers, egress)<\/li>\n<li>Planning cycles and governance in engineering organizations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Leadership experience expectations (Senior IC)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Leading cross-functional working sessions and driving alignment<\/li>\n<li>Mentoring and reviewing analytical work (where applicable)<\/li>\n<li>Communicating to executive audiences with clear risk\/cost framing<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">15) Career Path and Progression<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common feeder roles into this role<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Capacity Planning Analyst 
(mid-level)<\/li>\n<li>Infrastructure\/Data Analyst (platform embedded)<\/li>\n<li>FinOps Analyst (with technical aptitude)<\/li>\n<li>SRE\/Operations Engineer transitioning into analytical planning<\/li>\n<li>Performance Analyst \/ Site Reliability Analyst<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Next likely roles after this role<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Lead\/Principal Capacity Planning Analyst<\/strong> (if the org has a deep analyst ladder)<\/li>\n<li><strong>Capacity &amp; Performance Engineering Lead<\/strong> (IC leader role)<\/li>\n<li><strong>FinOps Lead \/ Cloud Economics Manager<\/strong> (if leaning toward cost + governance)<\/li>\n<li><strong>Platform Strategy Analyst \/ Technical Program Manager (Infrastructure)<\/strong> (if leaning toward cross-org planning and execution)<\/li>\n<li><strong>SRE \/ Performance Engineer<\/strong> (if leaning toward engineering execution and reliability)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Adjacent career paths<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Infrastructure Product Management (platform product owner)<\/li>\n<li>Cloud Architecture (capacity and cost-informed design)<\/li>\n<li>Data Analytics Engineering (owning metric pipelines and models at scale)<\/li>\n<li>Strategic Finance (technology FP&amp;A specializing in cloud)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Skills needed for promotion (Senior \u2192 Lead\/Principal)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Consistent driver-based forecast accuracy across multiple domains<\/li>\n<li>Ability to influence multi-quarter investment decisions and commitment strategy<\/li>\n<li>Mature governance design: definitions, controls, and scalable processes<\/li>\n<li>Coaching others and establishing reusable tooling\/pipelines<\/li>\n<li>Strong executive communication and risk framing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How this role evolves over time<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Early phase: dashboarding, baseline forecasting, fixing data quality, reducing immediate risks.<\/li>\n<li>Mature phase: scenario modeling, unit-economics optimization, commitments strategy, embedding capacity guardrails into platform standards.<\/li>\n<li>Advanced phase: capacity becomes integrated with product planning and reliability engineering; the role shapes platform strategy with quantified tradeoffs.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">16) Risks, Challenges, and Failure Modes<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common role challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data fragmentation:<\/strong> telemetry, cost, and ownership metadata live in different systems with inconsistent identifiers.<\/li>\n<li><strong>Unstable demand drivers:<\/strong> product changes can alter workload mix, making historical trends misleading.<\/li>\n<li><strong>Attribution gaps:<\/strong> difficulty linking resource usage to specific services\/tenants\/features.<\/li>\n<li><strong>Lead time constraints:<\/strong> quotas, migrations, procurement cycles, and change windows introduce planning complexity.<\/li>\n<li><strong>Stakeholder misalignment:<\/strong> engineering prioritizes feature delivery; finance prioritizes cost; SRE prioritizes risk reduction.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Bottlenecks<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lack of service catalog and ownership clarity<\/li>\n<li>Incomplete tagging\/accounting for cost allocation<\/li>\n<li>Insufficient observability granularity (missing saturation metrics)<\/li>\n<li>Limited automation causing manual reporting toil<\/li>\n<li>Execution capacity: plans exist but teams cannot implement due to competing priorities<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Anti-patterns<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Planning only on CPU utilization (ignoring memory, I\/O, queues, and tail 
latency)<\/li>\n<li>Treating forecasts as single-point truths without confidence bands and scenarios<\/li>\n<li>Over-indexing on commitments without risk-adjusted baselines<\/li>\n<li>Producing reports without clear decisions\/actions attached<\/li>\n<li>Capacity planning as an annual event rather than a continuous discipline<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common reasons for underperformance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Weak infrastructure fundamentals (can\u2019t diagnose bottlenecks)<\/li>\n<li>Inability to influence and drive alignment across teams<\/li>\n<li>Poor analytical rigor (no backtesting, unclear assumptions)<\/li>\n<li>Communication that is too technical for executives or too vague for engineers<\/li>\n<li>Failure to operationalize plans (no tracking, no owners, no escalation)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Business risks if this role is ineffective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Increased outage risk and SLA breaches due to under-capacity<\/li>\n<li>Chronic waste and margin erosion due to over-capacity<\/li>\n<li>Unplanned spend spikes and missed budgets<\/li>\n<li>Missed launch dates or degraded customer experience during peaks<\/li>\n<li>Leadership decisions made on intuition rather than quantified tradeoffs<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">17) Role Variants<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">By company size<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Startup \/ early growth:<\/strong> <\/li>\n<li>More hands-on with dashboards, ad hoc analysis, and immediate firefighting.  <\/li>\n<li>Likely closer to engineering teams; fewer formal governance processes.<\/li>\n<li><strong>Mid-size scale-up:<\/strong> <\/li>\n<li>Formal monthly\/quarterly planning begins; multi-team alignment becomes central.  
<\/li>\n<li>Strong focus on commitments, unit economics, and scaling playbooks.<\/li>\n<li><strong>Enterprise:<\/strong> <\/li>\n<li>Heavy governance, multiple regions\/business units, complex chargeback.  <\/li>\n<li>More specialization (separate teams for FinOps, performance, capacity).  <\/li>\n<li>Stronger audit requirements and formal decision boards.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">By industry<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>B2B SaaS:<\/strong> strong focus on multi-tenant growth drivers, tiering, and predictable seasonality (renewals, usage peaks).  <\/li>\n<li><strong>Consumer \/ media \/ gaming:<\/strong> heavy peak-event planning, rapid spikes, and resilience under burst traffic.  <\/li>\n<li><strong>Internal IT \/ platform for business units:<\/strong> more emphasis on chargeback, service catalog maturity, and governance alignment.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">By geography<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Planning may need region-specific considerations:<\/li>\n<li>Data residency and multi-region requirements<\/li>\n<li>Vendor availability and quota differences<\/li>\n<li>Time zone coordination for incident and planning rituals<br\/>\n(Principles remain consistent globally.)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Product-led vs service-led company<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Product-led:<\/strong> capacity ties directly to product roadmap, feature adoption, and user behavior analytics; stronger need for driver attribution.  
<\/li>\n<li><strong>Service-led \/ managed services:<\/strong> capacity planning often integrates with contractual commitments and customer onboarding schedules, sometimes with explicit capacity reservations.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup vs enterprise operating model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Startup:<\/strong> speed and agility; fewer constraints; more manual work tolerated.  <\/li>\n<li><strong>Enterprise:<\/strong> formal boards, audit trails, and standardized metrics; more stakeholders and longer lead times.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regulated vs non-regulated environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regulated (finance\/health):<\/strong> stronger requirements for resilience, auditability of decisions, retention policies for telemetry, and documented risk management.  <\/li>\n<li><strong>Non-regulated:<\/strong> faster experimentation and lighter governance, but still needs strong reliability discipline at scale.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">18) AI \/ Automation Impact on the Role<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tasks that can be automated (today and near-term)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data extraction and normalization:<\/strong> automated ingestion from metrics, billing, and service catalogs.<\/li>\n<li><strong>Recurring reporting:<\/strong> scheduled forecast runs, dashboard refreshes, variance reports.<\/li>\n<li><strong>Baseline anomaly detection:<\/strong> detecting unusual growth rates, sudden utilization shifts, quota approach, and cost anomalies.<\/li>\n<li><strong>Forecast backtesting and error tracking:<\/strong> automated evaluation of model performance over time.<\/li>\n<li><strong>Narrative drafting:<\/strong> first-pass executive summaries (\u201cwhat changed, why, and what to do\u201d) based on structured inputs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tasks 
that remain human-critical<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Assumption setting and scenario framing:<\/strong> selecting drivers, interpreting roadmap signals, and judging uncertainty.<\/li>\n<li><strong>Cross-functional alignment:<\/strong> resolving conflicts, negotiating tradeoffs, and securing commitment from owners.<\/li>\n<li><strong>Risk decisions:<\/strong> choosing headroom targets and resilience posture based on business impact.<\/li>\n<li><strong>Root cause reasoning in complex systems:<\/strong> interpreting multi-signal evidence, avoiding false causality.<\/li>\n<li><strong>Executive communication:<\/strong> translating technical capacity constraints into business outcomes and decision options.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How AI changes the role over the next 2\u20135 years<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The role shifts from \u201cbuilding reports\u201d to \u201crunning an integrated capacity intelligence system\u201d:<\/li>\n<li>Faster iteration on models through automated feature engineering and anomaly surfacing<\/li>\n<li>More real-time capacity runway insights (time-to-exhaustion forecasts)<\/li>\n<li>Tighter coupling between product analytics and infrastructure forecasts (driver-based planning becomes standard)<\/li>\n<li>Expectations increase for:<\/li>\n<li>Automation and reproducibility (less manual spreadsheet work)<\/li>\n<li>Governance of model outputs (explainability, monitoring model drift)<\/li>\n<li>Proactive recommendations (\u201cnext best action\u201d) rather than retrospective insights<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">New expectations caused by AI, automation, or platform shifts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Ability to validate AI-generated insights and avoid overreliance<\/li>\n<li>Stronger data governance and metric definition discipline<\/li>\n<li>Higher bar for forecasting explainability and scenario communication<\/li>\n<li>Greater 
emphasis on unit economics and optimization as cloud costs remain a board-level concern<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">19) Hiring Evaluation Criteria<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to assess in interviews<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Infrastructure capacity fundamentals:<\/strong> Can the candidate reason about bottlenecks beyond CPU?<\/li>\n<li><strong>Forecasting approach:<\/strong> How do they choose models, measure error, and handle seasonality\/outliers?<\/li>\n<li><strong>Data fluency:<\/strong> SQL skills, comfort joining disparate datasets, handling missing data.<\/li>\n<li><strong>Stakeholder influence:<\/strong> Ability to align SRE, platform, and finance without direct authority.<\/li>\n<li><strong>Decision framing:<\/strong> Can they present options with tradeoffs, not just analysis?<\/li>\n<li><strong>Operational credibility:<\/strong> Comfort participating in incident analysis and translating learnings into prevention.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Practical exercises or case studies<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Capacity forecast case (90 minutes):<\/strong><br\/>\n   &#8211; Provide time-series usage + a product growth driver + a known seasonality event.<br\/>\n   &#8211; Ask for: forecast, confidence range, key assumptions, risks, and recommended actions.<\/li>\n<li><strong>Bottleneck identification scenario (45 minutes):<\/strong><br\/>\n   &#8211; Provide a dashboard screenshot set (CPU, memory, latency, queue depth, IOPS).<br\/>\n   &#8211; Ask for: the likely constraint, the additional metrics the candidate would request, and a mitigation plan.<\/li>\n<li><strong>Commitment recommendation (optional, 45 minutes):<\/strong><br\/>\n   &#8211; Provide baseline usage and pricing; ask for buy\/hold recommendation with risk analysis.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Strong candidate signals<\/h3>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Explains capacity in terms of <strong>saturation and user impact<\/strong>, not just average utilization.<\/li>\n<li>Uses <strong>backtesting<\/strong> and discusses model limitations honestly.<\/li>\n<li>Demonstrates <strong>driver-based thinking<\/strong> (ties infrastructure demand to product\/business signals).<\/li>\n<li>Communicates clearly with <strong>assumptions, ranges, and decision options<\/strong>.<\/li>\n<li>Has examples of influencing engineering execution (actions completed, incidents prevented, savings realized).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Weak candidate signals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treats capacity planning as static spreadsheet work without continuous iteration.<\/li>\n<li>Focuses on \u201cmore resources\u201d without diagnosing constraints.<\/li>\n<li>Cannot explain variance between forecast and actual beyond hand-waving.<\/li>\n<li>Avoids stakeholder conflict or lacks experience driving cross-team decisions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Red flags<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Overconfidence in forecasts without uncertainty ranges or validation<\/li>\n<li>No understanding of quotas\/limits and lead times in cloud environments<\/li>\n<li>Inability to discuss reliability concepts (SLOs\/SLIs, headroom rationale)<\/li>\n<li>Produces \u201cpretty dashboards\u201d that don\u2019t drive decisions or actions<\/li>\n<li>Blames data without proposing pragmatic improvements or triangulation<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Interview scorecard dimensions<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Dimension<\/th>\n<th style=\"text-align: right;\">Weight<\/th>\n<th>What \u201cmeets\u201d looks like<\/th>\n<th>How to assess<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Capacity planning fundamentals<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<td>Correctly identifies bottlenecks, 
headroom needs, lead times<\/td>\n<td>Technical interview + scenario<\/td>\n<\/tr>\n<tr>\n<td>Forecasting &amp; modeling<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<td>Chooses reasonable methods; uses error metrics; handles seasonality<\/td>\n<td>Case exercise + discussion<\/td>\n<\/tr>\n<tr>\n<td>Data fluency (SQL\/analytics)<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<td>Can query\/join datasets, validate data, produce insights<\/td>\n<td>Live SQL\/pseudocode + case<\/td>\n<\/tr>\n<tr>\n<td>Cloud platform literacy<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<td>Understands scaling primitives, quotas, managed services constraints<\/td>\n<td>Technical interview<\/td>\n<\/tr>\n<tr>\n<td>Observability interpretation<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<td>Reads dashboards, distinguishes utilization vs saturation<\/td>\n<td>Scenario review<\/td>\n<\/tr>\n<tr>\n<td>FinOps\/cost awareness<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<td>Understands cost drivers and commitment tradeoffs<\/td>\n<td>Behavioral + mini-case<\/td>\n<\/tr>\n<tr>\n<td>Stakeholder influence<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<td>Demonstrates alignment and execution tracking<\/td>\n<td>Behavioral interview<\/td>\n<\/tr>\n<tr>\n<td>Communication quality<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<td>Clear narrative, assumptions, decision framing<\/td>\n<td>Written summary + verbal<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">20) Final Role Scorecard Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Summary<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Role title<\/strong><\/td>\n<td>Senior Capacity Planning Analyst<\/td>\n<\/tr>\n<tr>\n<td><strong>Role purpose<\/strong><\/td>\n<td>Ensure cloud and infrastructure capacity meets demand reliably and cost-effectively through forecasting, planning, risk management, and decision 
support.<\/td>\n<\/tr>\n<tr>\n<td><strong>Top 10 responsibilities<\/strong><\/td>\n<td>1) Multi-horizon capacity forecasting 2) Driver-based demand modeling 3) Run monthly\/quarterly planning cycle 4) Maintain capacity risk register 5) Define headroom targets 6) Capacity health dashboards 7) Peak readiness planning 8) Capacity incident prevention support 9) Commitment\/reservation recommendations (cloud) 10) Executive reporting and alignment facilitation<\/td>\n<\/tr>\n<tr>\n<td><strong>Top 10 technical skills<\/strong><\/td>\n<td>1) Capacity planning fundamentals 2) Cloud infrastructure concepts 3) Observability analytics 4) SQL 5) Forecasting methods and error tracking 6) Spreadsheet\/financial modeling 7) Python analytics 8) Kubernetes capacity concepts 9) Cost driver understanding (FinOps) 10) Scenario planning under uncertainty<\/td>\n<\/tr>\n<tr>\n<td><strong>Top 10 soft skills<\/strong><\/td>\n<td>1) Structured problem solving 2) Influence without authority 3) Business\/financial acumen 4) Clear communication 5) Analytical rigor 6) Operational urgency 7) Facilitation and conflict resolution 8) Stakeholder empathy 9) Continuous improvement mindset 10) Executive decision framing<\/td>\n<\/tr>\n<tr>\n<td><strong>Top tools or platforms<\/strong><\/td>\n<td>Cloud platform (AWS\/Azure\/GCP), Grafana, Prometheus, Cloud billing tools, SQL warehouse (Snowflake\/BigQuery\/Redshift), BI (Tableau\/Power BI\/Looker), Python notebooks, Jira, Confluence\/Notion, ServiceNow\/JSM (optional)<\/td>\n<\/tr>\n<tr>\n<td><strong>Top KPIs<\/strong><\/td>\n<td>Forecast accuracy (compute\/storage\/services), capacity-related incident rate, headroom compliance, time-to-identify\/mitigate constraints, cost avoidance from right-sizing, commitment coverage\/utilization, budget variance explained by drivers, action plan completion rate, launch readiness sign-off rate, stakeholder satisfaction<\/td>\n<\/tr>\n<tr>\n<td><strong>Main deliverables<\/strong><\/td>\n<td>Rolling forecasts, 
quarterly capacity plan, headroom policy, capacity dashboards, capacity risk register, commitment recommendations, unit economics reports, peak readiness pack, post-incident capacity analysis, metric\/model documentation<\/td>\n<\/tr>\n<tr>\n<td><strong>Main goals<\/strong><\/td>\n<td>90 days: run planning cycle end-to-end, establish dashboards + baseline forecast accuracy, agree headroom targets. 6\u201312 months: expand coverage, integrate into budgeting, reduce capacity incidents, deliver measurable cost avoidance and improved reliability readiness.<\/td>\n<\/tr>\n<tr>\n<td><strong>Career progression options<\/strong><\/td>\n<td>Lead\/Principal Capacity Planning Analyst; Capacity &amp; Performance Engineering Lead; FinOps Lead\/Manager; Infrastructure TPM (strategy\/execution); Platform Strategy Analyst; SRE\/Performance Engineer (execution-oriented path).<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The Senior Capacity Planning Analyst ensures that cloud and infrastructure platforms have the right amount of compute, storage, network, and supporting service capacity to meet product demand\u2014without chronic overprovisioning or capacity-related outages. 
This role translates business and engineering signals into quantitative forecasts, actionable capacity plans, and prioritized investments across cloud spend, reservations\/commitments, and infrastructure roadmaps.<\/p>\n","protected":false},"author":61,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[24453,24455],"tags":[],"class_list":["post-72531","post","type-post","status-publish","format-standard","hentry","category-analyst","category-cloud-infrastructure"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/72531","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/61"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=72531"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/72531\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=72531"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=72531"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=72531"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}