{"id":72980,"date":"2026-04-13T09:28:37","date_gmt":"2026-04-13T09:28:37","guid":{"rendered":"https:\/\/www.devopsschool.com\/blog\/lead-infrastructure-architect-role-blueprint-responsibilities-skills-kpis-and-career-path\/"},"modified":"2026-04-13T09:28:37","modified_gmt":"2026-04-13T09:28:37","slug":"lead-infrastructure-architect-role-blueprint-responsibilities-skills-kpis-and-career-path","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/lead-infrastructure-architect-role-blueprint-responsibilities-skills-kpis-and-career-path\/","title":{"rendered":"Lead Infrastructure Architect: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">1) Role Summary<\/h2>\n\n\n\n<p>The <strong>Lead Infrastructure Architect<\/strong> designs, evolves, and governs the target-state infrastructure architecture that powers reliable, secure, cost-effective delivery of software products and internal platforms. This role leads architecture decisions across cloud, on-prem (where applicable), networking, identity, compute, storage, and operational tooling\u2014translating business and engineering needs into practical reference architectures and roadmaps that teams can implement.<\/p>\n\n\n\n<p>This role exists in a software or IT organization to ensure infrastructure decisions are <strong>intentional, scalable, standardized where it matters, and adaptable where it doesn\u2019t<\/strong>\u2014reducing operational risk, accelerating delivery, improving resilience, and controlling cost. 
The business value is realized through higher platform reliability, faster environment provisioning, improved security posture, measurable cloud efficiency, and reduced engineering friction.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Role horizon:<\/strong> Current (enterprise-proven practices; continuous evolution with cloud-native and platform engineering trends)<\/li>\n<li><strong>Typical interaction teams\/functions:<\/strong> Product Engineering, SRE\/Operations, Security, Network Engineering, Enterprise Architecture, DevOps\/Platform Engineering, ITSM, Finance\/FinOps, Procurement\/Vendor Management, Compliance\/Risk, Data\/Analytics platform teams<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">2) Role Mission<\/h2>\n\n\n\n<p><strong>Core mission:<\/strong><br\/>\nDefine and drive the infrastructure architecture strategy and execution guardrails that enable engineering teams to ship and run services safely, reliably, and efficiently at scale.<\/p>\n\n\n\n<p><strong>Strategic importance to the company:<\/strong><br\/>\nInfrastructure is a force multiplier: strong infrastructure architecture reduces downtime, speeds delivery, increases security assurance, and improves unit economics. 
The Lead Infrastructure Architect ensures that platform and infrastructure choices are coherent across domains (compute, network, identity, observability, automation) and aligned with business priorities and risk appetite.<\/p>\n\n\n\n<p><strong>Primary business outcomes expected:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduced service outages and faster recovery via resilient, standardized patterns<\/li>\n<li>Faster delivery through self-service infrastructure and paved-road platforms<\/li>\n<li>Stronger security posture through consistent controls, segmentation, and identity design<\/li>\n<li>Lower cloud and infrastructure costs through right-sizing, lifecycle management, and FinOps governance<\/li>\n<li>Higher engineering productivity via simpler, documented, reusable infrastructure patterns<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">3) Core Responsibilities<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Strategic responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Infrastructure target architecture &amp; roadmap:<\/strong> Define multi-year target state across cloud\/hybrid infrastructure, networking, identity, and operational platforms; maintain a roadmap that sequences modernization with measurable outcomes.<\/li>\n<li><strong>Reference architectures &amp; standards:<\/strong> Establish and maintain approved architecture patterns (e.g., landing zones, network segmentation, workload archetypes, Kubernetes baseline, secrets management).<\/li>\n<li><strong>Technology selection &amp; lifecycle governance:<\/strong> Lead evaluation and standardization of core infrastructure technologies; define deprecation paths and migration strategies to reduce fragmentation.<\/li>\n<li><strong>Cloud operating model alignment:<\/strong> Shape the cloud\/hybrid operating model (shared services vs product-aligned teams), including ownership boundaries, SLO expectations, and platform support models.<\/li>\n<li><strong>Capacity and resilience strategy:<\/strong> Define resilience tiers and 
capacity planning approaches; influence multi-region strategy, DR policy, and service criticality classification.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Operational responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"6\">\n<li><strong>Architecture support for delivery:<\/strong> Partner with delivery teams to remove architectural blockers, enabling faster provisioning, environment readiness, and production onboarding.<\/li>\n<li><strong>Operational readiness &amp; production gating:<\/strong> Ensure new platforms\/services meet operational readiness requirements (monitoring, alerting, runbooks, SLOs, incident response procedures).<\/li>\n<li><strong>Incident and problem management contribution:<\/strong> Participate in major incident response as an escalation point; lead or contribute to root-cause analysis and systemic fixes.<\/li>\n<li><strong>Cost governance and optimization:<\/strong> Partner with FinOps to implement tagging standards, chargeback\/showback models, cost anomaly detection, and optimization playbooks.<\/li>\n<li><strong>Vendor and contract input:<\/strong> Provide technical input to vendor selection, contract renewals, and licensing strategies (including commercial and open-source support models).<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Technical responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"11\">\n<li><strong>Cloud foundation \/ landing zone architecture:<\/strong> Architect accounts\/subscriptions structure, identity integration, guardrails\/policies, network hub\/spoke, and baseline security controls.<\/li>\n<li><strong>Network and connectivity architecture:<\/strong> Design connectivity patterns (VPC\/VNet, peering, transit, VPN\/Direct Connect\/ExpressRoute), DNS strategy, egress controls, and segmentation.<\/li>\n<li><strong>Compute, container, and orchestration strategy:<\/strong> Define workload placement strategy (VMs vs containers vs serverless), Kubernetes platform architecture, 
cluster lifecycle, and multi-tenancy approach.<\/li>\n<li><strong>Infrastructure as Code (IaC) and automation:<\/strong> Drive IaC standards (Terraform\/Bicep\/CloudFormation), module strategy, pipeline integration, policy-as-code, and drift detection.<\/li>\n<li><strong>Observability and reliability architecture:<\/strong> Define logging\/metrics\/tracing standards, alerting strategy, SLO instrumentation expectations, and telemetry pipelines.<\/li>\n<li><strong>Identity, secrets, and key management architecture:<\/strong> Ensure coherent designs for IAM, RBAC, privileged access, secrets management, and encryption key lifecycle.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Cross-functional or stakeholder responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"17\">\n<li><strong>Architecture governance forums:<\/strong> Lead\/participate in architecture review boards; provide lightweight, outcome-driven governance that enables speed with guardrails.<\/li>\n<li><strong>Security partnership:<\/strong> Co-author patterns with Security (threat models, zero trust principles, secure-by-default configurations) and ensure controls are implementable.<\/li>\n<li><strong>Developer experience collaboration:<\/strong> Partner with platform engineering to deliver \u201cpaved roads\u201d and self-service workflows; reduce cognitive load for product teams.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Governance, compliance, or quality responsibilities<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"20\">\n<li><strong>Compliance-by-design:<\/strong> Ensure infrastructure designs support audit requirements (e.g., SOC 2, ISO 27001, PCI DSS, HIPAA\u2014context-dependent), including logging retention, access controls, and change traceability.<\/li>\n<li><strong>Risk management and exception handling:<\/strong> Define a pragmatic approach for architectural exceptions, including risk acceptance, compensating controls, and expiration dates.<\/li>\n<\/ol>\n\n\n\n<h3 
class=\"wp-block-heading\">Leadership responsibilities (Lead-level scope)<\/h3>\n\n\n\n<ol class=\"wp-block-list\" start=\"22\">\n<li><strong>Technical leadership and mentoring:<\/strong> Mentor architects and senior engineers; set expectations for design quality and documentation; raise the overall infrastructure architecture capability.<\/li>\n<li><strong>Cross-domain alignment:<\/strong> Align infrastructure architecture with enterprise architecture, application architecture, and security architecture to prevent local optimizations that harm the whole.<\/li>\n<li><strong>Influence and decision facilitation:<\/strong> Drive decisions through structured options, trade-offs, and clear recommendations; resolve conflicts between teams with competing priorities.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">4) Day-to-Day Activities<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Daily activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Review and respond to architecture questions from platform, SRE, and product teams (e.g., network flows, IAM approach, Kubernetes patterns).<\/li>\n<li>Validate infrastructure designs for new services or major changes (production onboarding, scaling, new region, new data residency needs).<\/li>\n<li>Consult on incident remediation when infrastructure design contributes to instability (e.g., noisy neighbor, misconfigured autoscaling, DNS issues).<\/li>\n<li>Collaborate with Security on high-priority findings and ensure fixes align to reference patterns rather than one-off changes.<\/li>\n<li>Provide quick design feedback on IaC pull requests affecting shared foundations (landing zones, network hubs, cluster baselines).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Weekly activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Participate in architecture review board \/ technical design reviews; approve, reject, or request iteration based on standards and risk.<\/li>\n<li>Sync with platform engineering and SRE leads on 
roadmap, reliability trends, and upcoming migrations.<\/li>\n<li>Review cost and usage dashboards with FinOps; prioritize optimization initiatives.<\/li>\n<li>Validate operational readiness for services approaching launch (SLOs, observability, runbooks, on-call readiness).<\/li>\n<li>Hold office hours for teams to accelerate decisions and reduce back-and-forth.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Monthly or quarterly activities<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Update target architecture and roadmaps; publish changes to standards and patterns.<\/li>\n<li>Run resilience reviews (DR tests, game days) and track systemic improvements.<\/li>\n<li>Conduct lifecycle reviews: deprecations, version upgrades (Kubernetes, OS images), and platform EOL planning.<\/li>\n<li>Review vendor performance and platform SLAs; recommend adjustments or alternatives.<\/li>\n<li>Run cross-team retrospectives on incident themes (e.g., IAM sprawl, network complexity, insufficient alerts).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Recurring meetings or rituals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Architecture Review Board (weekly\/biweekly)<\/li>\n<li>Platform\/SRE roadmap sync (weekly)<\/li>\n<li>Security architecture sync (biweekly\/monthly)<\/li>\n<li>FinOps review (monthly)<\/li>\n<li>Change advisory or production readiness gate (context-specific; often weekly)<\/li>\n<li>Quarterly architecture strategy review with senior leadership<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Incident, escalation, or emergency work (when relevant)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Serve as an escalation point during Severity 1\/2 incidents involving cloud\/network\/platform foundations.<\/li>\n<li>Provide rapid risk-based decisions (e.g., temporary bypass vs safe rollback).<\/li>\n<li>Lead post-incident corrective action design ensuring long-term resilience (not just immediate patching).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5) Key 
Deliverables<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Infrastructure Target Architecture<\/strong> (current state, target state, transition states, and sequencing)<\/li>\n<li><strong>Cloud\/Hybrid Reference Architectures<\/strong> (landing zone, connectivity, network segmentation, identity patterns)<\/li>\n<li><strong>Standardized Infrastructure Blueprints<\/strong> (workload archetypes: stateless service, stateful service, batch job, event-driven, data pipeline)<\/li>\n<li><strong>Architecture Decision Records (ADRs)<\/strong> for foundational decisions and trade-offs<\/li>\n<li><strong>Technology Standards Catalog<\/strong> (approved tools, versions, guardrails, supported patterns, deprecation timelines)<\/li>\n<li><strong>Infrastructure Roadmaps<\/strong> (quarterly and annual; tied to reliability, security, cost, and delivery goals)<\/li>\n<li><strong>Operational Readiness Checklists<\/strong> (SLOs, observability, runbooks, on-call, capacity testing)<\/li>\n<li><strong>Resilience &amp; DR Strategy<\/strong> (RTO\/RPO tiers, backup patterns, cross-region plans, test schedules)<\/li>\n<li><strong>IaC Standards and Module Strategy<\/strong> (module\/library conventions, testing requirements, policy-as-code integration)<\/li>\n<li><strong>Network and DNS Strategy Documents<\/strong> (naming, resolution, private endpoints, egress controls)<\/li>\n<li><strong>IAM and Privileged Access Model<\/strong> (role design, break-glass, just-in-time access\u2014context-specific)<\/li>\n<li><strong>Observability Standards<\/strong> (logging fields, trace propagation, metrics naming, alert quality guidelines)<\/li>\n<li><strong>Cost Optimization Playbooks<\/strong> (right-sizing, lifecycle policies, reserved capacity approach, storage tiering)<\/li>\n<li><strong>Compliance Evidence Support Artifacts<\/strong> (control mappings, architecture narratives, logging\/access documentation)<\/li>\n<li><strong>Migration Plans<\/strong> (data center exit, Kubernetes 
adoption, legacy network simplification, tool consolidation)<\/li>\n<li><strong>Enablement Materials<\/strong> (architecture onboarding sessions, internal documentation, design templates)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">6) Goals, Objectives, and Milestones<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">30-day goals (orientation and baseline)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Build relationships with platform engineering, SRE, security, and key product engineering leaders.<\/li>\n<li>Inventory current infrastructure landscape: cloud accounts\/subscriptions, network topology, identity integration, observability toolchain, IaC maturity.<\/li>\n<li>Identify top reliability and security risks in the foundation (single points of failure, inconsistent IAM, unmonitored critical paths).<\/li>\n<li>Establish a lightweight engagement model: office hours, intake process, review cadence.<\/li>\n<li>Deliver one high-impact \u201cquick win\u201d (e.g., standard tagging policy, baseline network flow documentation, or improved landing zone guardrail).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">60-day goals (standards and alignment)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Publish v1 of infrastructure reference architectures and a standards catalog (what\u2019s supported vs discouraged).<\/li>\n<li>Align with Security on baseline controls and exception handling process.<\/li>\n<li>Define production readiness expectations for platform components and new services.<\/li>\n<li>Propose a 2\u20133 quarter roadmap with prioritized initiatives and measurable outcomes.<\/li>\n<li>Introduce ADR discipline for foundational decisions and ensure visibility across teams.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">90-day goals (execution influence and measurable improvements)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Deliver a target architecture and transition plan for the most critical foundation domain (commonly: landing zone + network 
+ IAM baseline).<\/li>\n<li>Improve consistency in IaC through module strategy and pipeline guardrails (linting, policy-as-code, drift checks).<\/li>\n<li>Reduce at least one major operational pain point (e.g., DNS instability, cluster upgrade risk, manual provisioning delays).<\/li>\n<li>Establish baseline SLOs for shared platform components (e.g., cluster API availability, CI runners, artifact registry\u2014context-specific).<\/li>\n<li>Demonstrate clear stakeholder satisfaction improvements (faster decisions, fewer escalations, clearer patterns).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6-month milestones<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Foundation modernization underway with adoption metrics (percentage of workloads in a compliant landing zone, standardized network segmentation coverage).<\/li>\n<li>Observability standards adopted by a meaningful subset of services; improved alert quality (lower noise, faster detection).<\/li>\n<li>Defined DR tiers and at least one completed DR exercise for a critical system.<\/li>\n<li>FinOps governance operating effectively: tags, showback, and a prioritized cost optimization backlog.<\/li>\n<li>Reduced tool sprawl or improved consistency (e.g., consolidating monitoring, secrets management, or CI runners).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">12-month objectives<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Achieve measurable reliability improvements for shared platforms (reduced Sev-1 incidents tied to infrastructure; improved MTTR).<\/li>\n<li>Increase delivery throughput via self-service and paved-road adoption (reduced lead time for environment provisioning).<\/li>\n<li>Meet audit and compliance requirements with fewer \u201cscramble\u201d efforts (controls embedded in architecture and pipelines).<\/li>\n<li>Mature platform lifecycle management: predictable upgrades, deprecations, and migration patterns.<\/li>\n<li>Institutionalize architectural governance that is enabling (faster 
decisions, fewer exceptions, lower rework).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Long-term impact goals (18\u201336 months)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Infrastructure architecture becomes a competitive advantage: faster product launches, safer experimentation, and sustainable operations.<\/li>\n<li>Clear, product-aligned platform strategy enabling autonomous teams with guardrails.<\/li>\n<li>Lower total cost of ownership (TCO) for infrastructure through standardized patterns and better capacity economics.<\/li>\n<li>A strong internal architecture community: shared practices, reusable modules, and consistent operational excellence.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Role success definition<\/h3>\n\n\n\n<p>The role is successful when infrastructure choices consistently support business goals and engineering velocity, while reducing reliability\/security risk and controlling cost\u2014without creating unnecessary governance friction.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What high performance looks like<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Architectural decisions are timely, transparent, and adopted (not just documented).<\/li>\n<li>Platform and product teams trust the guidance and use standardized patterns by default.<\/li>\n<li>Clear improvements in reliability, security posture, and cost efficiency are evidenced by metrics.<\/li>\n<li>The architect anticipates constraints (scale, data residency, audit, latency) and prevents avoidable rework.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">7) KPIs and Productivity Metrics<\/h2>\n\n\n\n<p>The following framework balances <strong>outputs<\/strong> (what gets produced), <strong>outcomes<\/strong> (business\/engineering impact), and <strong>operational health<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Metric name<\/th>\n<th>What it measures<\/th>\n<th>Why it matters<\/th>\n<th>Example target \/ 
benchmark<\/th>\n<th>Frequency<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Reference architecture adoption rate<\/td>\n<td>% of new workloads using approved patterns (landing zone, network, IAM, observability)<\/td>\n<td>Adoption indicates standards are usable and reducing variation<\/td>\n<td>70\u201390% of new workloads adopt baseline patterns<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Architecture decision cycle time<\/td>\n<td>Time from intake to decision\/recommendation<\/td>\n<td>Slow decisions block delivery and cause shadow architecture<\/td>\n<td>Median 5\u201310 business days for standard decisions<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Infrastructure provisioning lead time<\/td>\n<td>Time to provision compliant environments (dev\/test\/prod)<\/td>\n<td>Directly impacts delivery speed and developer experience<\/td>\n<td>Reduce by 30\u201350% within 2\u20133 quarters<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Change failure rate (infra)<\/td>\n<td>% infra changes causing incidents\/rollbacks<\/td>\n<td>Indicates stability of IaC, testing, and rollout patterns<\/td>\n<td>&lt;10\u201315% (context-dependent)<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Sev-1\/Sev-2 incidents attributable to infra foundations<\/td>\n<td>Count and severity where root cause ties to platform\/network\/IAM<\/td>\n<td>Measures effectiveness of architectural foundations<\/td>\n<td>Reduction of 20\u201340% YoY<\/td>\n<td>Monthly\/Quarterly<\/td>\n<\/tr>\n<tr>\n<td>MTTR for infra-driven incidents<\/td>\n<td>Mean time to restore during infra events<\/td>\n<td>Captures resilience and operational readiness<\/td>\n<td>Improve by 15\u201330%<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>DR readiness score<\/td>\n<td>Coverage of tested backups, failover runbooks, and RTO\/RPO compliance<\/td>\n<td>Validates resilience beyond \u201cpaper DR\u201d<\/td>\n<td>90% of Tier-1 systems tested annually<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Policy compliance rate 
(guardrails)<\/td>\n<td>% of resources compliant with security\/config policies<\/td>\n<td>Reduces audit risk and security exposure<\/td>\n<td>&gt;95% compliance for critical policies<\/td>\n<td>Weekly\/Monthly<\/td>\n<\/tr>\n<tr>\n<td>Cost optimization realized<\/td>\n<td>Savings from committed use discounts, right-sizing, storage lifecycle, idle cleanup<\/td>\n<td>Demonstrates financial impact of architecture governance<\/td>\n<td>5\u201315% annual run-rate optimization (context-dependent)<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Unit cost trends<\/td>\n<td>Cost per transaction\/request\/customer or per environment<\/td>\n<td>Links infra changes to product economics<\/td>\n<td>Stable or improving unit costs as usage grows<\/td>\n<td>Monthly\/Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Observability coverage<\/td>\n<td>% services meeting logging\/metrics\/tracing standards<\/td>\n<td>Improves detection, diagnosis, and SLO accuracy<\/td>\n<td>80% coverage for critical services<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Alert quality index<\/td>\n<td>Ratio of actionable alerts vs noisy\/false alerts<\/td>\n<td>Reduces on-call fatigue; increases signal<\/td>\n<td>&gt;70% actionable; reduce noisy alerts by 25%<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Architecture exceptions count and age<\/td>\n<td>Number of exceptions and how long they remain open<\/td>\n<td>Exceptions are risk; aging indicates governance weakness<\/td>\n<td>Exceptions have expiry; &lt;10% past due<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Stakeholder satisfaction<\/td>\n<td>Surveyed satisfaction of platform\/product\/security leaders<\/td>\n<td>Ensures architecture is enabling, not obstructive<\/td>\n<td>\u22654.2\/5 satisfaction<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<tr>\n<td>Mentorship and enablement throughput<\/td>\n<td># sessions, office hours, or internal trainings delivered<\/td>\n<td>Scales architecture capability beyond one person<\/td>\n<td>1\u20132 enablement 
events\/month<\/td>\n<td>Monthly<\/td>\n<\/tr>\n<tr>\n<td>Roadmap delivery predictability<\/td>\n<td>% roadmap initiatives delivered on planned quarter<\/td>\n<td>Ensures strategy turns into execution<\/td>\n<td>70\u201385% on-time (complexity-adjusted)<\/td>\n<td>Quarterly<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">8) Technical Skills Required<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Must-have technical skills<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Cloud infrastructure architecture (Critical)<\/strong><br\/>\n   &#8211; Description: Designing multi-account\/subscription structures, networking, identity integration, security baselines<br\/>\n   &#8211; Typical use: Landing zone design, workload patterns, governance guardrails  <\/li>\n<li><strong>Networking fundamentals and cloud networking (Critical)<\/strong><br\/>\n   &#8211; Description: Routing, segmentation, DNS, load balancing, private connectivity, egress control<br\/>\n   &#8211; Typical use: Hub\/spoke, peering\/transit, service exposure, private endpoints  <\/li>\n<li><strong>Infrastructure as Code (Critical)<\/strong><br\/>\n   &#8211; Description: Declarative provisioning and configuration, module design, environment promotion<br\/>\n   &#8211; Typical use: Standard modules, reusable blueprints, drift detection, policy enforcement  <\/li>\n<li><strong>Security architecture for infrastructure (Critical)<\/strong><br\/>\n   &#8211; Description: IAM\/RBAC, secrets, key management, secure defaults, threat modeling basics<br\/>\n   &#8211; Typical use: Guardrails, privileged access model, encryption, auditability  <\/li>\n<li><strong>Reliability engineering foundations (Critical)<\/strong><br\/>\n   &#8211; Description: High availability patterns, failure domains, SLO concepts, observability principles<br\/>\n   &#8211; Typical use: Resilience tiering, DR design, operational readiness requirements  <\/li>\n<li><strong>Container and orchestration 
concepts (Important)<\/strong><br\/>\n   &#8211; Description: Kubernetes fundamentals, cluster architecture, workloads, ingress\/egress patterns<br\/>\n   &#8211; Typical use: Platform baselines, multi-tenancy, upgrade strategy, runtime security  <\/li>\n<li><strong>Operating systems and runtime fundamentals (Important)<\/strong><br\/>\n   &#8211; Description: Linux, patching, images, performance basics, configuration management concepts<br\/>\n   &#8211; Typical use: Base image strategy, AMI\/image pipelines, node hardening  <\/li>\n<li><strong>Systems design and architectural documentation (Critical)<\/strong><br\/>\n   &#8211; Description: Clear diagrams, ADRs, trade-off analysis, NFR definition<br\/>\n   &#8211; Typical use: Review boards, decision facilitation, standards publication  <\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Good-to-have technical skills<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Multi-cloud and hybrid connectivity (Important)<\/strong><br\/>\n   &#8211; Use: Migrations, business continuity, data residency needs  <\/li>\n<li><strong>Service mesh and advanced traffic management (Optional\/Context-specific)<\/strong><br\/>\n   &#8211; Use: mTLS, traffic shaping, multi-cluster connectivity  <\/li>\n<li><strong>CI\/CD systems for infrastructure (Important)<\/strong><br\/>\n   &#8211; Use: IaC pipelines, policy gates, automated testing  <\/li>\n<li><strong>Configuration and policy-as-code (Important)<\/strong><br\/>\n   &#8211; Use: Guardrails that scale; automated compliance  <\/li>\n<li><strong>Data platform infrastructure (Optional\/Context-specific)<\/strong><br\/>\n   &#8211; Use: Lakehouse\/warehouse infrastructure, streaming foundations  <\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Advanced or expert-level technical skills<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Large-scale distributed systems infrastructure (Important)<\/strong><br\/>\n   &#8211; Use: Designing for large traffic, multi-region, 
low latency, failure isolation  <\/li>\n<li><strong>Kubernetes platform architecture at scale (Important\/Context-specific)<\/strong><br\/>\n   &#8211; Use: Cluster fleet management, multi-tenancy, platform SLOs, upgrades  <\/li>\n<li><strong>Identity architecture and privileged access design (Important)<\/strong><br\/>\n   &#8211; Use: Role engineering, JIT access, break-glass patterns  <\/li>\n<li><strong>Resilience engineering and DR program design (Important)<\/strong><br\/>\n   &#8211; Use: RTO\/RPO tiering, active-active vs active-passive, test programs  <\/li>\n<li><strong>FinOps and cost modeling (Important)<\/strong><br\/>\n   &#8211; Use: Unit cost metrics, commitment strategies, cost governance mechanisms  <\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Emerging future skills for this role (2\u20135 years)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Platform engineering product mindset (Important)<\/strong><br\/>\n   &#8211; Use: Treating infrastructure platforms as products with roadmaps, usability, adoption metrics  <\/li>\n<li><strong>AI-augmented operations and AIOps literacy (Optional\/Context-specific)<\/strong><br\/>\n   &#8211; Use: Alert correlation, anomaly detection, incident summarization, capacity forecasting  <\/li>\n<li><strong>Policy automation and continuous compliance (Important)<\/strong><br\/>\n   &#8211; Use: Stronger automated evidence collection and control enforcement  <\/li>\n<li><strong>Software supply chain security for infrastructure code (Important)<\/strong><br\/>\n   &#8211; Use: Signing, provenance, artifact trust, dependency governance for IaC modules  <\/li>\n<li><strong>Sustainability-aware infrastructure (Optional\/Context-specific)<\/strong><br\/>\n   &#8211; Use: Carbon-aware scheduling and cost\/energy optimization where relevant<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">9) Soft Skills and Behavioral Capabilities<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>\n<p><strong>Systems 
thinking<\/strong><br\/>\n   &#8211; Why it matters: Infrastructure decisions create second- and third-order effects across reliability, security, cost, and velocity<br\/>\n   &#8211; How it shows up: Explicit trade-off articulation, identifies coupling, avoids local optimizations<br\/>\n   &#8211; Strong performance: Designs minimize complexity while meeting NFRs; anticipates downstream operational impact  <\/p>\n<\/li>\n<li>\n<p><strong>Influence without authority<\/strong><br\/>\n   &#8211; Why it matters: Lead architects often drive standards across teams they do not manage<br\/>\n   &#8211; How it shows up: Clear recommendations, stakeholder mapping, negotiation, building coalitions<br\/>\n   &#8211; Strong performance: High adoption of patterns; conflicts resolved quickly with shared outcomes  <\/p>\n<\/li>\n<li>\n<p><strong>Decision quality and pragmatism<\/strong><br\/>\n   &#8211; Why it matters: Perfect architecture that arrives late is a delivery risk<br\/>\n   &#8211; How it shows up: Time-boxed exploration, \u201cgood enough\u201d standards, explicit risk acceptance paths<br\/>\n   &#8211; Strong performance: Decisions are timely, defensible, and revisited when assumptions change  <\/p>\n<\/li>\n<li>\n<p><strong>Technical communication (written and visual)<\/strong><br\/>\n   &#8211; Why it matters: Architecture scales through clarity\u2014diagrams, standards, and documentation<br\/>\n   &#8211; How it shows up: Crisp ADRs, readable reference architectures, meeting facilitation<br\/>\n   &#8211; Strong performance: Teams implement designs correctly without repeated explanation  <\/p>\n<\/li>\n<li>\n<p><strong>Operational ownership mindset<\/strong><br\/>\n   &#8211; Why it matters: Infrastructure architecture must work in real operations, not just in design reviews<br\/>\n   &#8211; How it shows up: Focus on monitoring, incident learnings, runbooks, rollbacks, upgrade strategies<br\/>\n   &#8211; Strong performance: Fewer preventable incidents; 
smoother releases; improved on-call experience  <\/p>\n<\/li>\n<li>\n<p><strong>Risk-based thinking<\/strong><br\/>\n   &#8211; Why it matters: Security and compliance needs vary by system criticality and data sensitivity<br\/>\n   &#8211; How it shows up: Tiering models, compensating controls, exception governance<br\/>\n   &#8211; Strong performance: Reduced audit findings and security exposure while keeping delivery moving  <\/p>\n<\/li>\n<li>\n<p><strong>Mentorship and capability building<\/strong><br\/>\n   &#8211; Why it matters: Architecture bottlenecks are common; mentoring increases throughput<br\/>\n   &#8211; How it shows up: Office hours, templates, pair-design sessions, coaching engineers into architects<br\/>\n   &#8211; Strong performance: Teams make better decisions independently; architecture reviews become faster  <\/p>\n<\/li>\n<li>\n<p><strong>Stakeholder empathy (engineering + business)<\/strong><br\/>\n   &#8211; Why it matters: Infrastructure architecture must meet business outcomes and developer realities<br\/>\n   &#8211; How it shows up: Understands product delivery pressures, supports usability, prioritizes roadmap value<br\/>\n   &#8211; Strong performance: High trust across engineering, security, and leadership; fewer escalations  <\/p>\n<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">10) Tools, Platforms, and Software<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Tool, platform, or software<\/th>\n<th>Primary use<\/th>\n<th>Common \/ Optional \/ Context-specific<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Cloud platforms<\/td>\n<td>AWS \/ Azure \/ GCP<\/td>\n<td>Core compute, network, storage, managed services<\/td>\n<td>Common (at least one)<\/td>\n<\/tr>\n<tr>\n<td>Container &amp; orchestration<\/td>\n<td>Kubernetes (EKS\/AKS\/GKE or self-managed)<\/td>\n<td>Standard runtime for microservices<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Container registry<\/td>\n<td>ECR \/ ACR \/ 
Google Artifact Registry \/ Artifactory<\/td>\n<td>Image storage, scanning integration<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Infrastructure as Code<\/td>\n<td>Terraform<\/td>\n<td>Provisioning and reusable modules<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Infrastructure as Code<\/td>\n<td>CloudFormation \/ Bicep \/ Pulumi<\/td>\n<td>Cloud-native or alternative IaC approaches<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Policy-as-code \/ guardrails<\/td>\n<td>OPA \/ Gatekeeper, Kyverno<\/td>\n<td>Kubernetes policy enforcement<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Policy-as-code \/ guardrails<\/td>\n<td>Sentinel, Azure Policy, AWS Organizations SCPs<\/td>\n<td>Cloud governance guardrails<\/td>\n<td>Common\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>CI\/CD<\/td>\n<td>GitHub Actions \/ GitLab CI \/ Jenkins \/ Azure DevOps<\/td>\n<td>IaC pipelines, deployment automation<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Source control<\/td>\n<td>GitHub \/ GitLab \/ Bitbucket<\/td>\n<td>Version control, reviews, collaboration<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Observability<\/td>\n<td>Prometheus, Grafana<\/td>\n<td>Metrics and dashboards<\/td>\n<td>Common\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Observability<\/td>\n<td>Datadog \/ New Relic \/ Dynatrace<\/td>\n<td>Unified observability platform<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Logging<\/td>\n<td>ELK\/Elastic, OpenSearch<\/td>\n<td>Log indexing and search<\/td>\n<td>Common\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Tracing<\/td>\n<td>OpenTelemetry, Jaeger<\/td>\n<td>Distributed tracing standards<\/td>\n<td>Common\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Incident management<\/td>\n<td>PagerDuty \/ Opsgenie<\/td>\n<td>On-call, incident response workflow<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>ITSM<\/td>\n<td>ServiceNow \/ Jira Service Management<\/td>\n<td>Change, incident, request 
workflows<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Security (IAM\/PAM)<\/td>\n<td>Okta \/ Entra ID (Azure AD)<\/td>\n<td>SSO, identity integration<\/td>\n<td>Common\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Secrets management<\/td>\n<td>HashiCorp Vault \/ AWS Secrets Manager \/ Azure Key Vault<\/td>\n<td>Secrets storage and rotation<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Key management<\/td>\n<td>AWS KMS \/ Azure Key Vault \/ GCP KMS<\/td>\n<td>Encryption key lifecycle<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Runtime security<\/td>\n<td>Falco, Prisma Cloud, Wiz<\/td>\n<td>Detection and posture management<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Vulnerability scanning<\/td>\n<td>Trivy, Snyk, Clair<\/td>\n<td>Container\/IaC scanning<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Networking<\/td>\n<td>Cloud-native NLB\/ALB, Azure Load Balancer, Cloud DNS<\/td>\n<td>Traffic management, service exposure<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Collaboration<\/td>\n<td>Slack \/ Microsoft Teams<\/td>\n<td>Daily comms and incident coordination<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Documentation<\/td>\n<td>Confluence \/ Notion<\/td>\n<td>Standards, runbooks, architecture docs<\/td>\n<td>Common\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Diagramming<\/td>\n<td>Lucidchart \/ Draw.io<\/td>\n<td>Architecture diagrams<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Project tracking<\/td>\n<td>Jira<\/td>\n<td>Roadmaps, epics, delivery tracking<\/td>\n<td>Common<\/td>\n<\/tr>\n<tr>\n<td>Cost management<\/td>\n<td>CloudHealth \/ Cloudability \/ native cloud cost tools<\/td>\n<td>Cost visibility and optimization<\/td>\n<td>Optional\/Context-specific<\/td>\n<\/tr>\n<tr>\n<td>Automation\/scripting<\/td>\n<td>Python, Bash, PowerShell<\/td>\n<td>Glue automation, validation, tooling<\/td>\n<td>Common<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">11) Typical Tech Stack \/ 
Environment<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Infrastructure environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Predominantly cloud-first, with <strong>hybrid<\/strong> elements possible (private connectivity, legacy on-prem workloads, colocation)<\/li>\n<li>Standardized landing zone with:\n<ul class=\"wp-block-list\">\n<li>Separate accounts\/subscriptions\/projects by environment and\/or business unit<\/li>\n<li>Centralized networking (hub\/spoke or transit)<\/li>\n<li>Centralized logging and security monitoring<\/li>\n<li>Standard tagging and policy enforcement<\/li>\n<\/ul>\n<\/li>\n<li>Compute mix:\n<ul class=\"wp-block-list\">\n<li>Kubernetes for microservices<\/li>\n<li>VMs for legacy or specialized workloads<\/li>\n<li>Serverless for event-driven components (context-dependent)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Application environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Microservices and APIs, typically 12-factor aligned<\/li>\n<li>Mix of stateless and stateful services, with managed databases where feasible<\/li>\n<li>Service-to-service auth via mTLS\/identity mechanisms (context-dependent)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Data environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Managed data stores (relational, NoSQL, object storage)<\/li>\n<li>Data pipelines and streaming platforms may exist (context-specific)<\/li>\n<li>Emphasis on encryption, backups, retention, and data access governance<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Security environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Central identity provider and SSO<\/li>\n<li>Role-based access control (RBAC) and least privilege<\/li>\n<li>Secrets and key management integrated into platform patterns<\/li>\n<li>Centralized audit logging and SIEM integration (context-specific)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Delivery model<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Product-aligned teams deploying independently to shared runtime 
platforms<\/li>\n<li>Platform engineering provides paved-road tooling and self-service workflows<\/li>\n<li>SRE\/Operations focuses on reliability, incident response, and platform health<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Agile or SDLC context<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Agile delivery with iterative platform improvements<\/li>\n<li>IaC and platform changes treated as software: code review, testing, versioning, rollout strategies<\/li>\n<li>Change management varies by organization maturity (lightweight change review to formal CAB)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scale or complexity context<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Multiple environments, multiple regions, and multiple criticality tiers<\/li>\n<li>High need for standardization and automation to prevent operational overhead<\/li>\n<li>Complexity drivers: compliance obligations, multi-tenant platforms, acquisitions, legacy migrations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Team topology<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lead Infrastructure Architect sits in an Architecture function, partnering tightly with:\n<ul class=\"wp-block-list\">\n<li>Platform engineering (build)<\/li>\n<li>SRE\/Operations (run)<\/li>\n<li>Security engineering (protect)<\/li>\n<li>Product engineering (consume)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">12) Stakeholders and Collaboration Map<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Internal stakeholders<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>CTO \/ VP Engineering \/ Head of Platform (executive sponsors):<\/strong> Align on investment priorities, risk posture, and roadmap funding.<\/li>\n<li><strong>Director\/Head of Architecture (typical manager):<\/strong> Governance model, cross-domain alignment, prioritization, and escalation support.<\/li>\n<li><strong>Platform Engineering:<\/strong> Co-design paved roads; translate architecture into reusable modules and 
self-service.<\/li>\n<li><strong>SRE \/ Operations \/ NOC:<\/strong> Align on reliability patterns, monitoring, incident response, and operational readiness.<\/li>\n<li><strong>Security (Security Architecture, SecEng, GRC):<\/strong> Ensure controls are designed-in and auditable; manage risk decisions.<\/li>\n<li><strong>Network Engineering (if separate):<\/strong> Coordinate network topology, private connectivity, DNS, and firewall policies.<\/li>\n<li><strong>Product Engineering teams:<\/strong> Implement patterns, request exceptions, provide feedback on usability and constraints.<\/li>\n<li><strong>FinOps \/ Finance partners:<\/strong> Cost governance, unit economics, optimization initiatives.<\/li>\n<li><strong>Enterprise Architecture (if present):<\/strong> Alignment with broader enterprise standards, integration patterns, and technology portfolios.<\/li>\n<li><strong>Compliance \/ Risk \/ Audit:<\/strong> Evidence requirements and control interpretations (regulated contexts).<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">External stakeholders (as applicable)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cloud providers and strategic vendors:<\/strong> Architecture reviews, escalations, roadmap influence, support cases.<\/li>\n<li><strong>Systems integrators or managed service providers:<\/strong> Ensure services align to standards and do not create lock-in or operational risk.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Peer roles<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Lead\/Principal Application Architect, Security Architect, Data Architect, Platform Engineering Lead, SRE Lead, Enterprise Architect.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Upstream dependencies<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Business strategy and product roadmap<\/li>\n<li>Security policies and control requirements<\/li>\n<li>Vendor contracts and licensing constraints<\/li>\n<li>Existing platform capabilities and team skill 
levels<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Downstream consumers<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Platform engineering deliverables and modules<\/li>\n<li>Product team deployments and runtime needs<\/li>\n<li>SRE runbooks and operational tooling<\/li>\n<li>Audit\/compliance evidence processes<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Nature of collaboration<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Mostly <strong>matrixed<\/strong>: influence through standards, reviews, roadmaps, and co-design.<\/li>\n<li>Emphasis on <strong>joint ownership<\/strong> of outcomes (reliability, security, cost) rather than \u201carchitecture handoffs.\u201d<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Typical decision-making authority<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Recommends and approves standards and reference designs within architecture governance.<\/li>\n<li>Co-owns platform roadmap with platform\/SRE leads; escalates priority conflicts to engineering leadership.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Escalation points<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sev-1 incidents: escalate to SRE lead\/Incident Commander, Head of Platform, Security as needed.<\/li>\n<li>Architecture disputes: escalate to Head of Architecture, CTO\/VP Eng for final arbitration.<\/li>\n<li>Risk exceptions: escalate to Security\/GRC and the relevant executive risk owner.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">13) Decision Rights and Scope of Authority<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Can decide independently (typical lead-architect scope)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reference architecture recommendations and updates (within agreed governance)<\/li>\n<li>Technical options analysis and preferred approach for infrastructure patterns<\/li>\n<li>IaC conventions and module design standards (in partnership with platform engineering)<\/li>\n<li>Operational readiness criteria for shared 
platform components (co-authored with SRE)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Requires team or forum approval (architecture governance)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>New foundational platform selections (e.g., observability suite, secrets manager standard)<\/li>\n<li>Significant changes to network topology, account\/subscription structure, or identity model<\/li>\n<li>Adoption of new workload runtime standards (e.g., mandated Kubernetes baseline)<\/li>\n<li>Policies that materially impact developer workflows (e.g., stricter egress restrictions)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Requires manager\/director\/executive approval<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Large budget impacts: cloud commitments, major vendor contracts, platform re-platforming initiatives<\/li>\n<li>Risk acceptance for high-severity exceptions (e.g., deviation from encryption or logging requirements)<\/li>\n<li>Organization-level operating model changes (team boundaries, ownership changes, on-call model changes)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Budget, architecture, vendor, delivery, hiring, compliance authority<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Budget:<\/strong> Typically influences rather than owns; may have delegated authority for tooling renewals in some orgs.<\/li>\n<li><strong>Architecture:<\/strong> High influence; can approve designs within governance; final arbitration may sit with Head of Architecture\/CTO.<\/li>\n<li><strong>Vendor:<\/strong> Leads technical evaluation; procurement and exec sponsors approve commercial terms.<\/li>\n<li><strong>Delivery:<\/strong> Drives roadmap definition and sequencing; execution typically owned by platform teams.<\/li>\n<li><strong>Hiring:<\/strong> Often contributes to interview loops and role definition for infrastructure\/platform hires.<\/li>\n<li><strong>Compliance:<\/strong> Ensures designs meet controls; formal compliance sign-off typically 
sits with GRC\/security.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">14) Required Experience and Qualifications<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Typical years of experience<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>10\u201315+ years<\/strong> in infrastructure engineering, SRE, platform engineering, or systems architecture (flexible based on depth and scope)<\/li>\n<li><strong>3\u20136+ years<\/strong> in architecture ownership (reference designs, standards, governance) or equivalent technical leadership<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Education expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bachelor\u2019s degree in Computer Science, Engineering, Information Systems, or equivalent experience<\/li>\n<li>Advanced degree is optional; not a substitute for real-world architecture and operations experience<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Certifications (relevant but not always required)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Common\/valued:<\/strong> AWS Solutions Architect Professional, Azure Solutions Architect Expert, Google Professional Cloud Architect<\/li>\n<li><strong>Optional\/context-specific:<\/strong> Kubernetes (CKA\/CKS), HashiCorp Terraform Associate, CISSP (for security-leaning architects), ITIL (if ITSM-heavy org)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Prior role backgrounds commonly seen<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Senior\/Staff Infrastructure Engineer<\/li>\n<li>Senior\/Staff SRE<\/li>\n<li>Platform Engineering Lead \/ Senior Platform Engineer<\/li>\n<li>Network\/Systems Engineer with cloud specialization<\/li>\n<li>Cloud Engineer \/ Cloud Platform Engineer<\/li>\n<li>DevOps Engineer (with strong infrastructure depth)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Domain knowledge expectations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Strong understanding of cloud primitives and distributed systems 
constraints<\/li>\n<li>Practical security-by-design for infrastructure (identity, secrets, segmentation, logging)<\/li>\n<li>Operational excellence mindset (SLOs, incident response learnings, lifecycle management)<\/li>\n<li>Cost awareness and ability to partner with FinOps<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Leadership experience expectations (Lead scope)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Proven cross-team influence and mentorship<\/li>\n<li>Ownership of standards and patterns used by multiple teams<\/li>\n<li>Experience facilitating architecture reviews and driving alignment through trade-offs<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">15) Career Path and Progression<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common feeder roles into this role<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Staff\/Senior Infrastructure Engineer<\/li>\n<li>Senior\/Staff SRE<\/li>\n<li>Senior Platform Engineer \/ Platform Tech Lead<\/li>\n<li>Cloud Infrastructure Engineer (with governance and architecture exposure)<\/li>\n<li>Network architect\/engineer transitioning into cloud foundation architecture<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Next likely roles after this role<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Principal Infrastructure Architect<\/strong> (broader enterprise scope; multi-domain authority)<\/li>\n<li><strong>Distinguished Architect \/ Enterprise Architect<\/strong> (portfolio-level, cross-organization)<\/li>\n<li><strong>Head\/Director of Platform Engineering<\/strong> (people leadership + platform product ownership)<\/li>\n<li><strong>Director of Architecture<\/strong> (multi-architecture domain leadership and governance)<\/li>\n<li><strong>Chief Architect<\/strong> (enterprise-wide standards and strategic alignment)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Adjacent career paths<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Security Architecture (IAM, zero trust, cloud security posture 
specialization)<\/li>\n<li>Reliability leadership (SRE Manager\/Director)<\/li>\n<li>Platform Product Management (platform-as-product)<\/li>\n<li>Cloud FinOps leadership (cost governance and unit economics)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Skills needed for promotion<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Demonstrated impact across multiple domains (network + IAM + runtime + observability)<\/li>\n<li>Clear evidence of outcomes: reliability improvements, cost reductions, delivery acceleration<\/li>\n<li>Ability to set multi-year strategy and drive execution via influence at scale<\/li>\n<li>Mature governance design: guardrails that enable autonomy with safety<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How this role evolves over time<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Moves from \u201cdesigning standards\u201d to \u201coperating the architecture system\u201d: adoption metrics, exception lifecycle, platform product maturity.<\/li>\n<li>Becomes increasingly data-driven: reliability\/cost\/security metrics guiding architectural investment.<\/li>\n<li>Expands scope to enterprise portfolio management and cross-business-unit alignment in larger orgs.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">16) Risks, Challenges, and Failure Modes<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Common role challenges<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Balancing speed vs guardrails:<\/strong> Too strict slows teams; too loose creates sprawl and risk.<\/li>\n<li><strong>Legacy constraints:<\/strong> Existing network topology, IAM sprawl, and inconsistent tooling create migration friction.<\/li>\n<li><strong>Organizational boundaries:<\/strong> Network\/security\/platform ownership split can stall decisions.<\/li>\n<li><strong>Talent and maturity gaps:<\/strong> Teams may lack skills to implement advanced patterns safely.<\/li>\n<li><strong>Cloud cost complexity:<\/strong> Optimization requires behavioral change, not 
just tooling.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Bottlenecks<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Architect becomes a required approver for everything (review bottleneck).<\/li>\n<li>Too few standardized modules; teams build bespoke infrastructure repeatedly.<\/li>\n<li>Security exceptions handled informally without lifecycle tracking.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Anti-patterns<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Ivory-tower architecture:<\/strong> Beautiful documents with low adoption.<\/li>\n<li><strong>Over-standardization:<\/strong> One-size-fits-all patterns that ignore workload diversity.<\/li>\n<li><strong>Tool sprawl acceptance:<\/strong> Multiple overlapping observability, secrets, and CI tools with no strategy.<\/li>\n<li><strong>Shadow platforms:<\/strong> Teams create their own clusters\/accounts outside governance due to friction.<\/li>\n<li><strong>Weak deprecation discipline:<\/strong> Old patterns persist indefinitely, creating long-term risk.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Common reasons for underperformance<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Insufficient depth in networking\/IAM\/operations leading to fragile designs<\/li>\n<li>Lack of influence skills; inability to drive adoption<\/li>\n<li>Governance that is slow, unclear, or inconsistent<\/li>\n<li>Over-focus on new technology vs improving reliability and operability<\/li>\n<li>Failure to partner with platform engineering\u2014architecture not translated into reusable implementations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Business risks if this role is ineffective<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Higher outage frequency and longer recovery times<\/li>\n<li>Increased security exposure and audit findings<\/li>\n<li>Uncontrolled cloud spending and poor unit economics<\/li>\n<li>Slower product delivery due to manual provisioning and inconsistent 
environments<\/li>\n<li>Increased attrition due to poor developer experience and on-call burnout<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">17) Role Variants<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">By company size<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Small startup (Series A\u2013B):<\/strong>\n<ul class=\"wp-block-list\">\n<li>Role may be hands-on building foundations directly (landing zone, Terraform modules, clusters).<\/li>\n<li>Governance is lightweight; speed is prioritized; documentation is pragmatic.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Mid-size scale-up:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Strong emphasis on standardization, platform productization, and multi-team adoption.<\/li>\n<li>Architect partners closely with platform engineering and SRE; begins formal review processes.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Large enterprise:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Greater emphasis on compliance, auditability, vendor management, and operating model design.<\/li>\n<li>Architecture governance is formal; role focuses more on alignment and portfolio coherence.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">By industry<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regulated (fintech, healthcare, public sector):<\/strong> Stronger focus on controls, evidence, data residency, segregation of duties, and formal change management.<\/li>\n<li><strong>SaaS (typical software company):<\/strong> Focus on multi-tenant reliability, cost efficiency at scale, and rapid delivery enablement.<\/li>\n<li><strong>B2B services\/IT organization:<\/strong> More emphasis on standardized service catalogs, ITSM integration, and customer-specific compliance needs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">By geography<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Requirements may vary for:\n<ul class=\"wp-block-list\">\n<li>Data residency and cross-border traffic controls<\/li>\n<li>Encryption standards and key ownership<\/li>\n<li>On-call and operational coverage models<\/li>\n<\/ul>\n<\/li>\n<li>Core 
architecture principles remain consistent; compliance implementation changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Product-led vs service-led company<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Product-led:<\/strong> Optimize for developer autonomy, paved roads, self-service, and rapid experimentation with guardrails.<\/li>\n<li><strong>Service-led\/consulting IT:<\/strong> Optimize for repeatable delivery templates, multi-client isolation, and contractual SLAs.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Startup vs enterprise<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Startup:<\/strong> Build and ship; fewer constraints; architect is also builder.  <\/li>\n<li><strong>Enterprise:<\/strong> Align and govern; architect drives consistency across many teams, tools, and constraints.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Regulated vs non-regulated environment<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Regulated:<\/strong> Higher documentation rigor, control mapping, and audit evidence requirements.  
<\/li>\n<li><strong>Non-regulated:<\/strong> More freedom; risk management still matters for availability and customer trust.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">18) AI \/ Automation Impact on the Role<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Tasks that can be automated (increasingly)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Architecture documentation acceleration:<\/strong> Drafting ADRs, summarizing meetings, generating diagrams from structured inputs (with human validation).<\/li>\n<li><strong>Policy compliance checks:<\/strong> Automated detection of misconfigurations, drift, missing tags, insecure security groups, exposed endpoints.<\/li>\n<li><strong>IaC code quality and security scanning:<\/strong> Automated linting, policy-as-code validation, secret detection, module version checks.<\/li>\n<li><strong>Incident support:<\/strong> Automated correlation, timeline reconstruction, and suggested remediation steps from logs\/metrics\/traces.<\/li>\n<li><strong>Capacity and cost forecasting:<\/strong> Anomaly detection, rightsizing recommendations, and trend-based forecasts.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Tasks that remain human-critical<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Trade-off decisions with business context:<\/strong> Balancing cost, risk, delivery speed, and organizational capability.<\/li>\n<li><strong>Operating model and ownership design:<\/strong> Determining the right team boundaries, responsibilities, and escalation paths.<\/li>\n<li><strong>Stakeholder alignment and conflict resolution:<\/strong> Negotiating priorities, managing exceptions, building trust.<\/li>\n<li><strong>Risk acceptance decisions:<\/strong> Determining when exceptions are acceptable and what compensating controls are sufficient.<\/li>\n<li><strong>Architecture coherence:<\/strong> Ensuring decisions across domains fit together into a consistent system.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How 
AI changes the role over the next 2\u20135 years<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Higher expectation that architects use AI tools to <strong>increase throughput<\/strong> (more reviews, faster drafts, quicker analysis).<\/li>\n<li>More <strong>continuous compliance<\/strong> and telemetry-driven governance: architecture quality measured in real time through signals, not periodic audits.<\/li>\n<li>Increased focus on <strong>platform product management<\/strong>: adoption funnels, developer satisfaction metrics, and paved road optimization.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">New expectations caused by AI, automation, or platform shifts<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Architects will need to define <strong>automation-friendly standards<\/strong> (machine-verifiable policies, structured templates).<\/li>\n<li>Increased emphasis on <strong>supply chain security<\/strong> (signed artifacts, provenance for IaC modules, trusted pipelines).<\/li>\n<li>More frequent technology shifts (managed services, serverless, new runtime security models) requiring disciplined lifecycle management.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">19) Hiring Evaluation Criteria<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What to assess in interviews<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Infrastructure architecture depth:<\/strong> cloud foundations, networking, identity, observability, resilience.<\/li>\n<li><strong>Design judgment:<\/strong> ability to choose pragmatic solutions, articulate trade-offs, and define NFRs.<\/li>\n<li><strong>Operational maturity:<\/strong> incident experience, postmortem thinking, readiness and lifecycle management.<\/li>\n<li><strong>Governance and adoption strategy:<\/strong> how they drive standardization without blocking delivery.<\/li>\n<li><strong>Communication:<\/strong> clarity of documentation, diagrams, and stakeholder management.<\/li>\n<li><strong>Cost and efficiency 
thinking:<\/strong> ability to reason about cost drivers and optimization levers.<\/li>\n<li><strong>Security-by-design:<\/strong> least privilege, segmentation, secrets, encryption, auditability.<\/li>\n<li><strong>Leadership behaviors:<\/strong> mentorship, influence, collaboration with platform\/SRE\/security.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Practical exercises or case studies (recommended)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Case study: Cloud foundation design<\/strong><br\/>\n  Provide a scenario (multi-team SaaS, compliance requirements, multi-region). Ask for: account\/subscription structure, network topology, IAM model, guardrails, logging strategy, DR tiers.<\/li>\n<li><strong>Case study: Kubernetes platform standardization<\/strong> (context-specific)<br\/>\n  Ask for baseline cluster architecture, multi-tenancy approach, upgrade strategy, ingress\/egress, observability, runtime security.<\/li>\n<li><strong>Case study: Incident learning \u2192 architecture change<\/strong><br\/>\n  Provide a post-incident summary; ask candidate to propose systemic architecture improvements, prioritized roadmap, and rollout approach.<\/li>\n<li><strong>Architecture review simulation<\/strong><br\/>\n  Candidate reviews a short design doc and identifies risks, missing NFRs, and recommended changes.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Strong candidate signals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Explains trade-offs with clarity (security vs usability, cost vs resilience, complexity vs flexibility).<\/li>\n<li>Demonstrates patterns that scale: landing zones, standardized modules, policy-as-code, paved roads.<\/li>\n<li>Operational credibility: knows how designs fail in production and how to prevent recurrence.<\/li>\n<li>Can describe a successful adoption story: how they drove teams to use standards voluntarily.<\/li>\n<li>Shows ability to simplify: reducing sprawl, consolidating tooling, designing clear 
ownership boundaries.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Weak candidate signals<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Over-indexes on tools and buzzwords without operational or governance depth.<\/li>\n<li>Treats architecture as documentation rather than enabling implementation and outcomes.<\/li>\n<li>Avoids making decisions; provides only options without recommendations.<\/li>\n<li>Lacks understanding of networking\/IAM fundamentals (common failure area for infra architects).<\/li>\n<li>No evidence of influencing multiple teams or handling conflicts constructively.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Red flags<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Dismisses security\/compliance as \u201csomeone else\u2019s problem.\u201d<\/li>\n<li>Proposes fragile designs (single region for critical workloads, no DR strategy, no audit logging).<\/li>\n<li>Suggests bypassing governance without a risk-managed exception mechanism.<\/li>\n<li>Cannot discuss incidents or failures they learned from (or blames others without accountability).<\/li>\n<li>Overcomplicates solutions without clear justification.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Scorecard dimensions (with example weighting)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Dimension<\/th>\n<th>What \u201cmeets the bar\u201d looks like<\/th>\n<th style=\"text-align: right;\">Example weighting<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Cloud foundation architecture<\/td>\n<td>Coherent landing zone, guardrails, identity, networking<\/td>\n<td style=\"text-align: right;\">20%<\/td>\n<\/tr>\n<tr>\n<td>Networking &amp; connectivity<\/td>\n<td>Correct segmentation, routing, DNS, exposure patterns<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<\/tr>\n<tr>\n<td>Security-by-design<\/td>\n<td>IAM rigor, secrets, encryption, auditability, threat awareness<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<\/tr>\n<tr>\n<td>Reliability 
&amp; DR<\/td>\n<td>SLO thinking, resilience tiers, tested DR strategy<\/td>\n<td style=\"text-align: right;\">15%<\/td>\n<\/tr>\n<tr>\n<td>IaC and automation<\/td>\n<td>Module strategy, pipeline guardrails, drift\/compliance automation<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<\/tr>\n<tr>\n<td>Observability architecture<\/td>\n<td>Logging\/metrics\/tracing standards and operational usefulness<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<\/tr>\n<tr>\n<td>Communication &amp; documentation<\/td>\n<td>Clear artifacts, diagrams, decision records<\/td>\n<td style=\"text-align: right;\">10%<\/td>\n<\/tr>\n<tr>\n<td>Leadership &amp; influence<\/td>\n<td>Adoption strategy, mentoring, cross-team facilitation<\/td>\n<td style=\"text-align: right;\">5%<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">20) Final Role Scorecard Summary<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Category<\/th>\n<th>Summary<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Role title<\/td>\n<td>Lead Infrastructure Architect<\/td>\n<\/tr>\n<tr>\n<td>Role purpose<\/td>\n<td>Define, govern, and evolve infrastructure architecture (cloud\/hybrid\/network\/IAM\/observability) to enable secure, reliable, cost-effective software delivery at scale.<\/td>\n<\/tr>\n<tr>\n<td>Top 10 responsibilities<\/td>\n<td>Target-state infrastructure architecture and roadmap; cloud landing zone standards; network and connectivity architecture; IAM\/secrets\/key management patterns; resilience and DR strategy; IaC standards and module strategy; observability standards; operational readiness and production gating; cost governance with FinOps; architecture reviews and mentorship.<\/td>\n<\/tr>\n<tr>\n<td>Top 10 technical skills<\/td>\n<td>Cloud architecture; cloud networking; IAM and RBAC; secrets and key management; Infrastructure as Code (Terraform); Kubernetes\/platform architecture; reliability engineering and SLO concepts; observability 
(logs\/metrics\/traces); security-by-design and compliance-aware architecture; cost optimization\/FinOps literacy.<\/td>\n<\/tr>\n<tr>\n<td>Top 10 soft skills<\/td>\n<td>Systems thinking; influence without authority; pragmatic decision-making; clear written\/visual communication; operational ownership mindset; risk-based thinking; stakeholder empathy; negotiation and conflict resolution; mentorship and coaching; prioritization and roadmap thinking.<\/td>\n<\/tr>\n<tr>\n<td>Top tools or platforms<\/td>\n<td>AWS\/Azure\/GCP; Kubernetes; Terraform; GitHub\/GitLab; CI\/CD platform (GitHub Actions\/GitLab CI\/Jenkins\/Azure DevOps); Vault\/Secrets Manager\/Key Vault; Prometheus\/Grafana or Datadog\/New Relic; ELK\/OpenSearch; PagerDuty\/Opsgenie; Jira\/Confluence\/Lucidchart.<\/td>\n<\/tr>\n<tr>\n<td>Top KPIs<\/td>\n<td>Adoption rate of reference architectures; architecture decision cycle time; provisioning lead time; infra change failure rate; infra-attributable Sev-1\/2 incidents; MTTR; DR readiness score; policy compliance rate; realized cost optimization; stakeholder satisfaction.<\/td>\n<\/tr>\n<tr>\n<td>Main deliverables<\/td>\n<td>Target architecture; reference architectures; ADRs; standards catalog; roadmaps; operational readiness checklists; DR strategy and test plans; IaC module strategy; observability standards; migration plans and enablement materials.<\/td>\n<\/tr>\n<tr>\n<td>Main goals<\/td>\n<td>90 days: publish v1 standards and target architecture for critical domains, improve IaC guardrails and reduce a key operational pain point; 12 months: measurable reliability\/security\/cost improvements with high adoption and mature lifecycle governance.<\/td>\n<\/tr>\n<tr>\n<td>Career progression options<\/td>\n<td>Principal Infrastructure Architect; Distinguished\/Enterprise Architect; Director of Platform Engineering; Director of Architecture; Security Architect (cloud\/IAM); SRE 
leadership.<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>The <strong>Lead Infrastructure Architect<\/strong> designs, evolves, and governs the target-state infrastructure architecture that powers reliable, secure, cost-effective delivery of software products and internal platforms. This role leads architecture decisions across cloud, on-prem (where applicable), networking, identity, compute, storage, and operational tooling\u2014translating business and engineering needs into practical reference architectures and roadmaps that teams can implement.<\/p>\n","protected":false},"author":61,"featured_media":0,"comment_status":"open","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_joinchat":[],"footnotes":""},"categories":[24465,24464],"tags":[],"class_list":["post-72980","post","type-post","status-publish","format-standard","hentry","category-architect","category-architecture"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/72980","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/61"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=72980"}],"version-history":[{"count":0,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/72980\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=72980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=72980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=72980"}],"curies":[{"name":"wp","href":"https:\/\
/api.w.org\/{rel}","templated":true}]}}