Senior Site Reliability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior Site Reliability Engineer (SRE)** ensures that customer-facing and internal cloud services are **reliable, performant, resilient, and cost-effective** at scale. This role applies software engineering principles to operations—designing reliability into systems through automation, observability, incident management rigor, and continuous improvement.
Senior Reliability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior Reliability Engineer** is a senior individual contributor in the **Cloud & Infrastructure** organization responsible for ensuring production services meet defined reliability, availability, performance, and recoverability targets. This role designs and operates reliability mechanisms (SLOs, error budgets, observability, automation, incident response, resilience engineering) to reduce customer-impacting outages and improve operational efficiency at scale.
Senior Production Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
A **Senior Production Engineer** is a senior individual contributor in the Cloud & Infrastructure organization responsible for ensuring that production systems are **reliable, scalable, secure, and cost-efficient** while enabling fast, safe delivery of software changes. The role blends software engineering, systems engineering, and operational excellence to reduce downtime, improve performance, and increase developer velocity through automation and well-defined production practices.
Senior Observability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
A **Senior Observability Engineer** designs, builds, and operates the monitoring, logging, tracing, and alerting capabilities that enable engineering teams to **detect, diagnose, and resolve production issues quickly** while meeting reliability and performance objectives. The role sits at the intersection of platform engineering, SRE/operations, and software engineering, translating system behavior into actionable signals and standards that scale across teams and services.
Senior Network Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Senior Network Engineer designs, builds, and operates reliable, secure, and scalable network connectivity across cloud and on-prem environments to enable product delivery, internal engineering productivity, and enterprise-grade service reliability. This role balances deep hands-on engineering (routing/switching, WAN, firewalls, load balancing, DNS, connectivity) with operational excellence (monitoring, incident response, change management, capacity planning) and modern automation practices (Infrastructure as Code, configuration management, CI/CD integration).
Senior Network Automation Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior Network Automation Engineer** is a senior individual contributor in the **Cloud & Infrastructure** organization responsible for designing, building, and operating automation systems that provision, configure, validate, and continuously manage network infrastructure at scale. The role bridges traditional network engineering and modern software engineering practices (NetDevOps), enabling safe, repeatable, and observable network change through code, pipelines, and policy-driven controls.
Senior Monitoring Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Senior Monitoring Engineer designs, implements, and continuously improves the organization’s monitoring and observability capabilities across cloud infrastructure, platforms, and production services. This role ensures that engineering teams can detect incidents early, diagnose issues quickly, and measure reliability through actionable metrics, logs, traces, and service-level objectives (SLOs).
Senior Linux Systems Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior Linux Systems Engineer** is a senior individual contributor responsible for the reliability, security, performance, and lifecycle management of Linux-based compute platforms that power production services, internal engineering systems, and core infrastructure. This role designs and operates scalable Linux environments across on-premises and cloud, automates system configuration and fleet operations, and hardens platforms to meet uptime and security requirements.
Senior Kubernetes Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Senior Kubernetes Engineer designs, builds, secures, and operates Kubernetes platforms that reliably run production workloads at scale. This role exists to provide a standardized, automated, and supportable container orchestration foundation so that application teams can ship faster while meeting enterprise expectations for availability, security, cost, and compliance.
Senior Infrastructure Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Senior Infrastructure Engineer designs, builds, and operates reliable, secure, and scalable infrastructure platforms that enable product engineering teams to ship and run software with confidence. This role is accountable for improving availability, performance, and operational efficiency across cloud and/or hybrid environments, while reducing risk through automation, standardization, and strong operational controls.
Senior DevOps Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior DevOps Engineer** is a senior individual contributor in the **Cloud & Infrastructure** department responsible for building, operating, and continuously improving the platforms, automation, and operational practices that enable engineering teams to deliver software safely, quickly, and reliably. This role designs and runs cloud infrastructure, CI/CD systems, observability, and operational controls that reduce lead time and change risk while improving availability and performance.
Senior Cloud Native Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior Cloud Native Engineer** designs, builds, and operates cloud-native platforms and runtime capabilities that enable application teams to ship secure, scalable, reliable software with high delivery velocity. This role sits in the **Cloud & Infrastructure** department and focuses on modern infrastructure engineering: containers, Kubernetes, service networking, infrastructure-as-code, CI/CD enablement, observability, and reliability practices.
Senior Cloud Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Senior Cloud Engineer** designs, builds, and operates secure, reliable, and cost-efficient cloud infrastructure that enables product engineering teams to deliver software quickly and safely. This role is accountable for production-grade cloud foundations (networking, compute, identity, observability, automation) and for evolving them into scalable internal platforms and patterns.
Reliability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Reliability Engineer ensures that cloud-based services and the infrastructure they run on are available, performant, resilient, and recoverable under real-world conditions—including failures, traffic spikes, deployments, and dependency issues. This role blends software engineering, operational excellence, and systems thinking to reduce customer-impacting incidents, improve mean time to restore (MTTR), and raise the reliability baseline through automation and engineering standards.
Reliability and Platform Engineering Leader: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Reliability and Platform Engineering Leader is accountable for the reliability, scalability, and operational readiness of the company’s production systems while building a developer platform that enables fast, safe, and cost-effective software delivery. This role leads Site Reliability Engineering (SRE) and Platform Engineering capabilities across cloud infrastructure, Kubernetes/container platforms, CI/CD foundations, and observability—balancing uptime, feature velocity, security, and cost.
Production Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
A **Production Engineer** ensures that customer-facing services and internal platforms run safely, reliably, and efficiently in live (“production”) environments. The role blends software engineering, systems engineering, and operational excellence to reduce downtime, improve performance, increase deployment safety, and minimize manual operational toil through automation.
Principal Systems Reliability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal Systems Reliability Engineer** is a senior individual-contributor (IC) role responsible for designing, governing, and continuously improving reliability outcomes across cloud infrastructure and the production systems that run on it. This role sets reliability strategy, defines measurable reliability standards (SLOs/SLIs/error budgets), and drives systemic improvements that reduce incidents, accelerate recovery, and increase customer trust.
Principal Storage Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Storage Engineer is the senior individual-contributor authority for enterprise storage platforms that underpin application reliability, data durability, performance, and cost efficiency across on-prem, hybrid, and cloud environments. The role designs, standardizes, automates, and continuously improves storage services (block, file, object) and data protection capabilities (backup, replication, archive) to meet production-grade requirements.
Principal SRE Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal SRE Engineer** is a senior individual contributor (IC) responsible for shaping, scaling, and continuously improving the reliability, performance, and operational excellence of cloud-hosted products and core infrastructure. This role drives enterprise-grade Site Reliability Engineering practices—particularly SLO-based reliability management, resilient architectures, high-quality observability, and automated operations—across multiple teams and services.
Principal Site Reliability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Site Reliability Engineer (SRE) is a senior individual contributor responsible for ensuring that critical cloud services are reliable, scalable, secure, and cost-efficient, while enabling rapid product delivery. This role designs and governs reliability engineering practices (SLOs/SLIs, error budgets, incident management, observability, resilience testing) and drives cross-team execution of reliability improvements across the platform.
Principal Reliability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal Reliability Engineer** is a senior individual-contributor (IC) role responsible for **setting reliability strategy and technical direction** across critical cloud infrastructure and production services, while directly improving **availability, latency, scalability, incident response maturity, and operational efficiency**. This role exists to ensure that engineering teams can ship changes quickly **without compromising production stability**, and that reliability is designed, measured, and governed as a first-class product attribute.
Principal Production Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Production Engineer is a senior individual contributor in the Cloud & Infrastructure organization responsible for ensuring that customer-facing and internal production systems are reliable, scalable, secure, and cost-efficient. This role blends deep systems engineering with operational excellence and influences architecture and engineering practices across multiple teams and services.
Principal Observability Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Observability Engineer is a senior individual contributor (IC) in the Cloud & Infrastructure organization accountable for the end-to-end observability strategy, platform architecture, and operational outcomes across distributed systems. This role builds and evolves the telemetry foundations (metrics, logs, traces, profiling, synthetics) that enable engineering teams to detect, understand, and remediate reliability, performance, and customer-impacting issues quickly and safely.
Principal Network Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Network Engineer is the most senior individual contributor (IC) network specialist, responsible for designing, governing, and continuously improving the network foundations that power reliable, secure, and scalable cloud and infrastructure services. This role sets technical direction for enterprise networking across data center, cloud, and edge environments, while partnering closely with platform, security, SRE, and application engineering to ensure the network enables product delivery rather than blocking it.
Principal Network Automation Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal Network Automation Engineer** is a senior individual contributor responsible for designing, delivering, and operationalizing network automation capabilities that improve reliability, security, speed, and consistency across cloud and infrastructure networks. This role builds the automation “platform” and engineering practices that enable network changes to be delivered safely through code, testing, and CI/CD—at enterprise scale.
Principal Monitoring Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal Monitoring Engineer** is the technical authority responsible for designing, standardizing, and continuously improving the organization’s monitoring and observability capabilities across cloud infrastructure, platforms, and production services. This role ensures that engineering teams can detect, diagnose, and resolve issues quickly through high-quality telemetry (metrics, logs, traces, events) and reliable alerting, aligned to customer-impacting outcomes and SLOs.
Principal Linux Systems Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal Linux Systems Engineer** is among the most senior individual contributors responsible for the reliability, security, performance, and lifecycle of Linux-based infrastructure that underpins production services. This role designs and governs the Linux platform “golden path” across bare metal, virtualized, and cloud environments, ensuring systems are automated, observable, compliant, and cost-effective at scale.
Principal Kubernetes Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Kubernetes Engineer is a senior individual contributor (IC) responsible for designing, evolving, and governing the organization’s Kubernetes platform(s) to deliver secure, reliable, scalable, and cost-effective container orchestration capabilities. This role combines deep Kubernetes expertise with platform engineering practices, reliability engineering, and a strong operating-model mindset to enable product teams to ship faster with fewer incidents.
Principal Infrastructure Engineer: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The Principal Infrastructure Engineer is a senior individual contributor (IC) responsible for designing, evolving, and governing the company’s cloud and infrastructure foundations so product engineering teams can deliver secure, reliable, scalable software quickly. This role owns high-impact technical decisions across compute, networking, storage, identity, observability, and automation, and drives the infrastructure operating model (standards, patterns, self-service, and reliability practices) across multiple teams.
Principal Engineer – Cloud and Reliability: Role Blueprint, Responsibilities, Skills, KPIs, and Career Path
The **Principal Engineer – Cloud and Reliability** is the senior individual-contributor authority responsible for designing, evolving, and governing the cloud platform and reliability practices that keep production services **available, performant, secure, and cost-effective at scale**. This role blends deep cloud engineering with SRE-style reliability leadership, establishing technical direction across teams while remaining hands-on in critical systems, incidents, and platform improvements.
