The Four Golden Signals of SRE (Site Reliability Engineering) are a simple but powerful way to monitor the health and reliability of a system. They help teams quickly understand what is happening inside a system and detect problems before they become serious.
The four signals are:
1. Latency
Latency measures how long it takes for a system to respond to a request.
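It covers both successful and failed requests, but the two should be tracked separately: a request that fails instantly can make overall latency look better than it really is. For example, if a website normally responds in 200ms but suddenly takes 2–3 seconds, that is a warning sign.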
Latency helps answer:
- Is the system responding slowly?
- Are users experiencing delays?
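A minimal sketch of how latency might be summarized, assuming request durations are already collected in memory. The `Request` type and the `percentile` and `latency_report` functions are hypothetical names for illustration. It reports a percentile (nearest-rank method) instead of an average, and keeps successful and failed requests apart so fast failures do not hide slow successes.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # how long the request took to complete
    succeeded: bool     # False if the request ended in an error


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list; p is between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_report(requests, p=99):
    """Return the p-th percentile latency for successful and failed requests separately."""
    ok = [r.duration_ms for r in requests if r.succeeded]
    failed = [r.duration_ms for r in requests if not r.succeeded]
    return {
        "success": percentile(ok, p) if ok else None,
        "failure": percentile(failed, p) if failed else None,
    }


requests = [Request(180, True), Request(200, True), Request(210, True),
            Request(220, True), Request(2500, True),
            Request(15, False), Request(20, False)]
print(latency_report(requests, p=50))  # {'success': 210, 'failure': 15}
print(latency_report(requests, p=99))  # {'success': 2500, 'failure': 20}
```

Looking at a high percentile such as p99 next to the median exposes the slow tail (the 2500ms request above) that a typical-case number would hide.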
2. Traffic
Traffic refers to how much demand is being placed on the system.
Depending on the system, it can be measured as:
- Requests per second (RPS)
- Number of concurrent users or sessions
- API calls
This helps teams understand workload patterns and plan capacity.
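As a rough illustration, traffic in requests per second can be computed over a trailing time window, both in total and per endpoint. This is a sketch assuming each request is recorded as a `(timestamp, endpoint)` pair; the `traffic` function and the endpoint names are hypothetical.

```python
from collections import Counter


def traffic(events, now, window_s=60.0):
    """Average requests per second over the window (now - window_s, now].

    events is an iterable of (timestamp_in_seconds, endpoint) pairs.
    Returns (total_rps, {endpoint: rps}).
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # Keep only requests that fall inside the trailing window.
    recent = [endpoint for ts, endpoint in events if now - window_s < ts <= now]
    per_endpoint = {ep: n / window_s for ep, n in Counter(recent).items()}
    return len(recent) / window_s, per_endpoint


events = [(80, "/search"), (91, "/search"), (92, "/login"), (95, "/search"),
          (99, "/login"), (100, "/search")]
total, by_endpoint = traffic(events, now=100, window_s=10)
print(total)        # 0.5
print(by_endpoint)  # {'/search': 0.3, '/login': 0.2}
```

The request at timestamp 80 falls outside the 10-second window and is not counted.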
3. Errors
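Errors track the rate of requests that fail, whether explicitly (an error status code), implicitly (a success status with wrong content), or by policy (for example, a response slower than an agreed limit).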
This includes:
- HTTP 5xx server errors (and, depending on the service, some 4xx client errors)
- Failed API calls
- Application exceptions
Errors are one of the clearest signals that something is going wrong in the system.
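A minimal sketch of an error-rate calculation, assuming each response is reduced to its HTTP status code (or `None` when the call raised an exception). The `Response`, `is_error`, and `error_rate` names are hypothetical. Server errors (5xx) and exceptions always count; client errors (4xx) count only when requested, since they often reflect bad input rather than a broken service. Implicit errors, such as a 200 with wrong content, cannot be seen from a status code alone and are not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    # HTTP status code, or None if the call raised an exception or got no response
    status: Optional[int]


def is_error(resp, count_4xx=False):
    """Classify a response as failed."""
    if resp.status is None:
        return True                       # exception or no response at all
    if 500 <= resp.status <= 599:
        return True                       # server-side failure
    if count_4xx and 400 <= resp.status <= 499:
        return True                       # client error, only if the caller opts in
    return False


def error_rate(responses, count_4xx=False):
    """Fraction of responses that failed (0.0 when there is no traffic)."""
    responses = list(responses)
    if not responses:
        return 0.0
    failed = sum(1 for r in responses if is_error(r, count_4xx))
    return failed / len(responses)


responses = [Response(s) for s in (200, 200, 404, 500, 503, None, 200, 200)]
print(error_rate(responses))                  # 0.375
print(error_rate(responses, count_4xx=True))  # 0.5
```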
4. Saturation
Saturation shows how “full” or stressed the system is.
It reflects resource usage like:
- CPU usage
- Memory usage
- Disk or network capacity
If saturation is high, the system is close to overload; many systems start to degrade well before they reach 100% utilization.
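One simple way to express saturation is the utilization of the most constrained resource, since that is the one that runs out first. The sketch below assumes usage and capacity are known per resource; the `saturation` and `near_overload` functions, the resource names, and the 80% threshold are all hypothetical examples.

```python
def saturation(usage, capacity):
    """Return (resource name, utilization fraction) for the most constrained resource."""
    fractions = {name: usage[name] / capacity[name] for name in capacity}
    worst = max(fractions, key=fractions.get)
    return worst, fractions[worst]


def near_overload(usage, capacity, threshold=0.8):
    """True if the most constrained resource is at or above the (hypothetical) threshold."""
    _, fraction = saturation(usage, capacity)
    return fraction >= threshold


capacity = {"cpu_cores": 8, "memory_gb": 32, "disk_gb": 500}
usage = {"cpu_cores": 4, "memory_gb": 28, "disk_gb": 100}
print(saturation(usage, capacity))     # ('memory_gb', 0.875)
print(near_overload(usage, capacity))  # True
```

Here CPU is only at 50%, but memory is at 87.5%, so the system is treated as close to overload.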
How these signals help
Together, these four signals give a broad picture of system health:
- Latency shows user experience
- Traffic shows demand
- Errors show failures
- Saturation shows resource pressure
By monitoring all four, teams can detect issues early, troubleshoot faster, and maintain better reliability.
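To show how the four signals can be evaluated together, here is a sketch of a health check that compares each one against a limit and reports which signals are out of bounds. The `Signals` and `Thresholds` types, the `check` function, and every threshold value are hypothetical; real limits depend on the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    p99_latency_ms: float  # tail latency of successful requests
    rps: float             # traffic, in requests per second
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    saturation: float      # utilization of the most constrained resource, 0.0 to 1.0


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical example limits; tune them for each service.
    max_p99_latency_ms: float = 500.0
    min_rps: float = 1.0       # a sudden drop in traffic can also signal a problem
    max_rps: float = 1000.0
    max_error_rate: float = 0.01
    max_saturation: float = 0.8


def check(signals, limits=Thresholds()):
    """Return the names of the golden signals that are outside their limits."""
    problems = []
    if signals.p99_latency_ms > limits.max_p99_latency_ms:
        problems.append("latency")
    if not limits.min_rps <= signals.rps <= limits.max_rps:
        problems.append("traffic")
    if signals.error_rate > limits.max_error_rate:
        problems.append("errors")
    if signals.saturation > limits.max_saturation:
        problems.append("saturation")
    return problems


print(check(Signals(200, 50, 0.001, 0.4)))   # []
print(check(Signals(2500, 50, 0.05, 0.95)))  # ['latency', 'errors', 'saturation']
```

Returning every breached signal, rather than stopping at the first, keeps related symptoms (for example, high saturation alongside rising latency) visible together during troubleshooting.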
Which signal is most important?
If I had to choose one, I would say Errors are the most important for catching user-facing issues.
The reason is simple: errors are the most direct indicator that something is already broken or not working as expected. While latency and saturation show early warning signs, errors confirm actual impact on users.
That said, in real systems, all four signals work together. Focusing on only one can be misleading, but errors usually trigger the fastest response from engineering teams.
Simple summary
The Four Golden Signals (Latency, Traffic, Errors, and Saturation) help teams understand system performance and reliability in a structured way. Among them, errors are often the most critical because they directly reflect user-facing problems, but the real strength comes from monitoring all four together.