What Are the Golden Signals?

The golden signals are the four metrics that Google's Site Reliability Engineering book recommends monitoring for every user-facing production service: latency, traffic, errors, and saturation. Together they describe service health from the user's perspective and form the foundation of monitoring and alerting for production systems, including those running on Kubernetes or other cloud platforms.

Why the Golden Signals Matter

With hundreds of possible metrics to track, teams often struggle to decide what to monitor. The golden signals cut through this complexity by identifying the four metrics that most directly reflect user experience and service health. If you can monitor only four things, these four will surface most of the issues that affect your users. They also give you a proven framework for building dashboards, setting alerts, and defining SLOs.

Teams that understand and adopt golden signals gain a significant operational advantage, reducing manual effort and improving the reliability and scalability of their infrastructure. As cloud-native adoption accelerates, familiarity with golden signals has become a core competency for DevOps engineers, platform teams, and site reliability engineers working in production Kubernetes and cloud environments.

How to Implement the Golden Signals

For each service, define how you will measure each golden signal. Latency is the time it takes to serve a request, tracked separately for successful and failed requests. Traffic is the demand placed on the service, typically expressed as requests per second. Errors are the rate of failed requests, including requests that appear to succeed but return incorrect results. Saturation measures how full the service is, for example CPU utilization, memory usage, or queue depth relative to its capacity limit.
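The definitions above can be sketched as a small Python function that summarizes one observation window. This is a minimal illustration, not a production collector: the `Request` record, the `golden_signals` function, and the choice of median latency and queue depth as the saturation measure are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    """Hypothetical per-request record; the field names are illustrative."""
    duration_ms: float  # time taken to serve the request
    ok: bool            # False for explicit failures and for incorrect results


def golden_signals(requests, window_seconds, queue_depth, queue_capacity):
    """Summarize one observation window as the four golden signals."""
    ok_latencies = [r.duration_ms for r in requests if r.ok]
    failed_latencies = [r.duration_ms for r in requests if not r.ok]
    total = len(requests)
    return {
        # Latency: tracked separately for successful and failed requests.
        "latency_ok_ms": median(ok_latencies) if ok_latencies else None,
        "latency_failed_ms": median(failed_latencies) if failed_latencies else None,
        # Traffic: demand expressed as requests per second.
        "traffic_rps": total / window_seconds,
        # Errors: fraction of requests in the window that failed.
        "error_rate": len(failed_latencies) / total if total else 0.0,
        # Saturation: how full the service is (assumed here: queue depth vs. limit).
        "saturation": queue_depth / queue_capacity,
    }
```

In practice these values usually come from a metrics system rather than raw request lists, and a high percentile is often more informative than the median. The structure, one number per signal per window, stays the same.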

Understanding how the golden signals fit into the broader cloud-native ecosystem is important for making informed architecture decisions. They work alongside other tools and practices in the DevOps and platform engineering space, and choosing the right combination depends on your team's specific requirements, scale, and operational maturity.

Key Features

Latency

Measure request duration for both successful and failed requests, and keep the two separate. A slow error ties up resources longer than a fast one, so track error latency rather than filtering failed requests out.

Traffic

Track the volume of requests your service handles. Changes in traffic patterns can explain changes in other signals.

Errors

Monitor the rate of requests that fail, including explicit errors and requests returning degraded results.
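One way to count both kinds of failure is to classify each response before computing the rate. The sketch below is an assumption-laden illustration: it treats HTTP 5xx status codes as explicit errors and relies on a hypothetical `degraded` flag that your service would have to set itself (for example, when it serves a fallback response). Whether 4xx responses count as errors is a policy choice this example leaves out.

```python
from collections import Counter


def classify(status_code, degraded):
    """Label one response; both the 5xx rule and the flag are assumptions."""
    if status_code >= 500:
        return "explicit_error"
    if degraded:
        return "degraded"
    return "ok"


def error_rate(responses):
    """responses: iterable of (status_code, degraded) pairs for one window."""
    counts = Counter(classify(status, degraded) for status, degraded in responses)
    total = sum(counts.values())
    failed = counts["explicit_error"] + counts["degraded"]
    return failed / total if total else 0.0
```

Counting degraded responses alongside explicit errors keeps the error signal aligned with what users actually experience, not just with what the status code reports.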

Saturation

Measure how close to capacity each resource is. High saturation is a leading indicator of future performance problems.
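To illustrate why saturation works as a leading indicator, the following sketch estimates how long a resource has before it fills up, using a straight-line extrapolation from two utilization samples. The function name and the linear-growth assumption are illustrative only; real workloads rarely grow linearly, so treat the result as a rough early warning.

```python
def seconds_until_full(earlier, later, interval_seconds, capacity=1.0):
    """Extrapolate two utilization samples taken interval_seconds apart.

    Returns the estimated seconds until `capacity` is reached, or None
    when utilization is flat or falling.
    """
    growth_per_second = (later - earlier) / interval_seconds
    if growth_per_second <= 0:
        return None
    # Clamp at zero in case the resource is already at or over capacity.
    return max(0.0, (capacity - later) / growth_per_second)
```

For example, a disk that went from 50% to 75% full over a given interval would, under this assumption, be full after one more interval of the same length.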

Common Use Cases

Building a Grafana dashboard for each microservice that displays all four golden signals at a glance.

Setting alerts based on golden signals to detect issues that directly impact user experience.
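A threshold-based alert check might look like the sketch below. The signal names and limits are hypothetical placeholders, and real alerting systems typically also require a condition to hold for some duration before firing, which this example omits.

```python
def evaluate_alerts(signals, thresholds):
    """Return the sorted names of signals whose value exceeds its threshold.

    Signals without a threshold, or with no current value, are ignored.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if signals.get(name) is not None and signals[name] > limit
    )


# Hypothetical values for one service at one point in time.
current = {"latency_p99_ms": 850, "error_rate": 0.002, "saturation": 0.91}
limits = {"latency_p99_ms": 500, "error_rate": 0.01, "saturation": 0.8}
firing = evaluate_alerts(current, limits)
```

Traffic has no threshold in this example because it is used here as context for interpreting the other three signals. Whether to alert on traffic directly depends on the service.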

Using golden signals as the basis for SLO definitions that track availability and latency targets.
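As a rough sketch of how the error and latency signals can feed an SLO, the function below checks one window of requests against an availability target and a latency target. The function name, the tuple format, and the decision to count only successful requests toward the latency target are assumptions made for this example.

```python
def slo_report(requests, availability_target, latency_threshold_ms, latency_target):
    """Check one SLO window.

    requests: list of (duration_ms, ok) tuples.
    availability_target: required fraction of successful requests.
    latency_target: required fraction of requests that succeed within
        latency_threshold_ms.
    """
    total = len(requests)
    if total == 0:
        raise ValueError("no requests in the SLO window")
    availability = sum(1 for _, ok in requests if ok) / total
    fast_enough = sum(
        1 for duration, ok in requests if ok and duration <= latency_threshold_ms
    ) / total
    return {
        "availability": availability,
        "availability_met": availability >= availability_target,
        "latency_sli": fast_enough,
        "latency_met": fast_enough >= latency_target,
    }
```

Expressing both targets as "fraction of good requests" keeps the availability and latency objectives comparable and easy to chart on the same dashboard.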

Onboarding new engineers to production monitoring by teaching the golden signals framework as a starting point.

How Obsium Helps

Obsium's managed observability team helps organizations implement and optimize golden signals as part of production-grade infrastructure. Whether you are adopting golden signals for the first time or looking to improve an existing implementation, our engineers bring hands-on experience across cloud platforms and Kubernetes environments. Learn more about our managed observability services →
