umesh k singla: Defining objectives, performance indicators, agreements: Notes from Google's SRE Book (#3)

SLIs, SLOs, and SLAs are concepts that Indian tech companies lack.

We use intuition, experience, and an understanding of what users want to define service level indicators (SLIs), objectives (SLOs), and agreements (SLAs).

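- An SLI is a quantitative measure of some aspect of the level of service provided, such as request latency, error rate, or availability.

- An SLO is a target value or range for an SLI: a lower bound, an upper bound, or both.

- An SLA spells out the consequences of meeting or missing the SLOs it contains.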

It seems an SLA is not just something you use to shut people up, but something more objective.
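To make the relationship between an SLI and an SLO concrete, here is a minimal Python sketch. The `SLO` class, the `availability_sli` function, and the 99.9% target are hypothetical names and numbers picked for illustration; none of them come from the book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SLO:
    """A bound on an SLI: a lower bound, an upper bound, or both (hypothetical type)."""
    name: str
    lower: Optional[float] = None
    upper: Optional[float] = None

    def is_met(self, sli_value: float) -> bool:
        """Return True when the measured SLI falls inside the configured bounds."""
        if self.lower is not None and sli_value < self.lower:
            return False
        if self.upper is not None and sli_value > self.upper:
            return False
        return True


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the fraction of requests that were served successfully."""
    if total_requests == 0:
        raise ValueError("no requests observed; the SLI is undefined")
    return good_requests / total_requests


# Assumed example numbers: 99,950 good requests out of 100,000.
availability_slo = SLO(name="availability", lower=0.999)
sli = availability_sli(good_requests=99_950, total_requests=100_000)
slo_met = availability_slo.is_met(sli)
```

An SLA would sit on top of this: it is the agreement about what happens when `slo_met` is false, which is a business decision rather than something the code can express.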

- - -

Without an explicit SLO, users often develop their own beliefs about desired performance, which may be unrelated to the beliefs held by the people designing and operating the service. This dynamic can lead to both over-reliance on the service, when users incorrectly believe that a service will be more available than it actually is, and under-reliance, when prospective users believe a system is flakier and less reliable than it actually is.

- - -

On the Chubby example:

In any given quarter, if a true failure has not dropped availability below the target, a controlled outage will be synthesized by intentionally taking down the system. In this way, we are able to flush out unreasonable dependencies on Chubby shortly after they are added. Doing so forces service owners to reckon with the reality of distributed systems sooner rather than later.
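The arithmetic behind such a decision might look like the sketch below. It is only an illustration under stated assumptions: the 99.9% target, the 90-day quarter, and the idea of sizing the controlled outage to use up the remaining downtime allowance are mine, not details given in the book, and the function names are hypothetical.

```python
QUARTER_SECONDS = 90 * 24 * 60 * 60  # assumed 90-day quarter


def allowed_downtime_seconds(slo_target: float, period_seconds: float) -> float:
    """Downtime the availability target permits over the period."""
    return (1.0 - slo_target) * period_seconds


def remaining_downtime_seconds(
    slo_target: float, period_seconds: float, observed_downtime_seconds: float
) -> float:
    """Downtime still available before availability drops below the target.

    Returns 0.0 when real failures have already used up the allowance,
    in which case no controlled outage would be needed.
    """
    budget = allowed_downtime_seconds(slo_target, period_seconds)
    return max(0.0, budget - observed_downtime_seconds)


# Assumed example: a 99.9% target allows roughly 7776 seconds per quarter.
budget = allowed_downtime_seconds(0.999, QUARTER_SECONDS)
quiet_quarter = remaining_downtime_seconds(0.999, QUARTER_SECONDS, 3000.0)
rough_quarter = remaining_downtime_seconds(0.999, QUARTER_SECONDS, 9000.0)
```

In the quiet quarter there is room left for a controlled outage; in the rough quarter real failures have already pushed availability below the target, so nothing needs to be synthesized.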

- - -

SRE doesn’t typically get involved in constructing SLAs, because SLAs are closely tied to business and product decisions. SRE does, however, get involved in helping to avoid triggering the consequences of missed SLOs.

- - -

Whether or not a particular service has an SLA, it’s valuable to define SLIs and SLOs and use them to manage the service.

- - -
