Chasing the wrong production metrics not only hinders progress, it also creates confusion for your customers and stakeholders. Here’s how IT teams can set the right objectives to achieve the right results.
Picture the scene…you’re an IT leader and your business stakeholders request a monthly update on performance across your production environment. You run a few checks and report that all your metrics are green across the board. Job well done.
Well, not quite.
Your stakeholders look confused. From their perspective, performance has been anything but good. In fact, there has been a flurry of service issues, none of which are mentioned in your report.
Sound familiar? You’re not alone.
The right metrics lead to the right business outcomes
All too often, there is a huge divide between how IT teams and business stakeholders perceive performance. Sometimes this can be down to a lack of understanding or even a terminology barrier between the two functions. But more often than not, it’s because the metrics IT teams use to track performance do not reflect how effective a service is, or whether it is achieving the right business outcomes.
Narrow performance metrics like output or utilisation, while helpful, don’t consider the actual value a service brings to users, or whether users have had a positive or negative experience of that service.
So, as an IT leader, how can you choose the right set of metrics to better assess service performance?
An excellent place to start is by creating Service Level Indicators (SLIs) and Service Level Objectives (SLOs), which form the key pillars of Site Reliability Engineering (SRE).
SLIs, SLOs, and SLAs: what’s the difference?
If you work in IT, you will already be familiar with the term SLA. It stands for Service Level Agreement, the contractual commitment you make to your customers to uphold an agreed level of service. But what about SLIs and SLOs? How do they differ? And what exactly do they bring to the table?
Firstly, let’s define what each of these terms means:
A Service Level Indicator (SLI) indicates whether each delivery of a service has been good or not. For example, if we consider user logins to a system, a good delivery is a login that was successful and completed in under 5 seconds.
A Service Level Objective (SLO) quantifies the overall reliability of the service in percentage terms, i.e. what percentage of all SLI measurements over a given time period you expect to be good. For example, an SLO could state that 95% of all user logins are successful and complete in under 5 seconds, measured over a 4-week period.
A Service Level Agreement (SLA) is effectively the same as the SLO in terms of how it is calculated; the difference is that if the objective is not met, there will be contractual consequences.
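To make the relationship between the SLI and the SLO concrete, here is a minimal Python sketch of the login example. It assumes each login is recorded with a success flag and a duration; the names `LoginEvent`, `is_good`, `good_ratio`, and `slo_met` are hypothetical and chosen purely for illustration, and the thresholds are the ones used in the example above.

```python
from dataclasses import dataclass


@dataclass
class LoginEvent:
    """One recorded login attempt (hypothetical record shape)."""
    succeeded: bool
    duration_seconds: float


MAX_LOGIN_SECONDS = 5.0  # SLI threshold from the example: under 5 seconds
SLO_TARGET = 0.95        # SLO from the example: 95% of logins must be good


def is_good(event: LoginEvent) -> bool:
    """SLI: a login is good if it succeeded and took under 5 seconds."""
    return event.succeeded and event.duration_seconds < MAX_LOGIN_SECONDS


def good_ratio(events: list[LoginEvent]) -> float:
    """Fraction of good logins in the measurement window."""
    if not events:
        raise ValueError("no events in the measurement window")
    return sum(1 for e in events if is_good(e)) / len(events)


def slo_met(events: list[LoginEvent], target: float = SLO_TARGET) -> bool:
    """SLO check: is the good-login ratio at or above the target?"""
    return good_ratio(events) >= target


# Illustrative data only: 18 fast successes, 1 slow success, 1 failure.
sample_window = (
    [LoginEvent(True, 1.2)] * 18
    + [LoginEvent(True, 7.5)]
    + [LoginEvent(False, 0.4)]
)

if __name__ == "__main__":
    print(f"Good logins: {good_ratio(sample_window):.0%}")
    print(f"SLO met: {slo_met(sample_window)}")
```

In this made-up window, 18 of 20 logins count as good, so the 95% objective is missed even though only one login actually failed. An SLA would use the same calculation, with contractual consequences attached to the result.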
Make outcomes your anchor point
SLOs and SLIs help IT teams align performance objectives and metrics with the desired business outcome. They provide a more accurate way to reflect the service the producer (IT operations) delivers in the eyes of the customer (business stakeholders).
For example, an outcome could be that you want customers to log on to a system quickly (under five seconds) or you want to process payments within a specific timeframe. This outcome becomes your anchor point, the reason the service exists in the first place. Once you know what it’s there to achieve, you can then examine how to measure its effectiveness, including what data you need from the service to prove it’s delivering the right outcome.
Meaningful metrics: your first step to better performance
No outages, no problem?
Not quite.
The absence of outages in your environment is not a reliable indicator that all is well. If you don’t understand what a service is there to achieve, how can you ever be sure it is performing well or if it is providing any value to the business?
By helping you reflect on what a service is there to achieve, SLIs and SLOs help you to build a stronger relationship with your business stakeholders and customers. With the right metrics in place, you can provide stakeholders with a much more accurate assessment of performance. And if there is an incident or outage in your environment, your SLIs and SLOs will help you assess the impact quickly and decide on the right course of action.
So, to sum up, SLIs and SLOs play an essential role in helping IT teams:
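As a rough illustration of assessing incident impact, the sketch below checks how many bad logins a 95% objective can absorb and how much headroom is left after an incident. The function names and all the numbers are hypothetical, and it assumes you already have counts of total and bad logins for the current measurement window.

```python
def allowed_bad_logins(total_logins: int, target_percent: int = 95) -> int:
    """Largest number of bad logins that still meets the objective.

    Integer arithmetic is used so the result is exact.
    """
    return total_logins * (100 - target_percent) // 100


def incident_impact(
    total_logins: int,
    bad_before: int,
    bad_during: int,
    target_percent: int = 95,
) -> dict:
    """Summarise where the SLO stands once an incident's bad logins are added."""
    allowed = allowed_bad_logins(total_logins, target_percent)
    bad_total = bad_before + bad_during
    return {
        "good_percent": 100 * (total_logins - bad_total) / total_logins,
        "allowed_bad": allowed,
        "remaining_bad": allowed - bad_total,  # negative means the SLO is missed
        "slo_met": bad_total <= allowed,
    }


if __name__ == "__main__":
    # Made-up window: 10,000 logins, 120 bad before the incident, 300 during it.
    print(incident_impact(total_logins=10_000, bad_before=120, bad_during=300))
```

With these invented figures the objective still holds, but only 80 more bad logins can occur in the window before it is missed. A number like that is usually easier to discuss with stakeholders than a raw outage duration.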
- Align perspectives: bringing the business and IT closer together.
- Embrace a customer-centric approach: reflecting how effective a service is in the eyes of customers (users).
- Focus on delivering the best business outcomes: with the correct measurements in place, teams have a clearer understanding of what a service needs to achieve from a business perspective.