03 June 2022

Metrics that matter for IT organizations on an agile journey

Measuring the wrong things is worse than not measuring anything. In this article, Mallika Gunturu explains the right things to measure for agile.

Key Takeaways

Having a set of good metrics is essential for the success of agile teams and IT organizations.
Metrics are not numbers to beat. Teams truly embrace a metric only when they understand the value that monitoring it delivers.
A balance of six dimensions (Business delivery excellence, Operational excellence, Technical excellence, Innovation, Happiness, and Financial excellence) is essential for the success of agile teams and IT organizations.

As IT organizations move towards an agile way of functioning, it is natural that teams go through a steep learning curve to understand and adapt to agile principles. It is also natural at this stage that teams lose focus on the “things that matter” to the top and bottom lines of the organization, as their attention is on adjusting to and settling into the new way of working. It is at this juncture that a set of “good metrics” helps teams navigate in the right direction. Standard agile tools such as JIRA, Git, and Jenkins provide a standard set of metrics. Teams should treat these as a starting point and go on to define further metrics that make sense for their team as well as the organization they are part of. Rather than looking at metrics as a number to beat or something imposed by management, teams need to understand the organization’s objectives and how metrics can help them improve continuously while meeting those objectives. This article looks at the dimensions on which IT organizations should typically assess and fine-tune their team performance and processes, and at how individual teams should tailor their metrics to contribute to the overall goals of the IT organization.

Factors that are critical for the success of organizations of any nature or size are productivity, agility, predictability of the outcomes, quality of the work, and the happiness of the teams.

An IT organization should focus on the following dimensions so that it can support the business, mission, and vision of the organization.

Business delivery excellence
Operational excellence
Technical excellence
Innovation
Happiness of the teams
Financial excellence

While the CEO, CTO, and CIO establish objectives and key results (OKRs) around these dimensions, the various agile teams should establish OKRs at their team level that are aligned with and contribute to the organization-level OKRs. Let us look further at what should be measured (metrics) at both the IT organization level and the individual team level. At the outset, it is important to note that these metrics should not be used to compare one member against another or one team against another. They should rather be looked at holistically and in the right spirit to identify areas of improvement, ultimately leading to truly agile teams.

1. Business delivery excellence

In simple terms, business delivery excellence objectively looks at how well the IT organization’s deliveries and outcomes are aligned to the business strategy. To successfully measure and improve business delivery excellence, IT organizations should have the means to measure alignment, agility, and predictability of the organization and teams within.

The following metrics provide good insight into business delivery excellence:

1.1 Alignment of IT team efforts to strategic business initiatives

The purpose of this metric is to provide visibility into:

The alignment of IT projects with strategic business initiatives
Whether enough resources and funds are being invested to support business initiatives
Whether there are any projects/efforts that should be stopped or slowed down

At a team level, the following metrics should be used to report towards this organization metric.

Percentage of business requests accepted and committed for delivery by the team

This metric helps establish a dialogue between management and teams on identifying and addressing the root causes that limit the team to committing to only a subset of business demands, or on understanding what the team is sacrificing in order to commit to them.

Team effort spread on strategic initiatives vs. technical upgrades vs. support requests
Team effort spread on the various strategic initiatives

The above two metrics help show whether sufficient resources are assigned to business initiatives and whether there are specific areas from which the team should divert its focus towards strategic business initiatives. They also give visibility into whether the team is spending too much of its effort on routine upgrades or business-as-usual support requests, which can then feed into identifying candidates for automation that reduce the need for such maintenance effort.
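
As a rough illustration, the effort spread could be computed from work items tagged with a category and an effort estimate. The category names, the hours, and the `effort_spread` helper below are hypothetical and not something a particular tool provides out of the box.

```python
from collections import defaultdict

def effort_spread(work_items):
    """Return each category's share of total effort, as a percentage."""
    totals = defaultdict(float)
    for category, hours in work_items:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: round(100 * hrs / grand_total, 1) for cat, hrs in totals.items()}

# Hypothetical sprint data: (category, effort in hours)
items = [
    ("strategic", 120),
    ("technical_upgrade", 40),
    ("support", 40),
]
print(effort_spread(items))
# {'strategic': 60.0, 'technical_upgrade': 20.0, 'support': 20.0}
```

The same helper can be reused for the spread across individual strategic initiatives by tagging items with the initiative name instead of the category.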

1.2 Agility of IT organization in catering to business needs

The purpose of this metric is to provide visibility into how quickly the IT organization can adapt to the changing business needs and is also an indicator of how fungible IT teams are.

At a team level, the following metrics should be used to report towards this organization metric.

Lead and cycle time to deliver new features to existing products or services
Lead and cycle time to deliver new products or services

These metrics give important insights into how fast a request is getting picked up by the team and how long it takes for the team to complete a request and deliver value after starting to work on the request.

A long lead time indicates either a long cycle time or a long wait before the request is picked up by the team. These metrics help establish a dialogue on identifying and addressing the root causes of long lead or cycle times.
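
The relationship between the two measures can be sketched as follows, assuming three timestamps are recorded per request: when it was raised, when work started, and when it was delivered. The function name and the dates are illustrative only; real tools may define the start and end events differently.

```python
from datetime import datetime

def lead_and_cycle_time(requested, started, delivered):
    """Return (lead_time, cycle_time, wait_time) as timedeltas."""
    lead = delivered - requested   # request raised -> value delivered
    cycle = delivered - started    # work started -> value delivered
    wait = started - requested     # time spent waiting to be picked up
    return lead, cycle, wait

# Hypothetical request
lead, cycle, wait = lead_and_cycle_time(
    requested=datetime(2022, 5, 2),
    started=datetime(2022, 5, 16),
    delivered=datetime(2022, 5, 20),
)
print(lead.days, cycle.days, wait.days)  # 18 4 14
```

In this made-up case most of the 18-day lead time is waiting time rather than cycle time, which is the kind of pattern the dialogue with management would start from.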

1.3 Predictability of IT deliveries

The Agile Alliance lists “Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.” as one of the principles behind the Agile Manifesto. A regular cadence of working software delivery is essential to build trust with customers. This metric helps management and teams measure and assess the predictability of releases, establish how confident they can be of the cadence, and decide what actions are needed to fix any issues.

Team velocity variation measures how team velocity varies over a period of time. Agile teams generally reach a near-constant velocity after a few sprint cycles. Any large deviation in velocity indicates a high chance of not meeting commitments. The cause of such a deviation need not always be team inefficiency. It could also occur when the team decides to implement pair programming or is learning a new technology, both valid reasons that would ultimately improve team performance. Having this metric in place helps the team and management align on the reasons why predictability could be impacted and communicate this appropriately to customers.
Planned to Done ratio measures the work that the team completes by the end of the sprint compared to the work the team committed to at the start. While there could be many valid reasons for work to spill over into the next sprints, a view of this ratio over a number of sprints would give a general idea of the estimation accuracy of the team and the predictability of their commitments. A review of this metric provides the team an opportunity to fine-tune the estimation rules they follow.
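
One possible way to quantify both metrics is sketched below. Using the coefficient of variation (standard deviation divided by mean) for velocity variation is an assumption made here for illustration; the article does not prescribe a formula, and the sprint figures are made up.

```python
from statistics import mean, pstdev

def velocity_variation(velocities):
    """Coefficient of variation of sprint velocities (0 means perfectly steady)."""
    avg = mean(velocities)
    return pstdev(velocities) / avg if avg else 0.0

def planned_to_done(planned_points, done_points):
    """Share of committed work completed by the end of the sprint."""
    return done_points / planned_points if planned_points else 0.0

# Hypothetical story points completed in the last four sprints
velocities = [30, 32, 28, 30]
print(round(velocity_variation(velocities), 3))            # 0.047
print(planned_to_done(planned_points=40, done_points=34))  # 0.85
```

Both values are more useful as a trend over several sprints than as a single reading.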

2. Operational excellence

The Institute for Operational Excellence defines operational excellence as a point in the organization’s continuous improvement journey where each and every employee can see the flow of value to the customer, and fix that flow before it breaks down.

In simple terms, operational excellence objectively looks at how well the services are operating and how well the customers perceive the value delivered by those services.

The following metrics provide good insight into operational excellence:

2.1 Service Availability

Availability is generally expressed as the percentage of time the service is operational within the reporting period. In organizations where the service offerings are segmented (core/non-core, critical/non-critical, etc.), this metric can be reported per service segment.

At a team level, the following metrics should be used to report towards this organization metric.

Frequency of service downtime, which records when and how many times the service went down during the reporting period
Frequency of scheduled vs. unscheduled deployments

The above two metrics together help answer whether the service downtime was expected (due to scheduled deployments, for instance) or if there is an underlying cause that needs to be looked at as it is causing the downtime with no clear, immediate explanation.

Service uptime measures the amount of time that the service was up and running during the reporting period.
Service downtime measures the amount of time that the service was down during the reporting period.

Availability SLA compliance can be computed using the service uptime, downtime metrics, and customer committed SLAs. This is an important metric as it provides a clear indication of whether the customer is able to access and use the service as agreed upon contractually.
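
A minimal sketch of this computation is shown below. It assumes uptime and downtime are tracked in minutes and that the SLA is a single availability percentage; whether scheduled downtime counts against the SLA depends on the contract and is not modeled here. The function names and figures are illustrative.

```python
def availability_pct(uptime_minutes, downtime_minutes):
    """Availability as a percentage of the reporting period."""
    total = uptime_minutes + downtime_minutes
    return 100 * uptime_minutes / total if total else 0.0

def meets_sla(uptime_minutes, downtime_minutes, sla_target_pct):
    """True when measured availability is at or above the committed target."""
    return availability_pct(uptime_minutes, downtime_minutes) >= sla_target_pct

# A 30-day period has 43,200 minutes; assume 45 minutes of downtime
up, down = 43_155, 45
print(round(availability_pct(up, down), 3))      # 99.896
print(meets_sla(up, down, sla_target_pct=99.9))  # False
```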

Service response time – simulating customer interactions with the service or the website and measuring the response time. This metric is typically captured for critical services. This metric gives a good indication of customer experience with the service and areas that would need to be addressed to further enhance this experience.

2.2 Service Reliability

Reliability measures the probability that a service maintains its performance standards over a specific period of time. The following are the most common reliability metrics to monitor.

Mean Time Between Failures (MTBF) measures the average time between consecutive application or service failures. A failure could be a major incident like a total network outage or a small incident lasting a few seconds. The nature of the industry and business, along with customer service level agreements (SLAs), determines the tolerance for the frequency and duration of service failures. A low MTBF could point to problems with code quality, hardware, or other infrastructure. Regular review of this metric provides an opportunity to identify and address these problems.
Mean Time to Recovery (MTTR) measures the average time needed to detect, troubleshoot, fix and return the service to a healthy state. It indicates the speed at which a service downtime is resolved. The review of this metric provides an opportunity to discuss and understand how the different processes leading up to the downtime resolution are working – Is the alerting mechanism working optimally? Is the team taking a long time to troubleshoot and fix the problem, and if so, why?
Mean Time to Resolve is the average time needed to return the service to a healthy state and ensure that the underlying cause is fully resolved so that the issue will not recur. As this metric aims at improving the long-term performance and reliability of the service, it is an important metric that can be strongly correlated with customer satisfaction.
SLA compliance ratio measures the percentage of issues resolved within all the SLA parameters. While a high SLA compliance ratio is important, it does not necessarily mean that the IT services are operating at optimal levels. In this context, SLOs (Service Level Objectives), which are typically set more stringently than the SLA, and SLIs (Service Level Indicators) should also be looked at. An SLO is a target value or range of values for a service level that is measured by an SLI. SLIs are measured internally to determine whether the SLO, and hence the SLA, is being met. This enables the team to identify when they are nearing an SLA breach and to react quickly to avoid it.
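
The two averages can be sketched as below, using the common conventions that MTBF is total operating time divided by the number of failures and MTTR is the mean time to restore service per incident. The incident durations and the 720-hour reporting period are hypothetical.

```python
def mtbf_hours(total_uptime_hours, failure_count):
    """Mean time between failures: operating time divided by number of failures."""
    return total_uptime_hours / failure_count if failure_count else float("inf")

def mttr_hours(recovery_durations_hours):
    """Mean time to recovery: average duration from failure to healthy service."""
    if not recovery_durations_hours:
        return 0.0
    return sum(recovery_durations_hours) / len(recovery_durations_hours)

# Hypothetical month (720 hours) with three incidents, durations in hours
incidents = [0.5, 1.5, 1.0]
uptime = 720 - sum(incidents)
print(mtbf_hours(uptime, len(incidents)))  # 239.0
print(mttr_hours(incidents))               # 1.0
```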

2.3 Customer service and satisfaction

Average first response time indicates the time it takes for the support team to reply to a customer request.
Average issue resolution time indicates the time it takes to solve an issue reported by the customer.

A shorter duration for the above two metrics provides a better customer experience and thus impacts customer satisfaction positively.

Net Promoter Score (NPS) measures how likely a customer is to recommend the product and services of your organization to others. This is an important metric as it gauges how strong customer engagement is and can have a direct impact on sales and revenue. It is worthwhile to note that Customer Service is not the only factor impacting the NPS. Other key factors that can have a direct impact on NPS are the quality of the product, price, and ease of use.
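
For reference, NPS is conventionally computed from answers on a 0 to 10 scale as the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 to 6). The sketch below follows that convention; the survey responses are made up.

```python
def net_promoter_score(ratings):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(ratings))  # 20.0
```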

3. Technical excellence

This dimension should focus on measuring factors that impact technical excellence, namely quality of delivery, agility in the team, and team culture to adapt to industry technology evolution.

3.1 Quality of delivery

Escaped defects is the number of defects that have entered the live environment in a particular release. In an ideal situation, these defects should have been caught in test cycles and addressed. Hence, this metric is an indicator of the release quality and the rigor of the quality control processes followed by the teams.
Code coverage provides a view of the extent to which the code base is covered by automated tests. Note that 100% code coverage does not guarantee zero defects; however, the metric is a good indicator of how much code remains untested.
Functional test coverage provides a view of which features in the release meet the acceptance criteria.
Sprint Goal success indicates how often the team meets customer requirements and is also an indicator of the team’s agile maturity.

3.2 Agility in delivery and team culture

Lead and cycle time to deliver service requests and other business as usual activities (BAU) – The earlier sections of this article discussed a similar metric related to business deliveries. A typical agile team has to also focus on BAU requests, and this metric gives insights into how fast a request is getting picked up by the team and how long it takes for the team to complete a request and deliver value after starting to work on the request. This metric, when juxtaposed with the related metric for business delivery, gives insights into whether the team prefers business requests over BAU or vice-versa.
Average time builds stay “Red” is an indicator that the team is probably unable to focus on a work item until its completion. This could be because the assignee is multi-tasking, because fixing the build is not being given priority, or because an unmet dependency prevents the build from succeeding. In any of these cases, this metric provides insights into patterns and behaviors involving team dynamics.
Technical debt is a tricky thing to measure and track. However, it can severely impact the team’s ability to maintain and enhance software easily. Often, teams end up focusing on delivering business requests and are unable to set aside dedicated time to refactor or re-architect the hot-spots. One common approach is to combine refactoring with the changes that a portion of code needs anyway to deliver business demands. A metric that monitors how technical debt evolves from release to release gives a good indication of the team mindset: whether they are focused on delivering the best software or are content with delivering just the business needs.
Duration in which a new member of the team becomes productive is another tricky thing to measure and track. However, it is an important element that impacts overall team productivity and gives insights into the effectiveness of the on-boarding process, the team culture, and the team dynamics.
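
As one illustration of the metrics in this section, the average time builds stay “Red” could be derived from a chronological list of build results. The sketch below assumes each entry is a (timestamp, status) pair with status "red" or "green"; this data shape is hypothetical and not the export format of any particular CI tool.

```python
from datetime import datetime, timedelta

def average_red_duration(builds):
    """Average length of the spans between a build turning red and the next green."""
    red_since = None
    durations = []
    for timestamp, status in builds:
        if status == "red" and red_since is None:
            red_since = timestamp                      # build just broke
        elif status == "green" and red_since is not None:
            durations.append(timestamp - red_since)    # build fixed
            red_since = None
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)

# Hypothetical build history for one day
day = datetime(2022, 6, 1)
builds = [
    (day.replace(hour=9), "green"),
    (day.replace(hour=10), "red"),
    (day.replace(hour=10, minute=30), "red"),  # still part of the same red span
    (day.replace(hour=12), "green"),           # red for 2 hours
    (day.replace(hour=14), "red"),
    (day.replace(hour=15), "green"),           # red for 1 hour
]
print(average_red_duration(builds))  # 1:30:00
```

A build that is still red at the end of the list is ignored in this sketch, since its duration is not yet known.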

4. Innovation

This dimension should focus on measuring factors that impact innovation advancements in the organization.

Innovative Culture gauges how conducive the environment within the organization is to promoting innovation at all levels and how it is evolving over time. Some of the important metrics covering this aspect are:
- Employee perception of organization’s innovativeness measured by periodic surveys across all departments of the organization
- Number of employees who have undergone training related to innovation covering innovation frameworks, internal processes for managing innovation
- Number of ideas converted into projects having an executive sponsorship
- Number of ideas proposed by staff vs. executives
- Effort spent by the teams on innovative ideas vs. other initiatives

Innovation pipeline

The focus here is to measure the efficiency of the innovation pipeline and ideation process. Some of the important metrics covering this aspect are:

- Ratio of new ideas proposed to those implemented
- Average time an idea stays in the different stages of the idea life cycle
- How soon ideas that don’t get implemented are killed
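
A simple sketch of the first two pipeline metrics follows. It expresses the proposed-to-implemented ratio as a conversion rate (implemented divided by proposed) and assumes each idea records the days it spent in every stage it reached; the stage names and numbers are hypothetical.

```python
def idea_conversion_rate(proposed, implemented):
    """Fraction of proposed ideas that were implemented."""
    return implemented / proposed if proposed else 0.0

def average_days_per_stage(ideas):
    """ideas: list of dicts mapping stage name -> days spent in that stage."""
    totals, counts = {}, {}
    for idea in ideas:
        for stage, days in idea.items():
            totals[stage] = totals.get(stage, 0) + days
            counts[stage] = counts.get(stage, 0) + 1
    return {stage: totals[stage] / counts[stage] for stage in totals}

# Hypothetical idea life cycles
ideas = [
    {"submitted": 5, "evaluation": 20, "prototype": 30},
    {"submitted": 3, "evaluation": 10},  # dropped after evaluation
]
print(idea_conversion_rate(proposed=40, implemented=6))  # 0.15
print(average_days_per_stage(ideas))
# {'submitted': 4.0, 'evaluation': 15.0, 'prototype': 30.0}
```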

Innovation outcomes

The focus here is to measure the effectiveness of the innovation process. Some of the important metrics covering this aspect are:

- Time to market, measuring how long it takes for an idea to get implemented and released to customers
- Innovation spending converted to new product sales, measuring how much revenue each dollar spent on innovation is bringing in
- Customer perception of the organization’s innovativeness, measured by periodic customer surveys and other methods

5. People growth and happiness

This dimension should focus on gauging how “happy” the teams are. At the most foundational level, a team’s success or failure is highly dependent on its team members.

Employee engagement: The employee satisfaction survey is a popular means that organizations have long used to gauge employee engagement. Net Promoter Score is another popular metric. However, rather than asking a single question such as “How likely are you to recommend others to work at our organization?” or “How happy are you with the organization?”, which evokes a very subjective response, it is prudent to ask a set of specific questions so that specific areas of improvement can be identified from employee feedback.
Employee churn is a measure of how many people leave the organization in any given period. This is a strong indicator of how happy employees are, as happy employees are more likely to stay in the organization.
Employee upskilling and re-skilling is a measure of how the staff of the organization are preparing for the future, which is a key factor for building a resilient and adaptable workforce.
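
Employee churn is often reported as a rate rather than a raw count. The sketch below uses one common convention, leavers divided by the average headcount over the period; this formula is an assumption for illustration, and the figures are made up.

```python
def churn_rate_pct(leavers, headcount_start, headcount_end):
    """Leavers in the period as a percentage of average headcount."""
    average_headcount = (headcount_start + headcount_end) / 2
    return 100 * leavers / average_headcount if average_headcount else 0.0

# Hypothetical quarter
print(churn_rate_pct(leavers=6, headcount_start=118, headcount_end=122))  # 5.0
```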

6. Financial Excellence

IT Financial Excellence aims to optimize the IT spend while maximizing the value delivered by IT resources.

The following metrics provide good insight into financial excellence. They help put key metrics from the other dimensions (business delivery, operational, and technical) in the context of IT resources and investment.

Budget spent on strategic initiatives vs. support requests vs. innovation provides a view of how much is being spent on growing, running, and transforming the business and whether this aligns with the organization’s objectives.
Total cost of ownership per service enables IT teams and managers to know the true cost of the service (human resources, infrastructure, and other IT resources, including CapEx and OpEx costs) and to make better decisions on IT spend, considering the service criticality and current spend.
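
As a rough sketch, an annualized total cost of ownership for one service could combine amortized CapEx, annual OpEx, and staff cost. Spreading CapEx evenly over a useful life is a simplifying assumption made here, and all parameter names and amounts are hypothetical.

```python
def annual_tco(capex, useful_life_years, annual_opex, annual_staff_cost):
    """Annualized total cost of ownership for one service."""
    amortized_capex = capex / useful_life_years  # straight-line spread of CapEx
    return amortized_capex + annual_opex + annual_staff_cost

# Hypothetical service costs
print(annual_tco(capex=150_000, useful_life_years=5,
                 annual_opex=40_000, annual_staff_cost=90_000))  # 160000.0
```

Comparing this figure across services, alongside their criticality, is what supports the spending decisions described above.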

Conclusion

There are many metrics that could be measured. Ensuring a balanced focus on the six dimensions of Business delivery excellence, Operational excellence, Technical excellence, Innovation, Happiness, and Financial excellence and understanding the value each of the metrics brings in are essential for the success of agile teams.