Impact Minutes

When I started working in operations departments, we were always concerned about downtime. In a simpler world, often inside a single organization, this meant whether a particular machine was working or not. We did have networking issues at times, but we were usually measured by how often a server was unreachable from clients.

These days, with many machines often backing a single application, what counts as downtime can be debated, but we usually have particular places from which we can test whether an application is down. Some services, like Slack, test multiple parts of the application separately, which I like. Ultimately, though, any of these checks reports either a simple status (up/down) or a more complex one (up, down, degraded, maintenance, etc.).

I recently listened to a DevOps talk from an operations group about how they prioritize and triage work. Sometimes the amount of work during an incident overwhelms the available staff, so they need to decide what to work on first and who should work on what.

This group used a concept of impact, which was essentially the product of two values: downtime and blast radius. Blast radius was the number of people affected, though sometimes this was weighted; an outage affecting someone in finance or sales might count for more than one affecting an average employee. They would calculate the impact of each problem and decide where to focus their time.
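As a worked sketch of that calculation, assume downtime is measured in minutes (as the title suggests), let n_g be the number of people affected in group g, and let w_g be a weight for that group. The weights and numbers here are hypothetical, chosen only for illustration:

```latex
\[
  \text{impact} = t_{\text{down}} \times \sum_{g} w_g \, n_g
\]
\[
  30 \times (2 \cdot 5 + 1 \cdot 20) = 30 \times 30 = 900
\]
```

The second line is a made-up example: a 30-minute outage affecting 5 sales staff (weight 2) and 20 other employees (weight 1) scores 900 impact minutes.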

If one part of a website with, say, four parts was down, the impact could be lower than if the database were down. However, if a database is down but only 10 people are affected, this could be less important than a network issue affecting 100 people.
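A minimal Python sketch of this kind of triage follows. It assumes impact is downtime in minutes multiplied by a weighted head count; the `Incident` class, the `WEIGHTS` table, the 60-minute outages, and placing the 10 affected people in finance are all hypothetical choices for illustration, not the method used by the group in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    name: str
    downtime_minutes: float
    # Hypothetical mapping of group name -> number of people affected.
    affected: dict = field(default_factory=dict)


# Hypothetical weights: finance and sales users count double.
# Any group not listed here gets a weight of 1.0.
WEIGHTS = {"finance": 2.0, "sales": 2.0}


def blast_radius(incident: Incident) -> float:
    """Weighted count of the people affected by an incident."""
    return sum(WEIGHTS.get(group, 1.0) * count
               for group, count in incident.affected.items())


def impact_minutes(incident: Incident) -> float:
    """Impact = downtime (minutes) x weighted blast radius."""
    return incident.downtime_minutes * blast_radius(incident)


def triage(incidents: list) -> list:
    """Return incidents ordered from highest to lowest impact."""
    return sorted(incidents, key=impact_minutes, reverse=True)


if __name__ == "__main__":
    incidents = [
        Incident("database down", 60, {"finance": 10}),
        Incident("network issue", 60, {"staff": 100}),
    ]
    for inc in triage(incidents):
        print(f"{inc.name}: {impact_minutes(inc):.0f} impact minutes")
```

Even with the finance weighting, the database outage scores 60 × 20 = 1,200 impact minutes, while the network issue scores 60 × 100 = 6,000, so the network issue is worked first. Changing the weights or downtime estimates changes the ordering, which is the point of having the calculation handy.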

I think I've often intuited the number of people affected, but I've rarely thought about it directly. To me, this is a good calculation to have handy, along with an awareness of how heavily the various systems are being used. While most of us aren't supporting something as widely used as Slack, we often support both big and small systems, and having a way to rank their relative importance is handy in a crisis.
