January 21, 2026 at 12:00 am
Comments posted to this topic are about the item Learning From Breakage
January 21, 2026 at 9:27 am
I broke a reporting system at month-end. I had 18 hours of intense learning that day.
With a production breakage, you can't step away from it, especially if you are a senior in a department. You are accountable, even if you weren't responsible.
No one wants to be the cause of a breakage, and it is painful to own up. As a principal engineer, I try to make an environment where juniors feel safe to own up and be honest about their involvement so we can work together. It is a nightmare trying to solve a puzzle where half the pieces are missing.
A breakage doesn't just teach us about a system. It teaches us the value of a methodical diagnosis process.
I anonymise who broke it in any written document. My only interest in "who" is who can provide the most useful detail. I'm not a fan of blame games.
At my company, we put together a post-mortem document for any incident, whether we caused the breakage or the cause was an external factor. This will contain the timeline of the actions taken to diagnose the problem, fix it and confirm that the fix is satisfactory.
It will also contain a list of actions to reduce the risk of it happening again.
We go through any post-mortem as part of team retrospectives, so we all learn from it. It is painful but necessary.
January 21, 2026 at 1:49 pm
David.Poole wrote: We go through any post-mortem as part of team retrospectives, so we all learn from it. It is painful but necessary.
I'm a big fan of post-mortems, especially when a written post-mortem is reviewed by a senior engineer who was not involved in the incident. Sometimes people aren't quite sure exactly why the issue occurred and end up convincing themselves of some narrative.
Also, the worst schema I've come across so far... ticket numbers from the company's ticketing system. Oh God, why?!
January 25, 2026 at 1:23 am
David.Poole wrote: We go through any post-mortem as part of team retrospectives, so we all learn from it. It is painful but necessary.
I'm a big fan of post-mortems, especially when a written post-mortem is reviewed by a senior engineer who was not involved in the incident. Sometimes people aren't quite sure exactly why the issue occurred and end up convincing themselves of some narrative.
Also, the worst schema I've come across so far... ticket numbers from the company's ticketing system. Oh God, why?!
In my previous job we occasionally did post-mortems. We didn't tend to have major catastrophes.
In my current job, I suspect that post-mortems are done, but only upper management are involved. We don't tend to have major catastrophes, either. In my 10 years there, there's only been one that I can recall. An upper-level manager asked me some questions about the incident, then took that information with her to the post-mortem. Anyway, I'd like to be in one, even if it involves me, just so I can learn from it.
I laughed out loud when I read your comment about ticket numbers from the company's ticketing system!
Rod