The TSB disaster: Where were the grown-ups?

Phil Factor

SSC-Insane

Points: 20254
August 11, 2018 at 1:01 am

#360953

Comments posted to this topic are about the item The TSB disaster: Where were the grown-ups?
Best wishes,
Phil Factor

RonKyle SSC-Dedicated Points: 31575 · Answer 1

However, in banking processes, I'm a conservative who grips the department's computer manual with whited knuckles. The consequences of error are so dire, and the complexity is so great.

Well said. There are places to be if you want to be on the bleeding edge of tech advances. Banks are not one of them. (Except maybe when it comes to security, but I don't know enough to have a good opinion on this).

Eirikur Eiriksson SSC Guru Points: 182951 · Answer 2

TSB was, like many others, a disaster waiting to happen. Even when voicing concerns with similar institutions, no one would listen, let alone do anything. This is not about technology, it is all about beggared belief and ivory towers.
😎

I have walked away from many contracts where the alarm bells rang loudly, pretty much a proper barge pole exercise, never to be proven wrong in doing so!

andycadley SSCertifiable Points: 5296 · Answer 3

I think the problem is that the Facebook mantra of "move fast and break things" is fine when the effect of some users suffering bugs is that they don't get to see their cat videos, but applied to systems where even little bugs matter a lot it becomes a recipe for disaster. The suggestion of limited checks on production readiness and issues with active/active data centres probably hints at systems skewing over time and fixes/updates possibly not being fully deployed. Sometimes you just need carefully timed and staged deployments, with extensively tested rollback strategies and comprehensive testing that has verified the deployment process as much as is possible, however much the Agile development crowd would like to think you don't.
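To make the idea of a staged deployment with a tested rollback concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the stage names, and the `deploy`, `healthy` and `rollback` callables, stand in for whatever a real release process would use. It says nothing about how TSB actually deployed.

```python
from typing import Callable, List, Optional, Sequence


def staged_rollout(
    stages: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Optional[str]:
    """Deploy one stage at a time.

    Returns None if every stage passes its health check. Otherwise rolls
    back every stage deployed so far, newest first, and returns the name
    of the stage that failed.
    """
    completed: List[str] = []
    for stage in stages:
        deploy(stage)
        completed.append(stage)
        if not healthy(stage):
            # Undo in reverse order so later stages are removed first.
            for done in reversed(completed):
                rollback(done)
            return stage
    return None


def demo() -> List[str]:
    """Run a rollout whose second (hypothetical) stage fails its check."""
    log: List[str] = []
    staged_rollout(
        ["canary", "region-a", "region-b"],
        deploy=lambda s: log.append(f"deploy {s}"),
        healthy=lambda s: s != "region-a",
        rollback=lambda s: log.append(f"rollback {s}"),
    )
    return log
```

In the demo, "region-b" is never touched, because the failure in "region-a" stops the rollout and unwinds the two stages already deployed. The point of exercising the rollback path in a test like this is that it gets run before it is needed in anger.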

Beatrix Kiddo SSC-Dedicated Points: 32407 · Answer 4

I don't know anybody who works for them, but I bet their DBAs and sysadmins had been flagging issues for months before this happened. We've all been there and been ignored.

Tom Gillies SSCrazy Points: 2761 · Answer 5

By complete coincidence I was "on the outside" dealing with an issue relating to TSB on Wednesday and was very suspicious about the data. On Friday Phil's editorial turned up. By Sunday I had established that the problem was caused by a third party - not me, not the bank. The doubt in everyone's minds made the situation harder to deal with. Lesson for me: odd and unpleasant coincidences do happen.

I think good may come of this mess. I have already saved Phil Factor's editorial and may use it in the future.

The best practitioners of Agile I have known were absolute demons for repeated automated testing. It was a development project. _Everything_ was tested at several different levels. They were also rigorous about being able to roll-back. They had very few regression failures. That was development of new function.

Something Phil does not emphasise in his article (you can only cover so much in limited space) is the "data migration" aspect of this. Not only does "function" have to work, but the data has to be right too. When you have existing data, then migration can be a substantial project in its own right, and it needs to be tested too. You need to be able to _prove_ it has worked properly.
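One way to approach "proving" a migration is to compare the source and target mechanically rather than by eye. The sketch below is only an illustration under stated assumptions: it uses two in-memory SQLite databases and a made-up `accounts` table, and the helper names `table_fingerprint` and `verify_migration` are hypothetical. A real migration would also need per-column reconciliation, totals and business-rule checks.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256_hex) for a table, reading rows in key order.

    The table and column names are interpolated into the SQL, so this is
    only safe with trusted, hard-coded names, as in this sketch.
    """
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}"):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def verify_migration(source, target, table, key_column):
    """Compare row counts and content hashes between source and target."""
    src_count, src_hash = table_fingerprint(source, table, key_column)
    tgt_count, tgt_hash = table_fingerprint(target, table, key_column)
    return {
        "row_counts_match": src_count == tgt_count,
        "contents_match": src_hash == tgt_hash,
    }


def demo():
    """Build a source and a deliberately faulty target, then compare them."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    ddl = "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance_pence INTEGER)"
    source.execute(ddl)
    target.execute(ddl)
    source.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "Alice", 10000), (2, "Bob", 2550)],
    )
    # Simulated migration defect: one balance differs in the target.
    target.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "Alice", 10000), (2, "Bob", 2500)],
    )
    return verify_migration(source, target, "accounts", "id")
```

The demo shows why a row count alone is not proof: both tables hold two rows, yet the content hashes differ because one balance was changed on the way across.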

There are times when not being able to roll something back may be acceptable. I've done it myself and I've heard it referred to as "a success oriented strategy" (even at the time the description was intended to be ironic). If you cannot roll back then you have to accept the consequences of failure. In TSB's case that really shouldn't have been accepted.

As Phil says "it's horses for courses" or should be anyway.

Tom Gillies LinkedIn Profile www.DuhallowGreyGeek.com

Thomas Rushton SSC-Insane Points: 22649 · Answer 6

Tom Gillies - Monday, August 13, 2018 5:57 AM
There are times when not being able to roll something back may be acceptable. I've done it myself and I've heard it referred to as "a success oriented strategy" (even at the time the description was intended to be ironic). If you cannot roll back then you have to accept the consequences of failure. In TSB's case that really shouldn't have been accepted.

"Success Oriented Strategy". Or, abbreviated, "SOS". Riiiiight... 😉

Thomas Rushton
blog: https://thelonedba.wordpress.com

Robert Sterbal SSChampion Points: 11071 · Answer 7

Does anyone know where this ranks with other IT rollout disasters?

412-977-3526 call/text

Jeff Moden SSC Guru Points: 1004683 · Answer 8

Another proof that if you want it real bad, that's the way you'll get it. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Eirikur Eiriksson SSC Guru Points: 182951 · Answer 9

robert.sterbal 56890 - Monday, August 13, 2018 8:39 AM
Does anyone know where this ranks with other IT rollout disasters?

Qualitatively, about the same as any UK Government IT rollout (fallout?) in the last two decades; it doesn't get "better" than that!
😎

On the quantity side, Microsoft is the winner with several (heart) breaking updates. They are slowly getting better at it, but there is still some room for improvement 😀

Eirikur Eiriksson SSC Guru Points: 182951 · Answer 10

Jeff Moden - Monday, August 13, 2018 8:42 AM
Another proof that if you want it real bad, that's the way you'll get it. 😀

The only relation to "proof" I can see is that only after a few sips of a "100 proof" could one ever accept such results
😎

Matt Miller (4) SSC Guru Points: 124210 · Answer 11

Tom Gillies - Monday, August 13, 2018 5:57 AM
By complete coincidence I was "on the outside" dealing with an issue relating to TSB on Wednesday and was very suspicious about the data. On Friday Phil's editorial turned up. By Sunday I had established that the problem was caused by a third party - not me, not the bank. The doubt in everyone's minds made the situation harder to deal with. Lesson for me: odd and unpleasant coincidences do happen.
I think good may come of this mess. I have already saved Phil Factor's editorial and may use it in the future.
The best practitioners of Agile I have known were absolute demons for repeated automated testing. It was a development project. _Everything_ was tested at several different levels. They were also rigorous about being able to roll-back. They had very few regression failures. That was development of new function.
Something Phil does not emphasise in his article (you can only cover so much in limited space) is the "data migration" aspect of this. Not only does "function" have to work, but the data has to be right too. When you have existing data, then migration can be a substantial project in its own right, and it needs to be tested too. You need to be able to _prove_ it has worked properly.
There are times when not being able to roll something back may be acceptable. I've done it myself and I've heard it referred to as "a success oriented strategy" (even at the time the description was intended to be ironic). If you cannot roll back then you have to accept the consequences of failure. In TSB's case that really shouldn't have been accepted.
As Phil says "it's horses for courses" or should be anyway.

What's really sad is that we (society) know how to adequately plan for failures, but we continue to pretend that IT-related projects don't need to go through the engineering rigors "actual" engineers go through before rollout. Simply going through FMECA (failure mode, effects and criticality analysis), and building out the statistical predictors prevalent in some of the industries where "failure = death" still rings true, would have given them the tools to know when and where to roll back, the costs of failing forward, and time to plan how to react when things don't pan out exactly right.

This on the other hand was a total failure to plan, and Murphy decided to pay a visit and stay for a while.
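As a rough illustration of the kind of failure-mode ranking mentioned above, the sketch below uses the simplified risk priority number common in FMEA worksheets (severity × occurrence × detection, each scored 1 to 10). This is not a full FMECA criticality calculation. The failure modes, their scores and the threshold of 100 are all invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (always caught early) .. 10 (unlikely to be caught)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity * occurrence * detection."""
        return self.severity * self.occurrence * self.detection


def rank(modes: Iterable[FailureMode]) -> List[FailureMode]:
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def needs_rollback_plan(mode: FailureMode, threshold: int = 100) -> bool:
    """Flag modes whose RPN meets an (arbitrary, illustrative) threshold."""
    return mode.rpn >= threshold


# Hypothetical failure modes with made-up scores, for illustration only.
EXAMPLE_MODES = [
    FailureMode("login service overloaded", severity=6, occurrence=5, detection=2),
    FailureMode("account data shown to wrong customer", severity=10, occurrence=3, detection=6),
    FailureMode("statement formatting wrong", severity=2, occurrence=4, detection=3),
]
```

Calling `rank(EXAMPLE_MODES)` puts the wrong-customer case first, and `needs_rollback_plan` flags it alone at the default threshold. The scores matter less than the exercise: writing them down forces a decision, before go-live, about which failures must have a rehearsed way back.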

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user... All right - what was my emergency again?

Eric M Russell SSC Guru Points: 125620 · Answer 12

Allowing customers to use the system while it's in the process of being migrated: I guess that's one way to meet the technical requirements of your department's 99.9% SLA. That is, unless something goes wrong ... Perhaps somebody took a bet and lost, not knowing that real IT managers make their own luck by planning ahead.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Tom Gillies SSCrazy Points: 2761 · Answer 13

Eric M Russell - Tuesday, August 14, 2018 12:48 PM
... Perhaps somebody took a bet and lost, not knowing that real IT managers make their own luck by planning ahead.

That's a nice way of putting it. Your post has reminded me NOT to follow up the job enquiry from SPECTRE! ("this organisation does not tolerate failure!") 😉

Tom Gillies LinkedIn Profile www.DuhallowGreyGeek.com

Phil Factor SSC-Insane Points: 20254 · Answer 14

And if anyone thought I was exaggerating TSB's inability to do an upgrade, they are at it again!
TSB suffers another weekend of downtime

TSB customers have suffered another weekend of downtime, this time affecting the bank's mobile app and online services, following what would appear to be another botched upgrade.
The bank had scheduled an upgrade for between 11pm on Friday and 3am on Saturday. However, many customers have been plagued by problems since then, with the bank claiming that it had finally solved the problems this morning - only to withdraw the claim hours later.

Best wishes,
Phil Factor