The TSB disaster: Where were the grown-ups?

  • Comments posted to this topic are about the item The TSB disaster: Where were the grown-ups?

    Best wishes,
    Phil Factor

  • However, in banking processes, I'm a conservative who grips the department's computer manual with whited knuckles. The consequences of error are so dire, and the complexity is so great.

    Well said.  There are places to be if you want to be on the bleeding edge of tech advances.  Banks are not one of them.  (Except maybe when it comes to security, but I don't know enough to have a good opinion on this).

  • TSB was, like many others, a disaster waiting to happen. Even when I voiced concerns at similar institutes, no one would listen, let alone do anything. This is not about technology, it is all about beggars beliefs and ivory towers.
    😎 

    I have walked away from many contracts where the alarm bells rang loudly, pretty much a proper barge pole exercise, never to be proven wrong in doing so!

  • I think the problem is that the Facebook mantra of "ship often and break things" is fine when the effect of some users suffering bugs is that they don't get to see their cat videos, but applied to systems where even little bugs matter a lot it becomes a recipe for disaster. The suggestion of limited checks on production readiness and of issues with active/active data centres probably hints at systems skewing over time and fixes/updates possibly not being fully deployed. Sometimes you just need carefully timed and staged deployments, with extensively tested rollback strategies and comprehensive testing that has verified the deployment process as much as is possible - as much as the Agile development crowd would like to think you don't.
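    As a rough illustration of a staged deployment with a tested rollback path, here is a minimal Python sketch. Everything in it is hypothetical: the `Stage` type, the `staged_deploy` function and the stage names are invented for the example and say nothing about how TSB or any real bank deploys. Each stage is applied and then verified; the first failed verification undoes every applied stage in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One deployment stage (hypothetical structure, for illustration only)."""
    name: str
    apply: Callable[[], None]      # makes the change
    rollback: Callable[[], None]   # undoes the change
    verify: Callable[[], bool]     # health check run after apply


def staged_deploy(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Apply stages in order; on a failed check, roll back in reverse order."""
    log: List[str] = []
    applied: List[Stage] = []
    for stage in stages:
        stage.apply()
        applied.append(stage)
        log.append(f"applied {stage.name}")
        if not stage.verify():
            log.append(f"verification failed at {stage.name}")
            for done in reversed(applied):
                done.rollback()
                log.append(f"rolled back {done.name}")
            return False, log
    return True, log


# Illustrative run: 'state' stands in for whatever is actually deployed.
state: List[str] = []


def make_stage(name: str, healthy: bool) -> Stage:
    return Stage(
        name=name,
        apply=lambda: state.append(name),
        rollback=lambda: state.remove(name),
        verify=lambda: healthy,
    )


ok, log = staged_deploy([
    make_stage("canary", True),
    make_stage("region-1", False),   # simulated failed health check
    make_stage("region-2", True),    # never reached
])
```

    The point of the sketch is that rollback is exercised by the same code path as deployment, so it can be tested before the real run rather than discovered during it.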

  • I don't know anybody who works for them, but I bet their DBAs and sysadmins had been flagging issues for months before this happened. We've all been there and been ignored.

  • By complete coincidence I was "on the outside" dealing with an issue relating to TSB on Wednesday and was very suspicious about the data. On Friday Phil's editorial turned up. By Sunday I had established that the problem was caused by a third party - not me, not the bank. The doubt in everyone's minds made the situation harder to deal with. Lesson for me: odd and unpleasant coincidences do happen.

    I think good may come of this mess. I have already saved Phil Factor's editorial and may use it in the future.

    The best practitioners of Agile I have known were absolute demons for repeated automated testing. It was a development project. _Everything_ was tested at several different levels. They were also rigorous about being able to roll-back. They had very few regression failures. That was development of new function.

    Something Phil does not emphasise in his article (you can only cover so much in limited space) is the "data migration" aspect of this. Not only does "function" have to work, but the data has to be right too. When you have existing data, then migration can be a substantial project in its own right, and it needs to be tested too. You need to be able to _prove_ it has worked properly.
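    One common way to back up a claim that migrated data is right is a reconciliation pass that compares source and target row by row. The sketch below is only an illustration under stated assumptions: rows are plain tuples with the key in the first position, and the names `row_digest` and `reconcile` and the sample rows are invented for the example, not taken from any real migration.

```python
import hashlib
from typing import Dict, Hashable, Iterable, List, Sequence


def row_digest(row: Sequence) -> str:
    """Hash a row's values in a fixed order so two copies can be compared."""
    canonical = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(
    source: Iterable[Sequence],
    target: Iterable[Sequence],
    key_index: int = 0,
) -> Dict[str, List[Hashable]]:
    """Report keys that are missing, unexpected, or changed after migration."""
    src = {row[key_index]: row_digest(row) for row in source}
    tgt = {row[key_index]: row_digest(row) for row in target}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Made-up sample data: (account_id, holder, balance)
source_rows = [(1, "Alice", 100.0), (2, "Bob", 250.5), (3, "Carol", 0.0)]
target_rows = [(1, "Alice", 100.0), (2, "Bob", 205.5), (4, "Dave", 10.0)]

report = reconcile(source_rows, target_rows)
```

    An empty report for every category is the evidence you are after. A real migration would also need to account for intended transformations (type changes, re-keyed identifiers), which this sketch ignores.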

    There are times when not being able to roll something back may be acceptable. I've done it myself and I've heard it referred to as "a success oriented strategy" (even at the time the description was intended to be ironic). If you cannot roll back then you have to accept the consequences of failure. In TSB's case that really shouldn't have been accepted.

    As Phil says "it's horses for courses" or should be anyway.

    Tom Gillies LinkedIn Profile www.DuhallowGreyGeek.com

  • Tom Gillies - Monday, August 13, 2018 5:57 AM

    There are times when not being able to roll something back may be acceptable. I've done it myself and I've heard it referred to as "a success oriented strategy" (even at the time the description was intended to be ironic). If you cannot roll back then you have to accept the consequences of failure. In TSB's case that really shouldn't have been accepted.

    "Success Oriented Strategy".  Or, abbreviated, "SOS".  Riiiiight...  😉

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • Does anyone know where this ranks with other IT rollout disasters?

    412-977-3526 call/text

  • Another proof that if you want it real bad, that's the way you'll get it. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.



  • robert.sterbal 56890 - Monday, August 13, 2018 8:39 AM

    Does anyone know where this ranks with other IT rollout disasters?

    Qualitatively, about the same as any UK Government IT rollout (fallout?) of the last two decades; it doesn't get "better" than that!
    😎

    On the quantity side, Microsoft is the winner with several (heart) breaking updates. They are slowly getting better at it, but there is still some room for improvement 😀

  • Jeff Moden - Monday, August 13, 2018 8:42 AM

    Another proof that if you want it real bad, that's the way you'll get it. 😀

    The only relation to "proof" I can see is that only after a few sips of a "100 proof" could one ever accept such results
    😎

  • Tom Gillies - Monday, August 13, 2018 5:57 AM

    By complete coincidence I was "on the outside" dealing with an issue relating to TSB on Wednesday and was very suspicious about the data. On Friday Phil's editorial turned up. By Sunday I had established that the problem was caused by a third party - not me, not the bank. The doubt in everyone's minds made the situation harder to deal with. Lesson for me: odd and unpleasant coincidences do happen.

    I think good may come of this mess. I have already saved Phil Factor's editorial and may use it in the future.

    The best practitioners of Agile I have known were absolute demons for repeated automated testing. It was a development project. _Everything_ was tested at several different levels. They were also rigorous about being able to roll-back. They had very few regression failures. That was development of new function.

    Something Phil does not emphasise in his article (you can only cover so much in limited space) is the "data migration" aspect of this. Not only does "function" have to work, but the data has to be right too. When you have existing data, then migration can be a substantial project in its own right, and it needs to be tested too. You need to be able to _prove_ it has worked properly.

    There are times when not being able to roll something back may be acceptable. I've done it myself and I've heard it referred to as "a success oriented strategy" (even at the time the description was intended to be ironic). If you cannot roll back then you have to accept the consequences of failure. In TSB's case that really shouldn't have been accepted.

    As Phil says "it's horses for courses" or should be anyway.

    What's really sad is that we (society) know how to adequately plan for failures, but we continue to pretend that IT-related projects don't need to go through the engineering rigors "actual" engineers go through before rollout. Simply going through FMECA, and building out the statistical predictors prevalent in some of the industries where "failure = death" still rings true would have given them the tools to know when and where to roll back, the costs of failing forward, and time to plan how to react when things don't just pan out exactly right.

    This, on the other hand, was a total failure to plan, and Murphy decided to pay a visit and stay for a while.
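    To give a flavour of the failure-mode analysis mentioned above, here is a small Python sketch of the risk priority number commonly used in FMEA-style worksheets: severity × occurrence × detection, each scored from 1 to 10. The failure modes, their scores and the threshold of 100 are invented purely for illustration and are not drawn from the TSB incident or any real assessment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailureMode:
    description: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (always caught) .. 10 (unlikely to be caught)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity * occurrence * detection."""
        return self.severity * self.occurrence * self.detection


def needs_mitigation(modes: List[FailureMode], threshold: int) -> List[FailureMode]:
    """Return the modes at or above the threshold, highest risk first."""
    flagged = [m for m in modes if m.rpn >= threshold]
    return sorted(flagged, key=lambda m: m.rpn, reverse=True)


# Hypothetical entries and scores, for illustration only.
modes = [
    FailureMode("Customer shown another customer's account", 10, 3, 6),
    FailureMode("Overnight batch overruns its window", 5, 6, 2),
    FailureMode("Login latency rises under load", 3, 7, 2),
]

flagged = needs_mitigation(modes, threshold=100)
```

    With these made-up scores only the first entry (10 × 3 × 6 = 180) crosses the threshold, which is the kind of result that would justify a rehearsed rollback plan for that failure before go-live.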

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Allowing customers to use the system while it's in the process of being migrated, I guess that's one way to meet the technical requirements of your department's 99.9% SLA. That is, unless something goes wrong ... Perhaps somebody took a bet and lost, not knowing that real IT managers make their own luck by planning ahead.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Tuesday, August 14, 2018 12:48 PM

     ... Perhaps somebody took a bet and lost, not knowing that real IT managers make their own luck by planning ahead.

    That's a nice way of putting it. Your post has reminded me NOT to follow up the job enquiry from SPECTRE! ("this organisation does not tolerate failure!") 😉

    Tom Gillies LinkedIn Profile www.DuhallowGreyGeek.com
