Serious Software Glitches

Recently, Robert Sterbal pointed out a podcast to me. The link is to Apple Podcasts, but the show is The Journal, which is available on other platforms (I listened on Spotify). It's the story of a computer glitch in UK Post Office software, which resulted in quite a few local postmasters being criminally prosecuted, many convicted, and even a few committing suicide. It's a sad story, and it's complex, but there are some technology-related elements.

First, the overall story is that Fujitsu sold the UK a point-of-sale system for post offices. The software had a glitch that incorrectly calculated lots of totals and showed postmasters owing more money than they should. They were upset, called support, and got nowhere, and many were held liable for money they didn't owe. The UK postal management hid information about the widespread nature of the problem while prosecuting many local postmasters. Fujitsu support didn't disclose to callers that others were experiencing the same issue. This also coincided with an unrelated change in the law, which said computer systems were presumed correct and anyone accused of a crime had to prove the computer was wrong.

Without a doubt, there are human failings here with support people, management, a vendor, and likely others. I don't want to minimize those, and I do think quite a few people involved, especially management, should face charges. However, since this is a database-related site, I wanted to focus on the code quality here. I don't know the exact nature of the calculation issue, but there is clearly a bug somewhere in the system. Do we, as technologists, think we're better developers or database people than those at Fujitsu? Would we not produce calculation bugs that might be hidden in aggregations? I have to say that I see this stuff all the time, and not just in development. I run into these bugs in production, and I think this is often because we don't embrace enough testing. I see this in all sorts of systems, with developers at many different experience levels.

While application developers have gotten very good at unit testing, that same habit hasn't been as widely adopted among database developers. What's more, I often find that people writing aggregation queries for reports use lots of live data, and they don't write tests or even perform calculations by hand to ensure complex formulas are correct. If you've ever done complex aggregations in SQL or DAX, you might find there can be strange effects from filters, from NULLs, and even from the way a window or range of rows is processed. It's easy to say that a report on 1,000 rows of data out of 100,000 is roughly correct with some total, when you haven't actually verified that calculation manually.
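
As a small illustration of the NULL effect, here is a minimal sketch in Python using the built-in sqlite3 module. The sales table, its columns, and the function names are hypothetical, made up for this example, and have nothing to do with the Post Office system. It shows how AVG() and COUNT(column) skip NULL rows while COUNT(*) does not, so two reasonable-looking averages disagree on the same four rows of data.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a tiny, hand-checkable data set.

    The table and values are hypothetical, chosen so every result can be
    verified with pencil and paper.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (branch TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (branch, amount) VALUES (?, ?)",
        [("A", 100.0), ("A", None), ("A", 50.0), ("B", 200.0)],
    )
    return conn


def branch_stats(conn, branch):
    """Return several aggregates for one branch so they can be compared."""
    row = conn.execute(
        """
        SELECT AVG(amount),               -- NULL rows are skipped
               AVG(COALESCE(amount, 0)),  -- NULL rows are counted as zero
               COUNT(*),                  -- counts every row
               COUNT(amount)              -- counts only non-NULL amounts
        FROM sales
        WHERE branch = ?
        """,
        (branch,),
    ).fetchone()
    return {
        "avg_skipping_nulls": row[0],
        "avg_nulls_as_zero": row[1],
        "row_count": row[2],
        "non_null_count": row[3],
    }


conn = build_sample_db()
stats = branch_stats(conn, "A")
# Hand calculation for branch A: amounts are 100, NULL, 50.
# Skipping the NULL: (100 + 50) / 2. Treating it as zero: (100 + 50 + 0) / 3.
print(stats)
```

Neither average is wrong on its own. Which one is right depends on what a missing amount means to the business, and that is exactly the kind of question a hand-verified test forces you to answer.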

I certainly think Fujitsu deserves a lot of blame in this case. Ultimately, they are the source of the issues. Those who covered up the problems, both at the UK government organization and at Fujitsu, should be prosecuted and held liable, but the programmers and testers are also at fault. They didn't do a good job testing their software, and worse, didn't do the job of tracking down the bugs, finding issues, and correcting them. I hope those issues are fixed now, but they weren't addressed promptly, as this situation played out over years.

I often work with companies trying to build software better, but I find it hard to get them to test database software. I know the testing frameworks are immature, the tooling is poor, and honestly, too few of us have a good test data management process in place. However, we can start to learn to add unit tests to our code. At the very least, we ought to write a repeatable, automated test when a bug is reported. Clearly, in that situation, we (as a team) didn't write good code if a bug was found, either because of our technical skills or because we didn't get the specification correct. In either case, we need to improve, and automated tests that ensure we don't make the same mistake again are a way to start getting better.
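
To make the idea of a repeatable test for a reported bug concrete, here is a minimal sketch using Python's built-in unittest and sqlite3 modules. Everything in it is an assumption for illustration: the ledger table, the branch_total function, and the imagined bug report (a total coming back empty for a branch with no transactions). The point is only the shape: a small fixed data set, an expected value worked out by hand, and a test that runs the same way every time.

```python
import sqlite3
import unittest


def branch_total(conn, branch):
    """Hypothetical function under test: total of all amounts for a branch.

    Amounts are whole pence. COALESCE covers the imagined bug report,
    where SUM() returned NULL for a branch with no rows at all.
    """
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger WHERE branch = ?",
        (branch,),
    ).fetchone()
    return row[0]


class BranchTotalRegressionTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps the run repeatable.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE ledger (branch TEXT, amount INTEGER)")
        self.conn.executemany(
            "INSERT INTO ledger (branch, amount) VALUES (?, ?)",
            [
                ("north", 1250),
                ("north", -250),
                ("north", None),  # an entry with no amount recorded
                ("south", 400),
            ],
        )

    def tearDown(self):
        self.conn.close()

    def test_total_matches_hand_calculation(self):
        # By hand: 1250 - 250 = 1000; the NULL row is skipped by SUM().
        self.assertEqual(branch_total(self.conn, "north"), 1000)

    def test_branch_with_no_rows_totals_zero(self):
        # Pins the imagined bug: no rows must give 0, not NULL/None.
        self.assertEqual(branch_total(self.conn, "west"), 0)
```

Saved in a file, this runs with `python -m unittest`. In a SQL Server shop, the same pattern could be written with a framework such as tSQLt instead. What matters is that the test stays in the build after the bug is fixed.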

Much of the software I've worked on doesn't directly affect human lives. That's probably true for most of you, unless you write software that controls some sort of vehicle movement or a medical device that dispenses care or drugs. My son works on rocket flight software, and he takes that seriously since people will be riding those rockets, but for most of us, the work we do isn't critical to anyone living or dying.

However, this story shows that we might still affect human lives. We ought to take that responsibility seriously and ensure we are doing the best job we can to produce quality software. Having some testing (and good test data) is a way to double-check ourselves and our team. It has worked well to raise the quality level of mobile software dramatically. We database people ought to learn from that and adopt better testing.

Planning a Database Testing Strategy for Flyway

by Additional Articles

SimpleTalk

With Flyway, you can adopt a test-driven development strategy that will allow you to test and evaluate databases, and database objects, at every phase of the database development lifecycle. The further down the delivery pipeline that bugs appear, the more costly in time and resources they are to fix. This approach will allow you to catch many of them before the database change even gets committed to version control, making a continuous delivery process much easier to adopt and sustain.

2024-02-12

How We Ate Our Own Dog Food To Level-Up Internal Testing with Redgate Clone

by Additional Articles

SimpleTalk

Most applications have large and complex databases at the back end, making it hard for developers to adequately test their work before it goes out. Having a fast, repeatable process to deliver data on demand is an essential part of an effective software development lifecycle, ultimately leading to improved customer satisfaction. In this article, we’ll explore the journey our own engineering team went on to leverage our own tool, Redgate Clone, to spin up short-lived database instances in containers for automated testing.

2023-10-25

Where to Test Your Code

by Steve Jones

SQLServerCentral

When and where should you test code? Steve has a few thoughts when we consider the database as a crucial part of our software.

2024-06-26

What is subsetting, what are the advantages, and how does it make test data management easier?

by Additional Articles

Redgate

As data grows and databases become larger and more complicated, data subsetting provides a method of working with a smaller, lighter copy of a database to make development and testing faster and easier.

In this article, James Hemson poses the questions: what exactly is data subsetting, how and why are developers using it (or not using it), and what's prompting conversations about it?

2023-12-11

SQL Unit Testing reference guide for beginners

by Additional Articles

SQL Shack

In this article, we are going to learn the basics of SQL unit testing and how to write a SQL unit test through the tSQLt framework.

2024-01-09 (first published: 2023-09-08)
