SQLServerCentral is supported by Red Gate Software Ltd.

Bad Data Costs Lives

By Phil Factor,

Database professionals soon acquire, through experience, a great respect for the need for quality in data. I remember vividly the time I first learned this lesson, while building applications for dealers on the London Metal Exchange. One of my SUM aggregations returned the wrong answer (double-entry bookkeeping caught the error). Due to a slight problem with the BCD Math package I was using, the grand total was a couple of pennies out in totals of around 5 million pounds. Jokingly, I offered to pay the stockbroker the difference, but he was horrified: "Either data is right or it is wrong. There is no in-between. Get it right!"

It's a lesson that stayed with me. In this case, we were dealing with exact measurements and, if the data is 'good', we can trust the judgments we make based on its analysis, provided we get the calculations right. More generally, we live in a world of uncertainty, and have to be clear about the level of uncertainty when we present figures. Moreover, if data is 'bad', it is very difficult to 'cleanse' it in such a way that we can rely on the calculations and decisions we derive from it. There is no magic cleansing agent in statistics. Unfortunately, there are cases where important decisions based on 'bad' or at least 'uncertain' data can cost lives, as was the case in the recent scandal that hit the UK's Mid Staffordshire NHS Foundation Trust.

In the UK, hospitals receive a rank, on behalf of the government, based on their Hospital Standardized Mortality Ratios (HSMRs). In short, hospitals attribute "diagnostic codes" to their patients based on the disorders and diseases from which they are suffering. The HSMRs derived from these codes aim to account for every important variable that determines whether a patient admitted to hospital lives or dies, so that what is left is a way to compare directly the quality of care across hospitals. The idea is that the low-ranking hospitals get an incentive to increase their quality of care, and the public can select the best hospital in their area.
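For readers who have not met a standardized mortality ratio before, the arithmetic can be sketched as follows. This assumes the usual definition, observed deaths divided by the deaths a risk model expects, scaled so that 100 means "as expected". The function name and the figures are hypothetical, and the difficult part, estimating the expected deaths from the coded data, is deliberately left out.

```python
def hsmr(observed_deaths: int, expected_deaths: float) -> float:
    """Ratio of observed to expected deaths, scaled so 100 means 'as expected'.

    expected_deaths would come from a risk model built on the coded data
    (diagnosis, age and so on); here it is simply passed in as an assumption.
    """
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return 100.0 * observed_deaths / expected_deaths


# Hypothetical figures, for illustration only.
print(hsmr(120, 100.0))  # 120.0: more deaths than the model expected
print(hsmr(90, 100.0))   # 90.0: fewer deaths than the model expected
```

Because the expected figure is derived from the diagnostic codes, any error or bias in the coding feeds straight into the ratio.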

It's a good idea, but reality got in the way. Firstly, the recording of the diagnosis for a patient is not always accurate. Last year, for example, the Hospital Episode Statistics (HES) data, which converts the hospitals' records into internationally recognized ICD or OPCS coding, recorded that 16,992 of the 785,263 patients coded as having had "in-patient Obstetrics episodes" were male. Hmm. Wrong.
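A check for this kind of impossibility is simple to express. The sketch below is a minimal illustration and not the way HES data is actually validated: the Episode record, its field names and the "M"/"F" encoding are all assumptions made for the example. Only the two totals come from the figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    patient_id: str   # hypothetical identifier
    sex: str          # "M" or "F"; an assumed encoding
    specialty: str    # assumed label, e.g. "obstetrics"


def implausible_episodes(episodes):
    """Return episodes whose recorded sex contradicts the recorded specialty."""
    return [e for e in episodes if e.specialty == "obstetrics" and e.sex == "M"]


# Made-up sample rows, for illustration only.
sample = [
    Episode("p1", "F", "obstetrics"),
    Episode("p2", "M", "obstetrics"),
    Episode("p3", "M", "cardiology"),
]
flagged = implausible_episodes(sample)
print([e.patient_id for e in flagged])  # ['p2']

# Share of the quoted obstetric episodes recorded as male (about 2.2%).
male_share = 16_992 / 785_263
print(f"{male_share:.1%}")
```

A rule like this only catches contradictions that are logically impossible. It says nothing about codes that are plausible but wrong, which are far harder to detect.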

Even more worrying is what happens if a hospital decides that a low rank is not a problem with its care, but with its coding. For example, the "palliative care" code can have a significant impact in reducing HSMR. If a patient is assigned this code, allowances are made in the HSMR calculation to protect hospitals from blame in cases where a patient's life cannot be saved. The use of this code has increased, for valid reasons in many cases, but the fear is that hospitals can respond to poor rankings not with proper inspections and improved care procedures, but by disguising the true mortality rates with data 'cleansing' (recoding), so putting the lives of patients at risk.

The problem in Mid Staffordshire seems to be one of managers, monitoring quality of care at their hospitals, putting too much faith in data that was divorced from reality. The data said that mortality rates were low, in direct contradiction of the testimony of relatives who felt that their family members had died unnecessarily, testimony which went unheeded. "They must be wrong, because we have the data". When the Francis Report was published in February, the government identified a culture of 'metrics and league tables' in the way that hospitals are judged as a key factor in the scandal.

As database professionals, we are all too familiar with the concept of Bad Data, and have the experience to spot it and prevent its misuse. Indeed, perhaps it is time we took the lead in ensuring that the specialism of 'Data Scientist' is based on responsible use of data and respect for data quality.

Phil Factor.
