Using Big Data to Improve Health

One of the frustrations of medical research is in getting good representative data. It should, you might suppose, be easy: you just scoop up all the medical records into a huge medical database, anonymize the data, and then just sharpen up your Python skills. Sigh; nope.

The problem with the main 'big-data' medical sets such as the IBM Watson/Truven database is that they largely represent only people with medical insurance. This doesn't always cause problems, provided all conclusions bear in mind that it is a skewed population. This is why the medical records of a country such as the UK are potentially so valuable for research: they represent an entire population. With the help of a database that provides information on this scale, we can do a great deal more to advance medical and pharmaceutical science.

What, if anything, stands in our way? The first problem is that such records can't reliably be pseudonymized or 'de-identified'. If you are researching individuals, and you know quite a bit about them, there are likely to be a couple of unusual characteristics that will allow you to match the records to the individual. Researchers have shown that this sort of matching can achieve a match rate of over 90%.
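To see why a couple of characteristics can be enough, here is a minimal Python sketch. It is an illustration rather than anyone's published method: the records are entirely made up, and the field names (`birth_year`, `postcode_area`, `sex`) and the helper `unique_fraction` are assumptions chosen for the example. It counts how many 'de-identified' rows become unique as more attributes are combined.

```python
from collections import Counter

# Hypothetical, made-up records: names and identifiers are gone,
# but a few ordinary attributes (quasi-identifiers) remain.
records = [
    {"birth_year": 1961, "postcode_area": "CB1", "sex": "F"},
    {"birth_year": 1961, "postcode_area": "CB1", "sex": "M"},
    {"birth_year": 1961, "postcode_area": "CB2", "sex": "F"},
    {"birth_year": 1975, "postcode_area": "CB1", "sex": "F"},
    {"birth_year": 1975, "postcode_area": "CB1", "sex": "F"},
    {"birth_year": 1975, "postcode_area": "CB2", "sex": "M"},
]


def unique_fraction(rows, keys):
    """Return the fraction of rows whose combination of `keys` occurs exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    singled_out = sum(
        1 for row in rows if combos[tuple(row[k] for k in keys)] == 1
    )
    return singled_out / len(rows)


if __name__ == "__main__":
    for keys in (["sex"], ["birth_year", "sex"],
                 ["birth_year", "postcode_area", "sex"]):
        print(keys, round(unique_fraction(records, keys), 2))
```

Each attribute on its own singles nobody out, but every attribute added to the combination leaves more rows unique. Anyone who already knows those few facts about a person can then pick out that person's row.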

The second problem is a regulatory problem. Medical records in Europe do not belong to the entity that collects or stores the data, but to the patient. Although few people will disagree with the idea of allowing their records to be used for research, many will refuse consent because there are doubts about security. If there is a breach, you can't change your medical history as you would your password. The GDPR's guidance is that confidential patient information can only be used by hospital and university researchers, medical royal colleges and pharmaceutical companies researching new treatments.

The third problem is the poor general understanding of the constraints of statistics. Statistical methods should come with the same warnings as a chainsaw. We still haven't reversed out of the statistical nightmares that presented a false '40%' conclusion about the value of statins in reducing cholesterol, certain types of which, unlike statins, we now discover we need in spadefuls for good health.

The fourth problem is that there is no central database for medical records. It is difficult to create one because of the poor "interoperability" of the data, and the inability of the many health information systems to work together to join up the many separate, and sometimes warring, 'care settings' making up the NHS (National Health Service).

A fifth problem is bad data, by which I mean data of poor quality, completeness and accuracy. There have been many reports of potentially incorrect codes being used to record illnesses and treatments, as well as missing or invalid identifiers, such as NHS numbers.
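Invalid identifiers, at least, are cheap to detect. As I understand the published format, an NHS number is ten digits, the last of which is a Modulus 11 check digit computed from the first nine. The sketch below is a minimal Python check under that assumption. The function name `is_valid_nhs_number` is my own, and the number in the usage line is a commonly quoted example value, not taken from any dataset.

```python
def is_valid_nhs_number(value: str) -> bool:
    """Check the format and Modulus 11 check digit of a candidate NHS number."""
    cleaned = value.replace(" ", "")
    # Must be exactly ten ASCII digits once spaces are removed.
    if len(cleaned) != 10 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    digits = [int(c) for c in cleaned]
    # Weight the first nine digits 10, 9, ..., 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:
        # A computed check digit of 10 means the number is not valid.
        return False
    return check == digits[9]


if __name__ == "__main__":
    print(is_valid_nhs_number("943 476 5919"))
```

A pass only shows that the number is well formed. It cannot tell you whether the number was ever issued, or whether it belongs to the patient in the record.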

It is typical of projects of this sort that few, if any, problems involve database technology. We have all the analysis tools we need. The tasks we face are mainly organizational and they'll take time to resolve.

Related content

Protecting Data from the Inside

SQL Tamagotchi! Mini-Servers as Database Pets

The Sequel to SQL

Over-thinking Database Build Scripts

Databases made for Ops, not 'Oops!'