Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

But, but, but, my data is clean!

By Gail Shaw,

If there’s one phrase that makes me laugh, whenever I hear it from developers or application owners, it’s "My system doesn’t have bad data!"

Chances are, it does. Chances are there are orphaned rows, incorrect values in columns, out of range values or just general garbage that sneaks in to any system once it gets used.

Customers who have the name ‘Dummy **DO NOT USE**’ with thousands of dollars of purchases, gender columns with values like ‘asdfdasl’ or ‘Yes’ in them, deliveries which were dispatched in the future and arrived before the fall of Rome, or purchases of products which just aren’t to be found in the product catalogue. I have seen all of these in production systems in the last couple of years.

Chances are that most production systems have some degree of bad data in them. But how does this happen? How do we end up with gender columns which contain 57 different values? Or shipment records which have no information about where the shipment was sent from?

I’m going to borrow and misuse a quote from the physicist Murray Gell-Mann:

“Everything not forbidden is mandatory.” (http://en.wikipedia.org/wiki/Totalitarian_principle)

If we do not explicitly prohibit bad data from entering the database, eventually some will get in there. Whether it be from bugs or missing validation in the application, someone connecting Excel to ‘just do some quick analysis’ or from the web page that the CEO’s nephew wrote up quickly last weekend.

Why do we allow this? Why do we not create the required constraints to prevent this kind of garbage from entering the database? Is it too much work? Are we too busy to do a complete design job? Is it ‘unagile’ to create lookup tables, foreign keys and check constraints? Or do we just enjoy so much cleaning up the bad data once it is in our systems?

I've laughed over some of these examples of bad data, after the event, but when you run across them in a database from which you're trying to get meaningful data, they aren't quite as amusing. If you'd like to share your weirdest or most worrying examples of "dirty data" in a production database, I'd love to hear them.

Gail Shaw (Guest Editor).

Total article views: 184 | Views in the last 30 days: 64
 
Related Articles
FORUM

Changing of Data type int to Big int in production Database which is used for replication

Changing of Data type int to Big int in production Database which is used for replication

FORUM

System databases

System databases

ARTICLE

Initial Installation of the Production Database

Your software has passed all testing phase(s) and it is time to install your database into productio...

FORUM

Development and Production Database

Insert Into Development and Production Database at the same time

SCRIPT

Automate Test Database Restoration

Automate test database restorations from your production system.

Tags
 
Contribute