Human Data Quality

Disclaimer: I read and speak one language, having failed pretty thoroughly at learning Latin, Spanish, and then Japanese in my schooling. I'm sure there are more than a few people who would say I'm not doing too well with English, either!

I've got a few examples here of "data quality" issues that I've seen in emails and posts lately. I don't intend to make fun of anyone, and I'm sure I would make much worse mistakes if I were to attempt to post on a non-English site. Instead, I thought these highlighted some great challenges in the data world. First, my examples:

"Greet" in response to fixing something.

"I'm thinning about the best way to ..." - A post wondering about a T-SQL query.

"sintax error"

That last one might be easily corrected, and I've seen other errors that are worse (though I can't find them right now). But how smart does a routine need to be to decode these types of spelling and grammar issues?

You might think a grammar checker can handle things, but I've written a lot of sentences that Word flags as having an issue without being sure what to do with them. And Word is a free-form application. Imagine trying to do some type of parsing or clean-up of data that isn't constrained by look-up tables.
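To make the problem concrete, here is a minimal sketch in Python of the kind of routine I have in mind. It is only an illustration under my own assumptions: the tiny `VOCABULARY` list and the `suggest` function are hypothetical names I made up, and the matching relies on the standard library's `difflib.get_close_matches`. A real system would need a full dictionary and a lot more context.

```python
import difflib

# Hypothetical vocabulary for illustration; a real routine would use a full dictionary.
VOCABULARY = ["syntax", "error", "great", "greet", "thinking", "thinning", "query"]


def suggest(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the word if it is already known, else the closest known word, else None."""
    lowered = word.lower()
    if lowered in vocabulary:
        # A valid word passes straight through, even if it is the wrong word in context.
        return lowered
    # Fuzzy match: keep only the single best candidate above the similarity cutoff.
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ["sintax", "Greet", "thinning"]:
    print(raw, "->", suggest(raw))
```

With this vocabulary, "sintax" is close enough to "syntax" to be corrected, but "Greet" and "thinning" pass through untouched because they are perfectly valid words. Spotting that the writers meant "Great" and "thinking" requires understanding the sentence, not just the spelling, and that is the hard part.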

Data quality is becoming a bigger and bigger issue in our world, and I'm not even sure that we realize it. More and more systems exchange data, and in greater amounts. As companies seek to work together and partner to develop new applications, they are merging data between them, depending on employees who aren't always DBAs to somehow match up data. Or they depend on automated systems to "guess" what should go where. I'm not always sure those systems do a good job of matching up data.

And then information is lost.

Not that DBAs do a better job, but I think a human has a better chance of learning from past mistakes and correcting them in the future.

I'm not an ETL expert, but I think there is a tremendous amount of flexibility and power in the SSIS programming model to help you figure out how best to match up data from disparate sources and clean it before it infects your system.
