Data Farming

Steve Jones, 2013-07-25 (first published: 2008-10-13)

This editorial was originally published on Oct 13, 2008. It is being re-run as Steve is out today, traveling to a SQL in the City event and SQL Saturday in Sacramento.

It's a fictional story, or at least I hope it is, but Bruce Schneier has a great piece on Identity Farming, a long-term way to create false identities that would fit well in the spy world. Mr. Schneier doesn't see a practical point in doing it, but it's interesting from a data standpoint because of one point he brings up: this could be done without a real person existing to back up all the data that's created.

The part that strikes me from this piece is that all too often we make assumptions about the people or entities that created all the data we use. One bad foreign key, orphaned child row, or incorrectly transformed piece of data could snowball downhill at a tremendous rate, and it might be hard to determine what went wrong.
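As a rough illustration of the orphaned child row problem, here is a minimal sketch that looks for child rows whose parent key matches nothing. The `customers` and `orders` tables, their columns, and the sample rows are all hypothetical, and SQLite stands in for whatever database you actually run.

```python
import sqlite3


def build_demo():
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Steve Jones'), (2, 'Steve Jones');
        -- Order 12 points at customer 99, which does not exist.
        INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);
        """
    )
    return conn


def find_orphans(conn):
    """Return (order_id, customer_id) for orders with no matching customer."""
    query = """
        SELECT o.order_id, o.customer_id
        FROM orders AS o
        LEFT JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE c.customer_id IS NULL
        ORDER BY o.order_id
    """
    return conn.execute(query).fetchall()


print(find_orphans(build_demo()))  # [(12, 99)]
```

A check like this only tells you that a row is orphaned. It says nothing about which parent the row should have belonged to.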

The blog entry talks about our data shadows, which grow larger and larger all the time. Unless you are actively trying to limit yours, every day there are likely new entries in some database about your life. And more and more, various companies and institutions interact with our data shadows instead of us. Credit checks, marketing efforts, boarding an airplane, making a purchase: all of these require checks on the shadow of data in our lives, not necessarily ensuring that the shadow is tightly linked to each of us.

I've had more than my share of confusion because of my name; it's common, in almost every database, and shared by thousands, if not millions, of people. On one hand, it means that I'm a little lost in the flood of "Steve Joneses" out there. On the other hand, it makes it hard to correct mistakes. If there are 6 people with the same name and you have an orphaned record, who do you link it to? Do you guess? Infer it from the other data? I'd like to think you need to somehow research this, contact me, and make a note that the quality of this data could be suspect.

People working with information try to be accurate, but they get busy, and mistakes occur. I'm sure I'll find more and more over time, and I don't have a great solution for what might work better. I'd like to think that we would implement better checks for data quality, fuzzy searches, and somehow assign "risk" values to data. Something to let people know that there might have been some issue.
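To make the idea of fuzzy searches and "risk" values a little more concrete, here is a minimal sketch using Python's standard `difflib`. The function names, the 0.85 threshold, and the risk labels are all assumptions for illustration, not an established scheme.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_with_risk(orphan_name, candidates, threshold=0.85):
    """Try to link an orphaned record to one person by name.

    candidates is a list of (person_id, name) pairs.
    Returns (person_id or None, risk_label). The labels are hypothetical:
      "low"       - exactly one plausible match
      "ambiguous" - several plausible matches; do not guess, research it
      "unmatched" - nothing close enough
    """
    matches = [
        person_id
        for person_id, name in candidates
        if similarity(orphan_name, name) >= threshold
    ]
    if not matches:
        return None, "unmatched"
    if len(matches) == 1:
        return matches[0], "low"
    return None, "ambiguous"


people = [(1, "Steve Jones"), (2, "Steve Jones"), (3, "Mary Smith")]
print(link_with_risk("Steve Jones", people))  # (None, 'ambiguous')
```

The point of the sketch is the second return value: when several people share a name, the code refuses to pick one and records that the link is suspect instead of guessing.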

It's a thorny problem, one that's not going away, and likely to become more problematic in the future.

Contract or Perm

by Steve Jones

SQLServerCentral.com

Editorial

If you are accepting a DBA position, does it make sense to work as a contractor or permanent employee?

2007-11-21

242 reads

Mini-Me

by Steve Jones

SQLServerCentral.com

Editorial

Will the next version of Windows be a "Mini-Me" version of Vista? Who knows, and it's too early to tell, but apparently there's a mini-kernel version of Windows 7, the one after Vista, which fits into 25MB on disk. That's a touch lower than the 4GB that Vista takes up. Granted it's not a full […]

2007-10-25

141 reads

An Hour in Time

by Steve Jones

SQLServerCentral.com

Editorial

Daylight Saving Time switches a little later this year. In fact, it's November 4th this year, after having been in October for all of my life. In case you don't remember which way we move the clocks, here's a saying: Spring forward, fall back.

2007-10-17

404 reads

Software is Like Building a House

by Steve Jones

SQLServerCentral.com

Editorial

One of the really classic analogies in software is that it's like building a house. You have a foundation, multiple teams, lots of contractors that specialize in something, etc. And it's an analogy that's debated as to its relevance over and over. I won't go into the correctness of this analogy, but I wanted to comment on it.