SQLServerCentral Editorial

A Poor Data Model

,

Recently there was some online complaints about social security numbers (SSNs) in the US being duplicated and re-used by individuals. This is really political gamesmanship, so ignore the political part. Just know that social security numbers appear to be one of the contenders used in many data models.

I found a good piece about how SSNs aren't unique, and have a mess of problems. Despite this, many people seem to want to use SSNs as a primary or alternate key in their database systems. They also aren't well secured in many systems, even though we should consider this sensitive PII data.

As we have moved many analog systems to digital ones, we often find that our assumptions about the rules governing data aren't that well defined. I've worked in many systems where data elements were assumed to be unique in some way, but they actually weren't. I've seen invoices, POs, and other "unique numbers" actually duplicated because of simple mistakes by humans. When we try to enforce uniqueness in a database, we end up with problems. Often we actually need to drop keys and make exceptions because the data must be entered.

This has led many people to not create unique constraints or even foreign keys in their systems, and I understand why they don't. There are real problems when we assume the real world has the same strict structures we implement in code. I've seen systems go online and then FKs removed because of poor data quality. It might be a mistake, but it's also a reality when we find there is existing child data without a parent. We might create a pseudo-parent at times, but we might also decide not to do so if that creates other problems.

I suspect over time the real world will migrate some of their problematic keys to something more robust. However, some that are used widely in older systems, like SSNs, are unlikely to change in my lifetime. There are far too many places where this is in use and I am not sure that there is any consensus to undertake the massive amount of work to implement something else. I suspect even trying to add digits to the value is a task we'll put off indefinitely.

Be careful of using natural keys in your data models unless you are sure they are really a natural key. To me, I'm better off with some surrogate key in the event that my "natural key" turns out to not have the uniqueness I expected.

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating