RE: Enforcing Data Quality while using Surrogate Keys

SSChasing Mays

Points: 601

September 8, 2009 at 12:46 am

It's an interesting take on surrogate keys.

I've designed a few data warehouses over the years and I've not really regarded duplicity as a problem, rather inherent in the data, if it is by nature going to provide historical analysis. Here's why...

- I will often include metadata in the design at the physical level to help understand the lifecycle of the data. These fields include created dates, modified dates, current flag and so on...

- Multiple versions of the same ID would be identified by the "created" and "modified" dates, as well as a flag indicating it is the current version of the record; in essence, this is a slow changing dimension/entity.

- ETL processes check if there is a current record for the incoming data and assign the existing key or create a new one as necessary.

- I *always* move away from intelligent keys for the physical structures; many times I have joined an existing ETL project where the model has relied on the business keys, only to find that the data quality/constraint enforcement causes more problems than it solves.

I'm liking your work, Marvin. I'll keep an eye out for more of your posts.

Cheers