Eric M Russell (11/7/2012)
When it comes to data quality in large enterprise organizations, what's essential is that the rules be formally documented (before development begins).
- If Col1 = 'A' then Col2 must be 1, 2, or 3.
- When Col3 is updated, then Col4 must also be updated with Date+Time EST.
The documentation must be continuously revised post-production as the business rules change, be published so that any user who queries the database has access to them, and be endorsed and promoted by mangement.
Here is how data typically flows within a large enterprise organization. Even if data quality is enforced at one layer, it can get mangled, mis-interpreted, misplaced, etc. as it moves into the next layer. The weakest link in the chain are those users running ad-hoc queries who are far removed from inside knowledge about the true meaning of the data.
- Data injested from client or other 3rd party sources into OLTP / application database.
- Data extracted from application database into operational data store / warehouse.
- Data extracted from ODS into 3rd party or proprietary reporting datamart (PeopleSoft, Infomatics, OLAP, etc.).
- Data extracted from datamart into MS Excel, SSRS, dashboard, ad-hoc queries, etc.
"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."