• Eric M Russell (11/7/2012)


    When it comes to data quality in large enterprise organizations, what's essential is that the rules be formally documented (before development begins).

    For example:

    - If Col1 = 'A' then Col2 must be 1, 2, or 3.

    - When Col3 is updated, then Col4 must also be updated with Date+Time EST.
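
    Rules like these can often be enforced declaratively once they are written down. As a minimal sketch (using Python's built-in sqlite3 in place of a production RDBMS, with hypothetical table and column names), the first rule maps to a CHECK constraint and the second to an update trigger:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE t (
        Col1 TEXT,
        Col2 INTEGER,
        Col3 TEXT,
        Col4 TEXT,
        -- Rule 1: if Col1 = 'A' then Col2 must be 1, 2, or 3
        CHECK (Col1 <> 'A' OR Col2 IN (1, 2, 3))
    );

    -- Rule 2: whenever Col3 changes, stamp Col4 with the current date/time.
    -- (SQLite's datetime('now') is UTC; converting to EST would be handled
    -- separately in a real system.)
    CREATE TRIGGER trg_t_col3 AFTER UPDATE OF Col3 ON t
    BEGIN
        UPDATE t SET Col4 = datetime('now') WHERE rowid = NEW.rowid;
    END;
    """)

    conn.execute("INSERT INTO t (Col1, Col2) VALUES ('A', 2)")  # satisfies rule 1
    rejected = False
    try:
        conn.execute("INSERT INTO t (Col1, Col2) VALUES ('A', 9)")  # violates rule 1
    except sqlite3.IntegrityError:
        rejected = True

    conn.execute("UPDATE t SET Col3 = 'changed'")
    col4 = conn.execute("SELECT Col4 FROM t").fetchone()[0]
    print("bad row rejected:", rejected)
    print("Col4 stamped:", col4)
    ```

    The point is that once the rule is formally documented, the database itself can be the first line of enforcement rather than relying on every application to remember it.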

    The documentation must be continuously revised post-production as the business rules change, published so that any user who queries the database has access to it, and endorsed and promoted by management.

    Here is how data typically flows within a large enterprise organization. Even if data quality is enforced at one layer, it can get mangled, misinterpreted, misplaced, etc. as it moves into the next layer. The weakest link in the chain is those users running ad-hoc queries, who are far removed from inside knowledge about the true meaning of the data.

    - Data ingested from client or other 3rd party sources into OLTP / application database.

    - Data extracted from application database into operational data store / warehouse.

    - Data extracted from ODS into 3rd party or proprietary reporting datamart (PeopleSoft, Infomatics, OLAP, etc.).

    - Data extracted from datamart into MS Excel, SSRS, dashboard, ad-hoc queries, etc.
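
    One way to catch the mangling between layers is a reconciliation check at each hand-off. A minimal sketch (the `reconcile` helper and the layer names are hypothetical, assuming each layer can expose its rows as key/value pairs):

    ```python
    def reconcile(source_rows, target_rows):
        """Compare row counts and per-key values between a source layer
        and the layer it was loaded into; return a list of discrepancies."""
        issues = []
        src = dict(source_rows)
        tgt = dict(target_rows)
        if len(src) != len(tgt):
            issues.append(f"row count: source={len(src)} target={len(tgt)}")
        for key, value in src.items():
            if key not in tgt:
                issues.append(f"missing key in target: {key}")
            elif tgt[key] != value:
                issues.append(f"value drift for {key}: {value!r} -> {tgt[key]!r}")
        return issues

    # Hypothetical hand-off: OLTP rows vs. what landed in the ODS.
    oltp = [(1, "Acme"), (2, "Globex")]
    ods  = [(1, "Acme"), (2, "GLOBEX")]  # value mangled during the load
    print(reconcile(oltp, ods))  # -> ["value drift for 2: 'Globex' -> 'GLOBEX'"]
    ```

    Running a check like this at every arrow in the chain above (OLTP to ODS, ODS to datamart, datamart to reports) surfaces mangled data near where it happened instead of at the ad-hoc query layer, where nobody can explain it.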

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho