Data Quality - Addressing non-stated requirements


My first encounter with a professional software tester was a bruising one. As someone who takes a great deal of pride in what I produce, I was initially shocked but, on reflection, greatly impressed by what they discovered. What struck home for me was that a large part of what they found were sins of omission. I had fulfilled the requirement as supplied to me but had not given sufficient consideration to what my colleagues today call the NSRs, or Non-Stated Requirements.

When I started using TDD (Test Driven Development), the discipline also uncovered some embarrassing sins of omission.

In my years designing and developing databases I cannot recall many explicit data quality requirements. There was an unstated assumption that a data person would “just know” what data quality rules were needed. If I, as a data specialist, didn’t know what the data quality rules were for a particular item, then how could anyone expect a non-data specialist to fare better?

Taking a UK postcode as an example, I know that it will conform to one of the following alphanumeric patterns, where A stands for a letter and 9 for a digit:

  • A9 9AA
  • A99 9AA
  • AA9 9AA
  • AA99 9AA
  • A9A 9AA
  • AA9A 9AA

However, there are rules I was not aware of until fairly recently:

  • The first character can never be Q, V or X
  • The second character can never be J
  • The second half of the postcode (the part after the space) never includes the letters C, I, K, M, O or V
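The patterns and letter restrictions above can be sketched as a single regular expression. This is a minimal illustration that encodes only the rules listed in this article, not an authoritative postcode validator; the function name `is_plausible_uk_postcode` and the constant names are my own, and the sketch assumes the postcode is written with exactly one space between its two halves.

```python
import re

# Letters allowed in each position, based only on the rules listed above.
FIRST = "A-PR-UWYZ"        # first character: never Q, V or X
SECOND = "A-IK-Z"          # second character, when it is a letter: never J
INWARD = "ABD-HJLNP-UW-Z"  # letters after the space: never C, I, K, M, O or V

# The first half covers A9, A99, A9A, AA9, AA99 and AA9A; the second half is 9AA.
POSTCODE_PATTERN = re.compile(
    rf"[{FIRST}][{SECOND}]?[0-9][0-9A-Z]? [0-9][{INWARD}]{{2}}"
)


def is_plausible_uk_postcode(value: str) -> bool:
    """Return True if the value fits the patterns and letter rules above."""
    candidate = value.strip().upper()
    return POSTCODE_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for example in ["EC1A 1BB", "W1A 0AX", "Q1 1AA", "AJ1 1AA", "M1 1AC"]:
        print(example, is_plausible_uk_postcode(example))
```

A check like this only tells us that a value has a plausible shape. It says nothing about whether the postcode actually exists, which is exactly the sort of distinction a crib sheet should make explicit.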

What I would have found useful, both for myself and for my colleagues in development, was a crib sheet for the business data types along the lines of the (greatly abbreviated) example I've put together below.

I have split this sheet into three tiers, each of which is described below:

  1. Questions we ask regardless of the data type
  2. Questions we ask having established the data type
  3. Questions we ask for the specific attribute
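For colleagues who prefer to see structure as code, the three tiers could be held roughly as follows. This is only a sketch: the names `GENERAL_QUESTIONS`, `TYPE_QUESTIONS`, `Attribute` and `questions_for` are hypothetical, and the questions themselves are illustrative placeholders rather than the contents of the crib sheet.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Tier 1: questions asked regardless of the data type (illustrative only).
GENERAL_QUESTIONS: List[str] = [
    "Is the value mandatory?",
    "Can the value change after it is first captured?",
]

# Tier 2: questions asked once the data type is known (illustrative only).
TYPE_QUESTIONS: Dict[str, List[str]] = {
    "string": ["What are the minimum and maximum lengths?"],
    "date": ["Can the date be in the future?"],
}


@dataclass
class Attribute:
    name: str
    data_type: str
    # Tier 3: questions that apply to this attribute alone.
    specific_questions: List[str] = field(default_factory=list)


def questions_for(attribute: Attribute) -> List[str]:
    """Return the tier 1, tier 2 and tier 3 questions, in that order."""
    return (
        GENERAL_QUESTIONS
        + TYPE_QUESTIONS.get(attribute.data_type, [])
        + attribute.specific_questions
    )


postcode = Attribute(
    name="postcode",
    data_type="string",
    specific_questions=["Which letters are disallowed in each position?"],
)

if __name__ == "__main__":
    for question in questions_for(postcode):
        print(question)
```

Keeping the tiers separate means a question added at tier 1 or tier 2 is automatically asked of every attribute it applies to.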

Such a crib sheet becomes more useful when it is shared and used to discuss data with technical and non-technical colleagues. For example, an avid Dr Who fan revealed that UK 07 mobile numbers have a specific sub-range reserved for drama and TV. She had found this out when one episode listed The Doctor's mobile number. It might not sound important, but that number turned out to be relevant in fraud detection!

Another business colleague asked for help with a project to identify and reduce the breakage rates in deliveries.  He might not have known much about IT systems but he knew a fantastic amount about the shipping cube and weight calculations and how to spot a suspicious error.  

These two colleagues reinforce the point that, although a data rules crib sheet is something an experienced data person can come up with quite quickly, it gains real value when it is an evolving artefact that technical and non-technical colleagues jointly own, contribute to and correct.

When the rules from a data quality crib sheet are implemented, they drive up data quality and also build a shared understanding of business data rules. They build trust in the data too, which matters most when the data is telling us something we would rather not hear.

For those of you who recognise the importance of a data dictionary but have struggled to get it off the ground, a data quality crib sheet can be a useful first step.

