Most companies have their own internal language for frequently used terms. There may be a broad general understanding of what these terms mean, but to a data modeller tasked with applying their craft, it soon becomes apparent that the definitions are shrouded in subtleties and nuances.
This phenomenon extends outwards to terms used by third-party systems and consultancies. Concepts such as “click”, “page view” and “customer” are broadly understood, but as precise terms their definitions are opaque.
Companies can bumble along for years with no clear definition of the terms they use every day, and without realising that each of their disciplines is talking about something different.
This is a ticking time bomb. The realisation that this situation exists, or that it matters, will occur during a time of crisis. All of a sudden, figures used for making or justifying decisions become objects of uncertainty. This uncertainty has the power to paralyse the decision-making process: instead of dealing with the crisis, people are distracted by the question of whether the figures are valid.
Such a situation provokes an investigation into how terms are calculated, and what this unearths is a set of practices that would impress the finance minister of a banana republic — especially when a term forms a measure on which a particular business discipline is remunerated. That makes any transition to an agreed, unified definition of terms a lengthy and painful process.
The pain can be particularly acute where different departments across the organisation have been allowed to choose different vendors to supply the same conceptual data. Web analytics is an extreme case in point. Even when vendors agree on a definition of “click” or “page view”, their approaches to collecting the underlying data can be fundamentally different, and the figures can be distorted further by the way a web page is assembled. If an organisation decides to correct a problem such as having multiple web analytics tools, it may attempt to unify the figures as a means of comparing the tools. This is a fool’s errand: there are too many variables in the mix, and if the figures reconcile in any given period it is pure coincidence.
What does this mean for the data professional?
If we cannot pin down precise terms, this rather hamstrings our attempts at producing a data model aligned to the business process. Once we have produced a model, the same constraints will hamper us when we try to optimise it.
Attempting to provide BI capability on top of a model based on disparate definitions means we are likely to produce artefacts that reinforce the separateness of the different business disciplines rather than bring them together. Instead of a handful of artefacts providing information on a customer, we end up with armfuls to maintain and operate.
I am not sure what the answer to this situation is. Business users rarely have the enthusiasm to devote resources to fixing the problem, and IT rarely has the clout to insist that it be done. Hopefully the new breed of CEO, insisting that their companies become data-led, will bring a sea change for us, but I expect a long, hard slog ahead.