"The canny BI specialist will spend a lot of time verifying and validating the raw data before emerging with the seductive visualizations."
I have a tale of woe from a few years back, the moral of which was the opposite.
I was contracting for a UK police service and was tasked with writing a system that would extract, consolidate, and report on Incident, Crime, Arrest, and police HR data. In total the data was gathered from 5 separate databases, and as many Excel sheets again kept by various departments. A colleague and I spent weeks analysing, writing cleansing routines, interpreting, structuring... and finally writing the code to load the fruits of our labours into a clean star schema ready for reporting. We did all of this under high time pressure, and with limited budget (looking back, a worrying amount of the system was written in Access VBA as that was the most reliable tool we had to hand; only at the end of the project did we twist enough arms to get SQL Server 2005). We also had little help from the coppers themselves as (obviously) they were busy with other things. We had to collect and document as many acronyms and cop-phrases as we could to understand the data that had been entered by frontline officers. Data dictionaries were rare in policing circles back then, but with some diligence we tracked down standard Home Office and ACPO codes and definitions in order to standardise our reporting and make it intelligible outside the section we were working in.
A huge amount of time spent verifying and validating raw data!
A couple of months after our first reports were made available and had wowed middle-ranking officers, a new reporting suite was announced by Force HQ that had been developed by an external company. They had web pages with snazzy graphics, charts, speedos, and various other dancing-bear-ware... but their data was crap! Among other things it was inconsistent within itself, missed certain groups of Crime category, mis-stated the number of officers per Sector, and didn't cope with common anomalies (such as officers booking on and off duty more than once within a 24hr period). It also worked to a working day of 8am to 8pm rather than 24hrs as police do.
You'd have thought with such glaring data errors that senior and mid-ranking officers would laugh it out of town. After all, what decisions can they make with duff data??
Did they hate it?
Did they f*ck! They *loved* it! All the pretty 3D pie-charts and colour-coded tables left them drooling! And, what's worse, the mid-rankers on the Area I was contracting for kept asking why the data from our system didn't agree with the data from FHQ. It took a lot of explaining as to why their data sucked, time which we then couldn't spend developing swish reports of our own (we'd only just discovered SSRS back in 2006; prior to that all reporting was done in Excel).
To convince senior officers of the superior quality of our data over the New Toy data took nearly as many weeks as we'd spent developing the system in the first place. When the officers were finally agreed and had challenged the external company with the flaws in its data, the company thought up an excellent solution: they would tap into the database we had painstakingly created and use the data from there! Ta-da! Oh, and seeing as the data part of the project was already completed and that they wouldn't need the SSRS reports we were busy working on, they also convinced senior officers that they no longer needed our input on the project. We were "freed up" to work on other projects.
"Graphs and charts are dangerous in the wrong hands, and if built on data that is carelessly gathered will mislead as often as they lead."
I couldn't agree more. However, they also bamboozle, and if they *look* pretty then the natural conclusion is that they *must* be right.
It was a painful lesson to learn.
The canny BI specialist will spend *as much* time creating seductive visualisations as they do validating and verifying raw data. Otherwise your hard work ends up in File 13.