A Tale of Broken BI

  • Comments posted to this topic are about the item A Tale of Broken BI

    Best wishes,
    Phil Factor

  • I had a similiar experience.

    A little explaination. In some states there are regulated electric suppliers (big utilities) where energy rates are set by the a state commission (not just the energy but also transmission and distribution costs). These states allow unregulated electric suppliers to offer energy (electricity) on different plans (some bargains, some a trap). The regulated utility still recouped the distribution costs and read the meter.

    I worked for one of the unregulated electric suppliers. Started out in one market and expanded.

    Now to get to the non-boring part I hope. The regulated utility would send us the meter readings in EDI in a "Standard" format. The company I worked at developed a forecasting method based on the EDI that came from regulated utility. Worked pretty well.

    But as they expanded markets they interacted with more regulated utilities. The forecasting there stunk.

    When I was contracted it was figure out what the heck is going on.

    Turns out the EDI messages were standard, but it was up to each utility to interpret what went where. Given the regulated utilities were not especially fond of the unregulated suppliers they all sent the data out differently.

    Moral of the story - the data is there, you just have to find it.

    Off topic - Bad data

    One of the utilities had one customer had used 1 gigawatts of data for 6 straight days. Right - we had to put data filtering in for instances like plus under reporting of data.

    I'm glad to say I did manage to get all data cleanup, EDI processing and forecasting done while I worked myself out of a job.

    Phil, I read your "Confessions of an IT Manager" e-Book last week. I felt so guilty about you not getting compensated I made a donation in your name. The establishment was so happy they gave me a pint of Sierra Nevada :-D.

  • RIP Mars Climate Orbiter(formerly the Mars Surveyor '98 Orbiter)

    The metric mixup error

    The MCO MIB has determined that the root cause for the loss of the MCO spacecraft was the failure to use metric units in the coding of a ground software file, “Small Forces,” used in trajectory models. Specifically, thruster performance data in Imperial units instead of metric units was used in the software application code titled SM_FORCES (small forces). The output from the SM_FORCES application code as required by a MSOP Project Software Interface Specification (SIS) was to be in metric units of newton-seconds (N-s). Instead, the data was reported in Imperial units of pound-seconds (lbf-s). The Angular Momentum Desaturation (AMD) file contained the output data from the SM_FORCES software. The SIS, which was not followed, defines both the format and units of the AMD file generated by ground-based computers. Subsequent processing of the data from AMD file by the navigation software algorithm therefore, underestimated the effect on the spacecraft trajectory by a factor of 4.45, which is the required conversion factor from force in pounds to newtons. An erroneous trajectory was computed using this incorrect data.

    — Mars Climate Orbiter Mishap Investigation Phase I Report[/b]

  • "Acceleration, for example, is the time it takes to reach a particular speed, which in turn is the distance that would be covered in that time interval if the car travelled at the same rate for that time interval."

    Blimey, I hope your science teacher is not reading this.

  • Oops I don't even remember what I meant to write

    Best wishes,
    Phil Factor

  • They asked me to develop a BI solution for a mid sized health care organization. To be more specific: they (the management of the organization) asked me to implement a dashboard (with all those fancy traffic lights, gauges and graphs).

    It turned out that cleansing and combining the data from three different sources (without any master data management) was already quite a challenge. Dealing with invalid measures and references, badly formatted names and reuse of existing codes in such a way that it could be maintained by their own IT department was a very time consuming preparation phase so easily overlooked by those managers. By now we are ready to put some useful data into a cube that might replace some spreadsheets that are still filled by hand every month by some financial guru.

    We have asked them to specify what they want to see on their dashboard. Of coarse they know what they want to see: productivity, costs, number of clients, the usual stuff. We decided now is the time to ask them what they want to see exactly, where, how and about whom, to specify limits for the traffic lights, ranges for the gauges. As you might have expected, they wonder why we can't figure that out, because most of them do not even have the slightest idea.

    Getting clean data gets you only halfway. By the time you have managed to keep the garbage out of your BI solution, you know more about the sources than you ever wanted to know, and probably even more than most of their own DBA's. And then the managers expect you to tell them what they actualy want to see on their dashboards. Good luck!

  • george-975422 (10/27/2012)


    I had a similiar experience.

    A little explaination. In some states there are regulated electric suppliers (big utilities) where energy rates are set by the a state commission (not just the energy but also transmission and distribution costs). These states allow unregulated electric suppliers to offer energy (electricity) on different plans (some bargains, some a trap). The regulated utility still recouped the distribution costs and read the meter.

    I worked for one of the unregulated electric suppliers. Started out in one market and expanded.

    I work for a similar type of company. It amazes me that the different electric areas can't even tell time the same way. One reports usage based on hour ending, another reports based on hour beginning and a third reports in 15 minute increments.



    The opinions expressed herein are strictly personal and do not necessarily reflect the views or policies of my employer.

  • "The canny BI specialist will spend a lot of time verifying and validating the raw data before emerging with the seductive visualizations."

    I have a tale of woe from a few years back, the moral of which was the opposite.

    I was contracting for a UK police service and was tasked with writing a system that would extract, consolidate, and report on Incident, Crime, Arrest, and police HR data. In total the data was gathered from 5 separate databases, and as many again Excel sheets kept by various departments. A colleague and I spent weeks analysing, writing cleansing routines, interpreting, structuring... and finally writing the code to load the fruits of our labours into a clean star schema ready for reporting. We did all of this under high time pressure, and with limited budget (looking back, a worrying amout of the system was written in Access VBA as that was the most reliable tool we had to hand, only at the end of the project did we twist enough arms to get SQL Server 2005). We also had little help from the coppers themselves as (obviously) they were busy with other things. We had to collect and document as many acronyms and cop-phrases as we could to understand the data that had been entered by frontline officers. Data dictionaries were rare in policing circles back then, but with some diligence we tracked down standard Home Office and ACPO codes and definitions in order to standardise our reporting and make it intelligible outside the section we were working in.

    A huge amount of time spent verifying and validating raw data!

    A couple of months after our first reports were made available and had wow-ed middle ranking officers, a new reporting suite wass announced by Force HQ that had been developed by an external company. They had web pages with snazzy graphics, charts, speedos, and various other dancing-bear-ware... but their data was crap! Amoung other things it was inconsistent within itself, missed certain groups of Crime category, mis-stated the number of officers per Sector, and didn't cope with common anomolies (such as officers booking on and off duty more than once within a 24hr period). It also worked to a working day of 8am to 8pm rather than 24hrs as police do.

    You'd have thought with such glaring data errors that senior and mid-ranking officers would laugh it out of town. After all, what decisions can they make with duff data??

    Did they hate it?

    Did they f*ck! They *loved* it! All the pretty 3D pie-charts and colour-coded tables left them drooling! And, what's worse, the mid-rankers on the Area I was contracting for kept asking why the data from our system didn't agree with the data from FHQ. It took a lot of explaining as to why their data sucked (time which we then couldn't spend developing swish reports of our own (we'd only just discovered SSRS back in 2006, prior to that all reporting was done in Excel))

    To convince senior officers of the superior quality of our data over the New Toy data took nearly as many weeks as we'd spent developing the system in the first place. When they were finally agreed and had challenged the external company with the flaws in their data they thought up an excellent solution: they would tap into the database we had painstakingly created and use the data from there! Ta-da! Oh, and seeing as the data part of the project was already completed and that they wouldn't need the SSRS reports we weere busy working on, they also convinced senior officers that they no longer needed our input on the project. We were "freed up" to work on other projects.

    "Graphs and charts are dangerous in the wrong hands, and if built on data that is carelessly gathered will mislead as often as they lead."

    I couldn't agree more. However they also bamboozle, and if it *looks* pretty then there is a natural conclusion that they *must* be right.

    It was a painful lesson to learn.

    The canny BI specialist will spend *as much* time creating seductive visualisations as they do validating and verifying raw data. Otherwise your hard work ends up in File 13.

  • @john-2 Neville

    Oh so true. Yes, a similar thing happened to me once with a different company. In my case, the rival system used an olap cube whereas mine used more static reports based on Excel- but the graphs were excellent. My data was correct whereas theirs wasn't. The business chose the cubes, dammit. I should have mentioned this danger, but an editorial has to be short and can't be that comprehensive.

    Best wishes,
    Phil Factor

  • Would it be fair to say that in nearly all the examples here, the “raw” data simply wasn’t raw enough? As soon as you’re pulling data into your ETL system where you have to assume that a column is derived from say [(column A minus Column B) divided by 2pi] instead of just pulling in column A and B and doing the math yourself, you’re increasing the chance of inheriting mistakes beyond your control.

    @vliet I feel your pain. It’s been my experience that big Health Organisations are often run by teams of highly intelligent, persuasive well educated people with massive egos. They often ascend to positions of seniority through being excellent nurses or paediatricians or [insert whatever clinical specialty here]

    Because they’ve been good at what they did to get where they are, they fall into the trap of thinking that they know best about everything (especially BI architecture!). So they’ll come to the BI team not with a business need, but with a request for what they see as the solution to a business need. It’s the same as if I went to the doctor and said “I want you to resect half of my liver.” instead of “I have some right upper quadrant pain, can you help me?”

    ...One of the symptoms of an approaching nervous breakdown is the belief that ones work is terribly important.... Bertrand Russell

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply