Business Intelligence or Data Warehouse

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/vRainardi/businessintelligenceordatawarehouse.asp

  • I agree !

    You could add the terms MOLAP when BI applications take data from cubes, and ROLAP when they take data from DDS. Or HOLAP when they take data from both.

     

  • I wish there were more articles like this, clearly delineating all of the differences between terms.  More often than not, we use incorrect terms to describe the IS business and in too many cases I find that the miscommunication sometimes wreaks havoc.  I thought this was a great article!


    Regards,

    Dave Doyle

  • Very Interesting!!!  Thanks a lot for making it that clear (drawings always help).

     

    Regards,

    aL.

  • Great job of explaining the distinctions between these terms and great use of examples and links.

    David 

  • A very nice clear presentation of an often (and sometimes deliberately) confusing set of terms.  I have one other suggestion, though.  The term business intelligence system is correctly defined in this article.  The term business intelligence applications is also correctly defined.  But business intelligence is something that either exists or doesn't exist in the brains of managers and executives in a business.  The systems and applications are only useful if they are serving someone who is willing to take the information delivered and use it to make intelligent decisions.

  • I got a compilation of definitions of data warehouse glossary by the experts some time ago.

    Business Intelligence - The capability to perform in-depth analysis and possibly data mining, of detailed business data, providing real and significant information to business users.  Business Intelligence usually makes use of tools designed to easily access data warehouse data.

    Data warehouse - A collection of integrated, subject-oriented databases designed to support the decision support system (DSS) function, where each unit of data is relevant to some moment in time.  The data warehouse contains atomic data and lightly summarized data.

  • It was a very good article, for sure, but I want to rain on everyone's parade by proposing a more generic definition of data warehouse. To me, a data warehouse is a holding area for data from multiple source systems, used for reporting purposes only. This isn't just pickayune. I created one of the first, if not the only clinical data warehouses; I found that the preconceptions caused by the more specific definitions really got in my way.

    First of all, for clinical data, there is no hierarchy. There is no drill down. Reports are output as SAS datasets. Each clinical trial is its own source system, with heterogeneously structured tables. We had 250 trials in production, and added one a week. Each trial had about 30-40 distinct tables. You do the math.

    I constantly was fighting with the DW consultant who tried to force things into the OLAP/BI framework. That was one of the main reasons for the project's failure.

  • Yes, this indeed is an excellent, detailed overview look at data warehousing.

    However, as others have suggested here, the data warehousing concept would not exist except for its value to the business and management. In that regard, it is imperative to remember that the central reason for DW at all is to provide pre-calculated and massaged data to management and the business. In this manner, all calculations for a particular count or equation can be managed and, hopefully, the entire enterprise will be looking at the same number calculated in the same way at a given time.

    As indicated in your diagram, there are usually two distinct types of data that can be reported on and which require entirely different processing methods. OLAP (cubes) can do an excellent job of calculating based on historical records kept in a cube structure but is not as efficient at state data. State data is not usually historical and benefits from the data warehouse's ability to summarize data across the enterprise, joining data from probably very disparate systems.

    Also, it is important for the enterprise to pay close attention to data warehouse metadata as depicted in your very accurate diagram. Metadata is the prime tool in the data warehouse for managing data quality. In this manner, all incorrect and inappropriate data can be automatically categorized and attended to by the data quality team through use of metadata reporting functions. Another aspect of this feature is the transparency of data quality. With proper data quality reporting, it should be patently obvious which data source is providing good data and which is not.

     

  • "hopefully, the entire enterprise will be looking at the same number calculated in the same way at a given time"

    Sounds great in theory, but it never works out in real life. I've found that the biggest difficulties with DW are emotional. When you bring data sources from different places, something is always wrong. Plus, you look under all the rocks and find really nasty stuff.

    Finally, there is never just one version of the truth. Again, sounds nice in theory, but there is a reason why organizations break down into silos.

  • Stephen, sounds to me that what you have built is not a datawarehouse, rather a collection of source systems (clinical trials) on a level playing field in terms of storage technology, for the purpose of spitting out reports which each relate to one specific trial.

    You say that clinical data has no hierarchy. Clinical data might not, but clinical infomation surely does. Information, as far as I am concerned, is when data is brought together and structured in a way that facilitates decision making. Data is just numbers. One number compared with another number (eg Target Weight Loss vs Actual Weight Loss, Revenue This Year vs Revenue Last Year, Blood sugar levels pre- vs post-trial) becomes information. This information could no doubt be classed in a hierachical fashion. Is there no one in your organisation that might want to analyse trial results of drugs relating to a particular bodily system, perhaps drilling down further to a particular organ, ultimately drilling down to specific disease/condition? Using a well structured datawarehouse, the user that wants to know this should be able to find it out pretty easily, without scouring through all the tables for each trial, which are all very much separate, despite residing perhaps in the same database.

    You say that the results are output as SAS datasets. This to me further highlights that you have designed a way to access very low-level data, and only low-level data. This is the key difference between your system and a datawarehouse. A DW/BI system may provide access this information if required, but will likely do so from a more top-down perspective. The executive user (or the poor sap preparing a report for your shareholders) might want to see 'Number of trials due for completion this year relating to heart disease', or 'Total Trials in Pipeline'. He/she may want to then find out what the trials are in the pipeline, then get further info on the trial itself if they want.

    I suspect that the failure of your project should not be blamed on the OLAP/BI framework, rather perhaps the DW consultant that did not correctly grasp the reporting needs of the target audience. They wanted reams of numbers, he wanted to give aggregated scorecards. A good consultant should have worked that part out very early on in the piece, thus being able to deliver what you wanted.

  • "Stephen, sounds to me that what you have built is not a datawarehouse, rather a collection of source systems (clinical trials) on a level playing field in terms of storage technology, for the purpose of spitting out reports which each relate to one specific trial."

    That is my point. It was indeed a DW, just not a DW/BI.

    "You say that clinical data has no hierarchy. Clinical data might not, but clinical infomation surely does."

    By my definition, data and information are exact synonyms. YMMV.

    "Information, as far as I am concerned, is when data is brought together and structured in a way that facilitates decision making."

    I would call that a report, or, more specifically, a report dataset.

    "Data is just numbers. One number compared with another number (eg Target Weight Loss vs Actual Weight Loss, Revenue This Year vs Revenue Last Year, Blood sugar levels pre- vs post-trial) becomes information. This information could no doubt be classed in a hierachical fashion. Is there no one in your organisation that might want to analyse trial results of drugs relating to a particular bodily system, perhaps drilling down further to a particular organ, ultimately drilling down to specific disease/condition?"

    Again, that is my point. I should have been more specific with the definition of hierarchy. Clinical subjects have clinical visits, where various data is collected. Some of the data belongs to the clinical subject, some to the clinical visit, so there's that hierarchy. However, there is no drill down hierarchy in clinical data. What you are talking about makes no sense. They do statistical analyses, which are very different from the slice and dice of financial or marketing data.

    "Using a well structured datawarehouse, the user that wants to know this should be able to find it out pretty easily, without scouring through all the tables for each trial, which are all very much separate, despite residing perhaps in the same database."

    The point of the clinical data warehouse was to enable reporting across trials. The current SAS reports did intra-trial reporting just fine.

    "You say that the results are output as SAS datasets. This to me further highlights that you have designed a way to access very low-level data, and only low-level data."

    Exactly. And I still hold that it is a datawarehouse, albeit w/o common BI functionality. Don't forget, you're not looking at BI-type information. You're looking at vital signs, alcohol history, arithmetic tests, creatinine levels, ECG's, etc.

    "This is the key difference between your system and a datawarehouse. A DW/BI system may provide access this information if required, but will likely do so from a more top-down perspective. The executive user (or the poor sap preparing a report for your shareholders) might want to see 'Number of trials due for completion this year relating to heart disease', or 'Total Trials in Pipeline'. He/she may want to then find out what the trials are in the pipeline, then get further info on the trial itself if they want."

    What you are talking about there is trial administration data, which does fit very well into the DW/BI model very well.

    "I suspect that the failure of your project should not be blamed on the OLAP/BI framework, rather perhaps the DW consultant that did not correctly grasp the reporting needs of the target audience. They wanted reams of numbers, he wanted to give aggregated scorecards. A good consultant should have worked that part out very early on in the piece, thus being able to deliver what you wanted."

    The DW didn't grasp the needs of the users, for sure, but the OLAP/BI framework was the wrong tool for the job.

    My point in writing this was to show people that the DW/BI is not the only way to warehouse data.

  • Article is very good.I have worked at the places were people don't even know they are seating on the Bussiness Intelligence system.

  • Building a data warehouse is not just building a couple tables.  It needs data analysis and data modeling. The modeling will help to find the hierarchy of the data.  For example, State -> County-> City -> zip.  The big thing on data warehouse is meta data.  It explains the data in the data warehouse, where they come from, how they get calculated and standardize the data.  That is another thing you need.

    There is a book called The data warehouse toolkit by Ralph Kimball.  It is very helpful.

  • Handling the metadata was the really big challenge for our DW. 250 trials, each with 30-40 distinct tables, adding a new trial a week, I had to develop a dynamic ETL tool that was fed by a metadata repository. Finally, the metadata of an individual trial could and would sometimes change in the middle of a trial.

    Unfortunately, Kimball was worse than no help at all. His methodologies assume certain things, like static metadata, a small number of source systems, etc.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply