Data Warehouse Development: Version 0

  • Comments posted to this topic are about the item Data Warehouse Development: Version 0

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • An interesting article for a Data Warehouse newb. I'm very much looking forward to reading the follow up articles in this series. Thanks. 🙂

  • Andy,

    An intersting article and of great interest having being a DW developer for the last 8/9 years, and I would always describe version 0 as a Proof of Concept that shows the CxO's what they can get if they invest in the development of a DW.

    However in my experience most CxO's dont really know what a data warehouse is, let alone what they want from one, execpt the obligatory 'every thing we currently have, and more'. They also have a tendancy to think that there is nothing wrong with thier data, and so when you produce a single report that doesnt match expectations, its your problem.

    Even when you explain that the data is correct, and its different because Annie in accounting manually combines Figures X and Y in a spread sheet so that it looks better.

    Often the biggest issue I come across is that there is a disconnect between the requirements of actual users and the CxO's and other senior managers. Who are generally only interested in pretty graphs, and the visualisation part, were as those lower down the chain (departmental heads, analysts etc) are interested in being able to dive around the data and look at the anomolies.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Thanks for posting this article. I am new to DW project and in learning stage. I think the series of this article takes me in the right path.

    Please suggest if any of you have posted anyother DW article to start with.

  • Great article. But I'd take it one step further. While automating a manually produced report that takes days if not weeks to produce might be the ultimate goal, a report could consist of information captured in multiple business processes from multiple systems. The danger of letting a report or a small set of reports guide your requirements is that you run the risk of creating a series of spreadsheet marts that each contain data that cannot be easily integrated. While gathering requirements, data sources need to be classified into two categories, business actor (dimensions) or business process (facts). And the Kimball data bus matrix (or SSAS cube editor Dimension Usage tab) needs to be the final product of the requirement gathering stage. From those results, the data warehouse can be constructed iteratively and incrementally, one source, one business actor, one business process at a time. And as the business changes, so will the data bus matrix.

  • This is an excellent example of pressing the right buttons. You bring up a fantastic point about the 'environment' of relatioinships and getting to know WHO and WHERE to start. By starting at the 'C' level, you maximize your ability to obtain management buy in. Decision makers will push the project on the business side. To add to your intro article, it is important to understand the business entities and focus on one at a time. Break your staff up so each is point for a different business unit. This helps build rapport with users by putting a familiar face to the data warehouse. Then cross train your developers to insure they can support one another. Finally, request subject matter experts from the business units. This allows your point developer to be part of a cross functional team and insures an actionable end product from the warehouse. Great intro article!

  • Interesting article Andy, thanks! Interesting in the sense that the organisation I work for is currently looking at developing a BI solution, and the approach that's being taken by those leading the development is contrary to the approach I think should be taken (which I think is more in line with your recommendations).

    Their approach seems to be to take what's currently in place, and simply use SSIS to transfer the data from its various sources in to a data mart of some description (the schema of which is more or less being defined by the customer).

    I'm trying to encourage them to look at this from the end user perspective, and try and establish what it is the user wants to see at the end of this process in their reports and what they want to be able to query. Then those with the experience and knowledge of data warehouse solutions can model the BI solution around that. Unfortunately, the concept of using SSAS or SSRS to support the overall requirement appears to be lost on them.

    Someone build me another brick wall to hit my head against please - I've already knocked down the last one!

  • This is a great method to get started, but make sure you are answering a relevant question. It may not always be "What do you use to measure how your business is doing?". If you're working with a CIO and the CIO wants to improve change management processes, the question might be something like "What runs where?". In that case, you would dig through the spreadsheets to find which applications run on which servers and so on. Being able to easily answer a relevant question is what makes the DW valuable to management.

  • There are a series of canned questions that should always be asked upfront, and these should be documented in the requirements. However, I also find it essential to ask the business to provide a mockup of the end product. Let them actually draw a picuture (preferably in Excel with real numbers that add up) of what the report or dashboard screen fed from the datawarehouse should look like. This Excel sheet is something that should be passed around to all the stake holders and end users so they have an opportunity to input. As a reality check, compare that mockup back to the original requirements document and you'll be amazed how many critical holes it fills, especially if the requirements were written by a 3rd party or were pre-written before you were hired on the project.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thanks, Andy!

    This is a topic that has come to the forefront for me lately, specifically, we have an operational database, and we want to get to the point of being able to use cubes. The missing piece for us is a data warehouse.

    I already have a degree of familiarity with development methodologies, SQL Server BI stack, our institutional culture with regard to data, etc. I really like your approach, and look forward to the rest of this series.

    I am specifically interested on any tips for getting our operational data into the needed fact/dimension format.

    Thanks!

    Dan

  • I think the first assertion needs a slight modification; a first version DW should produce one reporting subject area, not just one report. From there, it's a question of how many reports are needed, which may be only one but could be several.

    So the question to the CxO becomes 'About what subject in our business do you need better information?' Typical answers include 'sales', 'inventory', 'customers'. Then we can ask what in particular about that subject is needed (sales across systems, inventory across systems, customers across time) as a genesis for some potential reports (plural) from that starter DW.

    The single report question can set wild expectations and backfire into a monster project request that has to be scaled back, which is why I'm wary of it.

    Asking up front for potentially anything sets a high first expectation that may need dramatic scaling down. Let's say the answer from CxO is "I really need a combined sales report from systems A, B, C, D, E and F". CxO smiles, having an instant vision of such a report. Good luck bashing that out in a few weeks, so right off the bat DW person has to tell CxO we can only do systems A and B to start. CxO frowns at the delay.

    Now imagine asking about what subject first. CxO says, "I need better insight into total sales because I have to look at 5 reports in different formats." CxO frowns at thinking of the hassle. DW person: "I can make a version 1 DW for you in a month that has a unified sales report from systems A and B and we can add the other systems every month or two after that depending on their complexity." CxO smiles.

    Meanwhile, a subject area DW should be able to immediately serve other needs. If the one report method is strictly followed then some small incremental data Y that is involved in the subject area in general but not needed for the report would be excluded. Then when VP1 hears about CxO's upcoming new report and says, "I need exactly that with Y added to it" DW person can either say 'we can put that in the report writers' queue because the data is already on the way' (VP and CxO smile) or 'I'll have to see about adding that in" (VP frowns at the delay).

    The subject area method can take 10% to 20% more time but the ability to provide N reports instead of 1 provides N*100% more worth and visibility.

  • magarity kerns (4/7/2011)


    I think the first assertion needs a slight modification; a first version DW should produce one reporting subject area, not just one report. From there, it's a question of how many reports are needed, which may be only one but could be several.

    I totally agree with you. Start with one subject area, not one report. If you begin with one subject area and use the Kimball four step process, (1) identify the business process, (2) explicitly state the grain or level of detail required, (3) identify the dimensions or business actors acting in that process, and (4) identify the metrics used in the analysis of that business process, you're well on your way to success. Plus, your building the blocks for additional subject areas to be integrated by sharing conformed dimensions. If you're guided by a report, you're much more likely to build a data silo that will be much more difficult to extend and maintain.

  • Martin Mason (4/7/2011)


    magarity kerns (4/7/2011)


    I think the first assertion needs a slight modification; a first version DW should produce one reporting subject area, not just one report. From there, it's a question of how many reports are needed, which may be only one but could be several.

    I totally agree with you. Start with one subject area, not one report. If you begin with one subject area and use the Kimball four step process, (1) identify the business process, (2) explicitly state the grain or level of detail required, (3) identify the dimensions or business actors acting in that process, and (4) identify the metrics used in the analysis of that business process, you're well on your way to success. Plus, your building the blocks for additional subject areas to be integrated by sharing conformed dimensions. If you're guided by a report, you're much more likely to build a data silo that will be much more difficult to extend and maintain.

    The problem is that in most cases the business doesn't start out by saying "we need a datawarehouse, so let's put our heads together and figure out the best way to go about doing that"

    The initital build of the datawarehouse typically stems from a request coming from uppermost management for an enterprise level report or dashboard that shows them X, Y, Z and the IT department just sort of figures out at some point (hopefully before the first deployment) that a datawarehouse of some type would be essential.

    Not unless they bring in an outside consultant, or at least one person on the project is up to speed on database architecture, would the initial phase of development ever involve a discussion of Kimball, Inmon, OLAP cubes, or even data replication.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (4/7/2011)


    The initital build of the datawarehouse typically stems from a request coming from uppermost management for an enterprise level report or dashboard that shows them X, Y, Z and the IT department just sort of figures out at some point (hopefully before the first deployment) that a datawarehouse of some type would be essential.

    Not unless they bring in an outside consultant, or at least one person on the project is up to speed on database architecture, would the initial phase of development ever involve a discussion of Kimball, Inmon, OLAP cubes, or even data replication.

    Hi Eric,

    I concur. I've seen customers purchase a reporting solution - usually an expensive one at that - and believe they have purchased a business intelligence solution. My guess is they spoke with a salesperson first, and then no one else after. If a business wants to implement a successful data warehouse, I believe they should speak to Data Warehouse or Business Intelligence Architect first. A good BI/DW architect is going to meet with the business people and the IT Department.

    Magarity and Martin:

    I mentioned Craig Utley in the article and his deference to "boiling the ocean." I agree with him and resist any attempts to increase the scope of Version 0. I've seen the larger projects fail many times. This Version 0 approach hasn't failed yet.

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Most of the responses here are about what I believe Andy would see as the Version 1. But he talks about the step BEFORE diving in too deep into this. His approach is an exercise about getting to know the business. And using a single report as a guide/goal to help focusing the people you talk to. This one report will probably be thrown away when version 1 comes along which includes the whole subject area and allows for obtaining the same report again (but implemented in a more solid way).

    The big thing for version 0 is to get to know the environment in which the data warehouse is going to operate (where are the original data-sources, how reliable is the data-input, who is fudging data on the way to the top, who will be obstructing any change, who will sponsor the project, etc, etc). Delivering the actual version 0 report is just a bonus to show your merits to management, but the real deliverable will be YOUR knowledge about the business inner workings, which will increase your chances of success for version 1 many-fold.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply