Data Warehouse Design - Issue with 1:M Dimension

  • Hi,

    I am building a Data Warehouse that will store report data from a 3rd party system. The system report data is extracted in an SSIS package by passing in a number of parameters. The parameters used could be any number from 3 to 27. One report can have many parameters. The same report could be run multiple times with different parameters used each time - depending on the report data detail required.

    I am looking at 2 DW design options for this. My main concern is how best to store the parameters used in each report run.

    In the first design below (Design A), I store the parameters as a Name/Value pair (DimParameter tbl). i.e. Stored as rows rather than columns. By doing this I have to create a Bridge table (well I think it's a Bridge) called DimReportRun. This is used to link DimParameter tbl with the Fact tbl (FactReportData). The other option here I think is to store all parameters in an XML column in the FactReportData tbl and so avoid the need for DimReportRun or DimParameter?

    Design A:

    My other option I believe, is to create DimParameter to store each parameter name as a column (Design B). This would mean 27 columns required for the parameter names.

    Design B:

    I have done a list of advantages/disadvantages of each Design (I imagine there are more to add). But is there a better way of designing this? I would be very grateful for any advice/suggestions.

    DesignA:

    DesignB:

    Thanks!

  • Kudos for the detailed information in this post!

    Before deciding on the best design, I'd recommend that we first consider how this data will be used by the end users (business). The main goal of the data warehouse is not to simplify (or complicate for that matter) the design, but rather let the business requirements drive your design. 

    Can you provide more info on that?

  • Unfortunately this has not yet been determined. We are looking to design and plan for all possible requirements.

    The DW data is likely to be used in Excel macros or in SSRS reports. All the 3rd party report data will be required, but it is also likely that we would need to display in the SSRS report what parameters were used to fetch the report data.

  • kevin.obrien 66193 - Friday, February 16, 2018 10:49 AM

    Unfortunately this has not yet been determined. We are looking to design and plan for all possible requirements.

    The DW data is likely to be used in Excel macros or in SSRS reports. All the 3rd party report data will be required, but it is also likely that we would need to display in the SSRS report what parameters were used to fetch the report data.

    From a reporting perspective, how do you anticipate the parameters to be used? Is it simply something people would want to see, i.e. the list of parameters and values used for a given report? Or would it be necessary to return let's say all reports that were executed with a particular value for a specific parameter? 

    Also, do you anticipate the parameter list to be similar for all reports...or would it vary significantly based on the report. What is the likelihood that the list of possible parameters will grow over time?

  • Martin Schoombee - Friday, February 16, 2018 10:58 AM

    From a reporting perspective, how do you anticipate the parameters to be used? Is it simply something people would want to see, i.e. the list of parameters and values used for a given report? Or would it be necessary to return let's say all reports that were executed with a particular value for a specific parameter? 

    Also, do you anticipate the parameter list to be similar for all reports...or would it vary significantly based on the report. What is the likelihood that the list of possible parameters will grow over time?

    One report could be run differently a number of times (i.e. with different parameters each time). It just depends on the required output. So then we want to ensure the end user can see what report data was run for what parameters.

    There are at the moment 6 different reports and some have similar parameter requirements. But others require different parameters.

    Hope this makes sense 😀

  • Somewhat 🙂

    The point I am trying to make is this: If I am more concerned about seeing report-level information (i.e. which report was run when and by whom), and then sometimes I'd like to see which report parameters were used...then I don't see the need to have report parameters listed individually in either design option. 

    If on the other hand you'd like to be able to analyze let's the frequency of reports executed with a specific value of a specific parameter (i.e. where the PeriodStart parameter is X), then you design would have to account for that and I'd probably go with something similar to Design B (given the fact that report parameters vary from report to report).

  • I'm uncertain whether you need to build a data warehouse or a database.  Based on what I'm seeing, you may want to assume that you're building a database that will support the running of multiple reports.

  • RonKyle - Friday, February 16, 2018 11:54 AM

    I'm uncertain whether you need to build a data warehouse or a database.  Based on what I'm seeing, you may want to assume that you're building a database that will support the running of multiple reports.

    This.

    Just in my own courosity, why exactly are you storing the params of a report and why is SSIS being passed it? I ask because you said "third party" as in this data is from some other business than your own versus tracking users who are running reports in your business. Seems pretty complex and the idea for a data warehouse is to be simplified. 

    Overall I guess it does sound like an database, not a data warehouse.

  • OK. We have decided we do not need to store all parameters in a separate table. Some parameters are required.
    In the below updated design, I have put parameters Fund, PeriodStart, PeriodEnd, PeriodName, AccCalendarName into the DimReportMetaData table. Then I have added a ReportQueryParams column that stores a text value of all parameters used (just for reference).

    I now have 5 Dimension tables and a Bridge table(DimReportMetadata).
    Then the Fact table(FactReportData) is related to the DimReportMetadata bridge tbl.

    Any thoughts on this design?
    In the DimReportMetadata bridge tbl we will likely have rows containing NULL values for PeriodNameKey and AccCalendarNameKey. Is this allowed in DW design?


    Thanks!

  • I would be more inclined to create a proper star-schema that looks something like this: 

    DimReport (ReportKey; ReportName; ...)
    DimFund (FundKey; FundName; ...)
    DimDate (DateKey; ...)
    DimUser (UserKey; UserName; ...)
    DimReportParameter (ReportParameterKey; ReportQueryParams; ...)

    FactReportExecution (ReportKey; FundKey; DateKey_PeriodStart; DateKey_PeriodEnd; UserKey; ReportParameterKey; Quantity; Amount; ReportRunDateTime)

    In my opinion the DimReportParameter table can either be a 1:1 with the fact table, or you could store unique values only as the relationship with the fact table will still make it easy to reference.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply