Business logic

  • With a typical DW/BI stack we have:

    Source system

    Data warehouse tables (3NF schema)

    Data mart (star schema)

    Cube (multi-dim)

    SSRS Reports

    Dashboards

    The Source system is locked. I'd prefer to keep extra business logic/calculations in the Data Warehouse / DataMart.

    This way the output from such items can be presented in reports/dashboards even if they use the DataMart or Cube as a source.

    Having said that, some calculations are better suited to the cube (in the MDX script), especially when a different fact grain is required.

    With a large user deployment and a dynamic business there is plenty of change. When under pressure, developers can be tempted to store business logic/transforms in the report (with a named query). I feel this is the worst case, as it can lead to duplication of logic and is a nightmare to maintain.

    I'm interested in what others have done to manage a deployment over the long term to keep maintenance to a minimum. I suppose part of this is setting the expectation with the business that reports can't always be created the following day, to enable a more measured approach in deciding where and how to implement changes to support reporting.

    The spread of the stack gives us great flexibility, but this comes with the risk of creating a maintenance headache.

    How do you decide where to place the rules/calculations?

  • For consistency and ease of maintenance, you only want a calculation to be performed once. How far up the stack a particular computed attribute is introduced depends on who or what needs it. For example, will end users ever report directly against the star schema, or will they always access the DataMart via the presentation layer?

    If you're unsure, then look to Kimball for inspiration.

    http://www.kimballgroup.com/category/articles-design-tips/

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
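    As a minimal sketch of the "perform it once, as low in the stack as it is needed" idea, the calculation can be defined at the warehouse level so the mart, cube, and any direct reports all see the same value. Table and column names here are hypothetical:

    ```sql
    -- Hypothetical example: define a margin calculation once as a
    -- persisted computed column on the fact table. Every consumer
    -- (mart load, cube, direct reports) reads the same stored value,
    -- so the rule cannot drift between layers.
    ALTER TABLE dbo.FactSales
        ADD GrossMarginAmount AS (SalesAmount - CostAmount) PERSISTED;
    ```

    A non-persisted computed column or a view would work equally well; PERSISTED simply trades storage for read-time cost and allows the column to be indexed.
    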

  • If you have a lot of calculations that return the same answer, a lot of people calling the same reports, and being "this instant up-to-date" just isn't necessary, consider materialising pre-aggregated data, so the aggregation itself is also performed in only one place.
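    One way to sketch that materialisation in SQL Server is an indexed view, which stores the aggregate and keeps it maintained automatically. Object names are hypothetical, and the summed column is assumed non-nullable (a requirement for indexed views):

    ```sql
    -- Hypothetical sketch: pre-aggregate daily sales once as an
    -- indexed view. SCHEMABINDING, two-part names, and COUNT_BIG(*)
    -- are all required by SQL Server for indexed views.
    CREATE VIEW dbo.vSalesByDay
    WITH SCHEMABINDING
    AS
    SELECT  DateKey,
            SUM(SalesAmount) AS TotalSales,
            COUNT_BIG(*)     AS RowCnt
    FROM    dbo.FactSales
    GROUP BY DateKey;
    GO
    -- The unique clustered index is what actually materialises the view.
    CREATE UNIQUE CLUSTERED INDEX IX_vSalesByDay
        ON dbo.vSalesByDay (DateKey);
    ```

    A plain summary table refreshed by the ETL batch is the simpler alternative when the "this instant" freshness genuinely isn't needed.
    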

    I'll also say, to the chagrin of many, that if you need to do something with the data according to business rules, it's often best to do it in the language of the data, without breaking the idea that you have three layers that should remain mostly separate. However, wherever you put the business logic, it would be good to concentrate on two things...

    1. It should exist in only one place but still be usable in multiple places, and...

    2. Should anyone take exception to that notion for whatever reason, refer them to the extreme importance of #1 above. Use a bat to drive the point home, if necessary. 😉
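    One way to sketch that "defined once, called from many places" rule in the language of the data is an inline table-valued function; all names below are hypothetical:

    ```sql
    -- Hypothetical sketch: a single inline table-valued function holds
    -- the business rule. The mart load, reports, and ad-hoc queries all
    -- call this one definition instead of re-implementing the rule.
    CREATE FUNCTION dbo.fnNetRevenue (@DateKey int)
    RETURNS TABLE
    AS
    RETURN
        SELECT  CustomerKey,
                SUM(SalesAmount - DiscountAmount - ReturnAmount) AS NetRevenue
        FROM    dbo.FactSales
        WHERE   DateKey = @DateKey
        GROUP BY CustomerKey;
    ```

    Being inline, the function's definition is expanded into the calling query, so the optimizer treats it like a view with a parameter rather than a row-by-row call.
    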

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It really depends. Do you have a dedicated team per level or is this all on your plate?

    Personally, I try not to worry about the logic at the front end of the stack. That's generally in the end users' hands. The only time I worry about what's at the reporting level is if the levels I do manage cannot support the levels above them. For example, if I have 100 different Excel sheets querying the data mart, where non-developer end users are creating Excel formulas to transform the data, there is absolutely no way I can manage that.

    However, if you have actual visualization/report developers who are making these reports for end users, then ideally it's up to them to maintain that logic. The only time it would matter to me, maintaining the levels closer to the data, is when what they are doing can no longer be supported by the data mart, for example. When that happens, you revisit the business requirements of the data mart and see what you can do to support their needs without having to worry about everything above the data mart and beyond.

    That's just what I follow, simply because at some point it is a lot to manage: the source data, the conforming of data into the data warehouse, the business process/logic when building out each data mart, and eventually the cubes and front-end reports for whatever BI tools you are using to get the data into the end users' hands. Thus, I focus more effort on the source, the data warehouse, and the data marts, and try to offload the cubes and reports to someone else while supporting those other teams as best I can with the data warehouse and data mart.

    That being said, unfortunately you cannot bake every piece of logic into the data mart or cube. But you can do your best to support those missed questions as they get discovered later down the road. Other than that, try not to limit the end users by forbidding them from conforming the data on the front end. Let them have that freedom until you can no longer support their requirements, and then step in.

  • Thanks everyone for the excellent points.

    I'm thinking about this now as I manage a team of developers. I will definitely keep to DRY principles, with calculations at the lowest possible level. I know from experience that some things are more suitable to cubes/MDX, but I will keep these to a minimum. I will have to materialise aggregates in the DM, as there are too many calculations for the MDX script, IMO.

    It's great we have so much choice 🙂

    In terms of Excel calculations: the pace of change and flexibility analysts have with Excel is beyond that of the BI developers (day to day), but my view is to bring logic into the DM when it will benefit the wider corporate level or save significant time. I think we have to do this, otherwise the value of the DW will be eroded over time.

    Thanks again.

  • leehbi (10/21/2016)


    Thanks everyone for the excellent points.

    I'm thinking about this now as I manage a team of developers. I will definitely keep to DRY principles, with calculations at the lowest possible level. I know from experience that some things are more suitable to cubes/MDX, but I will keep these to a minimum. I will have to materialise aggregates in the DM, as there are too many calculations for the MDX script, IMO.

    It's great we have so much choice 🙂

    In terms of Excel calculations: the pace of change and flexibility analysts have with Excel is beyond that of the BI developers (day to day), but my view is to bring logic into the DM when it will benefit the wider corporate level or save significant time. I think we have to do this, otherwise the value of the DW will be eroded over time.

    Thanks again.

    I think of it as balance, personally. Good balance does help the stack. I've kind of given up trying to provide all the answers with the data; I just focus on the core ones and let the users help discover new ones. Then it's a decision on where each one goes from there. That's the fun part! :hehe:
