Reporting from Data Warehouse

  • I recently attended a presentation from a few major companies in my local area where they talked about reporting solutions they were adapting. While most of this seemed to be a clear sales pitch for Tableau reporting, it raised some interesting questions from me as a learning database guy.

    I work primarily with a small business that deals with a lot of data. My role is mostly on the development end of databases and not that of a DBA. I look at these two roles completely separate with different objectives even though I also have to wear that hat due to the limited resources in my organization.

    One thing that struck me as odd was the act of using reporting tools (including statistical based tools like R/SAS) to query the data warehouse directly. I was wondering what may be the pro's and con's of doing so versus creating data marts for those interactions?

    I ask, because I was taught to always create a separate data source for reporting. That means, data marts or cubes are created with read-only access for all end user usage. That way those separate data sources can live anywhere, can be replenished & maintained easier as well provide an extra layer that does not directly impact another layer (i.e.: taxing on the DW ETL).

  • No thoughts on this?

  • Data Mars or OLAP cube usually serve a specific purpose: they provide pre-aggregated data in an (somewhat) easy to browse format.

    However, this does not mean you can't query your DWH directly.

    I have quite some reports in SSRS which report directly on the DWH database. Mainly because they are easier and quicker to create in SQL than in MDX. SQL Server has the capacity to deal with such queries: columnstore indexes, star join optimization and so on.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (8/26/2014)


    Data Mars or OLAP cube usually serve a specific purpose: they provide pre-aggregated data in an (somewhat) easy to browse format.

    However, this does not mean you can't query your DWH directly.

    I have quite some reports in SSRS which report directly on the DWH database. Mainly because they are easier and quicker to create in SQL than in MDX. SQL Server has the capacity to deal with such queries: columnstore indexes, star join optimization and so on.

    See, this is where I get hung up the most.

    Reporting that include a high amount of dimensions is a performance hit because of the large number of joins. It makes more sense to either preaggregate that data into a data mart within the same structure than to query the DW directly. Insert the purpose of OLAP cubes here from Microsoft.

    Then the question comes up, why not always consistently follow the same routine with marts and cubes for all reporting. Why query the DW at all, when you can remove a large amount of traffic from the DW and shove it towards a data mart or OLAP cube instead?

    Maybe I'm just overthinking everything. I just like layers. DW is a layer that is for storage and acts as a data source for data marts and OLAP cubes. Therefore, data marts and OLAP cubes are specifically for reporting that's updated daily from the DW.

  • xsevensinzx (8/26/2014)


    Koen Verbeeck (8/26/2014)


    Data Mars or OLAP cube usually serve a specific purpose: they provide pre-aggregated data in an (somewhat) easy to browse format.

    However, this does not mean you can't query your DWH directly.

    I have quite some reports in SSRS which report directly on the DWH database. Mainly because they are easier and quicker to create in SQL than in MDX. SQL Server has the capacity to deal with such queries: columnstore indexes, star join optimization and so on.

    See, this is where I get hung up the most.

    Reporting that include a high amount of dimensions is a performance hit because of the large number of joins. It makes more sense to either preaggregate that data into a data mart within the same structure than to query the DW directly. Insert the purpose of OLAP cubes here from Microsoft.

    Then the question comes up, why not always consistently follow the same routine with marts and cubes for all reporting. Why query the DW at all, when you can remove a large amount of traffic from the DW and shove it towards a data mart or OLAP cube instead?

    Maybe I'm just overthinking everything. I just like layers. DW is a layer that is for storage and acts as a data source for data marts and OLAP cubes. Therefore, data marts and OLAP cubes are specifically for reporting that's updated daily from the DW.

    OLAP is good for aggregated data, but doesn't perform well for more detailed queries.

    In my opinion, the DW is read-only (except during ETL of course), so it should be able to handle some traffic.

    I believe for canned reports, the choice between SQL or MDX (e.g. SQL Server vs SSAS) is a personal choice. Except if you need some functionality that one tool has and the other hasn't. For self-service browsing, SSAS is preferred.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (8/26/2014)


    OLAP is good for aggregated data, but doesn't perform well for more detailed queries.

    In my opinion, the DW is read-only (except during ETL of course), so it should be able to handle some traffic.

    I believe for canned reports, the choice between SQL or MDX (e.g. SQL Server vs SSAS) is a personal choice. Except if you need some functionality that one tool has and the other hasn't. For self-service browsing, SSAS is preferred.

    Yepyep, should always be read only outside of the ETL system. I think it's more or less how the DW is being used that makes me raise the question. Simplistic or even discovery queries are fine. But, when it comes to very complex queries that include a high amount of dimensions and facts that also include millions or sometimes hundreds-of-millions of records, the DW takes a pretty big hit.

    Due to that, I do everything I possibly can to massage and prep the data in batch after it comes into the DW to optimize those detailed queries. I guess it's more work for me, but my objective is to reduce the downtime for the end user when doing reports. I'm achieving that goal while also reducing the amount of traffic to the DW. So, it all works out in the end.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply