Seeking Expert Advice For Reporting

  • Greets!

    I'm an accidental DBA who's helping design a new BI solution for my current employer. Although this sounds kind of scary, I do have 7 years of software development experience behind me. The original intention was for me to simply manage the project, but I've taken a heavier interest in helping design it too. Thus, I have a few questions for the BI community.

    My current database is a well-structured OLAP data mart in a snowflake schema. I'm dealing with about 1 GB of data a day that has to be imported into this data mart and processed for reporting. The reporting process is the area where my questions for this thread lie.

    I'm stuck on the best option for processing raw data into aggregated data for reporting.

    For example, let's say there is a request to develop a few canned reports that will be used on a daily basis and that aggregate this data when it's available for the day. My solution is to develop a stored procedure that pulls all the data needed, aggregates it, and dumps the results into a physical table. Then I create a new SSRS report that queries that table for the data it needs to display to the end user.

    I chose this solution for a few reasons. The first is that the data source is in its rawest form. There are many fact tables, with many dimensions used for each query. Certain elements of that data need to be transformed before they end up in the end user's hands. Therefore, if I process the data before the end user queries it, I can make those transformations with an ETL process, before aggregation is even done, within a new schema just for reporting. The second reason is speed. Due to the amount of data being aggregated, simply setting up a view or aggregating the data at query time takes some time. If I preprocess this aggregation overnight and dump the results into a table, then selecting the result is lightning fast.
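
    Roughly, the pattern looks like this (a minimal sketch; all schema, table, and column names below are made up for illustration):

    -- Nightly job: rebuild a pre-aggregated reporting table from the raw data.
    CREATE PROCEDURE report.usp_LoadDailySalesSummary
    AS
    BEGIN
        SET NOCOUNT ON;

        TRUNCATE TABLE report.DailySalesSummary;

        INSERT INTO report.DailySalesSummary (SaleDate, RegionName, ProductCategory, TotalAmount, TxnCount)
        SELECT  f.SaleDate,
                r.RegionName,
                p.ProductCategory,
                SUM(f.Amount),
                COUNT(*)
        FROM    raw.FactSales  AS f
        JOIN    raw.DimRegion  AS r ON r.RegionKey  = f.RegionKey
        JOIN    raw.DimProduct AS p ON p.ProductKey = f.ProductKey
        GROUP BY f.SaleDate, r.RegionName, p.ProductCategory;
    END;
    GO

    -- The SSRS dataset then only touches the pre-baked table:
    -- SELECT SaleDate, RegionName, ProductCategory, TotalAmount, TxnCount
    -- FROM report.DailySalesSummary
    -- WHERE SaleDate = @ReportDate;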

    That all said, is there a better way? Is this the right way? Am I being silly?

    I still feel good about this approach because I'm basically using the same methods as an OLAP cube. I define a cube, bake the sucker (aggregate the data), and it's ready to be queried by the end user for reporting. I'm just not using the technology yet. But I want to. However, that's really part of my question too! :hehe:

    Current Technologies:

    SQL Server 2008 R2

    SSIS/SSAS/SSRS

    P.S. for Forum Admins

    I posted this in the BI section, but I didn't realize that was for the SQL 2005 forums. Please remove that one.

  • Any reason you are not using SSAS for this?

    Aggregate fact tables can be useful, but make sure you are not creating a new table for every new report.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (4/7/2014)


    Any reason you are not using SSAS for this?

    Aggregate fact tables can be useful, but make sure you are not creating a new table for every new report.

    +1

    Although it is usually best to run SSAS on a separate server.

    You are trying to mimic a cube, yes, but think of the flexibility of the end user being able to mix and match measures, dimensions, and filters.

    You can build SSRS reports off a cube, just like the tables you are creating today.
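
    For example, an SSRS dataset against a cube is just an MDX query; a minimal sketch (the cube, measure, and dimension names here are made up):

    SELECT
        { [Measures].[Total Amount] } ON COLUMNS,
        NON EMPTY [Dim Date].[Calendar].[Date].MEMBERS ON ROWS
    FROM [SalesMart]
    WHERE ( [Dim Region].[Region].&[North] );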

  • Koen Verbeeck (4/7/2014)


    Any reason you are not using SSAS for this?

    Aggregate fact tables can be useful, but make sure you are not creating a new table for every new report.

    Thanks for the response.

    I didn't specify this in detail, but the data we are importing and aggregating is not controlled by us. It's delivered to us as a data mart in flat files every day. This includes fact and dimension flat files that our ETL system imports into our database.

    Because we don't have control over this data, we are keeping the data source in its raw, untouched format. So there is very little transformation happening as the data is loaded into the database.

    When that's all said and done, we want to transform the data based on the business (reporting) requirements, but we do not want to touch the raw data. Therefore, we transform the data through aggregation as we dump the result set into a new table. This allows us to have untouched data in one schema and transformed reporting data in another schema. Not ideally a table per report, but report sources that are pre-aggregated and denormalized.
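
    In other words, something like this (a sketch only; the schema names are illustrative):

    CREATE SCHEMA raw;      -- flat-file data loaded exactly as delivered, never modified
    GO
    CREATE SCHEMA report;   -- transformed, pre-aggregated, denormalized report sources
    GO

    -- The nightly aggregation procedures read from raw.* and write only into report.*,
    -- so all business transformations live in that step rather than in the load of the raw data.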

    On your question, the reason we have not used SSAS, and specifically OLAP cubes, is that we have little experience with them. We want to, but our aggregations are not just totalling up sales per region per quarter either. We have pretty intense aggregations with a lot of complex logic that, from MY understanding, would need to be converted to work in SSAS. But that's really the question:

    Is aggregating the data and dumping the results into a physical table a crazy option when SSAS is available?

  • Greg Edwards-268690 (4/7/2014)


    Koen Verbeeck (4/7/2014)


    Any reason you are not using SSAS for this?

    Aggregate fact tables can be useful, but make sure you are not creating a new table for every new report.

    +1

    Although it is usually best to run SSAS on a separate server.

    You are trying to mimic a cube, yes, but think of the flexibility of the end user being able to mix and match measures, dimensions, and filters.

    You can build SSRS reports off a cube, just like the tables you are creating today.

    Oh, I know, but the reports we run are static. They never change in terms of dimension usage. It may also confuse the end users more than help them if that flexibility is available too.

    BTW, how big can cubes be? How are they stored? Could I build a cube with 30 dimensions and 100 million records without breaking the system? Or are you guys suggesting small chunks of data in a cube?

  • xsevensinzx (4/7/2014)


    Koen Verbeeck (4/7/2014)


    Is aggregating the data and dumping the results into a physical table a crazy option when SSAS is available?

    Depends. If you don't have much experience with SSAS and MDX and there is some really complex logic going on, staying in the relational database might be more cost-effective from a development point of view. As long as the performance is acceptable using the aggregated tables, it's more of a personal choice.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • xsevensinzx (4/7/2014)


    BTW, how big can cubes be? How are they stored? Could I build a cube with 30 dimensions and 100 million records without breaking the system? Or are you guys suggesting small chunks of data in a cube?

    Cubes can get really big, but you need some experience handling them.

    SSAS should be able to handle 30 dimensions and 100 million rows easily.

    The biggest SSAS cube out there is a few terabytes big.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (4/7/2014)


    xsevensinzx (4/7/2014)


    Koen Verbeeck (4/7/2014)


    Is aggregating the data and dumping the results into a physical table a crazy option when SSAS is available?

    Depends. If you don't have much experience with SSAS and MDX and there is some really complex logic going on, staying in the relational database might be more cost-effective from a development point of view. As long as the performance is acceptable using the aggregated tables, it's more of a personal choice.

    Yeah, that's the fear. Aggregating the data using T-SQL is easier and more cost-effective right now. I'm the accidental DBA here, but I am a programmer by nature, so picking up MDX is not a huge issue, but it will require more time.

    As the system is batch, there is plenty of time to process the data, just as if a cube were being processed overnight. But the idea of a cube is very attractive too. When I asked one of my DBA friends, he said cubes were resource-intensive, needed loads of time to populate (bake), and that there were alternatives out there like columnar data stores. I assume he was referring to HP Vertica.

  • xsevensinzx (4/7/2014)


    Koen Verbeeck (4/7/2014)


    xsevensinzx (4/7/2014)


    Koen Verbeeck (4/7/2014)


    Is aggregating the data and dumping the results into a physical table a crazy option when SSAS is available?

    Depends. If you don't have much experience with SSAS and MDX and there is some really complex logic going on, staying in the relational database might be more cost-effective from a development point of view. As long as the performance is acceptable using the aggregated tables, it's more of a personal choice.

    Yeah, that's the fear. Aggregating the data using T-SQL is easier and more cost-effective right now. I'm the accidental DBA here, but I am a programmer by nature, so picking up MDX is not a huge issue, but it will require more time.

    As the system is batch, there is plenty of time to process the data, just as if a cube were being processed overnight. But the idea of a cube is very attractive too. When I asked one of my DBA friends, he said cubes were resource-intensive, needed loads of time to populate (bake), and that there were alternatives out there like columnar data stores. I assume he was referring to HP Vertica.

    SQL Server has columnstore indexes as well.
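
    A minimal example of the syntax, for reference: this requires SQL Server 2012 or later, and in 2012 a nonclustered columnstore index makes the table read-only (the table and column names below are made up).

    CREATE NONCLUSTERED COLUMNSTORE INDEX IX_FactSales_cs
    ON raw.FactSales (SaleDate, RegionKey, ProductKey, Amount);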

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Yeah, I saw that with 2012. We're currently on 2008 R2.

  • We ran a 50 GB cube, with 500+ measures and over 300 dimensions, on a 32 GB server.

    The largest fact table was 25M+ records, but as you can see, it was very rich with data.

    Processing time can depend on several factors.

    Ours with a full process was 2.5 hours.

    Column store (the Tabular model) and traditional SSAS (Multidimensional) have some differences that may be important.

    Security is one. MDX or DAX is another.

    You can also look at splitting some of the logic for complex measures. Some in ETL, the rest in the cube.

    Yes, there is a learning curve to SSAS.

    You may also want to check with your users on what they are doing with the data.

    Sometimes they are doing much more than you imagine.
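
    As an example of splitting the logic: the heavy row-level work can be pre-computed into plain additive measures during ETL, with only the final calculation defined in the cube as an MDX calculated member (a sketch; the measure names are hypothetical):

    -- In the cube's MDX script: the ratio is defined once, on top of
    -- additive measures (Net Amount, Sales Amount) that the ETL already produced.
    CREATE MEMBER CURRENTCUBE.[Measures].[Net Margin %]
    AS
        IIF([Measures].[Sales Amount] = 0,
            NULL,
            [Measures].[Net Amount] / [Measures].[Sales Amount]),
    FORMAT_STRING = 'Percent',
    VISIBLE = 1;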

  • Greg Edwards-268690 (4/7/2014)


    We ran a 50 GB cube, with 500+ measures and over 300 dimensions, on a 32 GB server.

    The largest fact table was 25M+ records, but as you can see, it was very rich with data.

    Processing time can depend on several factors.

    Ours with a full process was 2.5 hours.

    Column store (the Tabular model) and traditional SSAS (Multidimensional) have some differences that may be important.

    Security is one. MDX or DAX is another.

    You can also look at splitting some of the logic for complex measures. Some in ETL, the rest in the cube.

    Yes, there is a learning curve to SSAS.

    You may also want to check with your users on what they are doing with the data.

    Sometimes they are doing much more than you imagine.

    Awesome, thanks for the stats. Those are very attractive processing times. That's similar to what I'm doing now with my system, which uses all stored procedures. The cube, however, would provide much more than I could. That's why I ask. 🙂

    The users, however, are doing very complex operations with the raw data right now. The reason I'm doing what I'm doing is so we can offload some of those heavy aggregations to the server instead of relying on the team to crunch the numbers. The more the back end can do, the more time the team has to analyze the insights. That's the goal.

  • One of the better reasons to crunch numbers in your process is that the logic is consistent.

    Nothing is worse than when someone from above asks a question and then gets multiple different answers.

    Besides the confusion, multiple people are doing redundant work.

    Better to work together defining the source.

    I remember the first cube we processed with SQL 2000.

    We couldn't believe it could finish that much quicker than our procs - similar to what you are doing.

    Not to say you won't have some challenges - but you should find it well worth the effort.

  • Greg Edwards-268690 (4/7/2014)


    One of the better reasons to crunch numbers in your process is that the logic is consistent.

    Nothing is worse than when someone from above asks a question and then gets multiple different answers.

    Besides the confusion, multiple people are doing redundant work.

    Better to work together defining the source.

    I remember the first cube we processed with SQL 2000.

    We couldn't believe it could finish that much quicker than our procs - similar to what you are doing.

    Not to say you won't have some challenges - but you should find it well worth the effort.

    Oh, I wouldn't mind moving over to a cube. It's just that the cube is effectively a new data source for reporting. This means more development work: building a cube, making new reports, and so forth. The previous design was simple. How are you reporting on this data? We can extract the data, transform it based on business logic, aggregate it, and then load it into a physical table for reporting (ETAL, baby).

    Simple, easy, done. Just select the data, it's prebaked.

    If we owned the data source, I could effectively transform that data before it even hit the cube, because ideally a data warehouse would exist by then. But that step of creating a warehouse hasn't happened yet. I must keep the data raw, transform it elsewhere in case of fire, and then feed it into a cube. However, most of the logic applies to dimensions only, so duplicating the dimensions (a small data footprint) may be a good option to feed into a cube, if not just defining views for the cube as well.
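
    For the dimension side, that could just be a view per dimension in the reporting schema that applies the business logic, which the cube's data source view then points at (a sketch; all names are illustrative):

    CREATE VIEW report.vDimCustomer
    AS
    SELECT  CustomerKey,
            UPPER(LTRIM(RTRIM(CustomerName)))        AS CustomerName,  -- example cleanup for reporting
            COALESCE(NULLIF(Segment, ''), 'Unknown') AS Segment        -- example business rule
    FROM    raw.DimCustomer;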

  • xsevensinzx (4/7/2014)


    Greg Edwards-268690 (4/7/2014)


    One of the better reasons to crunch numbers in your process is that the logic is consistent.

    Nothing is worse than when someone from above asks a question and then gets multiple different answers.

    Besides the confusion, multiple people are doing redundant work.

    Better to work together defining the source.

    I remember the first cube we processed with SQL 2000.

    We couldn't believe it could finish that much quicker than our procs - similar to what you are doing.

    Not to say you won't have some challenges - but you should find it well worth the effort.

    Oh, I wouldn't mind moving over to a cube. It's just that the cube is effectively a new data source for reporting. This means more development work: building a cube, making new reports, and so forth. The previous design was simple. How are you reporting on this data? We can extract the data, transform it based on business logic, aggregate it, and then load it into a physical table for reporting (ETAL, baby).

    Simple, easy, done. Just select the data, it's prebaked.

    If we owned the data source, I could effectively transform that data before it even hit the cube, because ideally a data warehouse would exist by then. But that step of creating a warehouse hasn't happened yet. I must keep the data raw, transform it elsewhere in case of fire, and then feed it into a cube. However, most of the logic applies to dimensions only, so duplicating the dimensions (a small data footprint) may be a good option to feed into a cube, if not just defining views for the cube as well.

    We had a mature DW.

    Reporting: SSRS, ProClarity, Excel, SharePoint/PerformancePoint. So users had everything from starting points to explore, to canned reports, to ad hoc capability. Lots of options for the users.

    One main cube, with perspectives for over a dozen Roles.

    So it could be daunting to those who had all Roles.

    Balance the work done today by both you and the users vs. adding SSAS.

    Many times the big hole is that no one wants to add IT resources, but the work the users are doing on the side more than makes up for it.

    Especially if a couple of the power users become part of the solution.
