SSAS ROLAP to work to query low level data for large data sets with MDX?

  • svenflock

    Old Hand

    Points: 353

    Sorry, this is strange: this is my third attempt, as my posts are not visible. Let's give it another try. Has anybody else had these issues after editing the original post?

     

    Hi all,

    First of all, I'm new to the sqlservercentral.com community, and I looked through several forum posts before deciding to post my own topic :-).

    I am a new employee at a research facility, responsible for all data-related topics. I don't do research myself; I am the engineer for the upcoming data platform that has to meet all of the researchers' requirements. The most important requirements are:

    • self-service (pick the fields I am interested in without needing knowledge of the underlying data structure)
    • in almost all cases data is required at the lowest level (meaning that the researchers don't work with aggregates but want to run statistics on the previously picked data)
    • queried data sets have up to 50-200 million records

    I have seen a possible solution in production at my former employer. They had a star schema modelled on an Oracle EE 11.2 RDBMS with MicroStrategy on top in ROLAP mode. Dimensions, attributes and measures were defined in MicroStrategy, and the user picked the fields in the GUI to build the report. A native SQL statement was then generated by MicroStrategy and sent to the database (I have no idea if the GUI translated to MDX first and then further to SQL).

    My current employer has a very strict software policy, as we are bound to Microsoft solutions (version 2016 with the latest SPs and KBs installed). Having never worked with SSIS, the database engine or SSAS before, I did a lot of reading, but I didn't find any answers to my use case. What I have learnt so far is that I must use ROLAP as the storage type for the measure group and the dimensions, because we have 20 dimensions, some of them with 500 million records (as they are on a person level).

    In the Oracle/MicroStrategy use case I have seen that this can work (the report is automatically exported as a csv file).

    As I have no real-world practice with Microsoft, I wanted to know if ROLAP will work with several dimensions, two of them reaching two- to three-digit millions of records. In terms of output, my resulting table will generally have 40-50 columns (coming from several dimensions) with a very large number of records (up to 50-200 million). Is there some limitation in the MDX language for that number of objects?

    SSAS provides the definition of dimensions and measures, and the user only deals with metadata. I don't want to give up on that :-). I do not need any aggregation features, which are the original intent of OLAP with pivoting, but data at the primary-key level of the fact table.

    What do you think?

    Another thing, from reading the other topics: to extract an SSAS query to csv I have to go through SSIS, don't I? Or did that change and become nicer in the newer versions?

    Is there something comparable for reporting, like MicroStrategy, in the SSAS world?

     

    Thanks for reading my story 🙂


  • Martin Schoombee

    SSCoach

    Points: 19010

    Ok...let's try again. Interesting requirements.

    Here's my two cents based on what you've described:

    In theory an Analysis Services cube (multidimensional) in ROLAP mode (or a Tabular cube in DirectQuery mode) will work if you throw enough resources at it, but I question the use of SSAS if aggregation is never an end goal. If the end goal is a granular-level csv file, you might as well just use a reporting tool that executes a query against the back-end database (dimensional model) and returns a csv as output.
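    To illustrate how little plumbing the csv step itself needs once any client library has executed the query: a minimal Python sketch (the in-memory sample rows stand in for a real database cursor, e.g. from pyodbc, which is an assumption here, not something from this thread):

    ```python
    import csv

    def rows_to_csv(path, headers, rows):
        # Stream query results to a csv file; `rows` can be any iterable,
        # e.g. a pyodbc cursor, so the full result set never has to sit
        # in memory at once.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(headers)
            writer.writerows(rows)

    # In-memory sample data instead of a live connection:
    rows_to_csv("sales_extract.csv",
                ["customer", "amount"],
                [("Alice", 100.0), ("Bob", 250.5)])
    ```

    Because the rows are streamed, an approach like this scales to very large extracts without holding the result set in memory.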

    That being said, I can understand why a cube was used, as it provided the semantic layer for an end user to pick the fields they wanted to see without understanding the underlying data model.

    Before deciding on a solution, I would want to ask what the end game is for the end users. What do they do with the csv files? Are they going to use them as input to some other statistical tool (e.g. Python, R Studio, etc.) and just need a means to filter the data set? Or are they using some other tools to eventually aggregate the data, and what are those?

  • svenflock

    Old Hand

    Points: 353

    Hi Martin,

    thanks for your reply.

    Reporting files can be provided as csv, but also as a classical table in MS SQL Server, which is the source for the csv. R, SAS and SPSS will later be used by the scientists to access the data; there is no direct link between the statistical tools and the SSAS infrastructure. No more filtering should be required, as this is done beforehand on the SSAS side as a filter condition. Scientists will definitely compute further statistics (aggregations), e.g. quantile calculations or regressions. As every scientist does different data exploration, we do not know in advance what they want to do. It's not like in finance, where we have well-proven KPIs or sales figures which are aggregated as a sum.

    Based on my requirements, and sticking to Microsoft tools only: would you recommend going with Tabular in DirectQuery mode (so that not all data has to fit in memory) or with ROLAP? What is the most efficient way to extract that data from SSAS? SSIS? I like the "Browse" functionality of SSAS, as it is like MicroStrategy: you just drag the fields and push the button to get the report. Is there a way to save the results of that Browse functionality? I can code against an API, just in case there is one.

    If you were not limited to Microsoft: which reporting tool providing a semantic layer would you recommend? Would that be MicroStrategy, as I already know it from my last employment? Or is there some other tool for that (open source is an option too)?

    Thanks again!

  • Martin Schoombee

    SSCoach

    Points: 19010

    It's a tricky situation. SSAS provides the semantic layer you want, but if your goal is to provide granular-level data in a csv format then you would either have to go through a reporting tool or Excel. Excel is the primary tool for ad-hoc cube exploration at this point (using Pivot tables), but that will require some work to get into a classic table format which you can export to csv. Your data volumes would also mean that you cannot use a Pivot table directly, so end users would have to use Power Pivot in Excel to overcome that limitation. Don't know about you, but that's a lot of work just to get a data dump in csv format.

    Report Builder in SSRS gives you the ability to create your own reports from a data set, save and export it to csv if you'd like...but given recent developments I think it's on its way out the door (my opinion) and would probably be replaced by paginated report functionality in Power BI. If you're planning on hosting something on premises then Power BI Report Server could be an option.

    If you're going to go with SSAS, then I would opt for a Tabular cube, and DirectQuery mode would provide similar functionality to what you've had before. In my opinion, though, SSAS would be overkill if your goal is just to provide a csv extract, unless you have multiple reporting tools that would need that semantic layer.

    If the goal is to provide detailed data to almost any reporting tool, I'd consider providing direct access to the underlying database (through views if obfuscation is required) as most reporting tools would have the ability to connect to a relational database. Users would most likely also have the ability to filter the data in whatever reporting tool they have, but this still doesn't give you a good solution for generating csv files quickly and easily.

    We've also not talked about cloud options and I don't know these in great detail, but I'm wondering if the recent developments with Data Lake and Data Catalog technologies could be an option here. I'm not sure and I don't necessarily want to jump onto that bandwagon immediately.

    In terms of other tools, SAP's Business Objects and IBM's Cognos come to mind as having similar functionality to what you've had with MicroStrategy. Qlik could also be an option, but I don't know much about it.

     

    To sum it all up, for me it would depend on the primary end user objectives. SSAS gives you that semantic model and an option for ad-hoc analysis in Excel, but would not provide an easy mechanism to extract large volumes of data into a csv. Reporting tools will provide that functionality, but the amount of work required needs to be evaluated in terms of cost vs. benefit and whether it wouldn't just be easier to give access to the underlying data directly.

    Apologies for the lengthy post. Hope others would join in and provide their thoughts/comments as well.

  • svenflock

    Old Hand

    Points: 353

    Hi,

    thanks for your detailed feedback and experience, no apologies needed :-).

    I used the weekend to build up a development environment at home to try your suggestions.

    Here are my results:

    • Power Pivot: the Cube Browser feature is a common GUI. I was able to build the report and get results in the Power Pivot GUI. However, I was not able to save the results from that GUI. I had to return to Excel, and then I was limited by the 1-million-row issue.

      Could you give me more insight into how to get the data from Power Pivot straight into an external file? I googled and didn't find any solution for that.

    • I didn't know Report Builder either. It seems like a light edition of SSRS. Again, the Cube Browser can be used to build the report. The funny thing is that afterwards I have pivoting options again, so it's a little bit like Inception :-). Maybe, if I go for that solution, I'll create such a large master report from the star schema directly. I can define field names, too. As you told me, it seems like a dying thing; the last version of that tool is from 2016. SSRS, however, is still shipped with the current SQL Server.
    • Power BI Desktop seems like Tableau or QlikView, but from Microsoft. What I didn't understand from your reply: does Power BI Report Server already support csv export?
    • I went with ROLAP mode in multidimensional as a first test (using the AdventureWorks cube). I combined several dimensions and up to 20 columns. I designed the report using the Cube Browser (I really do like that tool :-)), copied the MDX, and then used SSIS to build a flow from cube to SQL. As it was my first time with SSIS and those sources, setting up the flow was quite frustrating: ADO.NET didn't work at all, and OLE DB didn't allow any preview of the data. After 4 hours, I got a basic flow from cube to csv. A target table in SQL Server took too much time, as I had to convert all fields to valid database data types; this is too much work to repeat for each report. I was quite surprised by the performance on an ordinary notebook using only a single core: I got 5 million records dumped in 6 minutes at the lowest level of the fact table (internet sales).

      This option is only ok for me if I can generate the SSIS flow package automatically with C#/VB.NET or some other language. Do you know if that is possible?

    • Cloud is not an option due to highly sensitive data and, of course, German data protection law. Not a good idea 🙂
    • I still have to try out the other reporting tools; no idea if I can get test licenses. I know Qlik(View); it behaves similarly to Report Builder: only a limited number of rows are displayed on the screen, and once you do the export, all data is queried. As it is in-memory, all filters must be applied during the load script. It could be an option, too, for small data requests.
    • The semantic layer is definitely a good thing as I was able to use it out of the box in different tools during my experiments.
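    (To sketch the automation question above: programmatic package authoring is possible through the .NET `Microsoft.SqlServer.Dts.Runtime` API, and an existing parameterized package can also be driven from any script via the `dtexec` command line. In the hedged Python sketch below, the package path and variable name are made-up placeholders, not a real package.)

    ```python
    # Sketch: build the dtexec command line to run an existing SSIS
    # package with a per-run output file. dtexec ships with SQL Server;
    # package path and variable name here are illustrative only.
    def dtexec_command(package_path, out_file):
        return [
            "dtexec",
            "/F", package_path,
            "/SET", rf"\Package.Variables[User::OutFile].Value;{out_file}",
        ]

    cmd = dtexec_command(r"C:\etl\cube_to_csv.dtsx", r"C:\out\report.csv")
    print(" ".join(cmd))
    # On a machine with SSIS installed this could be executed with
    # subprocess.run(cmd, check=True).
    ```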

     

    Next on my agenda is to try the real data with Tabular in DirectQuery mode. If I can automate the SSIS package and the performance is ok, I will go for that option.

    Looking forward to your input.

  • Martin Schoombee

    SSCoach

    Points: 19010

    Yes, using Excel (or Power Pivot for that matter) just to get to a CSV will not be a good solution...not at the data volumes you're looking at.

    Report Builder could be an interim solution, but will likely be replaced by the paginated report feature in Power BI (still in preview and I've not seen an export to CSV). In that case, going to Power BI Report Server will be your only choice, and nobody knows at this point what the limitations will be.

    SSAS will be a great tool for data exploration, but again not a good solution if your end goal is a csv file. The more I read your comments and think about it, the more I come to the following conclusion:

    If your end users are versed in writing T-SQL, I'd be inclined to give them access to the database directly (or through a series of views that obfuscate where necessary). In SSMS you are able to extract your results directly into csv (or flat-file) format. If you need more control, I'd recommend a simple web front-end (using your choice of language) that lets a user select a series of attributes and filters, then generates the T-SQL to execute and uses SSIS to extract the data to a csv file in an automated fashion. There will be some work required to get the internal plumbing working, but there just isn't an elegant way to get what you want easily by using end-user tools only.
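    As a rough sketch of that front-end idea (a hedged illustration; all table and column names below are invented, not from a real model), the attribute-to-column mapping could live in a small dictionary and the T-SQL could be assembled from the user's selection:

    ```python
    # Sketch of the front-end's query generator: the semantic layer is a
    # plain mapping from user-facing attribute names to columns, plus a
    # fixed join skeleton. All table/column names are illustrative.
    ATTRIBUTES = {
        "customer_name": "c.FullName",
        "order_date": "d.FullDate",
        "sales_amount": "f.SalesAmount",
    }

    JOINS = (
        "FROM dbo.FactInternetSales AS f\n"
        "JOIN dbo.DimCustomer AS c ON c.CustomerKey = f.CustomerKey\n"
        "JOIN dbo.DimDate AS d ON d.DateKey = f.OrderDateKey"
    )

    def build_select(picked, filters=None):
        # `picked` is a list of attribute keys; `filters` maps attribute
        # keys to values, emitted as `?` placeholders so the caller can
        # pass them as query parameters (never concatenate user input).
        cols = ",\n       ".join(f"{ATTRIBUTES[a]} AS [{a}]" for a in picked)
        sql = f"SELECT {cols}\n{JOINS}"
        params = []
        if filters:
            sql += "\nWHERE " + " AND ".join(
                f"{ATTRIBUTES[a]} = ?" for a in filters)
            params = list(filters.values())
        return sql, params

    sql, params = build_select(["customer_name", "sales_amount"],
                               {"order_date": "2016-01-01"})
    print(sql)
    ```

    A generated statement like this can then be handed to SSIS (or any database client) for the actual csv extraction.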

    Hope this helps. Please let me know if I didn't answer all your questions.

     

  • svenflock

    Old Hand

    Points: 353

    Hi Martin,

    sorry for keeping you waiting so long, and thanks for contributing to the discussion. I've had some time to get Power BI onto my dev station, though I haven't had the time to play with it yet. I know I have difficult requirements :-).

    I like your suggestion of the front-end. If we can't get something like MicroStrategy, it is definitely worth a try. The users do know SQL, but with several dimensions and facts in place, time should not be spent writing the same code every time.

    However, I have one open question on my side: is there a way to get the auto-generated SQL statement when sending a DAX or MDX query to SSAS in DirectQuery or ROLAP mode, respectively? I didn't find anything in SSMS. Is there something like a tracer, or even better an API/code call, to extract the query?

    Thanks again!

  • Martin Schoombee

    SSCoach

    Points: 19010

    No worries 🙂

    To answer your question: Yes there is. For SSAS Tabular and Power BI, you could use DAX Studio which is freeware. You should also be able to run a Profiler Trace (which you can launch from SSMS) and connect it to the SSAS instance to trace queries.

    I will caution you, however, that the queries generated by front-end tools are not always the best, and I'd advise building your own if you plan to generate queries dynamically.

  • svenflock

    Old Hand

    Points: 353

    Hi,

    thanks for the hint about the bad auto-generated queries. I was asking because I had another idea in mind: the user builds and tests the report in DAX Studio or the Cube Browser, then extracts the underlying SQL query, which is then pasted into a separate extraction tool :-).

    DAX Studio only provides the MDX/DAX query, doesn't it? For the real SQL I have to use a Profiler trace, don't I?

     

    Thanks again!

  • Martin Schoombee

    SSCoach

    Points: 19010

    Yes and yes. In the case of direct query, I'm not sure if DAX Studio will show the actual SQL executed against the source database, but a Profiler Trace definitely will.

  • svenflock

    Old Hand

    Points: 353

    Thanks, I will test this :-).

     
