Real-time query access with PDW

2014-03-13

Analysis Services officially supports the Parallel Data Warehouse (PDW) as a data source, for both the Multidimensional model (ROLAP and MOLAP modes) and the Tabular model (In-Memory and DirectQuery modes).  The big benefit of using ROLAP or DirectQuery is that you get real-time query access to the relational data in PDW (as opposed to seeing data only up to the last time the cube was processed) and you don’t have to process the cube.  Just make sure to use clustered columnstore indexes on the PDW tables to improve performance.  With ROLAP you write MDX queries, and with DirectQuery you write DAX queries; in both cases the queries are translated to SQL when they hit PDW.

Keep in mind that PDW is so fast when using clustered columnstore indexes that, if you have a properly defined star schema, you might not even need a cube because results will be returned to the user quickly.  But there are reasons besides performance why you might still want to use a cube (see Why use a SSAS cube?).

An SSAS cube that uses PDW as a data source works just like a cube built on any other data source.  Performance is usually fast because of the clustered columnstore indexes.  The only caveat is that the SQL generated by DirectQuery to pull data from PDW is sometimes not that great (the SQL generated by ROLAP is usually pretty good).

The other thing to note about DirectQuery, which applies to any data source, is that you can’t use PerformancePoint or Excel PivotTables with it.  This is because MDX queries are not supported for a tabular model in DirectQuery mode, only DAX, so you need to use a DAX client like Power View (PerformancePoint and Excel PivotTables generate MDX queries behind the scenes).  Another limitation is that DirectQuery does not cache results like ROLAP does, and some data types are unsupported (geometry, xml, and nvarchar(max)).  Finally, some DAX functions might return different results in DirectQuery mode (see Formula Compatibility in DirectQuery Mode), and two DAX functions are not supported (EXACT and REPLACE).  So ROLAP seems to be the better choice over DirectQuery in many situations.  But if you do go with a tabular model, you may want to look into using a hybrid mode (see Tabular query modes: DirectQuery vs In-Memory and Partitions and DirectQuery Mode).  Definitely go with a DirectQuery tabular model over an in-memory model if your database is 1TB or more.
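As a rough illustration of the data type limitation above, here is a minimal Python sketch that flags columns DirectQuery would not be able to use.  The function name, the column list, and the table are all made up for the example; only the three type names come from the list above, and how you actually collect column metadata from PDW is left out.

```python
# Data types listed above as unsupported in DirectQuery mode.
UNSUPPORTED_DIRECTQUERY_TYPES = {"geometry", "xml", "nvarchar(max)"}


def find_unsupported_columns(columns):
    """Return the (column_name, data_type) pairs DirectQuery cannot use.

    `columns` is an iterable of (column_name, data_type) pairs. Where the
    pairs come from (for example a catalog query) is not covered here.
    """
    flagged = []
    for name, data_type in columns:
        # Normalize case and whitespace so "NVARCHAR (MAX)" still matches.
        normalized = "".join(data_type.lower().split())
        if normalized in UNSUPPORTED_DIRECTQUERY_TYPES:
            flagged.append((name, data_type))
    return flagged


# Hypothetical column list for a fact table (illustration only).
fact_sales_columns = [
    ("SaleId", "bigint"),
    ("Notes", "NVARCHAR(MAX)"),
    ("StoreShape", "geometry"),
    ("Amount", "decimal(18,2)"),
]

print(find_unsupported_columns(fact_sales_columns))
```

Running a check like this before switching a model to DirectQuery gives you a list of columns to drop or convert up front.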

One limit of ROLAP to note is that it does not support parent-child hierarchies.  These generate recursive CTE queries, which PDW does not yet support, so you will have to convert your parent-child hierarchies into level-based (flattened) hierarchies in the cube via this tool.  One improvement is that Distinct Count performance for ROLAP queries is faster if you enable an optimization.  Some other ROLAP limitations against PDW:

  • Auto-cube refresh is not supported
  • Materialized views, also called Indexed views, are not supported
  • Proactive caching is supported only if you use the polling mechanisms provided by Analysis Services
  • Writeback is not supported
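To show what flattening a parent-child hierarchy means in practice, here is a small Python sketch.  It is not the tool mentioned above, just an illustration of the idea: each member gets a fixed number of level columns holding the path from the root down.  The function name, the sample data, and the choice to pad short paths by repeating the member are assumptions made for the example.

```python
def flatten_hierarchy(parent_of, max_levels):
    """Turn {member: parent} into {member: [level1, ..., levelN]}.

    Each list holds the path from the root down to the member. Shorter
    paths are padded by repeating the member (one common convention,
    assumed here) so every row has exactly `max_levels` columns.
    """
    flattened = {}
    for member in parent_of:
        path = []
        current = member
        # Walk up the parent links until we reach a root (parent is None).
        while current is not None:
            if current in path:
                raise ValueError(f"cycle detected at {current!r}")
            path.append(current)
            current = parent_of.get(current)
        path.reverse()  # root first, member last
        if len(path) > max_levels:
            raise ValueError(f"{member!r} is deeper than {max_levels} levels")
        path += [member] * (max_levels - len(path))
        flattened[member] = path
    return flattened


# Hypothetical employee hierarchy: Ann manages Bob and Dee, Bob manages Cid.
employees = {"Ann": None, "Bob": "Ann", "Cid": "Bob", "Dee": "Ann"}

for member, levels in flatten_hierarchy(employees, max_levels=3).items():
    print(member, levels)
```

The flattened rows map directly onto Level1, Level2, Level3 columns in a dimension table, which lets the cube use a regular level-based hierarchy without recursive queries.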

Some things I have learned when using ROLAP against a PDW:

  • Sometimes it is better to have your fact tables as ROLAP, but keep the dimensions as MOLAP
  • Think about using MOLAP for your historical partitions and ROLAP for just your current partition
  • Make sure the measures are BIGINT in the fact tables or MDX aggregates might not work (MDX aggregates use INT by default unless the source is BIGINT)
  • PDW supports hundreds of concurrent users, but if you have thousands of concurrent users hitting the cube it may be better to move the data from the PDW to an SMP data mart and create the cube there
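On the BIGINT point, the arithmetic is easy to check with a short Python sketch.  It only shows why a total can fall outside the INT range even when every individual row fits; it does not reproduce what SSAS or PDW actually do when that happens.  The function name and sample values are made up for the example.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1        # 32-bit signed INT range
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1  # 64-bit signed BIGINT range


def type_needed_for_sum(values):
    """Return the smallest integer type whose range can hold sum(values)."""
    total = sum(values)
    if INT_MIN <= total <= INT_MAX:
        return "INT"
    if BIGINT_MIN <= total <= BIGINT_MAX:
        return "BIGINT"
    return "larger than BIGINT"


# Hypothetical measure values: each one fits in an INT on its own...
sales = [1_500_000_000, 1_500_000_000]
assert all(INT_MIN <= v <= INT_MAX for v in sales)

# ...but the aggregate does not.
print(sum(sales), type_needed_for_sum(sales))
```

With fact tables the size PDW is built for, a summed measure can pass the INT limit quickly, which is a good reason to declare measures as BIGINT up front.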

More info:

Comparing DirectQuery and ROLAP for real-time access

Tabular model: Not ready for prime time?


Parallel Data Warehouse (PDW) and ROLAP

Analysis Services ROLAP for SQL Server Data Warehouses

Columnstore vs. SSAS




