James Serra's Blog

James is currently a Senior Business Intelligence Architect/Developer and has over 20 years of IT experience. James started his career as a software developer, then became a DBA 12 years ago, and for the last five years he has been working extensively with Business Intelligence using the SQL Server BI stack (SSAS, SSRS, and SSIS). James has been at times a permanent employee, consultant, contractor, and owner of his own business. All these experiences, along with continuous learning, have helped James develop many successful data warehouse and BI projects. James has earned the MCITP Business Developer 2008, MCITP Database Administrator 2008, and MCITP Database Developer 2008, and has a Bachelor of Science degree in Computer Engineering. His blog is at .

Real-time query access with PDW

The Parallel Data Warehouse (PDW) is officially supported as a data source for Analysis Services, in both the Multidimensional model (ROLAP and MOLAP modes) and the Tabular model (In-Memory and DirectQuery modes). The big benefit of using ROLAP or DirectQuery is that you get real-time query access to the relational data in PDW (as opposed to data that is only current as of the last time the cube was processed), and you don’t have to process the cube. Just make sure to use clustered columnstore indexes on the PDW tables to improve performance. You write MDX queries when using ROLAP and DAX queries when using DirectQuery; in both cases the queries get translated to SQL when hitting PDW.

Keep in mind that PDW is so fast when using clustered columnstore indexes that, if you have a properly defined star schema, you might not even need a cube because the results will be returned to the user quickly. But there are reasons besides performance why you might still want to use a cube (see Why use a SSAS cube?).

Using PDW as a data source for an SSAS cube is just like using any other data source. Performance is usually fast because of the clustered columnstore indexes. The only caveat is that the SQL generated by DirectQuery to pull data from PDW is sometimes not that great (the SQL generated by ROLAP is usually pretty good).

The other thing to note about DirectQuery, which applies to any data source, is that you can’t use PerformancePoint or Excel PivotTables with it. This is because MDX queries are not supported for a tabular model in DirectQuery mode, only DAX, and PerformancePoint and Excel PivotTables generate MDX queries behind the scenes, so you need to use a DAX client like Power View. Other limitations of DirectQuery are that it does not cache results like ROLAP does and that some data types are unsupported (geometry, xml, and nvarchar(max)). Finally, some DAX functions are not supported in DirectQuery mode and some might return different results (see Formula Compatibility in DirectQuery Mode); the two unsupported functions are EXACT and REPLACE. So it seems that ROLAP is the better choice over DirectQuery for many situations. But if you do go with a tabular model you may want to look into using a hybrid mode (see Tabular query modes: DirectQuery vs In-Memory and Partitions and DirectQuery Mode). Definitely go with a DirectQuery tabular model over an in-memory model if your database is 1TB or more.
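As a quick illustration of the data type limitation above, here is a minimal Python sketch that flags columns whose types DirectQuery does not support. The set of types comes straight from the list above; the function name, the column-mapping format, and the example table are hypothetical and only there for illustration, so treat this as a sketch rather than a real validation tool.

```python
# Data types listed above as unsupported in DirectQuery mode.
UNSUPPORTED_DIRECTQUERY_TYPES = {"geometry", "xml", "nvarchar(max)"}


def find_unsupported_columns(columns):
    """Return (column name, type) pairs that DirectQuery cannot handle.

    `columns` is a hypothetical mapping of column name -> SQL type string.
    """
    flagged = []
    for name, sql_type in columns.items():
        # Normalize case and spacing so "NVARCHAR (MAX)" matches "nvarchar(max)".
        normalized = sql_type.strip().lower().replace(" ", "")
        if normalized in UNSUPPORTED_DIRECTQUERY_TYPES:
            flagged.append((name, sql_type))
    return flagged


# Hypothetical table definition, for illustration only.
example_columns = {
    "SaleId": "bigint",
    "Notes": "NVARCHAR(MAX)",
    "Region": "nvarchar(50)",
    "Shape": "geometry",
}

flagged_columns = find_unsupported_columns(example_columns)
print(flagged_columns)
```

In practice you would feed this from whatever schema metadata you have on hand; the point is simply to catch the problem columns before you commit to DirectQuery mode.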

One limitation of ROLAP to note is that it does not support parent-child hierarchies. One improvement is that Distinct Count performance for ROLAP queries is faster if you enable an optimization. Some other ROLAP limitations against PDW:

  • Auto-cube refresh is not supported
  • Materialized views, also called Indexed views, are not supported
  • Proactive caching is supported only if you use the polling mechanisms provided by Analysis Services
  • Writeback is not supported

Some things I have learned when using ROLAP against a PDW:

  • Sometimes it is better to have your fact tables as ROLAP, but keep the dimensions as MOLAP
  • Think about using MOLAP for your historical partitions and ROLAP for just your current partition
  • Make sure the measures are in BIGINT in the fact tables or MDX aggregates might not work (MDX aggregates use INT by default unless the source is BIGINT)
  • PDW supports hundreds of concurrent users, but if you have thousands of concurrent users hitting the cube it may be better to move the data from the PDW to an SMP data mart and create the cube there
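The BIGINT point in the list above comes down to simple arithmetic: every individual value can fit comfortably in an INT while the aggregate does not. Here is a small Python sketch of that range check. It only illustrates the number ranges of the SQL Server INT and BIGINT types; it does not model how SSAS or PDW actually evaluate an aggregate, and the function name and sample values are hypothetical.

```python
# Signed ranges of the SQL Server INT (32-bit) and BIGINT (64-bit) types.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1


def smallest_type_for_sum(values):
    """Return the smallest integer type whose range can hold sum(values)."""
    total = sum(values)
    if INT_MIN <= total <= INT_MAX:
        return "INT"
    if BIGINT_MIN <= total <= BIGINT_MAX:
        return "BIGINT"
    return "exceeds BIGINT"


# Hypothetical measure values: each one fits in an INT on its own...
sales_amounts = [1_000_000_000, 1_000_000_000, 1_000_000_000]

# ...but their total (3,000,000,000) is above INT_MAX (2,147,483,647).
print(all(INT_MIN <= v <= INT_MAX for v in sales_amounts))
print(smallest_type_for_sum(sales_amounts))
```

So even when no single fact row comes close to the INT limit, a total over a large fact table can exceed it, which is why it is safer to declare the measures as BIGINT up front.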

More info:

Comparing DirectQuery and ROLAP for real-time access

Tabular model: Not ready for prime time?

PDW and SSAS

Parallel Data Warehouse (PDW) and ROLAP

Analysis Services ROLAP for SQL Server Data Warehouses

Columnstore vs. SSAS

Comments

Leave a comment on the original post [www.jamesserra.com, opens in a new window]
