RE: Any solution for better performance about a cube with 3 distinct counts on the same source DB table on SQL Server 2008 R2?

  • Any way to design the underlying tables so you don't need 3 distinct count measures?

    1. MG_B and MG_C can be partitioned on ColumnB and ColumnC even if the source table is partitioned on ColumnA. In that case you will take a performance hit, since the partition queries for MG_B and MG_C will have to pull data across multiple table partitions. The processing difference between MG_A and MG_B will be close to the difference between running a SELECT DISTINCT over ColumnA versus ColumnB. SSAS processes a partition by first querying the data and then aggregating it; the aggregation time for MG_A and MG_B should be fairly consistent, but the query time will change.

    2. Make sure each partition query is optimized so you are not selecting fields that are not used: MG_A should select only ColumnA and its dimension keys, MG_B should select only ColumnB and its dimension keys, and so on (see the first sketch after this list).

    3. If you want to partition a distinct count measure group, it must be partitioned on the distinct count column. The reason is that the distinct counts are summed across partitions, so if you partition MG_B on ColumnA and a value of ColumnB exists in more than one partition, that value will be double counted.

    Ex: ColumnA: 1, ColumnB: 2

    ColumnA: 2, ColumnB: 2

    ColumnA: 2, ColumnB: 3

    Let's assume you have two partitions based on ColumnA: Partition1 is where ColumnA = 1, and Partition2 is where ColumnA = 2.

    The distinct count of ColumnB in Partition1 is one, and the distinct count of ColumnB in Partition2 is two. When you analyze the data in MG_B, you will end up with a distinct count of three (one from Partition1 plus two from Partition2), even though the true distinct count of ColumnB is only two. The second SQL sketch after this list walks through the same numbers.

    4. You wouldn't be able to just build the latest partition. Once the measure groups are partitioned, you only need to reprocess the partitions that are affected by updates in your data. In my SSAS solutions, I always create a SQL table that maintains the partition definitions for all measure groups. During the data processing of my DW, I flag which partitions are affected and then have a process run through the table to process those specific partitions (a sketch of such a table follows this list).
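
    For point 2, here is a minimal sketch of what I mean by trimming the partition source queries. FactSales, DimDateKey, and DimCustomerKey are made-up names, assuming a single fact table that feeds all three measure groups:

        -- Hypothetical partition source queries: each measure group selects only
        -- its own distinct count column plus the dimension keys it actually uses.

        -- MG_A partition query
        SELECT ColumnA, DimDateKey, DimCustomerKey
        FROM dbo.FactSales
        WHERE ColumnA >= 1 AND ColumnA < 1000;   -- partition slice on ColumnA

        -- MG_B partition query (sliced on ColumnB, per point 3)
        SELECT ColumnB, DimDateKey, DimCustomerKey
        FROM dbo.FactSales
        WHERE ColumnB >= 1 AND ColumnB < 1000;   -- partition slice on ColumnB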
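
    For point 3, this little T-SQL script reproduces the three-row example above and shows why summing per-partition distinct counts overcounts (the table and column names are just for illustration):

        -- Recreate the example rows
        DECLARE @Fact TABLE (ColumnA int, ColumnB int);
        INSERT INTO @Fact VALUES (1, 2), (2, 2), (2, 3);

        -- Per-partition distinct counts, as SSAS would compute them
        SELECT COUNT(DISTINCT ColumnB) AS Partition1Distinct
        FROM @Fact WHERE ColumnA = 1;            -- returns 1

        SELECT COUNT(DISTINCT ColumnB) AS Partition2Distinct
        FROM @Fact WHERE ColumnA = 2;            -- returns 2

        -- SSAS sums these across partitions: 1 + 2 = 3.
        -- The true distinct count of ColumnB is only 2:
        SELECT COUNT(DISTINCT ColumnB) AS TrueDistinct
        FROM @Fact;                              -- returns 2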
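
    For point 4, this is roughly the shape of the partition-definition table I maintain. The names and columns here are just a sketch of the idea, not my exact schema:

        -- One row per SSAS partition, per measure group
        CREATE TABLE dbo.SsasPartition
        (
            PartitionId      int IDENTITY(1,1) PRIMARY KEY,
            MeasureGroupName sysname NOT NULL,          -- e.g. 'MG_A', 'MG_B'
            PartitionName    sysname NOT NULL,
            SourceQuery      nvarchar(max) NOT NULL,    -- the trimmed partition query
            NeedsProcessing  bit NOT NULL DEFAULT (0),  -- flagged by the DW load
            LastProcessed    datetime NULL
        );

        -- During the DW load, flag the partitions touched by the incremental data
        UPDATE dbo.SsasPartition
        SET NeedsProcessing = 1
        WHERE MeasureGroupName = 'MG_A'
          AND PartitionName IN ('MG_A_2011', 'MG_A_2012');  -- whatever the load touched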

    Efficiently processing large SSAS solutions is, in my opinion, a bit complicated, and MS has not provided a great way to manage it yet. I've only used MOLAP in my SSAS solutions and create custom processing methods to make it efficient.

    I start by creating tables to maintain the SSAS dimensions and the SSAS measure groups and partitions. As my DW is being processed, it flags which dimensions need to be updated and which partitions need reprocessing. The latest version also creates new records in the partition table when a new partition needs to be created. I have an SSAS process task that runs after the DW; it checks these tables to build out the XMLA to update only the dimensions that have changed, then creates/alters partitions and processes the ones that need reprocessing. All of this is built using SQL and SSIS. If I get enough requests, I'll do a full write-up on this. A rough sketch of the driver query is below.
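
    To give an idea of how the process task decides what to build, here is a hedged sketch of the kind of driver query it runs against those tracking tables before generating the XMLA (again, the table and column names are hypothetical):

        -- Dimensions that changed during the DW load
        SELECT DimensionName
        FROM dbo.SsasDimension
        WHERE NeedsProcessing = 1;

        -- Partitions to create/alter and then process
        SELECT MeasureGroupName, PartitionName, SourceQuery
        FROM dbo.SsasPartition
        WHERE NeedsProcessing = 1;

        -- An SSIS loop turns each row into a Process Update (dimensions)
        -- or Process Data/Full (partitions) XMLA command and sends it to SSAS.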