Cube Designs - how many, how do you deal with size and timeframe?

  • I am looking to design a cube (or cubes) for a grocery retailer. At product level there are approximately 11 million rows coming into the database per day.

    I have scoped out the reports I require. At this point most things will be reported weekly, by period, and by quarter, all with year-on-year metrics.

    My question is: how do people generally handle this much information? Do you keep two years of product-level sales data (sales, units, etc.) aggregated to weekly level in the fact table being pulled into the cube - i.e. one massive cube with all the related dimension tables - or would you do quarterly cubes and then, for year-on-year measures, pull data into the reports from two cubes?
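
    For rough scale: two years at the daily product level would be somewhere around 11 million x 365 x 2, i.e. roughly 8 billion fact rows, whereas at weekly level the row count depends mainly on how many distinct product/customer (or product/store) combinations actually trade in a given week rather than on the number of days.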

    I have done a lot of research and can't find an answer on performance relating to the amount of "time" (history) that needs to be stored - only around the number of dimensions and aggregations.

    Thanks in advance

  • I think what you need to take a look at is measure group partitions and storage modes. There are many articles out there that cover this subject; this is a good place to start: https://technet.microsoft.com/en-us/library/cc966527.aspx

    It sounds like you have done quite a bit of analysis already, so, generally speaking, you match the partition and storage design above to the sort of queries that will be thrown at the cube.



  • Thank you for your response and the link - I will read it.

    I have been thinking about partitions, and partitioning by time probably makes sense (maybe by quarter). I was thinking of them in terms of efficiency for loading rather than reporting - do you know if you can have multiple partitions (i.e. based on different dimensions)?

    I am still a little confused about the size.

    At this point I see approximately 20 measures, which will be split by 20 or so dimensions, each of which has hierarchies at which each measure will need to be calculated - e.g. measures at department level, the same measures at category, sub-category and product level - and then each of these could be split out to region and down to store level. Remember there are about 11 million rows per day at product level (product by customer), about 60K products and over 1 million customers.
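
    To illustrate the sort of weekly roll-up I have in mind if customer-level detail turns out not to be needed in the cube (the table and column names below are just placeholders, not our actual schema):

        -- Roll the daily product-by-customer feed up to a weekly product-by-store grain,
        -- dropping the ~1 million-customer dimension, which is where most of the row count comes from
        SELECT d.WeekKey,
               f.ProductKey,
               f.StoreKey,
               SUM(f.SalesAmount) AS SalesAmount,
               SUM(f.Units)       AS Units
        FROM dbo.DailySales f
        JOIN dbo.DimDate d ON d.DateKey = f.DateKey
        GROUP BY d.WeekKey, f.ProductKey, f.StoreKey;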

    The cubes we run currently are quarterly, but they are far from efficient and are not built using SSAS - this is our chance to wipe the slate clean and design them properly.

    Any other resources anyone has on actually designing cubes (rather than building them) would be most appreciated - I am currently working my way through Expert Cube Development by Chris Webb et al.

    Thanks again

  • You can have multiple partitions for one measure group. You have to be careful with overlapping data, though - query-bound partitions are not aware of each other. So, for example, if you had one partition holding data from 01/01/2015 - 07/01/2015 and another for 06/01/2015 - 14/01/2015, the measures would double-count the two overlapping days.
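
    To make that concrete, here is a rough sketch of what the source queries for two adjacent weekly partitions might look like, assuming a fact table called dbo.FactSales with an integer DateKey column (names invented for illustration). The key point is the half-open ranges: each day falls into exactly one partition, so nothing gets counted twice.

        -- Partition 1: 01/01/2015 up to, but not including, 08/01/2015
        SELECT *
        FROM dbo.FactSales
        WHERE DateKey >= 20150101
          AND DateKey <  20150108;

        -- Partition 2: starts exactly where partition 1 stops
        SELECT *
        FROM dbo.FactSales
        WHERE DateKey >= 20150108
          AND DateKey <  20150115;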

    In terms of size, I don't think you have a problem. Your dimensions are your dimensions and none of them sound massive. When you talk about all of the different levels you are thinking as if everything would be processed and stored as an aggregation. While you can do this, don't - it's better to pick and choose your aggregations to keep performance up and storage down. Here's what I do:

    I build the cube (with all of the design considerations needed) and process it without aggregations.

    I turn on the OlapQueryLog table (more info here: http://www.mssqltips.com/sqlservertip/2876/improve-sql-server-analysis-services-performance-with-the-usage-based-optimization-wizard/) - there's a sample query after these steps showing how I check what it's capturing.

    I take 10-20 key reports of the business and recreate them using the cube data in Excel.

    I then use the usage based optimization wizard to generate aggregations based on the queries I've just been running.

    I then use BIDS Helper Aggregation Manager (https://bidshelper.codeplex.com/wikipage?title=Aggregation Manager) to manage any duplicates/generally tidy up. I also use this to add any "special" aggregations I feel should be there but for whatever reason are not.

    That will give you a good base set of aggregations to start with. Run the usage based optimization every once in a while to make sure that you're picking up the relevant usage patterns.
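
    If you want to sanity-check what the query log is capturing before running the wizard, something along these lines against the logging database works (assuming you kept the default table name of OlapQueryLog):

        -- Which attribute subcubes are requested most often, and how long they take on average
        -- (Duration is in milliseconds)
        SELECT MSOLAP_ObjectPath,
               Dataset,
               COUNT(*)      AS Hits,
               AVG(Duration) AS AvgDurationMs
        FROM dbo.OlapQueryLog
        GROUP BY MSOLAP_ObjectPath, Dataset
        ORDER BY Hits DESC;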

    To be honest you really can't go far wrong with the book that you are reading 🙂



  • Thank you so much for all your help - that really helped!

    I may test out a cube based on a week of data and go through the process you described, to make sure I am comfortable with it before building the full cube.

    Thank you!

  • Best of luck 🙂


