Performance problems with OLAP cube

  • Hi, sorry for my bad English.

    I have a lot of performance problems with my OLAP cube (full MOLAP, SSAS 2005).

    It is built on 20 dimensions and a fact table of 150 million records, with three partitions, one per year (2006/7/8).

    My users query the cube with the Excel 2007 client, and they often complain that it takes too long to retrieve data.

    The real problem is that the same query, from the same user, at a different moment, gives a different response time.

    From a few seconds to several minutes!

    I have cube roles enabled, and I suspect there is some problem with them.

    Users with no roles applied don't have the same problems as the others.

    This problem is driving me crazy. My users want to kill me.

    Any suggestions?

    Thanks a lot

  • The number of dimensions is less important than the number of attributes and how your attribute hierarchies are configured, but 20 dimensions is getting into the range of "a lot".

    Query-time differences are probably related to caching. The first time you run a query against a cube, some portions of the result may or may not get cached (Excel often uses subquery syntax, which caches portions of the data). This cache can live for a short period or a long one, so it can have the interesting effect of making query times vary greatly at seemingly random moments. Use Profiler to see what is going on.
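
    If you want to reproduce the cold-cache case on demand, you can flush the Analysis Services cache between test runs and compare cold vs. warm timings. A minimal sketch, run from an XMLA query window in Management Studio; "MyOlapDatabase" is a placeholder for your own database ID:

      <!-- Flush the SSAS cache for one database so the next query runs cold.
           "MyOlapDatabase" is an illustrative ID; substitute your own. -->
      <ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
        <Object>
          <DatabaseID>MyOlapDatabase</DatabaseID>
        </Object>
      </ClearCache>

    Note that the Windows file-system cache can still be warm after this, so "cold" timings are only approximate.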

  • First of all, you have a good grasp of written English, so no need to apologise.

    Michael is completely right about the cache. Are the queries running longer the first time they are run and quicker every time after that? Most likely. If you are running 32-bit, the cache your results can live in is limited, and as new queries come in, earlier results may get pushed out, meaning that the next time a user runs the original query it won't be in the cache.

    There are performance counters you can use to track this, starting with the "MSAS 2005:Cache" object; coupled with Profiler results, these will tell you which queries are being answered from cache and which aren't.

    If you are on 64-bit, run some tests, as you may want to get more memory (64-bit allows a much larger cache).

    You could also look into pre-heating the cache. Basically this means that before your users come in in the morning, you (or rather SQL Server) have already run several of the queries that users regularly run, so the queries fired first thing in the morning are answered from cache. There are loads of resources on how to do this; in fact, I think there was an article on SQLServerCentral just the other week. A rough sketch follows.
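
    As a minimal sketch of a warming query (cube, measure, and hierarchy names here are made up; substitute ones your users actually hit), you could schedule something like this with SQL Agent, via the ascmd.exe sample utility or an SSIS task:

      -- Hypothetical warming query: touches a measure and a date hierarchy
      -- that users commonly browse, so the results land in cache.
      SELECT
        { [Measures].[Sales Amount] } ON COLUMNS,
        [Date].[Calendar].[Month].MEMBERS ON ROWS
      FROM [Sales]

    The closer these warming queries are to what your users really run (check Profiler for the most common ones), the more effective the pre-heating will be.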

    Security does play a part in retrieval times, as you suspected: it is an extra layer the query has to pass through before the result set is returned. I'm not sure of the overhead, though, so you may want to run some tests, review how many security roles there are on the cube in question, and see how often users appear in multiple roles. For example, if a cube has 5 security roles and a user appears in all 5, that could be a fair-sized overhead.
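
    One way to test this (a sketch; server, database, and role names are placeholders) is to run the same MDX query over two connections, one as an administrator and one restricted to a single role via the Roles connection-string property of the OLAP provider, and compare the timings:

      Provider=MSOLAP.3;Data Source=MyServer;Initial Catalog=MyOlapDatabase;Roles=LimitedRole

    The difference between the two runs gives you a feel for what the security layer is costing you on that query.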

    Hope this helps.

  • I'll have to concur with the previous comment that 20 dimensions is "a lot". I'd go as far as saying it's huge.

    Examine your dimensions and determine whether any of them are dependent on one another. If they are, combining those dimensions makes terrific sense from the point of view of aggregation, usability, flexibility, etc.

    Getting your attribute relationships right is important as well. When you define a relationship between attributes (say Brand --> Manufacturer), you're telling Analysis Services it can derive manufacturer totals from brand totals. But make sure your data doesn't violate this relationship, or you'll get incorrect results. For example, Dove --> Unilever and Dove --> Mars is invalid if the ID for the brand is the same. Check (a query like the one below will do it)!
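
    A minimal sketch of such a check against the relational dimension table (table and column names are illustrative; substitute your own):

      -- Find brand keys that map to more than one manufacturer,
      -- i.e. rows that would violate a Brand --> Manufacturer relationship.
      SELECT BrandKey
      FROM dbo.DimProduct
      GROUP BY BrandKey
      HAVING COUNT(DISTINCT ManufacturerKey) > 1;

    Any rows returned mean the attribute relationship is unsafe until the data is cleaned up.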

    Also, consider your typical usage scenario. Although Microsoft designed partitions to speed up processing (I spoke to the developer about this last week), I've found that partition design is better driven by how "most" queries are run. (Remember, role definitions add query restrictions.) You should also set the slice information if you want to ensure the optimizer targets the right partition, as sketched below.
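
    The Slice property can be set on each partition in BIDS, or directly in the partition's ASSL definition. A partial fragment as a sketch (the IDs and the attribute name are illustrative, not from your cube):

      <!-- Partial partition definition: the Slice tells the engine this
           partition only holds 2007 data, so other years' queries skip it. -->
      <Partition xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
        <ID>Fact 2007</ID>
        <Name>Fact 2007</Name>
        <Slice>[Date].[Calendar Year].&amp;[2007]</Slice>
      </Partition>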

    Finally, speed will depend on the hardware you're running on. Examine your memory usage, CPU, and disk throughput to identify the weakest link in your implementation.
