August 22, 2007 at 8:43 am
Hi,
we have some problems with slow queries in Analysis Services.
We noticed extremely high disk usage on the server when browsing OLAP cubes.
This happens even when many users have already hit the server with the same reports, firing the same queries.
By then the data should be in the cache, and little or no query processing should have to go to disk.
My first thought was that there is not enough RAM available to keep the data in the cache.
That would be surprising, because the server is a dual-core Itanium 2 with 2 processors and has 16 gigabytes of RAM.
Of those, 7 gigabytes are reserved for the SQL Server service.
The (in my opinion) relevant memory settings of the Analysis Server are:
<TotalMemoryLimit>60</TotalMemoryLimit>
<LowMemoryLimit>30</LowMemoryLimit>
So the maximum memory threshold should be at 9.6 gigabytes (60% of 16 GB).
That limit has never been reached; the msmdsrv.exe process normally consumes about 4-5 gigabytes.
Thus memory should not be the problem.
The problem occurs independently of the client we use to query the data (Excel, Reporting Services).
I read the SSAS 2005 Performance guide but did not find much useful information.
Any ideas how to tune the caching of SQL Server? Or any idea why disk usage is so high during querying?
Best regards,
Stefan
SK
August 23, 2007 at 7:05 am
I would love to hear an answer on this too. We have queries that sometimes run in a minute and other times take 45 minutes (I applaud the patient user who actually waited for one to finish).
August 23, 2007 at 8:28 am
Stefan,
have you used any tracing to see how often the cache or the aggregates are being hit for your queries? This might also indicate which partition(s) are being hit most frequently (i.e. do you need more, smaller partitions?).
Steve.
August 24, 2007 at 1:59 am
Hi Steve,
I checked that behaviour with the profiler several times. The cache is hit on nearly every query for every measure group. If the cache is not hit, the data is pulled from the aggregations. But the I/O workload on the drive where the data lies is still very high.
I cannot use partitions because we are stuck with the f.cking Standard Edition at this customer.
Stefan
SK
August 24, 2007 at 10:34 am
Some items you should consider:
Depending on the hardware you're running on, you really need to start considering partitions once you pass 2 M rows. It has a dramatic effect on performance, scalability etc. (Before anyone complains, yes, I regularly process partitions > 60 M rows, on 64 bit hardware. Depends on the box.)
Usage analysis should guide how you'll partition the data. Again, getting it right has a dramatic effect.
Since you're constrained by version, consider multiple cubes as sort of a poor man's version of partitioning. (Better to pay for the full version, but you'll pay one way or the other). I believe you'll still be able to pull data between the cubes to get critical totals, etc.
Check your aggregation design(s). As well, change any reference attributes (i.e. codes and sequences) to disable the attribute hierarchy. If your reporting front end won't allow access to them at that point in time, you can still expose them by casting them into measures.
When you've done all this, make sure you've turned on the query logging, and then run through the usage based optimization. Hopefully, you'll get some changes to the aggregation design that'll help. (Save the previous design so you can compare before / after. If it's not significantly altered, you may want to check your MDX.)
As a last resort, you can set the aggregation design properties for the cube, but take care. The wizard seems to do a decent job (if you've done the above) and it's quite possible to make it dramatically worse.
Good luck.
August 24, 2007 at 5:07 pm
> Or any idea why disk usage is so high during querying?
A large amount of disk reads is normally NOT a hardware problem... it's normally caused by code problems. Dave probably hit the nail on the head about checking aggregation design(s). I'll also note that although partitioning helps, I've found that fixing the slow or resource intensive code works much better... maybe if you post one of the more troublesome scripts and a little DDL?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 27, 2007 at 1:39 am
Hi Grasshopper and Jeff,
> Since you're constrained by version, consider multiple cubes as sort of a poor man's version of partitioning. (Better to pay for the full version, but you'll pay one way or the other). I believe you'll still be able to pull data between the cubes to get critical totals, etc.
I thought about doing this with calculated members that use IIF clauses to pull the data from the different cubes, so the existing reports would keep working unchanged. But it gets pretty complicated and involves a lot of manual work.
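For illustration, a rough sketch of the kind of calculated member I mean (all cube, measure and dimension names here are made up; the second cube would have to live in the same database for LookupCube to reach it, and LookupCube itself is not cheap):

CREATE MEMBER CURRENTCUBE.[Measures].[Sales All Years] AS
    IIF(
        [Date].[Year].CurrentMember.MemberValue >= 2006,   -- assumes a numeric year key
        [Measures].[Sales Amount],                          -- recent years live in this cube
        LookupCube(
            "Sales History",                                -- older years live in a second cube
            "( [Measures].[Sales Amount], "
              + MemberToStr([Date].[Year].CurrentMember) + " )"
        )
    );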
> Check your aggregation design(s). As well, change any reference attributes (i.e. codes and sequences) to disable the attribute hierarchy. If your reporting front end won't allow access to them at that point in time, you can still expose them by casting them into measures.
Why "disable the hierarchy"? Natural hierarchies are good for performance, and every piece of advice I have read told me to define them (the right ones).
> When you've done all this, make sure you've turned on the query logging, and then run through the usage based optimization. Hopefully, you'll get some changes to the aggregation design that'll help. (Save the previous design so you can compare before / after. If it's not significantly altered, you may want to check your MDX.)
The usage based wizard does not work: it does not design any aggregations for most of the cubes because of some pretty big dimensions (400,000 members). So I used the data from the query log in the Aggregation Manager tool (CodePlex) to get the relevant aggregations.
> As a last resort, you can set the aggregation design properties for the cube, but take care. The wizard seems to do a decent job (if you've done the above) and it's quite possible to make it dramatically worse.
What exactly do you mean by "aggregation design properties for the cube"? Which ones and where? In the dimensions?
> I'll also note that although partitioning helps, I've found that fixing the slow or resource intensive code works much better... maybe if you post one of the more troublesome scripts and a little DDL?
Which DDL do you mean?
Best regards,
Stefan
SK
August 27, 2007 at 7:04 am
Here goes:
1) Manual partitions. Agreed. The best option is still to go up a version and use partitions. Since you seem to have 400,000 items, that's where you need to be. (What's the data row count?)
2) Attribute Hierarchies. For those attributes with a high degree of cardinality relative to your item level (i.e. product code for a product, address for a location), creating an attribute hierarchy (totals) for these attributes has limited aggregation value. But you still want to make them available in your application!
Go to the attribute properties and disable the attribute hierarchy. The attribute is still there (and accessible to many tools), but totals for these attributes aren't created.
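As a sketch of the "cast them into measures" idea (the names here are hypothetical), an attribute whose hierarchy has been disabled (AttributeHierarchyEnabled = False) is still reachable as a member property of the key attribute, assuming it is related to it, so you can surface it like this:

CREATE MEMBER CURRENTCUBE.[Measures].[Product Code] AS
    [Product].[Product].CurrentMember.Properties("Product Code"),
    VISIBLE = 1;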
3) Aggregation design properties. In VS, when you open the cube, on the bottom left of the window you'll see the dimensions. Switch to the attribute tab. Now, with the properties window pinned open, click on an attribute. You'll see things like "AggregationUsage". This property allows you to "force" an attribute to be included in virtually all aggregations. But take care: use it very sparingly.
I believe Jeff means for you to post the XMLA, which you can generate via Management Studio.
Also, the most likely area to introduce unintentional performance problems is the calculation script, particularly if you've been attempting complicated work.
To check that, comment out the entire script (except for the CALCULATE statement), do some tests and see if performance changes. Of course, any calculated measures will be disabled. There have been times when I've had to do this through trial and error to isolate a particularly troublesome issue.
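In other words, temporarily reduce the MDX script to something like this (the commented-out lines are just placeholders for whatever your real script contains):

CALCULATE;

/* everything below is switched off for the test

CREATE MEMBER CURRENTCUBE.[Measures].[Margin %] AS
    ([Measures].[Sales Amount] - [Measures].[Cost]) / [Measures].[Sales Amount];

SCOPE ([Measures].[Budget Amount], [Date].[Calendar].[Month].Members);
    THIS = ... ;
END SCOPE;

*/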
August 27, 2007 at 9:53 am
re: the points in the previous messages:
1) even in Standard Edition you can actually have 3 partitions (it's Enterprise where you're not limited). If your data supports the need, then see if splitting into 2 or even 3 partitions will help. Note it's the Edition, not the Version, that you'd want to change; the next Version up is Katmai (SQL 2008 [we hope]), so there's no Version move until that's released.
2) Like Dave has said, make sure your Attribute Hierarchies are well thought out. Don't confuse these with User Hierarchies (i.e. the navigational hierarchies you can create so users can easily navigate the data). Also, of utmost importance is getting your attribute hierarchy relationships right. If you're not seeing any aggregations being built, then it very well could be because your attribute relationships are not set correctly. It is no longer enough to create a User hierarchy that links Country to State to City - make sure that the relationships between the Attributes reflect this relationship. Look at the following for more on this:
http://www.sqlskills.com/blogs/liz/default.aspx?date=2006-07-09 - although a little old now, Liz Vitt's detail on the effect these settings (e.g. AggregationUsage) have is still a good primer.
http://msdn2.microsoft.com/en-us/library/ms166553.aspx good ol msdn
http://cwebbbi.spaces.live.com/Blog/cns!7B84B0F2C239489A!619.entry Chris Webb on what can happen if you leave redundant ones in there
http://sqlblog.com/blogs/mosha/archive/2007/06/07/katmai-june-ctp-attribute-relationships-tab.aspx and Mosha on what's coming in Katmai but also good visualization of attribute relationships.
3) re: setting the AggregationUsage property, at least look at Liz Vitt's post linked to above.
Steve.
August 27, 2007 at 9:57 am
Forgot to mention, Liz also wrote some/most of the perf guide for AS2K5, and I think it too covers the use of attributes as aggregation candidates.
Steve.
August 27, 2007 at 10:10 am
Great points Steve.
Additional: on the relationships mentioned, make sure that your data adheres to these relationships / specifications. For example, if you specify that a city has one and only one state, and your relational information allows this to be violated, make sure you enforce this specification somehow.
Otherwise, MSAS will happily process all of the data, but you'll have data "holes" in the results and it won't warn you of the error.
August 28, 2007 at 1:52 am
Hi Dave and Steve,
the size of the measure groups differs. The biggest (budget data at day level) is 83 million rows but has only 2 dimensions (400,000 and 90,000 members). Then there are some measure groups with 20-30 million rows, some with 10 million, and some small ones.
> even in Standard Edition you can actually have 3 partitions (it's Enterprise where you're not limited).
? Never heard of that, apart from the partition that is built for writeback. Could you please explain how to do that? I could only create one partition per measure group; beyond that the warning pops up.
Attribute hierarchies - OK, I understand what you mean. I kicked them out of course.
Aggregation design properties - OK, we mean the same thing. I went through them after creating the SSAS solution, deactivating the ones which did not seem important. Just out of interest: when such attributes show up in the query log and I use the Aggregation Manager tool to add aggregations, and these attributes end up being part of an aggregation - is that aggregation created when processing the measure group, or does the property prevent the aggregation from being created?
I did not use too many or very complicated calculated members yet, but a guy from MS told me to use many more of them. Because of the pretty complicated security structure at this customer (700 roles), the formula engine cache is used only sparsely (and NO, the roles cannot be consolidated). We do many calculations in the reports themselves (WITH MEMBER...). These calculations have to be made every time the reports are executed, because they cannot be cached. So most of the calculations are answered from the storage engine cache, which causes a lot of disk I/O. Calculated members use the global formula engine cache, so they should be cached independently of the current role.
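To make the difference concrete, here is a rough sketch of the two variants (the measure, dimension and cube names are invented): the first is the report-scoped version that gets recalculated on every execution, the second is the same definition moved into the cube's MDX script, where it can at least be served from the formula engine cache within a given security context:

-- 1) Query-scoped, defined inside the report: recomputed for every execution
WITH MEMBER [Measures].[Margin %] AS
    ([Measures].[Sales Amount] - [Measures].[Cost]) / [Measures].[Sales Amount]
SELECT [Measures].[Margin %] ON COLUMNS,
       [Product].[Category].[Category].Members ON ROWS
FROM [Sales];

-- 2) Cube-scoped, defined once in the calculation script
CREATE MEMBER CURRENTCUBE.[Measures].[Margin %] AS
    ([Measures].[Sales Amount] - [Measures].[Cost]) / [Measures].[Sales Amount];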
Attribute relationships - I defined them for nearly all the user hierarchies, and for all the attributes in user hierarchies which are used very often in the reports. This topic has become much more complex since Analysis Services 2000. What happens if I have to use an attribute in two different user hierarchies? That would be redundant and cause a problem again. But if I do not define the relationship for the attribute in the second hierarchy, I would also have a problem.
e.g.
hierarchy1: a > b > c > d > x
hierarchy2: e > b > x
I could define the relationship between a and b, but would have a problem when designing the relationship between e and b because of the redundancy.
Thanks for your help and the links,
Stefan
SK
August 28, 2007 at 7:34 am
Hi,
I was just checking the aggregation design again. Usage based optimization still does not work. Even though the target server in the project solution is set to the live server's URL, it says that no log entries are available for the measure group. But they do exist: when I try the same thing with the Aggregation Manager add-in, I can query the data and add aggregations from the query log table to the measure groups.
I'd like to reduce the number of aggregations for some measure groups that take quite a long time to compute during processing. In Aggregation Manager you can even check the physical size of the aggregations. I have some very big aggregations, which are about 30% of the size of the fact table itself. Is such an aggregation still useful, or should I try to reduce the number of attributes in it?
Does SSAS only use aggregations which fully cover a query? If I query a cube using the 3 dimensions a, b and c, and I only have an aggregation for a and b, would the SSAS engine use that aggregation or not?
If SSAS does use it, I could shrink the big aggregations and still get a benefit from them - a good tradeoff between processing time and query performance.
Best regards,
Stefan
SK
August 28, 2007 at 8:18 am
The usage based optimization will only use log entries created since the last time you saved the cube. That makes sense, since you may have changed the dimensional structure(s), which would invalidate the information. Ideally it would be a little more intelligent (i.e. use log entries since the dimensional information was last changed), but I don't think it has that logic.
I believe AS will only use an aggregation that contains data at a granularity at or below that of the query request. So for your question, no, since it can't produce results for c based on that aggregation. But don't quote me on that.
Your problem may lie in the calculations you're attempting to do. It may be possible to redesign how they are processed. Consider whether you could do part of the calculation on the cube, in particular during the cube process. Often, it's better to create intermediate results first (that perform fast), then combine them as a last step.
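As a sketch of what I mean by intermediate results (the measure names are made up), break a heavy expression into cheap building blocks and keep the final calculation trivial:

CREATE MEMBER CURRENTCUBE.[Measures].[Net Sales] AS
    [Measures].[Gross Sales] - [Measures].[Returns];

CREATE MEMBER CURRENTCUBE.[Measures].[Margin] AS
    [Measures].[Net Sales] - [Measures].[COGS];

CREATE MEMBER CURRENTCUBE.[Measures].[Margin %] AS
    IIF([Measures].[Net Sales] = 0, NULL,
        [Measures].[Margin] / [Measures].[Net Sales]);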
700 roles sounds like a lot. Fundamentally, that means there are 700 logical sub cubes. Are all of the roles being used, or have some been created "just in case"? Creating roles as you need them will probably result in better optimizations. (Of course, then there's more ongoing interaction to manage this information.) Also remember that users can participate in multiple roles. Creating fewer roles that can be combined will probably help.
Your budget data sounds highly sparse. I've yet to see an application that creates a "daily budget", but perhaps. If this is simply how you're modelling / joining the data, consider reverting it back to a truer form. Most "budgets" are modelled at the month level (and even then, it's a stab in the dark).
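If the budget stays at the month level in the fact table, a daily view can still be derived at query time. A minimal sketch, assuming an even spread over the days of the month (all names are invented, and a workday-weighted split would need an extra day-count measure or member property):

CREATE MEMBER CURRENTCUBE.[Measures].[Daily Budget] AS
    IIF([Date].[Calendar].CurrentMember.Level IS [Date].[Calendar].[Date],
        ( Ancestor([Date].[Calendar].CurrentMember, [Date].[Calendar].[Month]),
          [Measures].[Budget Amount] )
        / Ancestor([Date].[Calendar].CurrentMember,
                   [Date].[Calendar].[Month]).Children.Count,
        [Measures].[Budget Amount]);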
The MS recommendation is the right one. Building standardized calculated members in the cube helps to get the solution right. Of course, there are plenty of examples of why you might not want to do it, but if you only have a few, there's a good chance you're missing a few opportunities.
August 29, 2007 at 1:47 am
Hi Dave,
the roles are all used; nearly every user has their own set of rights. We calculated the number of roles we would need if we just created roles for all the sales and product groups and their combinations - we ended up with even more than these 700 roles, and it would have made things worse for transparency reasons.
The business at this customer does not want to consolidate user rights; we have talked a lot about this topic...
The budget data is available at month level. But since we have the requirement to show the data consolidated down to days, we decided to precalculate the daily budget, checking for workdays and all that stuff. We thought it would be faster to just "show" this data instead of calculating it every time it is queried.
The problem with the standardized calculated measures is that they will be cached in the formula engine cache separately for each user role. So they will be useful, but only up to a point.
Best regards,
Stefan
SK