Storage Modes in SSAS 2005

,

Storage Modes in analysis Services 2005


Overview

In SSAS 2005, a cube, a partition within a cube

or a dimension can store data and aggregations in numerous ways, each with its

own pros and cons. While Microsoft has retained the storage modes available in

AS 2000 (MOLAP, HOLAP and ROLAP), it has added several new storage modes, which

make an SSAS implementation a more robust and flexible solution in a production

environment. A new concept: proactive caching has been introduced, and as the

name implies, using this new feature enables the administrator to better control

the frequency and the way cube updates are being performed.

What exactly do we store?

SSAS 2005, being an OLAP

engine, stores cumulative data of the measures defined in the cube. For example,

if we have a cube with a customer dimension and a time dimension, we might have

a measure like: sales amount which will indicate the dollar figure of the

sales per customer in a certain date and time. This data is stored as part of

the cube. On top of the measures data, each cube can have its own

aggregations. Aggregations are pre-calculated summaries of data for various

combinations of dimension slicing. SSAS 2005 needs to store both the measures

data and the aggregation data on disk. This enables the OLAP engine to retrieve

answers to users queries faster. In the remainder of this article, Ill try to

explain what types of storage modes SSAS 2005 offers to store this data.



1. MOLAP (Multi dimensional OLAP)
This is the most common

storage mode. It implies that the cubes data and aggregations will be stored in

a multidimensional format. This is the most efficient way to store data and will

provide the user with optimal query performance. The downside with using this

storage mode is that it creates a completely offline data set. Once the cube

retrieved the data from the underlying relational database and processed it,

there is no further connection to the relational database. Any subsequent

updates in the database will not be reflected in the cube, unless it is

re-processed. Now, for most data warehouse solutions, a daily or even a weekly

re-process of the cubes is more then enough. For environments like these, it is

highly recommended to use the MOLAP storage mode.

2. Scheduled MOLAP
This is somewhat similar to the MOLAP

storage mode, in that it stores its data in a multidimensional format. However,

the content of the cube/partition cache will be automatically updated every 24

hours. In a way, using this storage mode already utilize the proactive caching

feature. It is important to note that theres no need to create a SQL Agent job

or an IS package to process the cube. The cubes cache will be refreshed

automatically every 24 hours.

3. Automatic MOLAP
Again, data is stored in a

multidimensional format. There is a basic change, though, when compared with the

previous 2 storage modes. Using an automatic MOLAP storage mode means that

Analysis Services needs to listen to notifications from the underlying

relational database. Once an update has occurred in the database, an event is

raised and sent to Analysis services. This type of messaging system can be

achieved by using Notification Services, for example. When the Analysis Services

Server has received the notification, it will automatically refresh the cubes

cache, to reflect the new changes in the database. Users will still be able to

access the cube while the cache is being refreshed, but they will be using the

old cache. Once the cache is refreshed, users will be redirected to the new

data. This type of storage provides the users with an almost up-to-date cube

(the target latency is 2 hours from update to process). This obviously depends

on the cube size and the magnitude of changes in the underlying database.



4. Medium Latency MOLAP
Things are starting to get

interesting as we approach the latency variance (which we touched briefly in

the previous section). Initially data is stored in a multidimensional format.

The Analysis Services server is listening to notifications from the underlying

relational database. When an update is performed in the database, the cube

switches to a Real Time ROLAP storage mode. We will touch on this storage mode

shortly, but I will briefly note that this storage mode means: no

multidimensional storage at all and the cubes data is being retrieved directly

from the underlying relational database. The timing of the switch to ROLAP is

determined by the latency variable. By default, cache will be updated after 4

hours from the time the relational database was updated. This switch should not

last for long though, as in parallel, the cubes MOLAP cache gets processed to

reflect the latest changes. Once the cache is processed, the cube switches back

to the MOLAP storage mode. You would want to use this type of storage when you

realize that the underlying database goes through updates occasionally, and

users do want to have an up-to-date cube. However, they still require

performance to be reasonable. Having the cube reflecting data which is accurate

up to the last 4 hours is reasonable for the users and provides the correct

trade-off between performance and real-time data. If the users agree to these

terms, then this is the preferred storage mode to use.

5. Low Latency MOLAP
This mode is similar in its behavior

to the Medium Latency MOLAP mode, the only change is (yes, you may have guessed

it by now) the latency. Instead of allowing a latency of 4 hours, this mode will

allow only up to 30 minutes of latency. This implies a more frequent switch to

the ROLAP mode and even poorer query performance, as the cube is being processed

more often. The users do get to see a more updated version of the data though.



6. Real Time HOLAP
HOLAP stands for hybrid OLAP. So far we

have discussed having the data and aggregations stored in a multidimensional

format. In the hybrid storage mode, data is maintained in its relational format

(i.e. data resides in the relational database), while aggregations are stored in

a multidimensional format. What this means is that the data is always

real-time data. The Analysis Services server still listens to notifications,

and when changes in the relational database occur, the aggregations (which are

still stored in a multidimensional format) are refreshed. During this process,

the cube is switched to the infamous ROLAP mode, until aggregation processing is

complete. It is easy to see that users are enjoying real-time data with the

added benefit of MOLAP aggregations to improve query performance. Still,

performance is not as good as it used to be when compared to data stored in a

multidimensional format, as well as the aggregations.

7. Real Time ROLAP
This is the last storage mode available

and we are actually quite familiar with it already. Data and aggregations are

stored in a relational format. This means zero latency for users, who will

always be able to access the cube and retrieve real-time data. Query performance

is the poorest here though as no MOLAP objects are available at all.

Just a bit more before finishing

You can apply the

various storage modes on cubes, partitions and dimensions. Some of the features

will not be available on some of the objects. When using proactive caching,

there are several settings which you can manually tune, like the latency, the

listening intervals and the notification types. You can also fine-tune the

processing options and instruct the SSAS engine whether to fully process a

partition or to incrementally process it.

Conclusion

The MOLAP, HOLAP and ROLAP storage modes were

available in the Analysis Services 2000 version. The important addition to the

2005 version is the proactive caching feature, which enables the user to have a

real-time version of the data while still enjoying the query performance

benefits of the MOLAP storage mode.

Rate

5 (5)

Share

Share

Rate

5 (5)