Blog Post

SQL Server “Denali”: Project Apollo

,

Project Apollo is a new feature in CTP3 of SQL Server Denali that enables a new columnstore index that offers 10-100x performance improvements for a star join or similar query.  Apollo brings together the in-memory columnstore technology (VertiPaq) that is used in PowerPivot and a new query execution paradigm called batch processing to provide impressive speed improvements for common data warehouse queries.  Microsoft has claimed in test scenarios customers have experienced approximately 100x improvements in star join and similar queries.

VertiPaq makes columnstore indexes more efficient because it uses a different way of storing columns than traditional indexes, and it effectively compresses the data in the index.  In a regular index, all indexed data from each row is kept together on a single page, and data in each column is spread across all pages in an index.  In a columnstore index, the data from each column is kept together so each data page contains data only from a single column.  In addition, the index data for each column is compressed, and since many columns often contain highly repetitive values, the compression ratio can be very high.  This architecture reduces the number of pages in the index and, if you are selecting only a few columns, it also reduces the number of pages that need scanning (and therefore it is more likely that SQL Server will be able to keep them in memory).

Building a columnstore index is easy.  You use the same index creation syntax and just specify the keyword COLUMNSTORE.  But note that once you add a columnstore on a table, the table itself becomes read-only, so  inserts, updates or deletes are not allowed.  If you need to insert new rows or update existing ones, you can disable the index, do data modifications and rebuild the columnstore index.  Because of this limitation, this feature for now is more suitable for data warehouse tables that contain static data, where it’s acceptable for the data to be refreshed during only scheduled intervals.  You can, however, use partitioning to avoid having to rebuild the index. For example, you can create a daily, weekly or monthly partition, load the data into a new table, build all indexes, and then switch in the table into the partitioned table.  You can also create a view that uses UNION ALL to combine a table with a columnstore index and an updatable table without a columnstore index into one logical table. This view can then be referenced by queries. This allows dynamic insertion of new data into a single logical fact table while still retaining much of the performance benefit of columnstore capability.

VertiPaq is also integrated into Analysis Services and enables a new tabular mode that provides in-memory based analytics on ‘billions’ of rows of data at lightning fast speeds.  There will be reduced development costs and ETL times since columnstore indexes limit or eliminate the need to rely on pre-built aggregates, including user-defined summary tables, and indexed (materialized) views.  Furthermore, columnstore indexes can greatly improve ROLAP performance, making ROLAP more attractive.

Note that OLTP-style queries, including point lookups, and fetches of every column of a wide row, will usually not perform as well with a columnstore index as with a B-tree index.  Columnstore indexes don’t always improve data warehouse query performance.  When they don’t the query optimizer will choose to use a heap or B-tree to access the data.

Microsoft states a factor of 4 to a factor of 15 compression with different fact tables containing real user data.  The columnstore index is a secondary index; the row store is still present, though during query processing it is often not needed, and ends up being paged out.  A clustered columnstore index, which will be the master copy of the data, is planned for the future.  This will give significant space savings in addition to the performance gains already provided.

Index build times for a columnstore index have been observed to be 2 to 3 times longer than the time to build a clustered B-tree index on the same data, on a pre-release build.  So you will need to accommodate this time difference in their ETL processes.  However, since you typically will no longer need summary aggregates, which can take a lot of time to build, so in fact, ETL time may decrease.

The bottom line is because of the excellent performance of columnstore indexes, they will allow your users to get much more business value from their data by encouraging them to interactively explore it, and will reduce your burden and shorten ETL time by decreasing reliance on having to create summary tables or OLAP aggregates.

More info:

Project Apollo casts light on column-store indexes in SQL Server Denali

Columnstore Indexes for fast DW

SQL Server Data Mining and Apollo Columnstore Indexes

SQL Server Columnstore Index FAQ

Columnstore Indexes

Query Optimization with Denali Columnstore Indexes

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating