Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

James Serra's Blog

James is currently a Senior Business Intelligence Architect/Developer and has over 20 years of IT experience. James started his career as a software developer, then became a DBA 12 years ago, and for the last five years he has been working extensively with Business Intelligence using the SQL Server BI stack (SSAS, SSRS, and SSIS). James has been at times a permanent employee, consultant, contractor, and owner of his own business. All these experiences along with continuous learning has helped James to develop many successful data warehouse and BI projects. James has earned the MCITP Business Developer 2008, MCITP Database Administrator 2008, and MCITP Database Developer 2008, and has a Bachelor of Science degree in Computer Engineering. His blog is at .

SQL Server “Denali”: Project Apollo

Project Apollo is a new feature in CTP3 of SQL Server Denali that enables a new columnstore index that offers 10-100x performance improvements for a star join or similar query.  Apollo brings together the in-memory columnstore technology (VertiPaq) that is used in PowerPivot and a new query execution paradigm called batch processing to provide impressive speed improvements for common data warehouse queries.  Microsoft has claimed in test scenarios customers have experienced approximately 100x improvements in star join and similar queries.

VertiPaq makes columnstore indexes more efficient because it uses a different way of storing columns than traditional indexes, and it effectively compresses the data in the index.  In a regular index, all indexed data from each row is kept together on a single page, and data in each column is spread across all pages in an index.  In a columnstore index, the data from each column is kept together so each data page contains data only from a single column.  In addition, the index data for each column is compressed, and since many columns often contain highly repetitive values, the compression ratio can be very high.  This architecture reduces the number of pages in the index and, if you are selecting only a few columns, it also reduces the number of pages that need scanning (and therefore it is more likely that SQL Server will be able to keep them in memory).

Building a columnstore index is easy.  You use the same index creation syntax and just specify the keyword COLUMNSTORE.  But note that once you add a columnstore on a table, the table itself becomes read-only, so  inserts, updates or deletes are not allowed.  If you need to insert new rows or update existing ones, you can disable the index, do data modifications and rebuild the columnstore index.  Because of this limitation, this feature for now is more suitable for data warehouse tables that contain static data, where it’s acceptable for the data to be refreshed during only scheduled intervals.  You can, however, use partitioning to avoid having to rebuild the index. For example, you can create a daily, weekly or monthly partition, load the data into a new table, build all indexes, and then switch in the table into the partitioned table.  You can also create a view that uses UNION ALL to combine a table with a columnstore index and an updatable table without a columnstore index into one logical table. This view can then be referenced by queries. This allows dynamic insertion of new data into a single logical fact table while still retaining much of the performance benefit of columnstore capability.

VertiPaq is also integrated into Analysis Services and enables a new tabular mode that provides in-memory based analytics on ‘billions’ of rows of data at lightning fast speeds.  There will be reduced development costs and ETL times since columnstore indexes limit or eliminate the need to rely on pre-built aggregates, including user-defined summary tables, and indexed (materialized) views.  Furthermore, columnstore indexes can greatly improve ROLAP performance, making ROLAP more attractive.

Note that OLTP-style queries, including point lookups, and fetches of every column of a wide row, will usually not perform as well with a columnstore index as with a B-tree index.  Columnstore indexes don’t always improve data warehouse query performance.  When they don’t the query optimizer will choose to use a heap or B-tree to access the data.

Microsoft states a factor of 4 to a factor of 15 compression with different fact tables containing real user data.  The columnstore index is a secondary index; the row store is still present, though during query processing it is often not needed, and ends up being paged out.  A clustered columnstore index, which will be the master copy of the data, is planned for the future.  This will give significant space savings in addition to the performance gains already provided.

Index build times for a columnstore index have been observed to be 2 to 3 times longer than the time to build a clustered B-tree index on the same data, on a pre-release build.  So you will need to accommodate this time difference in their ETL processes.  However, since you typically will no longer need summary aggregates, which can take a lot of time to build, so in fact, ETL time may decrease.

The bottom line is because of the excellent performance of columnstore indexes, they will allow your users to get much more business value from their data by encouraging them to interactively explore it, and will reduce your burden and shorten ETL time by decreasing reliance on having to create summary tables or OLAP aggregates.

More info:

Project Apollo casts light on column-store indexes in SQL Server Denali

Columnstore Indexes for fast DW

SQL Server Data Mining and Apollo Columnstore Indexes

SQL Server Columnstore Index FAQ

Columnstore Indexes

Query Optimization with Denali Columnstore Indexes

Comments

No comments.

Leave a Comment

Please register or log in to leave a comment.