SQL Server “Denali”: Project Apollo

Project Apollo is a new feature in CTP3 of SQL Server Denali. It introduces the columnstore index, which offers 10-100x performance improvements for star join and similar queries. Apollo brings together the in-memory columnstore technology (VertiPaq) that is used in PowerPivot and a new query execution paradigm called batch processing to provide impressive speed improvements for common data warehouse queries. Microsoft has claimed that, in test scenarios, customers have seen approximately 100x improvements in star join and similar queries.

VertiPaq makes columnstore indexes more efficient because it stores columns differently from traditional indexes and compresses the data in the index effectively. In a regular index, all indexed data from each row is kept together on a single page, so the data in each column is spread across all pages in the index. In a columnstore index, the data from each column is kept together, so each data page contains data from only a single column. In addition, the index data for each column is compressed, and since many columns contain highly repetitive values, the compression ratio can be very high. This architecture reduces the number of pages in the index and, if you are selecting only a few columns, also reduces the number of pages that need to be scanned, which makes it more likely that SQL Server can keep them in memory.
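To make the storage difference concrete, here is a minimal Python sketch of the idea, not of how SQL Server implements it. The sample rows, the column names and the helper functions are all hypothetical, and run-length encoding is used only as a simple stand-in for compression, since the description above does not say which compression scheme the index applies.

```python
from itertools import groupby

# Hypothetical fact rows: (order_id, region, status, amount).
rows = [
    (1, "East", "shipped", 10),
    (2, "East", "shipped", 25),
    (3, "East", "open", 25),
    (4, "West", "open", 40),
    (5, "West", "open", 40),
    (6, "West", "shipped", 15),
]
column_names = ("order_id", "region", "status", "amount")


def to_columns(rows, names):
    """Pivot row tuples into one list per column (the column-wise layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, column_names)

# A repetitive column shrinks to a handful of (value, count) pairs.
encoded_region = run_length_encode(columns["region"])

# A query that needs only `amount` touches one column list, not every row.
total_amount = sum(columns["amount"])
```

Here the six `region` values collapse into two pairs, and the sum over `amount` never reads `order_id`, `region` or `status`. That is the same effect, in miniature, as scanning fewer pages when a query selects only a few columns.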

Building a columnstore index is easy. You use the same index creation syntax and just specify the keyword COLUMNSTORE. But note that once you add a columnstore index to a table, the table itself becomes read-only, so inserts, updates and deletes are not allowed. If you need to insert new rows or update existing ones, you can disable the index, make the data modifications and then rebuild the columnstore index. Because of this limitation, the feature is for now more suitable for data warehouse tables that contain static data, where it's acceptable for the data to be refreshed only at scheduled intervals.

You can, however, use partitioning to avoid having to rebuild the index. For example, you can create a daily, weekly or monthly partition, load the data into a new table, build all indexes, and then switch that table into the partitioned table. You can also create a view that uses UNION ALL to combine a table with a columnstore index and an updatable table without a columnstore index into one logical table. Queries can then reference this view. This allows new data to be inserted dynamically into a single logical fact table while still retaining much of the performance benefit of the columnstore index.
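The UNION ALL view pattern can be sketched in a few lines. The example below uses SQLite through Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere. SQLite has no columnstore indexes, and the table and view names are hypothetical. In SQL Server, the history table would be the one carrying the columnstore index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the large, read-only table that carries the columnstore index.
    CREATE TABLE fact_sales_history (sale_date TEXT, amount INTEGER);

    -- Stand-in for the small, updatable table without a columnstore index.
    CREATE TABLE fact_sales_recent (sale_date TEXT, amount INTEGER);

    -- One logical fact table for queries to reference.
    CREATE VIEW fact_sales AS
        SELECT sale_date, amount FROM fact_sales_history
        UNION ALL
        SELECT sale_date, amount FROM fact_sales_recent;

    INSERT INTO fact_sales_history VALUES ('2011-07-01', 100), ('2011-07-02', 150);
""")

# New rows go only into the updatable table.
conn.execute("INSERT INTO fact_sales_recent VALUES (?, ?)", ("2011-07-03", 75))

# Queries read the view and see both tables as one.
row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM fact_sales"
).fetchone()
```

The query against `fact_sales` sees all three rows even though the insert touched only `fact_sales_recent`. At a scheduled interval, the recent rows would be moved into the history table, for example with the partition switch described above.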

VertiPaq is also integrated into Analysis Services, where it enables a new tabular mode that provides in-memory analytics on billions of rows of data at very high speed. Columnstore indexes should also reduce development costs and ETL times, because they limit or eliminate the need to rely on pre-built aggregates, including user-defined summary tables and indexed (materialized) views. Furthermore, columnstore indexes can greatly improve ROLAP performance, making ROLAP more attractive.

Note that OLTP-style queries, including point lookups and fetches of every column of a wide row, will usually not perform as well with a columnstore index as with a B-tree index. Columnstore indexes also don't always improve data warehouse query performance. When they don't, the query optimizer will choose to use a heap or B-tree to access the data.

Microsoft reports compression ratios of between 4x and 15x on different fact tables containing real user data. The columnstore index is a secondary index: the row store is still present, though it is often not needed during query processing and ends up being paged out. A clustered columnstore index, which will be the master copy of the data, is planned for the future. This will give significant space savings in addition to the performance gains already provided.

On a pre-release build, index build times for a columnstore index have been observed to be 2 to 3 times longer than the time to build a clustered B-tree index on the same data, so you will need to accommodate this difference in your ETL processes. However, since you typically will no longer need summary aggregates, which can take a lot of time to build, overall ETL time may in fact decrease.

The bottom line is that the excellent performance of columnstore indexes will allow your users to get much more business value from their data by encouraging them to explore it interactively. It will also reduce your burden and shorten ETL time by decreasing reliance on summary tables and OLAP aggregates.

More info:

Project Apollo casts light on column-store indexes in SQL Server Denali

Columnstore Indexes for fast DW

SQL Server Data Mining and Apollo Columnstore Indexes

SQL Server Columnstore Index FAQ

Columnstore Indexes

Query Optimization with Denali Columnstore Indexes
