The Illusion of Updateable Clustered ColumnStore Indexes

Klaus Aschenbrenner, 2014-03-31 (first published: 2014-03-25)

(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan – you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)

In todays blog posting I want to concentrate in more details on the Updateable Clustered ColumnStore Index that is introduced with SQL Server 2014. Before we go down to the details, I want to give you a brief overview about their first appearance in SQL Server 2012, and their limitations.

ColumnStore Indexes in SQL Server 2012

The introduction of ColumnStore Indexes in SQL Server 2012 was one of the hottest new features (besides AlwaysOn). If chosen right, they can make an impressive performance improvement for Data Warehousing workloads. Unfortunately, they had 2 big limitations:

There was only the support for Non-Clustered ColumnStore Indexes
As soon as you had created a ColumnStore Index on a table, the underlying table was read only, and no changes to the data were allowed anymore

Both limitations were a huge bummer for some customers. Imagine you had a table with 300 GB of traditional row store data. With a ColumnStore Index it’s possible to compress that data down to a size of 30 GB. But SQL Server 2012 only allowed a Non-Clustered ColumnStore Index, means you had to store your data twice: one in the traditional row store format, and once in the new ColumnStore format. That’s a huge waste of storage, because your queries will (hopefully) only use your Non-Clustered ColumnStore Index.

And as soon as you had created your Non-Clustered ColumnStore Index, you were not allowed to change the underlying table data anymore – your table was just read only! Of course, there were some workarounds for this problem, like Partition Switching, but you still needed to implement that on your own…

ColumnStore Indexes in SQL Server 2014

Things are changing now with SQL Server 2014, because Microsoft has resolved the above mentioned issues – with some magic and illusion: SQL Server 2014 provides you an Updateable Clustered ColumnStore Index! Let’s have a more detailed look on how this magic and illusion happens internally in SQL Server.

The first most important fact is that an underlying direct update of a ColumnStore Index is not possible! It would be too time consuming to do the complete decompress and compress on the fly during your INSERT, UPDATE, and DELETE transactions. Therefore SQL Server 2014 uses help from some magic: Delta Stores and Delete Bitmaps. Let’s have a more detailed look on both concepts.

Every time when you run an INSERT statement, the new record isn’t directly inserted into the ColumnStore Index – the record is inserted into a Delta Store. The Delta Store itself is nothing else than a traditional B-Tree structure with all its pro’s and con’s. When you afterwards read from the ColumnStore Index, SQL Server returns you the data from the compressed ColumnStore Index AND also from the Delta Store.

When you run an DELETE statement, again nothing happens in the compressed ColumnStore Index. The only thing that happens is that the record is deleted logically through a Delete Bitmap. Every record in the ColumnStore Index has a corresponding bit in that Delete Bitmap. When you again read your ColumnStore Index, SQL Server just discards the rows that are marked as deleted in the Delete Bitmap.

And running an UPDATE statement just means inserting the new version into the Delta Store, and marking the old version as deleted in the Delete Bitmap. Easy, isn’t it? The following picture (source http://research.microsoft.com/apps/pubs/default.aspx?id=193599) shows this concept in more details.

ColumnStore

Because of the Delta Stores and the Delete Bitmap it seems that your ColumnStore Index is updateable, but in reality it is just immutable. There is also a background process called the Tuple Mover, which runs regularly and finally pushes your changes asynchronously into the compressed ColumnStore Index.

In addition you can also finally define a ColumnStore Index in SQL Server 2014 as Clustered. This means that you don’t need to have your data in the traditional row store format anymore. Just create your table and create a Clustered ColumnStore Index on it. You can some with this approach huge space savings in your storage, because everything is now compressed. When you work in more details with the concepts of relational databases, Clustered always means Sorted. But with a Clustered ColumnStore Index this statement is not true: when you create a Clustered ColumnStore Index, there is NO sorting order in your data! Just be aware of this tiny little fact

Summary

Updateable Clustered ColumnStore Indexes are a nice magical illusion of SQL Server 2014. Don’t get me wrong: I really like the possibilities introduced with this new feature, but you have to understand how the feature is implemented internally, if you want to make the best use of it.

Thanks for reading

-Klaus

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

The Illusion of Updateable Clustered ColumnStore Indexes

ColumnStore Indexes in SQL Server 2012

ColumnStore Indexes in SQL Server 2014

Summary

Rate

Share

Share

Rate

The Illusion of Updateable Clustered ColumnStore Indexes

ColumnStore Indexes in SQL Server 2012

ColumnStore Indexes in SQL Server 2014

Summary

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts