Exam Prep 70-463: Intro to Columnstore Indexes

Colleen M. Morrow, 2014-12-09 (first published: 2014-12-01)

This post is part of a series on this blog that will help me, and hopefully you, pass exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012. So far, we’ve covered:

SQL Server 2012 introduced a new way of storing nonclustered indexes, called the columnstore index, which has been shown to improve performance in a data warehouse environment many times over. How, you might ask? Well, the magic of the columnstore index lies in how the data is stored.

Rowstore structure

If you look at a traditional nonclustered index, you’ll see that the data is stored in a row-wise format. For example, let’s assume we have a Customers table with columns for CustomerNum, FirstName, LastName, and BirthDate. In a row-wise storage format, data pages contain rows of data, as illustrated below.

Rowstore storage structure

Let’s say I want to run the following query:

 SELECT LastName FROM dbo.Customers

Even though I only want the LastNames, SQL Server still has to read all pages in their entirely into memory to satisfy my query (assuming I don’t have an index on last name). That makes for a lot of wasted space in memory holding information that I don’t really need (i.e. first names, birth dates, etc.). It also means a lot of extra IO operations if the data isn’t already in memory.

Columnstore structure

With columnstore indexes, data is stored in a columnar format, meaning pages contain the data from a single column, as shown below.

Columnstore storage structure

By storing data for a single column on a page, SQL Server can now read in only the data really needed by a particular query. If my query only needs LastName, SQL Server only needs to retrieve the pages for LastName. This leads to fewer IO operations and less wasted memory space. SQL Server also uses a special compression algorithm on columnstore indexes, leading to even greater IO savings.

Batch Mode Processing

But wait, there’s more! In addition to highly compressed columnstore indexes, SQL Server 2012 also introduced batch mode processing. In the data warehouse environment, we’re typically scanning and processing large sets of data. Working with individual rows would be very inefficient, so SQL Server can now choose to process data in batches, rather than one row at a time.

Not all operators will run in batch mode. In SQL Server 2012, batch mode processing is available for filter, sequence project, scan, hash match (partial) aggregation, hash match (inner join), and hash table build operations. You can use the Actual and Estimated Execution Mode properties of the Actual Execution Plan to determine what mode SQL Server is using in a particular plan. Keep in mind that batch processing is really only used when joining, filtering or aggregating large data sets.

The cost of columnstore indexes

Unfortunately, there’s no such thing as a free lunch. SQL Server still needs to ultimately return rows of data. Therefore, it needs to take the columnstore data and reconstruct the rows that satisfy a query. This reconstruction takes time and CPU resources. This also makes nonclustered columnstore indexes read-only and their underlying tables also read-only, so you won’t be using this in your OLTP databases. (SQL Server 2014 introduced updateable clustered columnstore indexes, but we won’t go into those here.) If your table is partitioned, the columnstore index must be partition aligned.

There can also only be one columnstore index on a table. A common recommendation for the data warehouse is to create a columnstore index on fact tables, putting all of the columns of the fact table in the columnstore index. Larger dimension tables can also benefit from columnstore indexes. Just keep in mind that you don’t want to use this feature on tables that are updated frequently.

Summary

That’s a brief introduction on columnstore indexes and their benefits, as well as their inherent costs. Next time we’ll look at how to implement columnstore indexes in our data warehouse.

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

Exam Prep 70-463: Intro to Columnstore Indexes

Rowstore structure

Columnstore structure

Batch Mode Processing

The cost of columnstore indexes

Summary

Rate

Share

Share

Rate

Exam Prep 70-463: Intro to Columnstore Indexes

Rowstore structure

Columnstore structure

Batch Mode Processing

The cost of columnstore indexes

Summary

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts