SQLServerCentral Article

Column Store vs Row Store Indexes

,

This article is intended to compare the traditional row store indexes with the (not so new) columnstore indexes that have been introduced in SQL Server 2012 and further improved in SQL Server 2014.

A lot of information already exists on the internet with regards to the columnstore internals as detailed in this MSDN article. Therefore in this article, I will just focus on the performance improvements. A very important section of that document, which I would suggest to go through, is the Key Characteristics section. It highlights the can and can't of the columnstore indexes which are very important to understand when taking the decision to actually start using them.

Test Scenarios

I have created 5 tests and tried to keep my testing enviroment free from any other heavy load so as not to contaminate the results. I'll be basing my tests on two identical tables, which I called:

  • FactTransaction_ColumnStore - This table will contain only one index, a Clustered columnstore Index. Due to the Clustered columnstore restrictions, no other index can be created on this table.
  • FactTransaction_RowStore - This table will contain a Clustered Index, a non-clustered columnstore index and a row store index.

I have attached a script file with the SQL needed to create the tables and Indexes. 

I have then populated the tales with 30 million rows each (using sample data from my existing DWH). Through all the tests I specified the MAXDOP Query Hint all statements, so I'll be able to test all queries using the specified number of cores.

Test 1 - Populating both tables

To better match the testing environments, I populated both tables with only their Clustered index enabled. One table is made up of a clustered columnstore index while the other table is made up of only a clustered row index.

IO and Time Statistics

Table 'FactTransaction_ColumnStore'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  (30000000 row(s) affected)
SQL Server Execution Times:
  CPU time = 98204 ms,  elapsed time = 109927 ms.
Table ' FactTransaction_RowStore '. Scan count 0, logical reads 98566047, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTransaction'. Scan count 1, logical reads 73462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 (30000000 row(s) affected)
SQL Server Execution Times:
  CPU time = 111375 ms,  elapsed time = 129609 ms.

Observations from Test 1

Table Name Time to populate Logical Reads
FacTransaction_ColumnStore 1.49 mins ? Not sure why the Logical reads is 0!
FacTransaction_RowStore 2.09 mins 98566047

Test 2 - Comparing SEEK

Note that I have specified the FORCESEEK table hint on the RowStore table to force an index seek operation since this is the requirement for this test.

-- Comparing Seek.... 
SET Statistics IO,TIME ON
Select CustomerFK
From [dbo].FactTransaction_RowStore WITH(FORCESEEK)
Where transactionSK = 4000000
OPTION (MAXDOP 1)
Select CustomerFK
From [dbo].FactTransaction_ColumnStore  
Where transactionSK = 4000000
OPTION (MAXDOP 1)
SET Statistics IO,TIME OFF

IO and Time Statistics

Table 'FactTransaction_RowStore'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 2510, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 83 ms.

Execution Plans

Observations from Test 2

As can be noticed from the above screenshots, an index seek is much faster on the RowStore table than on the columnstore index table. This is mainly due to the fact that SQL Server does not currently support an Index seek on a clustered columnstore index. Looking at the execution plan, an Index scan was actually performed on the columnstore index, resulting in much more logical reads, thus the decrease in performance when compared to a RowStore Index seek.

Table Name Index Type Logical Reads  Elapsed Time
FacTransaction_ColumnStore Column 714 83 ms
FacTransaction_RowStore Row 3 0 ms

Test 3 - Comparing SCAN

Note that I have specified the FORCESCAN table hint on the RowStore table to force an index scan operation. On the ColumnStore table this hint was not actually needed because the optimzer will do an Index scan anyway.

-- Comparing Scan.... 
SET Statistics IO,TIME ON
Select CustomerFK
From [dbo].FactTransaction_RowStore WITH(FORCESCAN)
Where transactionSK = 4000000
OPTION (MAXDOP 1)
Select CustomerFK
From [dbo].FactTransaction_ColumnStore WITH(FORCESCAN)
Where transactionSK = 4000000
OPTION (MAXDOP 1)
SET Statistics IO,TIME OFF

IO and Time Statistics

Table 'FactTransaction_RowStore'. Scan count 1, logical reads 12704, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
  CPU time = 32 ms,  elapsed time = 22 ms.
Table 'FactTransaction_ColumnStore'. Scan count 1, logical reads 714, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
  CPU time = 0 ms,  elapsed time = 2 ms.

Execution Plan

Observations from Test 3

As can be noticed from the above screenshots, an index scan is faster when performed on a columnstore index than on a row store index. Both the logical reads and the elapsed time indicate that a columnstore index is the preferred method to use when scanning large tables, thus being the better to use on datawarehouse tables.

(In a normal day-to-day scenario, you would not want to scan a large table to retrieve records when you are expecting the result set to be just a couple of rows. An Index seek would obviously be the preferred way to go; however this scenario was created just to demostrate and compare Index scans between Column and Row store indexes.)

Table Name Index Type Logical Reads  Elapsed Time
FacTransaction_ColumnStore Column 714 2 ms
FacTransaction_RowStore Row 12704 22 ms

Test 4 - Compare Aggregation Queries

Test on RowStore Table using the Clustered Index based on TransactionSK.

SET Statistics IO,TIME ON
Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_FactTransaction)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

Test on RowStore table using a RowStore Index on CustomerFK and BrandFK. (Covering Index).

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_RowStore] WITH(INDEX=RowStore_CustomerFK_BrandFK)
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)

Test on RowStore table using a ColumnStore Index on CustomerFK and BrandFK.(Covering Index).

Select CustomerFK,BrandFK, Count(*) From [dbo].[FactTransaction_RowStore] WITH(INDEX=ColumnStore_CustomerFK_BrandFK) Group by CustomerFK,BrandFK OPTION (MAXDOP 4)

Test on the columnstore table using the Clustered Index.

Select CustomerFK,BrandFK, Count(*)
From [dbo].[FactTransaction_ColumnStore]
Group by CustomerFK,BrandFK
OPTION (MAXDOP 4)
SET Statistics IO,TIME OFF

IO and Time Statistics

Test on RowStore Table using the Clustered Index based on TransactionSK.

Table 'FactTransaction_RowStore'. Scan count 5, logical reads 45977, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
  Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
  CPU time = 9516 ms,  elapsed time = 2645 ms.

Test on RowStore table using a RowStore non clustered Index on CustomerFK and BrandFK. (Covering Index).

Table 'FactTransaction_RowStore'. Scan count 5, logical reads 71204, physical reads 0, read-ahead reads 2160, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
  CPU time = 5343 ms,  elapsed time = 1833 ms.

Test on RowStore table using a ColumnStore non clustered Index on CustomerFK and BrandFK. (Covering Index).

Table 'FactTransaction_RowStore'. Scan count 4, logical reads 785, physical reads 7, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
  CPU time = 141 ms,  elapsed time = 63 ms.

Test on the columnstore table using the Clustered Index.

Table 'FactTransaction_ColumnStore'. Scan count 4, logical reads 723, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
  CPU time = 203 ms,  elapsed time = 118 ms.

Execution Plans

Observations from Test 4

Here is where the columnstore index really starts to shine! The performance on both columnstore indexes is much better when compared to the traditional row indexes both in terms of logical reads and elapsed time.

Table Name Index used Index Type Logical Reads  Elapsed Time
FacTransaction_ColumnStore ClusteredColumnStore Column 717 118
FacTransaction_RowStore RowStore_FactTransaction Row 45957 2645
FacTransaction_RowStore RowStore_CustomerFK_BrandFK Row 71220 1833
FacTransaction_RowStore ColumnStore_CustomerFK_BrandFK Column 782 63

Test 5 - Compare Updates (Subset of data)

In this test I'll be updating slightly less than 1 million rows i.e. 1/30th of the data in the table.

SET Statistics IO,TIME ON
Update [dbo].[FactTransaction_ColumnStore]
Set    TransactionAmount = 100
Where  CustomerFK = 112
OPTION (MAXDOP 1)
Update [dbo].[FactTransaction_RowStore]
Set    TransactionAmount = 100
Where  CustomerFK = 112
OPTION (MAXDOP 1)
SET Statistics IO,TIME OFF

IO and Time Statistics

Table 'FactTransaction_ColumnStore'. Scan count 2, logical reads 2020, physical reads 0, read-ahead reads 2598, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(913712 row(s) affected)
SQL Server Execution Times:
  CPU time = 27688 ms,  elapsed time = 37638 ms.
Table 'FactTransaction_RowStore'. Scan count 1, logical reads 2800296, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(913712 row(s) affected)
SQL Server Execution Times:
  CPU time = 6812 ms,  elapsed time = 6819 ms.

Execution Plans

Observations from Test 5

In this case, it was much more faster to update the RowStore clustered index table then the ColumnStore clustered index table. 

Table Name Index Type Logical Reads Elapsed Time
FactTransaction_ColumnStore columnstore 2020 37638 ms
FactTransaction_RowStore Row Store 2800296 6819 ms

Note that the logical reads for the Row Store is higher than the columnstore albeit the columnstore index took longer to update. This is due to the fact that the columnstore index can achieve a higher compression ratio, thus a smaller memory footprint.

Conclusions

The columnstore indexes (both clustered & non clustered) do offer a number of advantages over the row store indexes. However, in my opinion the datawarehouse on which they will be implemented needs to be 'prepared' to support them. One such instance is that non clustered indexes cannot be updated and they have to be disabled for the underlying table to get updated. If the table is huge and there are no partitions, this may become a problem because the whole table index would need to be rebuilt every time, which may be prohibitive if the table is huge! Therefore, in this case a good partition strategy must already be in place to support such indexes.

A very good candidate for these types of indexes is obviously large fact tables on which aggregation will be performed. I also think that columnstore indexes are also a good candidate to be used on Fast Track Data Warehouse Servers since this setup usually involves a lot of Scans. Also, if an SSAS cube is in place, from the above test, it is evident that reading large amounts of data from a columnstore index is faster than reading from its counter part. Hence this can help in improving the cube processing time.

Resources

Rate

4.8 (15)

You rated this post out of 5. Change rating

Share

Share

Rate

4.8 (15)

You rated this post out of 5. Change rating