SQLServerCentral is supported by Redgate
Stairway To SQL Server Columnstore Indexes Level 2: Columnstore Storage



Hugo Kornelis
SSCoach

Group: General Forum Members
Points: 19191 Visits: 12426
Comments posted to this topic are about the item Stairway To SQL Server Columnstore Indexes Level 2: Columnstore Storage


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
Iwas Bornready
SSC-Insane

Group: General Forum Members
Points: 22816 Visits: 885
For someone like me who knows nothing about this, the article is incredible. Thank you so much.
mshparber
Valued Member

Group: General Forum Members
Points: 51 Visits: 14
Thanks for the detailed explanation!

I am still missing something...
If my user wants to see a crosstab report of NetPrice amount by month by product - how does the columnstore index help?
It still has to match Date with Product with NetPrice at the ROW level, so that it can aggregate that in March 2015 the revenue from chairs was $20,000.

I probably missed something...
Thanks,

Michael
Hugo Kornelis
SSCoach

Group: General Forum Members
Points: 19191 Visits: 12426
mshparber (5/5/2015)
Thanks for the detailed explanation!

I am still missing something...
If my user wants to see a crosstab report of NetPrice amount by month by product - how does the columnstore index help?
It still has to match Date with Product with NetPrice at the ROW level, so that it can aggregate that in March 2015 the revenue from chairs was $20,000.

I probably missed something...
Thanks,

Michael


Hi Michael,

If you have a columnstore index on a large table, then the optimizer will automatically create an execution plan that uses the columnstore index. If you are not getting the performance improvements you were hoping for, there may be several causes.

1. The columnstore index is not used. That is possible if, according to the optimizer's estimated costing, the query would actually take more time with the columnstore index than with any of the other indexes. That can be true (there really are queries that perform better with a traditional index), or it can be a misestimate.
2. The columnstore index is used, but the plan uses row mode execution instead of batch mode execution. That means the query probably gains some performance from the columnstore index, but far less than it would with batch mode. This is especially common on SQL Server 2012, where batch mode support was very limited and a lot of query constructions would cause SQL Server to fall back into row mode. I will cover this in detail in one of the later levels of the stairway.

If you can post a repro script (CREATE TABLE statements, a few INSERT statements with sample data, CREATE INDEX statements, and the actual query that does not perform as expected), I can take a quick look and see if I can give more details. Please test your repro script on an empty test database with a case-sensitive collation and US-English language settings, to ensure that I can simply copy, paste, and execute it on my end. Also, please understand that I can take a quick look, but cannot afford to take many hours out of my consulting work to assist individual readers of this article - so if you are going to post a 10,000-line monster query, I can only respond by sending you my rates.


Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
t.franz
Ten Centuries

Group: General Forum Members
Points: 1045 Visits: 266
Any idea if (and if so, how) small data types such as TINYINT (= 1 byte) can be compressed in columnstore indexes?

It's clear that I save space when I replace "this is a long string" with a simple tinyint (at least when it occurs more than once), but when the column data type is already as short as possible, compression could only work by storing something like "value 1 (which is the dictionary entry for the original tinyint CountryID 20) is used in rows 1-10, 12-50 and 90-200" - but that would make it much harder to recombine rows to answer a SELECT *.
Hugo Kornelis
SSCoach

Group: General Forum Members
Points: 19191 Visits: 12426
t.franz - Monday, April 24, 2017 8:21 AM
Any idea if (and if so, how) small data types such as TINYINT (= 1 byte) can be compressed in columnstore indexes?

It's clear that I save space when I replace "this is a long string" with a simple tinyint (at least when it occurs more than once), but when the column data type is already as short as possible, compression could only work by storing something like "value 1 (which is the dictionary entry for the original tinyint CountryID 20) is used in rows 1-10, 12-50 and 90-200" - but that would make it much harder to recombine rows to answer a SELECT *.

Yes, tinyint data is also compressed. The average space gain is less than for other data types, but it can still be relevant.

The primary compression method for tinyint is run-length encoding (RLE). So if the data in rows 1 to 20 reads 1 / 1 / 1 / 1 / 2 / 2 / 2 / 2 / 3 / 3 / 1 / 1 / 1 / 1 / 4 / 1 / 3 / 3 / 3 / 3, then this will be stored as 1 (4) / 2 (4) / 3 (2) / 1 (4) / 4 (1) / 1 (1) / 3 (4). The algorithm that builds the columnstore index includes a step (after dividing the table into rowgroups) where the rows within a rowgroup are sorted to maximize the benefit of RLE. The "optimal" sort order for this column would, after RLE, result in 1 (9) / 2 (4) / 3 (6) / 4 (1). However, the sort algorithm used for columnstore indexes has to look at ALL columns, not just a single one.
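The RLE example above can be sketched in a few lines of Python (a conceptual illustration only, not SQL Server's actual implementation):

```python
# Conceptual sketch of run-length encoding (RLE) as described above.
# Illustration only - not SQL Server's actual storage format.

def rle_encode(values):
    """Collapse consecutive duplicates into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs

segment = [1,1,1,1, 2,2,2,2, 3,3, 1,1,1,1, 4, 1, 3,3,3,3]
print(rle_encode(segment))
# -> [(1, 4), (2, 4), (3, 2), (1, 4), (4, 1), (1, 1), (3, 4)]

# Sorting the rowgroup first maximizes the benefit of RLE:
print(rle_encode(sorted(segment)))
# -> [(1, 9), (2, 4), (3, 6), (4, 1)]
```

The second call reproduces the "optimal" sort order from the post: four runs instead of seven.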

A secondary compression method for tinyint tries to reduce the size of each value to less than one full byte. Say, for example, that within a specific rowgroup all values in your tinyint column are NULL or between 23 and 48. The algorithm will replace NULL with a value just outside the normal range (in this case 22), and then subtract 22 from each value, changing the range to 0 (for NULL) and 1-26 (for 23-48); this can then be encoded in 5 bits, instead of 8 bits for the value plus 1 for the NULL marker. The "magic value" that represents NULL, and the offset of 22, are stored only once, in the rowgroup metadata. (You can find them in the DMV sys.column_store_segments. This DMV is introduced in level 4, though I did not show all of its columns there; the NULL value (null_value) and the offset (base_id) are two of the columns I chose not to include.)
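The rebasing step described here can also be sketched conceptually. This is a Python illustration of the idea only (the function name and return shape are made up, not SQL Server's actual structures):

```python
# Conceptual sketch of the secondary encoding described above: replace NULL
# with a "magic value" just below the data range, subtract that value from
# every entry so the range starts at 0, and count the bits then needed.
# Illustration only - not SQL Server's actual implementation.

def encode_tinyint_segment(values):
    """values: tinyint values (ints), with None for NULL."""
    present = [v for v in values if v is not None]
    null_value = min(present) - 1   # magic value standing in for NULL
    base = null_value               # offset subtracted from every value
    rebased = [(null_value if v is None else v) - base for v in values]
    bits_needed = max(rebased).bit_length()
    return rebased, null_value, base, bits_needed

rebased, null_value, base, bits = encode_tinyint_segment([23, None, 48, 30, 37])
# null_value = 22, base = 22, rebased = [1, 0, 26, 8, 15], bits = 5
```

With the example range of NULL plus 23-48, every value fits in 5 bits, as in the post.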



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
t.franz
Ten Centuries

Group: General Forum Members
Points: 1045 Visits: 266
Thank you very much for the quick answer. I always wondered, how exactly SQL Server does this stuff and your article (plus the answer) makes it a lot more clear for me.
INNOCENT GUMBO
SSC-Enthusiastic

Group: General Forum Members
Points: 146 Visits: 469
Hi Hugo;

I am sorry to get into this late. SQL Server 2016 became smarter with regard to segment reading. Have a look at the output below. I know you have already observed this, but SQL Server 2014 and 2012 do not report segments read and segments skipped; they only show logical reads, from which you have to determine whether there are skipped segments:

(1 row(s) affected)
Table 'FactOnlineSales'. Scan count 8, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 8468, lob physical reads 1, lob read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 9, segment skipped 6.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
Table 'FactOnlineSales'. Scan count 8, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 4282, lob physical reads 0, lob read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 5, segment skipped 10.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Hugo Kornelis
SSCoach

Group: General Forum Members
Points: 19191 Visits: 12426
INNOCENT GUMBO - Saturday, May 6, 2017 11:18 AM
Hi Hugo;

I am sorry to get into this late. SQL Server 2016 became smarter with regard to segment reading. Have a look at the output below. I know you have already observed this, but SQL Server 2014 and 2012 do not report segments read and segments skipped; they only show logical reads, from which you have to determine whether there are skipped segments:

(1 row(s) affected)
Table 'FactOnlineSales'. Scan count 8, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 8468, lob physical reads 1, lob read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 9, segment skipped 6.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
Table 'FactOnlineSales'. Scan count 8, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 4282, lob physical reads 0, lob read-ahead reads 0.
Table 'FactOnlineSales'. Segment reads 5, segment skipped 10.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Yes, that is a 100% correct observation. Version over version, things improved in the columnstore. This series (with the exception of the last level, which is now in the final edit stage) focuses on 2012 and 2014 only, because that was the scope when we started (we added the last level later).
Being able to see the rowgroups read and skipped in the STATISTICS IO output was a huge step forward. (Although I am not happy that they used the term "segments" to show the number of rowgroups.)
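The rowgroup ("segment") elimination that these statistics report can be illustrated conceptually: metadata for each rowgroup records the minimum and maximum value of each column segment, and a scan skips every rowgroup whose [min, max] interval cannot satisfy the predicate. A Python sketch with made-up metadata values (not SQL Server's actual structures):

```python
# Conceptual sketch of rowgroup (segment) elimination: per-segment min/max
# metadata lets a scan skip rowgroups that cannot contain matching rows.
# The segment metadata below is made up for illustration.

segments = [
    {"id": 0, "min": 20150101, "max": 20150331},
    {"id": 1, "min": 20150301, "max": 20150630},
    {"id": 2, "min": 20150701, "max": 20151231},
]

def rowgroups_to_read(segments, lo, hi):
    """Ids of rowgroups whose [min, max] interval overlaps the predicate [lo, hi]."""
    return [s["id"] for s in segments if s["max"] >= lo and s["min"] <= hi]

# Predicate: DateKey BETWEEN 20150401 AND 20150531
read = rowgroups_to_read(segments, 20150401, 20150531)
skipped = [s["id"] for s in segments if s["id"] not in read]
# read = [1], skipped = [0, 2]
```

In STATISTICS IO terms this would show up as one segment read and two skipped.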



Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
t.franz
Ten Centuries

Group: General Forum Members
Points: 1045 Visits: 266
Another question.
If I understood correctly, SQL Server has to read all segments/pages for the city column when I execute a
SELECT COUNT(*) FROM customers WHERE city = 'New York'
.
With a nonclustered index on the city column it would just start after New Windsor, read the few thousand entries for New York, and stop at Newark.
But since - as I understand it - the entries in a columnstore index are not ordered, segment elimination would not work (I guess each rowgroup of > 1 million people will contain a few persons who live in a town < New York and a few in a town > New York), so it has to read all 500 million city entries (instead of at most the 8 million actual inhabitants with a nonclustered index).

Is the following assumption true? When I use partitioning, each partition has its own global dictionary, since I can switch partitions in and out as I wish (and when I create a CCI on the staging table, SQL Server has no idea into which table I'll be switching it).


