Columnstore efficiencies

  • Comments posted to this topic are about the item Columnstore efficiencies

  • Hi Steve,

    As the author of the article referenced in the explanation, I am very sure that the answer I ticked is correct, and that the answer you marked as correct is wrong.

    The correct answer is column elimination (even though I do not use this term in the article).

    Segment elimination is the process of excluding horizontal groups of rows based on minimum and maximum values in that group that are stored in the metadata (e.g. all one million rows in the third rowgroup have a price of 500 dollar or less; if I query with a filter WHERE Price > 600, segment elimination will cause SQL Server to not process that entire rowgroup).

    Rowgroup elimination is not an officiaal term, allthough it actually better describes what happens when segment elimination kicks in.

    Partition elimination is similar to segment elimination, but not restricted to columnstore indexes, limited to partitioned tables only, and it can only be used when a filter is on the column used for the partitioning. (Segment elimination is applied to all columns in the index).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi Hugo,

    I think you got it right when you wrote the article. The term "column elimination" doesn't appear to exist, and "segment elimination" is used to cover elimination of whole columns as well as elimination of rowgroups. So Steve has it right in the QotD.

    An article (written by Sergio Govoni) which states that segment elimination covers both the elimination of entire columns because they are unreferenced (references in filter and grouping clauses count as well as references in the select clause) and the elimination of row-groups because metadata (min and max colum value) and the filters (ON or WHERE) specified in the query clause indicate they are unneeded is available at Updated Column Store Indexes in SQL Server 2014. Similarly, pages 25 and 26 of Pascal Rochard's and Donato Verardi's TechTalk - Bienvenue dans le monde de Microsoft SQL Server 2014 (in English despite the French title) makes it pretty clear that segment elimination is a two stage process - first eliminate columns and then eliminate rowgroupps from the remaining colum-set.

    As long as Microsoft and the majority of the tech community uses "segment elimination" the way it is currently used we can't really say the the correct name is something else (after all, we dont insist on calling "SQL Server" "T-SQL Server", do we?). Of course campaigning for a change in terminology is fair enough, but claiming that the change has already happened and teh QotD is therefor incorrect isn't.

    Somewhere I've seen the terms "column segment elimination" and "segment elimination by row" which suggests that some people are treating it all as segment elimination but specifying what collection of segments (row group or column) are eliminated by adding qualifying words when they want to refer to just one of the two phases. But most peole don't do that.

    edit: "column segment elimination" is in SQL Shep's blog (he works for Microsoft as an "SQL Customer Advisory Team Lab Manager", whatever that may be).

    Tom

  • TomThomson (5/19/2015)


    The term "column elimination" doesn't appear to exist

    It actually does (and I should have included it in my article).

    Some references:

    http://byobi.com/blog/2012/10/column%C2%AD-stores-for-wide-and-sparse-data/[/url]. (A blog I found when searching the internet for the term column elimination)

    download.microsoft.com/download/C/8/0/C8084B3F-07A7-4620-B583-09C7EDF00E8C/Techday_2014_Ernestas_Sysojevas.pptx - slide 39. (I don't know the speaker, but it's from an official Microsoft conference, and published on a Microsoft site)

    http://www.vconferenceonline.com/vconference/materials/slides/In-Memory%20Columnstore%20Overview.pdf - slide 18. (Same slide as the previous one, so someone borrowed a slide from someone. But this time the name on the "about me" slide is Sunil Agarwal, who works for Microsoft, on the SQL Server team. It won't get any more official then this.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • I note from Hugo's article that if a string (char, varchar, nchar, nvarchar) is part of the columnstore index, then filtering on that column does not result in segment elimination.

    I assume that all the numeric data types that are part of the columnstore index result in segment elimination when filtered on.

    Will the date and time data types result in segment elimination?

  • briankwartler (5/19/2015)


    I note from Hugo's article that if a string (char, varchar, nchar, nvarchar) is part of the columnstore index, then filtering on that column does not result in segment elimination.

    I assume that all the numeric data types that are part of the columnstore index result in segment elimination when filtered on.

    Will the date and time data types result in segment elimination?

    Yes.

    Sorry if that is unclear in my article.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks, Hugo. I read the section of your article on segment elimination again and it actually does include a Saledate column which apparently is part of the columnstore index for your sample table, since you say that the metadata of the Saledate data for a particular segment shows a minimum value (2012-03-08) and a maximum value (2012-03-24).

    And you use the example of querying the table for sales made in April 2012, which causes this segment to be eliminated.

    I should have read more carefully.

  • Hugo Kornelis (5/19/2015)


    TomThomson (5/19/2015)


    The term "column elimination" doesn't appear to exist

    It actually does (and I should have included it in my article).

    Some references:

    http://byobi.com/blog/2012/10/column%C2%AD-stores-for-wide-and-sparse-data/[/url]. (A blog I found when searching the internet for the term column elimination)

    download.microsoft.com/download/C/8/0/C8084B3F-07A7-4620-B583-09C7EDF00E8C/Techday_2014_Ernestas_Sysojevas.pptx - slide 39. (I don't know the speaker, but it's from an official Microsoft conference, and published on a Microsoft site)

    http://www.vconferenceonline.com/vconference/materials/slides/In-Memory%20Columnstore%20Overview.pdf - slide 18. (Same slide as the previous one, so someone borrowed a slide from someone. But this time the name on the "about me" slide is Sunil Agarwal, who works for Microsoft, on the SQL Server team. It won't get any more official then this.

    I think we have a terminology issue, so I'll change the question and award back points. I don't find "Column elimination" on MS sources. I have found a few blogs that have segment elimination, and I've seen the various column portions of a table referred to in DW sources as segments for years. With all due respect to Sunil, who is brilliant, I suspect he has misappropriated a term for a talk, perhaps unintentionally. Or marketing likes "column elimination" better.

    In any case, it doesn't really matter, and there are references I've seen outside of MSDN that will say column segment elimination, or column elimination, which make sense. I'm not sure one is more write than another.

  • I learned something new today from the research. Thanks, Steve.

  • I had to research this as we are still on 2008 R2. But it looks like there is no solid simple answer here.

  • The question appeared tougher to me, but I got the full score. I had a bit idea of the sys.column_store_segments catalog view.

    Thanks.

  • Confusing. I use segment elimination to indicate that a segment of rows has been eliminated, not necessarily an entire column.

    But I guess that if you eliminate all segments of a column, you have eliminated the column 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (5/21/2015)


    Confusing. I use segment elimination to indicate that a segment of rows has been eliminated, not necessarily an entire column.

    But I guess that if you eliminate all segments of a column, you have eliminated the column 😀

    I still maintain that the process of eliminating entire columns is called column elimination, and that segment elimination is only the term for eliminating rowgroups.

    But I am unfortunately not able to find better sources than the ones I already gave, and they were insufficient to convince Steve.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • You may be correct, Hugo. However I think this is a case of there not really being a term available generally and we have different people at MS making up and using what they think fits.

    I am happy to be proven wrong, and hopefully, will be with better documentation and consistency of the usage in the future.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply