Stairway to Columnstore Indexes Level 9: Batch Mode Execution

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    Comments posted to this topic are about the item Stairway to Columnstore Indexes Level 9: Batch Mode Execution


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Dmitry Artemov

    SSC Journeyman

    Points: 93

    "which grabs a row from the index" Probably it is just a metaphor but in reality all you can grab from a CCI is a segment which can be up to 1 million row.

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    Dmitry Artemov (9/2/2016)


    "which grabs a row from the index" Probably it is just a metaphor but in reality all you can grab from a CCI is a segment which can be up to 1 million row.

    Hi Dmitry,

    Thanks for your attention to detail!

    You are right, the explanation is simplified. What actually happens in the columnstore index scan operator is that it grabs a segment for each column it has to read, then takes the corresponding values from each segments and combines them into a row in the internal structure (which is similar to the structure of a row in a row-store index) that operators use to exchange data. When then called next, it still has the segments in memory so it does not hit the disk but instead grabs the next values from each segment to assemble into a row; and this repeats until the entire segment is processed. After that, on the next call, a new set of segments is read in.

    When running in batch mode, then the basic idea is the same: the operator when first called will read a segment for each column that is requested. But now it will not take a sinigle value from each segment, but a set of values that are assembled in the internal structure that is somewhat similar to the columnstore index itself. This entire batch is then passed to the next operator. On the next call, a whole new batch of values is sent over; and when all values from the segments have been processed this way it will read the next segment for each column, and then continiue the process from there.

    One detail that I do not know is whether the operator always reads the segments completely when it starts processing them. Each segment is stored as LOB data, which means that it occupies multiple pages. It is possible for this process to just read the first page (or first few pages) and then fetch the next when needed. It is also possible to read the segment completely. Both have their benefits and drawbacks. Howeverr, since this is far beyond the level I want to cover, I have not done any research to find out which of the possible implementations has actually been used.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Dmitry Artemov

    SSC Journeyman

    Points: 93

    Hi Hugo

    Thank you for the reply and detailed explanations.

    Please keep publishing stairways if time permits. 🙂

    Dmitry

  • choiclark

    Valued Member

    Points: 73

    Hi Hugo,

    Thanks for the article.

    I went ahead and ran the code in Listing 9-3 on SQL Server 2012 instance, and checked the execution plan with and without the optimizer, but did not see anything like Figure 9-6.

    Figure 9-6 seems completely off because it shows Physical operation : Hash Match, yet Logical operation: Inner Join. I don't see such operator in the execution plan of any of the code in the List 9-3. Any "Hash Match" I have seen so far has the logical operation of "Aggregation", not "Inner Join", so I am confused here.

    Thanks,

  • Hugo Kornelis

    SSC Guru

    Points: 64645

    choiclark (12/9/2016)


    Hi Hugo,

    Thanks for the article.

    I went ahead and ran the code in Listing 9-3 on SQL Server 2012 instance, and checked the execution plan with and without the optimizer, but did not see anything like Figure 9-6.

    Figure 9-6 seems completely off because it shows Physical operation : Hash Match, yet Logical operation: Inner Join. I don't see such operator in the execution plan of any of the code in the List 9-3. Any "Hash Match" I have seen so far has the logical operation of "Aggregation", not "Inner Join", so I am confused here.

    Thanks,

    Hi Choiclark,

    Thanks, again, for your kind words! I am glad that you like this stairway series.

    There ios often more that I want to include in a level than I have room for, so I sometimes have to cut my descriptions a bit short. The part where I show the batch mode bailout due to a hash spill is one part where clarity has suffered because of this. I probably should have cut the listing into three parts, and I should have included a screenshot of the full execution plan I got, with an arrow to indicate where you can see the bailout happening.

    So for your understanding, the code in listing 9-3 consists of three parts.

    The first part runs up to and including the line that reads "GO 10", and this is setup only. Run it once, wait until it finishes, then your system is ready for the actual repro.

    The second part starts at the comment line "-- Force the bailout (SQL Server 2012 only)", and ends at "OPTION (OPTIMIZE FOR (@MinManufacturer = 'Y'));" (or at the "GO" just after that). This is the part that will show the bailout. You should run this *after* the setup part is done; and with the "Include Actual Execution Plan" option enabled. You need to run the code with an actual plan; an estimated plan will never show the bailout. You can repeat this part multiple times if you wish, until you do the cleanup.

    The third and last part is just the "DROP TABLE dbo.TmpProduct;" line, which removes the temporary table created in the setup part.

    On my system, running the second part (after the setup has completed) takes a few minutes, and produces the actual execution plan I attached to this post. There are three operators where you can see the bailout: the Columnstore Index Scan and the two Hash Match operators all show actual execution mode row vs estimated batch (marked with arrows in the attachment). The picture in figure 9-6 is from one of the two Hash Match operators (the one with the circle in the attached picture).

    If you still do not see this on your system, then can you please in the actual execution plan right-click somewhere in the white space, select "Save execution plan as", save it somewhere with the default .sqlplan extension, then reply to this post and attach the saved execution plan to that post? That will allow me to open up the plan on my system, look into all the properties and see if I can find out why you got a different execution plan from what I have.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • This was removed by the editor as SPAM

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply