Clustered Columnstore Index - Multiple Trimmed Row Groups with reason of "RESIDUAL_ROW_GROUP" within a Single Partition

  • Hi,

    I'm currently troubleshooting a query performance issue against a partitioned fact tables that has a clustered columnstore index. I'm on SQL Server 2016, V13.0.4422.0.  During my investigation, I noticed that the fact table has a lot more segments/row groups than I would've expected.  Within each partition, I am seeing multiple trimmed row-groups after a fresh columnstore build. After looking at the dm_db_column_store_row_group_physical_stats dmv, I see that the trim reason is always "RESIDUAL_ROW_GROUP". My understanding is that this trim reason means that "not enough rows to reach the maximum number of rows, this state is typically reached for the last Row Groups in the Columnstore Tables". So, my question is: hHow are there multiple trimmed row groups within the same partition, all with that reason? I would only expect a max of 1 per partition. As an example, this is what 1 of my partitions looks like:

    - partition_number 22, row_group_id 0, total_rows 1048576
    - partition_number 22, row_group_id 1, total_rows 1048576
    - partition_number 22, row_group_id 2, total_rows 1024, trim_reason_desc RESIDUAL_ROW_GROUP
    - partition_number 22, row_group_id 3, total_rows 148143, trim_reason_desc RESIDUAL_ROW_GROUP
    - partition_number 22, row_group_id 4, total_rows 1048576
    - partition_number 22, row_group_id 5, total_rows 1024, trim_reason_desc RESIDUAL_ROW_GROUP

    Thanks for any help/insight you can provide.

  • The query/sp you are having performance issues with, is it trying to select specific rows? Is there over 15-20% fragmentation on the CS index when these performance issues occur?

    Niko has some good scripts for fragmentation. The one thing I notice is when a CS is fragmented even 10%, we get significantly worse performance than a b-tree index. We have nightly maintenance jobs which run and rebuild or reorg the column store index depending on the fragmentation level.
    http://www.nikoport.com/2014/07/29/clustered-columnstore-indexes-part-36-maintenance-solutions-for-columnstore/

    If the query in question can be pregrouped via an index view that will help significantly with performance. Keep in mind though insert/updates/deletes will take a hit. Also by converting to index view, you will create a b-tree index. In cases where you only need specific rows, this will be very fast and can seek to the index.

    http://www.nikoport.com/2016/10/10/columnstore-indexes-part-87-indexed-views/

    I am thinking one of Niko's articles will help you.

  • Hi Brad, 
    No, the query isn't trying to select specific rows; it's aggregating data with some GROUP BY logic - it's a fact table in a data warehouse with 35M rows, and the columnstore works quite nicely for our purposes.   A little more background on this specific scenario: we currently have the fact table partitioned on column A, and we're thinking of changing the partition key to a different column to accommodate a future business requirement.  I've been testing the 2 partition schemes in an isolated development environment to see how query performance is impacted.  For these tests, there is no fragmentation or anything like that - the CS index is built fresh, so no delta store and no deletes.

    One of the test results was puzzling to me - I thought I would see a performance increase with our new partition key (call this P2) due to partition elimination, but it actually performs worse than the original partition setup (call this P1).  I do see partition elimination, so that is behaving as expected.  Also, the Query plans are very similar.  However, I noticed that the LOB reads are much higher with P2 than they were with P1.  It turns out that P2 has many more segments being read than P1 did.  I looked into why that was, and noticed that P2 has a lot of segments that are trimmed in each partition.  And I'm not sure why.  I think this is the cause for the increased query time, but not sure.

    Hope that helps to clarify.  Yes, I reference Niko's blog quite a bit as I work through CS issues - his work is invaluable!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply