Clustered Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”)

Continuation from the previous 35 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

At the 12th edition of SQLBits, around a week ago, I had a short but interesting conversation with Ola Hallengren about Columnstore Indexes, their sizes and their importance for rebuild processes.

I asked Ola to include some Columnstore-specific functionality into his Maintenance Solution, since there is absolutely no reason to do the same type of maintenance for Columnstore Indexes as is being done for RowStore.

Consider the 2 principal functions for table maintenance in SQL Server, Rebuild & Reorganize:

Reorganize

– for RowStore we are basically shifting the positions of data pages (8KB) in order to slightly lower the fragmentation (lack of sorted order) caused by inserts & updates.

– for Columnstore we are invoking the Tuple Mover, which in this case has 2 known functionalities:
1) the default invocation makes it look for closed Delta-Stores and compress them, turning them into Segments;
2) invocation with a hint (COMPRESS_ALL_ROW_GROUPS = ON) makes the Tuple Mover additionally look for any open Delta-Stores, close them immediately and then compress them into Segments.
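As a sketch, the 2 invocations look like this (assuming the Clustered Columnstore Index PK_FactOnlineSales that we shall create later in this post):

```sql
-- Default invocation: the Tuple Mover compresses only the closed Delta-Stores
alter index PK_FactOnlineSales on dbo.FactOnlineSales
	Reorganize;

-- Hinted invocation: additionally closes any open Delta-Stores and compresses them
alter index PK_FactOnlineSales on dbo.FactOnlineSales
	Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON);
```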

Rebuild

– for RowStore we use it to restore the nice ordering of our Indexes, to reclaim unused space, etc. When rebuilding, all of our data gets sorted for our B-trees and compressed, if we are using one of the available compression algorithms (ROW, PAGE).

– for Columnstore we use this function to reshuffle all the data between the existing Row Groups, so that the compression algorithm can make its best guesses at achieving the best possible compression. The ordering of data is neither guaranteed nor intended.
We are also reclaiming some space, should we have any rows deleted or updated (marked in our Deleted Bitmap).
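For the sake of completeness, the corresponding Rebuild invocations (again using the PK_FactOnlineSales index from the script further below):

```sql
-- Rebuild the whole Clustered Columnstore Index
alter index PK_FactOnlineSales on dbo.FactOnlineSales
	Rebuild;

-- Or rebuild just a single partition of the table
alter table dbo.FactOnlineSales
	Rebuild Partition = 1;
```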

When to Rebuild and when to Reorganize Columnstore

The big question is how to decide when it is the right time to execute the maintenance and rebuild the Columnstore.
What kind of metrics should be used in order to make this decision?
In my opinion, the percentage of Deleted Rows relative to the total number of rows in the Columnstore might be a relevant metric to consider.

There are a number of details here – for example, what if we have 2 million rows spread over 2 Segments and at the same time 1 million rows marked in the Deleted Bitmap?

A very important issue in this case is whether the Deleted Rows belong to just one Row Group or to multiple ones, because Segment Elimination should simply skip a completely deleted Row Group without touching a single bit on disk – the Query Optimiser should be able to determine this and simply ignore it. Right now this is not the case, as I blogged in part 34, Deleted Segment Elimination.

We can still decide to rebuild our table even if 50% of each of our Segments has been deleted, but a single metric showing us just 1 number might not tell the whole story behind it.

The Catch

Let us try to list the possible metrics in play:
– Number of Deleted Segments (Segments where all respective data has been marked as deleted)
– Segment Fragmentation Percentage in a Partition/Table

But first we need to define fragmentation for Columnstore Indexes –
in my opinion, since there is no natural ordering for the data inside compressed Segments, any rows marked as Deleted are what introduces Columnstore fragmentation.

Notice that I am deliberately considering only the Segments, because in most cases they represent the vast majority of the Columnstore table structure. While Delta-Stores use B-trees, they are also unsorted, because internally they are represented as HEAPs; plus any data inside the Delta-Stores can easily be deleted from them without affecting the Segments at all.

In SQL Server 2014 the default behaviour of the Tuple Mover is to run automatically every 5 minutes, and unless you disable it with Trace Flag 634, as described in my Trace Flags & Query Optimiser Rules blog post, you are already absolutely fine running Reorganize on a regular schedule.
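Should you really need to disable it (which I am not recommending for the general case), the Trace Flag can be set instance-wide like this:

```sql
-- Disable the automatic Tuple Mover globally (use with great care)
DBCC TRACEON (634, -1);

-- And re-enable it again
DBCC TRACEOFF (634, -1);
```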

And so we can define the following fragmentation types:
1) Segment Fragmentation – the percentage of deleted rows relative to the total number of rows of one particular compressed Segment;
2) Size on Disk Fragmentation – the percentage of the Columnstore index size, since every Segment might be radically different from the others in terms of size.
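The first of these definitions can be checked directly, Row Group by Row Group, with a simple query over sys.column_store_row_groups:

```sql
-- Segment Fragmentation for every individual compressed Row Group
select object_name(rg.object_id) as TableName,
		rg.partition_number as Partition,
		rg.row_group_id,
		rg.total_rows,
		rg.deleted_rows,
		cast( (rg.deleted_rows * 1. / rg.total_rows) * 100 as Decimal(5,2)) as 'Fragmentation (Percentage)'
	from sys.column_store_row_groups rg
	where rg.state = 3 -- Compressed Row Groups only
	order by TableName, rg.partition_number, rg.row_group_id;
```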

An important issue here is the decision about including or excluding the completely deleted Segments from the fragmentation statistics; for the current SQL Server 2014 (RTM-CU2) I consider it a complementary metric, which is nevertheless an integral part of the total equation.

Another important note goes to the fact that even though we are dealing with different Segments for each of the columns, since every row is spread between all Segments, their destiny of being active or being deleted is perfectly shared between all of the columns.

The Decision Flow

Thousands of blog posts, whitepapers & books have been written on when you should Reorganize and when you should Rebuild a RowStore table; to my knowledge, the typical suggestion is to effectuate a Reorganize at around 10% fragmentation, while reaching 30% of fragmentation means it is time to consider rebuilding the whole Table/Partition.

In real life, everyone has their own numbers, which vary depending on the database(s) and server(s) in question.

My default suggestion is to set up your table/partition for a Rebuild operation when:
– 30% of total fragmentation for the table/partition in question has been reached;
– 10% of completely deleted Segments are found in our Columnstore Table or Partition.
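These 2 thresholds can be expressed directly as a HAVING clause, producing the list of tables/partitions that are candidates for a Rebuild (a sketch – adjust the default percentages to your own environment):

```sql
-- Tables/partitions crossing 30% total fragmentation or 10% deleted Segments
select object_name(rg.object_id) as TableName,
		rg.partition_number as Partition
	from sys.column_store_row_groups rg
	where rg.state = 3 -- Compressed Row Groups only
	group by rg.object_id, rg.partition_number
	having Avg( (rg.deleted_rows * 1. / rg.total_rows) * 100 ) >= 30.
		or (sum (case rg.deleted_rows when rg.total_rows then 1 else 0 end) * 1. / count(*)) * 100 >= 10.;
```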

This recommendation serves, of course, only until Microsoft implements a maintenance solution for removing and merging Columnstore Segments – should that happen, the game would change & improve greatly.

Please vote up on the Columnstore Segments Maintenance – Remove & Merge Connect Item proposing this solution.

So let us try to create some queries which will allow us to identify the current situation of my database:

I have used a copy of ContosoRetailDW database, where I have created Clustered Columnstore Indexes for 4 of the tables: FactOnlineSales, FactSales, FactInventory and FactSalesQuota.

alter table dbo.[FactOnlineSales] DROP CONSTRAINT PK_FactOnlineSales_SalesKey
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT PK_FactStrategyPlan_StrategyPlanKey
alter table dbo.[FactSales] DROP CONSTRAINT PK_FactSales_SalesKey
alter table dbo.[FactInventory] DROP CONSTRAINT PK_FactInventory_InventoryKey
alter table dbo.[FactSalesQuota] DROP CONSTRAINT PK_FactSalesQuota_SalesQuotaKey

alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCurrency
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCustomer
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimDate
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimProduct
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimPromotion
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimStore
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimAccount
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimCurrency
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimDate
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimEntity
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimProductCategory
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimScenario
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimChannel
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimCurrency
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimDate
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimProduct
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimPromotion
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimStore
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimCurrency
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimDate
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimProduct
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimStore
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimChannel
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimCurrency
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimDate
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimProduct
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimScenario
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimStore;
GO

Create Clustered Columnstore Index PK_FactOnlineSales
    on dbo.FactOnlineSales;

Create Clustered Columnstore index PK_FactSales
	on dbo.FactSales;

Create Clustered Columnstore index PK_FactInventory
	on dbo.FactInventory;

Create Clustered Columnstore index PK_FactSalesQuota
	on dbo.FactSalesQuota;

I have then deleted some random number of rows across the different tables, and will now try to identify whether I should rebuild them or not (updated on 27th of January 2015, after a reminder from David Barbarin):

SELECT object_name(p.object_id) as TableName,
		p.partition_number as Partition,
		cast( Avg( (rg.deleted_rows * 1. / rg.total_rows) * 100 ) as Decimal(5,2)) as 'Total Fragmentation (Percentage)',
		sum (case rg.deleted_rows when rg.total_rows then 1 else 0 end ) as 'Deleted Segments Count',
		cast( (sum (case rg.deleted_rows when rg.total_rows then 1 else 0 end ) * 1. / count(*)) * 100 as Decimal(5,2)) as 'DeletedSegments (Percentage)'
	FROM sys.partitions AS p 
		INNER JOIN sys.column_store_row_groups rg
			ON p.object_id = rg.object_id 
			AND p.index_id = rg.index_id
			AND p.partition_number = rg.partition_number -- needed to avoid duplicates on partitioned tables
	where rg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone) 
	group by p.object_id, p.partition_number
	order by object_name(p.object_id);

As you can see, my 4 Tables have different fragmentation percentages, each spread over just 1 partition – 2 of my tables have a fragmentation above 10%, but only for 1 Table do I have Deleted Segments, while there is another table with a lower fragmentation percentage but already 1 Deleted Segment.
Notice that the percentage of deleted Segments can never be higher than the total fragmentation of the Table/Partition.

For me, the FactInventory table should be rebuilt even though it has not reached the total fragmentation threshold of 30%, because it has reached over 20% of completely Deleted Segments.
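Rebuilding it is then a single statement (the index name follows the creation script above):

```sql
alter index PK_FactInventory on dbo.FactInventory
	Rebuild;
```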

Fragmentation based on the size of the data

I wish there were a way to calculate the total size of the deleted data, but the Deleted Bitmap is quite a unicorn – as far as is known, there is no way to determine its exact size, and so we can only estimate the size of the deleted data, which is compressed in a number of different ways, making it impossible to get an exact number.

Also, the calculation of the total size would have to be spread across all of the columns, and in a DWH there might be hundreds of them.

I might revisit this blog post in the future with a solution, but at the moment I do not see a real necessity of calculating the value in bits & bytes.

to be continued with Clustered Columnstore Indexes – part 37 (“Deleted Bitmap & Delta-Store Sizes”)

Niko's blog

Niko Neugebauer is a Microsoft Data Platform professional. A SQL Server MVP, he is very passionate about the Data Platform (with a focus on Columnstore) and the Community. He is a founder and leader of the Portuguese SQL Server User Group, SQLPort, and a co-founder of 2 more Data Platform User Groups in his home country. His passion for community led him to organise the first SQLSaturday outside of North America – #78 Portugal. Niko has presented at a number of different conferences, including PASS Summit, SQLRally, SQLBits, and SQLSaturdays around the world.
