SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 

Clustered Columnstore Indexes – part 39 (“Memory in Action”)

Continuation from the previous 38 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

In this post I shall try to discover more interesting information about Columnstore Object Pool behaviour and it’s relations to the total size of the available memory as well as the relationship to the Buffer Pool.

For the start, I will need a freshly restored copy of ContosoRetailDW with every major blocker (PKs, FKs) dropped.

USE [master]

alter database ContosoRetailDW
	set SINGLE_USER WITH ROLLBACK IMMEDIATE;

RESTORE DATABASE [ContosoRetailDW] 
	FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
		MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', 
		MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',  
		NOUNLOAD,  STATS = 5;

alter database ContosoRetailDW
	set MULTI_USER;
GO
use ContosoRetailDW;

alter table dbo.[FactOnlineSales] DROP CONSTRAINT PK_FactOnlineSales_SalesKey
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT PK_FactStrategyPlan_StrategyPlanKey
alter table dbo.[FactSales] DROP CONSTRAINT PK_FactSales_SalesKey
alter table dbo.[FactInventory] DROP CONSTRAINT PK_FactInventory_InventoryKey
alter table dbo.[FactSalesQuota] DROP CONSTRAINT PK_FactSalesQuota_SalesQuotaKey

alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCurrency
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCustomer
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimDate
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimProduct
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimPromotion
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimStore
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimAccount
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimCurrency
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimDate
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimEntity
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimProductCategory
alter table dbo.[FactStrategyPlan] DROP CONSTRAINT FK_FactStrategyPlan_DimScenario
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimChannel
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimCurrency
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimDate
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimProduct
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimPromotion
alter table dbo.[FactSales] DROP CONSTRAINT FK_FactSales_DimStore
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimCurrency
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimDate
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimProduct
alter table dbo.[FactInventory] DROP CONSTRAINT FK_FactInventory_DimStore
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimChannel
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimCurrency
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimDate
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimProduct
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimScenario
alter table dbo.[FactSalesQuota] DROP CONSTRAINT FK_FactSalesQuota_DimStore;
GO

Now let us create Clustered Columnstore Indexes on our 5 Fact Tables:

Create Clustered Columnstore index PK_FactOnlineSales
	on dbo.FactOnlineSales;

Create Clustered Columnstore index PK_FactStrategyPlan
	on dbo.FactStrategyPlan;

Create Clustered Columnstore index PK_FactInventory
	on dbo.FactInventory;

Create Clustered Columnstore index PK_FactSalesQuota
	on dbo.FactSalesQuota;

Create Clustered Columnstore index PK_FactSales
	on dbo.FactSales;

Columnstore Object Pool & Max Server Memory:

Since SQL Server 2012 we have a better control over the total amount of memory available to SQL Server, because ‘max server memory (KB)’ option is including all memory structures and not just the Buffer Pool as for SQL Server 2008 R2 and previous versions.

For the start, let’s try to see the maximum size for the Columnstore Object Pool depending on the amount of memory available to SQL Server.
We can set the max server memory to 512, 768 and 1024 MB and see how it will function: we also need to drop all the occupied Buffers as well as the Proc Cache.

exec sp_configure 'max server memory (MB)', '512';
reconfigure;

dbcc DropCleanBuffers;
dbcc FreeProcCache;

I will read export the data from my 5 Fact tables into the same file C:\Install\ExportData.csv, but in the process I will make SQL Server read the whole content of my tables with Clustered Columnstore Index:

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactOnlineSales" queryout "C:\Install\ExportData.csv" -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactSalesQuota" queryout "C:\Install\ExportData.csv" -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactInventory" queryout "C:\Install\ExportData.csv" -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactStrategyPlan" queryout "C:\Install\ExportData.csv" -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactSales" queryout "C:\Install\ExportData.csv" -T -c -t,';

To monitor the results I will query sys.dm_os_memory_cache_counters DMV:

select name, type, pages_kb, pages_in_use_kb, entries_count, entries_in_use_count
	from sys.dm_os_memory_cache_counters 
	where type = 'CACHESTORE_COLUMNSTOREOBJECTPOOL';

Even though the total amount of space, occupied by the Columnstore Segments, Dictionaries & whatever else is there that we do not know yet :), is around 400 MB, we can’t reach that number and for the 512 MB of RAM configured the maximum number of space occupied that I have managed to achieve was 147528 KB.
This corresponds around 30% of the total amount of memory, and it seems to be quite logical that SQL Server is reserving enough space for the Buffer Pool, CLR, Plan Cache, etc – we can’t spend everything on Columnstore Indexes even though we might like it ??

What if I increase the max server memory to 768 MB ?
This time my result was 279760 KB – seems that we have more than doubled the available space for the Columnstore Object Pool just by growing the total amount of memory in 50%.

When I increase the max server memory to 1GB running the scripts above gets me 433472 KB, which corresponds to to over 40% of the total and actually increases the maximum available space even further than the first increase from 512MB to 768MB did.

I was wondering if this was the real limit, and so I decided to copy FactOnlineSales data onto itself (12.6 Million Rows * 2), in order to reach the limit of the Columnstore Object Pool for 1GB of configured max server memory:

set identity_insert dbo.FactOnlineSales on ;

insert into dbo.FactOnlineSales
	(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate)
	select OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate
		from dbo.FactOnlineSales;

set identity_insert dbo.FactOnlineSales off;

I shall finish the load process with reorganize statement that shall force to close any open Delta-Stores:

alter index PK_FactOnlineSales on dbo.FactOnlineSales
	Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON);

I ran the exporting script again, hoping to get even bigger numbers, above 50%, but to my surprise independently from the number of attempts I was never able to get that far:

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactOnlineSales" queryout "C:\Install\ExportData.csv" -a 16000 -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactSalesQuota" queryout "C:\Install\ExportData.csv" -a 16000 -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactInventory" queryout "C:\Install\ExportData.csv" -a 16000 -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactStrategyPlan" queryout "C:\Install\ExportData.csv" -a 16000 -T -c -t,';

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactSales" queryout "C:\Install\ExportData.csv" -a 16000 -T -c -t,';

Even more interesting was the fact that when I monitor sys.dm_os_memory_broker_clerks DMV I can clearly see the Buffer Pool growing up and then decreasing while reading information from Columnstore Indexes.

select *
	from sys.dm_os_memory_broker_clerks;

In the end I have managed to reach 477184 KB, which is almost about at my target, but it was not on the first attempt and they key here was absolute silence with not even a single other query touching Buffer Pool. The final result after the end of the script execution was ~470 MB for Columnstore Object Pool and 94 MB for the Buffer Pool.

Columnstore Object Pool Max Size 1024I have joined the results from 3 configuration in order to get a better visualisation and analysis.

These tests are fine for a blog post, but in the real life none runs a single query against a database (especially based on a single table), and the tables most probably will represent a mixture from Columnstore and Rowstore storage types, but they show how far Columnstore Object Pool can potentially develop – way over some 20-30% but of course it should never reach over 80% of the total memory.

Memory Pressure (Columnstore Object vs Buffer Pool)

I restored the original ContosoRetailDW database before proceeding, setting the total memory to 8GB before the operation and resetting it back to 512MB after

exec sp_configure 'max server memory (MB)', '512';
reconfigure;

Previous test was very synthetic, because it was touching Columnstore Object Pool almost exclusively, without any fight for the resources between them.
Now it is the time to show a little bit more complex scenario –
I will start by issuing a complex query that while supposed to be using Columnstore Object Pool heavily, in practice works much more with a Buffer Pool, because of all the Table Spool operations and consecutive Row Execution Modes on the Hash Match (Aggregate) operators.

dbcc freeproccache
dbcc DropCleanBuffers;

select count( distinct StoreKey ) as 'Distinct Stores'
	, count( distinct CustomerKey ) as 'Distinct Customers'
	, count( distinct CurrencyKey ) as 'Distinct Currencies'
	from dbo.FactOnlineSales sales
		inner join dbo.DimPromotion prom
			on sales.PromotionKey = prom.PromotionKey
	where prom.DiscountPercent = 0
	group by sales.LoadDate;

It takes more than 20 seconds on my test VM and it’s complexity goes to some 11739271 logical reads from the Worktable. (Notice that I have restored the ContosoRetailDW)

Let us check out the undocumented DMV sys.dm_os_memory_broker_clerks, it will give us an update on the current status of Buffer Pool and Columnstore Object Pool:

select *
	from sys.dm_os_memory_broker_clerks;

dm_os_memory_broker_clerks_query_1Well, well, well – we have so much interesting information here:
– first of all we see just 6.7 MB of space occupied by Columnstore Object Pool and a very fat part with around 150MB being occupied by a Buffer Pool.
– secondly, internal_freed_kb columns shows us that 1.574.368 KB were processed and freed from the Columnstore Object Pool.

Ok, and what about running a simple query that should increase Columnstore Object Pool greatly to help it reach the same size as Buffer Pool potentially (FactOnlineSales occupies more than 150MB):

set statistics io on

declare
	@OnlineSalesKey int,
	@DateKey datetime,
	@StoreKey int,
	@ProductKey int,
	@PromotionKey int,
	@CurrencyKey int,
	@CustomerKey int,
	@SalesOrderNumber nvarchar(20),
	@SalesOrderLineNumber int,
	@SalesQuantity int,
	@SalesAmount money,
	@ReturnQuantity int,
	@ReturnAmount money ,
	@DiscountQuantity int ,
	@DiscountAmount money ,
	@TotalCost money,
	@UnitCost money ,
	@UnitPrice money ,
	@ETLLoadID int ,
	@LoadDate datetime ,
	@UpdateDate datetime;



select @OnlineSalesKey = OnlineSalesKey
	, @DateKey = DateKey, @StoreKey = StoreKey, @ProductKey = ProductKey, @PromotionKey = PromotionKey, 
	@CurrencyKey = CurrencyKey, @CustomerKey = CustomerKey, @SalesOrderNumber = SalesOrderNumber, 
	@SalesOrderLineNumber = SalesOrderLineNumber, @SalesQuantity = SalesQuantity, @SalesAmount = SalesAmount, 
	@ReturnQuantity = ReturnQuantity, @ReturnAmount = ReturnAmount, @DiscountQuantity = DiscountQuantity, 
	@DiscountAmount = DiscountAmount, @TotalCost = TotalCost, @UnitCost = UnitCost, @UnitPrice = UnitPrice, 
	@ETLLoadID = ETLLoadID, @LoadDate = LoadDate, @UpdateDate = UpdateDate
	from dbo.FactOnlineSales;

dm_os_memory_broker_clerks_query_2 Surprise! We have increased our Columnstore Object Pool only by 10 MB apparently, while Buffer Pool has lost just 3 MB. This is not what I expected. There are more questions here than any possible answers:
– We have definitely read more than 10 MB, why SQL Server did not fill out Columnstore Object Pool with them ? This looks like a priority issues between Buffer Pool and Columnstore Object Pool, where Columnstore Object Pool is clearly the loosing side.

Ok, maybe my previous query was something that SQL Server did not like, let’s export the whole table into a CSV:

EXEC xp_cmdshell 'bcp "SELECT * FROM ContosoRetailDW.dbo.FactOnlineSales" queryout "C:\Install\ExportData.csv" -T -c -t,';

dm_os_memory_broker_clerks_query_3This time we have jumped to 44MB with Columnstore Object Pool while loosing 2MB from Buffer Pool – the only idea that I get from these observations is that Buffer Pool has a higher priority than Columnstore Object Pool and lives in its own space, which is quite apart from the Columnstore Object Pool.

With this all said, let’s try to expand Columnstore Object Pool, by executing the data reading from Columnstore table in a cycle:

declare
	@OnlineSalesKey int,
	@DateKey datetime,
	@StoreKey int,
	@ProductKey int,
	@PromotionKey int,
	@CurrencyKey int,
	@CustomerKey int,
	@SalesOrderNumber nvarchar(20),
	@SalesOrderLineNumber int,
	@SalesQuantity int,
	@SalesAmount money,
	@ReturnQuantity int,
	@ReturnAmount money ,
	@DiscountQuantity int ,
	@DiscountAmount money ,
	@TotalCost money,
	@UnitCost money ,
	@UnitPrice money ,
	@ETLLoadID int ,
	@LoadDate datetime ,
	@UpdateDate datetime;

declare @i as int = 0;

while @i < 10
begin

select @OnlineSalesKey = OnlineSalesKey
	, @DateKey = DateKey, @StoreKey = StoreKey, @ProductKey = ProductKey, @PromotionKey = PromotionKey, 
	@CurrencyKey = CurrencyKey, @CustomerKey = CustomerKey, @SalesOrderNumber = SalesOrderNumber, 
	@SalesOrderLineNumber = SalesOrderLineNumber, @SalesQuantity = SalesQuantity, @SalesAmount = SalesAmount, 
	@ReturnQuantity = ReturnQuantity, @ReturnAmount = ReturnAmount, @DiscountQuantity = DiscountQuantity, 
	@DiscountAmount = DiscountAmount, @TotalCost = TotalCost, @UnitCost = UnitCost, @UnitPrice = UnitPrice, 
	@ETLLoadID = ETLLoadID, @LoadDate = LoadDate, @UpdateDate = UpdateDate
	from dbo.FactOnlineSales;

	set @i += 1;
end

dm_os_memory_broker_clerks_query_4This does not seem to help a lot – Buffer Pool does not give it’s memory back easily, even though there are no new queries requesting it (besides system DMVs request and all the stuff that SSMS issues). Columnstore Object Pool has increased to 52 MB and Buffer Pool went down to 100MB.

What if I issue a single query that should increase Buffer Pool size:

select *
	from dbo.DimProduct

dm_os_memory_broker_clerks_query_5We have increased our Buffer Pool by ~3MB and nothing else happened.

Let’s force some more information into Buffer Pool ??

select a1.*
	into dbo.TestProduct
	from dbo.DimProduct a1
	cross join dbo.DimProduct a2

dm_os_memory_broker_clerks_query_6In my opinion there is no doubt that Buffer Pool consumes Columnstore Object Pool space with much ease, what is not happening in other direction.

4096 MB:
Running the same workloads with 4096 MB RAM produces different results, we definitely have more space to expand and here until some certain point Buffer Pool gives space to Columnstore Object Pool without a lot of hustle.
Around 400 MB things start to get more complex and Buffer Pool becomes one greedy creature and starts to slow down memory

I wonder, is there some internal threshold set on ~10% that Columnstore Object Pool can easily occupy before it starts some serious fight with Buffer Pool for every consecutive MB ?
In the case of 512 MB RAM configuration my troubles were starting at 50-60 MB, which corresponds this threshold.

Thoughts

It makes a lot of sense to implement higher priority for Buffer Pool, but I have some concern on the minimum space for Columnstore Object Pool and how in practice it will function – nobody wants to read everything from the disk, especially if the technology is being labelled as a “In-Memory” one.

While we are able to reach almost 50% of the space with Columnstore Object Pool it seems that the there are some serious limitations once we have some data in the Buffer Pool (ha! we always have data there) and in practice I believe that a given SQL Server instance with Columnstore Indexes should have between 10% and 20% of the total memory occupied.

Having a lot of memory in these conditions might be really essential for a good performance and as Geoff Patterson suggested in the previous article comment having better control over it might be a very good idea.

to be continued with Clustered Columnstore Indexes – part 40 (“Compression Algorithms”)

Niko's blog

Niko Neugebauer is Microsoft Data Platform professional. A SQL Server MVP, he is very passionate about Data Platform (with a focus on Columnstore) and Community. He is a founder and a leader of the Portuguese SQL Server User Group, SQLPort and a co-founder of 2 more Data Platform User Groups in his home country. His passion for community led him to organise the first SQLSaturday outside of North America – #78 Portuga. Niko presented at a number of different conferences, including PASS Summit, SQLRallys, SQLBits, and SQLSaturdays around the world.

Comments

Leave a comment on the original post [www.nikoport.com, opens in a new window]

Loading comments...