Blog Post

Columnstore Indexes – part 102 (“CCI with Secondary Rowstore Indexes on SQL 2014”)

,

Continuation from the previous 101 parts, the whole series can be found at http://www.nikoport.com/columnstore/.

In SQL Server 2014 we have received a possibility to update the Columnstore Index directly with an addition of the Clustered Columnstore Index. This addition had some important limitations, between which were the impossibility of adding secondary Rowstore Indexes (making the Clustered Columnstore Index the only index on the table) and the lack of the unique constraints (

Meet the good old and absolutely underused friend – the Indexed Views, which in fact (and for some cost) will provide you with the above mentioned features.

If you absolutely need those features, follow this article to find this absolutely trivial way.

Using the good old free database ContosoRetilDW, download the backup and restore it from the C:\Install\:

USE [master]
 
alter database ContosoRetailDW
    set SINGLE_USER WITH ROLLBACK IMMEDIATE;
 
RESTORE DATABASE [ContosoRetailDW] 
    FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
        MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', 
        MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',  
        NOUNLOAD,  STATS = 5;
 
alter database ContosoRetailDW
    set MULTI_USER;
GO
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 120
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB )
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB )
GO

Let’s convert the FactOnlineSales table to the Clustered Columnstore one, by dropping the primary key and all the foreign keys:

alter table dbo.[FactOnlineSales] DROP CONSTRAINT PK_FactOnlineSales_SalesKey;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCurrency;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCustomer;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimDate;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimProduct;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimPromotion;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimStore;
create clustered columnstore Index PK_FactOnlineSales
on dbo.FactOnlineSales;

Here is query that I would like to optimise today, which is a rather basic SalesAmount aggregation on the daily basis for some certain orders, where we try to sort the data based on the Total Sales Amount per Day:

SET STATISTICS TIME, IO ON
select st.CalendarDayOfWeekLabel, SUM(SalesAmount) as TotalSales
from dbo.FactOnlineSales v 
inner join dbo.DimDate st
on v.DateKey = st.DateKey
where v.SalesOrderNumber >= N'200912282CS780'
GROUP BY st.CalendarDayOfWeekLabel
ORDER BY TotalSales desc;

This query takes almost 330 ms on the average to run on my virtual machine, at the same burning 1281 ms of the CPU time, doing over incredible 36.000 LOB logical reads!

Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 36367, lob physical reads 0, lob read-ahead reads 0.
Table 'DimDate'. Scan count 5, logical reads 334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 1281 ms,  elapsed time = 330 ms.

Here is the execution plan for this query:

If you run the query and look at the actual execution plan, you will think about adding a nonclustered index:

USE [ContosoRetailDW]
GO
CREATE NONCLUSTERED INDEX IX_MyMissingIndex
ON [dbo].[FactOnlineSales] ([DateKey])
INCLUDE ([SalesAmount])
WITH (DATA_COMPRESSION = PAGE);

this operation will naturally fail, as expected:

Msg 35303, Level 16, State 1, Line 3
CREATE INDEX statement failed because a nonclustered index cannot be created on a table that has a clustered columnstore index. Consider replacing the clustered columnstore index with a nonclustered columnstore index.

We can solve and work around this problem if we build an indexed view, and we shall use it: ??

create view dbo.vContosoTest WITH SCHEMABINDING  
AS
select OnlineSalesKey, DateKey, SalesAmount
from dbo.FactOnlineSales;
GO

We shall need a unique clustered index to turn this view an indexed one:

create unique clustered index pk_vContosoTest
on dbo.vContosoTest (OnlineSalesKey)
WITH (DATA_COMPRESSION = PAGE);

And let us add an additional nonclustered rowstore index, to make this view performing great:

create nonclustered index cci_vContosoTest
on dbo.vContosoTest (SalesOrderNumber,DateKey)
Include(SalesAmount)
WITH (DATA_COMPRESSION = PAGE);

Let us re-execute the query, this time against our indexed view to see if the query is performing well or not:

set statistics time, io on
select st.CalendarDayOfWeekLabel, SUM(SalesAmount) as TotalSales
from dbo.vFactOnlineSales v 
inner join dbo.DimDate st
on v.DateKey = st.DateKey
where v.SalesOrderNumber >= N'200912282CS780'
GROUP BY st.CalendarDayOfWeekLabel
ORDER BY TotalSales desc;

This time the results are returned with a different speed – it took just 23ms of the total execution time, while spending just 31ms of the CPUT time! From the read accesses we are talking here about less than 170 logical reads. Compare these results with the original ones with 330 ms of the total execution (10 times improvement), and over 36000 LOB logical reads (over 200 times improvement).

(4 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DimDate'. Scan count 1, logical reads 81, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'vFactOnlineSales'. Scan count 1, logical reads 85, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 23 ms.

Here is the actual execution plan that was produced for this query:

Here we have a beautiful and a simple execution plan, which delivers what we need – great performance with a relative low overall cost. And in this way we enjoy the possibility to get the best out of the 2 worlds – Columnstore & Rowstore.

Should we need to run a similar query but agains the whole dataset, it will be redirected to our Columnstore Index which will deliver great performance:

select st.CalendarDayOfWeekLabel, SUM(SalesAmount) as TotalSales
from dbo.vFactOnlineSales v 
inner join dbo.DimDate st
on v.DateKey = st.DateKey
GROUP BY st.CalendarDayOfWeekLabel
ORDER BY TotalSales desc;

Uniqueness

Another frequent requirements is the uniqueness for the Clustered Columnstore Indexes on SQL Server 2014, and this one can be achieved by simply adding unique indexes to our indexed view (and notice that there could be more than 1 default required unique index on it).

The operation that we have already executed to make an indexed view out of the simple view is the one which shall guarantee the uniqueness for the column OnlineSalesKey:

create unique clustered index pk_vContosoTest
on dbo.vContosoTest (OnlineSalesKey)
WITH (DATA_COMPRESSION = PAGE);

Should we try to insert a duplicate into our table FactOnlineSales, here by copying one of the rows into our test table FactOnlineSales:

SET IDENTITY_INSERT dbo.FactOnlineSales ON
insert into dbo.FactOnlineSales
(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate)
select top 1
OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate
from dbo.FactOnlineSales;
SET IDENTITY_INSERT dbo.FactOnlineSales OFF

The error message is very clear that it the uniqueness of the data is enforced and will be preserved by the unique index in our view:

Msg 2601, Level 14, State 1, Line 3
Cannot insert duplicate key row in object 'dbo.vFactOnlineSales' with unique index 'pk_vFactOnlineSales'. 
The duplicate key value is (20362532).
The statement has been terminated.

Now, if only we could build primary keys/foreign keys to the views … Ok, Ok – I know, this is a different topic …

Final Thoughts

The only thing to add here is that one should evaluate costs very carefully, but if you are dealing with reasonably (not huge) tables in the low million rows, this might be a great solution for your unique constraints and secondary indexes.

to be continued with Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014”)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating