Blog Post

Exam Prep 70-463: Columnstore Indexes and Partitioning

,

This post is part of a series on this blog that will help me, and hopefully you, pass exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012.  So far, we’ve covered:

Today we’re continuing the discussion on columnstore indexes and how they can be implemented in a practical manner.  Typically columnstore indexes are reserved for the data warehouse environment, on tables that are read-mostly. Notice I didn’t say read-only. Obviously, at some point, you’re going to want to load more data into your data warehouse. But this is typically done via batch processing once a day or so. And to accommodate that, you can always drop your columnstore indexes, load the data, and rebuild the indexes. That’s assuming that all of that fits into your workflow. But what if it doesn’t?

Columnstore indexes and partitioning

It’s true that you can’t perform DML operations (insert, update, delete) on a table that contains a nonclustered columnstore index. You can, however, perform DDL operations. What that means is that you can use partitioning to effectively insert data into a table without dropping the columnstore index by using a little partition-switching magic. The basic concept is to load my new data into an empty table (I can use a bulk insert process here because the table is empty), then switch that table into an empty partition my partitioned fact table.

I’ll be partitioning the FactInternetSales table in the AdventureWorksDW2012 database. But first I need to create the partition function. Since my goal is to load data into what will become a single partition in my fact table, I need to make sure that my partitioning method allows for that. If I were just thinking about query performance, I might consider partitioning by sales territory. However, hopefully all of my territories will continue to make sales, so new data won’t fit into a single partition. A better option would be something like the order date, or, better yet, the order year. So that’s what we’ll use here. I’ll also create the partitioning scheme.

 USE AdventureWorksDW2012
GO
CREATE PARTITION FUNCTION PF_OrderYear (smallint)
as RANGE LEFT FOR VALUES (2001,2002,2003,2004,2005,2006,2007,2008,2009);
GO
CREATE PARTITION SCHEME PS_OrderYear
AS PARTITION PF_OrderYear
ALL TO ([PRIMARY]);
GO 

Next I’ll create a copy of the FactInternetSales table, called FactInternetSalesPart. FactInternetSalesPart will include a new column, OrderYear, which is what the table will be partitioned on. I will also create the foreign keys to the dimension tables.

 CREATE TABLE dbo.FactInternetSalesPart(
ProductKey int NOT NULL,
OrderDateKey int NOT NULL,
DueDateKey int NOT NULL,
ShipDateKey int NOT NULL,
CustomerKey int NOT NULL,
PromotionKey int NOT NULL,
CurrencyKey int NOT NULL,
SalesTerritoryKey int NOT NULL,
SalesOrderNumber nvarchar(20) NOT NULL,
SalesOrderLineNumber tinyint NOT NULL,
RevisionNumber tinyint NOT NULL,
OrderQuantity smallint NOT NULL,
UnitPrice money NOT NULL,
ExtendedAmount money NOT NULL,
UnitPriceDiscountPct float NOT NULL,
DiscountAmount float NOT NULL,
ProductStandardCost money NOT NULL,
TotalProductCost money NOT NULL,
SalesAmount money NOT NULL,
TaxAmt money NOT NULL,
Freight money NOT NULL,
CarrierTrackingNumber nvarchar(25) NULL,
CustomerPONumber nvarchar(25) NULL,
OrderDate datetime NULL,
DueDate datetime NULL,
ShipDate datetime NULL,
OrderYear smallint NOT NULL
 CONSTRAINT PK_FactInternetSalesPart_SalesOrderNumber_SalesOrderLineNumber PRIMARY KEY CLUSTERED
(
SalesOrderNumber ASC, SalesOrderLineNumber ASC, OrderYear ASC
)) ON PS_OrderYear(OrderYear)
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimCustomer] FOREIGN KEY([CustomerKey])
REFERENCES [dbo].[DimCustomer] ([CustomerKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO
ALTER TABLE [dbo].[FactInternetSalesPart]  WITH CHECK ADD  CONSTRAINT [FK_FactInternetSalesPart_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])

Finally, I’ll load data up to the end of 2007.

 INSERT INTO dbo.FactInternetSalesPart (ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate, OrderYear)
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate, datepart(yy,OrderDate)
  FROM dbo.FactInternetSales
  where datepart(yy,OrderDate) <= 2007
GO 

And create my columnstore index. Note that it’s partition aligned with the table.

CREATE COLUMNSTORE INDEX cidx_FactInternetSales
ON dbo.FactInternetSalesPart (ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate, OrderYear)
ON PS_OrderYear(OrderYear);
GO 

My data warehouse is now where we would normally find it. Historical data is loaded and my columnstore index is in place. Now, I’d like to load new data for 2008. The first thing I want to do is create a new, empty table with the same definition as my fact table. I’ll include the same columns, the same foreign key constraints, and the same compression method, if we were using compression. The only difference between this table and the original fact table is the check constraint I’m putting on the OrderYear column. This will ensure that all of the data I load into my new table will fit into a single partition of my fact table.

CREATE TABLE dbo.FactInternetSalesNew(
ProductKey int NOT NULL,
OrderDateKey int NOT NULL,
DueDateKey int NOT NULL,
ShipDateKey int NOT NULL,
CustomerKey int NOT NULL,
PromotionKey int NOT NULL,
CurrencyKey int NOT NULL,
SalesTerritoryKey int NOT NULL,
SalesOrderNumber nvarchar(20) NOT NULL,
SalesOrderLineNumber tinyint NOT NULL,
RevisionNumber tinyint NOT NULL,
OrderQuantity smallint NOT NULL,
UnitPrice money NOT NULL,
ExtendedAmount money NOT NULL,
UnitPriceDiscountPct float NOT NULL,
DiscountAmount float NOT NULL,
ProductStandardCost money NOT NULL,
TotalProductCost money NOT NULL,
SalesAmount money NOT NULL,
TaxAmt money NOT NULL,
Freight money NOT NULL,
CarrierTrackingNumber nvarchar(25) NULL,
CustomerPONumber nvarchar(25) NULL,
OrderDate datetime NULL,
DueDate datetime NULL,
ShipDate datetime NULL,
OrderYear smallint NOT NULL CHECK (OrderYear = 2008)
 CONSTRAINT PK_FactInternetSalesNew_SalesOrderNumber_SalesOrderLineNumber [PRIMARY] KEY CLUSTERED
(
SalesOrderNumber ASC, SalesOrderLineNumber ASC, OrderYear ASC
))
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimCurrency FOREIGN KEY(CurrencyKey)
REFERENCES dbo.DimCurrency (CurrencyKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimCustomer FOREIGN KEY(CustomerKey)
REFERENCES dbo.DimCustomer (CustomerKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimDate FOREIGN KEY(OrderDateKey)
REFERENCES dbo.DimDate (DateKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimDate1 FOREIGN KEY(DueDateKey)
REFERENCES dbo.DimDate (DateKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimDate2 FOREIGN KEY(ShipDateKey)
REFERENCES dbo.DimDate (DateKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimProduct FOREIGN KEY(ProductKey)
REFERENCES dbo.DimProduct (ProductKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimPromotion FOREIGN KEY(PromotionKey)
REFERENCES dbo.DimPromotion (PromotionKey)
GO
ALTER TABLE dbo.FactInternetSalesNew  WITH CHECK ADD  CONSTRAINT FK_FactInternetSalesNew_DimSalesTerritory FOREIGN KEY(SalesTerritoryKey)
REFERENCES dbo.DimSalesTerritory (SalesTerritoryKey)
 

Now I can use my data warehouse load process to load 2008’s data into FactInternetSalesNew. And once the data is loaded, I create a columnstore index, just like my primary table.

INSERT INTO dbo.FactInternetSalesNew (ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate, OrderYear)
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate, datepart(yy,OrderDate)
  FROM dbo.FactInternetSales
  where datepart(yy,OrderDate) = 2008
GO
CREATE COLUMNSTORE INDEX cidx_FactInternetSales
ON dbo.FactInternetSalesNew (ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey,
CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice,
ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt,
Freight, CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate, OrderYear)
;
GO 

I’m now ready to switch the data from FactInternetSalesNew into an empty partition of FactInternetSalesPart. Querying the row counts, I see no data after the 7th partition in FactInternetSalesPart, and some data in my FactInternetSalesNew table.

select $PARTITION.PF_OrderYear(OrderYear) as PartitionNumber, count(*) as "RowCount"
FROM dbo.FactInternetSalesPart
GROUP BY $partition.PF_OrderYear(OrderYear)
SELECT COUNT(*) as "RowCount" FROM dbo.FactInternetSalesNew
Partitioning - before switch

Partitioning – before switch

When I switch the partition and check again, I’ll see the data is now in partition 8 of my fact table and my new table is empty.

ALTER TABLE dbo.FactInternetSalesNew
SWITCH TO dbo.FactInternetSalesPart PARTITION 8;
GO
select $PARTITION.PF_OrderYear(OrderYear) as PartitionNumber, count(*) as "RowCount"
FROM dbo.FactInternetSalesPart
GROUP BY $partition.PF_OrderYear(OrderYear)
SELECT COUNT(*) as "RowCount" FROM dbo.FactInternetSalesNew
Partitioning - after switch

Partitioning – after switch

At this point, all I need to do is drop the columnstore index on my FactInternetSalesNew table and modify the check constraint and I’m ready for my next load.

DROP INDEX cidx_FactInternetSales
ON dbo.FactInternetSalesNew
ALTER TABLE dbo.FactInternetSalesNew DROP CONSTRAINT CK__FactInter__Order__6DCC4D03
GO
ALTER TABLE dbo.FactInternetSalesNew ADD CONSTRAINT CK_OrderYear CHECK (OrderYear = 2009) 

Other benefits of partitioning

Using partitioning in the data warehouse has other benefits, as well. Partitioning can improve query performance by spreading data over multiple drives, dispersing IO and parallelizing reads. IO can be further reduced through partition elimination, performed by the query optimizer. Index maintenance can be performed at the partition level, and each partition can use its own compression method, depending on that partition’s individual usage pattern. And inserting and deleting data are more efficient using bulk operations, again made possible by partition switching.

Summary

So far, we’ve laid a good foundation on the basics of implementing a data warehouse. Next time, we’ll start looking at SQL Server Integration Services (SSIS) and developing packages to load our data.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating