Partitioning is a feature that has been around since SQL Server 2005. It allows for users to split a table across multiple filegroups. Multiple filegroups means the opportunity to store a table across multiple volumes. If you have a SAN that has disk tiers, you can use partitioning to store the data you want, where you want. Want to store the current month of your data on SSDs and move the rest to your slower, 10k or 7k disk? Partitioning can make this happen. Partitioned tables can be queried just like any other table in the database, but it’s the magic happening behind the scenes that, when set up and queried correctly, can make them powerful.
What You’ll Need
A partition column, partition scheme, and partition function are the required components necessary to implement partitioning. You’ll also want new filegroups and files to store the partitioned data.
This is the column that will drive what filegroup will store the data being inserted. Most fields can be selected to be a partition column, but the most common choices are either a date or integer field. These two data types typically have the most potential for logical separation of data. This column is often referred to as the partition key.
The partition function maps the data of a table to the partitions created based on the value of the partition column. The boundaries and range of the partitions are also created by the partition function.
The partition scheme maps the partitions created with the partition function to multiple filegroups.
Ready, Set, Partition
Using the AdventureWorks2008R2 database, I’ll walk through an example of setting up partitioning on the ‘Production.TransactionHistory’ table. The ‘TransactionDate’ field will be used as the partition column. For this example, I’m going to focus on the first three months of 2013.
First, we need a partition function. My snippet below creates a function setting boundaries and ranges for January,February, and March of 2013.
CREATE PARTITION FUNCTION TransactionHistoryRange (datetime) AS RANGE LEFT FOR VALUES ('2013-01-31 23:59:59.997', '2013-02-28 23:59:59.997', '2013-03-31 23:59:59.997'); GO
Something you may notice with the statement above is the peculiar choice for the datetime values. Per Kimberly Tripp’s whitepaper on partitioning, the datetime data type is precise with 3.33 milliseconds. This means that the maximum potential value for January 31st, 2013 is 2013-01-31 23:59:59.997. Any value greater for this day is not available, and would be rounded up to the following day.
Before we create the partition scheme, I’ll create three new filegroups, each with a new data file.
ALTER DATABASE AdventureWorks2008R2 ADD FILEGROUP January2013; ALTER DATABASE AdventureWorks2008R2 ADD FILEGROUP February2013; ALTER DATABASE AdventureWorks2008R2 ADD FILEGROUP March2013; GO ALTER DATABASE AdventureWorks2008R2 ADD FILE ( NAME = January2013_01, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.KREULSQL1\MSSQL\DATA\January2013_01.ndf', SIZE = 5MB, FILEGROWTH = 5MB ) TO FILEGROUP January2013; ALTER DATABASE AdventureWorks2008R2 ADD FILE ( NAME = February2013_01, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.KREULSQL1\MSSQL\DATA\February2013_01.ndf', SIZE = 5MB, FILEGROWTH = 5MB ) TO FILEGROUP February2013; ALTER DATABASE AdventureWorks2008R2 ADD FILE ( NAME = March2013_01, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.KREULSQL1\MSSQL\DATA\March2013_01.ndf', SIZE = 5MB, FILEGROWTH = 5MB ) TO FILEGROUP March2013; GO
Now we’re ready for the partition scheme
CREATE PARTITION SCHEME TransactionHistoryRangeScheme AS PARTITION TransactionHistoryRange TO (January2013, February2013, March2013, [PRIMARY]); GO
Note that we actually specify four filegroups, even though we only created three new filegroups. The fourth filegroup, the ‘PRIMARY’ filegroup, is specified to hold data newer than 3/31/2013. If you’re going through the work of partitioning, chances are you most likely don’t actually want data to be inserted into the ‘PRIMARY’ filegroup. To prevent this, we can set up a check constraint to check to make sure data inserted is no newer than the greatest value of the last partition. The constraint is created as a fail-safe measure and you would want to make sure this constraint is updated with a new max value each time the partition range is expanded.
ALTER TABLE Production.TransactionHistory WITH CHECK ADD CONSTRAINT CK_TransactionHistory_Max_DateCheck CHECK (TransactionDate <= '2013-03-31 23:59:59.997'); GO
Now that the plumbing is set up, I need to move the ‘TransactionHistory’ table to the new partition scheme. The table currently resides in the ‘PRIMARY’ filegroup, and I now want it to move it to the ‘TransactionHistoryRangeScheme’. I can accomplish this by dropping and re-creating the primary key, which is also the clustered index on the table. There are a couple key differences to account for when creating a primary key on a partitioned table vs a non-partitioned table. First, partitioning requires that the partition key be a part of any unique index on the table. Also, I need to specify the name of the partition scheme, ‘TransactionHistoryRangeScheme’, and the name of the partitioned column, ‘TransactionDate’, in the ON clause of the script.
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'Production.TransactionHistory') AND name = N'PK_TransactionHistory_TransactionID') ALTER TABLE Production.TransactionHistory DROP CONSTRAINT PK_TransactionHistory_TransactionID; GO ALTER TABLE Production.TransactionHistory ADD CONSTRAINT PK_TransactionHistory_TransactionID PRIMARY KEY CLUSTERED ( TransactionID, TransactionDate )ON TransactionHistoryRangeScheme (TransactionDate); GO
To verify everything is working properly, I’ll insert a few rows into the ‘Production.TransactionHistory’ table and then check the row count of each partition using the $PARTITION function.
INSERT INTO AdventureWorks2008R2.Production.TransactionHistory (ProductID ,ReferenceOrderID ,ReferenceOrderLineID ,TransactionDate ,TransactionType ,Quantity ,ActualCost ,ModifiedDate) VALUES (877,73963,3,'2013-01-28 00:00:00.000','S',1,8.95,'2013-01-28 00:00:00.000'), (877,73963,3,'2013-02-20 00:00:00.000','S',1,8.95,'2013-02-20 00:00:00.000'), (877,73963,3,'2013-3-01 00:00:00.000','S',1,8.95,'2013-03-01 00:00:00.000'); GO SELECT * FROM Production.TransactionHistory WHERE $PARTITION.TransactionHistoryRange(TransactionDate) = 1 AND YEAR(TransactionDate) = 2013; SELECT * FROM Production.TransactionHistory WHERE $PARTITION.TransactionHistoryRange(TransactionDate) = 2; SELECT * FROM Production.TransactionHistory WHERE $PARTITION.TransactionHistoryRange(TransactionDate) = 3; GO
Were the rows inserted correctly?
Success! The rows were inserted into their respective partitions. Since this table was made of rows where ‘TransactionDate’ was older than 2013, the first partition is going to contain all of the table (minus the two rows I inserted for February and March), so I included the extra filter in the where clause for January to return the row I’m looking for. Note that each partition receives a partition id. In our case, partition ID ’1′ is for all rows up to and including January 31, 2013, partition ID ’2′ includes February 1-February 28, 2013 and partition ID ’3′ includes March 1-March 31st, 2013.
Is Partitioning Right For You?
Every situation, every environment is different. Partitioning is not without maintenance, as partition ranges will most likely need to be expanded, check constraints updated, files moved and consolidated, etc. Partitioning can however make seemingly unmanageable growth, manageable. Coupled with SAN tiering technology, we have successfully used partitioning to scale tables as large as 14 billion rows. Before you go through the effort of implementing partition, have a firm understanding of how your data is accessed and how fast it’s growing.
There is a ton out there on partitioning, and Brent Ozar has put together a great list of reference material.
The front page image for this post is courtesy of: www.flickr.com/photos/infomatique/306917951/