SQLServerCentral Article

Automate Sliding Window Partition Management: Part I


Introduction

We implemented partitioned tables in one of our data marts primarily to facilitate purging data. Our business users defined a requirement to retain 4 years of data and that the data could be dropped in monthly increments. Prior experience dealing with SQL Server and large deletes led me to experiment with partitioning. I wasn't overly concerned with improving daily performance of bulk loads or building the Analysis Server cubes. I was concerned with the monthly requirement to purge hundreds of thousands, perhaps even millions of records. In addition, as we added new fact tables to the data mart, I did not want to add more purge routines. Using partitioning with a common partition scheme based on a date column, I am able to add new fact tables without changing the purge process at all.

I should note that all of this could have been done in T-SQL. There's no specific reason that I had to implement this using PowerShell. I looked at some other tools to facilitate managing partitions, but none of them quite fit the bill. Most were geared around maintaining a single table at a time or were GUI interfaces with no scriptable component.

I have to confess too that I seriously wanted to work with PowerShell as a scripting tool.

Finally, I have to give credit to Stuart Ozer. A huge chunk of this was adapted from his PartitionManager C# class library. That may have made the decision to work with PowerShell easier, since I could easily adapt some of those routines directly into the PowerShell code.

I first published this as a set of two scripts on SQLServerCentral. The code stayed pretty much in that form until recently, when I had some time to review and revise it. These are the biggest things I wanted to fix from the original scripts that I posted:

  1. Allow the scripts to automatically calculate boundary dates both for splitting (adding a new partition) and for merging (removing a trailing partition).
  2. Add some error handling to more gracefully handle situations that I had not anticipated in production.
  3. From a published article standpoint, I also wanted to add a set up script to help admins who were not familiar with setting up partitioning.

I should note that I am NOT an expert in partitioning. Nor am I an expert coder. My understanding of partitioning continues to evolve and I continue to discover new ways to approach things and occasionally issues that cause me to head back to the books for more research.

So, without further ado, here goes...

Part I: Design and Setup

Planning and Design

Creating a sliding window partition strategy requires some planning and some resources. First off, you need SQL Server Enterprise Edition or Developer Edition. None of the other editions of SQL Server 2005 or 2008 support partitioning. SQL Server 2000 and earlier do not support partitioning at all.

Next, you need to decide which tables are going to be partitioned and which column you are going to use as the basis for your partitioning scheme. It's important, once you have identified the tables and the common column for partitioning, that you pause and review your design and your data model for potential issues. If table A contains records that are related to table B, you want to be sure that the data in both tables "ages out" at the same time. You don't want to get into a situation where you are creating orphan records.

In our case, the choice was easy and was based on the activity close date which was more or less common to several fact tables. We have eight fact tables which all provide differing levels of detail regarding orders. The tables are related to one another by a surrogate key for the order. When an order "ages out", then all of the data related to that order can be aged out as well.

I cannot stress enough how important this design and review is. If you get this part wrong, you will spend many hours fixing it down the road. Design it. Sketch it out. Review it with the data owners. Review it with your developers. Then put it away for a night or two and come back to it when you have a clear, fresh perspective.

Partition Function

Your next step will be to create a partition function. The partition function will define the boundaries which divide the table into its constituent partitions. If you've done the design part well, this step will come pretty easily. The syntax for creating a partition function is:

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
AS RANGE [ LEFT | RIGHT ] 
FOR VALUES ( [ boundary_value [ ,...n ] ] )

From a snippet of the attached PartitionDBSetup.sql script, it looks like:

CREATE PARTITION FUNCTION [pf_FACT_DATA_DATE](datetime) AS RANGE RIGHT FOR 
VALUES (N'2008-10-01', N'2008-11-01', N'2008-12-01')

One note here about partition functions and boundary values. You will always have an upper boundary and a lower boundary. Values that are greater than the upper boundary will go into the "upper" partition. Values lower than the lower boundary will go into the "lowest" partition. If I've understood partitioning correctly, you can't really purge data from the lowest partition, since this partition always contains data less than the lowest boundary value. Essentially, if you have numbered your partitions 1, 2, 3 ... etc, you will always be purging data from partition number 2.

This understanding is key, and it is reflected in the MergePartition.ps1 script in the way in which the boundary date is automatically calculated (when not specified as a parameter). We implemented our lowest partition as a deliberately "empty" partition (on a file group called "FACT_EMPTY"), and the data file associated with it is deliberately small with auto grow turned off. If any data makes it into this partition, it's because of a data error or an issue with the ETL.
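To make the purge concrete, here is a sketch of the kind of T-SQL the merge process ultimately comes down to. The staging table name Orders_Staging is hypothetical (the attached scripts generate their statements dynamically); the staging table must live on the same file group as the partition being switched out and have an identical structure:

```sql
-- Switch the oldest populated partition (always partition 2 in this layout)
-- out to an empty staging table; this is a metadata-only operation.
ALTER TABLE Orders SWITCH PARTITION 2 TO Orders_Staging;

-- Remove the now-empty partition by merging away its lower boundary value.
ALTER PARTITION FUNCTION pf_FACT_DATA_DATE() MERGE RANGE (N'2008-10-01');

-- The staging table can then be dropped (or truncated) almost instantly,
-- which is far cheaper than deleting millions of rows.
DROP TABLE Orders_Staging;
```

This is why the switch/merge approach scales so much better than DELETE: no rows are ever physically moved or logged individually.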

Partition Scheme

Next, we created a partition scheme. Again, this is pretty straightforward if you've done your homework. You simply need to have worked out how many boundaries you have in your partition function and add one. In our case, the business wants to keep four years of data by month, so we have 50 partitions and 49 boundaries. This gives us:

  1 empty partition on the "lower" side; this one should never have data
  48 "active" partitions
  1 empty partition on the "upper" side; this one is ready to store data on the first of the new month

The syntax for creating the partition scheme is:

CREATE PARTITION SCHEME partition_scheme_name
AS PARTITION partition_function_name
[ ALL ] TO ( { file_group_name | [ PRIMARY ] } [ ,...n ] )

Again, from a snippet of the attached PartitionDBSetup.sql script, it looks like:

CREATE PARTITION SCHEME [ps_FACT_DATA_DATE] AS PARTITION [pf_FACT_DATA_DATE] TO 
([FACT_EMPTY], [FACT_2008_M10], [FACT_2008_M11], [FACT_2008_M12])

Note here that there is one more partition than there are boundary values. Note also that the lowest partition is mapped to the file group named [FACT_EMPTY]. Data is not intended to be stored there; it exists for administrative purposes and (at least initially, during development) helped us to identify data quality issues (null-valued data fell out into the FACT_EMPTY partition).
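The partition scheme is also where the "sliding" part of the sliding window comes in. Before the partition function can be split to open a new month, the scheme must be told which file group the new partition will use. A sketch, assuming a new file group named FACT_2009_M01 has already been added to the database (the name is illustrative):

```sql
-- Tell the partition scheme which file group the next new partition will use.
ALTER PARTITION SCHEME ps_FACT_DATA_DATE NEXT USED [FACT_2009_M01];

-- Split the upper empty partition by introducing a boundary for the new month.
-- Because that partition is empty, this is a fast, metadata-only change.
ALTER PARTITION FUNCTION pf_FACT_DATA_DATE() SPLIT RANGE (N'2009-01-01');
```

Keeping the upper partition empty is what makes the split cheap; splitting a populated partition forces SQL Server to physically move rows.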

Files and File Groups

One thing to consider here is whether you are going to split your data across multiple file groups (for better performance and IO optimization), or use a single file group. The scripts are really geared around using multiple file groups, but that's not an essential requirement. Remember, the overarching objective here was to facilitate purging of data; it's not all about performance.

Next, we created the individual fact tables with the partition column. One key point: in order to perform the "magic" of the switch process, any indexes that you create on partitioned tables must be "storage aligned". This means that any index you create must include the partitioning column and be created on the partition scheme. This makes primary keys and unique keys a challenge, especially when using date values for partitioning. Candidly, I have not overcome this particular challenge in our environment, although we do not appear to have an issue with duplicate data.
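One common compromise, shown here purely as an illustration (it is not part of the attached scripts, and the constraint name is made up), is to widen the unique key so that the partitioning column is part of it:

```sql
-- Including OrderCloseDate in the key allows the supporting index to be
-- storage aligned, at the cost of only enforcing uniqueness of the
-- (OrderNum, OrderCloseDate) pair rather than of OrderNum alone.
ALTER TABLE Orders
ADD CONSTRAINT UQ_Orders_OrderNum_CloseDate
UNIQUE (OrderNum, OrderCloseDate) ON ps_FACT_DATA_DATE (OrderCloseDate);
```

Whether that weaker guarantee is acceptable depends on your data; in our case the ETL is the real gatekeeper for duplicates.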

Here is a sample table excerpted from the accompanying script:

CREATE TABLE Orders (
 OrderCloseDate datetime not null, 
 OrderNum int not null, 
 [Status] char(2) null,
 CustomerID int not null)
ON ps_FACT_DATA_DATE (OrderCloseDate)
GO
CREATE INDEX IX_Orders_CustomerID
ON Orders (OrderCloseDate, CustomerID)
ON ps_FACT_DATA_DATE (OrderCloseDate)
GO

The included setup script creates an empty database, with the necessary file groups and data files. The script also creates two partitioned tables and some storage aligned indexes. The script contains insert statements to minimally populate each table with data. Finally, the script creates a view which helps you as an administrator to visualize the partitions and the amount of data held in each partition.

After running the entire script, you should be able to execute the following from a query window:

SELECT * FROM partition_info ORDER BY TableName, partitionNumber

Note the information provided and the number of rows for each table in each partition. Note also the first and last rows for each table, which are easily identified by the null values for LowerBoundary and UpperBoundary.
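If you are curious how a view like partition_info can be built, a minimal version can be assembled from the system catalog views. The column aliases below are my own and the attached script's view may differ in detail, but the joins are the standard ones:

```sql
-- One row per partition of each partitioned table (heap or clustered index),
-- with its row count and lower boundary value.
SELECT t.name              AS TableName,
       p.partition_number  AS partitionNumber,
       p.rows              AS [rows],
       prv.value           AS LowerBoundary
FROM sys.partitions p
JOIN sys.tables t  ON p.object_id = t.object_id
JOIN sys.indexes i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes ps   ON i.data_space_id = ps.data_space_id
JOIN sys.partition_functions pf ON ps.function_id = pf.function_id
LEFT JOIN sys.partition_range_values prv
       ON pf.function_id = prv.function_id
      AND prv.boundary_id = p.partition_number - 1  -- lower boundary
WHERE i.index_id IN (0, 1)  -- heap or clustered index only
ORDER BY t.name, p.partition_number;
```

The LEFT JOIN is what produces the NULL LowerBoundary for each table's first partition, matching the behavior described above.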

Conclusion

This introduction to partitioning and the included samples will hopefully clarify some of the concepts and give you a "sandbox" in which to try out some of the concepts related to partitioning with SQL Server. In the next article, I will discuss splitting a partition, which is used to add a new partition at the "top" end of the partition scheme.
