This series of articles is designed to demonstrate in a practical manner how to implement partitioning in SQL Server. In particular the focus is on designing and maintaining a "sliding window" for the purpose of purging old data from the database with minimal performance overhead.
Fundamentally, there are two administrative tasks needed to maintain a sliding window for data partitioning:
- You have to create new partitions to store new data; and
- You have to drop data from the trailing partition.
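In T-SQL terms, these two tasks map onto the SPLIT RANGE and MERGE RANGE operations of ALTER PARTITION FUNCTION. A minimal sketch (the partition function, file group, and boundary values here are hypothetical; the scheme name matches the one used later in this article):

```sql
-- Add a new boundary (and therefore a new partition) at the leading edge.
-- The scheme must first be told which file group the new partition will use.
ALTER PARTITION SCHEME ps_FACT_DATA_DATE
    NEXT USED [FG_NEW];
ALTER PARTITION FUNCTION pf_FACT_DATA_DATE()
    SPLIT RANGE ('2009-10-01');

-- Remove the trailing boundary, collapsing the oldest partition
-- into its neighbor.
ALTER PARTITION FUNCTION pf_FACT_DATA_DATE()
    MERGE RANGE ('2009-01-01');
```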
Each process poses its own unique challenges. In this article, I will review the functionality of MergePartition.ps1, a script that encapsulates and automates the following tasks:
- Finds the tables associated with the partition scheme (passed as a parameter);
- Creates empty copies of those tables (staging tables);
- Switches the data out of the targeted partition into the staging tables;
- (optionally) drops the staging tables;
- (optionally) merges the partition boundary; and
- (optionally) drops the file group and its associated data files.
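The per-table purge sequence the script automates looks roughly like this in T-SQL (the fact table, its columns, the staging table, and the file group name are all illustrative):

```sql
-- 1. Create an empty staging table on the same file group as the
--    partition being removed, with the same column structure.
--    (SWITCH also requires that the staging table's indexes match
--    the fact table's; the script generates those as well.)
CREATE TABLE FACT_SALES_STAGING (
    SaleDate   datetime NOT NULL,
    CustomerID int      NOT NULL,
    Amount     money    NOT NULL
) ON [FG_2009_07];

-- 2. Switch the target partition out. This is a metadata-only
--    operation, so it completes in seconds regardless of row count.
ALTER TABLE FACT_SALES
    SWITCH PARTITION 2 TO FACT_SALES_STAGING;

-- 3. Drop the staging table (this is the actual purge).
DROP TABLE FACT_SALES_STAGING;

-- 4. Merge the now-empty boundary out of the partition function.
ALTER PARTITION FUNCTION pf_FACT_DATA_DATE()
    MERGE RANGE ('2009-08-01');
```

The switch-then-drop step is what makes the purge so cheap: no rows are deleted individually, so there is no large delete transaction to log.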
As noted previously, a huge chunk of this was adapted from Stuart Ozer's PartitionManager C# class library.
Running the Script
With the database set up (using the script from the first article) and MergePartition.ps1 in place, you are ready to start PowerShell and run a couple of statements. Note that unless you have the ability to sign code, you will need to set PowerShell's execution policy to Unrestricted in order to run this script. To do so, enter Set-ExecutionPolicy Unrestricted.
Before running the script, open a query window and run:
SELECT * FROM partition_info ORDER BY TableName, partitionNumber
Review the results to see what partitions exist and how much data is in each partition.
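partition_info is the helper view created by the first article's setup script. If you do not have it handy, a rough equivalent can be built over the catalog views (this is a sketch; the column names in the real view may differ):

```sql
-- A stand-in for the partition_info view: row counts and lower
-- boundary values per partition of each partition-aligned table.
-- Assumes RANGE RIGHT functions, where partition n's lower boundary
-- is boundary_id n - 1.
SELECT
    t.name             AS TableName,
    p.partition_number AS partitionNumber,
    p.rows             AS [rows],
    prv.value          AS lowerBoundary
FROM sys.tables t
JOIN sys.indexes i
    ON i.object_id = t.object_id AND i.index_id IN (0, 1)
JOIN sys.partitions p
    ON p.object_id = t.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.partition_functions pf
    ON pf.function_id = ps.function_id
LEFT JOIN sys.partition_range_values prv
    ON prv.function_id = pf.function_id
   AND prv.boundary_id = p.partition_number - 1
ORDER BY TableName, partitionNumber;
```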
First, let's run the script with the minimum possible number of parameters:
PS C:\> ./MergePartition.ps1 <server_name> PartitionMaintTest ps_FACT_DATA_DATE
This removes partitionNumber 2. Note that the remaining partitions are renumbered sequentially, so the only way to confirm that the correct partition was removed is to verify that the lower boundary date has changed.
Note that the data in the FACT_EMPTY partition is not affected. You can run the above statement as many times as you like (well, up to the number of partitions that were created); each run removes the partition numbered 2, and FACT_EMPTY remains untouched.
Next, you can specify a specific boundary date to remove.
PS C:\> ./MergePartition.ps1 <server_name> PartitionMaintTest ps_FACT_DATA_DATE "08/01/2009"
Note that running the script in this manner may result in a "gap" appearing in your data. You will remove a month of data from the middle of your table (not that I speak from experience!).
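Before passing an explicit boundary date, it is worth confirming which boundary is actually the oldest. A query along these lines will list them (the partition function name is assumed from the scheme's name):

```sql
-- List the boundary values for the partition function, oldest first.
SELECT prv.boundary_id, prv.value
FROM sys.partition_functions pf
JOIN sys.partition_range_values prv
    ON prv.function_id = pf.function_id
WHERE pf.name = 'pf_FACT_DATA_DATE'
ORDER BY prv.boundary_id;
```

If the date you are about to merge is not the first row returned, you are about to open a gap in the middle of your data.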
The major challenge in maintaining the sliding-window partition scheme has been working with the developers to ensure that new fact tables (there haven't been two) and changes to existing fact tables (there have been a few) remain consistent with the established partition scheme. In particular, new indexes have to be carefully designed and must respect the storage-alignment requirement.
Beyond that, up until these most recent changes, I had to execute the scripts manually once per month. This was partly because, as a new process, I did not fully trust automating it until I had seen it run successfully in production a few times. It was also because I needed to update the scripts to automatically calculate boundary values and add some additional error traps and checks.
So, at the end of the day, what's the value proposition to the business for all this?
In a nutshell, for me, it boils down to consistent, reliable performance and ease of administration. These are trivial examples with small amounts of data. However, when run against our data warehouse, the MergePartition.ps1 script removes millions of records, representing a month's worth of activity around the country, in about the same amount of time (a few seconds).
By contrast, our old data mart contains data going back more than seven years. Performance has steadily declined as more and more data has accumulated. It is a vendor database, and we are not permitted to make changes to the schema (and we are just about to retire it anyway). The application does not include a purge function (though we can have one custom-written for us for a *small* fee!). In short, we're stuck.
Now, we have an agreed-upon strategy for purging data from the data mart, and the process takes only seconds to complete. The amount of data in the database remains relatively constant, and our users benefit from better performance for their reports.