SQLServerCentral Article

Partitioning - Part 4

,

So far I've covered the basics of partitioning strategies in

Part 1,

archiving in

Part 2,

and partitioned views in Part 3. This time we'll finish up by looking at the

partitioning features now available in the Enterprise edition of SQL 2005.

I'm going to start with deploying a simple partition. It's a three step process:

  • Create a partitioning function
  • Create a partitioning scheme
  • Create the table using the partitioning scheme

For a simple example we're going to partition the Person.Contact table in

Adventureworks by year. We'll start by creating the partitioning function:

CREATE PARTITION FUNCTION fnPartitionByYear (datetime)
AS RANGE RIGHT 
FOR VALUES (
'1/1/1996',
'1/1/1997',
'1/1/1998',
'1/1/1999',
'1/1/2000',
'1/1/2001',
'1/1/2002',
'1/1/2003',
'1/1/2004',
'1/1/2005',
'1/1/2006',
'1/1/2007')

In this case we're creating our 'bucketing' mechanism. We want to create

partitions for each year. One slightly confusing part is whether to use LEFT or

RIGHT. Essentially you're telling the function whether your values are the last

ones in each bucket (LEFT), or the first one in each bucket (RIGHT). I like

RIGHT for dates because I want 1/1/1996 12/31/1996 23:59 in the same partition

and it's easier than using LEFT with 12/31/1996 23:59:59, 12/31/1997 23:59, etc.

SQL will apply this function to each row to figure out what partition it belongs

to when you load (or modify) data.

Now we'll create a partitioning scheme using the function we just created. A

scheme is a super container that can hold one or more filegroups. I've used the

simpler ALL syntax here to show that I want to create all the partitions on

Primary. Note that because primary is a reserved word it requires the brackets.

Putting all on one filegroup isn't as good as using multiple filegroups across

different drives, but on my laptop I have only one drive and I wanted to

illustrate the ALL usage. It is possible to specify a filegroup for each

partition and even to specify what filegroup will be used when the next

partition is created.

CREATE PARTITION SCHEME ContactScheme
AS PARTITION fnPartitionByYear
ALL to ([Primary])

We should see this message:

Finally, we'll create a new table called ContactPartitioned that is almost

the same as the original table. One of the rules is that partitioning columns

have to be a part of any unique index (as I understand it, this helps the

optimizer know that it only needs to check a single partition) if you want that

index to be on the same partition. So our choices based on the design of the

original Contact table are to append the modifieddate to the the previously

defined clustered index so that everything else stays the same, or we change the

primary key to be non clustered and create it directly on a filegroup. I know

it's a little confusing! It's absolutely worth spending a few minutes using my

code samples to work through all of this so you can see first hand. The "magic"

is the last line where instead of creating the table on FILEGROUP, we create it

on SCHEME. In this case, our just created ContactScheme and we pass in the

partitioning column of modified date.

CREATE TABLE [Person].[ContactPartitioned](
[ContactID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[NameStyle] [dbo].[NameStyle] NOT NULL   DEFAULT ((0)),
[Title] [nvarchar](8) NULL,
[FirstName] [dbo].[Name] NOT NULL,
[MiddleName] [dbo].[Name] NULL,
[LastName] [dbo].[Name] NOT NULL,
[Suffix] [nvarchar](10) NULL,
[EmailAddress] [nvarchar](50) NULL,
[EmailPromotion] [int] NOT NULL  DEFAULT ((0)),
[Phone] [dbo].[Phone] NULL,
[PasswordHash] [varchar](40) NOT NULL,
[PasswordSalt] [varchar](10) NOT NULL,
[AdditionalContactInfo] [xml](CONTENT [Person].[AdditionalContactInfoSchemaCollection]) NULL,
[rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL   DEFAULT (newid()),
[ModifiedDate] [datetime] NOT NULL  DEFAULT (getdate())
 CONSTRAINT [PK_Contact_ContactIDNew] PRIMARY KEY CLUSTERED ([ContactID], ModifiedDate ASC))
 ON ContactScheme(modifieddate)

To see if everything is really working, we'll copy the data from

Person.contact into the new table:

INSERT INTO Person.ContactPartitioned
   (NameStyle
   ,Title
   ,FirstName
   ,MiddleName
   ,LastName
   ,Suffix
   ,EmailAddress
   ,EmailPromotion
   ,Phone
   ,PasswordHash
   ,PasswordSalt
   ,AdditionalContactInfo
   ,rowguid
   ,ModifiedDate)
select
    NameStyle
   ,Title
   ,FirstName
   ,MiddleName
   ,LastName
   ,Suffix
   ,EmailAddress
   ,EmailPromotion
   ,Phone
   ,PasswordHash
   ,PasswordSalt
   ,AdditionalContactInfo
   ,rowguid
   ,ModifiedDate
from person.contact

Let's run two simple queries so that we can compare apples to apples:

select  * from person.contactpartitioned where modifieddate = '1/1/2004'
select  * from person.contact where modifieddate = '1/1/2004'

The both return the same number of rows, but if just look at the plans we see that there is a big difference in the relative query costs:

We can see that in both cases we used a clustered index scan (because we

elected to return all columns and because we have no index on modifieddate in

either table), but our query against our new partitioned table seems to be much

more efficient. We can confirm that with a quick Profiler session:

Another way to confirm that partitioning is actually being used is to hover

over the plan operator and see if it shows a partition id line (in red below):

To see how the data was allocated, we can run this query:

select object_name(object_id), partition_number, rows from sys.partitions
where
object_name(object_id)='contactpartitioned'

If we query the data by year directly, we get:

select year(modifieddate), count(*) as PartitionCount
from person.contactpartitioned
group by year(modifieddate)
order by year(modifieddate)

We only get 10

rows. Why? The person.contact table contains no data for 2006 or 2007, but we

created partitions for those year in our partitioning function. That accounts

for 2 of the missing 3. The other one is the first partition with 0 rows,

created to handle any data that might be prior to 1/1/1996.

Here are a few miscellaneous notes about partitioning:

  • Remember its only Enterprise edition (and Developer of course)
  • Indexes can be partitioned to match the table partition, or not. Just as

    partitioning a table can boost speed in certain cases, same is true for

    indexes

  • It's horizontal only
  • There's a lot of options and rules to partitioning and no UI to help you

    out. Before you go big, build a small table with some test data that matches

    your scenario and work through the steps

I blog once a week or so at

http://blogs.sqlservercentral.com/blogs/andy_warren/default.aspx about

SQL Server, SQL user groups, and related topics. I hope you'll visit and comment

occasionally!

Rate

4 (15)

You rated this post out of 5. Change rating

Share

Share

Rate

4 (15)

You rated this post out of 5. Change rating