Partitioning - Part 3

,

Previously I've posted

Partitioning Part 1 and

Partitioning Part 2 that describe the overall options when it comes to

partitioning and also looks at archiving as a closely related operation. This

time we'll take a look at partitioned views, a technique that can be applied in

SQL 2000 or SQL 2005.

The essence of partitioned views is that we create many tables from one table

and then union all the tables back together. What we're trying to do is reduce the overall number of records

that we have to seek or scan against to find our matching rows. It's important

to understand that aside from the possible benefits gained from putting the

tables on separate filegroups (increasing the available IO), partitioning will

only help queries where the query can be restricted to one of the tables in the

view.

To get it to work you have to add a check constraint to each table so that

the optimizer can recognize that the rows can only exist in one of the tables. In the examples that follow I'm

going to do it wrong to start with so you can see the results, then we'll fix

our mistakes and see the final correct results.

We'll start by creating two tables

based on the Person.Contact table, each holding about one half of the records

because we're splitting the data based on lastname. We'll also add a clustered index to each

(same clustered index that is on Person.Contact):

select * into ContactAM from Person.contact where lastname <'N'
select * into ContactNZ from Person.contact where lastname >='N'
create unique clustered index ndxContactID1 on ContactAM(ContactID)
create unique clustered index ndxContactID2 on ContactNZ(ContactID)

Now we'll create a view to make the two tables look like one when we write our

queries:

create view Contact as
select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	--AdditionalContactInfo] [xml] --deliberately excluded for now
	rowguid,
	modifieddate
from dbo.ContactAM
union
select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	--AdditionalContactInfo] [xml] --deliberately excluded for now
	rowguid,
	modifieddate
from dbo.ContactNZ

We had to comment out the XML column because no distinct operation is possible on an XML column. The distinct is required because of our UNION.

Now let's run a simple query:

select * from dbo.Contact where lastname = 'green'

We know we're getting the clustered index scan because we have no index on last

name. We're scanning both tables because we didn't apply a check constraint that

would allow the optimizer to limit the query to one table. Let's add the

constraints now:

alter table dbo.ContactAM with check add constraint CKContactAM check ([lastname] like '[a-m]%')
alter table dbo.ContactNZ with check add constraint CKContactNZ check ([lastname] like '[n-z]%')

If we repeat the query we get the same query plan that still has two scans. What did we miss? It turns out that check constraints using LIKE don't work for partitioned views, even though they work perfectly well on the table itself. We'll drop the original constraints and add them back in slightly different form (actually the same way we executed our select * into operation):

alter table dbo.ContactAM drop constraint CKContactAM
alter table dbo.ContactNZ drop constraint CKContactNZ
alter table dbo.ContactAM with check add constraint CKContactAM check ([lastname] < 'N')
alter table dbo.ContactNZ with check add constraint CKContactNZ check ([lastname] >= 'N')

Now if we run our query again, we get a much better plan:

We are only scanning the ContactAM table and because we are only referencing one

table, the union operator falls out of the query plan. This reduces the query

cost by roughly 50% because we are only scanning one table.

If we replace UNION with

UNION ALL we can add back the xml column we commented out earlier:

alter view Contact as
select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	AdditionalContactInfo,
	rowguid,
	modifieddate
from dbo.ContactAM
union all
select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	AdditionalContactInfo
	rowguid,
	modifieddate
from dbo.ContactNZ
go
sp_refreshview contact

Our test query will now return the XML column as well, without any change to the

query plan. More importantly, by using UNION ALL we've taken the first step

towards making the view updateable. There are a lot of rules around when and how

you can update a partitioned view without doing extra work (See

http://msdn2.microsoft.com/en-us/library/ms187956.aspx and look at

"Conditions for Modifying Partitioned Views" to see the list). You can work

around most of these by using an instead of trigger on the view.

A few close out notes:

  • Partitioned views work on all versions of SQL 2005 and they are not hard

    to implement, but use them only after careful thought! Partitioning works

    best when you have a very effective partitioning column that will be

    included in the majority of your queries

  • The example I gave here is meant to just illustrate partitioning,

    clearly we could have improved performance by limiting the columns in the

    select statement (or the underlying view) and by adding an index on lastname.

  • A less used variation of this is the distributed partitioned view, where

    the tables are spread across different (federated) servers. This definitely

    adds complexity over and above standard partitioning and I suspect that it's

    often more effective to just replicate slices of the data.

Next time we'll finish up by looking at partitioned tables, a feature only

available in the Enterprise Edition of SQL 2005.

Rate

4.38 (13)

Share

Share

Rate

4.38 (13)