SQL Server Central is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
Search:  
 
 

Partitioning - Part 3

By Andy Warren, 2008/01/08

Total article views: 4451 | Views in the last 30 days: 154

Previously I've posted Partitioning Part 1 and Partitioning Part 2 that describe the overall options when it comes to partitioning and also looks at archiving as a closely related operation. This time we'll take a look at partitioned views, a technique that can be applied in SQL 2000 or SQL 2005.

The essence of partitioned views is that we create many tables from one table and then union all the tables back together. What we're trying to do is reduce the overall number of records that we have to seek or scan against to find our matching rows. It's important to understand that aside from the possible benefits gained from putting the tables on separate filegroups (increasing the available IO), partitioning will only help queries where the query can be restricted to one of the tables in the view.

To get it to work you have to add a check constraint to each table so that the optimizer can recognize that the rows can only exist in one of the tables. In the examples that follow I'm going to do it wrong to start with so you can see the results, then we'll fix our mistakes and see the final correct results.

We'll start by creating two tables based on the Person.Contact table, each holding about one half of the records because we're splitting the data based on lastname. We'll also add a clustered index to each (same clustered index that is on Person.Contact):

select * into ContactAM from Person.contact where lastname <'N'
select * into ContactNZ from Person.contact where lastname >='N'
create unique clustered index ndxContactID1 on ContactAM(ContactID)
create unique clustered index ndxContactID2 on ContactNZ(ContactID)

Now we'll create a view to make the two tables look like one when we write our queries:

create view Contact as

select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	--AdditionalContactInfo] [xml] --deliberately excluded for now
	rowguid,
	modifieddate
from dbo.ContactAM

union

select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	--AdditionalContactInfo] [xml] --deliberately excluded for now
	rowguid,
	modifieddate
from dbo.ContactNZ

We had to comment out the XML column because no distinct operation is possible on an XML column. The distinct is required because of our UNION. Now let's run a simple query:

select * from dbo.Contact where lastname = 'green'

We know we're getting the clustered index scan because we have no index on last name. We're scanning both tables because we didn't apply a check constraint that would allow the optimizer to limit the query to one table. Let's add the constraints now:

alter table dbo.ContactAM with check add constraint CKContactAM check ([lastname] like '[a-m]%')
alter table dbo.ContactNZ with check add constraint CKContactNZ check ([lastname] like '[n-z]%')

If we repeat the query we get the same query plan that still has two scans. What did we miss? It turns out that check constraints using LIKE don't work for partitioned views, even though they work perfectly well on the table itself. We'll drop the original constraints and add them back in slightly different form (actually the same way we executed our select * into operation):

alter table dbo.ContactAM drop constraint CKContactAM
alter table dbo.ContactNZ drop constraint CKContactNZ

alter table dbo.ContactAM with check add constraint CKContactAM check ([lastname] < 'N')
alter table dbo.ContactNZ with check add constraint CKContactNZ check ([lastname] >= 'N')

Now if we run our query again, we get a much better plan:

We are only scanning the ContactAM table and because we are only referencing one table, the union operator falls out of the query plan. This reduces the query cost by roughly 50% because we are only scanning one table.

If we replace UNION with UNION ALL we can add back the xml column we commented out earlier:

alter view Contact as

select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	AdditionalContactInfo,
	rowguid,
	modifieddate
from dbo.ContactAM

union all

select 
	ContactID,
	NameStyle,
	Title,
	FirstName,
	MiddleName,
	LastName,
	Suffix
	EmailAddress,
	EmailPromotion,
	Phone,
	PasswordHash,
	PasswordSalt,
	AdditionalContactInfo
	rowguid,
	modifieddate
from dbo.ContactNZ
go
sp_refreshview contact

Our test query will now return the XML column as well, without any change to the query plan. More importantly, by using UNION ALL we've taken the first step towards making the view updateable. There are a lot of rules around when and how you can update a partitioned view without doing extra work (See http://msdn2.microsoft.com/en-us/library/ms187956.aspx and look at "Conditions for Modifying Partitioned Views" to see the list). You can work around most of these by using an instead of trigger on the view.

A few close out notes:

  • Partitioned views work on all versions of SQL 2005 and they are not hard to implement, but use them only after careful thought! Partitioning works best when you have a very effective partitioning column that will be included in the majority of your queries
  • The example I gave here is meant to just illustrate partitioning, clearly we could have improved performance by limiting the columns in the select statement (or the underlying view) and by adding an index on lastname.
  • A less used variation of this is the distributed partitioned view, where the tables are spread across different (federated) servers. This definitely adds complexity over and above standard partitioning and I suspect that it's often more effective to just replicate slices of the data.

Next time we'll finish up by looking at partitioned tables, a feature only available in the Enterprise Edition of SQL 2005.

By Andy Warren, 2008/01/08

Total article views: 4451 | Views in the last 30 days: 154
Your response
 
 
Related tags
 
Related content

What does SMO mean in SQL Server 2005?

(not yet rated) | 1,944 reads

What is database mirroring?

By Steve Jones | Category: SQL Server 2005
(not yet rated) | 2,250 reads

What is the role of a witness in...

By Steve Jones | Category: SQL Server 2005
(not yet rated) | 1,376 reads
Like this? Try these...

Table Partitioning

By Andy Warren | Category: Development
| 12,750 reads
Already registered?  

Free registration required

To read the rest of this article, and access thousands of other articles, we ask you to register on the site and subscribe to our newsletters.

Register

E-mail address:
Password:
Password (confirm):

  

Subscriptions

We ask you to register on the site and subscribe to our newsletters. Subscribing to our newsletters gets you:

  • ALL of our content (thousands of articles, scripts, and forum postings)
  • A daily newsletter (example)
  • A weekly news round up (example)
  • The opportunity to ask and answer questions in our forums
  • A daily Question of the Day to test and help you increase your knowledge of SQL Server.

We ask that you give the newsletter a try for a week. Over 200,000 SQL Server Professionals a day find it entertaining and useful. If not, you are welcome to unsubscribe at anytime.

Steve Jones
Editor, SQLServerCentral.com