Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Table Partitioning

By Andy Warren,

Partitioning is typically used to break data up for either performance or space reasons, comes in a few different flavors, and is often used with or as an alternate to implementing a data archiving plan. Partitioning isn't terribly hard to do, but there are enough details that it's worth working through some scenarios to make sure you know what goes where. Over the next couple articles I'll try to demo all the major options, and I'm going to start here with a very high level review of our options.

Vertical Partitions

This is often done when you have a table that is much wider than it needs to be (often with nice columns like Skill1, Skill2, etc, etc) and hurts performance when table scans are done. By splitting lesser used columns into a parallel table we can increase the number of rows per page in the main table which will increase performance on table scans, and will also give us a boost when building or rebuilding indexes on that table. Remember that it doesn't do a lot of good to move columns that are mainly null, or to necessarily move text columns which only take up 16 bytes in the table. We might move columns that we rarely use, or we might split highly used columns across the two tables. From a performance perspective it's generally a good thing to keep together columns that are frequently updated together so that we only log one update instead of two. From a space perspective we may need to just move the heavier columns to the other table just to get some room to work.

Within vertical partitioning we can pursue two strategies. One is to add triggers to the main (original) table to force the secondary table to stay in sync. If we add a row to the main table we add a row to the secondary table even if all we have is the primary key, and deleting a row from the main table forces a parallel delete in the secondary table. The alternative is to do a sparse implementation where we only add a row to the secondary table if we have data for one or more columns. This is typically a little harder to work with because it requires an upsert type operation, but it's doable and does decrease space usage and that would boost performance as there would be fewer rows to scan (if a scan is needed), and is usually done inside a stored procedure, but could be handled by triggers.

Vertical partitioning is a do it yourself option, SQL doesn't provide any fancy tools to help you do it. Once done you can index each table appropriately and you'll get good or bad query plans based on indexing, statistics, etc, as usual. There's no rule that says you are limited to splitting a main table into two pieces, but I think beyond that it starts to get hard to manage and you incur the increasing overhead of maintaining copies of the primary key in each table. Vertical partitioning is much less common than horizontal partitioning.

Horizontal Partitions

The idea here is to split the rows into multiple tables, either physically or virtually, and gain the ability to place them on separate filegroups which can offer us room to grow and often increased speed as well due to more disks being involved. Pretty much all versions support 'partitioned views' where we manually create separate tables based on our partitioning criteria and then union them all back together with a view to make it look like it's really still one table. This is physically partitioning the data. We can also create a distributed partitioned view which lets us spread out data across multiple servers and join them all back together using a view on each server that uses linked servers to access the remote tables. In SQL 2005 we have true partitions, where we can define a table as residing on a partition and all the sub tables are managed by SQL, there is no view. To us, it looks and feels like a table.

If we use the old 'partitioned view' approach we can add opt to bake in some intelligence to our data access and just query the current orders table, and fall back on the view of all orders if needed. We can also index each table differently. For example, we might choose to add more indexes to previous years tables since we know that data will not change. The tradeoff is that if we add an index we're responsible for adding it to all the other tables, and it's easy to forget that step. By contrast with the newfangled partitions in SQL 2005 we don't have the option of defining different sets of indexes by partition, which can be good or bad depending on your needs. SQL 2005 partitions also require the table to be empty so that it can be created on a partition. It's not going to be a good idea to move all the data around during the day anyway, so this isn't as bad as it might seem.

Implementing any of these requires moving significant chunks of data around and is not likely to be done during production hours. Each can impart some performance gains if the column(s) that were used as the partitioning criteria are part of your query. For example, let's say that we decide to partition our orders table based on order date and build a separate partition for each year. Queries that include the order date will automatically be confined to that one partition, where as if we query on something like customer name without adding order date we will have to search all three tables using seeks or scans as SQL deems best.

In SQL 2000 it's all do it ourselves, in SQL 2005 we can use the partitioning mechanism built it, or choose to do it the old way manually. Horizontal partitions are definitely nice if you need to manage space constraints, just push a big chunk of data to a different set(s) of drives. Performance gains aren't always so clear. Each can impart some performance gains if the column(s) that were used as the partitioning criteria are part of your query. For example, let's say that we decide to partition our orders table based on order date and build a separate partition for each year. Queries that include the order date will automatically be confined to that one partition, where as if we query on something like customer name without adding order date we will have to search all three tables using seeks or scans as SQL deems best. There can be a gain overall if the partitions are put on separate RAID sets, giving SQL more available IO to satisfy the query.

Archiving

Archiving isn't a partitioning strategy, but it's part of the debate because we may need to do it instead of or in addition to partitioning. My definition of archiving is that the data is no longer visible to any production application or report. We typically move it to another table/database in case we need to research something, or we decide to go spelunking for some big historical trend. The advantage to archiving is that can be easy at the table level, just create a job along the lines of 'move anything more than 5 years old in Table A to Archive..TableA'. It get's trickier when you have to take into account foreign key relationships but it's not an impossible job and it one that usually only makes sense to do on the largest tables.

As far as space savings we definitely achieve that, assuming we move the records to a different drive or machine, or just delete them (seriously, this is usually only a good idea for the most trivial of data points). For performance we reduce the total number of rows which will help if we can, and to a degree fewer rows even makes seeks faster.

Which Should I Choose?

I'm a firm believer that every table should have an archiving plan identified when it's created, even if the plan is to never archive it, and set up jobs to maintain the archiving process as soon as the table is live. If I'm short of space I would usually prefer to move entire objects rather than partition, but there might be times when it makes sense. Partitioning for performance should be just about the last thing you try. Picking the right column to partition on isn't always easy, the gains very depending on the queries being run, and you add a layer of complexity that while manageable, is still one more thing to deal with. Before implementing any of the options you'll definitely want to take them for a test run or two, and I'll be doing that in the next couple articles.

Total article views: 19467 | Views in the last 30 days: 52
 
Related Articles
FORUM

Query performance for Ordering

Query performance for Ordering

FORUM

Table partitioning and MAX queries performance problem

table partitioning performance problems

FORUM

Dual Column Partitioning

Need a Dual column partitioning.

ARTICLE

Improving Performance of Cross-Partition Queries

Table partitioning was added in SQL Server 2005 and increases performance in most cases but decrease...

FORUM

Partitioning...

querying partitioned table still slow..

Tags
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones