SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Online table partitioning


Online table partitioning

Author
Message
nadersam
nadersam
Mr or Mrs. 500
Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)

Group: General Forum Members
Points: 530 Visits: 456
Dear All,

To improve performance for some tables having huge amount of historical data we have considered using the file partitioning based on date for those tables.
The question is could that be done online which users are working or could it cause a severe performance hit.
The table size vary between 5 millions and 30 millions records
Thanks in advance
Nader
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)

Group: General Forum Members
Points: 87176 Visits: 41113
nadersam (12/15/2013)
Dear All,

To improve performance for some tables having huge amount of historical data we have considered using the file partitioning based on date for those tables.
The question is could that be done online which users are working or could it cause a severe performance hit.
The table size vary between 5 millions and 30 millions records
Thanks in advance
Nader


Yes. It can be done in a nearly 100% (except for about 65ms) online fashion. Do you have enough disk space for 2 copies of the table? (Note, there is a way to move all of the data into a single partition of a new table but that provides no advantages of actually partitioning the data by File/Filegroup). Of course, the original copy will go away once we bring the 2nd copy online but there will be a substantial period of time where there will be two copies of the data. Once all of the data is in the new partitioned table, rename the two tables so that the new table will "come online" (takes about 65 ms to do such a thing), and then check the old table to see if there are any rows that were added after you completed the copy of the table and copy them to the new table. Of course, the old table should not be deleted until you've double checked that all the rows it contains are, in fact, contained in the new table and that the new table has been operating successfully for a day or two with no hiccups.

Shifting gears, partitioning isn't the cake-walk that most people think it is. For example, are there any FKs that point at the PK of these historical data tables? There really shouldn't be but, if someone made that mistake, that means that you have UNIQUE indexes in the table and they will need to have the partitioning column (date in this case) added to them to keep them "aligned" (and I'll tell you, it's a huge mistake to not have them aligned) and that means they will no longer be truly unique keys anymore. That, of course, includes the PK, whatever it is (hopefully, it's an IDENTITY column that has been assigned no special meaning).

Then, there's the subject of whether or not you'd like to take advantage of reducing backups and backup time. Are you going to do that (I strongly recommend that you do)? If so, you'll need a File/FileGroup per month (for example) and that brings up the problem having too much free space per file. Reindexing of such F/FGs is certainly possible but, as you well know, rebuilding the index of a clustered index causes about 120% free space because the old index (and, therefor, the data) is kept in place until the new index is complete. The trick of doing that twice and then truncating the space appears to be unpredictable, as best. I've had to do a trick with moving the data out of the original F/FG, shrink the file to 0, and then move it back in using a Clustered Index Rebuild, which is the only way to avoid the freespace growth of rebuilding the CI on the target partition that I can see (if anyone knows a better way, please jump in here). This, too, can be done in an online fashion but the data for the F/FG won't be available during the CI Rebuild. Removal of such free space can be important to keep disk usage in check. For example, if you have a little over 4 years of data, you might have, say, 50 F/FG combinations. If each one has only an extra 100MB in it (and that amount is frequently much larger because of interim index rebuilds), that's 5GB that's been wasted. You don't want to simply set the growth on these things to 1MB, either. Disk level fragmentation could be horrible if you do. Since my largest month (phone system that keeps the recorded calls) is 7GB, I set each "current month" partition to 10GB to prevent fragmentation and to allow for unexpected growth. That would (currently) waste at least 3GB per month so such truncation of the file to its actual size has become rather important to me. :-)

Then, there's the advantage that once the data has been partitioned, the older F/FGs can be set to READ_ONLY to protect the files from further modification (auditors LOVE this feature) and to make Piece-Meal restores a fair bit easier. The problem is that the DB must be set to single user to set a FileGroup to read only and can then be set back to multi user. It doesn't take long for that to occur but it will throw everyone out of the data base with great predjudice once per month. There's a work around for this but it requires a little dancing once a month and having the historical data live in a separate database. Because of that little nuance, I actually have moved such historical data to a separate "Archive" database. Synonyms work a treat, here.

Also, since such historical data is infrequently accessed (or, at least, it should be and then only for reports), have you considered moving this data off the SAN and onto less expensive drives? That's what I'm in the process of doing at work now. We're moving the data from the very high performance SAN to a slower, much less expensive, iSCSI system.

Speaking of performance and resources, depending on the queries you play against the historical table and contrary to what many people advertise, you could actually get a performance decrease and an increase in I/O resource usage because each F/FG has its own B-Tree. Code that cannot be made sensitive to the partitioning date column will be a bit (sometimes a fair bit slower) because of this fact. You might be tempted to NOT use a separate file for each month but that could also make monthly processing quite a bit different than end of year processing or it will cause you to lose the extreme advantages of not having to backup a full year or quarter of data every night (or whenever your non-log backups run).

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
nadersam
nadersam
Mr or Mrs. 500
Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)Mr or Mrs. 500 (530 reputation)

Group: General Forum Members
Points: 530 Visits: 456
Hi Jeff,

Thanks for the detailed explanation , i will have to examine that thoroughly.
But based on that solution, doing the partitioning online directly on original table will cause a bad performance impact.

Thanks
Nader
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)SSC Guru (87K reputation)

Group: General Forum Members
Points: 87176 Visits: 41113
nadersam (12/16/2013)
Hi Jeff,

Thanks for the detailed explanation , i will have to examine that thoroughly.
But based on that solution, doing the partitioning online directly on original table will cause a bad performance impact.

Thanks
Nader


Correct.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search