Partitioning large tables ONLINE

  • We have a very big table we are trying to get partitioned on all of our servers.

    All of the testing seems to indicate that this will take about 12 hours, which is longer than the maximum outage window (6 hours) we can schedule.

    So now we are investigating doing the partitioning 'ONLINE' during a low customer activity period.

    Anyone done this before? Any tips or comments on what works best? Thanks

  • I've done this several times before. You need capacity to store two copies of the table.

    - Create a partitioned table of the same structure called table_ptn.

    - Add a trigger to maintain the new table.

    - Backfill the data into table_ptn.
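
    The steps above might look something like this in T-SQL. The table, column, and boundary names (dbo.BigTable, EventDate, monthly ranges) are illustrative assumptions, not details from the thread:

    ```sql
    -- Sketch only; object names and boundary values are hypothetical.
    CREATE PARTITION FUNCTION pf_ByMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-02-01', '2012-03-01');
    GO
    CREATE PARTITION SCHEME ps_ByMonth
    AS PARTITION pf_ByMonth ALL TO ([PRIMARY]);
    GO
    -- Same structure as the source table, but created on the partition scheme.
    -- The partitioning column must be part of the unique clustered key.
    CREATE TABLE dbo.BigTable_ptn (
        Id        int          NOT NULL,
        EventDate datetime     NOT NULL,
        Payload   varchar(100) NULL,
        CONSTRAINT PK_BigTable_ptn PRIMARY KEY CLUSTERED (Id, EventDate)
            ON ps_ByMonth (EventDate)
    );
    GO
    -- Backfill one range at a time to keep log growth manageable.
    INSERT INTO dbo.BigTable_ptn (Id, EventDate, Payload)
    SELECT Id, EventDate, Payload
    FROM dbo.BigTable
    WHERE EventDate < '2012-02-01';  -- repeat per boundary range
    ```

    Backfilling in range-sized batches keeps each transaction small; run the boundaries oldest-first so the trigger only has to cover the hot, recent rows.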

    Then

    exec sp_rename 'table', 'table_old'

    GO

    CREATE VIEW [table]

    AS SELECT * FROM table_ptn

    GO

    or

    exec sp_rename 'table', 'table_old'

    GO

    exec sp_rename 'table_ptn', 'table'

    GO

    The actual switch takes only milliseconds, as it's a metadata change.

  • We discussed that as an option. Create a shadow partition table and gradually crawl through the original table to partition the data over time. Then we will just rename tables when we are done.

    But the original table can also be updated during the process if users change their data. So we would have to track those updates as well, perhaps with a trigger writing to a 3rd table, then sync it all at the end. Does that make sense? Thanks

  • Why?

    Use the trigger on the source table to insert/update/delete in the "shadow" table.

    Any additional tables and logic just add work and complicate a simple setup.
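
    A minimal sketch of such a trigger, assuming a hypothetical source table dbo.BigTable keyed on Id and a shadow table dbo.BigTable_ptn. The delete-then-insert pattern means an update works whether or not the backfill has reached that row yet:

    ```sql
    -- Sketch; names are hypothetical. One trigger keeps the shadow in sync.
    CREATE TRIGGER trg_BigTable_sync ON dbo.BigTable
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Handle deletes (and the delete half of updates).
        DELETE p
        FROM dbo.BigTable_ptn AS p
        JOIN deleted AS d ON d.Id = p.Id;

        -- Handle inserts (and the insert half of updates). The row is
        -- re-copied from the live data, so it does not matter whether
        -- the backfill had already migrated it.
        INSERT INTO dbo.BigTable_ptn (Id, EventDate, Payload)
        SELECT Id, EventDate, Payload
        FROM inserted;
    END;
    ```

    With this in place, the backfill query should use NOT EXISTS against the shadow table so it doesn't duplicate rows the trigger has already copied.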

  • I should have asked already. What are your reasons for partitioning this table?

  • Reason = query performance on the table with partition elimination. Also index maintenance going forward.

    I understand what you are saying with the update trigger directly to the new partition table.

    The problem I see with that is if an update to the original table happens, but that row is not yet on the partitioned version of the table, there is nothing for the trigger to update. That was the thinking with having a 3rd table track those changes, then have a brief outage to sync the updates and swap the table names.

    Thanks

  • Any reasons why you shouldn't let SQL Server do that for you?

    If you have Enterprise Edition, you can rebuild the clustered index ONLINE on the partition scheme.

    Watch your tempdb/user db space usage (depending on whether you have SORT_IN_TEMPDB on) and log growth.
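
    A sketch of that ONLINE move, assuming a hypothetical table dbo.BigTable and an existing partition scheme ps_ByMonth; DROP_EXISTING rebuilds the clustered index directly onto the scheme in one operation:

    ```sql
    -- Sketch; assumes ps_ByMonth already exists and the clustered key
    -- includes the partitioning column (EventDate). ONLINE index
    -- operations require Enterprise Edition.
    CREATE UNIQUE CLUSTERED INDEX PK_BigTable
    ON dbo.BigTable (Id, EventDate)
    WITH (DROP_EXISTING = ON, ONLINE = ON, SORT_IN_TEMPDB = ON)
    ON ps_ByMonth (EventDate);
    ```

    Nonclustered indexes then need their own rebuilds onto the scheme if you want them aligned for partition switching later.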

    -- Gianluca Sartori

  • We are testing the ONLINE partitioning process this week. But if we do it that way, it is definitely a performance hit. So we are trying to figure out how big that performance hit will be. If the performance hit is such that for 12-18 hours many things will timeout, then that is not going to work for us.

    If that is the case, then we might have to do the partitioning in a 'shadow' table and just gradually build it on the side. That would lead to a 3rd table to track changes (updates) that happen in the source table while the new partitioned 'shadow' table is being constructed. Then at the end have a brief outage to sync the two tables, then rename tables to swap them.

    Any other ideas? Thanks for any suggestions.

    Just some background, yes it is Enterprise Edition 2008.

    256 GB RAM, plenty of horsepower with the latest Netapp SAN and Cisco UCS servers.

  • JamesMorrison (9/17/2012)


    I understand what you are saying with the update trigger directly to the new partition table.

    The problem I see with that is if an update to the original table happens, but that row is not yet on the partitioned version of the table, there is nothing for the trigger to update. That was the thinking with having a 3rd table track those changes, then have a brief outage to sync the updates and swap the table names.

    Assuming you are still migrating the data into the shadow table, it doesn't matter whether the row is there yet on an update. You still update the live row: if the trigger's update finds nothing in the shadow table, the backfill simply copies the already-updated live row later.

  • JamesMorrison (9/17/2012)


    We are testing the ONLINE partitioning process this week. But if we do it that way, it is definitely a performance hit. So we are trying to figure out how big that performance hit will be. If the performance hit is such that for 12-18 hours many things will timeout, then that is not going to work for us.

    With the hardware you mention, the performance hit will be minimal provided your partitioned table has the correct primary key. I've done this on 400 GB+ tables with many billions of rows and database TPS over 10,000.
