Table patitioning / partition switching

  • Hi,

    I'm fairly certain that I know the answer to this question, but would like some validation nonetheless. A large table, with roughly 7 billion rows and 360GB of storage, receives daily updates via a MERGE statement. Some rows are updated, some are inserted, but none are currently deleted. In my mind, such a table is NOT a candidate for partition switching. Do you concur?

    Thanks,

    Bennett

  • If you can't guarantee that the data to be merged is focused at the most recent data, you have to consider the following example:

    Data2010

    Data2011

    Data2012

    Data2013

    What happens in your system if you now try to merge more 2010 data (for whatever reason)?

    New rows would appear of course.

    Keep in mind: these new rows would appear regardless of if the table is partitioned.

    Partitioning might put them into the 2011 dataset (depending on how you set up your partition functions)

    But with sizable data like you mention (360GB) - you will need a sufficient maintenance window to process logged deletes. If you don't have that window, partitioning will be your only way to "keep up" with rolling purging.

    In the example above, it's really not that bad to "add" more 2010 data. It may hang around for a year, but it will eventually purge.

    The merge issue you give would not stop me from partitioning this table if otherwise partitioning was a good fit.

    Partitioning does add complexity to managing the system. But it's hard to beat for rolling purges of data if the table qualifies to be partitioned.

    Are there other concerns?

  • Hi Jim,

    Thank you for your reply. I have a fairly good understanding the process of switching in and switching out. I agree that partitioning would be useful for switching out the old data. What I cannot see is any way to switch in new data since we're bringing it in via a MERGE statement rather than an INSERT statement - or am I overlooking something?

    Thanks,

    Bennett

  • Hey Bennett,

    For switching-in partitions - you're dead on.

    As soon as you're doing "Updates" (via the MERGE) partition switching won't help.

    I you have a way to split the "new data" into two sets there may be value in...

    Split 1: just yesterday's data enters the table via partition switching-in.

    Split 2: data older than yesterday enters the table via MERGE (hopefully just a handful of rows)

    But just doing the MERGE is a lot less complex. The added complexity of the above logic and adding partitioning would be weighted against the speed value provided. It may not be worth it.

    -Jim

  • SPAM reported.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply