Fast Data Loading with Partition Switching Strategy

  • I’m looking for clearity on partition switching. The idea is to use many BULK INSERT statements into table dbo.X_n in parallel and when BULK INSERT for table dbo.X_n is completed, switch dbo.X_n into dbo.bigdaddy. I think this is the fastest way to upload a couple hundred GB of data.

    In learning about partition switching (in part) from The Data Loading Performance Guide under Partition SWITCH, I hear the instructions to say copy the main table exactly to become a target. But in that same step (#1), I read that we need to change the default file group of the target (dbo.X_n) from the default file group. Then it says I need to match indexes and lists the filegroup as something we need to match with the main table.

    As an overview of the partition switching strategy, I think the whole point of BULK INSERT with partitioning is to have seperate files (in same group) to enable concurrent uploading where each table has its own file. Once the upload is completed to a table (dbo.X_n) then we do the partition switch into the main table (dbo.bigdaddy). The data we just uploaded doesn’t actually move, just the metadata for it.

    When I read the instructions linked above, I hear

    “Don’t have the same filegroup on your target as the main table. You must have the same filegroup on your target as the main table.”

    I’m confused. Where am I disconnected?

  • You've got it right that partition switching doesn't physically move the partitioned data. It simply changes pointers that define which table the partitioned data is owned by. But for this to happen, you have to use the same partition scheme and partition function for the table(s) X_n and for table BigDaddy. The partition scheme points to the filegroups that will be used.

    What you are proposing only makes sense if each X_n table being loaded has a logical separation from all the other X_n tables that is identical to that used for partitioning. For example an individual year, or an individual state or province, or an individual client. The partition scheme and function will have to use the same column that logically separates each X_N table.

    Also, partition switching is all or nothing. If BigDaddy already has a partition with data for 2012, for example, you would have to swap the existing 2012 partition out before swapping X_2012 in. If you want to preserve the existing data, you're back to traditional merging.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply