Moving Records in Batches - 7 Tables - Several Million Records

  • Hello All,

    I know there are multiple ways to move records between tables, but I'm trying to come up with a good way to move millions of records from one set of tables into another. The reasoning for this move is pretty complicated, but it's a necessity. Here are the basics:

    - 7 tables; let's call them temp1 through temp7

    - Each table has 1 million+ records that need to be migrated to 7 tables in a different database; let's call those arc_temp1 through arc_temp7

    - I would prefer to run this overnight in batches, say 1k or 10k records at a time

    - I would like the batches to go in order and then loop, i.e. move 10k from temp1, then 10k from temp2, then 10k from temp3, and so on, and then loop back to temp1 and start again

    - Although I need to move millions of records, this can be done over several nights, and I would like to run it for only about 4 hours at a time

    I've done ROWCOUNT with inserts before, but I don't know if that's the best way to go about it. I've also used cursors, but with millions of rows I think that would be a not-so-efficient approach. I've never used SSIS for a move job like this, but I could probably figure it out if you guys think it would be the best way.

    Also, you don't have to actually write any code; just point me in the right direction and I'll be able to figure it out.

    Thanks for the help

  • Are you using a particularly slow system?

    Millions of rows shouldn't be a major problem unless those rows are massively wide...for instance, on my PC I can shift 1000000 rows in 600 milliseconds.

    Anyway, the question at hand is how...well it all depends on what unique indexes you have on source and target, but if I was batching in the manner you suggested, I might look at using a while loop that tested for either duration of the run or an explicit end time, as well as whether there was any more data to process.

    Inside that loop, I would move X number of rows by DELETING from the source table with an OUTPUT INTO my target table, with either an inline TOP or a global SET ROWCOUNT to restrict the rows.

    I would do that for each table once per loop, with an explicit committed transaction for each table.

    Once you complete a loop without moving any more rows or the timeout has expired, the loop exits and the batch completes.
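
    To illustrate, here is a minimal sketch of that loop for one of the seven tables. The object names, batch size, and 4-hour window are placeholders, and note that a table used as an OUTPUT ... INTO target cannot have enabled triggers or participate in a foreign key.

    -- Batched archive loop: runs until a full pass moves no rows
    -- or the end time passes. All names here are placeholders.
    DECLARE @EndTime datetime = DATEADD(HOUR, 4, GETDATE());
    DECLARE @Moved   int      = 1;

    WHILE @Moved > 0 AND GETDATE() < @EndTime
    BEGIN
        SET @Moved = 0;

        -- One batch from temp1; repeat this block for temp2 through temp7
        -- so each pass round-robins across all seven tables.
        BEGIN TRANSACTION;

        DELETE TOP (10000)
        FROM dbo.temp1
        OUTPUT DELETED.col1, DELETED.col2   -- list every column explicitly
        INTO ArchiveDB.dbo.arc_temp1 (col1, col2);

        SET @Moved = @Moved + @@ROWCOUNT;

        COMMIT TRANSACTION;
    END;

    The inline TOP is the recommended replacement for SET ROWCOUNT when limiting DML, and committing per table keeps each transaction (and the log it generates) small.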

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • mister.magoo (8/1/2013)


    Are you using a particularly slow system?

    Millions of rows shouldn't be a major problem unless those rows are massively wide...for instance, on my PC I can shift 1000000 rows in 600 milliseconds.

    Anyway, the question at hand is how...well it all depends on what unique indexes you have on source and target, but if I was batching in the manner you suggested, I might look at using a while loop that tested for either duration of the run or an explicit end time, as well as whether there was any more data to process.

    Inside that loop, I would move X number of rows by DELETING from the source table with an OUTPUT INTO my target table, with either an inline TOP or a global SET ROWCOUNT to restrict the rows.

    I would do that for each table once per loop, with an explicit committed transaction for each table.

    Once you complete a loop without moving any more rows or the timeout has expired, the loop exits and the batch completes.

    I've done batch jobs like this in the past. Our system isn't too slow, but a few of the tables have over 100 million rows.

  • JoshDBGuy (8/1/2013)



    I've done batch jobs like this in the past. Our system isn't too slow, but a few of the tables have over 100 million rows.

    After you copy the rows to the archive database, do you need to delete the rows from the original database?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, I'm using temporary tables to grab and record the primary keys, and I reference those tables for the deletes and inserts. It seems to be working OK; I'm able to do about 1 million rows every couple of minutes.
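
    For reference, a stripped-down sketch of that key-capture pattern for one table pair; it assumes a single-column primary key named id, and all object names are placeholders rather than the real ones.

    -- Record one batch of keys, copy the matching rows to the archive
    -- table, then delete them from the source, all in one transaction.
    BEGIN TRANSACTION;

    SELECT TOP (10000) id
    INTO #batch_keys
    FROM dbo.temp1
    ORDER BY id;

    INSERT INTO ArchiveDB.dbo.arc_temp1 (id, col1, col2)
    SELECT s.id, s.col1, s.col2
    FROM dbo.temp1 AS s
    JOIN #batch_keys AS k ON k.id = s.id;

    DELETE s
    FROM dbo.temp1 AS s
    JOIN #batch_keys AS k ON k.id = s.id;

    COMMIT TRANSACTION;

    DROP TABLE #batch_keys;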
