Add millions of records

  • Hi all, I have to transfer around 50 million records from one table to another. What would be the best approach? I know it can be done batchwise in a WHILE loop, but is there any other way to do it faster?

    The database is in the simple recovery model.

  • Is it essential that the table be available for users during this bulk load operation, and is the table partitioned?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • sqlenthu 89358 (9/14/2016)


    Hi all, I have to transfer around 50 million records from one table to another. What would be the best approach? I know it can be done batchwise in a WHILE loop, but is there any other way to do it faster?

    The database is in the simple recovery model.

    1) If the tlog is big enough to handle this without growth, then the fastest way would be all-at-once, I think, assuming (potential) blocking isn't a problem.

    2) To be most efficient you need to get a minimally-logged INSERT. Read up on the requirements for that. If the target table is already populated, you may be unable to achieve it. Also, if there are not too many records in the target table, it could be best to drop all indexes, insert the data, then recreate the indexes (see the sketch below).

    3) Batching is VERY useful where you need to minimize/control blocking, server hit, tlog growth, etc.
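
    A rough sketch of the drop-the-indexes / insert / recreate pattern from point 2, using hypothetical table and index names (dbo.SourceTable, dbo.TargetTable, IX_TargetTable_Col1), so treat it as an outline rather than a ready-to-run script:

    -- Hypothetical names; adjust to your own schema.
    -- 1) Drop (or disable) the nonclustered indexes on the target.
    DROP INDEX IX_TargetTable_Col1 ON dbo.TargetTable;

    -- 2) Insert with a table lock so the load can qualify for minimal logging
    --    (SIMPLE or BULK_LOGGED recovery and the other prerequisites assumed).
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
           (Col1, Col2, Col3)
    SELECT  Col1, Col2, Col3
    FROM    dbo.SourceTable;

    -- 3) Recreate the index when the load is done.
    CREATE NONCLUSTERED INDEX IX_TargetTable_Col1 ON dbo.TargetTable (Col1);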

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Unless there are other variables in play with the destination table (e.g. indexes, blocking, partitioning, etc.), this shouldn't take that long either all at once or in batches under the simple recovery model. For example, on most of the systems I work with, copying a million records from one table to another typically takes seconds, not minutes. That doesn't hold for wide tables with a lot of bytes per row, or for destination tables with large indexes, heavy usage, and so forth.

  • Eric M Russell (9/14/2016)


    Is it essential that the table be available for users during this bulk load operation, and is the table partitioned?

    The table is not partitioned, and the system will be kept down while performing the process. There are nonclustered indexes on the target table, and the target table currently contains around 120 million records.

  • sqlenthu 89358 (9/14/2016)


    Eric M Russell (9/14/2016)


    Is it essential that the table be available for users during this bulk load operation, and is the table partitioned?

    The table is not partitioned, and the system will be kept down while performing the process. There are nonclustered indexes on the target table, and the target table currently contains around 120 million records.

    In that case, just take whatever steps are required to optimize the bulk import process. If the source is a table or query, then BCP out to a native format file, and then BCP into the target table (see the command-line sketch after the list below).

    Off the top of my head do the following:

    Use the SIMPLE or BULK_LOGGED recovery model.

    Disable triggers, non-clustered indexes, and foreign key constraints.

    Use table locking (TABLOCK).

    If there is a clustered index, then sort the source file or query to match.
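
    For illustration only, a pair of bcp commands along those lines, with placeholder server, database, table, and file names:

    rem Export the source table (use "queryout" instead of "out" for a query) in native format.
    bcp MyDatabase.dbo.SourceTable out C:\temp\SourceTable.dat -n -S MyServer -T

    rem Import into the target with a table lock hint and a batch size.
    bcp MyDatabase.dbo.TargetTable in C:\temp\SourceTable.dat -n -S MyServer -T -h "TABLOCK" -b 1000000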

    But also read up on the following:

    Prerequisites for Minimal Logging in Bulk Import

    https://technet.microsoft.com/en-us/library/ms190422(v=sql.105).aspx

    Optimizing Bulk Import Performance

    https://technet.microsoft.com/en-us/library/ms190421(v=sql.105).aspx

    Controlling the Sort Order When Bulk Importing Data

    https://technet.microsoft.com/en-us/library/ms177468(v=sql.105).aspx

    Also, any indexes left enabled during the bulk insert operation will probably end up significantly fragmented. You can ALTER INDEX ALL ON ... REBUILD afterward, but disabling and then re-enabling an index rebuilds it all the same, so you might as well keep the indexes disabled from the start for performance reasons and re-enable them afterward.
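
    A short sketch of that disable-then-rebuild approach, with a placeholder table name:

    -- Disable the nonclustered indexes before the load (placeholder table name).
    -- Note: ALTER INDEX ALL also disables a clustered index if one exists, which
    -- makes the table unreadable, so disable nonclustered indexes individually
    -- when the target is not a heap.
    ALTER INDEX ALL ON dbo.TargetTable DISABLE;

    -- ... bulk import here ...

    -- Rebuilding re-enables the indexes in one step after the load.
    ALTER INDEX ALL ON dbo.TargetTable REBUILD;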

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • sqlenthu 89358 (9/14/2016)


    Hi all, I have to transfer around 50 million records from one table to another. What would be the best approach? I know it can be done batchwise in a WHILE loop, but is there any other way to do it faster?

    The database is in the simple recovery model.

    You're in the SIMPLE Recovery Model. That's a good thing. The next question is, will you need the clustered index on that table? If so, building it during the data insert can be minimally logged, but you need to...

    1. Create the table.

    2. Add the clustered index

    3. Add a WITH (TABLOCK) to the table in the INSERT clause and...

    4. Add an ORDER BY that is the same as the clustered index.

    5. Run the code AFTER you have ensured all the above has been done correctly. Additionally, see the "undocumented knowledge" tip below before you run.

    Basically, this allows the clustered index to be built for free insofar as MDF and LDF space is concerned. I did this just last night during a demonstration on a 7 million row table where the rows were 1,040 bytes wide. Total run time was a little over 2 minutes.

    A bit of undocumented knowledge... if the query that does the INSERT/SELECT has ANY variables or uses TOP, you MUST use OPTION (RECOMPILE) in order for it to be minimally logged. And yeah... the demo was actually on a 2008 box.
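
    Put together, those steps look roughly like the following, with hypothetical table, column, and index names:

    -- 1) Create the target table (hypothetical schema).
    CREATE TABLE dbo.TargetTable (SomeID INT NOT NULL, SomeData VARCHAR(100));

    -- 2) Add the clustered index before loading.
    CREATE CLUSTERED INDEX CIX_TargetTable_SomeID ON dbo.TargetTable (SomeID);

    -- 3/4) INSERT with TABLOCK and an ORDER BY that matches the clustered index.
    --      OPTION (RECOMPILE) is included because the statement uses a variable.
    DECLARE @MaxID INT = 50000000;

    INSERT INTO dbo.TargetTable WITH (TABLOCK)
           (SomeID, SomeData)
    SELECT  SomeID, SomeData
    FROM    dbo.SourceTable
    WHERE   SomeID <= @MaxID
    ORDER BY SomeID
    OPTION (RECOMPILE);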

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/14/2016)


    sqlenthu 89358 (9/14/2016)


    Hi all, I have to transfer around 50 million records from one table to another. What would be the best approach? I know it can be done batchwise in a WHILE loop, but is there any other way to do it faster?

    The database is in the simple recovery model.

    You're in the SIMPLE Recovery Model. That's a good thing. The next question is, will you need the clustered index on that table? If so, building it during the data insert can be minimally logged, but you need to...

    1. Create the table.

    2. Add the clustered index

    3. Add a WITH (TABLOCK) to the table in the INSERT clause and...

    4. Add an ORDER BY that is the same as the clustered index.

    5. Run the code AFTER you have ensured all the above has been done correctly. Additionally, see the "undocumented knowledge" tip below before you run.

    Basically, this allows the clustered index to be built for free insofar as MDF and LDF space is concerned. I did this just last night during a demonstration on a 7 million row table where the rows were 1,040 bytes wide. Total run time was a little over 2 minutes.

    A bit of undocumented knowledge... if the query that does the INSERT/SELECT has ANY variables or uses TOP, you MUST use OPTION (RECOMPILE) in order for it to be minimally logged. And yeah... the demo was actually on a 2008 box.

    I don't have a clustered index on either the source or target table. There are nonclustered indexes on the target table which I will disable before loading. Is it a good idea to load the data in batches or in one go?

  • sqlenthu 89358 (9/15/2016)


    Jeff Moden (9/14/2016)


    sqlenthu 89358 (9/14/2016)


    Hi all, I have to transfer around 50 million records from one table to another. What would be the best approach? I know it can be done batchwise in a WHILE loop, but is there any other way to do it faster?

    The database is in the simple recovery model.

    You're in the SIMPLE Recovery Model. That's a good thing. The next question is, will you need the clustered index on that table? If so, building it during the data insert can be minimally logged, but you need to...

    1. Create the table.

    2. Add the clustered index

    3. Add a WITH (TABLOCK) to the table in the INSERT clause and...

    4. Add an ORDER BY that is the same as the clustered index.

    5. Run the code AFTER you have ensured all the above has been done correctly. Additionally, see the "undocumented knowledge" tip below before you run.

    Basically, this allows the clustered index to be built for free insofar as MDF and LDF space is concerned. I did this just last night during a demonstration on a 7 million row table where the rows were 1,040 bytes wide. Total run time was a little over 2 minutes.

    A bit of undocumented knowledge... if the query that does the INSERT/SELECT has ANY variables or uses TOP, you MUST use OPTION (RECOMPILE) in order for it to be minimally logged. And yeah... the demo was actually on a 2008 box.

    I don't have a clustered index on either the source or target table. There are nonclustered indexes on the target table which I will disable before loading. Is it a good idea to load the data in batches or in one go?

    Heh... hard for me to imagine why anyone would have a 50 million row HEAP, but that actually works out for you in this case.

    If you do INSERT/SELECT into a HEAP using WITH (TABLOCK) on the INSERT clause (and OPTION(RECOMPILE) if there are any variables in the statement), you can do multiple inserts into the HEAP and they'll all be minimally logged. The advantages of doing it in batches are...

    1) You have a natural method of monitoring progress instead of a single "Hail Mary" copy of 50 Million rows with no indication of progress.

    2) If something goes haywire along the way, you will still have what you've done up 'til the batch that failed.

    So, same steps that I outlined before except you won't need to create the clustered index and you won't need to do an ORDER BY.
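
    A minimal sketch of that batched approach into the HEAP, again with made-up table and column names and a simple key-range batching scheme:

    -- Each batch goes into the heap with TABLOCK, so each batch can qualify
    -- for minimal logging, and progress is visible batch by batch.
    DECLARE @BatchSize INT = 1000000,
            @CurrentID INT,
            @MaxID     INT;

    SELECT @CurrentID = MIN(SomeID), @MaxID = MAX(SomeID) FROM dbo.SourceTable;

    WHILE @CurrentID <= @MaxID
    BEGIN
        INSERT INTO dbo.TargetTable WITH (TABLOCK)
               (SomeID, SomeData)
        SELECT  SomeID, SomeData
        FROM    dbo.SourceTable
        WHERE   SomeID >= @CurrentID
          AND   SomeID <  @CurrentID + @BatchSize
        OPTION (RECOMPILE); -- needed for minimal logging because of the variables

        RAISERROR('Finished batch starting at ID %d', 0, 1, @CurrentID) WITH NOWAIT;

        SET @CurrentID = @CurrentID + @BatchSize;
    END;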

    With all that in mind, it may be a whole lot simpler to use BCP to export the data using the "NATIVE" format and then import the data using either BCP or BULK INSERT (with the TABLOCK option) because both of those have a Batch Size option already built in. You, basically, would need to write one line of code to do the export and one line of code to do the import. BCP and BULK INSERT will auto-magically do the rest for you.
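
    If you go the BCP/BULK INSERT route instead, the built-in batching might look something like this, with a placeholder file path and table name (the file being a native-format export produced by bcp):

    -- Import a native-format bcp file a million rows per batch,
    -- with a table lock on the heap target.
    BULK INSERT dbo.TargetTable
    FROM 'C:\temp\SourceTable.dat'
    WITH (DATAFILETYPE = 'native',
          BATCHSIZE    = 1000000,
          TABLOCK);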

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
