• sqlenthu 89358 (9/15/2016)


    Jeff Moden (9/14/2016)


    sqlenthu 89358 (9/14/2016)


    Hi all, I have to transfer around 50 million records from one table to another. What would be the best approach? I know it can be done batch-wise in a WHILE loop, but is there any other way to do it faster?

    The database is in simple recovery model.

    You're in the SIMPLE Recovery Model. That's a good thing. The next question is, will you need the Clustered Index on that table? If so, building it during the data insert can be minimally logged, but you need to...

    1. Create the table.

    2. Add the clustered index

    3. Add a WITH (TABLOCK) to the table in the INSERT clause and...

    4. Add an ORDER BY that is the same as the clustered index.

    5. Run the code AFTER you have ensured all the above has been done correctly. Additionally, see the "undocumented knowledge" tip below before you run.

    Basically, this allows the clustered index to be built for free insofar as MDF and LDF space is concerned. I did this just last night during a demonstration of a 7 Million row table where the rows were 1,040 bytes wide. Total run time was a little over 2 minutes.

    A bit of undocumented knowledge... if the query that does the INSERT/SELECT has ANY variables or uses TOP, you MUST use OPTION(RECOMPILE) in order for it to minimally log. And yeah... the demo was actually on a 2008 box.
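
    To put all of that together, here's a minimal sketch of the pattern. The table, column, and index names are just placeholders for illustration, not your actual objects:

    -- 1) Create the target table (structure shown only as an example).
    CREATE TABLE dbo.TargetTable
    (
        SomeID    INT          NOT NULL,
        SomeValue VARCHAR(100) NOT NULL
    );

    -- 2) Add the clustered index BEFORE the load.
    CREATE CLUSTERED INDEX CIX_TargetTable_SomeID ON dbo.TargetTable (SomeID);

    -- 3) & 4) INSERT with TABLOCK and an ORDER BY that matches the clustered index.
    --    OPTION (RECOMPILE) only matters if the statement uses variables or TOP.
    INSERT INTO dbo.TargetTable WITH (TABLOCK)
            (SomeID, SomeValue)
    SELECT  SomeID, SomeValue
      FROM  dbo.SourceTable
     ORDER  BY SomeID
    OPTION  (RECOMPILE);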

    Don't have a clustered index on either the source or target table. There are non-clustered indexes on the target table which I will disable before loading. Is it a good idea to load the data in batches or in one go?

    Heh... hard for me to imagine why anyone would have a 50 million row HEAP but that actually works out for you, in this case.

    If you do INSERT/SELECT into a HEAP using WITH (TABLOCK) on the INSERT clause (and OPTION(RECOMPILE) if there are any variables in the statement), you can do multiple inserts into the HEAP and they'll all be minimally logged. The advantages of doing it in batches are...

    1) You have a natural method of monitoring progress instead of a single "Hail Mary" copy of 50 Million rows with no indication of progress.

    2) If something goes haywire along the way, you will still have what you've done up 'til the batch that failed.

    So, same steps that I outlined before except you won't need to create the clustered index and you won't need to do an ORDER BY.
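
    For example, a batched version might look something like the following sketch. The key column, batch size, and table names are assumptions for illustration; it also assumes the source has a reasonably dense integer key to batch on:

    -- Batched, minimally logged inserts into a HEAP.
    DECLARE @BatchSize INT = 1000000,
            @MinID     INT,
            @MaxID     INT;

    SELECT @MinID = MIN(SomeID), @MaxID = MAX(SomeID)
      FROM dbo.SourceTable;

    WHILE @MinID <= @MaxID
    BEGIN
        INSERT INTO dbo.TargetTable WITH (TABLOCK)
                (SomeID, SomeValue)
        SELECT  SomeID, SomeValue
          FROM  dbo.SourceTable
         WHERE  SomeID >= @MinID
           AND  SomeID <  @MinID + @BatchSize
        OPTION  (RECOMPILE); -- needed for minimal logging because variables are present

        -- Progress indicator for each batch.
        RAISERROR('Finished batch starting at ID %d.', 0, 1, @MinID) WITH NOWAIT;

        SET @MinID = @MinID + @BatchSize;
    END;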

    With all that in mind, it may be a whole lot simpler to use BCP to export the data using the "NATIVE" format and then import the data using either BCP or BULK INSERT (with the TABLOCK option) because both of those have a Batch Size option already built in. You, basically, would need to write one line of code to do the export and one line of code to do the import. BCP and BULK INSERT will auto-magically do the rest for you.
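
    Roughly, and with the server name, file path, and batch size here being assumptions for illustration only, those two lines look something like this:

    -- Export in native format from a command prompt (one line):
    -- bcp SourceDB.dbo.SourceTable out C:\Temp\SourceTable.dat -n -T -S YourServerName

    -- Import with BULK INSERT, using TABLOCK and a built-in batch size:
    BULK INSERT dbo.TargetTable
    FROM 'C:\Temp\SourceTable.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 1000000);

    -- Or import with BCP instead (also one line):
    -- bcp TargetDB.dbo.TargetTable in C:\Temp\SourceTable.dat -n -T -S YourServerName -b 1000000 -h "TABLOCK"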

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)