which way is the better for 1 million records inserting into one table

Question

which way is the better for 1 million records inserting into one table

cooljagadeesh

SSC Eights!

Points: 959
More actions
March 27, 2014 at 9:10 pm

#291550

which way is the better for 1 million records inserting into one table
1) insert into destination
select * from source
2) BULK INSERT destination
FROM '\\computer\source.txt';
3) BCP

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

MyDoggieJessie SSC-Forever Points: 44484 More actions · Answer 1

If the source data isn't too "wide", BCP would work quite well, but I'd recommend using SSIS so you can take more advantage of simultaneous/parallel threads

Take a quick look through these great articles:

Optimizing Bulk Import Performance

http://msdn.microsoft.com/en-us/library/ms190421(v=sql.105).aspx

The Data Loading Performance Guide

http://technet.microsoft.com/en-us/library/dd425070(SQL.100).aspx

We Loaded 1TB in 30 Minutes with SSIS, and So Can You

http://msdn.microsoft.com/en-us/library/dd537533(v=sql.100).aspx

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 2

That article on loading 1TB in 30 minutes seems to be impressive testament for SSIS but they had 4 instances of SSIS running 56 streams to a 64 processor database using 56 different tables (later, switched into a paritioned table) on a full hardware package (including a killer disk system with 165 spindles and 8GB fiber channels) that would have made the makers of Cray green with envy. From what I can see, SSIS didn't have anything to do with the performance. If they had executed 56 parallel streams using BULK INSERT, they likely would have gotten even better performance and they wouldn't have needed the extra 4 servers that SSIS lived on. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MyDoggieJessie SSC-Forever Points: 44484 More actions · Answer 3

Fair point 🙂

For ease of use, I'd go with BCP. Which would you recommend?

______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Eirikur Eiriksson SSC Guru Points: 182879 More actions · Answer 4

cooljagadeesh (3/27/2014)
which way is the better for 1 million records inserting into one table
1) insert into destination
select * from source
2) BULK INSERT destination
FROM '\\computer\source.txt';
3) BCP

Quick questions; is this a one off thing? Does it need to be managed, logged, reported? Are all files in the same format? Is speed more important than server load? :w00t: Or is there a file in the first place? Is it maybe a table to table load? 😎

The options you list imply both table and file source.

Jeff Moden SSC Guru Points: 1003851 More actions · Answer 5

MyDoggieJessie (3/28/2014)
Fair point 🙂
For ease of use, I'd go with BCP. Which would you recommend?

If there are no "double-hop" or server trust problems for BULK INSERT, I'd go with BULK INSERT over BCP. That way, it can all be done from T-SQL without someone getting bent out of shape about using xp_CmdShell or having to do things from a command prompt or etc, etc. In the absence of xp_CmdShell, it's also a whole lot easier to make dynamic insofar as source and destination than using BCP. With that in mind, you can actually have a proc that does like they did in the 1TB/30Minute example insofar as parallel loads go. You can have a stored proc that creates multiple jobs and starts them all. Each job would poke a table with an "I'm done" marker that the main proc could check for completion. The jobs could be self-dropping or you could modify them from the main proc. Of course, such parallel loads don't do much for performance if your destination table/partitions are all on the same disk(s) regardless of how many files/filegroups you have. At best, I've only seen a nearly insignificant gain in load performance that way (same set of spindles) and I've also seen MUCH worse performance because the poor ol' read/write heads spend so much time thrashing on the same spindles.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)