which way is the better for 1 million records inserting into one table

  • which way is the better for 1 million records inserting into one table

    1) insert into destination

    select * from source

    2) BULK INSERT destination

    FROM '\\computer\source.txt';

    3) BCP

  • If the source data isn't too "wide", BCP would work quite well, but I'd recommend using SSIS so you can take more advantage of simultaneous/parallel threads

    Take a quick look through these great articles:

    Optimizing Bulk Import Performance

    http://msdn.microsoft.com/en-us/library/ms190421(v=sql.105).aspx

    The Data Loading Performance Guide

    http://technet.microsoft.com/en-us/library/dd425070(SQL.100).aspx

    We Loaded 1TB in 30 Minutes with SSIS, and So Can You

    http://msdn.microsoft.com/en-us/library/dd537533(v=sql.100).aspx

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • That article on loading 1TB in 30 minutes seems to be impressive testament for SSIS but they had 4 instances of SSIS running 56 streams to a 64 processor database using 56 different tables (later, switched into a paritioned table) on a full hardware package (including a killer disk system with 165 spindles and 8GB fiber channels) that would have made the makers of Cray green with envy. From what I can see, SSIS didn't have anything to do with the performance. If they had executed 56 parallel streams using BULK INSERT, they likely would have gotten even better performance and they wouldn't have needed the extra 4 servers that SSIS lived on. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Fair point 🙂

    For ease of use, I'd go with BCP. Which would you recommend?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • cooljagadeesh (3/27/2014)


    which way is the better for 1 million records inserting into one table

    1) insert into destination

    select * from source

    2) BULK INSERT destination

    FROM '\\computer\source.txt';

    3) BCP

    Quick questions; is this a one off thing? Does it need to be managed, logged, reported? Are all files in the same format? Is speed more important than server load? :w00t: Or is there a file in the first place? Is it maybe a table to table load? 😎

    The options you list imply both table and file source.

  • MyDoggieJessie (3/28/2014)


    Fair point 🙂

    For ease of use, I'd go with BCP. Which would you recommend?

    If there are no "double-hop" or server trust problems for BULK INSERT, I'd go with BULK INSERT over BCP. That way, it can all be done from T-SQL without someone getting bent out of shape about using xp_CmdShell or having to do things from a command prompt or etc, etc. In the absence of xp_CmdShell, it's also a whole lot easier to make dynamic insofar as source and destination than using BCP. With that in mind, you can actually have a proc that does like they did in the 1TB/30Minute example insofar as parallel loads go. You can have a stored proc that creates multiple jobs and starts them all. Each job would poke a table with an "I'm done" marker that the main proc could check for completion. The jobs could be self-dropping or you could modify them from the main proc. Of course, such parallel loads don't do much for performance if your destination table/partitions are all on the same disk(s) regardless of how many files/filegroups you have. At best, I've only seen a nearly insignificant gain in load performance that way (same set of spindles) and I've also seen MUCH worse performance because the poor ol' read/write heads spend so much time thrashing on the same spindles.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply