How much faster is bulk insert?

  • I've read some articles stating that bulk insert is quicker than using transforms. Has anyone got some benchmark figures or seen any claims on how much faster it's supposed to be?

  • SQL Server Magazine did a comparison showdown a few years ago... the article can be found here:

    http://www.windowsitpro.com/SQLServer/Articles/ArticleID/19760/pg/2/2.html

    -Mike Gercevich

  • The bulk insert will undoubtedly be much faster than using transforms, but an important question is what kind of transforms?  If you are modifying data in transforms, then these operations will have to be done in a separate UPDATE after the bulk insert.  If you need transforms to clean or correct data (illegal dates, formatting issues, etc.), the bulk copy will probably fail.  If the DTS package is running on a client PC, the transformation workload is on the client instead of the server, although you would need some heavy-duty transformation logic for this to outweigh the speed advantage of a minimally-logged bulk insert.

    There are exceptions, but I would say use the bulk insert if it will get the job done.  Even if an UPDATE or INSERT/SELECT is needed to modify the data, this will probably run faster in T-SQL than in ActiveX transforms.  I can't say how much faster; there are too many variables.
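
    Something like this, just as a sketch of that staging-then-clean-up pattern (all table, column, and file names here are made up for illustration):

    -- Land the raw file in a staging table, then clean it up with set-based T-SQL
    CREATE TABLE dbo.StagingOrders (
        OrderId   VARCHAR(20),
        OrderDate VARCHAR(30),   -- kept as text so bad dates don't fail the load
        Amount    VARCHAR(30)
    );

    BULK INSERT dbo.StagingOrders
    FROM 'C:\imports\orders.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

    -- One set-based INSERT/SELECT replaces the row-by-row ActiveX transforms
    INSERT INTO dbo.Orders (OrderId, OrderDate, Amount)
    SELECT OrderId,
           CASE WHEN ISDATE(OrderDate) = 1 THEN CAST(OrderDate AS DATETIME) END,
           CAST(Amount AS DECIMAL(18, 2))
    FROM dbo.StagingOrders;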

  • Thank you for the replies. The transform is a simple one; all work on the data is done using updates after the import.  We are currently importing a table that takes 2h 40m and need evidence to support our theory that being provided with a CSV will speed this up.  The CSV would be created by those responsible for the Oracle database that we are importing from, and they are disputing that an extract, move and load would be quicker than Oracle-to-SQL DTS.  I was hoping that there were official figures to back up my theory, to add to the sample testing that I can do myself.

  • Bulk insert is the fastest way to load data into SQL Server, especially if it is minimally logged.  The data source can only be a text file.

    The fastest way to get the data out of Oracle is to dump it to a local disk.  The fastest way to transfer the data is to copy the file (even faster if you use FTP) rather than a direct connection that will have row-by-row overhead.  And the fastest way to load the data in SQL Server is bulk insert.  There are more steps involved, but anyone who claims this will not be faster than a 3-hour DTS transfer either doesn't know what they are talking about or is intentionally lying.

    They may not have space on the Oracle server to dump the table to, or they may not want to create a share you can read from for security reasons.  (An answer is to dump the file to a network share, or a share on the SQL Server).  If they had a valid reason for not wanting to use a CSV file I could understand, but if they simply say they don't think it will be faster then they are wrong.  It's easy enough to try it once to see what the difference is.

    I'm not an Oracle expert, but I've heard from Oracle users that it can be difficult to export data.  It sounds to me like they just want this to be your job (maintaining the DTS package) or have it perceived as a SQL Server performance issue, rather than expending any effort to do the job correctly.

  • They do the export OK for another table we use in another system.  I think it's just that they only have Oracle experience and not MS SQL or DTS.

  • I personally always use the transformation task since it's the only one that can go both ways.  As long as you specify a decent batch size, the speed difference between the transformation task and the bulk insert task is minimal in my experience.

  • I don't want this to sound like a flame, but if you haven't seen a speed difference between a DTS data pump task and a bulk insert then either you haven't tried to copy a large enough table or you're not meeting the criteria for minimally-logged bulk copy.  There are limitations on database recovery mode and destination table indexes and triggers that can prevent a bulk insert from running as fast as possible.
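
    To make the criteria concrete, here is a rough T-SQL checklist (database, table, and path names are placeholders, and this isn't the complete list of conditions -- check Books Online):

    -- Minimal logging needs the right recovery model and a table lock, and is
    -- defeated by things like triggers and certain indexes on the target table
    ALTER DATABASE ImportDb SET RECOVERY BULK_LOGGED;   -- or SIMPLE

    BULK INSERT dbo.TargetTable
    FROM 'C:\imports\data.csv'
    WITH (TABLOCK,                -- table lock is one of the requirements
          BATCHSIZE = 50000,
          FIELDTERMINATOR = ',',
          ROWTERMINATOR = '\n');

    ALTER DATABASE ImportDb SET RECOVERY FULL;   -- switch back, then take a log backup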

    I have to transfer data back and forth to a Teradata system, and the DTS data pump task is so slow it is unusable for all but the smallest datasets.  This is a combination of bad Teradata drivers (IMHO), and a slow VPN connection where the latency affects the row-by-row transfer handshaking.  Teradata has their own data transfer utilities that collect rows into large blocks for network transfer, use multiple connections in parallel, and use bulk copy API commands to get the data into SQL Server.  These utilities are a pain to set up and much more limited than DTS, but are many times faster.  This may be a worst-case scenario (at least the worst one I've had to deal with), but the speed difference is huge.  It's quite possible the Oracle-SQL connection works better than the Teradata-SQL connection, but I haven't used it myself.

    I also have to work with remote SQL Servers, also over a VPN but one with less latency.  I use DTS packages extensively to move data back and forth, but at some point in the 10MB-100MB range it becomes worthwhile to switch to a bcp out - zip - ftp - bcp in process.  If the file is big enough to notice the difference between FTP and Windows Explorer file copy, you should notice a speed improvement with bulk insert.
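
    As a sketch of that bcp out - zip - ftp - bcp in pipeline, driven from T-SQL (this assumes xp_cmdshell is enabled, and every server, database, and path name below is a placeholder; the zip and ftp steps are whatever tools you have at hand):

    -- Export in fast native format on the source side
    EXEC master..xp_cmdshell
        'bcp SourceDb.dbo.BigTable out C:\etl\BigTable.bcp -n -T -S SOURCESERVER';

    -- ...zip and ftp the file across, then import on the destination side,
    -- batched and with a table lock so the load can be minimally logged
    EXEC master..xp_cmdshell
        'bcp DestDb.dbo.BigTable in C:\etl\BigTable.bcp -n -T -S DESTSERVER -b 50000 -h "TABLOCK"';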

    There is also the issue of the performance impact on the Oracle server.  The 3-hour data transfer is using resources on their system the whole time.  A data export would finish very quickly, and the subsequent copy and import operations would be completely independent of the Oracle database.

  • Scott,

    You may not want to sound like a flame, but you do sound a little miffed.  One problem is that this issue is on a customer site: we have no way of creating an equivalent-size file on our development server, and we do not have the same network, so I haven't proved the difference exactly on this system myself, although I have seen that the approach is much better.  We can't set up a test until they agree what is to be tested, and currently they are reluctant to test this method.  My main issue is that, between us and a third-party Oracle house, any official quotes from recognised bodies would hold much more sway than anything we can say we've proved ourselves, and may help persuade them to do a test, hence proving to them that this is the way to go.

  • I didn't want my reply to Todd to sound testy.  I believe he was expressing an honest opinion when he said he had only seen minimal speed differences.  I have had other experiences and wanted to share them without sounding like I was trying to shout him down.  It's amazing how much you can learn here by having open discussions with others and listening to why they disagree with you.

    On the other hand, the people you're dealing with would get under my skin pretty quickly, at least from the way you've described the situation.  They're a customer, so you have to be professional and keep it to yourself, but I think I'd be way past "miffed" by now.  There are perfectly legitimate reasons not to use bulk insert, but those aren't the reasons they're giving you.

    • It could be a security issue.  They may be able to allow you a database connection, but don't want to open a file share on their system or have their system access a file share on yours.  Wherever the CSV file is dumped, the import process probably has to delete the file when finished, so both systems need write access to the file share.  (They export another file for you, so this doesn't seem to be a likely reason.)
    • It could be a disk issue.  Most DBAs would be aghast at the suggestion to put a file share on their data drives, and possibly their system doesn't have an extra drive set aside for ETL operations.  (This could be possible if this export would be much larger than the other one.)
    • The duration of the transfer is less important than the fact that the simple DTS transfer is easy to set up and run, vs the more complicated dump - copy - import process.

    There could be other good reasons, but I don't know enough about Oracle to speculate.  I find it hard to believe that dumping data to a csv file is some fiendishly difficult operation in Oracle that would bring the server to its knees.  Since they don't seem to have legitimate excuses, let's speculate on some alternatives.

    • A DTS transfer would be your responsibility, so they don't want to consider an alternative where they might have to do something.  This could be a blessing in disguise: it's under your control and they can't screw it up.
    • It could be turf wars.  They're a third party, they don't want to cooperate with your group or (if you're more paranoid) may be actively hostile to your project.  If your project succeeds, will it mean less money for them?
    • They could be extreme Oracle bigots who don't want to foul themselves by cooperating with minions of the Beast of Redmond.
    • They don't know how to do what you're asking and don't want to expose their ignorance.
    • They hate you personally because you're younger/smarter/more attractive/better paid/drive a nicer car/have a better personality.  Well, if they're getting Oracle DBA salaries then "better paid" probably doesn't belong on the list, but I'm willing to believe anything else on the list you want me to.
    • They like sitting in meetings discussing problems and being confrontational more than they like tinkering with databases and solving problems.

    Their approach seems to be: "We don't believe you.  We can't say why, but we can block your access to the resources to prove us wrong.  So there, Nyaah!"  This doesn't seem professional to me.

    If someone disagrees with my technical opinion but can't or won't bother to explain why, I have to assume ignorance, stupidity, and/or malice on their part.  Even if I am older, overweight, overpaid, and have a bad personality.

    PS: Maybe I'm too old to feel the need to insert smileys to indicate humor or sarcasm.  Feel free to mentally add them while reading.

  • Well, after reading all of this, I will try to follow the best techniques in the plan to replicate data from a DB2 database to a SQL Server 2012 database, and at the end of the project I will report back on which approaches were chosen and why.

    Best Regards, MauricioP.

  • mike gercevich (11/15/2005)


    SQL Server Magazine did a comparison showdown a few years ago... the article can be found here:

    http://www.windowsitpro.com/SQLServer/Articles/ArticleID/19760/pg/2/2.html

    -Mike Gercevich

    The article has moved here: http://sqlmag.com/t-sql/showdown-bcp-vs-dts
