I've read some articles stating that bulk insert is quicker than using transforms. Has anyone got some benchmark figures or seen any claims on how much faster it's supposed to be?
SQL Server Magazine did a comparison showdown a few years ago... the article can be found here:
The bulk insert will undoubtedly be much faster than using transforms, but an important question is what kind of transforms? If you are modifying data in transforms, then these operations will have to be done in a separate UPDATE after the bulk insert. If you need transforms to clean or correct data (illegal dates, formatting issues, etc.), the bulk copy will probably fail. If the DTS package is running on a client PC, the transformation workload is on the client instead of the server, although you would need some heavy-duty transformation logic for this to outweigh the speed advantage of a minimally-logged bulk insert.
There are exceptions, but I would say use the bulk insert if it will get the job done. Even if an UPDATE or INSERT/SELECT is needed to modify the data, this will probably run faster in T-SQL than in ActiveX transforms. I can't say how much faster; there are too many variables.
Bulk insert is the fastest way to load data into SQL Server, especially if it is minimally logged. The data source can only be a text file.
The fastest way to get the data out of Oracle is to dump it to a local disk. The fastest way to transfer the data is to copy the file (even faster if you use FTP) rather than a direct connection that will have row-by-row overhead. And the fastest way to load the data in SQL Server is bulk insert. There are more steps involved, but anyone who claims this will not be faster than a 3-hour DTS transfer either doesn't know what they are talking about or is intentionally lying.
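As a rough sketch of the SQL Server end of that pipeline (the file name, table name, and delimiters here are hypothetical — substitute whatever the Oracle export actually produces):

```sql
-- Run after the exported file has been copied/FTP'd to the SQL Server box.
-- Assumes a hypothetical file C:\load\orders.csv and a staging table whose
-- columns match the file layout.
BULK INSERT dbo.OrdersStaging
FROM 'C:\load\orders.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK          -- table lock is one of the conditions for minimal logging
);
```

Any cleanup or correction transforms would then run as set-based T-SQL against the staging table, rather than row-by-row in the data pump.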
They may not have space on the Oracle server to dump the table to, or they may not want to create a share you can read from for security reasons. (An answer is to dump the file to a network share, or a share on the SQL Server). If they had a valid reason for not wanting to use a CSV file I could understand, but if they simply say they don't think it will be faster then they are wrong. It's easy enough to try it once to see what the difference is.
I'm not an Oracle expert, but I've heard from Oracle users that it can be difficult to export data. It sounds to me like they just want this to be your job (maintaining the DTS package) or have it perceived as a SQL Server performance issue, rather than expending any effort to do the job correctly.
I don't want this to sound like a flame, but if you haven't seen a speed difference between a DTS data pump task and a bulk insert then either you haven't tried to copy a large enough table or you're not meeting the criteria for minimally-logged bulk copy. There are limitations on database recovery mode and destination table indexes and triggers that can prevent a bulk insert from running as fast as possible.
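For reference, a quick checklist you can run against the destination before blaming the bulk insert itself (database and table names here are placeholders):

```sql
-- 1. Recovery model should be SIMPLE or BULK_LOGGED for minimal logging:
SELECT DATABASEPROPERTYEX('MyDatabase', 'Recovery');

-- 2. The target table should have no triggers and should not be replicated.
--    For best speed it should be empty or index-free; otherwise consider
--    dropping the indexes and rebuilding them after the load.

-- 3. Take a table lock on the load itself:
BULK INSERT dbo.Staging
FROM 'C:\load\data.txt'
WITH (TABLOCK);
```

Miss any of these and the load falls back to fully-logged, row-at-a-time behavior, which is where most "bulk insert wasn't any faster" stories come from.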
I have to transfer data back and forth to a Teradata system, and the DTS data pump task is so slow it is unusable for all but the smallest datasets. This is a combination of bad Teradata drivers (IMHO) and a slow VPN connection where the latency affects the row-by-row transfer handshaking. Teradata has their own data transfer utilities that collect rows into large blocks for network transfer, use multiple connections in parallel, and use bulk copy API commands to get the data into SQL Server. These utilities are a pain to set up and much more limited than DTS, but are many times faster. This may be a worst-case scenario (at least the worst one I've had to deal with), but the speed difference is huge. It's quite possible the Oracle-SQL connection works better than the Teradata-SQL connection, but I haven't used it myself.
I also have to work with remote SQL Servers, also over a VPN but one with less latency. I use DTS packages extensively to move data back and forth, but at some point in the 10MB-100MB range it becomes worthwhile to switch to a bcp out - zip - ftp - bcp in process. If the file is big enough to notice the difference between FTP and Windows Explorer file copy, you should notice a speed improvement with bulk insert.
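The bcp out - zip - ftp - bcp in process looks roughly like this from the command line (server, table, and file names are made up for illustration; the zip and ftp steps are whatever tools you have on hand):

```shell
REM Export from the source server in native bulk-copy format:
bcp MyDb.dbo.BigTable out c:\xfer\bigtable.dat -n -S SourceServer -T

REM Compress and transfer, e.g.:
REM   zip c:\xfer\bigtable.zip c:\xfer\bigtable.dat
REM   ftp the zip to the destination and unzip it there

REM Import on the destination server:
bcp MyDb.dbo.BigTable in c:\xfer\bigtable.dat -n -S DestServer -T
```

Native format (-n) avoids character-conversion overhead when both ends are SQL Server; use -c (character format) if anything else needs to read the file.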
There is also the issue of the performance impact on the Oracle server. The 3-hour data transfer is using resources on their system the whole time. A data export would finish very quickly, and the subsequent copy and import operations would be completely independent of the Oracle database.
You may not want to sound like a flame, but you do sound a little miffed. One problem is that this issue is on a customer site: we have no way of creating an equivalent-size file on our development server, and we do not have the same network, so I haven't proved the difference on this exact setup myself, although I have seen that it is much better. We can't set up a test until they agree on what is to be tested, and currently they are reluctant to test this method. My main issue is that, between us and a third-party Oracle house, any official quotes from recognised bodies would hold much more sway than anything we can say we've proved ourselves, and may help persuade them to run a test, hence proving to them that this is the way to go.
I didn't want my reply to Todd to sound testy. I believe he was expressing an honest opinion when he said he had only seen minimal speed differences. I have had other experiences and wanted to share them without sounding like I was trying to shout him down. It's amazing how much you can learn here by having open discussions with others and listening to why they disagree with you.
On the other hand, the people you're dealing with would get under my skin pretty quickly, at least from the way you've described the situation. They're a customer, so you have to be professional and keep it to yourself, but I think I'd be way past "miffed" by now. There are perfectly legitimate reasons not to use bulk insert, but those aren't the reasons they're giving you.
There could be other good reasons, but I don't know enough about Oracle to speculate. I find it hard to believe that dumping data to a csv file is some fiendishly difficult operation in Oracle that would bring the server to its knees. Since they don't seem to have legitimate excuses, let's speculate on some alternatives.
Their approach seems to be: "We don't believe you. We can't say why, but we can block your access to the resources to prove us wrong. So there, Nyaah!" This doesn't seem professional to me.
If someone disagrees with my technical opinion but can't or won't bother to explain why, I have to assume ignorance, stupidity, and/or malice on their part. Even if I am older, overweight, overpaid, and have a bad personality.
PS: Maybe I'm too old to feel the need to insert smileys to indicate humor or sarcasm. Feel free to mentally add them while reading.