The Multi Phase Data Pump

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/dasanka/themultiphasedatapump.asp

    My Blog: http://dineshasanka.spaces.live.com/

  • Nice article, but I would have liked a mention that using the multi-phase data pump to transform data switches the data pump to row-by-row processing. This is something people should be aware of, as it can lead to data loads taking much longer than if a normal data pump Copy Column task were used.

    Also, you don't have to use all the phases. In a lot of my packages I use just the PumpComplete phase. The VBScript in this phase uses the DTSTransformPhaseInfo object to capture the number of rows exported to the file. This object also has an ErrorRows property that holds the number of errors encountered.
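
    For illustration, a stripped-down PumpComplete function might look something like the sketch below. The function name and the LoadSummary global variable are just placeholders for whatever your package uses:

    Function PumpCompleteMain()
        ' DTSTransformPhaseInfo is exposed to multi-phase transform scripts
        Dim sMsg
        sMsg = "Rows written: " & DTSTransformPhaseInfo.DestinationRowsComplete & _
               ", error rows: " & DTSTransformPhaseInfo.ErrorRows
        ' Stash the summary in a global variable so a later step can log it
        DTSGlobalVariables("LoadSummary").Value = sMsg
        PumpCompleteMain = DTSTransformStat_OK
    End Function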


    --------------------
    Colt 45 - the original point and click interface

  • Yes, you are correct, phillcart.

    I just enabled all the phases to give an idea; as you said, it is not compulsory to use them all.

    My Blog: http://dineshasanka.spaces.live.com/

  • Our daily DTS data pumps normally run without errors, but there are a lot of messages written to the log file. If errors do occur, the messages don't help much and we have to investigate the data file in any case. It seems that we can probably reduce those messages to indicate only success or failure by using a tailored multiphase data pump.

    Please comment on the differences between the default data pump and a multi-phase data pump in terms of their default actions. What might be involved in creating a less verbose data pump?

  • As I mentioned above, the main difference appears if you use the multi-phase data pump in data transformations. This switches the data pump to row-by-row processing instead of a bulk load.

    If you are able to accept the performance hit, you can develop a pretty sophisticated data loading routine that traps data conversion and key violation errors. The rows in error can be redirected to an exception table for later examination, along the lines of the sketch below.
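
    As a rough sketch of that kind of error trap, the OnInsertFailure phase can push the offending source row into an exception table via ADO. The connection string, table, and column names here are only placeholders, and in a real package you'd open the connection once (say, in PreSourceData) rather than per failed row:

    Function InsertFailureMain()
        Dim oConn, sSQL
        Set oConn = CreateObject("ADODB.Connection")
        oConn.Open "Provider=SQLOLEDB;Data Source=(local);" & _
                   "Initial Catalog=Staging;Integrated Security=SSPI;"
        ' DTSSource still holds the row that failed to insert
        sSQL = "INSERT INTO dbo.LoadExceptions (CustomerID, ErrorPhase) " & _
               "VALUES ('" & DTSSource("CustomerID") & "', 'OnInsertFailure')"
        oConn.Execute sSQL
        oConn.Close
        Set oConn = Nothing
        InsertFailureMain = DTSTransformStat_OK
    End Function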

    If you're loading a lot of data, a better approach would be to bulk load the file directly into a staging table that has all the fields defined as varchar. Then you can run validation routines on the data, as sketched below, without affecting load performance.
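
    Something like this in an ActiveX Script task could drive the validation; the staging table, column, and numeric check are made-up examples:

    Function Main()
        Dim oConn
        Set oConn = CreateObject("ADODB.Connection")
        oConn.Open "Provider=SQLOLEDB;Data Source=(local);" & _
                   "Initial Catalog=Staging;Integrated Security=SSPI;"
        ' Move rows failing a numeric check out to an exception table,
        ' then remove them from staging before the real load
        oConn.Execute "INSERT INTO dbo.StagingExceptions " & _
                      "SELECT * FROM dbo.StagingOrders WHERE ISNUMERIC(OrderAmount) = 0"
        oConn.Execute "DELETE FROM dbo.StagingOrders WHERE ISNUMERIC(OrderAmount) = 0"
        oConn.Close
        Set oConn = Nothing
        Main = DTSTaskExecResult_Success
    End Function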


    --------------------
    Colt 45 - the original point and click interface
