• admin (3/5/2008)


    Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.

    I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.

    Actually, I've used the method in this article for CDR (Call Detail Records) cleansing... Consider that bulk insert will import 5.1 million rows in 60 seconds and BCP will do an export in about the same time... that means you can do a 5.1 million row "clean up" of this nature at the rate of 5.1 million rows in about 3 minutes (1 export, 2 imports).

    Of course, CDRs are normally in a fixed field format and it's a lot cheaper to just input the file into a single column and use substring to split the records whilst ignoring the "short rows". Works nasty fast. Another advantage is for CDR files like what some of the "Bell" companies send... they send a mixed bag... they use a certain "record indicator" in the same postion across multiple record types that have different layouts (CDRs vs Tax Records vs records counts for both). A simple substring "detector" allows me to filter out all but the rows that I want to split.

    Heh... and forgive me for not using DTS! I've actually never learned how to use it because I can normally beat the guys at work that do use it for performance.

    And, no... I'm not making fun of anyone who uses DTS... if you take it that way, then I apologize. I'm just saying I've always been able to beat DTS with T-SQL especially on CDR files.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)