Handling strings methodology

  • It appears to me that I have basically 3 different methods in handling strings that are of wrong lengths from text file into the destination table:

    1) Edit the source using advanced editor and set the column mapping accordingly

    2) Use derived column task (e.g. SUBSTRING)

    3) Use data conversion task

    I found out that data conversion is of no help if the incoming data is too wide (e.g. it's varchar(50) but destination table can only have varchar(20). Even though data conversion explicitly set the converted data to varchar(20), this still fails in truncation error which seems counterintuitive to me. Why would it be a truncation error when it's being converted explicitly to a shorter string?

    It seems to me that editing the source task via advanced editor would take minimum work and wouldn't require intervening tasks such as derived columns but I also see lot of questions here about using those tasks so I wonder if I'm actually doing something more risky than if I were to use a derived task or data conversion task? Why should I use either derived column or data conversion task?

    If anyone know of a good article that covers the questions I'm trying to get answer, I'll be happy to have the links so I can further my research and understand the ramification better.

    Thank you.

  • Editing the source could give you the same truncation error as the conversion does (unless you tell it to ignore truncation).

    My preference would be derived column. Then you can specify exactly what you want and it is explicitly clear that you are expecting data of possibly greater length, but are intentionally only taking the first X characters.

  • Nevyn, if it is true that source also would give truncation errors, then it makes default even more stranger - the default is to use string with length of 50 for text file -- those are delimited text file and as such there is no really length restriction. The "correct" default should be DT_NTEXT or something equivalent, though I assume they don't do that to avoid making the buffer ridiculously big.

    Nonetheless, thanks for sharing the opinion. I'll look into using derived column instead of data conversion.

  • Banana-823045 (11/12/2013)


    Nevyn, if it is true that source also would give truncation errors, then it makes default even more stranger -

    The default would be strange either way. If it did not generate an error, the default could truncate when the user did not expect it, which in some cases is a worse outcome since it is hidden. At least an error leads to investigation of your assumptions.

    And I did do a simple test before posting that, where reducing the column size generated an error.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply