invalid character in input file

  • Hi all, I'm being told I have to scrub invalid data in an input file.

    I have a 30 character description field with extended Ascii poop causing a failure in my SSIS package, trying to store this in a char column.

    Is there a simple way using sql to replace ascii values < 32 and > 127 with a space ' '?

    thanks in advance

  • Without actual test data all I can suggested is that you look up the REPLACE function.

    It is used in this manner:

    REPLACE(<old_text>, <start_num>, <num_chars>, <new_text>)

    If you would provide some sample data, someone can / will assist you in learning how to properly use the REPLACE function.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • If the data source is a file and its failing in SSIS then SQL is not going to help you unless you can get the data from the file into the database in the first place. What is the exact error message you're receiving from your SSIS package? Also, if you could post a screenshot of your Flat File Connection property pages that would help us too.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for your replies, here is one of the error messge from the SSIS Package:

    Error: 2013-03-13 05:01:23.75

    Code: 0xC020902A

    Source: Data Flow Task Flat File Source 1 [27497]

    Description: The "output column "SELL_DSCR" (27517)" failed because truncation occurred, and the truncation row disposition on "output column "SELL_DSCR" (27517)" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.

    End Error

    I've attached the properties page.

    Can I simply change the code page to US-ASCII or add Unicode? The invalid characters appear to be extended ASCII values coming in from cut/paste functions in a distant system...

    I'll read up on those attirubtes..

    Thanks,

  • "bad" data:

    000000000005500549|CeraVe Sunscreen Stick SPF 50Â|00187220801|B|N||0000900153||Y|0.000|0.000|0.000

    Are you suggesting I code a loop using replace?

    thanks

  • Â is a valid ANSI character as far as SSIS and SQL Server are concerned, Latin1 codepoint 194 to be exact. Truncation could mean that the string length exceeds what your connection manager is setup to bring in. Check the Advanced Page (was hoping you would have sent a screenshot of that too) of the Connection Manager and see what the Output Column Width is setup as for SELL_DSCR? Is it 30 or more?

    As a side note I just imported a file containing the line...

    000000000005500549|CeraVe Sunscreen Stick SPF 50Â|00187220801|B|N||0000900153||Y|0.000|0.000|0.000

    ...into a table and data was imported cleanly, i.e. the  character exists my table.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Since the data your are loading is going to be scrubbed, don't try to match the source and destination sizes. In your destination table feel free to make it double the expected size (ie. char(60) or char(255) in this case) though I would make it an Nchar just out of habit since it's only a staging area. I presume you're going to be moving the data from the staging/scrubbing area to another destination afterwards and THAT's where I'd worry about field sizes and optimization.

    You're getting the error do to truncation, ergo, increase the size of your destination field.

  • thanks for the replies, I've attached the properties of the sell descr it is 30 character long...

    glad to hear you've loaded the file and your table has that ascii character. did you load it to a 30 byte column?

  • not my first choice to increase the table column size, as there are potential downstream effects....

  • j.bluestein (3/13/2013)


    thanks for the replies, I've attached the properties of the sell descr it is 30 character long...

    glad to hear you've loaded the file and your table has that ascii character. did you load it to a 30 byte column?

    I did. I would check other rows in the file for values that exceed 30 characters. Or try loading a file with just the one row you think is the problem, to get a proof of concept and rule that row out.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Erin Ramsay (3/13/2013)


    Since the data your are loading is going to be scrubbed, don't try to match the source and destination sizes. In your destination table feel free to make it double the expected size (ie. char(60) or char(255) in this case) though I would make it an Nchar just out of habit since it's only a staging area. I presume you're going to be moving the data from the staging/scrubbing area to another destination afterwards and THAT's where I'd worry about field sizes and optimization.

    You're getting the error do to truncation, ergo, increase the size of your destination field.

    In general I agree with making the initial load table wide enough to handle more data than might be expected when loading delimited files however I usually land on the side of using variable-length data types like VARCHAR or NVARCHAR.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I think I've got a truncate problem not an ascii character problem. I've attached the problem row, the descritpion appears to be greater than the defined 30 characters in the properties of the file connection.

    Try loading that.

    The extended ascii character is coming from a UNIX server and putting an extra space in the description when it arrives on Windows server.

    The UNIX source file has no extended ascii A after the 50...

    thanks,

  • You can expand the Flat File Connection Manager column property to accept more than 30 characters and then run the data through a Derived Column Transformation to apply a TRIM() to that column before trying to load it into your database. That will likely solve this specific issue on your side, but as to a root cause on how that space is getting in the file, that seems like a mystery worth tracking down since it sounds like that is not supposed to happen.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I like the idea of accepting > 30 characters and using TRIM. I'll read up and test that feature.

    I believe the UNIX - Windows file transfer my be adding that extra character. I'll hunt that down.

    I'll post my results later.

    Thanks for your help!

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply