Import Excel file with line feed characters

  • I have an Excel source file that has some rows with a line feed carriage return in the Address1 columns of the file. This causes SSIS to think it is the end of the row, and import the data incorrectly. Is there a way to remove the CRLF characters from the file using SSIS? Right now, I'm toying with converting the Excel file to XML to see if SSIS can remove them from an XML file (I was going to replace them with a space), but even if that is possible, I'll still have to figure out how to automatically convert the Excel file to XML. Unfortunately, they can only save my source file as .pdf or .xls. Any help/ideas is GREATLY appreciated.

    Thanks,

    Celita

  • You should be able to tell the Excel Data Source to treat the Address1 field as a text stream rather than as a string. Right-click on it and pick "Advanced Editor...", go to the last tab and pick the Address1 field, there you can override the data type (probably DT_STR) that SSIS automatically selected. You probably want DT_TEXT, that should allow it to read the whole field regardless of line feeds and other control characters. Then feed it into a Derived Column and you can use the string manipulation functions to REPLACE char(13) with char(32) in that field and any others.

  • Wow! That was fast. I should have posted yesterday! I see where you mean and will try that. Thank you VERY VERY much!

  • 😀 I just hope it works, I did that from memory.

  • First I want to say that I'm not really familiar with the Advanced Editor, which you can tell since I completely forgot it even existed. Anyway, I'm now getting an error. I changed the Excel Source Output External Columns for the first Address1 field from Unicode String to Unicode Text Stream. This also changed the field type for the same column in the Excel Source Output Columns and Excel Source Error Output Output Columns to be Unicode Text Stream. Now my data source has a red X on it with the error below. What did I do wrong? The data isn't going anywhere yet.

    Error6Validation error. Convert Data file: rqi excel source [678]: The output column "PatAddr1" (36688) on the error output has properties that do not match the properties of its corresponding data source column.RQI_Export_2.dtsx00

  • There are three places to change the data type. You did output and error, you also need to do input. Same place, different part of the tree.

  • That is what it SOUNDS like, but I reset and did it again. *I* changed only the Excel Source output External Columns, which is what is coming into Excel, and changed that one column PatAddr1 to Unicode text stream. I then checked the Excel Source output Output Columns, and the data type for PatAddr1 had been automatically changed to Unicode text stream when I changed it in the other location. I then checked the Excel Source Error Output, and it says Unicode string, and won't let me change it to Unicode text stream.

    I've attached screen prints of what the columns look like in each section. The error message I get on the Error Output is also there.

  • I've never tried this but supposedly you can add "IMEX=1; MAXROWSTOSCAN=0" to your Excel connection string to force SSIS to reconsider what the data type should be. I don't know that it will work in your case.

    See also Microsoft Knowledge Base 189897 for more hints.

  • I did try the IMEX=1, but that didn't help at all. I"ll try that knowledge base article. Thanks!

  • Celita (6/17/2010)


    I have an Excel source file that has some rows with a line feed carriage return in the Address1 columns of the file. This causes SSIS to think it is the end of the row, and import the data incorrectly. Is there a way to remove the CRLF characters from the file using SSIS? Right now, I'm toying with converting the Excel file to XML to see if SSIS can remove them from an XML file (I was going to replace them with a space), but even if that is possible, I'll still have to figure out how to automatically convert the Excel file to XML. Unfortunately, they can only save my source file as .pdf or .xls. Any help/ideas is GREATLY appreciated.

    Thanks,

    Celita

    If you can use third-party solutions, check the commercial CozyRoc Excel Source Plus component. The component doesn't try to be smart about the column type and you will be able to read the cell value without any additional tricks.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • I have a working solution. I imported the Excel data into a table with ntext fields for the address1 fields. I also had a second new column for each address1 field. I then used a dreaded cursor to check for char(10), and if present, put everything before it in the new address1 fields and everything after it in address2 fields. Then, because I found that some people hit enter twice, I checked address2 fields for char(10) and if found, deleted them. Since this was in a physical table, I was able to use a distinct query of the fields without line feeds as my source for the rest of the package. It isn't very elegant, but it works.

  • Hi,

    Here is a Macro for excel to carriage return or linefeed characters.

    Sub KillCR()

    Dim ws As Worksheet

    For Each ws In Worksheets

    With ws.Cells

    .Replace vbCrLf, " "

    .Replace vbCr, " "

    .Replace vbLf, " "

    End With

    Next

    MsgBox "Done"

    End Sub

    The link to the code is as below.

    http://allfaq.org/forums/t/111654.aspx

    Please try to see of this helps.

    Thanks,

    lakshmi

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply