SSIS CSV file import with embedded quotes and multi line

  • I need help with importing a csv file into Sql Server 2008 using SSIS. I have successfully imported 10 of the 11 csv files but this last one is giving me problems. This is to be a repeatable import done every 24 hours so the solution has to be automated.

    The CSV file has " double quotes as the text identifier but it also has embedded quotes as well. I saw the posts on the problem that SSIS has with embedded quotes and used a Script task to replace the text identifiers with |~|. That all works but I still can't get the csv file to import into my table as the error message states "The column delimiter for column "Comment Detail" was not found".

    That happens to be were the CSV file they sent to us has extra lines in it.

    Here is how the data is after I replaced the quotes and this is the structure including the extra lines where I'm having a problem:

    |~|Project_ID|~|,|~|Date Comment Created|~|,|~|Comments entered by|~|,|~|Comment Detail|~|

    |~|9|~|,|~|2010-01-14 14:05:00|~|,|~|NAME|~|,|~|NAME acquisition - . SE end of island. (3) sites total today - NA, NA and, one site is a NA. Lease expires in 18 months. Can vacate any time. Must vacate site. Equipment ordered with 5 month lead time. Due to arrive in May, must be installed in new site.

    Other businesses potentially looking for space - NA.

    Looking to lease or own, preferrably own, NA to NA sf

    Need power requirements from NA.

    No height requirement

    NAME (NAME & NAME) to set up site tours with NA for Tuesday, 1/19

    Puerto Rico uses 3 forms of measurement - sf, sm and ciridas|~|

    Here is my REGEX code to replace the Quotes:

    System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\#####\CLProjectComments.csv");

    string allText = sr.ReadToEnd();

    sr.Close();

    System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\#####\CLProjectCommentsClean.csv");

    sw.Write(

    Regex.Replace(

    allText,

    @"""((?:""""||.)*?)""(?!"")",

    @"|~|$1|~|"

    ).Replace("\"\"", "\""), RegexOptions.Multiline

    );

    sw.Close();

    I tried to use a script on the Script Transformation Editor but that didn't work: I tried this prior to replacing the quotes:

    string r = Row.SingleLine;

    string colVal = string.Empty;

    bool inQuote = false;

    int col = 1;

    for (int i = 0; i < r.Length; i++)

    {

    if (r.Substring(i, 1) == "\"")

    {

    inQuote = !(inQuote);

    }

    else

    {

    if (inQuote)

    {

    colVal += r.Substring(i, 1);

    }

    else

    {

    if (r.Substring(i, 1) == ",")

    {

    //MessageBox.Show(colVal + Environment.NewLine + col.ToString());

    col = setOutput(Row, col, colVal);

    colVal = string.Empty;

    }

    else

    {

    colVal += r.Substring(i, 1);

    }

    }

    }

    }

    col = setOutput(Row, col, colVal);

    colVal = string.Empty;

    }

    private int setOutput(Input0Buffer Row, int col, string value)

    {

    switch (col)

    {

    case 1:

    Row.number = value;

    break;

    case 2:

    Row.projectid = value;

    break;

    case 3:

    Row.date = value;

    break;

    case 4:

    Row.enteredby = value;

    break;

    case 5:

    Row.comment = value;

    break;

    }

    return col + 1;

    }

    }

    Anyone have ideas on how to delete the extra lines or get SSIS to read multi line?

    Thank you for your ideas and suggestions, I'm sure this is a simple thing and should take you guys a few minutes to figure out and I've spent WAY too much time trying to work it out without asking for help.

  • Unfortunately, you are using a part of SSIS that, in my opinion, has been implemented poorly (although you could argue that the problem is the OLE DB provider used for CSV files.

    Whilst you have worked around the quote delimiter, you still have a problem with the comma separator between fields because the last field appears to contain commas. When SSIS reads a CSV file, it looks for the comma (field separator) and WHENEVER it finds it, it blindly expects that to indicate that the next field start right there. Now, you might hope to that the field delimiter (in your case |~|) should tell SSIS "Hang on - the any commas you find between the pair of |~| should simply be treated as data in that field.". Well, no, it simply does not do that. Every comma (or whatever character is used as the field separator) gets treated as a field separator.

    If at all possible, try to use a value that is not present in the data (or add logic to your script so that the field separators are changed to something that will work for you.

    And the most disappointing part of CSV files is that MS-Excel will quite happily open the file and will actually do what you are expecting. So why, oh why, can't the Excel developers talk to the SSIS (or OLE DB) folk - they all work for the same company and one of them knows how to open CSV files.

  • It might be worth looking at the SSIS Delimited File Source on Codeplex (http://ssisdfs.codeplex.com/) - I haven't used it but it might work for you

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply