SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SSIS CSV file import with embedded quotes and multi line


SSIS CSV file import with embedded quotes and multi line

Author
Message
mooneymarc
mooneymarc
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 31
I need help with importing a csv file into Sql Server 2008 using SSIS. I have successfully imported 10 of the 11 csv files but this last one is giving me problems. This is to be a repeatable import done every 24 hours so the solution has to be automated.

The CSV file has " double quotes as the text identifier but it also has embedded quotes as well. I saw the posts on the problem that SSIS has with embedded quotes and used a Script task to replace the text identifiers with |~|. That all works but I still can't get the csv file to import into my table as the error message states "The column delimiter for column "Comment Detail" was not found".

That happens to be were the CSV file they sent to us has extra lines in it.
Here is how the data is after I replaced the quotes and this is the structure including the extra lines where I'm having a problem:
|~|Project_ID|~|,|~|Date Comment Created|~|,|~|Comments entered by|~|,|~|Comment Detail|~|
|~|9|~|,|~|2010-01-14 14:05:00|~|,|~|NAME|~|,|~|NAME acquisition - . SE end of island. (3) sites total today - NA, NA and, one site is a NA. Lease expires in 18 months. Can vacate any time. Must vacate site. Equipment ordered with 5 month lead time. Due to arrive in May, must be installed in new site.

Other businesses potentially looking for space - NA.

Looking to lease or own, preferrably own, NA to NA sf

Need power requirements from NA.
No height requirement

NAME (NAME & NAME) to set up site tours with NA for Tuesday, 1/19

Puerto Rico uses 3 forms of measurement - sf, sm and ciridas|~|


Here is my REGEX code to replace the Quotes:
System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\#####\CLProjectComments.csv");
string allText = sr.ReadToEnd();
sr.Close();
System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\#####\CLProjectCommentsClean.csv");
sw.Write(
Regex.Replace(
allText,
@"""((?:""""||.)*?)""(?!"")",
@"|~|$1|~|"
).Replace("\"\"", "\""), RegexOptions.Multiline

);
sw.Close();


I tried to use a script on the Script Transformation Editor but that didn't work: I tried this prior to replacing the quotes:
string r = Row.SingleLine;
string colVal = string.Empty;
bool inQuote = false;
int col = 1;

for (int i = 0; i < r.Length; i++)
{
if (r.Substring(i, 1) == "\"")
{
inQuote = !(inQuote);
}
else
{
if (inQuote)
{
colVal += r.Substring(i, 1);
}
else
{
if (r.Substring(i, 1) == ",")
{
//MessageBox.Show(colVal + Environment.NewLine + col.ToString());
col = setOutput(Row, col, colVal);
colVal = string.Empty;
}
else
{
colVal += r.Substring(i, 1);
}
}
}
}
col = setOutput(Row, col, colVal);
colVal = string.Empty;
}
private int setOutput(Input0Buffer Row, int col, string value)
{
switch (col)
{
case 1:
Row.number = value;
break;
case 2:
Row.projectid = value;
break;
case 3:
Row.date = value;
break;
case 4:
Row.enteredby = value;
break;
case 5:
Row.comment = value;
break;

}
return col + 1;
}

}


Anyone have ideas on how to delete the extra lines or get SSIS to read multi line?

Thank you for your ideas and suggestions, I'm sure this is a simple thing and should take you guys a few minutes to figure out and I've spent WAY too much time trying to work it out without asking for help.
happycat59
happycat59
SSCarpal Tunnel
SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)

Group: General Forum Members
Points: 4755 Visits: 3221
Unfortunately, you are using a part of SSIS that, in my opinion, has been implemented poorly (although you could argue that the problem is the OLE DB provider used for CSV files.

Whilst you have worked around the quote delimiter, you still have a problem with the comma separator between fields because the last field appears to contain commas. When SSIS reads a CSV file, it looks for the comma (field separator) and WHENEVER it finds it, it blindly expects that to indicate that the next field start right there. Now, you might hope to that the field delimiter (in your case |~|) should tell SSIS "Hang on - the any commas you find between the pair of |~| should simply be treated as data in that field.". Well, no, it simply does not do that. Every comma (or whatever character is used as the field separator) gets treated as a field separator.

If at all possible, try to use a value that is not present in the data (or add logic to your script so that the field separators are changed to something that will work for you.

And the most disappointing part of CSV files is that MS-Excel will quite happily open the file and will actually do what you are expecting. So why, oh why, can't the Excel developers talk to the SSIS (or OLE DB) folk - they all work for the same company and one of them knows how to open CSV files.



happycat59
happycat59
SSCarpal Tunnel
SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)SSCarpal Tunnel (4.8K reputation)

Group: General Forum Members
Points: 4755 Visits: 3221
It might be worth looking at the SSIS Delimited File Source on Codeplex (http://ssisdfs.codeplex.com/) - I haven't used it but it might work for you



Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search