SSIS and dealing with embedded double quotes and coma

  • I posted my problem in other thread also but no go, I think this is the right forum to post my problem. This has been discuss couple of years ago in this forum but I dont think it lead to any solution.

    Many people may have faced this problem and I searched many solutions but somehow I could not hit the right one. Sorry for repeatative question.

    Here is my data in csv file and I am trying to import into sql server table but it does not import the way it should.

    field1,field2,field3,field4,field5,field6

    n1,n2,n3,stringWithout Quotes,"String in Quotes with ""words"" and coma , in the string",some more

    n1,n2,n3,"stringWith Quotes","String in Quotes with coma , in thestring",some more

    I need output

    Record 1:

    Field1: n1

    Field2: n2

    Field3: n3

    Field4: stringWithout Quotes

    Field5: String in Quotes with "words" and coma , in the string

    Field6: some more

    Record 2:

    Field1: n1

    Field2: n2

    Field3: n3

    Field4: stringWith Quotes

    Field5: String in Quotes with coma , in thestring

    Field6: some more

    I tried to cleanup with Regex in script task ,mentioned in some other blog but it did not work and I am not expert to change RegEx to get desired output. Can anyone please help to find solution? Regex should work but any other solution is also welcome.

    Thanks.

    P.S. I tried to import same csv file in Access and it worked perfactly, so I know the csv format is correct.

  • Yuck. Ugly. It's a little tough to diagnose the specific problems you're having without the code for your scripts/regex.

    You've got no consistency either within records or across records with the use of commas and quotes as field separators vs embedded characters. MS Office processes delimiters differently than SQL Server and SSIS so what works in Access/Excel isn't a good indicator of how SSIS handles it.

    If you have any control or influence over the creation of the source data, see if you can't have the file created with a tab or pipe (|) character.

    If not, really your best bet is to do what you've been trying: do some preprocessing on each line of the file before trying to import. RegEx isn't *that* hard, but it does take practice. There are a number of websites around that can help you test your RegEx.

    I've also had some success importing files like this into a table where each line of the input file is a record in a varchar(max) field, then using T-SQL REPLACE() and SUBSTRING() in a stored procedure.

    ____________
    Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.

  • This is what I use in RegEx

    Regex.Replace(allText, @"""((?:""""|.)*?)""(?!"")", @"|~|$1|~|").Replace("\"\"", "\"")

  • I think the next thing I might try is a script task that reads in a line at a time and then applies logic to parse the line. I'm not sure what that logic would be because you have a very unusal input. But I'm sure it would involve finding the locations of the commas and quotes and then making decisions about whether they function as delimiters or data.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply