SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SSIS and dealing with embedded double quotes and coma


SSIS and dealing with embedded double quotes and coma

Author
Message
mak101
mak101
SSC Veteran
SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)

Group: General Forum Members
Points: 205 Visits: 218
I posted my problem in other thread also but no go, I think this is the right forum to post my problem. This has been discuss couple of years ago in this forum but I dont think it lead to any solution.

Many people may have faced this problem and I searched many solutions but somehow I could not hit the right one. Sorry for repeatative question.

Here is my data in csv file and I am trying to import into sql server table but it does not import the way it should.

field1,field2,field3,field4,field5,field6
n1,n2,n3,stringWithout Quotes,"String in Quotes with ""words"" and coma , in the string",some more
n1,n2,n3,"stringWith Quotes","String in Quotes with coma , in thestring",some more

I need output
Record 1:
Field1: n1
Field2: n2
Field3: n3
Field4: stringWithout Quotes
Field5: String in Quotes with "words" and coma , in the string
Field6: some more

Record 2:
Field1: n1
Field2: n2
Field3: n3
Field4: stringWith Quotes
Field5: String in Quotes with coma , in thestring
Field6: some more

I tried to cleanup with Regex in script task ,mentioned in some other blog but it did not work and I am not expert to change RegEx to get desired output. Can anyone please help to find solution? Regex should work but any other solution is also welcome.

Thanks.

P.S. I tried to import same csv file in Access and it worked perfactly, so I know the csv format is correct.
lshanahan
lshanahan
SSCommitted
SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)SSCommitted (1.5K reputation)

Group: General Forum Members
Points: 1507 Visits: 438
Yuck. Ugly. It's a little tough to diagnose the specific problems you're having without the code for your scripts/regex.

You've got no consistency either within records or across records with the use of commas and quotes as field separators vs embedded characters. MS Office processes delimiters differently than SQL Server and SSIS so what works in Access/Excel isn't a good indicator of how SSIS handles it.

If you have any control or influence over the creation of the source data, see if you can't have the file created with a tab or pipe (|) character.

If not, really your best bet is to do what you've been trying: do some preprocessing on each line of the file before trying to import. RegEx isn't *that* hard, but it does take practice. There are a number of websites around that can help you test your RegEx.

I've also had some success importing files like this into a table where each line of the input file is a record in a varchar(max) field, then using T-SQL REPLACE() and SUBSTRING() in a stored procedure.

____________
Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.
mak101
mak101
SSC Veteran
SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)SSC Veteran (205 reputation)

Group: General Forum Members
Points: 205 Visits: 218
This is what I use in RegEx
Regex.Replace(allText, @"""((?:""""|.)*?)""(?!"")", @"|~|$1|~|").Replace("\"\"", "\"")
Ed Zann
Ed Zann
SSCommitted
SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)SSCommitted (2K reputation)

Group: General Forum Members
Points: 1978 Visits: 1398
I think the next thing I might try is a script task that reads in a line at a time and then applies logic to parse the line. I'm not sure what that logic would be because you have a very unusal input. But I'm sure it would involve finding the locations of the commas and quotes and then making decisions about whether they function as delimiters or data.



Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search