Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

SSIS and dealing with embedded double quotes and coma Expand / Collapse
Author
Message
Posted Sunday, March 31, 2013 10:24 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, November 12, 2014 2:42 PM
Points: 52, Visits: 202
I posted my problem in other thread also but no go, I think this is the right forum to post my problem. This has been discuss couple of years ago in this forum but I dont think it lead to any solution.

Many people may have faced this problem and I searched many solutions but somehow I could not hit the right one. Sorry for repeatative question.

Here is my data in csv file and I am trying to import into sql server table but it does not import the way it should.

field1,field2,field3,field4,field5,field6
n1,n2,n3,stringWithout Quotes,"String in Quotes with ""words"" and coma , in the string",some more
n1,n2,n3,"stringWith Quotes","String in Quotes with coma , in thestring",some more

I need output
Record 1:
Field1: n1
Field2: n2
Field3: n3
Field4: stringWithout Quotes
Field5: String in Quotes with "words" and coma , in the string
Field6: some more

Record 2:
Field1: n1
Field2: n2
Field3: n3
Field4: stringWith Quotes
Field5: String in Quotes with coma , in thestring
Field6: some more

I tried to cleanup with Regex in script task ,mentioned in some other blog but it did not work and I am not expert to change RegEx to get desired output. Can anyone please help to find solution? Regex should work but any other solution is also welcome.

Thanks.

P.S. I tried to import same csv file in Access and it worked perfactly, so I know the csv format is correct.

Post #1437300
Posted Monday, April 1, 2013 6:35 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Yesterday @ 8:01 AM
Points: 154, Visits: 284
Yuck. Ugly. It's a little tough to diagnose the specific problems you're having without the code for your scripts/regex.

You've got no consistency either within records or across records with the use of commas and quotes as field separators vs embedded characters. MS Office processes delimiters differently than SQL Server and SSIS so what works in Access/Excel isn't a good indicator of how SSIS handles it.

If you have any control or influence over the creation of the source data, see if you can't have the file created with a tab or pipe (|) character.

If not, really your best bet is to do what you've been trying: do some preprocessing on each line of the file before trying to import. RegEx isn't *that* hard, but it does take practice. There are a number of websites around that can help you test your RegEx.

I've also had some success importing files like this into a table where each line of the input file is a record in a varchar(max) field, then using T-SQL REPLACE() and SUBSTRING() in a stored procedure.


____________
Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.
Post #1437422
Posted Monday, April 1, 2013 8:16 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Wednesday, November 12, 2014 2:42 PM
Points: 52, Visits: 202
This is what I use in RegEx
Regex.Replace(allText, @"""((?:""""|.)*?)""(?!"")", @"|~|$1|~|").Replace("\"\"", "\"")
Post #1437453
Posted Monday, April 1, 2013 12:30 PM


SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Yesterday @ 7:13 AM
Points: 487, Visits: 1,246
I think the next thing I might try is a script task that reads in a line at a time and then applies logic to parse the line. I'm not sure what that logic would be because you have a very unusal input. But I'm sure it would involve finding the locations of the commas and quotes and then making decisions about whether they function as delimiters or data.


Post #1437566
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse