CSV Import Problem

Question

CSV Import Problem

GN-340682

SSC-Addicted

Points: 455
More actions
November 26, 2010 at 9:01 am

#147872

Hi,
This driving me nuts!
I'm being sent a unicode CSV file that has some text qualifiers and is kind of partially fixed width. The text qualifiers are only used when necessary. For example:
XXXX ,"1,400.00",GBN01 ,*XXXXXXXXX ,20,70.00,"1,400.00",17.50,300
I've XXXXX'd out sensitive data.
The problem is that I'm getting an error when trying to import this line, and it's the field "1,400.00" that is causing the problem. When I set up the Flat File Connection Manager I've specified " as the text qualifier and everything works fine. I preview the file and everything is just as I would expect.
However, when I come to run the SSIS package I get the following error:
"The data value cannot be converted for reasons other than sign mismatch or data overflow"
If I manually change the field from "1,400.00" to 1400.00 with no text qualifier it works fine.
Please, someone, give me some clues as to why this isn't working.
Cheers
George

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

George

I think the problem is that 1,400.00 is actually a number, which is being sent as text because it has a "," as a thousands separator. I would import it into a text field and then attempt to convert it into a number.

John

GN-340682 SSC-Addicted Points: 455 More actions · Answer 2

Hi John,

Thanks for the reply.

I've tried that. It just pushes the problem on. It seems that if I do that when it gets to the next numeric field it's trying to load text in to it. In other words, it's like the comma in the "1,400.00" is being treated as a field separator and the text qualifiers are being ignored.

Could any of this be related to the file being unicode? I tried processing it as non-unicode, but that obviously created problems with the file not being read correctly.

Cheers

George

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 3

George

Another way round it might be to open the file in Excel, format the columns appropriately and then save it and import from that. You could write a VB script that would do that for you in the package.

John

GN-340682 SSC-Addicted Points: 455 More actions · Answer 4

Hi John,

Yeah, I have thought about doing some pre-processing to convert the file to | or TAB delimeted. I just can't believe it doesn't work straight out of the tin.

I've fired off an email to the supplier who's sending the data to ask if they can change the format of the file. I hope they can as this is driving me nuts 🙂

In the meantime though, if you, or anyone, knows why this is happening I would really appreciate some help.

Cheers

George

Chrissy321 SSCoach Points: 15750 More actions · Answer 5

Hi George,

Its not working because you file is quite non-standard. Best bet is to get the file sent as straight csv or straight fixed width and with consistent use of delimiters.

Otherwise you may need to import the file into a staging table with one column and then parse out that data into meaningful columns into you final destination table.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 6

GN-340682 (11/26/2010)
Hi John,
Yeah, I have thought about doing some pre-processing to convert the file to | or TAB delimeted. I just can't believe it doesn't work straight out of the tin.
I've fired off an email to the supplier who's sending the data to ask if they can change the format of the file. I hope they can as this is driving me nuts 🙂
In the meantime though, if you, or anyone, knows why this is happening I would really appreciate some help.
Cheers
George

You're taking the correct steps. The first step is always to ask the vendor to meet certain requirements. The problem is that they're probably exporting from a formatted spreadsheet or table or, worse, from a reporting system that does formatting.

Please do post back if you have no success with the vendor. There's a couple of ways to get around this problem "auto-magically" without having to import and re-export from a spreadsheet on your end.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Phil Parkin SSC Guru Points: 246695 More actions · Answer 7

GN-340682 (11/26/2010)
Hi John,
Thanks for the reply.
I've tried that. It just pushes the problem on. It seems that if I do that when it gets to the next numeric field it's trying to load text in to it. In other words, it's like the comma in the "1,400.00" is being treated as a field separator and the text qualifiers are being ignored.
Could any of this be related to the file being unicode? I tried processing it as non-unicode, but that obviously created problems with the file not being read correctly.
Cheers
George

It's easy to understand why SSIS is getting confused with this format, which is most unusual. Including a thousands separator in a numeric field is a hindrance and it is this which I would address first. Based on the sample data provided, I can see no straightforward way of pre-processing the data to make it import correctly ('Delete all commas between concurrent sets of double quotes' is easy enough to state but more difficult to code).

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

GN-340682 SSC-Addicted Points: 455 More actions · Answer 8

Thanks to everyone who posted. It's nice to know I'm not going mad, and it is an odd file format 🙂

Hopefully they'll be able to change format and I won't have to code around it.

Cheers

George

GN-340682 SSC-Addicted Points: 455 More actions · Answer 9

Just to add a little more detail to this. After much fiddling I managed to get the data to import to the load table. This is how it's being imported (I've turned each column in to a row to make it visually easier to see):

"1

400.00"

XXXXX

20.00

70.00

"1

400.0

As you can see SSIS has completely ignored the fact I've specified a text qualifier of " and effectively pushed all the columns along. Why does this work fine in the Flat File Connection Manager Editor (under the preview tab), but then fails so spectacularly when it comes to running it?

I'm still waiting to hear if the file format can be changed, so in the meantime any ideas on this would be useful. Thanks

Cheers

George

SQLSmasher Right there with Babe Points: 777 More actions · Answer 10

i had same issue where we use to import CSV files. Best option i would recommend is first clean up your CSV by removing columns with internal COMMA's

Good luck

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 11

SQLSmasher (11/30/2010)
i had same issue where we use to import CSV files. Best option i would recommend is first clean up your CSV by removing columns with internal COMMA's
Good luck

Any suggestions on how you would do that? Maybe some code?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SQLSmasher Right there with Babe Points: 777 More actions · Answer 12

Jeff,

The method i can straight away suggest is to use SSIS.

1. Create in-memory single column table with each csv row as table row

2. find COMMA's between "xxxx,yyyy"

3. replace with "xxxxyyyy"

4. continue with your particular SSIS job by reteriving data from this new table

Thanks

Phil Parkin SSC Guru Points: 246695 More actions · Answer 13

Part 2 is the tricky bit though. If you've done this before, I think it would be useful if you could provide a code snippet.

Tricky because

"1,400.00",17.50,300

needs to become

"1400.00",17.50,300

so the rules about which commas to remove are not straightforward ...

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

LutzM SSC Guru Points: 107049 More actions · Answer 14

Here's a *cough* loop version I just tried to solve that little puzzle. I'm more than confident there are much better solutions available...

Edit: a recursive CTE would be possible, too. But at least this one is easily identified as a loop solution... 😉

DECLARE

@t VARCHAR(30),

@repl CHAR(1),

@identifier1Pos INT,

@identifier2Pos INT,

@commaPos INT,

@lengthPos INT

SET @repl='!' -- intermediate replacement for a comma in order not to have to deal with the string changes in length

SET @t='"1400.00,",",17.50",300,",",'

SELECT

@identifier1Pos= charindex('"',@t), -- position of the first identifier

@commaPos= charindex(',',@t,@identifier1Pos+1), -- position of the first comma after the identifier just found

@identifier2Pos= charindex('"',@t,@identifier1Pos+1), -- position of the 2nd identifier

@lengthPos= len(@t) -- overall strin length

WHILE @identifier2Pos>0 and @identifier1Pos>0 and @commaPos>0 -- as long as all there are two identifier and a comma

BEGIN

IF @identifier1Pos< @commaPos and @commaPos < @identifier2Pos -- if comma is in between the two identifier

SELECT @t=stuff(@t,@commaPos,1,@repl) -- replace comma with another character

SELECT -- grab the next set of comma and identifier

@identifier1Pos=charindex('"',@t,@identifier2Pos+1),

@commaPos= charindex(',',@t,@identifier1Pos+1),

@identifier2Pos=charindex('"',@t,@identifier1Pos+1)

END

SELECT REPLACE(@t,@repl,'') -- final output, remove the intermediate character

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]