SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


BULK INSERT & Text Qualifier


BULK INSERT & Text Qualifier

Author
Message
mdjtlj
mdjtlj
Grasshopper
Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)

Group: General Forum Members
Points: 22 Visits: 28
Greetings,

I have a st*&id vendor that provides files that we need to load in the following format:

"datahere","data with a comma, more","xxx", etc...

Bastically it is comma-delimited with a text qualifier of '"'

This is giving some heartache with respect to the use of BULK INSERT, as the fieldterminator and row terminator all work correctly, the only problem is that the first field is hosed. Here's the example T-SQL:

BULK INSERT [InterPay Office].dbo.[address] FROM
"c:\data\interlake\interpay office\tempfiles\address.csv"
WITH
(DATAFILETYPE='CHAR',
FIELDTERMINATOR='","',
FIRSTROW=2,
ROWTERMINATOR='"\n')

A sample of the output would be like this:

ID Name Address etc....
"1 Michael 656 whereever
"2 Joe 12345 whatever

Anybody have any suggestions about how to fix this. The sp is going to be run nightly, where we have about 15 tables to import.

I guess that I could do the following as alternatives:

1. Take the data as is and place another sp that does the replace of '"' with '' via a cursor
2. Use VB to redo their files into a "decent" format.

DTS allows you to specify the delimiter, text qualifier, and row terminator, but all of this functionality is not available in this method.

Anyone have more suggestions?



Jonathan
Jonathan
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1249 Visits: 54
Not sure I understand your question, but did you try SET QUOTED_IDENTIFIER OFF at the beginning of the script?

--Jonathan



--Jonathan
mdjtlj
mdjtlj
Grasshopper
Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)

Group: General Forum Members
Points: 22 Visits: 28
Yes,

SET QUOTED_IDENTIFIER OFF was set at the top of the script. The basic problem is that I've got a file with data fields such as:

"3637","","","","1505 London Road",......
and using the BULK INSERT command (as shown) it is giving

"3637 <NULL> <NULL> <NULL> 1505 London Road

Everything is right, except for the "3637 which keeps on picking up the ".

Hopefully that is clearer.



Jonathan
Jonathan
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1249 Visits: 54
See if you can get it working using bcp interactively and then use the generated format file with BULK INSERT.

--Jonathan



--Jonathan
ThomasH
ThomasH
Mr or Mrs. 500
Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)Mr or Mrs. 500 (521 reputation)

Group: General Forum Members
Points: 521 Visits: 1
You can try to set your rowterminator to '"/n'. This way, SQL will "eat" the doublequote that starts each record after the 1st record. It might be cleaner in using a format file since each column delimiter may be specified individually.



Once you understand the BITs, all the pieces come together
Mark Harr
Mark Harr
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1584 Visits: 191
quote:

You can try to set your rowterminator to '"/n'.



Nice idea, Thomas. Although I think you meant using
'\n"'
, where the doublequote is after the newline tag.

I had a similar situation, but solved it with using a format file to skip the first character, using a single doublequote as field terminator, and skipping the 2 char field ',"' between each field. I certainly like the combination of using the 3 char field terminator and 2 char row terminator, it would have saved some time developing and testing the format file.



Mark
Jeremy Kemp
Jeremy Kemp
SSC-Addicted
SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)SSC-Addicted (452 reputation)

Group: General Forum Members
Points: 452 Visits: 1
On a similar vein, I have used a field terminator or "," (double quote comman double quote).

The downside to this is that the first double quote and the last double quote in the line are not removed so I had to remove them with string manipulation.

select substring(col1,1,len(col1)-1),...., left(collast,len(collast)-1)
etc.


Jeremy



Mike Mortensen
Mike Mortensen
SSC Rookie
SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)SSC Rookie (35 reputation)

Group: General Forum Members
Points: 35 Visits: 24
I've had to figure this out in the past. Here's what you need to do, using your example.

"datahere","data with a comma, more","xxx"

FMT file:
7.0
3
1 SQLCHAR 0 10 "\",\"" 1 DataHereColumn
2 SQLCHAR 0 30 "\",\"" 2 DataWithCommaColumn
3 SQLCHAR 0 3 "\"\r\n" 3 xxxColumn

Using \",\" will treat the "," as the field terminator. You need to include the forward slash before the double quote mark so the format treats it as a real character.

BULK INSERT query:
BULK INSERT database.owner.targettable FROM 'drive:\path\file.txt' WITH (FIRSTROW = 2, FORMATFILE = 'drive:\path\formatfile.fmt')

The first record should look like this when you're done:
Field1: "datahere
Field2: data with a comma, more
Field3: xxx

Next, you need to remove the leading double quote from Field1 for ALL records. If you do not have double quotes in the data, just use:
UPDATE database.owner.table
SET Field1 = REPLACE(Field1, '"', '')

Let me know if this doesn't work for you!


Edited by - mmortensen on 11/18/2003 12:01:58 PM



sainswor99
sainswor99
Grasshopper
Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)Grasshopper (20 reputation)

Group: General Forum Members
Points: 20 Visits: 1
To be honest, it sounds more like a business relationship problem rather than a programming problem; how good is your relationship with this vendor? All of the solutions below are creative and may solve the immediate problem, but does it cause more problems in the long run? Does the vendor understand that you are relying on a daily batch of this data, and are they committed to maintaining this non-standard format?

On the other hand, have you considered using DTS with an ActiveX script to transform the first column? Not sure if it will provide you with the speed you need, but it might be comparable. How much data per night are you talking; couple thousand rows or a couple million?

HTH,
Stu



mdjtlj
mdjtlj
Grasshopper
Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)

Group: General Forum Members
Points: 22 Visits: 28
Greetings,

Thanks for you all of your support and thoughts on the matter. I've spent quite a few hours putting together format files, testing them and getting really pi#$ed.

Using the bcp to write the format file using the -n switch does not give the required prefix, so I ended up using a combination of the automatically generated as well as the format file that is created when you try to do a load where the data types are not specified (this gives a prefix). Took both of these fmt files and merged them to get what I'd consider a good format file.

Out of the 26 tables that I have to do this on, it worked on 24 tables. The only common thread that I can find is that these two tables have over 170 columns. The BULK INSERT and bcp both blow up, saying that the field length is too long for column 1 (which is a SQLINT 4). Rather than try to figure out how to cut down on the column load (not practical anyway) to find the breaking point, I decided on an alternate loading mechanism.

Ultimately, I've ended up putting together a generic CSV to XML translator and then used SQLXML 3.0 bulk load to force the data in. Decided against using .NET datasets, due to the slow nature of the load.

Would be interested in hearing any suggestions on the XML load speed issue (50,000 rows usually for each table)


Relationship with the vendor? Let's put it this way, they are using MDBS Titanium 6.1f (www.mdbs.com), navigational model, as the datastore and generally are not very responsive to requests such as this. The only saving grace is that they will be moving to mySQL in an upcoming release (please insert bashing here), so I'll be able to reach in and get what I need when needed.

Many thanks for all your help.



Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search