BULK INSERT & Text Qualifier
Posted Monday, November 17, 2003 10:06 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, April 2, 2014 4:26 AM
Points: 4, Visits: 25
Greetings,

I have a st*&id vendor that provides files that we need to load in the following format:

"datahere","data with a comma, more","xxx", etc...

Basically it is comma-delimited with a text qualifier of '"'.

This is causing some heartache with BULK INSERT: the field terminator and row terminator work correctly, but the first field is hosed. Here's the example T-SQL:

BULK INSERT [InterPay Office].dbo.[address] FROM
"c:\data\interlake\interpay office\tempfiles\address.csv"
WITH
(DATAFILETYPE='CHAR',
FIELDTERMINATOR='","',
FIRSTROW=2,
ROWTERMINATOR='"')

A sample of the output would be like this:

ID Name Address etc....
"1 Michael 656 whereever
"2 Joe 12345 whatever

Anybody have any suggestions about how to fix this? The sp is going to be run nightly, and we have about 15 tables to import.

I guess that I could do the following as alternatives:

1. Take the data as is and add another sp that replaces '"' with '' via a cursor.
2. Use VB to rewrite their files into a "decent" format.

DTS allows you to specify the delimiter, text qualifier, and row terminator, but none of that functionality is available with BULK INSERT.

Anyone have more suggestions?




Post #18289
Posted Monday, November 17, 2003 10:31 AM
SSC Eights!


Group: General Forum Members
Last Login: Wednesday, September 23, 2009 9:40 AM
Points: 940, Visits: 54
Not sure I understand your question, but did you try SET QUOTED_IDENTIFIER OFF at the beginning of the script?

--Jonathan




Post #87412
Posted Monday, November 17, 2003 10:45 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, April 2, 2014 4:26 AM
Points: 4, Visits: 25
Yes,

SET QUOTED_IDENTIFIER OFF was set at the top of the script. The basic problem is that I've got a file with data fields such as:

"3637","","","","1505 London Road",......
and the BULK INSERT command (as shown) gives:

"3637 <NULL> <NULL> <NULL> 1505 London Road

Everything is right except for the "3637, which keeps picking up the ".

Hopefully that is clearer.




Post #87413
Posted Monday, November 17, 2003 11:27 AM
SSC Eights!


Group: General Forum Members
Last Login: Wednesday, September 23, 2009 9:40 AM
Points: 940, Visits: 54
See if you can get it working using bcp interactively and then use the generated format file with BULK INSERT.
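
For example, something along these lines will generate a format file you can then hand-edit (the server name is a placeholder, and the exact bcp syntax may vary by version):

bcp "[InterPay Office].dbo.address" format nul -c -f address.fmt -S yourserver -T

The format nul mode writes the format file without copying any data; -c starts from character-mode defaults and -T uses a trusted connection.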

--Jonathan




Post #87414
Posted Monday, November 17, 2003 12:57 PM
Old Hand


Group: General Forum Members
Last Login: Tuesday, March 29, 2005 2:31 PM
Points: 393, Visits: 1
You can try to set your rowterminator to '"\n'. This way, SQL will "eat" the doublequote that starts each record after the 1st record. It might be cleaner to use a format file, since each column delimiter can be specified individually.
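
A minimal sketch of that idea against the file from the opening post (untested, so the terminator behaviour would need verifying against the real data):

BULK INSERT [InterPay Office].dbo.[address]
FROM 'c:\data\interlake\interpay office\tempfiles\address.csv'
WITH
(
    DATAFILETYPE = 'CHAR',
    FIELDTERMINATOR = '","',  -- 3 chars: strips the quotes around each comma
    ROWTERMINATOR = '"\n',    -- 2 chars: strips the closing quote and the newline
    FIRSTROW = 2
)

-- Any quote left at the start of a row's first field is not consumed
-- by these terminators and would still need cleaning up afterwards.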





Once you understand the BITs, all the pieces come together
Post #87415
Posted Tuesday, November 18, 2003 5:54 AM


Ten Centuries


Group: General Forum Members
Last Login: Sunday, October 19, 2014 4:48 AM
Points: 1,327, Visits: 156
quote:

You can try to set your rowterminator to '"\n'.

Nice idea, Thomas. Although I think you meant using '\n"', where the doublequote comes after the newline.

I had a similar situation, but solved it by using a format file to skip the first character, using a single doublequote as the field terminator, and skipping the 2-char ',"' field between each pair of columns. I do like the combination of the 3-char field terminator and 2-char row terminator, though; it would have saved some time developing and testing the format file.
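
For reference, a sketch of that kind of format file against the three-column sample from the opening post (the column names and lengths are made up, and it would need testing):

7.0
6
1   SQLCHAR   0   1     ""         0   Skip_OpeningQuote
2   SQLCHAR   0   100   "\""       1   Col1
3   SQLCHAR   0   2     ""         0   Skip_Sep1
4   SQLCHAR   0   100   "\""       2   Col2
5   SQLCHAR   0   2     ""         0   Skip_Sep2
6   SQLCHAR   0   100   "\"\r\n"   3   Col3

Host-file fields mapped to server column 0 are discarded, which is how the opening quote and each 2-char ',"' separator get skipped; the fixed 1- and 2-byte lengths with an empty terminator absorb exactly those characters.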





Mark
Post #87416
Posted Tuesday, November 18, 2003 6:58 AM
Old Hand


Group: General Forum Members
Last Login: Thursday, March 23, 2006 6:53 AM
Points: 348, Visits: 1
In a similar vein, I have used a field terminator of "," (double quote, comma, double quote).

The downside to this is that the first double quote and the last double quote in the line are not removed, so I had to remove them with string manipulation:

select substring(col1, 2, len(col1) - 1),  -- strip the leading "
       ....,
       left(collast, len(collast) - 1)     -- strip the trailing "
etc.


Jeremy




Post #87417
Posted Tuesday, November 18, 2003 11:53 AM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, March 4, 2014 12:27 PM
Points: 13, Visits: 19
I've had to figure this out in the past. Here's what you need to do, using your example.

"datahere","data with a comma, more","xxx"

FMT file:

7.0
3
1   SQLCHAR   0   10   "\",\""   1   DataHereColumn
2   SQLCHAR   0   30   "\",\""   2   DataWithCommaColumn
3   SQLCHAR   0   3    "\"\r"    3   xxxColumn

Using \",\" will treat the "," as the field terminator. You need to include the forward slash before the double quote mark so the format treats it as a real character.

BULK INSERT query:
BULK INSERT database.owner.targettable
FROM 'drive:\path\file.txt'
WITH (FIRSTROW = 2, FORMATFILE = 'drive:\path\formatfile.fmt')

The first record should look like this when you're done:
Field1: "datahere
Field2: data with a comma, more
Field3: xxx

Next, you need to remove the leading double quote from Field1 for ALL records. If you do not have double quotes in the data, just use:
UPDATE database.owner.table
SET Field1 = REPLACE(Field1, '"', '')
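
If embedded double quotes are possible in the data, a safer variant (a sketch, using the same hypothetical table and column) strips only the first character:

UPDATE database.owner.targettable
SET Field1 = STUFF(Field1, 1, 1, '')  -- delete just the leading character
WHERE LEFT(Field1, 1) = '"'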

Let me know if this doesn't work for you!


Edited by - mmortensen on 11/18/2003 12:01:58 PM



Post #87418
Posted Tuesday, November 18, 2003 9:38 PM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, June 13, 2007 6:00 AM
Points: 8, Visits: 1
To be honest, it sounds more like a business-relationship problem than a programming problem; how good is your relationship with this vendor? All of the solutions above are creative and may solve the immediate problem, but do they cause more problems in the long run? Does the vendor understand that you are relying on a daily batch of this data, and are they committed to maintaining this non-standard format?

On the other hand, have you considered using DTS with an ActiveX script to transform the first column? Not sure it will give you the speed you need, but it might be comparable. How much data per night are we talking about: a couple thousand rows or a couple million?

HTH,
Stu




Post #87419
Posted Thursday, November 20, 2003 8:32 AM
Forum Newbie


Group: General Forum Members
Last Login: Wednesday, April 2, 2014 4:26 AM
Points: 4, Visits: 25
Greetings,

Thanks to you all for your support and thoughts on the matter. I've spent quite a few hours putting format files together, testing them, and getting really pi#$ed.

Using bcp with the -n switch to write the format file does not give the required prefix, so I ended up combining the automatically generated format file with the one that is created when you attempt a load without specifying the data types (which does give a prefix). I took both of these fmt files and merged them into what I'd consider a good format file.

Out of the 26 tables that I have to do this on, it worked on 24. The only common thread I can find is that the two failing tables have over 170 columns. BULK INSERT and bcp both blow up, saying that the field length is too long for column 1 (which is a SQLINT of length 4). Rather than trying to figure out how to cut down the column load (not practical anyway) to find the breaking point, I decided on an alternate loading mechanism.

Ultimately, I ended up putting together a generic CSV-to-XML translator and then used SQLXML 3.0 bulk load to force the data in. I decided against using .NET DataSets, due to the slow nature of that load.

I'd be interested in hearing any suggestions on the XML load speed issue (usually 50,000 rows for each table).
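
For anyone curious, the bulk load can be driven from T-SQL via COM automation along these lines (a sketch; the schema and file paths are placeholders, and checking each @hr return value is omitted):

DECLARE @obj int, @hr int
-- create the SQLXML 3.0 bulk load COM object
EXEC @hr = sp_OACreate 'SQLXMLBulkLoad.SQLXMLBulkload.3.0', @obj OUT
EXEC @hr = sp_OASetProperty @obj, 'ConnectionString', 'provider=SQLOLEDB;data source=(local);database=InterPay Office;integrated security=SSPI'
EXEC @hr = sp_OASetProperty @obj, 'ErrorLogFile', 'c:\load\error.log'
-- Execute takes the mapping schema and the XML data file
EXEC @hr = sp_OAMethod @obj, 'Execute', NULL, 'c:\load\address.xsd', 'c:\load\address.xml'
EXEC sp_OADestroy @obj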


Relationship with the vendor? Let's put it this way: they are using MDBS Titanium 6.1f (www.mdbs.com), a navigational-model datastore, and they are generally not very responsive to requests such as this. The only saving grace is that they will be moving to MySQL in an upcoming release (please insert bashing here), so I'll be able to reach in and get what I need when needed.

Many thanks for all your help.




Post #87420