Bulk insert with format file - handling quotation mark text qualifiers in header row

Question

Bulk insert with format file - handling quotation mark text qualifiers in header row

caspersql

SSCrazy

Points: 2276
More actions
October 9, 2013 at 5:57 am

#277890

I'm trying to use BULK insert a flat file.
The file has a header row. In addition to the delimiter character which is , the header row contains text qualifiers: "
The text file looks like this when opened in notepad:
"Column1Name","Column2Name","Column3Name"
"Row2Column1Data","Row2Column2Data","Row2Column3Data"
I would like to use the bulk insert command with the format file option to import the data into a table in my database.
The table in my database looks like this:
CREATE TABLE [dbo].[BulkInsertedData](
[Column1Name] [nvarchar](4000) NULL,
[Column2Name] [nvarchar](4000) NULL,
[Column3Name] [nvarchar](4000) NULL
) ON [PRIMARY]
If I try to import from a flat file that is identical in all respects except that does not contain delimiters, I can import without any problems. Without delimiters, the format file looks like this:
<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR="\r" MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>
The bulk insert command I use to get the data in looks like this:
BULK INSERT BulkInsertedData
FROM 'E:\1.txt'
WITH
(
FORMATFILE = 'E:\FormatFile.txt',
FIRSTROW = 2, -- first row has column headings
KEEPIDENTITY
)
My problem is - I can't figure out how to change the format file so that the bulk insert works when the header data is surrounded by text qualifiers as per the text file described above. I'm guessing the problem is escaping certain characters, but after hacking at it for a few hours, I haven't been able to figure out where to put the escape characters. Has anyone else successfully done this and if so, how did your XML format file differ from mine?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

caspersql SSCrazy Points: 2276 More actions · Answer 1

Okay after hammering away for a while I figured this out. Since my header row contains text qualifiers, I had assumed that I needed to include those in my XML format file 'Field ID' and 'Column Source' attributes. However that wasn't required. All I needed to do was include the text qualifiers in the 'terminator' attribute.

This format file works for my sample text file:

<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR='\r' MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>

This still leaves the text qualifier as the first character in column1 and the last character of the last column in my imported data. However I can just use T-SQL string manipulation after the import to remove those per this thread: http://www.sqlservercentral.com/Forums/FindPost87417.aspx

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 2

caspersql (10/9/2013)
Okay after hammering away for a while I figured this out. Since my header row contains text qualifiers, I had assumed that I needed to include those in my XML format file 'Field ID' and 'Column Source' attributes. However that wasn't required. All I needed to do was include the text qualifiers in the 'terminator' attribute.
This format file works for my sample text file:
<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR='\r' MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>
This still leaves the text qualifier as the first character in column1 and the last character of the last column in my imported data. However I can just use T-SQL string manipulation after the import to remove those per this thread: http://www.sqlservercentral.com/Forums/FindPost87417.aspx

I hate the XML format files. Requires two different line types for each column, is tag bloated, generally looks like crap, and etc. 🙂

Try using a standard format file. Much easier to grasp IMHO. I haven't tested this particular one but should auto-magically get rid of the first and last quote...

10.0

4

1 SQLCHAR 0 1 "" 0 FirstQuote ""

2 SQLCHAR 0 8000 "\",\"" 1 Column1Name ""

3 SQLCHAR 0 8000 "\",\"" 2 Column2Name ""

4 SQLCHAR 0 8000 "\"\r\ n" 3 Column3Name ""

NOTE: REMOVE THE SPACE FROM BETWEEN THE \ and the n in row 4. This forum "eats" that particular combination of characters.

It also makes the column widths wider than the columns in the table so that it auto-magically checks for oversize data. The import will fail if the data is wider than the columns in the table. You could sequester the failed rows by using the error file feature of either BCP or BULK INSERT.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

caspersql SSCrazy Points: 2276 More actions · Answer 3

Thanks Jeff, I like this idea because I'll be generating the format file using BCP and dynamic SQL. I read here http://social.msdn.microsoft.com/Forums/sqlserver/en-US/5e996cc4-466a-4569-912a-8782beba6806/bcp-and-dynamic-sql?forum=transactsql that there's a 4000 character limit for that so the less text in my format file the better!

When I use your suggested version, I still get a single text qualifier left in the last column of the last row of my data. I guess there's no /r/ n at the end of that line because there are no further rows in the file. Do you experience that and if so, do you just use string manipulation afterwards to remove the offending qualifier?

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 4

caspersql (10/10/2013)
Thanks Jeff, I like this idea because I'll be generating the format file using BCP and dynamic SQL. I read here http://social.msdn.microsoft.com/Forums/sqlserver/en-US/5e996cc4-466a-4569-912a-8782beba6806/bcp-and-dynamic-sql?forum=transactsql that there's a 4000 character limit for that so the less text in my format file the better!
When I use your suggested version, I still get a single text qualifier left in the last column of the last row of my data. I guess there's no /r/ n at the end of that line because there are no further rows in the file. Do you experience that and if so, do you just use string manipulation afterwards to remove the offending qualifier?

Ugh!... Yeah... it's probably because there's no ^C_R^L_F at the end of the file. That's the age old problem of the sender of the data not using a consistant format. I'm actually a bit surprised it didn't cause an error.

And, yes... some "post import" validation is essential. That's why I always (and I don't say that word very often in anything having to do with computers) load the data into a staging table instead of into final tables. That's where you can check and correct the final column for the common error that you're running across.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 5

I could certainly be wrong but I'll also add that I doubt that BCP has a 4K limit for the format file especially since they allow XML formats. I've not found any documentation in Books Online that speaks of any such limit. These types of problems usually arise because someone doesn't realize that dynamic SQL results sometimes truncate to 4 or 8k if all of the variables being concatenated aren't all of the MAX datatype. To overcome that without setting all the variables to be so large, you can usually get away with making a CAST to a MAX datatype as the most outer function applied to the concatenation used to build the dynamic SQL.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

caspersql SSCrazy Points: 2276 More actions · Answer 6

Okay thanks for the insight Jeff, much appreciated.

vinaypandey28 8266 Valued Member Points: 67 More actions · Answer 7

Hi Jeff,

I am facing difficulty when column row is not having any text qualifier but data rows have.

what would be the format file structure in this case.

--Data file structure

Column1Name,Column2Name,Column3Name

"Row2Column1Data","Row2Column2Data","Row2Column3Data"

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 8

vinaypandey28 8266 (8/5/2016)
Hi Jeff,
I am facing difficulty when column row is not having any text qualifier but data rows have.
what would be the format file structure in this case.
--Data file structure
Column1Name,Column2Name,Column3Name
"Row2Column1Data","Row2Column2Data","Row2Column3Data"

What difficulty are you having? Are you using the "FirstRow" setting to skip the header? Also, can you post the command and the format file you're having problems with?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)