SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Bulk insert with format file - handling quotation mark text qualifiers in header row


Bulk insert with format file - handling quotation mark text qualifiers in header row

Author
Message
caspersql
caspersql
Mr or Mrs. 500
Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)

Group: General Forum Members
Points: 580 Visits: 252
I'm trying to use BULK insert a flat file.

The file has a header row. In addition to the delimiter character which is
,
the header row contains text qualifiers:
"


The text file looks like this when opened in notepad:
"Column1Name","Column2Name","Column3Name"
"Row2Column1Data","Row2Column2Data","Row2Column3Data"


I would like to use the bulk insert command with the format file option to import the data into a table in my database.

The table in my database looks like this:
CREATE TABLE [dbo].[BulkInsertedData](
[Column1Name] [nvarchar](4000) NULL,
[Column2Name] [nvarchar](4000) NULL,
[Column3Name] [nvarchar](4000) NULL
) ON [PRIMARY]



If I try to import from a flat file that is identical in all respects except that does not contain delimiters, I can import without any problems. Without delimiters, the format file looks like this:
<?xml version="1.0"?><BCPFORMAT    xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">  <RECORD>    <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="4000"/>     <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="4000"/>     <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR="\r" MAX_LENGTH="4000"/>   </RECORD>  <ROW>    <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/>    <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/>    <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/>      </ROW></BCPFORMAT>


The bulk insert command I use to get the data in looks like this:
BULK INSERT BulkInsertedData
FROM 'E:\1.txt'
WITH
(
FORMATFILE = 'E:\FormatFile.txt',
FIRSTROW = 2, -- first row has column headings
KEEPIDENTITY
)



My problem is - I can't figure out how to change the format file so that the bulk insert works when the header data is surrounded by text qualifiers as per the text file described above. I'm guessing the problem is escaping certain characters, but after hacking at it for a few hours, I haven't been able to figure out where to put the escape characters. Has anyone else successfully done this and if so, how did your XML format file differ from mine?
caspersql
caspersql
Mr or Mrs. 500
Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)

Group: General Forum Members
Points: 580 Visits: 252
Okay after hammering away for a while I figured this out. Since my header row contains text qualifiers, I had assumed that I needed to include those in my XML format file 'Field ID' and 'Column Source' attributes. However that wasn't required. All I needed to do was include the text qualifiers in the 'terminator' attribute.

This format file works for my sample text file:
<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR='\r' MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>

This still leaves the text qualifier as the first character in column1 and the last character of the last column in my imported data. However I can just use T-SQL string manipulation after the import to remove those per this thread: http://www.sqlservercentral.com/Forums/FindPost87417.aspx
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)

Group: General Forum Members
Points: 218219 Visits: 41995
caspersql (10/9/2013)
Okay after hammering away for a while I figured this out. Since my header row contains text qualifiers, I had assumed that I needed to include those in my XML format file 'Field ID' and 'Column Source' attributes. However that wasn't required. All I needed to do was include the text qualifiers in the 'terminator' attribute.
This format file works for my sample text file:
<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR='\r' MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>

This still leaves the text qualifier as the first character in column1 and the last character of the last column in my imported data. However I can just use T-SQL string manipulation after the import to remove those per this thread: http://www.sqlservercentral.com/Forums/FindPost87417.aspx


I hate the XML format files. Requires two different line types for each column, is tag bloated, generally looks like crap, and etc. :-)

Try using a standard format file. Much easier to grasp IMHO. I haven't tested this particular one but should auto-magically get rid of the first and last quote...


10.0
4
1 SQLCHAR 0 1 "" 0 FirstQuote ""
2 SQLCHAR 0 8000 "\",\"" 1 Column1Name ""
3 SQLCHAR 0 8000 "\",\"" 2 Column2Name ""
4 SQLCHAR 0 8000 "\"\r\ n" 3 Column3Name ""

NOTE: REMOVE THE SPACE FROM BETWEEN THE \ and the n in row 4. This forum "eats" that particular combination of characters.




It also makes the column widths wider than the columns in the table so that it auto-magically checks for oversize data. The import will fail if the data is wider than the columns in the table. You could sequester the failed rows by using the error file feature of either BCP or BULK INSERT.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
caspersql
caspersql
Mr or Mrs. 500
Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)

Group: General Forum Members
Points: 580 Visits: 252
Thanks Jeff, I like this idea because I'll be generating the format file using BCP and dynamic SQL. I read here http://social.msdn.microsoft.com/Forums/sqlserver/en-US/5e996cc4-466a-4569-912a-8782beba6806/bcp-and-dynamic-sql?forum=transactsql that there's a 4000 character limit for that so the less text in my format file the better!

When I use your suggested version, I still get a single text qualifier left in the last column of the last row of my data. I guess there's no /r/ n at the end of that line because there are no further rows in the file. Do you experience that and if so, do you just use string manipulation afterwards to remove the offending qualifier?
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)

Group: General Forum Members
Points: 218219 Visits: 41995
caspersql (10/10/2013)
Thanks Jeff, I like this idea because I'll be generating the format file using BCP and dynamic SQL. I read here http://social.msdn.microsoft.com/Forums/sqlserver/en-US/5e996cc4-466a-4569-912a-8782beba6806/bcp-and-dynamic-sql?forum=transactsql that there's a 4000 character limit for that so the less text in my format file the better!

When I use your suggested version, I still get a single text qualifier left in the last column of the last row of my data. I guess there's no /r/ n at the end of that line because there are no further rows in the file. Do you experience that and if so, do you just use string manipulation afterwards to remove the offending qualifier?


Ugh!... Yeah... it's probably because there's no CRLF at the end of the file. That's the age old problem of the sender of the data not using a consistant format. I'm actually a bit surprised it didn't cause an error.

And, yes... some "post import" validation is essential. That's why I always (and I don't say that word very often in anything having to do with computers) load the data into a staging table instead of into final tables. That's where you can check and correct the final column for the common error that you're running across.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)

Group: General Forum Members
Points: 218219 Visits: 41995
I could certainly be wrong but I'll also add that I doubt that BCP has a 4K limit for the format file especially since they allow XML formats. I've not found any documentation in Books Online that speaks of any such limit. These types of problems usually arise because someone doesn't realize that dynamic SQL results sometimes truncate to 4 or 8k if all of the variables being concatenated aren't all of the MAX datatype. To overcome that without setting all the variables to be so large, you can usually get away with making a CAST to a MAX datatype as the most outer function applied to the concatenation used to build the dynamic SQL.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
caspersql
caspersql
Mr or Mrs. 500
Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)Mr or Mrs. 500 (580 reputation)

Group: General Forum Members
Points: 580 Visits: 252
Okay thanks for the insight Jeff, much appreciated.
vinaypandey28 8266
vinaypandey28 8266
SSC Rookie
SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)SSC Rookie (31 reputation)

Group: General Forum Members
Points: 31 Visits: 0
Hi Jeff,

I am facing difficulty when column row is not having any text qualifier but data rows have.
what would be the format file structure in this case.

--Data file structure
Column1Name,Column2Name,Column3Name
"Row2Column1Data","Row2Column2Data","Row2Column3Data"
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)

Group: General Forum Members
Points: 218219 Visits: 41995
vinaypandey28 8266 (8/5/2016)
Hi Jeff,

I am facing difficulty when column row is not having any text qualifier but data rows have.
what would be the format file structure in this case.

--Data file structure
Column1Name,Column2Name,Column3Name
"Row2Column1Data","Row2Column2Data","Row2Column3Data"


What difficulty are you having? Are you using the "FirstRow" setting to skip the header? Also, can you post the command and the format file you're having problems with?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search