Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Bulk insert with format file - handling quotation mark text qualifiers in header row Expand / Collapse
Author
Message
Posted Wednesday, October 9, 2013 5:57 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, October 10, 2013 8:39 AM
Points: 47, Visits: 227
I'm trying to use BULK insert a flat file.

The file has a header row. In addition to the delimiter character which is
,
the header row contains text qualifiers:
"


The text file looks like this when opened in notepad:
"Column1Name","Column2Name","Column3Name"
"Row2Column1Data","Row2Column2Data","Row2Column3Data"


I would like to use the bulk insert command with the format file option to import the data into a table in my database.

The table in my database looks like this:
CREATE TABLE [dbo].[BulkInsertedData](
[Column1Name] [nvarchar](4000) NULL,
[Column2Name] [nvarchar](4000) NULL,
[Column3Name] [nvarchar](4000) NULL
) ON [PRIMARY]

If I try to import from a flat file that is identical in all respects except that does not contain delimiters, I can import without any problems. Without delimiters, the format file looks like this:
<?xml version="1.0"?><BCPFORMAT    xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">  <RECORD>    <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="4000"/>     <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="4000"/>     <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR="\r" MAX_LENGTH="4000"/>   </RECORD>  <ROW>    <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/>    <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/>    <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/>      </ROW></BCPFORMAT>


The bulk insert command I use to get the data in looks like this:
BULK INSERT BulkInsertedData
FROM 'E:\1.txt'
WITH
(
FORMATFILE = 'E:\FormatFile.txt',
FIRSTROW = 2, -- first row has column headings
KEEPIDENTITY
)

My problem is - I can't figure out how to change the format file so that the bulk insert works when the header data is surrounded by text qualifiers as per the text file described above. I'm guessing the problem is escaping certain characters, but after hacking at it for a few hours, I haven't been able to figure out where to put the escape characters. Has anyone else successfully done this and if so, how did your XML format file differ from mine?
Post #1503039
Posted Wednesday, October 9, 2013 9:43 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, October 10, 2013 8:39 AM
Points: 47, Visits: 227
Okay after hammering away for a while I figured this out. Since my header row contains text qualifiers, I had assumed that I needed to include those in my XML format file 'Field ID' and 'Column Source' attributes. However that wasn't required. All I needed to do was include the text qualifiers in the 'terminator' attribute.

This format file works for my sample text file:
<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR='\r' MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>

This still leaves the text qualifier as the first character in column1 and the last character of the last column in my imported data. However I can just use T-SQL string manipulation after the import to remove those per this thread: http://www.sqlservercentral.com/Forums/FindPost87417.aspx
Post #1503176
Posted Wednesday, October 9, 2013 9:21 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:03 PM
Points: 36,983, Visits: 31,508
caspersql (10/9/2013)
Okay after hammering away for a while I figured this out. Since my header row contains text qualifiers, I had assumed that I needed to include those in my XML format file 'Field ID' and 'Column Source' attributes. However that wasn't required. All I needed to do was include the text qualifiers in the 'terminator' attribute.
This format file works for my sample text file:
<?xml version="1.0"?><BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <RECORD> <FIELD ID="Column1Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column2Name" xsi:type="CharTerm" TERMINATOR='\",\"' MAX_LENGTH="4000"/> <FIELD ID="Column3Name" xsi:type="CharTerm" TERMINATOR='\r' MAX_LENGTH="4000"/> </RECORD> <ROW> <COLUMN SOURCE="Column1Name" NAME="Column1Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column2Name" NAME="Column2Name" xsi:type="SQLVARYCHAR"/> <COLUMN SOURCE="Column3Name" NAME="Column3Name" xsi:type="SQLVARYCHAR"/> </ROW></BCPFORMAT>

This still leaves the text qualifier as the first character in column1 and the last character of the last column in my imported data. However I can just use T-SQL string manipulation after the import to remove those per this thread: http://www.sqlservercentral.com/Forums/FindPost87417.aspx


I hate the XML format files. Requires two different line types for each column, is tag bloated, generally looks like crap, and etc.

Try using a standard format file. Much easier to grasp IMHO. I haven't tested this particular one but should auto-magically get rid of the first and last quote...

10.0
4
1 SQLCHAR 0 1 "" 0 FirstQuote ""
2 SQLCHAR 0 8000 "\",\"" 1 Column1Name ""
3 SQLCHAR 0 8000 "\",\"" 2 Column2Name ""
4 SQLCHAR 0 8000 "\"\r\ n" 3 Column3Name ""

NOTE: REMOVE THE SPACE FROM BETWEEN THE \ and the n in row 4. This forum "eats" that particular combination of characters.


It also makes the column widths wider than the columns in the table so that it auto-magically checks for oversize data. The import will fail if the data is wider than the columns in the table. You could sequester the failed rows by using the error file feature of either BCP or BULK INSERT.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1503359
Posted Thursday, October 10, 2013 3:28 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, October 10, 2013 8:39 AM
Points: 47, Visits: 227
Thanks Jeff, I like this idea because I'll be generating the format file using BCP and dynamic SQL. I read here http://social.msdn.microsoft.com/Forums/sqlserver/en-US/5e996cc4-466a-4569-912a-8782beba6806/bcp-and-dynamic-sql?forum=transactsql that there's a 4000 character limit for that so the less text in my format file the better!

When I use your suggested version, I still get a single text qualifier left in the last column of the last row of my data. I guess there's no /r/ n at the end of that line because there are no further rows in the file. Do you experience that and if so, do you just use string manipulation afterwards to remove the offending qualifier?
Post #1503433
Posted Thursday, October 10, 2013 8:23 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:03 PM
Points: 36,983, Visits: 31,508
caspersql (10/10/2013)
Thanks Jeff, I like this idea because I'll be generating the format file using BCP and dynamic SQL. I read here http://social.msdn.microsoft.com/Forums/sqlserver/en-US/5e996cc4-466a-4569-912a-8782beba6806/bcp-and-dynamic-sql?forum=transactsql that there's a 4000 character limit for that so the less text in my format file the better!

When I use your suggested version, I still get a single text qualifier left in the last column of the last row of my data. I guess there's no /r/ n at the end of that line because there are no further rows in the file. Do you experience that and if so, do you just use string manipulation afterwards to remove the offending qualifier?


Ugh!... Yeah... it's probably because there's no CRLF at the end of the file. That's the age old problem of the sender of the data not using a consistant format. I'm actually a bit surprised it didn't cause an error.

And, yes... some "post import" validation is essential. That's why I always (and I don't say that word very often in anything having to do with computers) load the data into a staging table instead of into final tables. That's where you can check and correct the final column for the common error that you're running across.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1503616
Posted Thursday, October 10, 2013 8:35 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 4:03 PM
Points: 36,983, Visits: 31,508
I could certainly be wrong but I'll also add that I doubt that BCP has a 4K limit for the format file especially since they allow XML formats. I've not found any documentation in Books Online that speaks of any such limit. These types of problems usually arise because someone doesn't realize that dynamic SQL results sometimes truncate to 4 or 8k if all of the variables being concatenated aren't all of the MAX datatype. To overcome that without setting all the variables to be so large, you can usually get away with making a CAST to a MAX datatype as the most outer function applied to the concatenation used to build the dynamic SQL.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1503626
Posted Thursday, October 10, 2013 8:44 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, October 10, 2013 8:39 AM
Points: 47, Visits: 227
Okay thanks for the insight Jeff, much appreciated.
Post #1503630
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse