BULK INSERT & Text Qualifier

  • Greetings,

    I have a st*&id vendor that provides files that we need to load in the following format:

    "datahere","data with a comma, more","xxx", etc...

Basically it is comma-delimited with a text qualifier of '"'.

This is giving me some heartache with BULK INSERT: the field terminator and row terminator work correctly; the only problem is that the first field is hosed. Here's the example T-SQL:

BULK INSERT [InterPay Office].dbo.[address]
FROM "c:\data\interlake\interpay office\tempfiles\address.csv"
WITH
(DATAFILETYPE = 'CHAR',
FIELDTERMINATOR = '","',
FIRSTROW = 2,
ROWTERMINATOR = '"\n')

A sample of the output looks like this:

ID    Name       Address           etc....
"1    Michael    656 wherever
"2    Joe        12345 whatever

Anybody have any suggestions about how to fix this? The sp is going to be run nightly, and we have about 15 tables to import.

    I guess that I could do the following as alternatives:

1. Take the data as is and have another sp replace '"' with '' via a cursor (a set-based sketch follows below).

    2. Use VB to redo their files into a "decent" format.

DTS allows you to specify the delimiter, text qualifier, and row terminator, but none of that functionality is available with BULK INSERT.
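For reference, a minimal set-based version of option 1, assuming the raw rows land in a staging table first (the table and column names are placeholders):

[Code]-- Hypothetical staging table; strips every double quote from the affected
-- column in one pass instead of walking the rows with a cursor.
UPDATE dbo.staging_address
SET col1 = REPLACE(col1, '"', '')
WHERE col1 LIKE '%"%'[/Code]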

    Anyone have more suggestions?

  • Not sure I understand your question, but did you try SET QUOTED_IDENTIFIER OFF at the beginning of the script?

    --Jonathan




  • Yes,

    SET QUOTED_IDENTIFIER OFF was set at the top of the script. The basic problem is that I've got a file with data fields such as:

    "3637","","","","1505 London Road",......

    and using the BULK INSERT command (as shown) it is giving

    "3637 <NULL> <NULL> <NULL> 1505 London Road

Everything is right except the first field, "3637, which keeps picking up the leading ".

    Hopefully that is clearer.

  • See if you can get it working using bcp interactively and then use the generated format file with BULK INSERT.
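For instance (the server name and the use of a trusted connection are assumptions here), you can have bcp generate a default character-mode format file without copying any data, and then edit the terminators by hand:

[Code]rem Run from a command prompt, not from Query Analyzer.
rem Writes address.fmt describing the table; no rows are transferred.
bcp "[InterPay Office].dbo.address" format nul -c -f address.fmt -S YourServer -T[/Code]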

    --Jonathan




You can try to set your rowterminator to '"/n'. This way, SQL will "eat" the doublequote that starts each record after the 1st record. It might be cleaner to use a format file, since each column delimiter may be specified individually.



    Once you understand the BITs, all the pieces come together

  • quote:


    You can try to set your rowterminator to '"/n'.


Nice idea, Thomas, although I think you meant '\n"', where the doublequote is after the newline tag.

I had a similar situation, but solved it by using a format file to skip the first character, using a single doublequote as the field terminator, and skipping the 2-char field ',"' between fields. I certainly like the combination of the 3-char field terminator and 2-char row terminator; it would have saved some time developing and testing the format file.
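For anyone wanting to try that layout on the original three-column example, a rough sketch of such a format file might look like this (the 8000 max lengths and the column names are guesses; skipped fields map to server column 0):

[Code]7.0
6
1   SQLCHAR   0   1      ""         0   Skip_LeadingQuote
2   SQLCHAR   0   8000   "\""       1   DataHereColumn
3   SQLCHAR   0   2      ""         0   Skip_CommaQuote
4   SQLCHAR   0   8000   "\""       2   DataWithCommaColumn
5   SQLCHAR   0   2      ""         0   Skip_CommaQuote2
6   SQLCHAR   0   8000   "\"\r\n"   3   xxxColumn[/Code]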



    Mark

On a similar vein, I have used a field terminator of "," (double quote, comma, double quote).

The downside to this is that the first double quote and the last double quote on the line are not removed, so I had to remove them with string manipulation.

select substring(col1, 2, len(col1) - 1), ...., left(collast, len(collast) - 1)

    etc.

    Jeremy

  • I've had to figure this out in the past. Here's what you need to do, using your example.

    "datahere","data with a comma, more","xxx"

    FMT file:

7.0
3
1   SQLCHAR   0   10   "\",\""    1   DataHereColumn
2   SQLCHAR   0   30   "\",\""    2   DataWithCommaColumn
3   SQLCHAR   0   3    "\"\r\n"   3   xxxColumn

    Using \",\" will treat the "," as the field terminator. You need to include the forward slash before the double quote mark so the format treats it as a real character.

    BULK INSERT query:

    BULK INSERT database.owner.targettable FROM 'drive:\path\file.txt' WITH (FIRSTROW = 2, FORMATFILE = 'drive:\path\formatfile.fmt')

    The first record should look like this when you're done:

    Field1: "datahere

    Field2: data with a comma, more

    Field3: xxx

    Next, you need to remove the leading double quote from Field1 for ALL records. If you do not have double quotes in the data, just use:

    UPDATE database.owner.table

    SET Field1 = REPLACE(Field1, '"', '')
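If the data itself can legitimately contain double quotes, a safer sketch (same placeholder names) strips only the first character:

[Code]-- Removes just the leading quote; any embedded quotes are left alone.
UPDATE database.owner.table
SET Field1 = STUFF(Field1, 1, 1, '')
WHERE Field1 LIKE '"%'[/Code]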

    Let me know if this doesn't work for you!


To be honest, it sounds more like a business relationship problem than a programming problem; how good is your relationship with this vendor? All of the solutions offered here are creative and may solve the immediate problem, but do they cause more problems in the long run? Does the vendor understand that you are relying on a daily batch of this data, and are they committed to maintaining this non-standard format?

On the other hand, have you considered using DTS with an ActiveX script to transform the first column? Not sure if it will provide the speed you need, but it might be comparable. How much data per night are you talking about: a couple thousand rows or a couple million?

    HTH,

    Stu

  • Greetings,

Thanks to all of you for your support and thoughts on the matter. I've spent quite a few hours putting together format files, testing them, and getting really pi#$ed.

Generating the format file with bcp's -n switch does not give the required prefix, so I ended up combining the automatically generated format file with the one bcp creates when you attempt a load without specifying the data types (which does give a prefix). I took both of these fmt files and merged them into what I'd consider a good format file.

Out of the 26 tables that I have to do this on, it worked on 24. The only common thread I can find is that the two failing tables have over 170 columns. BULK INSERT and bcp both blow up, saying that the field length is too long for column 1 (which is a SQLINT of length 4). Rather than trying to cut down the column load (not practical anyway) to find the breaking point, I decided on an alternate loading mechanism.

    Ultimately, I've ended up putting together a generic CSV to XML translator and then used SQLXML 3.0 bulk load to force the data in. Decided against using .NET datasets, due to the slow nature of the load.
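In case it's useful to anyone, the bulk load step can also be driven from T-SQL through the OLE Automation procedures; a minimal sketch, assuming SQLXML 3.0 is installed and with made-up schema/data file paths:

[Code]DECLARE @obj int
-- Instantiate the SQLXML 3.0 bulk loader (requires sp_OA* to be available).
EXEC sp_OACreate 'SQLXMLBulkLoad.SQLXMLBulkLoad.3.0', @obj OUT
EXEC sp_OASetProperty @obj, 'ConnectionString',
    'provider=SQLOLEDB;data source=(local);database=InterPay Office;integrated security=SSPI'
-- Execute takes the annotated schema and the XML data file.
EXEC sp_OAMethod @obj, 'Execute', NULL, 'c:\data\address.xsd', 'c:\data\address.xml'
EXEC sp_OADestroy @obj[/Code]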

I'd be interested in hearing any suggestions on the XML load speed issue (usually 50,000 rows per table).

Relationship with the vendor? Let's put it this way: they are using MDBS Titanium 6.1f (www.mdbs.com), a navigational-model database, as the datastore, and generally are not very responsive to requests such as this. The only saving grace is that they will be moving to mySQL in an upcoming release (please insert bashing here), so I'll be able to reach in and get what I need when needed.

    Many thanks for all your help.

  • And what if my data looks like this:

    datahere,"data with a comma, more",xxx

    datahere,data without a comma,xxx

All of you are wrong.

Using "," as delimiters is an error waiting to happen.

mdjtlj, ask your supplier to build a test file containing '19" Monitor'.

Then try to process that file.

    _____________
    Code for TallyGenerator

For this kind of file it's better to use ODBC drivers.

They read strings with text qualifiers properly.

    I created text file TestInsert.txt in the folder C:\TextFiles (local to the SQL Server instance):

[Code]"Name","Price"
"Test","1,1"
"19""Monitor, the nice one","250"[/Code]

Then I added a linked server:

[Code]EXEC sp_addlinkedserver txtsrv1, 'Jet 4.0',
    'Microsoft.Jet.OLEDB.4.0',
    'C:\TextFiles',
    NULL,
    'Text'[/Code]

    To see the list of files available:

    [Code]EXEC sp_tables_ex txtsrv1[/Code]

And now, just a simple SELECT:

[Code]SELECT *
FROM txtsrv1...[TestInsert#txt][/Code]
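Loading it into a permanent table from there is just an INSERT...SELECT (the target table here is made up):

[Code]INSERT INTO dbo.PriceList (Name, Price)
SELECT Name, Price
FROM txtsrv1...[TestInsert#txt][/Code]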

    I remember this code was done based on BOL examples, but I don't remember the topic.

    _____________
    Code for TallyGenerator

  • Sergiy (1/14/2008)


All of you are wrong.

Using "," as delimiters is an error waiting to happen.

mdjtlj, ask your supplier to build a test file containing '19" Monitor'.

Then try to process that file.

Works fine if the BCP format file is properly constructed. A delimiter of "," (including the quotes) is perfect for "real" CSV instead of "comedy separated values".
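For illustration, assuming the vendor never emits a literal "," inside a field, the test value parses cleanly because the embedded quote is never followed by the comma-quote pair:

[Code]"19" Monitor","250"

Field1: 19" Monitor
Field2: 250[/Code]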

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

quote:


I remember this code was done based on BOL examples, but I don't remember the topic.

    First, nice idea... hadn't thought about doing it that way. Great way to import from a staging area. Thanks for the tip.

    And the topic you're looking for is "Linked Servers".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
