Stripping out double quotes (") in bulk insert

  • Jonathan Dabbs

    Old Hand

    Points: 368

    We have a client that's still on SQL Server 6.5 (yes, I know, I struggled to remember that far back too!), and they asked me to do some work on importing a text file into a table.  Never used DTS or bulk import in the past, but didn't think this was going to be too much of an issue.  however, I'm struggling to get the data in using bulk insert without the double quotes that are included in the file (the quotes are needed as some columns have several commas within the same field, and should all go into the same column).

    The format of the file is csv, and here's a sample line.

    "Fred", "Bloggs", "123 Any Street, Any town, Any county, AB1 2CD", "Teacher".

    When I use bulk insert, it imports it with the quotes still in the file. i.e.

    select * from table;

    Forename    Surname     Address                                                               Occupation

    "Fred"         "Bloggs"      "123 Any Street, Any town, Any county, AB1 2CD" "Teacher"

    I'm obviously doing something very simple wrong, but can someone point out what this is, so I can import into the table without these quotes?

    Unfortunately, I can't amend the source file, as it's supplied externally.

    Many thanks

    Jonathan

     

  • EddieBaquet

    SSC Veteran

    Points: 267

    Hi, Jonathan.

    Those quotes are probably ensuring the proper division of your fields.  Without them, your data won't go to the correct columns in your target because you have commas within the fields.  For future reference, you should use something a little less common as a delimiter like a vertical line (or pipe) |.

    If it's not too much data, you may want to load it, then run an update on every column using the REPLACE() function to replace those double-quotes with an empty string.  You may have to use the CHAR(x) value of " to make it easier to read and write.  CHAR(34) is the character code for double quote.

    Update YourTable

    Set YourColumn = REPLACE

    (YourColumn, CHAR(34), '')

    Hope this helps.

    Eddie



    "If you don't take the time to do it right, when are you going to find the time to do it over?"

  • David Burrows

    SSC Guru

    Points: 64736

    If you can guarantee the format for all records then use a format file containing :-

    8.0

    5

    1  SQLCHAR  0  1    "\""      0  Unwanted  ""

    2  SQLCHAR  0  255  "\", \""  1  Firstname  ""

    3  SQLCHAR  0  255  "\", \""  2  Lastname  ""

    4  SQLCHAR  0  255  "\", \""  3  Address  ""

    5  SQLCHAR  0  255  "\"\r\n"  4  Occupation  ""

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jonathan Dabbs

    Old Hand

    Points: 368

    A couple of useful options there, I'll have a play around with both of those, thanks for the input.

  • Lisa Slater Nicholls

    Hall of Fame

    Points: 3311

    If it's available, you might also try consuming this data via Excel, which will "understand" this text file as a comma-separated, quotation-mark-field-delimited file, like this:

    DECLARE @SheetFileName VARCHAR(100), 

                @SheetX VARCHAR(150),

                @Err Int

    SET @SheetFileName = <Your fully-qualified source file name >
    -- I typically get the above from a configuration table
    -- Jet Engine is a bit finicky about the syntax here, but this is correct:
    SET @SheetX =  
    'Data Source="'+@SheetFileName+'";Extended properties=Excel 5.0'
    SET @Statement = N'INSERT INTO YourTable (<yourfieldlist&gt   ' +
                     N'SELECT * FROM ' + 
                     N' OpenDataSource( ''Microsoft.Jet.OLEDB.4.0'', '''  +@SheetX+N''' )...Sheet1$ ' 
    -- you can add any WHERE conditions you need above
    EXEC @Err = sp_executesql @Statement
    Regards,
    >L<
     
  • j.b.shirk@gmail.com

    SSC-Addicted

    Points: 443

    I think one problem one can have with this solution is Excel's limit of 65,535 records.

  • Emmanouil Karaiskakis

    Ten Centuries

    Points: 1013

    Hi

    About the 'Unwanted' column; does it have to exist on the table in which you wish to import the data?

  • Jeff Moden

    SSC Guru

    Points: 996832

    No... that's why the column number is "0". Read up on BCP Format Files... especially the example about skipping unwanted columns.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Emmanouil Karaiskakis

    Ten Centuries

    Points: 1013

    Thanks mate,

    Although I figured it out about 20mins after I posted the question. I couldn't have done it without the above info.

    So thanks everybody who contributed to the above. 🙂

  • Emmanouil Karaiskakis

    Ten Centuries

    Points: 1013

    Hi,

    I have created a program that deals with tab,comma, and quotes delimitations.

    Does anybody know how to deal with fixed length files?

    Bare in mind that although the structure of the table and format file show 59 columns, the Data file contains data for the first 34 columns. The rest are used for processing of the data at later stage...!

    I tried without the extra columns but that didn't work either.

    At the moment the issue is that it places the data in the wrong columns leaving blank columns. The data seems to be ok. It seems like it doesnt like the fact that the 3rd column is data type int. It works as char... but I would rather if i could keep it as int instead of having to convert it later.

    Is there a way around that?

    A format file example would be great 🙂

    here's a copy of my format file:

    9.0

    59

    1 SQLCHAR 0 32 "" 1 ClientID Latin1_General_BIN

    2 SQLCHAR 0 1 "" 2 Passive Latin1_General_BIN

    3 SQLCHAR 0 0 "" 3 Duration Latin1_General_BIN

    ....

    ....

    ....

    57 SQLCHAR 0 0 "" 57 StopFlag Latin1_General_BIN

    58 SQLCHAR 0 0 "" 58 RunsRemaining Latin1_General_BIN

    59 SQLCHAR 0 0 "\r" 59 RunJISID Latin1_General_BIN

    NOTE the last delimitator has \ n after the \r the forum is being weird...!

    Thanks

  • Jeff Moden

    SSC Guru

    Points: 996832

    A format file example would be great 🙂

    Yeah, huh? You expect us to guess what the record layout and target table schema is?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Emmanouil Karaiskakis

    Ten Centuries

    Points: 1013

    Sorry mate,

    Here's the table.

    Also how can you get it to NOT include spaces between columns. For example, if some of the rows are pushed to the left that will mean that the starting point of the next field will be further to the right causing the value to have space to the left of the value or push the value to the next column!

    CREATE TABLE CallTrace_ClientData.dbo.JobTable

    (

    ClientID varchar(32) NULL,

    Passive char(1) NULL,

    Duration int NULL,

    AddressLinks char(1) NULL,

    LearnedLinks char(1) NULL,

    ImpliedLinks char(1) NULL,

    ProductType char(1) NULL,

    LivingAsStated char(1) NULL,

    CreditActive char(1) NULL,

    SearchDeceased char(1) NULL,

    SearchLandRegistry char(1) NULL,

    SearchOccupancyStatus char(1) NULL,

    OccupierLookup char(1) NULL,

    DataCleanseAddress char(1) NULL,

    DataCleanseJointNames char(1) NULL,

    SearchTelephone char(1) NULL,

    SearchP2PScore char(1) NULL,

    SearchBAI char(1) NULL,

    DOBELookup char(1) NULL,

    ReconciliationFile char(1) NULL,

    "Name" char(120) NULL,

    Title char(30) NULL,

    Forename char(30) NULL,

    Othername char(30) NULL,

    Surname char(30) NULL,

    DateOfBirth char(10) NULL,

    Address1 char(150) NULL,

    Address2 char(150) NULL,

    Address3 char(150) NULL,

    Address4 char(150) NULL,

    Address5 char(150) NULL,

    Address6 char(150) NULL,

    Address7 char(150) NULL,

    Postcode char(10) NULL,

    StartDateAtAddress char(10) NULL,

    EndDateAtAddress char(10) NULL,

    P1_Address1 char(150) NULL,

    P1_Address2 char(150) NULL,

    P1_Address3 char(150) NULL,

    P1_Address4 char(150) NULL,

    P1_Address5 char(150) NULL,

    P1_Address6 char(150) NULL,

    P1_Address7 char(150) NULL,

    P1_Postcode char(10) NULL,

    P1_StartDateAtAddress char(10) NULL,

    P1_EndDateAtAddress char(10) NULL,

    P2_Address1 char(50) NULL,

    P2_Address2 char(50) NULL,

    P2_Address3 char(50) NULL,

    P2_Address4 char(50) NULL,

    P2_Address5 char(50) NULL,

    P2_Address6 char(50) NULL,

    P2_Address7 char(50) NULL,

    P2_Postcode char(8) NULL,

    P2_StartDateatAddress char(10) NULL,

    P2_EndDateAtAddress char(10) NULL,

    StopFlag int NULL,

    RunsRemaining int NULL,

    RunJISID int NULL

    )

  • Jeff Moden

    SSC Guru

    Points: 996832

    Thanks... that's half the battle... 😉

    Do you have any documentation on the record layout of the file you intend to import? And can you attach a file with, say, the first 50 lines of data from the file?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • David Burrows

    SSC Guru

    Points: 64736

    At the moment the issue is that it places the data in the wrong columns leaving blank columns. The data seems to be ok. It seems like it doesnt like the fact that the 3rd column is data type int. It works as char... but I would rather if i could keep it as int instead of having to convert it later.

    Is there a way around that?

    The format must specify the input length of each column whether you import it or not. INT columns will be implicitly converted from char automatically but you must specify the number of chars in the input.

    So, for example, for this file

    CREATE TABLE dbo.TestFile (C1 varchar(4) null,C2 varchar(5) null,C3 int null,C4 varchar(7) null,C5 varchar(8) null)

    and the data

    AAAABBBBB123 CCCCCCCDDDDDDDD

    and to ignore C5 on input, you need the following

    1 SQLCHAR 0 4 "" 1 C1 ""

    2 SQLCHAR 0 5 "" 2 C2 ""

    3 SQLCHAR 0 6 "" 3 C3 ""

    4 SQLCHAR 0 7 "" 4 C4 ""

    5 SQLCHAR 0 8 "" 0 C5 ""

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jeff Moden

    SSC Guru

    Points: 996832

    Yowch! The data's a bloody mess. Who in their right mind would list the word "NULL" for a null in a transmitted text file nevermind add trailing spaces to some of the columnar data?

    Heh... one of my favorite suggestions... you need to find the person who provides you with the data, take them to dinner, and feed them 2 pound porkchops... WITH A SLING SHOT! 😛

    Not to worry, we can do this. Are you allowed to use xp_CmdShell?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 59 total)

You must be logged in to reply to this topic. Login to reply