differiniate quotes as data vs quotes as column identifier

  • I need to import (using DTS) some comma delimited text files into sql server 2000.

    I have discovered that the quotation mark is getting dual use. Not only is some of column data surrounded by quotes, but sometimes the quote marks are part of the data itself, and that's causing problems.

    i.e.

    1234, "some data here", "John "jack" Smith", 1234, 1234

    What's the conventional wisdom on how to handle that? It looks like something I could key on using regular expressions.

    Any suggestions?

    Thanks,

    E3

  • A proper format file using BCP will do... dunno if you can do that in DTS without writing some sort of script...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I don't mind writing a script; althought it would be nice if I could find one already written that does something similar. 🙂

  • I'll have to defer to someone else because I always end up using the BCP format file 😀

    Hey folks, can anyone else help here? I don't use DTS for this type of stuff and the OP sure could use some help here...

    Thanks

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, I don't mind using a bcp format file either, but I read up on that, and I'm not sure how I'd go about allowing one set of quotes and ignoring (allowing) the other (internal) set. The bottom line is just getting the data into sql server, separated into its correct columns. Beyond that I don't really care how it's done.

  • Sure, Bob... If you know how to use a format file, you're well ahead of the game...

    Take a look at the example data you gave...

    1234, "some data here", "John "jack" Smith", 1234, 1234

    ... notice that all of the RED stuff is what the column delimiters are (I'm also including the spaces you included as part of the delimiter). The things in GREEN are the "spurious" quotes.

    Now, it's not very well documented, but the way to represent a quote as part of a delimiter in a BCP format file is by using the escape character of "backslash" along with the quote. For example, if the full delimiter between a column were , " including the space in between, then the full delimiter in the BCP format file (including the required outer quotes) would be...

    ", \""

    Notice that's nothing like the double quotes around the word "jack". 😉

    So, with the idea that you use \" to represent a double quote character as part of the BCP format, here's what the format file might look like for the example you gave...

    8.0

    5

    1 SQLCHAR 0 100 ", \"" 1 somenumcol ""

    2 SQLCHAR 0 100 "\", \"" 2 somedata ""

    3 SQLCHAR 0 100 "\", " 3 fullname ""

    4 SQLCHAR 0 100 ", " 4 somenumcol2 ""

    5 SQLCHAR 0 100 "" 5 somenumcol3 ""

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff,

    This is a good solution ... using format file. I am wondering what if we don't know which column has the double quotes? Is there any default delimiter we can put for each column which can handle comma separated and/or double quotes around the text? Thanks!

  • Actually, there is an "automatic" method. It involves the use of a "text based" linked server. If you lookup "sp_AddLinkedServer" in BOL and then find the example with the following label...

    [font="Arial Black"]F. Using the Microsoft OLE DB Provider for Jet to access a text file[/font]

    ... it'll tell you how to do it although it will be a bit slower than if you know the exact format and use BULK INSERT or BCP. Do note that the # sign in the example is the required replacement for the "." that separates a file name from the extension.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff,

    Thanks a lot for your response. I tried that and I got this error:

    The OLE DB provider "Microsoft.Jet.OLEDB.4.0" has not been registered.

    I am running this on:

    Microsoft SQL Server 2005 - 9.00.3239.00 (X64) Apr 9 2008 22:41:28 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)

    I asked the IT team about 64 version of MSDASQL and he said yeah it is installed... not sure why I get this error. Thanks again.

  • Ghanta (7/23/2009)


    Hey Jeff,

    Thanks a lot for your response. I tried that and I got this error:

    The OLE DB provider "Microsoft.Jet.OLEDB.4.0" has not been registered.

    I am running this on:

    Microsoft SQL Server 2005 - 9.00.3239.00 (X64) Apr 9 2008 22:41:28 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)

    I asked the IT team about 64 version of MSDASQL and he said yeah it is installed... not sure why I get this error. Thanks again.

    I have a working example at home and I can't "get there" from work. I'll check on this tonight.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great Thanks a lot Jeff.

  • Ah crud... ya just gotta love stuff like this. The error message you posted indicates that you're running a 64 bit machine... and JET isn't supported on 64 bit machines. Please see the following thread for the bad news...

    http://social.msdn.microsoft.com/Forums/en-US/sqldataaccess/thread/60dc2b9a-eddb-4a13-8c87-e96655be9fca/

    This isn't the only problem I've seen with 64 bit boxes. Guess I won't be recommending 64 bit anytime soon. 😉

    Sorry for the bad news.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey Jeff,

    Thanks for taking time in assisting me... you have always been a huge help. I guess my next option should be to use SSIS where I will use variables for file and staging table.

    Thanks again Jeff.

  • Thanks for the feedback, Ghanta...

    There's gotta be a way to do this (auto recognition and intermitent use of text qualifiers) without having to fire up SSIS or writing a CLR or somesuch... Guess I'm going to have to make one.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yeah Jeff there has to be a way...I don't know how you have time to help us all over here... you are awesome.

Viewing 15 posts - 1 through 15 (of 45 total)

You must be logged in to reply to this topic. Login to reply