Loading text file to SQL Server 2005 using bulk insert

  • Hello,

    I am trying to load text files (comma delimited) into a table in SQL Server 2005 management studio using bulk insert. I am able to load the file into the table but the bulk insert process isn't identifying correct row delimiter. My text file has different number of columns for each row:

    row1: col1,col2,col3

    row2: col1,col2

    row3: col1,col2,col3,col4

    row4: col1,col2,col,3,col4,col5

    here is the exact script i am using:

    BULK

    INSERT ve_load2

    FROM 'C:\ve\files\Copy of VACE.txt'

    WITH

    (

    FIELDTERMINATOR = ',',

    ROWTERMINATOR = ';'

    )

    GO

    Here is what I am getting in the table:

    row1: col1,col2,col3,row2: col1,col2

    row3: col1,col2,col3,col4,row4: col1,col2,col,3,col4,col5

    Can someone help me in loading this text file?

    Thanks

    Ro

  • Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK insert require that all the rows have the same number of delimiters so you can't do an import of such a file with either.

    However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.

    First, I assume you have a file, like in your original post, in C:\Temp called Test01.txt and it looks like this...

    [font="Courier New"]col1,col2,col3

    col1,col2

    col1,col2,col3,col4

    col1,col2,col,3,col4,col5[/font]

    Next, let's setup a linked server and give it the necessary login privs...

    --===== Create a linked server to the drive and path you desire.

    EXEC dbo.sp_AddLinkedServer TxtSvr,

    'Jet 4.0',

    'Microsoft.Jet.OLEDB.4.0',

    'C:\Temp',

    NULL,

    'Text'

    --===== Set up login mappings.

    EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL

    Here comes the fun part... if we just read the file directly...

    --===== Query one of the files by using a four-part name.

    SELECT *

    FROM TxtSvr...[Test01#txt]

    ... we get an awful mess that looks like this...

    [font="Courier New"]

    F4 F5 col1 col2 col3

    col1 col2

    col4 col1 col2 col3

    col4 col5 col1 col2 col3

    (3 row(s) affected)[/font]

    notice that the first row was used as a header and that the data has been rearranged AND that two of the headers have been made up! We can fix that with a couple of simple DOS commands...

    --===== Create a header that identifies the columns we want... (any col names will do)

    EXEC Master.dbo.xp_CmdShell 'ECHO HdrCol1,HdrCol2,HdrCol3,HdrCol4,HdrCol5 > C:\Temp\Header.txt'

    --===== Create a new working file that has the header and the original file as one

    EXEC Master.dbo.xp_CmdShell 'COPY C:\Temp\Header.txt+C:\Temp\Test01.txt C:\Temp\MyWork.txt'

    Then, accessing the data in the file is a breeze... do with it what you want because it can now be referenced as a table with the correct column names...

    --===== Read the csv text file as if it were a table

    SELECT *

    FROM TxtSvr...[MyWork#txt]

    [font="Courier New"]HdrCol1 HdrCol2 HdrCol3 HdrCol4 HdrCol5

    col1 col2 col3

    col1 col2

    col1 col2 col3 col4

    col1 col2 col3 col4 col5

    (4 row(s) affected)[/font]

    If you need to drop the linked server after than, the following command will do nicely...

    EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'

    😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/16/2008)


    Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK insert require that all the rows have the same number of delimiters so you can't do an import of such a file with either.

    However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.

    First, I assume you have a file, like in your original post, in C:\Temp called Test01.txt and it looks like this...

    [font="Courier New"]col1,col2,col3

    col1,col2

    col1,col2,col3,col4

    col1,col2,col,3,col4,col5[/font]

    Next, let's setup a linked server and give it the necessary login privs...

    --===== Create a linked server to the drive and path you desire.

    EXEC dbo.sp_AddLinkedServer TxtSvr,

    'Jet 4.0',

    'Microsoft.Jet.OLEDB.4.0',

    'C:\Temp',

    NULL,

    'Text'

    --===== Set up login mappings.

    EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL

    Here comes the fun part... if we just read the file directly...

    --===== Query one of the files by using a four-part name.

    SELECT *

    FROM TxtSvr...[Test01#txt]

    ... we get an awful mess that looks like this...

    [font="Courier New"]

    F4 F5 col1 col2 col3

    col1 col2

    col4 col1 col2 col3

    col4 col5 col1 col2 col3

    (3 row(s) affected)[/font]

    notice that the first row was used as a header and that the data has been rearranged AND that two of the headers have been made up! We can fix that with a couple of simple DOS commands...

    --===== Create a header that identifies the columns we want... (any col names will do)

    EXEC Master.dbo.xp_CmdShell 'ECHO HdrCol1,HdrCol2,HdrCol3,HdrCol4,HdrCol5 > C:\Temp\Header.txt'

    --===== Create a new working file that has the header and the original file as one

    EXEC Master.dbo.xp_CmdShell 'COPY C:\Temp\Header.txt+C:\Temp\Test01.txt C:\Temp\MyWork.txt'

    Then, accessing the data in the file is a breeze... do with it what you want because it can now be referenced as a table with the correct column names...

    --===== Read the csv text file as if it were a table

    SELECT *

    FROM TxtSvr...[MyWork#txt]

    [font="Courier New"]HdrCol1 HdrCol2 HdrCol3 HdrCol4 HdrCol5

    col1 col2 col3

    col1 col2

    col1 col2 col3 col4

    col1 col2 col3 col4 col5

    (4 row(s) affected)[/font]

    If you need to drop the linked server after than, the following command will do nicely...

    EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'

    😛

    I have tried to use this solution, but could not get it to work. It looks like it could solve my problems I've been having this morning, importing data with different columns in it, but I get the following error:

    OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "txtsrv" returned message "Cannot start your application. The workgroup information file is missing or opened exclusively by another user.".

    Msg 7399, Level 16, State 1, Procedure sp_tables_ex, Line 41

    The OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "txtsrv" reported an error. Authentication failed.

    Msg 7303, Level 16, State 1, Procedure sp_tables_ex, Line 41

    Cannot initialize the data source object of OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "txtsrv".

    Here is the code I used, it's pretty much the same code you posted:

    EXEC sp_addlinkedserver txtsrv, 'Jet 4.0',

    'Microsoft.Jet.OLEDB.4.0',

    'c:\SEND\SPF',

    NULL,

    'Text'

    --Set up login mappings.

    EXEC sp_addlinkedsrvlogin txtsrv, false, NULL,'sa','*******'

    --List the tables in the linked server.

    EXEC sp_tables_ex txtsrv

    --Query one of the tables: file1#txt

    --using a four-part name.

    SELECT *

    FROM txtsrv...[merge123#xmt]

    EXEC dbo.sp_DropServer 'txtsrv', 'DropLogins'

    ...

    Is there some problems with my authentication, does it need a windows account to access the file, or "sa" to actually have the rights to load the file?

    Any help would be appreciated.

    Cheers,

    J-F

  • Couple of things may be happening here... first off, where is 'c:\SEND\SPF' ?? If it's not on the server, then you have to change the path to a UNC. If you do change it to a UNC, then the SQL Server "Service" must be logged in as a user that can actually see that path.

    Even if that path IS on the server, it still might not be able to see that path. Here's a way to check...

    EXEC Master.dbo.xp_DirTree 'c:\SEND\SPF',1,1

    If you get an error, then your server can't see the path and the login the service is logging in as has to be changed. I believe you start that process by right clicking on the server icon in the Registered Server browser and selection Properties...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, thanks for your answer.

    I still get the problem, I used the procedure to see if the server had access and it gave me this result:

    subdirectory depthfile

    1VND2028.GEN 11

    merge.xmt 11

    merge123.xmt 11

    NEWPRD2028.GEN 11

    PRD2028.GEN 11

    schema.ini 11

    TestImport.gen 11

    VND2028.GEN 11

    VNDCATEG2028.GEN11

    I've searched a lot on the web to find answers to that error, but they keep giving me some "Access Errors", fix the access file, etc, etc. This is bull since i have a txt file accessed by the jet engine driver.

    Even when i try to edit the linked server with the Management studio, it gives me the same error when i click OK. What is causing this error, since i can access the path on the server, and i gave it the "SA" credentials, it should have full access...

    If it can help anyone help me, here are the properties for the linked server :

    /****** Object: LinkedServer [txtsrv] Script Date: 08/20/2008 08:18:27 ******/

    EXEC master.dbo.sp_addlinkedserver @server = N'txtsrv', @srvproduct=N'Jet 4.0', @provider=N'Microsoft.Jet.OLEDB.4.0', @datasrc=N'c:\SEND\SPF', @location=N'it020', @provstr=N'Text'

    /* For security reasons the linked server remote logins password is changed with ######## */

    EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'txtsrv',@useself=N'False',@locallogin=NULL,@rmtuser=N'sa',@rmtpassword='########'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'collation compatible', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'data access', @optvalue=N'true'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'dist', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'pub', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'rpc', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'rpc out', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'sub', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'connect timeout', @optvalue=N'0'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'collation name', @optvalue=null

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'lazy schema validation', @optvalue=N'false'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'query timeout', @optvalue=N'0'

    GO

    EXEC master.dbo.sp_serveroption @server=N'txtsrv', @optname=N'use remote collation', @optvalue=N'true'

    Hope to have some quick feedback, as I am stuck right now... 😉

    Thanks in advance!

    JF

    Cheers,

    J-F

  • Finally got it to work! There was an error with my login, it was not in the right order, needed to put 'Sa' as the third paremeter, not the fourth!!!

    Here is the code I use now:

    EXEC sp_addlinkedserver txtsrv, 'Jet 4.0',

    'Microsoft.Jet.OLEDB.4.0',

    'c:\SEND\SPF',

    NULL,

    'Text'

    GO

    --Set up login mappings.

    EXEC sp_addlinkedsrvlogin txtsrv, FALSE, 'sa', NULL

    GO

    --List the tables in the linked server.

    EXEC sp_tables_ex txtsrv

    GO

    --Query one of the tables: file1#txt

    --using a four-part name.

    SELECT *

    FROM txtsrv...[Merge123#txt]

    EXEC dbo.sp_DropServer 'txtsrv', 'DropLogins'

    But, I found this solution to be slower than using the OpenRowSet command (25 secs for 80k rows, compared to 15 seconds for OpenRowSet). And since I will have about 1 million rows for big updates, might as well work with the faster command! Here is the code, if it can help anyone:

    USE MASTER

    GO

    -- This is needed to override the Ad Hoc Distributed Queries

    -- To be able to load files with different columns numbers in them.

    sp_configure 'Show Advanced Options', 1

    GO

    Reconfigure with override

    GO

    sp_configure 'Ad Hoc Distributed Queries', 1

    GO

    Reconfigure with override

    GO

    USE "Your Database"

    GO

    SELECT *

    FROM OPENROWSET(

    'MSDASQL',

    'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=C:\SEND\SPF;',

    'SELECT *

    FROM merge123.txt')

    So, as I said, this procedure is faster for me, maybe because of the driver that is used to load the data. I can do any data manipulation from there, and import them in temp tables, or update my data. You might need a Schema file to import the data correctly, here is the script I built for that:

    EXEC Master.dbo.xp_CmdShell

    'ECHO [Merge123.txt] > C:\Send\SPF\Schema.ini'

    EXEC Master.dbo.xp_CmdShell

    'ECHO ColNameHeader=true >> C:\Send\SPF\Schema.ini'

    EXEC Master.dbo.xp_CmdShell

    'ECHO Format=TabDelimited >> C:\Send\SPF\Schema.ini'

    EXEC Master.dbo.xp_CmdShell

    'ECHO MaxScanRows=25 >> C:\Send\SPF\Schema.ini'

    EXEC Master.dbo.xp_CmdShell

    'ECHO CharacterSet=ANSI >> C:\Send\SPF\Schema.ini'

    For more information on how to build a schema file, use this link

    http://puna.net.nz/archives/Technical/schemaini.htm

    Hope it helps!

    Thanks a lot for the help Jeff,

    JF

    Cheers,

    J-F

  • Hi,

    I have an issue with using openrowSet:

    When using on real data, i got several products that with double quotes in the description, and caused me problems.

    Here is an example:

    ProductID Description UPC

    1 Wire 10" 12345678901234

    The OpenRowSet imports the data, and works fine, but the description is cut to "Wire 10", (with no ending double quote), and the UPC value is not present in the field, it is NULL.

    I googled the problem, and got no solution, is there a parameter I can use to keep the double quotes in the description?

    Thanks in advance,

    Cheers,

    J-F

  • jfbergeron (8/20/2008)


    Finally got it to work! There was an error with my login, it was not in the right order, needed to put 'Sa' as the third paremeter, not the fourth!!!

    Hope it helps!

    Thanks a lot for the help Jeff,

    JF

    Outstanding! Thanks for posting your solutions and the bit of test info about linked serverr being slower. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jfbergeron (8/25/2008)


    Hi,

    I have an issue with using openrowSet:

    When using on real data, i got several products that with double quotes in the description, and caused me problems.

    Here is an example:

    ProductID Description UPC

    1 Wire 10" 12345678901234

    The OpenRowSet imports the data, and works fine, but the description is cut to "Wire 10", (with no ending double quote), and the UPC value is not present in the field, it is NULL.

    I googled the problem, and got no solution, is there a parameter I can use to keep the double quotes in the description?

    Thanks in advance,

    Not that I know of... but no real problem either... just use BULK INSERT with a comma delimiter, instead.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This is extremely helpful. One question: is there a way to populate the file name in the schema.ini file using a like statement. For example, I want to say "pick up any and all files (one at a time) where file name starts with words "MD"". Can we do that in schema.ini?

  • Wow, I was facing a similar scenario and tried this proposed solution, and it worked just perfect :). Thanks, this saved me lots of time!

  • I wished for that ability also (wildcards in [fileName*] but it doesn't seem to be supported.

    But you can always start with some token [FileName#SuffixGoesHere#] and do a string replace in the Schema.ini file when you know what the real suffix is before you invoke the transfer that uses the Schema.ini file.

    Alternately, if you have a (small) number of possible suffixes, you can simply copy the entire file specification multiple times, each with the appropriate [fileNamexxx] header in the single Schema.ini file.

    (I have noticed that Jet4.0 odbc access to a text file, is not bothered (as is SSIS) by some garbage comma-less preamble rows prior to the comma delimited data rows -- the "missing" columns simply come through as NULL.)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply