variable column count txt files to sql server table

  • I thought I'd share a unique problem that I'm dealing with and see if we can get a discussion going as to how to best handle it.

    I'm dealing with legacy clients, and we're in a position where we can't ask them to change their file format.

    Here's the issue: we need to load a comma-delimited, text-qualified file into a SQL table.

    As an example: when clientA came on board they only needed to send 2 columns of data, when clientB came on board they sent 4 columns, and clientC sends 6 columns.

    We can put in NULLs for the column data that is not sent.

    i.e.

    clientA's rec.txt

    "Alfred","Ames"

    "Aaron","Abbot"

    clientB's rec.txt

    "Bob","Baines",123,"Blue"

    "Bill","Bathgate",456,"Yellow"

    clientC's rec.txt

    "charlie","Cass",789,"green","123 anywhere","USA"

    "cathy","cow",444,"red",234 nowhere","Germany"

    I want to be able to load records from the three clients above into a single sql table:

    Col1     Col2      Col3  Col4    Col5          Col6
    Alfred   Ames      NULL  NULL    NULL          NULL
    Aaron    Abbot     NULL  NULL    NULL          NULL
    Bob      Baines    123   Blue    NULL          NULL
    Bill     Bathgate  456   Yellow  NULL          NULL
    charlie  Cass      789   green   123 anywhere  USA
    cathy    cow       444   red     234 nowhere   Germany
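
    For reference, here is a minimal sketch of a target table that could hold all three layouts. The Col1-Col6 names come from the layout above; the table name and the VARCHAR/INT types are assumptions, not something the clients specified:

    CREATE TABLE dbo.ClientRecords      -- table name is an assumption
    (
        Col1 VARCHAR(50)  NULL,  -- first name in the samples
        Col2 VARCHAR(50)  NULL,  -- last name
        Col3 INT          NULL,  -- numeric code (clients B and C only)
        Col4 VARCHAR(20)  NULL,  -- color (clients B and C only)
        Col5 VARCHAR(100) NULL,  -- street address (client C only)
        Col6 VARCHAR(50)  NULL   -- country (client C only)
    );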

    SSIS requires that I map input columns to output columns, and this leads to difficulties.

    1) I can have multiple data flow tasks and only execute a single one based on the number of columns in that particular txt file.

    ---This doesn't seem like a very clean solution to me, especially given the number of times I'd have to duplicate it.

    2) Using a script transform, I parse the file and programmatically create the NULL-padded rows for output.

    ---I've actually done this for our smaller txt files, but this method kills performance on larger text files.

    3) Pre-pad the text files with column delimiters at the end of each line.

    ---This solution is actually similar to #2 above in that you're manufacturing columns. The transform loads quickly, but I haven't found a way to quickly add commas to the end of each line of a text file (one untested candidate is sketched below).
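
    One candidate I haven't benchmarked: driving a DOS FOR loop from xp_CmdShell. This is a sketch only; it assumes xp_CmdShell is enabled, the paths and filenames are made up, and each client's file needs a fixed number of trailing commas (four here, to bring clientA's two columns up to six):

    --===== Append four commas to every line of clientA's file, writing the
    --      padded copy to a new file (paths and filenames are assumptions).
    EXEC Master.dbo.xp_CmdShell
        'FOR /F "usebackq tokens=*" %i IN ("C:\Temp\recA.txt") DO @ECHO %i,,,,>> C:\Temp\recA_padded.txt'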

    Have any of you faced this challenge? How did you handle it? What suggestions do you have that I might have missed?

  • I would use SSIS to load the CSV file into a staging table with a single column.

    Then I would call a sproc that uses a fast string split function (e.g. DelimitedSplit8K) and a CrossTab query to get the data back into one row per line, filling the missing columns with NULLs at the same time. Finally, the sproc would insert the data into the final table, roughly as sketched below.
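
    A sketch only, not tested code: it assumes a one-column staging table with an IDENTITY key, Jeff Moden's DelimitedSplit8K splitter (which returns ItemNumber and Item), and the hypothetical six-column dbo.ClientRecords table sketched in the first post. Missing columns simply fall out as NULL:

    --===== Staging table the SSIS flat file source loads, one raw line per row
    CREATE TABLE dbo.Staging
    (
        StagingID INT IDENTITY(1,1) PRIMARY KEY,
        RawLine   VARCHAR(8000) NULL
    );

    --===== Split each line on the commas and cross tab the pieces back into
    --      one row per line. REPLACE strips the text qualifiers, which is safe
    --      here only because the sample data has no commas inside the quotes.
    INSERT INTO dbo.ClientRecords (Col1, Col2, Col3, Col4, Col5, Col6)
    SELECT  MAX(CASE WHEN split.ItemNumber = 1 THEN split.Item END),
            MAX(CASE WHEN split.ItemNumber = 2 THEN split.Item END),
            MAX(CASE WHEN split.ItemNumber = 3 THEN split.Item END),
            MAX(CASE WHEN split.ItemNumber = 4 THEN split.Item END),
            MAX(CASE WHEN split.ItemNumber = 5 THEN split.Item END),
            MAX(CASE WHEN split.ItemNumber = 6 THEN split.Item END)
    FROM dbo.Staging s
    CROSS APPLY dbo.DelimitedSplit8K(REPLACE(s.RawLine, '"', ''), ',') split
    GROUP BY s.StagingID;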



    Lutz
    A pessimist is an optimist with experience.


  • There are a number of posts by Jeff Moden on doing this via BCP; he refers to the issue as "ragged right".

    Here's one example he posted to address the issue... it's not SSIS, but it's worth looking at:

    --Originally posted by Jeff Moden somewhere in the sands of time:

    BCP file import: headerless ragged right

    Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK INSERT require that all the rows have the same number of delimiters, so you can't import such a file with either.

    However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.

    First, I assume you have a file, like in your original post, in C:\Temp called Test01.txt and it looks like this...

    col1,col2,col3

    col1,col2

    col1,col2,col3,col4

    col1,col2,col3,col4,col5

    Next, let's setup a linked server and give it the necessary login privs...

    --===== Create a linked server to the drive and path you desire.
    EXEC dbo.sp_AddLinkedServer TxtSvr,
                                'Jet 4.0',
                                'Microsoft.Jet.OLEDB.4.0',
                                'C:\Temp',
                                NULL,
                                'Text'

    --===== Set up login mappings.
    EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL

    Here comes the fun part... if we just read the file directly...

    --===== Query one of the files by using a four-part name.
    SELECT *
    FROM TxtSvr...[Test01#txt]

    ... we get an awful mess that looks like this...

    F4    F5    col1  col2  col3
                col1  col2
    col4        col1  col2  col3
    col4  col5  col1  col2  col3

    (3 row(s) affected)

    Notice that the first row was used as a header, that the data has been rearranged, AND that two of the headers have been made up! We can fix that with a couple of simple DOS commands...

    --===== Create a header that identifies the columns we want... (any col names will do)

    EXEC Master.dbo.xp_CmdShell 'ECHO HdrCol1,HdrCol2,HdrCol3,HdrCol4,HdrCol5 > C:\Temp\Header.txt'

    --===== Create a new working file that has the header and the original file as one

    EXEC Master.dbo.xp_CmdShell 'COPY C:\Temp\Header.txt+C:\Temp\Test01.txt C:\Temp\MyWork.txt'

    Then, accessing the data in the file is a breeze... do with it what you want because it can now be referenced as a table with the correct column names...

    --===== Read the csv text file as if it were a table
    SELECT *
    FROM TxtSvr...[MyWork#txt]

    HdrCol1  HdrCol2  HdrCol3  HdrCol4  HdrCol5
    col1     col2     col3
    col1     col2
    col1     col2     col3     col4
    col1     col2     col3     col4     col5

    (4 row(s) affected)
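
    From there, landing the rows in the target table is one short step. This last statement is not from Jeff's original post; it's a sketch assuming the hypothetical dbo.ClientRecords table from earlier in the thread (the sixth column simply stays NULL, since this sample file tops out at five):

    --===== Insert the linked-server rows into the target table (assumed names).
    INSERT INTO dbo.ClientRecords (Col1, Col2, Col3, Col4, Col5)
    SELECT HdrCol1, HdrCol2, HdrCol3, HdrCol4, HdrCol5
    FROM TxtSvr...[MyWork#txt]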

    If you need to drop the linked server after that, the following command will do nicely...

    EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement to give the volunteers here representative data. Together with your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • I probably would use a flat file source to read the data as one column. I would then feed that one column to a script component where I would use VB to parse the data, inserting the default NULL value for the missing columns. The output from the script component would include all the required columns for a "complete" input data row.



    Alvin Ramard
    Memphis PASS Chapter

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help

  • Alvin, that is what I did in my choice number two above; I just didn't go into the details of it. I imported the data as a single column, then parsed through the file. I found this solution to take, on average, 4x as long as directly loading the data.
