SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


variable column count txt files to sql server table


variable column count txt files to sql server table

Author
Message
BobMcC
BobMcC
Mr or Mrs. 500
Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)

Group: General Forum Members
Points: 528 Visits: 832
I thought I'd share a unique problem that I'm dealing with and see if we can get a discussion going as to how to best handle it.
I'm dealing with legacy clients, and we're in a position where we can't ask them change their file format.

Here's the issue. We need to load into a sql table a comma delimited, text quoted file into a sql table.
As an example when clientA came on board they only needed to send 2 columns of data, when clientB came on board they sent 4 Columns of data, and ClientC has 6 Columns of data.

We can put in nulls for for the column data that is not sent.
i.e.
clientA's rec.txt
"Alfred","Ames"
"Aaron","Abbot"

clientB's rec.txt
"Bob","Baines",123,"Blue"
"Bill","Bathgate",456,"Yellow"

clientC's rec.txt
"charlie","Cass",789,"green","123 anywhere","USA"
"cathy","cow",444,"red",234 nowhere","Germany"

I want to be able to load records from the three clients above into a single sql table:

Col1 Col2 Col3 Col4 Col5 Col6
Alfred Ames NULL NULL NULL NULL
Aaron Abbot NULL NULL NULL NULL
Bob Baines 123 Blue NULL NULL
Bill Bathgate 456 Yellow NULL NULL
charlie Cass 789 green 123 anywhere USA
cathy cow 444 red 234 nowhere Germany




SSIS requires that I map input column to output column and this leads to difficulties.
1) I can have multiple data flow tasks and only execute a single one based on the number of columns in that particular txt file.
---This doesn't seem like a very clean solution to me especially with the number of different times I'd have to do this.
2) Using a script transform I parse the file and programatically create the NULL rows for output.
---I've actually done this for our smaller txt files, but this method kills performance on larger text files.
3) pre-flood the text files with column delimiters at the end of each line.
---This solution is actually similar to #2 above in that you're manufacturing columns. The transform loads quickly, but I haven't found a way to quickly add commas at the end of each line of a text file.

Have any of you faced this challenge? How did you handle it? What suggestions do you have that I might have missed?
LutzM
LutzM
SSC-Insane
SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)

Group: General Forum Members
Points: 23759 Visits: 13559
I would use SSIS to store the CSV list into a staging table with a single column.
Then I would call a sproc with a fast split string function (e.g. DelimitedSplit8K) and a CrossTab query to get the data back in one row, filling the missing volumns with zero at the same time. Finally, the sproc would be used to insert the data into the final table.



Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Lowell
Lowell
SSC Guru
SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)SSC Guru (72K reputation)

Group: General Forum Members
Points: 72032 Visits: 40942
there are a number of posts by Jeff Moden to do this via BCP; he refer's to the issue as "ragged right";
here's one example he posted to address the issue...it's not SSIS, but it's worth looking at:

--Originally posted by Jeff Moden somewhere in the sands of time:
BCP file import headless ragged right
Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK insert require that all the rows have the same number of delimiters so you can't do an import of such a file with either.

However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.

First, I assume you have a file, like in your original post, in C:\Temp called Test01.txt and it looks like this...

col1,col2,col3
col1,col2
col1,col2,col3,col4
col1,col2,col,3,col4,col5

Next, let's setup a linked server and give it the necessary login privs...

--===== Create a linked server to the drive and path you desire.
EXEC dbo.sp_AddLinkedServer TxtSvr,
'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'C:\Temp',
NULL,
'Text'

--===== Set up login mappings.
EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL
Here comes the fun part... if we just read the file directly...
--===== Query one of the files by using a four-part name.
SELECT *
FROM TxtSvr...[Test01^#txt]
... we get an awful mess that looks like this...


F4 F5 col1 col2 col3
col1 col2
col4 col1 col2 col3
col4 col5 col1 col2 col3

(3 row(s) affected)
notice that the first row was used as a header and that the data has been rearranged AND that two of the headers have been made up! We can fix that with a couple of simple DOS commands...

--===== Create a header that identifies the columns we want... (any col names will do)
EXEC Master.dbo.xp_CmdShell 'ECHO HdrCol1,HdrCol2,HdrCol3,HdrCol4,HdrCol5 > C:\Temp\Header.txt'

--===== Create a new working file that has the header and the original file as one
EXEC Master.dbo.xp_CmdShell 'COPY C:\Temp\Header.txt+C:\Temp\Test01.txt C:\Temp\MyWork.txt'
Then, accessing the data in the file is a breeze... do with it what you want because it can now be referenced as a table with the correct column names...

--===== Read the csv text file as if it were a table
SELECT *
FROM TxtSvr...[MyWork^#txt]
HdrCol1 HdrCol2 HdrCol3 HdrCol4 HdrCol5
col1 col2 col3
col1 col2
col1 col2 col3 col4
col1 col2 col3 col4 col5

(4 row(s) affected)

If you need to drop the linked server after than, the following command will do nicely...

EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'



Lowell
--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!
Alvin Ramard
Alvin Ramard
SSCrazy Eights
SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)SSCrazy Eights (9.2K reputation)

Group: General Forum Members
Points: 9183 Visits: 11686
I probably would use a flat file source to read the data as one column. I would then feed that one column to a script component where I would use VB to parse the data, inserting the default NULL value for the missing columns. The output from the script component would include all the required columns for a "complete" input data row.



Alvin Ramard
Memphis PASS Chapter

All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
BobMcC
BobMcC
Mr or Mrs. 500
Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)Mr or Mrs. 500 (528 reputation)

Group: General Forum Members
Points: 528 Visits: 832
Alvin, that is what I did in my choice number two above. I just didn't go into the details of it. I imported the data as a single column then parsed through the file. I found this solution to take on average 4x as long than directly loading the data.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search