Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

variable column count txt files to sql server table Expand / Collapse
Author
Message
Posted Tuesday, February 22, 2011 12:57 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 10:41 AM
Points: 178, Visits: 622
I thought I'd share a unique problem that I'm dealing with and see if we can get a discussion going as to how to best handle it.
I'm dealing with legacy clients, and we're in a position where we can't ask them change their file format.

Here's the issue. We need to load into a sql table a comma delimited, text quoted file into a sql table.
As an example when clientA came on board they only needed to send 2 columns of data, when clientB came on board they sent 4 Columns of data, and ClientC has 6 Columns of data.

We can put in nulls for for the column data that is not sent.
i.e.
clientA's rec.txt
"Alfred","Ames"
"Aaron","Abbot"

clientB's rec.txt
"Bob","Baines",123,"Blue"
"Bill","Bathgate",456,"Yellow"

clientC's rec.txt
"charlie","Cass",789,"green","123 anywhere","USA"
"cathy","cow",444,"red",234 nowhere","Germany"

I want to be able to load records from the three clients above into a single sql table:
Col1    Col2     Col3 Col4   Col5         Col6
Alfred Ames NULL NULL NULL NULL
Aaron Abbot NULL NULL NULL NULL
Bob Baines 123 Blue NULL NULL
Bill Bathgate 456 Yellow NULL NULL
charlie Cass 789 green 123 anywhere USA
cathy cow 444 red 234 nowhere Germany


SSIS requires that I map input column to output column and this leads to difficulties.
1) I can have multiple data flow tasks and only execute a single one based on the number of columns in that particular txt file.
---This doesn't seem like a very clean solution to me especially with the number of different times I'd have to do this.
2) Using a script transform I parse the file and programatically create the NULL rows for output.
---I've actually done this for our smaller txt files, but this method kills performance on larger text files.
3) pre-flood the text files with column delimiters at the end of each line.
---This solution is actually similar to #2 above in that you're manufacturing columns. The transform loads quickly, but I haven't found a way to quickly add commas at the end of each line of a text file.

Have any of you faced this challenge? How did you handle it? What suggestions do you have that I might have missed?
Post #1067852
Posted Tuesday, February 22, 2011 1:01 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:39 AM
Points: 6,826, Visits: 13,280
I would use SSIS to store the CSV list into a staging table with a single column.
Then I would call a sproc with a fast split string function (e.g. DelimitedSplit8K) and a CrossTab query to get the data back in one row, filling the missing volumns with zero at the same time. Finally, the sproc would be used to insert the data into the final table.




Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table , Cross Tabs and Dynamic Cross Tabs , Delimited Split Function
Post #1067854
Posted Tuesday, February 22, 2011 1:13 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 12:01 PM
Points: 12,899, Visits: 32,113
there are a number of posts by Jeff Moden to do this via BCP; he refer's to the issue as "ragged right";
here's one example he posted to address the issue...it's not SSIS, but it's worth looking at:
--Originally posted by Jeff Moden somewhere in the sands of time:
BCP file import headless ragged right
Heh... that's known as a "Headerless Ragged Right" file and they're a bugger to import correctly. BCP and BULK insert require that all the rows have the same number of delimiters so you can't do an import of such a file with either.

However... with a little help from a couple of DOS commands and a little prestidigitation with a text based linked server, it can be done with very little pain.

First, I assume you have a file, like in your original post, in C:\Temp called Test01.txt and it looks like this...

col1,col2,col3
col1,col2
col1,col2,col3,col4
col1,col2,col,3,col4,col5

Next, let's setup a linked server and give it the necessary login privs...

--===== Create a linked server to the drive and path you desire.
EXEC dbo.sp_AddLinkedServer TxtSvr,
'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'C:\Temp',
NULL,
'Text'

--===== Set up login mappings.
EXEC dbo.sp_AddLinkedSrvLogin TxtSvr, FALSE, NULL, Admin, NULL
Here comes the fun part... if we just read the file directly...
--===== Query one of the files by using a four-part name.
SELECT *
FROM TxtSvr...[Test01^#txt]
... we get an awful mess that looks like this...


F4 F5 col1 col2 col3
col1 col2
col4 col1 col2 col3
col4 col5 col1 col2 col3

(3 row(s) affected)
notice that the first row was used as a header and that the data has been rearranged AND that two of the headers have been made up! We can fix that with a couple of simple DOS commands...

--===== Create a header that identifies the columns we want... (any col names will do)
EXEC Master.dbo.xp_CmdShell 'ECHO HdrCol1,HdrCol2,HdrCol3,HdrCol4,HdrCol5 > C:\Temp\Header.txt'

--===== Create a new working file that has the header and the original file as one
EXEC Master.dbo.xp_CmdShell 'COPY C:\Temp\Header.txt+C:\Temp\Test01.txt C:\Temp\MyWork.txt'
Then, accessing the data in the file is a breeze... do with it what you want because it can now be referenced as a table with the correct column names...

--===== Read the csv text file as if it were a table
SELECT *
FROM TxtSvr...[MyWork^#txt]
HdrCol1 HdrCol2 HdrCol3 HdrCol4 HdrCol5
col1 col2 col3
col1 col2
col1 col2 col3 col4
col1 col2 col3 col4 col5

(4 row(s) affected)

If you need to drop the linked server after than, the following command will do nicely...

EXEC dbo.sp_DropServer 'TxtSvr', 'DropLogins'



Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #1067869
Posted Tuesday, February 22, 2011 1:16 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 2:23 PM
Points: 1,422, Visits: 8,122
I probably would use a flat file source to read the data as one column. I would then feed that one column to a script component where I would use VB to parse the data, inserting the default NULL value for the missing columns. The output from the script component would include all the required columns for a "complete" input data row.





Alvin Ramard
Memphis PASS Chapter

All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.
Post #1067872
Posted Tuesday, February 22, 2011 1:32 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 10:41 AM
Points: 178, Visits: 622
Alvin, that is what I did in my choice number two above. I just didn't go into the details of it. I imported the data as a single column then parsed through the file. I found this solution to take on average 4x as long than directly loading the data.
Post #1067887
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse