Looking for Best Practice on Loading Flat File with 2134 columns

Question

Post reply

Looking for Best Practice on Loading Flat File with 2134 columns

Marvin D

SSCarpal Tunnel

Points: 4058
More actions
February 22, 2008 at 6:09 pm

#120033

Here is the delima, I'm developing a package to bring in a CSV file that contains 2134 columns...stop laughing as we all know the column limit on SQL Server is 1024, not to mention the row size limit of 8060. Of course this data must come in as varchar as I need to perform some quality checks on the data prior to transforming it for load into the data warehouse. So, I develop a package, that chunks this data into seven files and thought all was well in good as then none of the limits are approached by any means.
The problem comes when I get to SSIS to develop the package, I can't use bulk insert because of course the flat file does not match the table structure. So now I'm left doing data flow tasks, another road block removed, correct, no, when I developed the package I get out of memory errors when saving.
So, I break the package up into a base package, that does some routine checks and starts the process and the ending processing I use to ensure the number of rows in the source and destination tables match. All very good practices (hint to all new developers out there). I then develop three additional packages within the solution that are called by the main package to load the data into the tables. Go to save the last package and guess what, out of memory errors. So now I'm either going to have to write several different solutions to chunk the data, or have one of you fine fellow developers come to my rescue.
Suggestion to Microsoft, how come a source file can't have multiple destinations, and why can't a destination in the data flow process link on completion (that being the key) to another source object to establish a flow process.
Ok, enough venting, any suggestions would be greatly appreciated, and no I can't have the file produced with fewer columns.
Thanks
MD
Marvin Dillard
Senior Consultant
Claraview Inc

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 1

MD (2/22/2008)
I can't use bulk insert because of course the flat file does not match the table structure.

If you use a BCP format file, you can. 😉 See BOL at the following URL to get started:

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/b7b97d68-4336-4091-aee4-1941fab568e3.htm

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Marvin D SSCarpal Tunnel Points: 4058 More actions · Answer 2

Jeff

Thanks, don't know why I didn't think of that. Guess I was too frustrated with the situation.

Marvin Dillard
Senior Consultant
Claraview Inc

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 3

Hoo boy... I understand... 2,134 columns of CSV data is enough to make anyone cranky.

How "clean" is the input file? Any problems with missing delimiters, etc?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Johan Bijnens SSC Guru Points: 135593 More actions · Answer 4

- While you're at it, you might as well split the source file to match your

table's structure if you can:unsure: Then you'll be able to bulkinsert 😉

- Sort the files to be loaded according to the clustering index to avond splits as much as you can.

- Depending on the volume you are loading perform reindex + sp_updatestats afterward, your sql engine may benifit from that.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Marvin D SSCarpal Tunnel Points: 4058 More actions · Answer 5

Jeff

Luckily the data is very clean. All delimiters are in place, get this, there is only a little over 1100 rows. I actually don't have to do any transforms to get the data in, and very few to get it to it's final destination in the data warehouse. I've already created the BCP files so everything should be ready to flow Monday as soon as I get in the office.

Thanks again for all of your help.

Marvin Dillard
Senior Consultant
Claraview Inc

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 6

Marvin, that's just amazing to me... a data file that has more columns than rows... I don't believe I've ever seen such a thing except maybe for a single row control or header/footer file.

Anyway, glad you're all set. Hope things run perfect on Monday. And, thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

VALEK SSCarpal Tunnel Points: 4206 More actions · Answer 7

If you really need to use SSIS you are stuffed as you can see.

You need to do something else here. I am thinking about BCP into a dummy table and then you work with it.

Another solution might not be appropriate here -- write a script in perl or anything like that to split this damn table into let's say five tables and then flow them into the five separate tables and then join to produce the result.

Looks a bit messy but at least you would not get SSIS exceptions. Like this you would control all the 2000 fields, but separately, if you know what I mean...

ps: the dude who designed this table must die

M.Sc.IT, M.B.A, MCTS BI 2008, MCITP BI 2008, MCTS SQL Dev, CSM, CDVDM

Marvin D SSCarpal Tunnel Points: 4058 More actions · Answer 8

Valek

I love your last comment, but it comes from a SAS file, and it comes from another vendor that I can't access. I did break the tables up into 7 tables to ensure I didn't run into row size limits. I took the BCP file format option and all worked well.

As for killing the guy who accepted the design, I'm for that right about now.

THanks

Marvin Dillard
Senior Consultant
Claraview Inc

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 9

One of my favorite forms of torture for such idious-savantes is to simply take them out to dinner... sit them in a chair and feed them big ol' greasy-gravy pork-chops... with a sling-shot! 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)