Looking for Best Practice on Loading Flat File with 2134 columns

  • Here is the delima, I'm developing a package to bring in a CSV file that contains 2134 columns...stop laughing as we all know the column limit on SQL Server is 1024, not to mention the row size limit of 8060. Of course this data must come in as varchar as I need to perform some quality checks on the data prior to transforming it for load into the data warehouse. So, I develop a package, that chunks this data into seven files and thought all was well in good as then none of the limits are approached by any means.

    The problem comes when I get to SSIS to develop the package, I can't use bulk insert because of course the flat file does not match the table structure. So now I'm left doing data flow tasks, another road block removed, correct, no, when I developed the package I get out of memory errors when saving.

    So, I break the package up into a base package, that does some routine checks and starts the process and the ending processing I use to ensure the number of rows in the source and destination tables match. All very good practices (hint to all new developers out there). I then develop three additional packages within the solution that are called by the main package to load the data into the tables. Go to save the last package and guess what, out of memory errors. So now I'm either going to have to write several different solutions to chunk the data, or have one of you fine fellow developers come to my rescue.

    Suggestion to Microsoft, how come a source file can't have multiple destinations, and why can't a destination in the data flow process link on completion (that being the key) to another source object to establish a flow process.

    Ok, enough venting, any suggestions would be greatly appreciated, and no I can't have the file produced with fewer columns.

    Thanks

    MD

    Marvin Dillard
    Senior Consultant
    Claraview Inc

  • MD (2/22/2008)


    I can't use bulk insert because of course the flat file does not match the table structure.

    If you use a BCP format file, you can. 😉 See BOL at the following URL to get started:

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/b7b97d68-4336-4091-aee4-1941fab568e3.htm

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    Thanks, don't know why I didn't think of that. Guess I was too frustrated with the situation.

    Marvin Dillard
    Senior Consultant
    Claraview Inc

  • Hoo boy... I understand... 2,134 columns of CSV data is enough to make anyone cranky.

    How "clean" is the input file? Any problems with missing delimiters, etc?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • - While you're at it, you might as well split the source file to match your

    table's structure if you can:unsure: Then you'll be able to bulkinsert 😉

    - Sort the files to be loaded according to the clustering index to avond splits as much as you can.

    - Depending on the volume you are loading perform reindex + sp_updatestats afterward, your sql engine may benifit from that.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Jeff

    Luckily the data is very clean. All delimiters are in place, get this, there is only a little over 1100 rows. I actually don't have to do any transforms to get the data in, and very few to get it to it's final destination in the data warehouse. I've already created the BCP files so everything should be ready to flow Monday as soon as I get in the office.

    Thanks again for all of your help.

    Marvin Dillard
    Senior Consultant
    Claraview Inc

  • Marvin, that's just amazing to me... a data file that has more columns than rows... I don't believe I've ever seen such a thing except maybe for a single row control or header/footer file.

    Anyway, glad you're all set. Hope things run perfect on Monday. And, thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you really need to use SSIS you are stuffed as you can see.

    You need to do something else here. I am thinking about BCP into a dummy table and then you work with it.

    Another solution might not be appropriate here -- write a script in perl or anything like that to split this damn table into let's say five tables and then flow them into the five separate tables and then join to produce the result.

    Looks a bit messy but at least you would not get SSIS exceptions. Like this you would control all the 2000 fields, but separately, if you know what I mean...

    ps: the dude who designed this table must die

    M.Sc.IT, M.B.A, MCTS BI 2008, MCITP BI 2008, MCTS SQL Dev, CSM, CDVDM

  • Valek

    I love your last comment, but it comes from a SAS file, and it comes from another vendor that I can't access. I did break the tables up into 7 tables to ensure I didn't run into row size limits. I took the BCP file format option and all worked well.

    As for killing the guy who accepted the design, I'm for that right about now.

    THanks

    Marvin Dillard
    Senior Consultant
    Claraview Inc

  • One of my favorite forms of torture for such idious-savantes is to simply take them out to dinner... sit them in a chair and feed them big ol' greasy-gravy pork-chops... with a sling-shot! 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply