One big "ragged right" text file!

  • Here are the details:

    - Source is a 5 GB text file. It is in a "ragged right" format with NO character delimiters.

    - Destination is 20 tables.

    What is the most efficient approach to importing this large text file? Should I attempt a bulk insert task? Will that work considering that I'm going from one source file to 20 destination tables? If I use the bulk insert task, how do I construct the format file?

  • You could dump it to a work table. Then do any parsing or reformatting and send it to the final tables.

    Since it is ragged right, I assume this is a fixed-width file. One thing to watch out for: If a row ends before the start of the last field, the next row will be appended to the end of it. I had that situation at work and defined the file as containing one long field. Then derived the individual fields. It was reasonably fast.

  • You say it has no delimiters... but does each row end with something like a carriage return or line feed?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, I believe each line does end with a CR LF.

  • If it's a fixed field (no delimiters, each column a given width), then I just import into a single column flat table and do a simple substring split. It's one of the easiest and fastest methods.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here's the problem: we are doing a proof of concept comparing SSIS and Informatica. In order for there to be a "fair fight," I can't do any interim steps in any type of SQL Server table.

    Is there any kind of interim holding place within SSIS, instead of using a table?

  • Jeff Moden (1/7/2010)


    If it's a fixed field (no delimiters, each column a given width), then I just import into a single column flat table and do a simple substring split. It's one of the easiest and fastest methods.

    i typically do the same as Jeff suggested; only difference is i create a view that parses out the substrings into the columns; also changing datatypes with convert, and joining it to my biz tables so i can insert complete with necessary keys.

    that makes it a little more reusable for me in my imports, since the files are the same week after week, but with different data.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • imani_technology (1/7/2010)


    Here's the problem: we are doing a proof of concept comparing SSIS and Informatica. In order for there to be a "fair fight," I can't do any interim steps in any type of SQL Server table.

    Is there any kind of interim holding place within SSIS, instead of using a table?

    You should include the method I stated as one of the contenders instead of just a feeder. And, no, it won't be a fair fight.... 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Okay, but what if we weren't using a SQL Server database? What if we were using SSIS to import a text file into Vertica, for example? Or Sybase IQ? My boss and clients want a data-migration solution that could potentially be SQL Server database engine independent.

  • imani_technology (1/7/2010)


    Okay, but what if we weren't using a SQL Server database? What if we were using SSIS to import a text file into Vertica, for example? Or Sybase IQ? My boss and clients want a data-migration solution that could potentially be SQL Server database engine independent.

    In truth, there's no such thing. At the very least, you'll need a different connection string. Something somewhere will need to change and if not at the connection string level, then at the API and driver levels so an import actually stands a chance of finishing sometime this year.

    There are some dandy file readers out there that are already developed and tested. Instead of trying to reinvent the wheel in SSIS (it's not a panacea anymore than Business Objects is), why not just spend a couple of bucks on a shrink wrapped solution?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply