Simple Method for Importing Ragged Files

  • Comments posted to this topic are about the item Simple Method for Importing Ragged files


    Paul Ibison
    Paul.Ibison@replicationanswers.com

  • Would it be possible for you to mail me the package, please?

  • neel.singh@interactivesa.co.za

  • *sigh* ANOTHER misleading title. This is NOT how to import a ragged-right file. It's how to remove unwanted header & trailer records from a perfectly formed NON-ragged-right file. I was hoping for the import of the classic ragged-right file - one with a variable number of fields in the records that are wanted.

    BTW Using the usual defaults, record sets can blow up once they reach about 200MB (of input data) with multiple fields of variable text.

  • True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art

    Rgds,

    Paul Ibison
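As a hedged sketch (not from the thread), the classic ragged-right case described above - records with a variable number of fields, all of them wanted - could be handled by padding short records out to a fixed width before staging them. The sample values and the assumed maximum width of 5 are hypothetical:

```python
import csv
import io

# Hypothetical sample: records share leading fields but trail off at
# different field counts (the "classic" ragged-right case).
raw = """1001,Smith,London,extra1,extra2
1002,Jones,Leeds
1003,Brown,York,extra1
"""

EXPECTED_FIELDS = 5  # assumed maximum width; short records are padded with None

rows = []
for record in csv.reader(io.StringIO(raw)):
    # Pad each record out to the full width so every row has the
    # same shape before loading it into a staging table.
    rows.append(record + [None] * (EXPECTED_FIELDS - len(record)))

for r in rows:
    print(r)
```

Once every row has the same shape, the padded result can be bulk-loaded like any fixed-width record set.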

  • Re: " I didn't try using the identity method, as I'm always concerned (probably needlessly) that the order of inserts might not necessarily be guaranteed to be identical to that of the text file..."

    Not needlessly at all - I've been bitten by that, even after the same import routine had preserved the order for some 3-4 runs prior. When you are warned that relational tables have no order, you are completely correct to take the warning seriously!

  • Hi Jim,

    That's reassuring, in the sense that my cautious approach was OK :)

    BTW we're talking about reading from a text file rather than a table.

    Even so, I'm still a little mystified by this - if I did this in .NET, the sequence of reading would be consistent. My guess is that in SSIS it is read into a sequence of separate buffers and somehow it's the order of the buffers which is not sequential, but I'm not sure why this logically should be so.

    Rgds,

    Paul Ibison

  • I don't think it is SSIS, it is the SQL engine. Even though you read the file sequentially, where the rows wind up in the destination table is not guaranteed.

    (I got caught by it doing imports into MS Access from .txt files.)

  • OK - I was hooked into thinking that the buffers were changing the order, but as you rightly point out, in the absence of a clustered index on the table the insertion order is not consistent/guaranteed.

    Cheers,

    Paul Ibison
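One way around the unguaranteed insertion order discussed above is to stamp each line with its ordinal while reading the file, so the original file order can always be recovered later with an ORDER BY on that column. A minimal sketch, with a hypothetical in-memory file standing in for the real flat file:

```python
import io

# Hypothetical flat file; in practice this would be open(path).
flat_file = io.StringIO("HEADER 2008-03-05\nrow one\nrow two\nTRAILER 2\n")

# Stamp each line with its position in the file so the load order can be
# reconstructed regardless of how the engine physically stores the rows.
numbered = [(line_no, line.rstrip("\n"))
            for line_no, line in enumerate(flat_file, start=1)]

for line_no, text in numbered:
    print(line_no, text)
```

The explicit line number also makes it trivial to identify the first and last records (the header and trailer rows) without relying on physical order.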

  • I think a combination of omitting the first row and doing a conditional split would do just fine. This is the method I use, since I receive a number of flat files from legacy systems having header and trailer rows. The conditional split needs to evaluate only one column to see if it's a valid row. Maybe I'm missing something, but I don't understand the reference to manually defining the column names and the use of substrings. You don't have to rename the columns, and using a substring seems a heck of a lot easier than all of the manipulations as described.
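The conditional-split approach described above - testing a single column to decide whether a row is real data or a header/trailer record - can be sketched outside SSIS as follows; the record-type values ("HDR", "D", "TRL") and the file contents are hypothetical:

```python
import io

# Hypothetical file with a header row, data rows, and a trailer row.
flat_file = io.StringIO(
    "HDR,2008-03-05\n"
    "D,1001,Smith\n"
    "D,1002,Jones\n"
    "TRL,2\n"
)

good, rejected = [], []
for line in flat_file:
    line = line.rstrip("\n")
    # Mimic a Conditional Split that tests only the first column:
    # rows whose record-type field is "D" go to the data output,
    # everything else (header/trailer) is diverted.
    if line.split(",", 1)[0] == "D":
        good.append(line)
    else:
        rejected.append(line)

print(len(good), len(rejected))
```

As in the SSIS transformation, only one field is inspected per row; the remaining columns pass through untouched.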

  • If the goal is to skip the "incorrect" rows, the simplest way is to set up a DTS package for data import and set the maximum error count to 10 (assuming fewer than ten short rows, as in your example).

  • Laura Meyerovich (3/5/2008)


    If the goal is to skip the "incorrect" rows, the simplest way is to set up a DTS package for data import and set the maximum error count to 10 (assuming fewer than ten short rows, as in your example).

    This also assumes that the data rows you want to omit actually create an error. In some cases, the package may just import the row values into the defined data columns without error.

  • I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means defining the name and writing 50 substring clauses. If you have extracted the data to a CSV staging flat file, with the column headers as the first row, this part is all done automatically.

    Cheers,

    Paul Ibison

  • paul.ibison (3/5/2008)


    I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means defining the name and writing 50 substring clauses. If you have extracted the data to a CSV staging flat file, with the column headers as the first row, this part is all done automatically.

    Cheers,

    Paul Ibison

    OK, I misread your statement about defining the columns. I generally follow a supplied file layout to define the columns. It's not been my experience that a file would have column names given and also have a header and trailer row.

  • paul.ibison (3/5/2008)


    True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art

    Rgds,

    Paul Ibison

    Actually, there is... it's called a "mixed rowtype" file.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
