Simple Method for Importing Ragged files

Question

Post reply

Simple Method for Importing Ragged files

Paul Ibison

SSCertifiable

Points: 5303
More actions
March 4, 2008 at 9:55 pm

#120680

Comments posted to this topic are about the item Simple Method for Importing Ragged files

Paul Ibison
Paul.Ibison@replicationanswers.com

Viewing 15 posts - 1 through 15 (of 51 total)

You must be logged in to reply to this topic. Login to reply

Neel Singh Old Hand Points: 307 More actions · Answer 1

would it be possible for you to mail me the package please.

Neel Singh Old Hand Points: 307 More actions · Answer 2

Neel Singh

Old Hand

Points: 307

March 4, 2008 at 11:50 pm

#785323

neel.singh@interactivesa.co.za

brewmanz SSCommitted Points: 1575 More actions · Answer 3

*sigh* ANOTHER misleading title. This is NOT how to import a ragged-right file. It's how to remove unwanted header & trailer records from a perfectly formed NON-ragged-right file. I was hoping for the import of the classic ragged-right file - a variable number of fields in records that are wanted.

BTW Using the usual defaults, record sets can blow up once they reach about 200MB (of input data) with multiple fields of variable text.

paul.ibison Old Hand Points: 324 More actions · Answer 4

True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art

Rgds,

Paul Ibison

Jim Russell-390299 SSCrazy Eights Points: 9333 More actions · Answer 5

Re: " I didn't try using the identity method, as I'm always concerned (probably needlessly) that the order of inserts might not necessarily be guaranteed to be identical to that of the text file..."

Not needlessly at all, I've been bit by that, and after the same import routine had preserved the order some 3-4 runs prior. When you are warned that relational tables have no order, you are completely correct to take the warning seriously!

paul.ibison Old Hand Points: 324 More actions · Answer 6

Hi Jim,

that's reassuring in a sense that my cautious approach was OK:)

BTW we're talking about reading from a text file rather than a table.

Even so I'm still a little mystified by this - if I did this in .NET, the sequence of reading would be consistant. My guess is that in SSIS it is read into a sequence of separate buffers and somehow it's the order of the buffers which is not sequential, but am not sure why this logically should be so.

Rgds,

Paul Ibison

Jim Russell-390299 SSCrazy Eights Points: 9333 More actions · Answer 7

I don't think it is SSIS, it is the SQL engine. Even though you read the file sequentially, where the rows wind up in the destination table is not guaranteed.

(I got caught by it doing imports into MS Access from .txt files.)

paul.ibison Old Hand Points: 324 More actions · Answer 8

OK - I was hooked into thinking that the buffers were changing the order, but as you rightly point out in the absence of a clustered index on the table, the insertion order is not consistant/guaranteed.

Cheers,

Paul Ibison

pnewhart SSC Eights! Points: 983 More actions · Answer 9

I think a combination of omitting the first row and doing a conditional split would do just fine. This the method I use since I receive a number of flat files from legacy systems having header and trailer rows. The conditional split needs to evaluate only one column to see if it's a valid row. Maybe I'm missing something, but I don't understand the reference to manually defining the column names and use of substrings. You don't have to rename the columns and using a substring seems a heck of a lot easier than all of the manipulations as described.

Laura Meyerovich Grasshopper Points: 11 More actions · Answer 10

If the goal is to skip the "incorrect" rows, the simplest way is to set a DTS package for data import, and set error number to 10 (assuming less then ten short rows as in your example).

pnewhart SSC Eights! Points: 983 More actions · Answer 11

Laura Meyerovich (3/5/2008)
If the goal is to skip the "incorrect" rows, the simplest way is to set a DTS package for data import, and set error number to 10 (assuming less then ten short rows as in your example).

This also assumes that the data rows you want to omit actually create an error. In some cases, the package may just import the row values into the defined data columns without error.

paul.ibison Old Hand Points: 324 More actions · Answer 12

I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means define the name and write 50 substring clauses. If you have extracted the data to a csv staging flat file, with the column headers as the first row, this part is all done automatically.

Cheers,

Paul Ibison

pnewhart SSC Eights! Points: 983 More actions · Answer 13

paul.ibison (3/5/2008)
I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means define the name and write 50 substring clauses. If you have extracted the data to a csv staging flat file, with the column headers as the first row, this part is all done automatically.
Cheers,
Paul Ibison

OK. I misread your statement about defining the columns. I generally follow a supplied file layout to define the columns. It's not been my experience where a file would have column names given and have a header and trailer row.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 14

paul.ibison (3/5/2008)
True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art
Rgds,
Paul Ibison

Actually, there is... it's called a "mixed rowtype" file.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)