Simple Method for Importing Ragged files

  • Cool... let's see your solution for this, Brandon.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Heh... Hour or so? Like I said previously, I've used this method to do 5.1 million rows in about 3 minutes. I'm sure there's a faster method, but it worked for me.

    --Jeff Moden



  • Just to clarify - the "hour or so" being referred to was the time taken to create the package.

    Rgds,

    Paul Ibison

  • The "Ragged Right" Files I have to deal with are Mixed Row Type files as well. Because of that, I cannot use column names from the csv file, but also, each row starts with a TypeID.

    So, in DTS, I create a Transform Data task for each row type and use this ActiveX script on the first column:

    Function Main()
        If DTSSource("Col001") = "HDRID" Then
            Main = DTSTransformStat_OK
        Else
            Main = DTSTransformStat_SkipRow
        End If
    End Function

    In the example for the article, your initial transform would just check the length and skip the row if it was a header or footer. There's no need to import-export-import, even to get the column headings.

    --

    JimFive
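    The per-row-type check above can be sketched outside DTS as well. A minimal Python version (the "HDRID" type ID and the sample rows are just illustrative assumptions):

```python
import csv
import io

def rows_of_type(lines, type_id):
    """Yield only the CSV rows whose first column matches type_id,
    mirroring the per-row-type DTS transform described above."""
    for row in csv.reader(lines):
        if row and row[0] == type_id:
            yield row

# Mixed row types in one file: two HDRID rows and one DTLID row.
sample = io.StringIO("HDRID,a,b\nDTLID,1,2\nHDRID,c,d\n")
print(list(rows_of_type(sample, "HDRID")))
```

    Each row type would get its own pass (or its own destination), just as each gets its own transform task in DTS.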

  • Brandon,

    all due respect but I think you're missing the point.

    Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

    Take a CSV file with 50 columns and try your conditional split starting from a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example, the CSV columns have no fixed width, so your substring expressions will have to account for finding the commas, which may be nested inside quoted strings. How long before this is robust? How tedious will all this coding be? And the resulting code will be huge.

    This method can do the same in an hour or so and have a very simple resultant package.

    Cheers,

    Paul Ibison
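    The robustness worry above, commas nested inside quoted strings, is easy to demonstrate. A small Python sketch (the sample line is invented for illustration):

```python
import csv

# A comma nested inside a quoted field: naive splitting on ","
# yields four pieces, while a real CSV parser yields three fields.
line = 'HDR,"Smith, John",20080306'
naive = line.split(",")
parsed = next(csv.reader([line]))
print(len(naive), len(parsed))
```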

  • paul.ibison (3/6/2008)


    Brandon,

    all due respect but I think you're missing the point.

    Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

    Take a CSV file with 50 columns and try your conditional split starting from a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example, the CSV columns have no fixed width, so your substring expressions will have to account for finding the commas, which may be nested inside quoted strings. How long before this is robust? How tedious will all this coding be? And the resulting code will be huge.

    This method can do the same in an hour or so and have a very simple resultant package.

    Cheers,

    Paul Ibison

    I'll admit I've never had to solve this exact problem before, where simply stripping headers and footers away would do the trick. The header-and-footer problems I have had to solve sometimes involved files with complex hierarchical structures from legacy systems, like this:

    FILE
    ORDERS|2
    HDR|100293|987|20080326
    ITM|897654|9876.87|3
    ITM|098643|76.34|12
    FTR|100293|2
    HDR|100294|456|20080326
    ITM|765432|11.99|6
    FTR|100294|1
    ENDORDERS|2
    CUST|2
    HDR|987
    ITM|Joe|Jackson|98 Palomino Way|Los Angeles|CA|90823
    FTR|987
    HDR|456
    ITM|Lisa|Lewis|123 Sesame Street|New York|NY|10014
    FTR|456
    ENDCUST|2
    ENDFILE

    How does your process work for files like this? Ignoring the header and footer information in this file isn't an option, since you would lose the order numbers and order dates, the line item counts, and other auditing information included in the file, such as record counts.
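    For illustration, a minimal Python sketch of what parsing the ORDERS block of a file like this involves, keeping the header and footer data that a strip-and-discard approach would lose (the field meanings are inferred from the sample and are assumptions):

```python
def parse_orders(lines):
    """Group ITM lines under their HDR, keeping the order number,
    customer, and date, and auditing the FTR item count."""
    orders, current = [], None
    for line in lines:
        parts = line.strip().split("|")
        if parts[0] == "HDR":
            current = {"order": parts[1], "cust": parts[2],
                       "date": parts[3], "items": []}
        elif parts[0] == "ITM" and current is not None:
            current["items"].append(parts[1:])
        elif parts[0] == "FTR" and current is not None:
            # The footer carries the expected line-item count.
            assert len(current["items"]) == int(parts[2])
            orders.append(current)
            current = None
    return orders

sample = ["HDR|100293|987|20080326",
          "ITM|897654|9876.87|3",
          "ITM|098643|76.34|12",
          "FTR|100293|2"]
print(parse_orders(sample))
```

    The CUST block would need its own handler, since its HDR and ITM rows carry different fields.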

  • paul.ibison (3/6/2008)


    Brandon,

    all due respect but I think you're missing the point.

    Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

    Take a CSV file with 50 columns and try your conditional split starting from a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example, the CSV columns have no fixed width, so your substring expressions will have to account for finding the commas, which may be nested inside quoted strings. How long before this is robust? How tedious will all this coding be? And the resulting code will be huge.

    This method can do the same in an hour or so and have a very simple resultant package.

    Cheers,

    Paul Ibison

    Unfortunately, in my line of work...as an IT solutions vendor...it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.

  • pnewhart (3/6/2008)


    Unfortunately, in my line of work...as an IT solutions vendor...it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.

    With all due respect to all of y'all after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time. I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack either.

    We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.

    ATBCharles Kincaid

  • Charles Kincaid (3/6/2008)


    With all due respect to all of y'all after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time. I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack either.

    We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.

    I agree with your point of view, as I've done the same type of thing and have written many custom file parsers and custom ETL applications over the years. Honestly I don't think I've ever seen a header/footer file format in which all of the headers and footers could be so easily discarded without a second thought.

  • How does your process work for files like this?

    I don't believe it was meant to... wasn't advertised that way except maybe by a slightly misleading title.

    --Jeff Moden



  • Jeff Moden (3/6/2008)


    I don't believe it was meant to... wasn't advertised that way except maybe by a slightly misleading title.

    Yes, catchy title. When I first clicked on it I actually expected it would be about the type of file I was asking about. The author's right though, I've never had any need to solve this very specific problem. I was kind of wondering how common this specific problem is compared to, say, ANSI X12 EDI and other formats where simply stripping headers and footers is not an option?

  • Mike C (3/6/2008)


    Yes, catchy title. When I first clicked on it I actually expected it would be about the type of file I was asking about. The author's right though, I've never had any need to solve this very specific problem. I was kind of wondering how common this specific problem is compared to, say, ANSI X12 EDI and other formats where simply stripping headers and footers is not an option?

    Heh... guess it depends on what business you're in... I've been fortunate in not ever having to mess with EDI files. I have had to do exactly what's in this article a couple of dozen times. Not necessarily using the methods in this article each time, but same problem files.

    --Jeff Moden



  • Jeff Moden (3/6/2008)


    Mike C (3/6/2008)


    Heh... guess it depends on what business you're in... I've been fortunate in not ever having to mess with EDI files. I have had to do exactly what's in this article a couple of dozen times. Not necessarily using the methods in this article each time, but same problem files.

    I guess I should have suspected, but I've gotten used to working with file formats where the headers and footers (if there are any) are used to group detail lines for specific reasons. I guess there's still a proliferation of headers and footers in files out there with the sole purpose of getting in the way to make the ETL process harder 🙂

  • Is it possible to use SSIS to import this same flat file if it doesn't have a ROW delimiter?

    I get these files which are basically continuous text streams; the records are fixed-length, with no delimiter between them.

    I've written a utility to handle it now (it's slow and difficult to manage on all my remote sites), but is there anything already in SSIS that would just delimit the row at a specified length?

    Perhaps there's a different approach that doesn't require an external utility?
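    Outside SSIS, cutting an undelimited stream into fixed-length records takes only a few lines. A hedged Python sketch of what such a utility might do (the record length of 8 is illustrative):

```python
import io

def split_records(stream, record_len):
    """Cut an undelimited text stream into fixed-length records."""
    while True:
        chunk = stream.read(record_len)
        if not chunk:
            break
        yield chunk

data = io.StringIO("AAAA1111BBBB2222CCCC3333")
print(list(split_records(data, 8)))
```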

  • johnr (3/19/2008)


    Is it possible to use SSIS to import this same flat file if it doesn't have a ROW delimiter?

    I get these files which are basically continuous text streams; the records are fixed-length, with no delimiter between them.

    I've written a utility to handle it now (it's slow and difficult to manage on all my remote sites), but is there anything already in SSIS that would just delimit the row at a specified length?

    Perhaps there's a different approach that doesn't require an external utility?

    You can use the Bulk Insert task with a BCP format file to import fixed-length record files. You can probably do it in the data flow task as well with some transformations, although I haven't had occasion to do that myself yet.
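    As a rough illustration, a non-XML BCP format file for fixed-length, undelimited records might look like this (the column names, field lengths, and version number are assumptions, not taken from the thread):

```text
9.0
3
1  SQLCHAR  0  10  ""  1  OrderID    ""
2  SQLCHAR  0  8   ""  2  OrderDate  ""
3  SQLCHAR  0  6   ""  3  Amount     ""
```

    With empty terminators on every field, bcp and BULK INSERT read each record purely by the host-file data lengths, so no row delimiter is required.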

Viewing 15 posts - 31 through 45 (of 51 total)
