Simple Method for Importing Ragged files

Author
Message
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 209851 Visits: 41973
Cool... let's see your solution for this, Brandon.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 209851 Visits: 41973
Heh... Hour or so? Like I said previously, I've used this method to do 5.1 million rows in about 3 minutes. I'm sure there's a faster method, but it worked for me.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
paul.ibison
SSC-Enthusiastic

Group: General Forum Members
Points: 112 Visits: 144
Just to clarify - the "hour or so" being referred to was the time taken to create the package.
Rgds,
Paul Ibison
James Goodwin
SSCommitted

Group: General Forum Members
Points: 1663 Visits: 1107
The "Ragged Right" files I have to deal with are mixed-row-type files as well. Because of that, I cannot use column names from the CSV file, but each row does start with a TypeID.

So, in DTS, I create a Transform Data Task for each row type, with this transformation code on the first column:

Function Main()
    ' Keep only the rows whose first column carries this task's TypeID;
    ' every other row type is skipped rather than failing the load.
    If DTSSource("Col001") = "HDRID" Then
        Main = DTSTransformStat_OK
    Else
        Main = DTSTransformStat_SkipRow
    End If
End Function

In the example from the article, your initial transform would just check the length and skip the row if it was a header or footer. There's no need to import-export-import, even to get the column headings.
--
JimFive
paul.ibison
SSC-Enthusiastic

Group: General Forum Members
Points: 112 Visits: 144
Brandon,

With all due respect, I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

Take a CSV file with 50 columns, use your conditional split starting from a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example the CSV columns have no fixed width, so each substring expression has to hunt for the commas, which may even be nested inside a quoted string. How long before this is robust? How tedious will all this coding be? And the resulting code will be huge.

This method can do the same in an hour or so and leaves a very simple resulting package.
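
For anyone who hasn't actually tried it, the staging approach being described boils down to loading every raw line of the file into a single wide varchar column and throwing the header and trailer rows away before any real parsing starts. A minimal T-SQL sketch of that idea (the file path, table name, and length test are hypothetical; this is only an illustration, not the article's exact code):

CREATE TABLE #RawStage (RawLine VARCHAR(8000))

-- Stage every raw line into one wide column. The field terminator is assumed
-- never to appear in the data, so each whole line lands in RawLine.
BULK INSERT #RawStage
FROM 'C:\Import\RaggedFile.csv'              -- hypothetical path
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')

-- Header and trailer rows are assumed to be much shorter than the data rows,
-- so a simple length test (or a leading-token test) removes them.
DELETE FROM #RawStage
WHERE LEN(RawLine) < 50

-- What survives can be exported to a clean file and re-imported with normal
-- column mappings, or split in place with SUBSTRING/CHARINDEX.
SELECT RawLine FROM #RawStage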

Cheers,

Paul Ibison
Mike C
SSCertifiable

Group: General Forum Members
Points: 6345 Visits: 1172
paul.ibison (3/6/2008)
Brandon,

With all due respect, I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

Take a CSV file with 50 columns, use your conditional split starting from a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example the CSV columns have no fixed width, so each substring expression has to hunt for the commas, which may even be nested inside a quoted string. How long before this is robust? How tedious will all this coding be? And the resulting code will be huge.

This method can do the same in an hour or so and leaves a very simple resulting package.

Cheers,

Paul Ibison


I'll admit I've never had to solve this exact problem before, where simply stripping headers and footers away would do the trick. Problems I've had to solve involving headers and footers have sometimes involved files with complex hierarchical structures from legacy systems, like this:

FILE
ORDERS|2
HDR|100293|987|20080326
ITM|897654|9876.87|3
ITM|098643|76.34|12
FTR|100293|2
HDR|100294|456|20080326
ITM|765432|11.99|6
FTR|100294|1
ENDORDERS|2
CUST|2
HDR|987
ITM|Joe|Jackson|98 Palomino Way|Los Angeles|CA|90823
FTR|987
HDR|456
ITM|Lisa|Lewis|123 Sesame Street|New York|NY|10014
FTR|456
ENDCUST|2
ENDFILE

How does your process work for files like this? Ignoring the header and footer information in this file isn't an option, since you would lose important information during the process, such as the order numbers and order dates, the line item counts, and other auditing information included in the file, like record counts.
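
To make that concrete: once a file like this has been bulk-loaded one line per row into a single-column staging table, one workable approach is to route rows on their record-type prefix rather than discard them, so none of the header/footer detail is lost. A rough T-SQL sketch under that assumption (every object name here is hypothetical, and it is not the method from the article):

-- Staging table that preserves the original file order.
CREATE TABLE #RawStage
(
    LineNo  INT IDENTITY(1,1) PRIMARY KEY,
    RawLine VARCHAR(8000) NOT NULL
)
-- (BULK INSERT of the raw file into #RawStage would go here.)

-- Every record carries its type as the first pipe-delimited token, so headers
-- and footers are routed to their own tables instead of being thrown away,
-- keeping the order numbers, dates, and record counts available.
SELECT LineNo, RawLine INTO #Headers FROM #RawStage WHERE RawLine LIKE 'HDR|%'
SELECT LineNo, RawLine INTO #Items   FROM #RawStage WHERE RawLine LIKE 'ITM|%'
SELECT LineNo, RawLine INTO #Footers FROM #RawStage WHERE RawLine LIKE 'FTR|%'

-- In this layout the ORDERS and CUST sections reuse the same prefixes, so the
-- section boundaries (ORDERS..ENDORDERS, CUST..ENDCUST) also have to be
-- resolved by LineNo before columns are split out with SUBSTRING/CHARINDEX
-- and each item is tied back to its nearest preceding header.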
pnewhart
Old Hand

Group: General Forum Members
Points: 369 Visits: 386
paul.ibison (3/6/2008)
Brandon,

With all due respect, I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

Take a CSV file with 50 columns, use your conditional split starting from a single column, and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example the CSV columns have no fixed width, so each substring expression has to hunt for the commas, which may even be nested inside a quoted string. How long before this is robust? How tedious will all this coding be? And the resulting code will be huge.

This method can do the same in an hour or so and leaves a very simple resulting package.

Cheers,

Paul Ibison


Unfortunately, in my line of work... an IT solutions vendor... it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.



Charles Kincaid
Hall of Fame

Group: General Forum Members
Points: 3221 Visits: 2384
pnewhart (3/6/2008)

Unfortunately, in my line of work... an IT solutions vendor... it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.


With all due respect to all of y'all, after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time; I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack, either.

We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.

ATBCharles Kincaid
Mike C
SSCertifiable

Group: General Forum Members
Points: 6345 Visits: 1172
Charles Kincaid (3/6/2008)

With all due respect to all of y'all, after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time; I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack, either.

We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.


I agree with your point of view, as I've done the same type of thing and have written many custom file parsers and custom ETL applications over the years. Honestly I don't think I've ever seen a header/footer file format in which all of the headers and footers could be so easily discarded without a second thought.
Jeff Moden
SSC Guru

Group: General Forum Members
Points: 209851 Visits: 41973
How does your process work for files like this?


I don't believe it was meant to... it wasn't advertised that way, except maybe by a slightly misleading title.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs