Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««23456»»»

Simple Method for Importing Ragged files Expand / Collapse
Author
Message
Posted Wednesday, March 5, 2008 3:21 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:27 AM
Points: 35,769, Visits: 32,437
Cool... let's see your solution for this, Brandon.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #464803
Posted Thursday, March 6, 2008 6:48 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:27 AM
Points: 35,769, Visits: 32,437
Heh... Hour or so? Like I said previously, I've used this method to do 5.1 million rows in about 3 minutes. I'm sure there's a faster method, but it worked for me.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #465105
Posted Thursday, March 6, 2008 7:15 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 8, 2013 5:26 AM
Points: 14, Visits: 144
Just to clarify - the "hour or so" being referred to was the time taken to create the package.
Rgds,
Paul Ibison
Post #465132
Posted Thursday, March 6, 2008 8:57 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, January 27, 2014 10:14 AM
Points: 1,322, Visits: 1,091
The "Ragged Right" Files I have to deal with are Mixed Row Type files as well. Because of that, I cannot use column names from the csv file, but also, each row starts with a TypeID.

So, in DTS, I create a transform task for each row type and use a Transform Task with this code on the first column:

Function Main()
If DTSSource("Col001") = "HDRID" then
Main =DTSTransformStat_OK
ELSE
Main = DTSTransformStat_SkipRow
End if
End Function

In the example for the article your initial transform would just check the length and skip it if it was a header or footer. There's no need to import-export-import, even to get the column headings.
--
JimFive
Post #465227
Posted Thursday, March 6, 2008 9:34 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 8, 2013 5:26 AM
Points: 14, Visits: 144
Brandon,

all due respect but I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

Take a csv file with 50 columns and use your conditional split starting with a single column and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example CSV file have no fixed width so your substring function will take account of looking for the commas, possibly nested inside a string. How long before this is robust? How tedious will it be to do all this coding? And the resulting code required will be huge.

This method can do the same in an hour or so and have a very simple resultant package.

Cheers,

Paul Ibison
Post #464978
Posted Thursday, March 6, 2008 12:53 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, September 24, 2014 1:20 PM
Points: 1,276, Visits: 1,135
paul.ibison (3/6/2008)
Brandon,

all due respect but I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

Take a csv file with 50 columns and use your conditional split starting with a single column and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example CSV file have no fixed width so your substring function will take account of looking for the commas, possibly nested inside a string. How long before this is robust? How tedious will it be to do all this coding? And the resulting code required will be huge.

This method can do the same in an hour or so and have a very simple resultant package.

Cheers,

Paul Ibison


I'll admit I've never had to solve this exact problem before, where simply stripping headers and footers away would do the trick. Problems I've had to solve involving headers and footers have sometimes involved files with complex hierarchical structures from legacy systems, like this:

FILE
ORDERS|2
HDR|100293|987|20080326
ITM|897654|9876.87|3
ITM|098643|76.34|12
FTR|100293|2
HDR|100294|456|20080326
ITM|765432|11.99|6
FTR|100294|1
ENDORDERS|2
CUST|2
HDR|987
ITM|Joe|Jackson|98 Palomino Way|Los Angeles|CA|90823
FTR|987
HDR|456
ITM|Lisa|Lewis|123 Sesame Street|New York|NY|10014
FTR|456
ENDCUST|2
ENDFILE

How does your process work for files like this? Ignoring header and footer information in this file isn't an option since you will lose important information during the process, such as the order #s and order dates, the line item count, and other auditing information included in the file like record counts, etc.
Post #465444
Posted Thursday, March 6, 2008 1:07 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Today @ 1:37 PM
Points: 271, Visits: 317
paul.ibison (3/6/2008)
Brandon,

all due respect but I think you're missing the point.
Judging by some of the questions, some others (Jeff apart) have also never really tried to solve this problem in practice.

Take a csv file with 50 columns and use your conditional split starting with a single column and see how long it takes. You'll have to define each of the 50 columns separately, and define the substring function separately for each one. In the example CSV file have no fixed width so your substring function will take account of looking for the commas, possibly nested inside a string. How long before this is robust? How tedious will it be to do all this coding? And the resulting code required will be huge.

This method can do the same in an hour or so and have a very simple resultant package.

Cheers,

Paul Ibison


Unfortunately in my line of work...a IT solutions vendor...it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.



Post #465459
Posted Thursday, March 6, 2008 6:05 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 11:42 AM
Points: 821, Visits: 2,028
pnewhart (3/6/2008)

Unfortunately in my line of work...a IT solutions vendor...it's part of my job to be a tedious coder. Again, with all due respect to you, I've never seen an example in practice of having header and trailer records along with field names.


With all due respect to all of y'all after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time. I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack either.

We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.


ATB

Charles Kincaid

Post #465586
Posted Thursday, March 6, 2008 7:28 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, September 24, 2014 1:20 PM
Points: 1,276, Visits: 1,135
Charles Kincaid (3/6/2008)

With all due respect to all of y'all after these many years I've seen some zany stuff. I'm rarely surprised any more. I write file parsers all the time. I just don't have to use SQL to do the job. I'm also not reading millions of rows at a crack either.

We all have to play the cards we are dealt. We don't have to like them and we get to complain about them.


I agree with your point of view, as I've done the same type of thing and have written many custom file parsers and custom ETL applications over the years. Honestly I don't think I've ever seen a header/footer file format in which all of the headers and footers could be so easily discarded without a second thought.
Post #465602
Posted Thursday, March 6, 2008 8:52 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:27 AM
Points: 35,769, Visits: 32,437
How does your process work for files like this?


I don't believe it was meant to... wasn't advertised that way except maybe by a slightly misleading title.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #465620
« Prev Topic | Next Topic »

Add to briefcase «««23456»»»

Permissions Expand / Collapse