Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Simple Method for Importing Ragged files Expand / Collapse
Author
Message
Posted Wednesday, March 5, 2008 7:48 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, October 20, 2011 8:38 AM
Points: 1, Visits: 20
If the goal is to skip the "incorrect" rows, the simplest way is to set a DTS package for data import, and set error number to 10 (assuming less then ten short rows as in your example).
Post #464445
Posted Wednesday, March 5, 2008 7:58 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:05 PM
Points: 271, Visits: 314
Laura Meyerovich (3/5/2008)
If the goal is to skip the "incorrect" rows, the simplest way is to set a DTS package for data import, and set error number to 10 (assuming less then ten short rows as in your example).


This also assumes that the data rows you want to omit actually create an error. In some cases, the package may just import the row values into the defined data columns without error.



Post #464453
Posted Wednesday, March 5, 2008 8:03 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, February 8, 2013 5:26 AM
Points: 14, Visits: 144
I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means define the name and write 50 substring clauses. If you have extracted the data to a csv staging flat file, with the column headers as the first row, this part is all done automatically.
Cheers,
Paul Ibison
Post #464455
Posted Wednesday, March 5, 2008 8:15 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:05 PM
Points: 271, Visits: 314
paul.ibison (3/5/2008)
I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means define the name and write 50 substring clauses. If you have extracted the data to a csv staging flat file, with the column headers as the first row, this part is all done automatically.
Cheers,
Paul Ibison


OK. I misread your statement about defining the columns. I generally follow a supplied file layout to define the columns. It's not been my experience where a file would have column names given and have a header and trailer row.



Post #464467
Posted Wednesday, March 5, 2008 8:51 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:29 AM
Points: 35,554, Visits: 32,150
paul.ibison (3/5/2008)
True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art
Rgds,
Paul Ibison


Actually, there is... it's called a "mixed rowtype" file.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #464498
Posted Wednesday, March 5, 2008 8:52 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, May 31, 2013 8:28 AM
Points: 410, Visits: 107
Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.

I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.
Post #464501
Posted Wednesday, March 5, 2008 8:55 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:29 AM
Points: 35,554, Visits: 32,150
brewmanz.sqlservercentral (3/5/2008)[hrI was hoping for the import of the classic ragged-right file - a variable number of fields in records that are wanted.


For your definition of "ragged right", is each row supposed to have the same number of "fields" in the same position in each row but because the right hand fields aren't necessarily populated nor padded, that's what makes it ragged?

Do you have an example ragged right file and a record layout that I could play with?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #464507
Posted Wednesday, March 5, 2008 9:02 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:05 PM
Points: 271, Visits: 314
admin (3/5/2008)
Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.

I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.


DTS is IMO a preferred method for doing quick imports with a lot less overhead. However, there are various scripting tools available within SSIS also. Also, since DTS is being deprecated for SQL Server 2008, it may be time to bite the bullet. :)



Post #464510
Posted Wednesday, March 5, 2008 9:09 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:29 AM
Points: 35,554, Visits: 32,150
admin (3/5/2008)
Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.

I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.


Actually, I've used the method in this article for CDR (Call Detail Records) cleansing... Consider that bulk insert will import 5.1 million rows in 60 seconds and BCP will do an export in about the same time... that means you can do a 5.1 million row "clean up" of this nature at the rate of 5.1 million rows in about 3 minutes (1 export, 2 imports).

Of course, CDRs are normally in a fixed field format and it's a lot cheaper to just input the file into a single column and use substring to split the records whilst ignoring the "short rows". Works nasty fast. Another advantage is for CDR files like what some of the "Bell" companies send... they send a mixed bag... they use a certain "record indicator" in the same postion across multiple record types that have different layouts (CDRs vs Tax Records vs records counts for both). A simple substring "detector" allows me to filter out all but the rows that I want to split.

Heh... and forgive me for not using DTS! I've actually never learned how to use it because I can normally beat the guys at work that do use it for performance.

And, no... I'm not making fun of anyone who uses DTS... if you take it that way, then I apologize. I'm just saying I've always been able to beat DTS with T-SQL especially on CDR files.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #464518
Posted Wednesday, March 5, 2008 9:10 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:29 AM
Points: 35,554, Visits: 32,150
Paul Ibison (3/4/2008)
Comments posted to this topic are about the item


Pretty good article, Paul... straight and to the point.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #464521
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse