|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Thursday, October 20, 2011 8:38 AM
Points: 1,
Visits: 20
|
|
| If the goal is to skip the "incorrect" rows, the simplest way is to set a DTS package for data import, and set error number to 10 (assuming less then ten short rows as in your example).
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Tuesday, April 30, 2013 7:01 AM
Points: 267,
Visits: 271
|
|
Laura Meyerovich (3/5/2008) If the goal is to skip the "incorrect" rows, the simplest way is to set a DTS package for data import, and set error number to 10 (assuming less then ten short rows as in your example).
This also assumes that the data rows you want to omit actually create an error. In some cases, the package may just import the row values into the defined data columns without error.
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Friday, February 08, 2013 5:26 AM
Points: 14,
Visits: 144
|
|
I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means define the name and write 50 substring clauses. If you have extracted the data to a csv staging flat file, with the column headers as the first row, this part is all done automatically. Cheers, Paul Ibison
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Tuesday, April 30, 2013 7:01 AM
Points: 267,
Visits: 271
|
|
paul.ibison (3/5/2008) I looked at the conditional split, but if you have >50 columns (as I have), you'll need to manually define each column - that means define the name and write 50 substring clauses. If you have extracted the data to a csv staging flat file, with the column headers as the first row, this part is all done automatically. Cheers, Paul Ibison
OK. I misread your statement about defining the columns. I generally follow a supplied file layout to define the columns. It's not been my experience where a file would have column names given and have a header and trailer row.
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:30 PM
Points: 32,893,
Visits: 26,770
|
|
paul.ibison (3/5/2008) True - the title is a little misleading. The problem is that there isn't a 'correct' term for such files and this is the nearest commonly understood one. Anyway, what you need is this link: http://www.sql-server-performance.com/article_print.aspx?id=1056&type=art Rgds, Paul Ibison
Actually, there is... it's called a "mixed rowtype" file.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 5:08 PM
Points: 410,
Visits: 106
|
|
Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.
I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:30 PM
Points: 32,893,
Visits: 26,770
|
|
brewmanz.sqlservercentral (3/5/2008)[hrI was hoping for the import of the classic ragged-right file - a variable number of fields in records that are wanted.
For your definition of "ragged right", is each row supposed to have the same number of "fields" in the same position in each row but because the right hand fields aren't necessarily populated nor padded, that's what makes it ragged?
Do you have an example ragged right file and a record layout that I could play with?
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Tuesday, April 30, 2013 7:01 AM
Points: 267,
Visits: 271
|
|
admin (3/5/2008) Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.
I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.
DTS is IMO a preferred method for doing quick imports with a lot less overhead. However, there are various scripting tools available within SSIS also. Also, since DTS is being deprecated for SQL Server 2008, it may be time to bite the bullet. :)
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:30 PM
Points: 32,893,
Visits: 26,770
|
|
admin (3/5/2008) Definitely an ingenious solution; however, it can quickly become a poor approach when having to deal with larger import file -think telephone call records for instance.
I've found that the best approach is -and forgive me for still using DTS! :)- the judicious application of modest VB scripting to quickly manipulate suspicous data, or 'cleansing' done directly within the package and temp table.
Actually, I've used the method in this article for CDR (Call Detail Records) cleansing... Consider that bulk insert will import 5.1 million rows in 60 seconds and BCP will do an export in about the same time... that means you can do a 5.1 million row "clean up" of this nature at the rate of 5.1 million rows in about 3 minutes (1 export, 2 imports).
Of course, CDRs are normally in a fixed field format and it's a lot cheaper to just input the file into a single column and use substring to split the records whilst ignoring the "short rows". Works nasty fast. Another advantage is for CDR files like what some of the "Bell" companies send... they send a mixed bag... they use a certain "record indicator" in the same postion across multiple record types that have different layouts (CDRs vs Tax Records vs records counts for both). A simple substring "detector" allows me to filter out all but the rows that I want to split.
Heh... and forgive me for not using DTS! I've actually never learned how to use it because I can normally beat the guys at work that do use it for performance.
And, no... I'm not making fun of anyone who uses DTS... if you take it that way, then I apologize. I'm just saying I've always been able to beat DTS with T-SQL especially on CDR files.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:30 PM
Points: 32,893,
Visits: 26,770
|
|
Paul Ibison (3/4/2008) Comments posted to this topic are about the item
Pretty good article, Paul... straight and to the point.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|