Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234

Remove non printable characters Expand / Collapse
Author
Message
Posted Monday, December 6, 2010 3:41 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 36,712, Visits: 31,162


Thanks for the link and welcome aboard! Shifting gears, please be sure to test that code from that link for performance before you think of using it because it's likely a whole lot slower than the code already posted on this thread because of the scalar function with a While loop in it. The "nested REPLACE" code posted on this thread will absolutely fly.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1030897
Posted Thursday, December 30, 2010 9:16 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, February 4, 2011 1:39 PM
Points: 3, Visits: 12
Actually, I did test it and it works great. I tested it on a table with almost 4 million rows and it only took a minute or two.
Post #1040973
Posted Thursday, December 30, 2010 12:44 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 9:21 PM
Points: 11,192, Visits: 11,090
telastruct (12/30/2010)
Actually, I did test it and it works great. I tested it on a table with almost 4 million rows and it only took a minute or two.

Now try the 'nested REPLACE' code Jeff referred to.
You should find it's much faster (and more efficient).
Actually.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1041146
Posted Thursday, December 30, 2010 1:21 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 36,712, Visits: 31,162
telastruct (12/30/2010)
Actually, I did test it and it works great. I tested it on a table with almost 4 million rows and it only took a minute or two.


As Paul said, test the nested-REPLACE method... you'll find that 4 million rows are process in just several seconds... That's provided that you're not outputing the 4 million rows to the display which is the "great equalizer". Lot's of people think their code runs well when they compared it to other code only by how long it takes to render on the screen. Bad code and good code will frequently take the same time on-screen simply because they have the same number of rows to display.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1041177
Posted Thursday, December 30, 2010 8:30 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, February 4, 2011 1:39 PM
Points: 3, Visits: 12
Thanks, guys. I apologize if my "actually" seemed snitty- I see how it could be taken that way after re-reading it. I guess I should have followed it up with the question: If the replace method "can clean the 13 million rows in about 25 minutes" (about 2 minutes for each million rows), how is that faster than 2 minutes for about 4 million rows? I'm not a DB expert, and am sure there's probably some other factor I'm missing...?

Even if it is not as efficient, I do like the way the other solution is packaged- it seems like the replace method would be pretty easy to incorporate.

Thank you again for your thoughts.
Post #1041260
Posted Friday, December 31, 2010 1:39 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 36,712, Visits: 31,162
telastruct (12/30/2010)
Thanks, guys. I apologize if my "actually" seemed snitty- I see how it could be taken that way after re-reading it. I guess I should have followed it up with the question: If the replace method "can clean the 13 million rows in about 25 minutes" (about 2 minutes for each million rows), how is that faster than 2 minutes for about 4 million rows? I'm not a DB expert, and am sure there's probably some other factor I'm missing...?

Even if it is not as efficient, I do like the way the other solution is packaged- it seems like the replace method would be pretty easy to incorporate.

Thank you again for your thoughts.


Understood and thanks for the feedback.

The real key is the two different rowcounts on 2 different tables on 2 different machines by two different people using two different methods that haven't been fully posted. For example, are they doing an UPDATE, piping to a new table, or displaying on the screen? What's the status of fragmentation for both tables? How many indexes are being updated? Are the systems using the same version? Are the systems using the same capacity pipe and hard drives? What is the wait-state of the two systems? Are they both "dead quiet" or is one supporting a world wide heavy hit Web site with lots of reporting going on?

There's something else that folks need to understand... each system has a "tipping point" where UPDATEs are concerned. For example... on one system it may take only 10 seconds to update a million rows and, likewise, only 20 seconds to update 2 million rows and only 30 seconds to update 3 million rows. But, on that same system with the same table and index structures using the exact same query, it may suddenly take 4 hours to update only 4 million rows. Everything else being the same, another system may be able to handle 6 million rows without reaching the "tipping point".

If you want to do a comparison, it has to be on the same table and the same data or at least data with the same level of randomization. You just can't tell what's what with all the other variables currenlty in play.

I'll also admit that I've seen certain memory only scalar functions beat other methods even when they have a WHILE loop in it. But, we won't actually know until at least one person actually tests both methods on the same data on the same machine. I was hoping you'd be that person so I didn't actually have to be "the one" yet again.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1041473
Posted Wednesday, July 25, 2012 11:17 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, May 21, 2014 8:27 AM
Points: 1, Visits: 7
This is exactly what i was looking for. My data had ASCII(160) appended to it. I just added that to the list and it worked very well.

-MK
Post #1335319
Posted Saturday, May 18, 2013 10:01 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, July 17, 2014 7:54 PM
Points: 2, Visits: 25
This is what I have been looking for and forgive me if this is a simple question (I am trying to learn some of these concepts, like in-line table-valued functions), but how do you easily apply this to an entire table? How do you use this, even? It returns a table (not permanent), so I selected the result set into a new table, but I don't know if that defeats the point of using the TVF in the first place.

This:

select * into dbo.testtable from
(SELECT t1.id as id, t1.title as title, clean.cleaned as cleaned from t1
cross apply dbo.if_cleanwithreplace(t1.title) clean) a

fixes one column
Post #1454277
Posted Saturday, May 25, 2013 2:14 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 36,712, Visits: 31,162
ryan.blosser (5/18/2013)
This is what I have been looking for and forgive me if this is a simple question (I am trying to learn some of these concepts, like in-line table-valued functions), but how do you easily apply this to an entire table? How do you use this, even? It returns a table (not permanent), so I selected the result set into a new table, but I don't know if that defeats the point of using the TVF in the first place.

This:

select * into dbo.testtable from
(SELECT t1.id as id, t1.title as title, clean.cleaned as cleaned from t1
cross apply dbo.if_cleanwithreplace(t1.title) clean) a

fixes one column


You can simplify that query quite a bit. No need for the nesting.
SELECT t1.ID, t1.Title, clean.Cleaned
INTO dbo.TestTable
FROM t1
CROSS APPLY dbo.if_cleanwithreplace(t1.Title) clean
;


The purpose of the iTVF, in this case, is to function more like an iSF (Inline Scalar Function) which is typically about 7 times faster than a run-of-the-mill scalar function. See the following article for more on that.
http://www.sqlservercentral.com/articles/T-SQL/91724/

Putting the results in a table doesn't negate the value of such functions. It depends on what you want to do. The results could just as easily be returned directly.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1456813
Posted Wednesday, May 21, 2014 8:25 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Friday, July 18, 2014 7:28 AM
Points: 422, Visits: 1,549
Interesting solution.

How would you handle multiple columns to clean? Doing multiple cross applies seems like it would kill performance as you could be theoretically joining back to the same 4 million record table 2, 3, or more times.

Thanks.
Post #1573206
« Prev Topic | Next Topic »

Add to briefcase «««1234

Permissions Expand / Collapse