Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««56789»»

How to Make Scalar UDFs Run Faster (SQL Spackle) Expand / Collapse
Author
Message
Posted Sunday, September 23, 2012 12:17 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Removed. (I quoted Jeff partially quoting Barry without reading all the way back to see that Barry had already referenced the document I wanted to mention).



Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363266
Posted Sunday, September 23, 2012 12:19 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 4:56 PM
Points: 8,271, Visits: 8,717
SQL Kiwi (9/23/2012)
L' Eomot Inversé (9/23/2012)
It would be interesting to see the C# so that we could see what it is that's being optimised. Clearly either the C# optimiser is really awful, or the C# is deliberately written in such a way as to avoid the most insane optimisations. For example, it would be nice to know how many multiplications the optimised code does, it would be nice to know how many values it reads from an array, and so on. Decades ago I saw a Fortran system which would have run this, if naively written so that the optimiser could be really clever, in about 10 microseconds. Of course it would have done exactly one multiplication.

Please be assured that the .NET compiler and optimizer is state-of-the-art. The overhead is caused by CLR being in hosted in SQL Server. The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute) but it takes time to connect things up and switch between SQL Server and the hosted environment.

I think you misunderstood me Paul - my fault for not expressing myself more clearly.

rragnmo seemed to me to be talking about two distinct thigs - a chunk of pure C# measured on its own (his first paragraph) and a CLR function embedded into SQL (his second paragraph). I was concerned only with his second paragraph. When an attempt is made to do the whole thing in c# (or Fortran, or any other procedural language) there's a risk that the optimiser notices that the first one million minus one function calls are irrelevant so it leaves them out. This doesn't happen with SQL, currently (but it might in the furture) so generally there';s an attempt to prevent the optimiser from doing this to avoid meaningless comparisons. My "one multiplication" was for the whole process, which needed only one multiplication (probably actually a left shift, depending on the target machine, as it's a multiplication by 2) instead of a million multiplications. If that sort of optimisation was happening, we would be comparing a forest with a single tree.

Having seen people make utterly wrong decisions based on run times of tasks which could be optimied out just about completely, I just wondered how well protected rragnmo's measurements were aginst that sort of thing. One of the things he said in his comment seemed to indicate that he did have had it in mind, but was concerned that he might not have done enough to avoid it.


Tom
Post #1363269
Posted Sunday, September 23, 2012 12:27 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 35,950, Visits: 30,232
SQL Kiwi (9/23/2012)
The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...


Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1363271
Posted Sunday, September 23, 2012 12:29 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
G Bryant McClellan (8/1/2012)
While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?

Having STATISTICS TIME ON causes SQL Server to measure things every time a T-SQL frame is entered and exited. The way T-SQL scalar functions are implemented now (with BEGIN...END syntax) an (expensive!) new T-SQL frame is created on every call.

http://blogs.msdn.com/b/sqlclrperf/archive/2007/03/08/scalar-functions.aspx




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363272
Posted Sunday, September 23, 2012 12:31 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
RichB (8/1/2012)
Have you tried looking at these with a server side trace? I'd be very curious to know whether that can have a similar problem.

Yes a server-side trace has very similar effects compared with STATISTICS TIME for similar reasons. Your question reminded me of a real issue here on SSC from several years ago: http://www.sqlservercentral.com/Forums/FindPost886822.aspx




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363273
Posted Sunday, September 23, 2012 12:34 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
milldyce (8/1/2012)
A bit of a tangent, but I remember a number of queries for which response times exploded when trying to view the Execution Plan.

Requesting an actual execution plan causes SQL Server to insert extra (invisible) operators into the executable to measure the things like actual row counts, rebinds, rewinds...and so on. The output from all this data collection is converted into XML at the end of execution and returned to the client. Understandably, all these extra operators and activity generate significant overhead.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363275
Posted Sunday, September 23, 2012 12:38 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Jeff Moden (9/23/2012)
SQL Kiwi (9/23/2012)
The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...

Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.

I was referring to the fact that modern processors employ tricks like prefetch, pipelining, branch prediction, speculative execution, out-of-order execution...and so on...meaning that multiple instructions can be retired per clock. An simple instruction that multiplies by two (which may well be optimized to a left shift as Tom mentions) may be timed at 1 clock but might well take less than that. e.g. http://stackoverflow.com/questions/692718/how-to-find-cpu-cycle-for-an-assembly-instruction




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363276
Posted Sunday, September 23, 2012 12:45 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
L' Eomot Inversé (9/23/2012)
I think you misunderstood me Paul - my fault for not expressing myself more clearly.

rragnmo seemed to me to be talking about two distinct thigs - a chunk of pure C# measured on its own (his first paragraph) and a CLR function embedded into SQL (his second paragraph). I was concerned only with his second paragraph. When an attempt is made to do the whole thing in c# (or Fortran, or any other procedural language) there's a risk that the optimiser notices that the first one million minus one function calls are irrelevant so it leaves them out. This doesn't happen with SQL, currently (but it might in the furture) so generally there';s an attempt to prevent the optimiser from doing this to avoid meaningless comparisons. My "one multiplication" was for the whole process, which needed only one multiplication (probably actually a left shift, depending on the target machine, as it's a multiplication by 2) instead of a million multiplications. If that sort of optimisation was happening, we would be comparing a forest with a single tree.

Oh I see! Yes, sorry I did misread your question. You're right that this sort of optimization doesn't happen now, and I doubt it ever will - unless SQL Server's interpreted T-SQL language is completed replaced by .NET (unlikely, perhaps sadly).




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363279
Posted Sunday, September 23, 2012 12:47 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:25 AM
Points: 11,168, Visits: 10,927
Right. Working back through the comments completed

Thank you Jeff for so many mentions in your fine article!




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363281
Posted Sunday, September 23, 2012 2:48 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 9:48 AM
Points: 35,950, Visits: 30,232
SQL Kiwi (9/23/2012)
Jeff Moden (9/23/2012)
SQL Kiwi (9/23/2012)
The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...

Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.

I was referring to the fact that modern processors employ tricks like prefetch, pipelining, branch prediction, speculative execution, out-of-order execution...and so on...meaning that multiple instructions can be retired per clock. An simple instruction that multiplies by two (which may well be optimized to a left shift as Tom mentions) may be timed at 1 clock but might well take less than that. e.g. http://stackoverflow.com/questions/692718/how-to-find-cpu-cycle-for-an-assembly-instruction


Thanks for the explanation, Paul. It's been a long time (6502 days) since I've even gone near machine language. The processors certainly have come a long way at that level.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

"Change is inevitable. Change for the better is not." -- 04 August 2013
(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1363299
« Prev Topic | Next Topic »

Add to briefcase «««56789»»

Permissions Expand / Collapse