SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


How to Make Scalar UDFs Run Faster (SQL Spackle)


How to Make Scalar UDFs Run Faster (SQL Spackle)

Author
Message
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
Removed. (I quoted Jeff partially quoting Barry without reading all the way back to see that Barry had already referenced the document I wanted to mention).



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Tom Thomson
Tom Thomson
One Orange Chip
One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)One Orange Chip (26K reputation)

Group: General Forum Members
Points: 26283 Visits: 12506
SQL Kiwi (9/23/2012)
L' Eomot Inversé (9/23/2012)
It would be interesting to see the C# so that we could see what it is that's being optimised. Clearly either the C# optimiser is really awful, or the C# is deliberately written in such a way as to avoid the most insane optimisations. For example, it would be nice to know how many multiplications the optimised code does, it would be nice to know how many values it reads from an array, and so on. Decades ago I saw a Fortran system which would have run this, if naively written so that the optimiser could be really clever, in about 10 microseconds. Of course it would have done exactly one multiplication.

Please be assured that the .NET compiler and optimizer is state-of-the-art. The overhead is caused by CLR being in hosted in SQL Server. The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute) but it takes time to connect things up and switch between SQL Server and the hosted environment.

I think you misunderstood me Paul - my fault for not expressing myself more clearly.

rragnmo seemed to me to be talking about two distinct thigs - a chunk of pure C# measured on its own (his first paragraph) and a CLR function embedded into SQL (his second paragraph). I was concerned only with his second paragraph. When an attempt is made to do the whole thing in c# (or Fortran, or any other procedural language) there's a risk that the optimiser notices that the first one million minus one function calls are irrelevant so it leaves them out. This doesn't happen with SQL, currently (but it might in the furture) so generally there';s an attempt to prevent the optimiser from doing this to avoid meaningless comparisons. My "one multiplication" was for the whole process, which needed only one multiplication (probably actually a left shift, depending on the target machine, as it's a multiplication by 2) instead of a million multiplications. If that sort of optimisation was happening, we would be comparing a forest with a single tree.

Having seen people make utterly wrong decisions based on run times of tasks which could be optimied out just about completely, I just wondered how well protected rragnmo's measurements were aginst that sort of thing. One of the things he said in his comment seemed to indicate that he did have had it in mind, but was concerned that he might not have done enough to avoid it.

Tom

Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)

Group: General Forum Members
Points: 219618 Visits: 42002
SQL Kiwi (9/23/2012)
The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...


Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
G Bryant McClellan (8/1/2012)
While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?

Having STATISTICS TIME ON causes SQL Server to measure things every time a T-SQL frame is entered and exited. The way T-SQL scalar functions are implemented now (with BEGIN...END syntax) an (expensive!) new T-SQL frame is created on every call.

http://blogs.msdn.com/b/sqlclrperf/archive/2007/03/08/scalar-functions.aspx



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
RichB (8/1/2012)
Have you tried looking at these with a server side trace? I'd be very curious to know whether that can have a similar problem.

Yes a server-side trace has very similar effects compared with STATISTICS TIME for similar reasons. Your question reminded me of a real issue here on SSC from several years ago: http://www.sqlservercentral.com/Forums/FindPost886822.aspx



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
milldyce (8/1/2012)
A bit of a tangent, but I remember a number of queries for which response times exploded when trying to view the Execution Plan.

Requesting an actual execution plan causes SQL Server to insert extra (invisible) operators into the executable to measure the things like actual row counts, rebinds, rewinds...and so on. The output from all this data collection is converted into XML at the end of execution and returned to the client. Understandably, all these extra operators and activity generate significant overhead.



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
Jeff Moden (9/23/2012)
SQL Kiwi (9/23/2012)
The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...

Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.

I was referring to the fact that modern processors employ tricks like prefetch, pipelining, branch prediction, speculative execution, out-of-order execution...and so on...meaning that multiple instructions can be retired per clock. An simple instruction that multiplies by two (which may well be optimized to a left shift as Tom mentions) may be timed at 1 clock but might well take less than that. e.g. http://stackoverflow.com/questions/692718/how-to-find-cpu-cycle-for-an-assembly-instruction



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
L' Eomot Inversé (9/23/2012)
I think you misunderstood me Paul - my fault for not expressing myself more clearly.

rragnmo seemed to me to be talking about two distinct thigs - a chunk of pure C# measured on its own (his first paragraph) and a CLR function embedded into SQL (his second paragraph). I was concerned only with his second paragraph. When an attempt is made to do the whole thing in c# (or Fortran, or any other procedural language) there's a risk that the optimiser notices that the first one million minus one function calls are irrelevant so it leaves them out. This doesn't happen with SQL, currently (but it might in the furture) so generally there';s an attempt to prevent the optimiser from doing this to avoid meaningless comparisons. My "one multiplication" was for the whole process, which needed only one multiplication (probably actually a left shift, depending on the target machine, as it's a multiplication by 2) instead of a million multiplications. If that sort of optimisation was happening, we would be comparing a forest with a single tree.

Oh I see! Yes, sorry I did misread your question. You're right that this sort of optimization doesn't happen now, and I doubt it ever will - unless SQL Server's interpreted T-SQL language is completed replaced by .NET (unlikely, perhaps sadly).



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)SSC-Dedicated (36K reputation)

Group: General Forum Members
Points: 36334 Visits: 11361
Right. Working back through the comments completed :-)

Thank you Jeff for so many mentions in your fine article!



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)SSC Guru (219K reputation)

Group: General Forum Members
Points: 219618 Visits: 42002
SQL Kiwi (9/23/2012)
Jeff Moden (9/23/2012)
SQL Kiwi (9/23/2012)
The C# code will indeed be compiled down to just one machine-code instruction (which may take less than one CPU cycle to execute)...

Are you sure, Paul? IIRC, even a simple JMP instruction takes at least 2 clock cycles.

I was referring to the fact that modern processors employ tricks like prefetch, pipelining, branch prediction, speculative execution, out-of-order execution...and so on...meaning that multiple instructions can be retired per clock. An simple instruction that multiplies by two (which may well be optimized to a left shift as Tom mentions) may be timed at 1 clock but might well take less than that. e.g. http://stackoverflow.com/questions/692718/how-to-find-cpu-cycle-for-an-assembly-instruction


Thanks for the explanation, Paul. It's been a long time (6502 days) since I've even gone near machine language. The processors certainly have come a long way at that level.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search