How to Make Scalar UDFs Run Faster (SQL Spackle)
Posted Sunday, September 23, 2012 3:01 PM


Jeff Moden (9/23/2012)
Thanks for the explanation, Paul. It's been a long time (6502 days) since I've even gone near machine language.

Me too! My first ever computer was a Commodore VIC-20 (1MHz 8-bit 6502 CPU) I bought with paper-round money.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1363302
Posted Sunday, September 23, 2012 3:07 PM


SQL Kiwi (9/23/2012)
Right. Working back through the comments completed

Thank you Jeff for so many mentions in your fine article!


The articles you wrote on CROSS APPLY are the best on the net IMHO. Thank you for writing them. They give the rest of us a good spot to refer folks to on the subject.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1363303
Posted Sunday, September 23, 2012 4:15 PM
Yes, I was indeed talking about two different things. One is the CLR-in-SQL function, and the other is a pure C# application.

I think the C# application was fairly well protected against being optimized out. I didn't realize I never attached it; I can see if I have it around. I inspected the generated IL and the runtime execution traces. I also observed that it scaled linearly with the number of iterations. Just to be certain, I also accumulate a value in the loop body, and print out that accumulated value at the end.

In my experience, the CLR JIT optimizer rarely optimizes out entire loop bodies. It just does not go through the work to do the analysis, likely on the assumption that it has little benefit for real-world (non-benchmark) cases. (The Oracle JVM actually does perform such eliminations fairly often. In fact, I find the Oracle JVM to perform stronger optimizations across the board. However, with C# and the CLR you have pointers, non-heap types, and similar things, and thus more ability to optimize what you find necessary. Basically, as a rule of thumb, performance almost always follows: simple C# < Java < optimized C#.)

It seems like there are further limits on really making the SQL CLR functionality match the performance of pure CLR code for numerical computation. For example, when passing over a large set of primitive values, it is not fast to make any call at all for each and every entry. It would commonly be faster to first shuttle the values into a primitive array and then execute on that, for reasons of memory locality and call overhead. Further, SQL Server could in theory persist such copies at will, as long as it treated them as a cache and invalidated them. For data warehousing scenarios this would work well, since updates are rare, and that is where high-performance numerical computation is common. Unfortunately, this kind of usage has not been a priority. I suppose there just isn't much customer demand for a unified system... As it is, I would be cautious about overdoing any complex computations that are not core SQL Server scenarios. For many such cases of numerical computation, the optimal design (often by orders of magnitude) will be: dump the data from SQL Server, compute, spew data into SQL Server.

Now, optimizing large-scale extraction and import is a different topic, and is also far from ideal performance with standard mechanisms...

I wish we didn't have to consider all of this complexity. There is no reason we should have to worry about hand-writing CLR versions of T-SQL functions (versions that could be generated by a template transform) just to get high performance. We also shouldn't have to worry that the best we can do for some large-scale computation on a data warehouse might be 1000x slower than what we could achieve by dumping the data, even when highly optimized. This really reduces the flexibility and simplicity of systems built around SQL Server for a wide variety of cases. Of course, again, these are not the core customer cases, so...

I assume the standalone C# code I was measuring was something like (although more JIT warming, etc. would be desirable):

using System;
using System.Diagnostics;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] data = new int[10000000];
            for (int i = 0; i < data.Length; i++)
            {
                data[i] = i;
            }

            int acc = 0;
            acc += Mult(data);

            acc = 0;
            Stopwatch time = Stopwatch.StartNew();
            acc += Mult(data);
            time.Stop();
            Console.WriteLine("Time: " + time.ElapsedMilliseconds + " ms [" + acc + "]");

            acc = 0;
            time.Restart();
            acc += Mult(data);
            time.Stop();
            Console.WriteLine("Time: " + time.ElapsedMilliseconds + " ms [" + acc + "]");

            acc = 0;
            time.Restart();
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            acc += Mult(data);
            time.Stop();
            Console.WriteLine("Time: " + time.ElapsedMilliseconds + " ms [" + acc + "]");
        }

        public static int Mult(int[] data)
        {
            int res = 0;
            foreach (int d in data)
            {
                //res += Mult(d);
                res = Mult(d);
            }
            return res;
        }

        public static int Mult(int d)
        {
            return d * 2;
        }
    }
}

(Incidentally, it is out of scope for this forum, but to back up my aside above... Here is the rough code in Java:


public class Test
{
    public static void main(String[] args)
    {
        int[] data = new int[10000000];
        for (int i = 0; i < data.length; i++)
        {
            data[i] = i;
        }

        int acc = 0;
        acc += Mult(data);

        acc = 0;
        long time = System.nanoTime();
        acc += Mult(data);
        time = System.nanoTime() - time;
        System.out.println("Time: " + (time / 1000000) + " ms [" + acc + "]");

        acc = 0;
        time = System.nanoTime();
        acc += Mult(data);
        time = System.nanoTime() - time;
        System.out.println("Time: " + (time / 1000000) + " ms [" + acc + "]");

        acc = 0;
        time = System.nanoTime();
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        acc += Mult(data);
        time = System.nanoTime() - time;
        System.out.println("Time: " + (time / 1000000) + " ms [" + acc + "]");
    }

    public static int Mult(int[] data)
    {
        int res = 0;
        for (int d : data)
        {
            //res += Mult(d);
            res = Mult(d);
        }
        return res;
    }

    public static int Mult(int d)
    {
        return d * 2;
    }
}

Unlike the C# code, this always takes 0 ms to execute. It appears simply to analyze the code and conclude that only the last iteration of the loop matters. If you change the Mult function to accumulate instead, it can no longer do that. Still, it then takes only 6 ms to execute, which is much faster than the C# version.)
Post #1363307
Posted Sunday, September 23, 2012 7:16 PM


rragno (9/23/2012)
Unlike the C# code, this always takes 0 ms to execute. It appears simply to analyze the code and conclude that only the last iteration of the loop matters. If you change the Mult function to accumulate instead, it can no longer do that. Still, it then takes only 6 ms to execute, which is much faster than the C# version.)

So the Java optimiser is somewhat better than the C# optimiser.
Of course, I would have expected pre-compiled code in either of these languages to run in 0 ms for the accumulating version as well. So these numbers suggest that optimisation techniques (for procedural languages like these) have deteriorated badly over the last four decades, or that everything is recompiling all the time, or that people have decided that some of the possible optimisations are rather pointless.
And it seems pretty clear that the C# optimisation is really bad.


Tom
Post #1363315
Posted Tuesday, September 25, 2012 9:53 PM


SQL Kiwi (9/23/2012)
Removed. (I quoted Jeff partially quoting Barry without reading all the way back to see that Barry had already referenced the document I wanted to mention).


Hey! I got something write!


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #1364414
Posted Tuesday, September 25, 2012 9:55 PM


RBarryYoung (9/25/2012)
SQL Kiwi (9/23/2012)
Removed. (I quoted Jeff partially quoting Barry without reading all the way back to see that Barry had already referenced the document I wanted to mention).


Hey! I got something write right!


(Doggone it! )


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #1364416
Posted Thursday, January 17, 2013 9:47 PM
Jeff Moden (8/5/2012)
mmilodragovich (8/1/2012)
Thanks for the detailed and METHODICAL analysis. I once was troubleshooting a performance problem in a large, complicated application and it came down to a "set quoted identifer on" statement in a scalar function. The statement was not needed, was probably generated by a developer who used a right-click-on-object to generate the "create" statement. The function was being called massive number of times so even the slightest performance difference was hugely magnified. I've been wary ever since but know that knowledge is power, so thanks for the article.


You're welcome and thanks for the feedback. I've had similar not-easy-to-explain things happen in this wonderful world of T-SQL.


Actually, it's not so hard to explain.
We all forget from time to time that SQL is interpreted, not compiled.
Therefore any piece of code is at least re-evaluated (worst case scenario - recompiled) every time it's invoked.

A scalar function is applied to the final recordset.
Once for each row returned.

If you trace a query with a scalar function in it you'll see this:

SP:Starting	SELECT @Result = dbo.TimesTwo(SomeNumber) FROM #TestTable ;
SP:StmtStarting RETURN @SomeINT * 2
SP:StmtCompleted RETURN @SomeINT * 2
SP:Completed SELECT @Result = dbo.TimesTwo(SomeNumber) FROM #TestTable ;

repeated as many times as the number of rows returned by the query.

At the same time, a table function (a parametrised view) in a JOIN is invoked ONCE - so it's re-evaluated (recompiled) once.

That's where the performance difference comes from.
SQL Server is just busy creating and re-creating the scalar function in memory, that's all.
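For anyone skimming, here is a minimal sketch of the two shapes being compared, reusing the article's TimesTwo example (the iTVF name and #TestTable are illustrative):

```sql
-- Scalar UDF: the body is evaluated once per row returned.
CREATE FUNCTION dbo.TimesTwo (@SomeINT INT)
RETURNS INT WITH SCHEMABINDING AS
BEGIN
    RETURN @SomeINT * 2;
END;
GO

-- Inline TVF (a "parametrised view"): expanded into the calling plan once.
CREATE FUNCTION dbo.TimesTwoITVF (@SomeINT INT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT Doubled = @SomeINT * 2;
GO

-- Invoked per row:
SELECT dbo.TimesTwo(SomeNumber) FROM #TestTable;

-- Expanded once into the plan:
SELECT ca.Doubled
FROM #TestTable
CROSS APPLY dbo.TimesTwoITVF(SomeNumber) AS ca;
```

The trace events above come from the first query; the CROSS APPLY form produces no per-row SP:StmtStarting/SP:StmtCompleted pairs.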

In real life it should not make much difference, unless your system (querying) design is really stupid.
There is no way a user wants to see more than a hundred rows in a UI.
Therefore you should never have a million-row scenario in a real-life application.
And if you apply a WHERE clause to all those queries, leaving under 1,000 rows in the final recordsets, you'll see that the difference is really minor - less than 5%.
Not worth any attention, really.
It's the same case as explicitly dropping #tables in stored procedure code - if you test only the create/drop operations in a procedure and execute it 10,000 times, the overhead will be really noticeable.
But in real-life procedures the impact will be totally unnoticeable. Well, there will be a negative impact on performance when you explicitly drop a #table at the end of a procedure, but it's totally forgivable for the sake of one's style (not my style! )

In real life you should not have million-row SELECT statements, so the overhead of a scalar function can be ignored if it's more comfortable to code.
Unless you're doing some kind of data transformation task, but even then it's one-off, and even a big overhead can be tolerated.

Views used for reporting - yes, that's where you should be concerned.
And not using UDFs in the SELECT part of queries must be a mandatory rule there.
Post #1408708
Posted Thursday, January 17, 2013 11:04 PM


Ah, Sergiy... my ol' friend. Long time no see. Thank you for stopping by and thanks for the great explanations.

Yeah... I agree... most folks using front-end code aren't going to notice any difference between using a Scalar UDF or an iTVF. Chances are, many won't even be able to measure the difference because of the extremely low rowcounts, especially since I showed them why not to use SET STATISTICS to measure Scalar UDF performance.

I do try to avoid Scalar UDFs, though, because most of the stuff I do is on staging tables, where the cumulative time of multiple functions across many columns does start to make a difference, especially when loading hundreds of large files in such a manner. 5% performance differences can really start to stack up in batch code.

For GUI code, I write it as if it were large volume batch code because so many people will "leverage" any code they can find in a schedule pinch for their own batch code. If the iTVFs turn out to be a bit more complicated than some will understand, a well placed comment here and there in the code takes care of the problem. If not, then they probably shouldn't be trying to modify the code to begin with.
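For anyone measuring, the GETDATE-delta pattern used elsewhere in this thread is the shape to copy (rather than SET STATISTICS); a minimal sketch, with the function and table names taken from the article:

```sql
DECLARE @Result INT, @StartTime DATETIME;
SELECT @StartTime = GETDATE();

-- Assign to a throw-away variable so display time doesn't pollute the timing.
SELECT @Result = dbo.TimesTwo(SomeNumber)
FROM #TestTable;

PRINT DATEDIFF(ms, @StartTime, GETDATE());
```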

Of course, it's still possible to write crap code in an iTVF. iTVFs aren't a panacea.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1408721
Posted Tuesday, January 22, 2013 7:56 PM
Can't believe it.
Turns out the SQL2000 solution is not that straightforward.
Or, should I say, is very limited.

If we change the function a little bit:
CREATE FUNCTION dbo.TimesN
(@SomeINT int, @Multiplier int)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT Doubled = @SomeINT * @Multiplier
GO

this still works fine:
SELECT * FROM dbo.TimesN(10, 2)

Returns 20 as expected.

But if you try to use it against a table:
DECLARE @Result INT

--===== Begin measuring duration using GETDATE
DECLARE @StartTime DATETIME;
SELECT @StartTime = GETDATE()
--===== Test the code using the "throw-away" variable
SELECT @Result = (SELECT Doubled FROM dbo.TimesN(SomeNumber, 2))
FROM #TestTable
--===== "Stop the timer" and report the duration
PRINT DATEDIFF(ms,@StartTime,GETDATE())

You are getting this:

Msg 155, Level 15, State 1, Line 7
'SomeNumber' is not a recognized OPTIMIZER LOCK HINTS option.

Does not like 2 or more parameters, and that's it.

Same code works perfectly in SQL2008.
Post #1410312
Posted Tuesday, June 24, 2014 7:50 AM
Hi Jeff,

I know this is an old article reprinted and you might not see this, but I thought I'd ask anyway.

If you're using a Scalar UDF to define a Computed Column, can you still use an iTVF?


--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
Post #1585513