Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

How to Make Scalar UDFs Run Faster (SQL Spackle) Expand / Collapse
Author
Message
Posted Wednesday, August 01, 2012 8:08 AM


Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, March 11, 2014 7:55 AM
Points: 24, Visits: 91
Nice article. Very informative.

Interestingly enough, I just found out a couple of months ago in reading the articles on set-based programming versus procedural programming on Code Project http://www.codeproject.com/Articles/34142/Understanding-Set-based-and-Procedural-approaches and Subho on .Net http://www.shubho.net/2009/03/understanding-set-based-and-procedural_09.html that the use of a function in a select clause results in RBAR processing. They pointed that in a million row table you would be calling the function a million times. It makes perfect sense, though I had never thought about it that way, even while avoiding Cursors like the plague. They recommended that the use of correllated subqueries and JOINS and even While loops would avoid this performance hit. Now I see that using a table valued function with a cross apply will work as well. So I will be adding this to my list of tools.

Thank you also for pointing out the issues with set statistics time ON. I will be certain not to use it from now on.
Post #1338560
Posted Wednesday, August 01, 2012 9:09 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Thursday, April 03, 2014 10:50 AM
Points: 63, Visits: 359
Awesome article, Jeff, thanks! There are so many great takeaways here. Very interesting comments in the forum so far as well; I'm looking forward to following this one.

G Bryant McClellan (8/1/2012)
While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?


I'd like to echo this. I'm really interested in better understanding what's happening here with SET STATISTICS TIME ON, and when it should/shouldn't be used. That could be a spackle article in itself.
Post #1338611
Posted Wednesday, August 01, 2012 9:31 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Yesterday @ 8:21 AM
Points: 310, Visits: 576
Excellent article. Best read of the week. :)
Post #1338628
Posted Wednesday, August 01, 2012 9:33 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Friday, March 28, 2014 2:25 PM
Points: 9,902, Visits: 9,479
As always, Great Article Jeff!

You know, I think that I noted this basic problem with SET STATISTICS (that it adds a per-statement overhead) way back in my "15 Ways To Lose Your Cursors Article, Part 2", making it inappropriate for measuring Cursor performance, but it never occurred to me that the same problem would apply to measuring UDF's. Duh!!!

Anyway, great catch on that Jeff.

However, there is a mysterious coda to your background on scalar UDF's. It turns out that there really is an Inline Scalar UDF object type in SQL Server, as described here in the 2005 documentation. It says:

User-defined scalar functions return a single data value of the type defined in the RETURNS clause. For an inline scalar function, there is no function body; the scalar value is the result of a single statement.

(emphasis mine)

You can confirm this with the following query:
SELECT name
FROM master..spt_values
WHERE type = 'O9T'
AND name LIKE '%function%'

So as far back as SQL 2005, they actually had the object-type for it, however, AFAIK, none exist (not even in the hidden [mssqlsystemresource] database) and there's no syntax to create one. So it appears that this is something that Microsoft must have anticipated in the run-up to SQL Server 2005 by adding a type for it (and doc!), but never actually implemented for some reason. Even odder, they never took it out of the doc, it's still there in the 2012 version.(!)

Though it is one of the single most requested features for all of Ms Sql Server. Primarily because the default UDF's are still slower (even with Jeff's measurement corrections) and we end up having to back-end ITVF's to get the same effect. (clumsy, but it works).


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #1338631
Posted Wednesday, August 01, 2012 9:36 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: 2 days ago @ 1:34 PM
Points: 15,442, Visits: 9,588
Here's a function I tested, and the test I used:

USE [ProofOfConcept]
GO
/****** Object: UserDefinedFunction [dbo].[InitialCapTest1] Script Date: 08/01/2012 11:20:42 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[InitialCapTest1] (@String VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Output VARCHAR(8000);

WITH Seeds(Seed)
AS (SELECT Val
FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS Vals (Val)),
Numbers(Number)
AS (SELECT TOP (8000)
ROW_NUMBER() OVER (ORDER BY S10.Seed)
FROM Seeds AS S10
CROSS JOIN Seeds AS S100
CROSS JOIN Seeds AS S1000
CROSS JOIN Seeds AS S10000)
SELECT @Output = ( SELECT CASE WHEN SUBSTRING(@String, Number - 1, 1) NOT LIKE '[a-z]'
THEN UPPER(SUBSTRING(@String, Number, 1))
ELSE LOWER(SUBSTRING(@String, Number, 1))
END
FROM Numbers
WHERE Number <= LEN(@String)
ORDER BY Number
FOR XML PATH(''),
TYPE).value('.[1]', 'varchar(8000)');

RETURN @Output;
END;

I created that, and the function referenced in the article, then populated a test environment as follows:

CREATE TABLE ##T
(ID INT IDENTITY
PRIMARY KEY,
Col1 VARCHAR(8000));

INSERT INTO ##T
(Col1)
SELECT 'Scalar UDFs are generally accepted as one of the worst things you can do performance-wise in T-SQL. However, they may have earned a reputation that they simply don''t deserve because a common method of measuring performance (SET STATISTICS TIME ON) injects a major performance problem of its own when used to measure the performance of Scalar UDFs.

That, notwithstanding, some Scalar UDFs really are a lot slower than running direct code but direct code means there is no encapsulation and may not be as easy, quick, or consistent to support development efforts. Even Scalar UDFs have the advantage of getting everyone to do things the same way. Of course, everyone doing all the same thing the slow way isn''t an advantage at all.

In some cases and if the Scalar UDF code can be written as a single query, there can be a highly significant performance gain by converting (or writing originally) the Scalar UDF as an "Inline Scalar Function" (iSF), which is the same as writing an "Inline Table Valued Function" (iTVF) that returns only one value.

Doing so is not a performance panacea, though. Although it hasn''t been proven within the confines of this article, it''s a well known fact that "all in one query" solutions are frequently performance problems themselves. To be sure of any benefit in converting Scalar UDFs to iSFs, you must test and it must be with a lot of rows to really do it right.

Last and certainly not least, understand that anything you measure will be changed especially when it comes to SET STATISTICS TIME ON. Although I''ll personally continue to use it on forums to conveniently show performance differences between coding methods, I''ll do so only after I''ve made absolutely sure that it''s not injecting major but artificial performance problems itself. '
FROM dbo.Numbers;

(You may recognize the text.) I wanted to test on a reasonable number of rows, on bigger strings. Perhaps a weird test, but it's what I wanted to check. 10,001 rows in ##T from that based on my Numbers table.

The tests:

DECLARE @T TABLE (Col1 VARCHAR(8000));

DECLARE @Start DATETIME = GETDATE();

INSERT INTO @T
(Col1)
SELECT dbo.InitialCap(Col1)
FROM ##T;

SELECT DATEDIFF(millisecond, @Start, GETDATE());

DECLARE @T TABLE (Col1 VARCHAR(8000));

DECLARE @Start DATETIME = GETDATE();

INSERT INTO @T
(Col1)
SELECT dbo.InitialCapTest1(Col1)
FROM ##T;

SELECT DATEDIFF(millisecond, @Start, GETDATE());

InitialCap runtime, per DateDiff = 182,526 ms
InitialCapTest1 = 21,673

"Single-query" version, even without being iSF, is about 9X faster, per this test.

Tested again, with "Bill O'Malley" as the string, 10k rows.

TRUNCATE TABLE ##T;

INSERT INTO ##T (Col1)
SELECT 'Bill O''Malley'
FROM dbo.Numbers;

(I picked that string, because InitialCaps handled it "incorrectly". It's per-spec, but not "human right".)

Same tests as above.

InitialCaps runtime = 100 ms
InitialCapsTest1 runtime = 9836 ms

So, on a short string, the InitialCaps version is much faster, while on a long string, the InitialCapsTest1 version is much faster.

Summary: Know your data patterns. Picking which function to use depends a lot on what you'll be running it on, not just on "one test said".

Post-script: Name-case coding will be wrong much of the time. Bill O'Malley will likely end up as Bill O'malley, or Bob's will end up as Bob'S, unless you code some seriously smart algorithms into it. Any code that handles MacDonald correctly will probably mangle Macy or Macey, and vice-versa. The more "correct" you make it, the more complex it will be, and the slower it will run, and (likely) the buggier it will get.

Definitely an interesting point about checking timing with Stats On. Traces are usually better, in my experience, and I've used the GetDate() trick for years. The GetDate() one is especially useful if you need to test multiple statements in a single script.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1338634
Posted Wednesday, August 01, 2012 11:19 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, March 28, 2014 4:24 AM
Points: 49, Visits: 397
Another great article, Jeff. Thanks! I had no idea that an inline table-valued function could be so much faster than an scalar UDF.

I confirmed the "times two" results on my machine, measuring elapsed time only:

1 Million rows:
No function 213 ms
Scalar UDF 2063 ms
Inline Table-Valued UDF 183 ms
Table-Valued UDF 85620 ms

10 Million rows:
No function 1980 ms
Scalar UDF 20760 ms
Inline Table-Valued UDF 1913 ms
Table-Valued UDF still running after 8 minutes

This was also a nice reminder that inline table-valued functions are MUCH faster than a table-valued function.

Post #1338697
Posted Wednesday, August 01, 2012 11:27 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, April 04, 2014 5:19 PM
Points: 265, Visits: 2,538
Thanks! I learned a couple things here.

So well written. Thanks for taking the time to write this article in a way so understandable and sure to educate.
Post #1338704
Posted Wednesday, August 01, 2012 12:20 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, September 25, 2012 3:50 PM
Points: 3, Visits: 26
It is irritating how big the gap is between the performance here and high-performance code, though. Why should we have to be impressed that multiplying a million numbers takes 800 ms? A simplistic implementation of the same process, applying a function call, in C# clocks in at 0.8 ms on my old laptop. That is 1000x faster. Obviously, there are reasons for some slowdown, but this magnitude is hard to accept.

Also skipped here is simply using a CLR-implemented function. Something like:

[Microsoft.SqlServer.Server.SqlFunction(
DataAccess = DataAccessKind.None, IsDeterministic = true,
IsPrecise = false, SystemDataAccess = SystemDataAccessKind.None
)]
public static SqlInt32 Mult2(SqlInt32 n)
{
return n * 2;
}

This gives much better performance.

My measurements on a modern machine, running SQL Server 2012:

Baseline (questionable due to optimization, but...):
414 ms.
150 ms - w/o SET STATISTICS

TSQL Function:
2403 ms.
1653 ms - w/o SET STATISTICS

iSF Function:
263 ms.
203 ms - w/o SET STATISTICS

CLR Function:
349 ms.
346 ms - w/o SET STATISTICS

While the iSF beats the CLR function, the CLR version can be used just like the normal TSQL one.

The optimizer could be ruining all of these measurements, of course.

And as I started with, all of this is orders of magnitude away from simple code for the same task.
Post #1338740
Posted Wednesday, August 01, 2012 12:57 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, March 24, 2014 11:32 AM
Points: 20, Visits: 236
Nicely done. Something to keep in mind when testing.
Post #1338767
Posted Wednesday, August 01, 2012 1:17 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Thursday, March 13, 2014 9:27 PM
Points: 3,283, Visits: 6,670
Great article as usual Jeff. I did not know about the Set Statistics. Thx. I will keep this in mind when I have to look at performance.


-Roy
Post #1338779
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse