RE: How to Make Scalar UDFs Run Faster (SQL Spackle)

GSquared

SSC Guru

Points: 260824

August 1, 2012 at 9:36 am

#1520360

Here's a function I tested, and the test I used:

USE [ProofOfConcept]

GO

/****** Object: UserDefinedFunction [dbo].[InitialCapTest1] Script Date: 08/01/2012 11:20:42 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

ALTER FUNCTION [dbo].[InitialCapTest1] (@String VARCHAR(8000))

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Output VARCHAR(8000);

WITH Seeds(Seed)

AS (SELECT Val

FROM ( VALUES ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1), ( 1) ) AS Vals (Val)),

Numbers(Number)

AS (SELECT TOP (8000)

ROW_NUMBER() OVER (ORDER BY S10.Seed)

FROM Seeds AS S10

CROSS JOIN Seeds AS S100

CROSS JOIN Seeds AS S1000

CROSS JOIN Seeds AS S10000)

SELECT @Output = ( SELECT CASE WHEN SUBSTRING(@String, Number - 1, 1) NOT LIKE '[a-z]'

THEN UPPER(SUBSTRING(@String, Number, 1))

ELSE LOWER(SUBSTRING(@String, Number, 1))

END

FROM Numbers

WHERE Number <= LEN(@String)

ORDER BY Number

FOR XML PATH(''),

TYPE).value('.[1]', 'varchar(8000)');

RETURN @Output;

END;

I created that, and the function referenced in the article, then populated a test environment as follows:

CREATE TABLE ##T

(ID INT IDENTITY

PRIMARY KEY,

Col1 VARCHAR(8000));

INSERT INTO ##T

(Col1)

SELECT 'Scalar UDFs are generally accepted as one of the worst things you can do performance-wise in T-SQL. However, they may have earned a reputation that they simply don''t deserve because a common method of measuring performance (SET STATISTICS TIME ON) injects a major performance problem of its own when used to measure the performance of Scalar UDFs.

That, notwithstanding, some Scalar UDFs really are a lot slower than running direct code but direct code means there is no encapsulation and may not be as easy, quick, or consistent to support development efforts. Even Scalar UDFs have the advantage of getting everyone to do things the same way. Of course, everyone doing all the same thing the slow way isn''t an advantage at all.

In some cases and if the Scalar UDF code can be written as a single query, there can be a highly significant performance gain by converting (or writing originally) the Scalar UDF as an "Inline Scalar Function" (iSF), which is the same as writing an "Inline Table Valued Function" (iTVF) that returns only one value.

Doing so is not a performance panacea, though. Although it hasn''t been proven within the confines of this article, it''s a well known fact that "all in one query" solutions are frequently performance problems themselves. To be sure of any benefit in converting Scalar UDFs to iSFs, you must test and it must be with a lot of rows to really do it right.

Last and certainly not least, understand that anything you measure will be changed especially when it comes to SET STATISTICS TIME ON. Although I''ll personally continue to use it on forums to conveniently show performance differences between coding methods, I''ll do so only after I''ve made absolutely sure that it''s not injecting major but artificial performance problems itself. '

FROM dbo.Numbers;

(You may recognize the text.) I wanted to test on a reasonable number of rows, on bigger strings. Perhaps a weird test, but it's what I wanted to check. 10,001 rows in ##T from that based on my Numbers table.

The tests:

DECLARE @T TABLE (Col1 VARCHAR(8000));

DECLARE @Start DATETIME = GETDATE();

INSERT INTO @T

(Col1)

SELECT dbo.InitialCap(Col1)

FROM ##T;

SELECT DATEDIFF(millisecond, @Start, GETDATE());

DECLARE @T TABLE (Col1 VARCHAR(8000));

DECLARE @Start DATETIME = GETDATE();

INSERT INTO @T

(Col1)

SELECT dbo.InitialCapTest1(Col1)

FROM ##T;

SELECT DATEDIFF(millisecond, @Start, GETDATE());

InitialCap runtime, per DateDiff = 182,526 ms

InitialCapTest1 = 21,673

"Single-query" version, even without being iSF, is about 9X faster, per this test.

Tested again, with "Bill O'Malley" as the string, 10k rows.

TRUNCATE TABLE ##T;

INSERT INTO ##T (Col1)

SELECT 'Bill O''Malley'

FROM dbo.Numbers;

(I picked that string, because InitialCaps handled it "incorrectly". It's per-spec, but not "human right".)

Same tests as above.

InitialCaps runtime = 100 ms

InitialCapsTest1 runtime = 9836 ms

So, on a short string, the InitialCaps version is much faster, while on a long string, the InitialCapsTest1 version is much faster.

Summary: Know your data patterns. Picking which function to use depends a lot on what you'll be running it on, not just on "one test said".

Post-script: Name-case coding will be wrong much of the time. Bill O'Malley will likely end up as Bill O'malley, or Bob's will end up as Bob'S, unless you code some seriously smart algorithms into it. Any code that handles MacDonald correctly will probably mangle Macy or Macey, and vice-versa. The more "correct" you make it, the more complex it will be, and the slower it will run, and (likely) the buggier it will get.

Definitely an interesting point about checking timing with Stats On. Traces are usually better, in my experience, and I've used the GetDate() trick for years. The GetDate() one is especially useful if you need to test multiple statements in a single script.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon