SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Scalar UDF Performance Issue


Scalar UDF Performance Issue

Author
Message
Bam Fisk
Bam Fisk
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 41
I'm trying to tidy up a SPROC by wrapping a CASE statement up in a function and I'm geting a huge performance hit. If I leave the CASE statement in the base script it runs in a second, I move it to a function and it takes over a minute. Below is how I'm handling the case portion inline and using the function as well as the function itself. What am I missing?

TIA

INLINE SNIPPET:


"StaffType" = CASE
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'V' AND REG_TEMP = 'T' THEN 'Vendor'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'R' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'I' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Employee'
ELSE 'ERROR'
END




FUNCTION CALL VERSION SNIPPET:


, "StaffType" = (SELECT dbo.fn_StaffTypeMap(p.EMPLID))



FUNCTION:

USE [HRData]
GO
/****** Object: UserDefinedFunction [dbo].[fn_StaffTypeMap] Script Date: 06/17/2009 13:00:31 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

/*

AUTHOR: Bill King
CREATE DATE: 6/16/2009
DESCRIPTION: Used to convert the PER_ORG/Empl_Class/Reg_Temp combos to
a more user friendly format.

COMMENTS:


SAMPLE EXECUTION: SELECT dbo.fn_StaffTypeMap ('16011')

*/

CREATE FUNCTION [dbo].[fn_StaffTypeMap]
(
@EmplID CHAR(5)
)

RETURNS VARCHAR(20)
AS
BEGIN

DECLARE @MappedValue AS VARCHAR(20)
SELECT @MappedValue = (
SELECT CASE
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Contractor'
WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'V' AND REG_TEMP = 'T' THEN 'Vendor'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'R' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'I' AND REG_TEMP = 'T' THEN 'Employee'
WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Employee'
ELSE 'ERROR'
END
FROM PSPerson
WHERE EmplID = @EmplID

)
RETURN (@MappedValue)
END










Piotr.Rodak
Piotr.Rodak
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2830 Visits: 1761
Problem with scalar functions used this way - where you pass a column as a parameter - is that they are not inlined by the query compiler. This means they are executed for each row and this is the main performance issue.
There is a trick you can do - instead of scalar function you can write a table valued function that returns 1 row. Query optimizer is then wise enough to choose different execution plan which allows to avoid the performance hit.
So the function would look like this:


CREATE FUNCTION [dbo].[fn_StaffTypeMap]
(
@EmplID CHAR(5)
)

RETURNS TABLE
AS
RETURN SELECT convert(varchar(20), [your CASE statement]) as res;
GO





HTH

Piotr

...and your only reply is slàinte mhath
Carl Federl
Carl Federl
SSCertifiable
SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)SSCertifiable (6.9K reputation)

Group: General Forum Members
Points: 6912 Visits: 4352
Piotr is quite correct that the function is executed for each row and the start-up and shutdown cost of each function execution is very expensive including such processing as memory allocation and deallocation.

If the objective is not to hardcode the mapping in all of the SQL, a better solution is to have an auxiliary table:
CREATE TABLE StaffTypeMap
( PER_ORG
, EMPL_CLASS
, REG_TEMP
, StaffType
, PRIMARY KEY ( PER_ORG , EMPL_CLASS, REG_TEMP )
)


THEN your SQL is on the pattern of:
SELECT COALESCE(StaffTypeMap.StaffType, 'UNKNOWN')
FROM X
LEFT OUTER JOIN
StaffTypeMap
ON StaffTypeMap.PER_ORG = X.PER_ORG
AND StaffTypeMap.EMPL_CLASS = X.EMPL_CLASS
AND StaffTypeMap.REG_TEMP = X.REG_TEMP



SQL = Scarcely Qualifies as a Language
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35784 Visits: 11361
Yet other options include:

1. Creating a view (possibly indexed) over PSPerson and adding a column which contains the CASE.
2. Adding a computed column for the CASE to the PSPerson table, and preferably persisting it.

If you wish, you can encapsulate the CASE (without the data access) in a function and use that in the persisted computed column definition. An example of this idea follows:


USE tempdb
GO
CREATE FUNCTION dbo.f (@p1 INT, @p2 INT)
RETURNS INT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS BEGIN RETURN CASE WHEN @p1 = 1 AND @p2 = 1 THEN 0 WHEN @p1 = 1 AND @p2 = 2 THEN 1 WHEN @p1 = 2 AND @p2 = 1 THEN 2 ELSE 3 END END
GO
CREATE TABLE dbo.Demo (A INT NOT NULL, B INT NOT NULL, C AS dbo.f (A, B) PERSISTED);
GO
INSERT dbo.Demo (A, B)
SELECT TOP (25000)
CONVERT(INT,RAND(CHECKSUM(NEWID())) * 2 + 1),
CONVERT(INT,RAND(CHECKSUM(NEWID())) * 2 + 1)
FROM master.sys.columns c1
CROSS
JOIN master.sys.columns c2
GO
SELECT *
FROM dbo.Demo;
GO
DROP TABLE dbo.Demo;
DROP FUNCTION dbo.f;



Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Bam Fisk
Bam Fisk
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 41
Thanks all. Much appreciated!



tfifield
tfifield
SSCrazy
SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)SSCrazy (2.5K reputation)

Group: General Forum Members
Points: 2505 Visits: 2890
Paul,
I've kind of forgotten what the advantage to SCHEMABINGING is for a scalar UDF. I think I read it somewhere, but I can't find it in my SQL saved SQL docs. What does this do exactly?
Todd Fifield
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35784 Visits: 11361
tfifield (6/20/2009)
Paul,
I've kind of forgotten what the advantage to SCHEMABINGING is for a scalar UDF. I think I read it somewhere, but I can't find it in my SQL saved SQL docs. What does this do exactly?
Todd Fifield

Take a look at this: http://blogs.msdn.com/sqlprogrammability/archive/2006/05/12/596424.aspx

Cheers,

Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)SSC Guru (216K reputation)

Group: General Forum Members
Points: 216168 Visits: 41986
Nice article and great explanation in the article... too bad they didn't actually test it for performance... ;-) Both of the following UDF's render identical execution plans and they both take turns winning on my desktop. That also means that compared to the inline code, they both make some nasty sucking sounds. :-P I got the function code below from the article.


--===== Create a scalar function with SchemaBinding 
 CREATE FUNCTION dbo.ComputeNumSB(@i INT)
RETURNS INT
   WITH SCHEMABINDING
  BEGIN
        RETURN @i * 2 + 50
    END
GO
--===== Create a scalar function without SchemaBinding
 CREATE FUNCTION dbo.ComputeNum(@i INT)
RETURNS INT
  BEGIN
        RETURN @i * 2 + 50
    END
GO
--===== Create and populate the Tally table on the fly
 SELECT TOP 1000000 --equates to more than 30 years of dates
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N 
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

--===== Allow the general public to use it
  GRANT SELECT ON dbo.Tally TO PUBLIC
GO
--===== Test more than once... the functions take turns winning.
     -- The inline code ALWAYS blows the functions away.
DECLARE @BitBucket INT
DECLARE @TestSize  INT
 SELECT @TestSize = 100000

SET STATISTICS TIME ON
 SELECT @BitBucket = dbo.ComputeNumSB(t.N)
   FROM dbo.Tally t
  WHERE t.N <= @TestSize

 SELECT @BitBucket = dbo.ComputeNum(t.N)
   FROM dbo.Tally t
  WHERE t.N <= @TestSize

 SELECT @BitBucket = t.N * 2 + 50
   FROM dbo.Tally t
  WHERE t.N <= @TestSize
SET STATISTICS TIME OFF
PRINT REPLICATE('=',80)
GO 6



For those that don't know what a Tally table is, now would be a good time to learn. Please see the following article...
http://www.sqlservercentral.com/articles/T-SQL/62867/

I agree with most of the other's said here... a UDF is probably not the way to go here. An aux table or a bit of inline code is the way to go. Here's the run results from the code above...

Beginning execution loop

SQL Server Execution Times:
CPU time = 8625 ms, elapsed time = 16175 ms.

SQL Server Execution Times:
CPU time = 8922 ms, elapsed time = 16226 ms.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 78 ms.
================================================================================

SQL Server Execution Times:
CPU time = 8704 ms, elapsed time = 16103 ms.

SQL Server Execution Times:
CPU time = 8875 ms, elapsed time = 16358 ms.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 76 ms.
================================================================================

SQL Server Execution Times:
CPU time = 8203 ms, elapsed time = 16051 ms.

SQL Server Execution Times:
CPU time = 8406 ms, elapsed time = 16075 ms.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 94 ms.
================================================================================

SQL Server Execution Times:
CPU time = 8922 ms, elapsed time = 15994 ms.

SQL Server Execution Times:
CPU time = 8922 ms, elapsed time = 19858 ms.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 81 ms.
================================================================================

SQL Server Execution Times:
CPU time = 8203 ms, elapsed time = 17882 ms.

SQL Server Execution Times:
CPU time = 8844 ms, elapsed time = 19245 ms.

SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 86 ms.
================================================================================

SQL Server Execution Times:
CPU time = 8297 ms, elapsed time = 16888 ms.

SQL Server Execution Times:
CPU time = 8656 ms, elapsed time = 18983 ms.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 79 ms.
================================================================================
Batch execution completed 6 times.


--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35784 Visits: 11361
Jeff Moden (6/20/2009)
Nice article and great explanation in the article... too bad they didn't actually test it for performance... ;-) Both of the following UDF's render identical execution plans and they both take turns winning on my desktop. That also means that compared to the inline code, they both make some nasty sucking sounds. :-P I got the function code below from the article.[/font]

Hey Jeff,

Short answer: Try it with an UPDATE statement as the article does ;-) w00t

Long answer in progress :-)

Paul



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)SSC-Dedicated (35K reputation)

Group: General Forum Members
Points: 35784 Visits: 11361
Long answer:

SCHEMABINDING performs no magic with simple integer math - so we don't expect a difference with the SELECTs in your example code.

Where it does make a huge difference, is when the query optimizer (QO) must consider whether the function is deterministic or not. The QO has many fewer options when it encounters a non-deterministic function.

When marked as schema-bound, the engine analyzes the content of the function to decide on whether it is deterministic or not.

A scalar function which is not marked with SCHEMABINDING it is marked as doing data access - without analyzing the content of the function at all. This not only prevents it from be evaluated with deterministic optimizations, it also means that things like halloween protection become necessary in UPDATE statements, where they would not be otherwise. (This is the specific example in the article I linked to.)

The script below illustrates the dramatic differences that often appear.

USE tempdb;
GO
-- Conditional drops
IF OBJECT_ID(N'dbo.ComputeNum', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNum;
IF OBJECT_ID(N'dbo.ComputeNumSB', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNumSB;
IF OBJECT_ID(N'dbo.TestTable', N'U') IS NOT NULL DROP TABLE dbo.TestTable;
GO
-- Schema-bound (will be evaluated for determinism)
CREATE FUNCTION dbo.ComputeNumSB (@i INT) RETURNS INT WITH SCHEMABINDING AS BEGIN RETURN @i * 2 + 50 END;
GO
-- Not schema-bound, will be marked as doing data access
CREATE FUNCTION dbo.ComputeNum (@i INT) RETURNS INT AS BEGIN RETURN @i * 2 + 50 END;
GO
-- Create a test table which we can update
CREATE TABLE dbo.TestTable (A INT IDENTITY PRIMARY KEY, B INT NOT NULL)
GO
INSERT dbo.TestTable (B)
SELECT TOP (100000)
CONVERT(INT, 123456 * RAND(CHECKSUM(NEWID())))
FROM master.sys.columns C1
CROSS
JOIN master.sys.columns C2
GO
-- Check the properties of the functions
/*
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNum', N'FN'), 'IsDeterministic') -- 0 = No
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNum', N'FN'), 'SystemDataAccess') -- 1 = Yes
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNum', N'FN'), 'UserDataAccess') -- 1 = Yes

SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNumSB', N'FN'), 'IsDeterministic') -- 1 = Yes
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNumSB', N'FN'), 'SystemDataAccess') -- 0 = No
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNumSB', N'FN'), 'UserDataAccess') -- 0 = No
*/
GO
--
-- TESTS
--
DECLARE @TestSize INT;
SELECT @TestSize = 100000;

SET STATISTICS IO ON;

-- Non-schema-bound version (compare the plans!)
BEGIN TRANSACTION
UPDATE dbo.TestTable
SET B = dbo.ComputeNum(B)
WHERE A BETWEEN 1 AND @TestSize
ROLLBACK

--Table 'TestTable'. Scan count 1, logical reads 200213, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'Worktable'. Scan count 1, logical reads 201856, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Schema-bound
BEGIN TRANSACTION
UPDATE dbo.TestTable
SET B = dbo.ComputeNumSB(B)
WHERE A BETWEEN 1 AND @TestSize
ROLLBACK

--Table 'TestTable'. Scan count 1, logical reads 213, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS IO OFF;
GO
IF OBJECT_ID(N'dbo.ComputeNum', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNum;
IF OBJECT_ID(N'dbo.ComputeNumSB', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNumSB;
IF OBJECT_ID(N'dbo.TestTable', N'U') IS NOT NULL DROP TABLE dbo.TestTable;




Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search