|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Thursday, June 10, 2010 10:22 AM
Points: 3,
Visits: 41
|
|
I'm trying to tidy up a SPROC by wrapping a CASE statement up in a function and I'm geting a huge performance hit. If I leave the CASE statement in the base script it runs in a second, I move it to a function and it takes over a minute. Below is how I'm handling the case portion inline and using the function as well as the function itself. What am I missing?
TIA
INLINE SNIPPET:
"StaffType" = CASE WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'V' AND REG_TEMP = 'T' THEN 'Vendor' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'R' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'I' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Employee' ELSE 'ERROR' END
FUNCTION CALL VERSION SNIPPET:
, "StaffType" = (SELECT dbo.fn_StaffTypeMap(p.EMPLID))
FUNCTION:
USE [HRData] GO /****** Object: UserDefinedFunction [dbo].[fn_StaffTypeMap] Script Date: 06/17/2009 13:00:31 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO
/*
AUTHOR: Bill King CREATE DATE: 6/16/2009 DESCRIPTION: Used to convert the PER_ORG/Empl_Class/Reg_Temp combos to a more user friendly format. COMMENTS:
SAMPLE EXECUTION: SELECT dbo.fn_StaffTypeMap ('16011')
*/
CREATE FUNCTION [dbo].[fn_StaffTypeMap] ( @EmplID CHAR(5) )
RETURNS VARCHAR(20) AS BEGIN DECLARE @MappedValue AS VARCHAR(20) SELECT @MappedValue = ( SELECT CASE WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Contractor' WHEN PER_ORG = 'CWR' AND EMPL_CLASS = 'V' AND REG_TEMP = 'T' THEN 'Vendor' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'R' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = '' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'R' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'C' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'G' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'I' AND REG_TEMP = 'T' THEN 'Employee' WHEN PER_ORG = 'EMP' AND EMPL_CLASS = 'T' AND REG_TEMP = 'T' THEN 'Employee' ELSE 'ERROR' END FROM PSPerson WHERE EmplID = @EmplID ) RETURN (@MappedValue) END
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Thursday, May 10, 2012 9:49 AM
Points: 800,
Visits: 1,759
|
|
Problem with scalar functions used this way - where you pass a column as a parameter - is that they are not inlined by the query compiler. This means they are executed for each row and this is the main performance issue. There is a trick you can do - instead of scalar function you can write a table valued function that returns 1 row. Query optimizer is then wise enough to choose different execution plan which allows to avoid the performance hit. So the function would look like this:
CREATE FUNCTION [dbo].[fn_StaffTypeMap] ( @EmplID CHAR(5) ) RETURNS TABLE AS RETURN SELECT convert(varchar(20), [your CASE statement]) as res; GO
HTH
Piotr
...and your only reply is slàinte mhath
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: 2 days ago @ 8:36 AM
Points: 2,224,
Visits: 4,094
|
|
Piotr is quite correct that the function is executed for each row and the start-up and shutdown cost of each function execution is very expensive including such processing as memory allocation and deallocation.
If the objective is not to hardcode the mapping in all of the SQL, a better solution is to have an auxiliary table:
CREATE TABLE StaffTypeMap ( PER_ORG , EMPL_CLASS , REG_TEMP , StaffType , PRIMARY KEY ( PER_ORG , EMPL_CLASS, REG_TEMP ) ) THEN your SQL is on the pattern of:
SELECT COALESCE(StaffTypeMap.StaffType, 'UNKNOWN') FROM X LEFT OUTER JOIN StaffTypeMap ON StaffTypeMap.PER_ORG = X.PER_ORG AND StaffTypeMap.EMPL_CLASS = X.EMPL_CLASS AND StaffTypeMap.REG_TEMP = X.REG_TEMP
SQL = Scarcely Qualifies as a Language
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 10,990,
Visits: 10,578
|
|
Yet other options include:
1. Creating a view (possibly indexed) over PSPerson and adding a column which contains the CASE. 2. Adding a computed column for the CASE to the PSPerson table, and preferably persisting it.
If you wish, you can encapsulate the CASE (without the data access) in a function and use that in the persisted computed column definition. An example of this idea follows:
USE tempdb GO CREATE FUNCTION dbo.f (@p1 INT, @p2 INT) RETURNS INT WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT AS BEGIN RETURN CASE WHEN @p1 = 1 AND @p2 = 1 THEN 0 WHEN @p1 = 1 AND @p2 = 2 THEN 1 WHEN @p1 = 2 AND @p2 = 1 THEN 2 ELSE 3 END END GO CREATE TABLE dbo.Demo (A INT NOT NULL, B INT NOT NULL, C AS dbo.f (A, B) PERSISTED); GO INSERT dbo.Demo (A, B) SELECT TOP (25000) CONVERT(INT,RAND(CHECKSUM(NEWID())) * 2 + 1), CONVERT(INT,RAND(CHECKSUM(NEWID())) * 2 + 1) FROM master.sys.columns c1 CROSS JOIN master.sys.columns c2 GO SELECT * FROM dbo.Demo; GO DROP TABLE dbo.Demo; DROP FUNCTION dbo.f;
Paul
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Thursday, June 10, 2010 10:22 AM
Points: 3,
Visits: 41
|
|
Thanks all. Much appreciated!
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Friday, April 05, 2013 8:10 PM
Points: 958,
Visits: 2,873
|
|
Paul, I've kind of forgotten what the advantage to SCHEMABINGING is for a scalar UDF. I think I read it somewhere, but I can't find it in my SQL saved SQL docs. What does this do exactly? Todd Fifield
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 10,990,
Visits: 10,578
|
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 7:18 AM
Points: 33,112,
Visits: 27,038
|
|
Nice article and great explanation in the article... too bad they didn't actually test it for performance... Both of the following UDF's render identical execution plans and they both take turns winning on my desktop. That also means that compared to the inline code, they both make some nasty sucking sounds. I got the function code below from the article.
--===== Create a scalar function with SchemaBinding CREATE FUNCTION dbo.ComputeNumSB(@i INT) RETURNS INT WITH SCHEMABINDING BEGIN RETURN @i * 2 + 50 END GO --===== Create a scalar function without SchemaBinding CREATE FUNCTION dbo.ComputeNum(@i INT) RETURNS INT BEGIN RETURN @i * 2 + 50 END GO --===== Create and populate the Tally table on the fly SELECT TOP 1000000 --equates to more than 30 years of dates IDENTITY(INT,1,1) AS N INTO dbo.Tally FROM Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance ALTER TABLE dbo.Tally ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Allow the general public to use it GRANT SELECT ON dbo.Tally TO PUBLIC GO --===== Test more than once... the functions take turns winning. -- The inline code ALWAYS blows the functions away. DECLARE @BitBucket INT DECLARE @TestSize INT SELECT @TestSize = 100000
SET STATISTICS TIME ON SELECT @BitBucket = dbo.ComputeNumSB(t.N) FROM dbo.Tally t WHERE t.N <= @TestSize
SELECT @BitBucket = dbo.ComputeNum(t.N) FROM dbo.Tally t WHERE t.N <= @TestSize
SELECT @BitBucket = t.N * 2 + 50 FROM dbo.Tally t WHERE t.N <= @TestSize SET STATISTICS TIME OFF PRINT REPLICATE('=',80) GO 6
For those that don't know what a Tally table is, now would be a good time to learn. Please see the following article... http://www.sqlservercentral.com/articles/T-SQL/62867/
I agree with most of the other's said here... a UDF is probably not the way to go here. An aux table or a bit of inline code is the way to go. Here's the run results from the code above...
Beginning execution loop
SQL Server Execution Times: CPU time = 8625 ms, elapsed time = 16175 ms.
SQL Server Execution Times: CPU time = 8922 ms, elapsed time = 16226 ms.
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 78 ms. ================================================================================
SQL Server Execution Times: CPU time = 8704 ms, elapsed time = 16103 ms.
SQL Server Execution Times: CPU time = 8875 ms, elapsed time = 16358 ms.
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 76 ms. ================================================================================
SQL Server Execution Times: CPU time = 8203 ms, elapsed time = 16051 ms.
SQL Server Execution Times: CPU time = 8406 ms, elapsed time = 16075 ms.
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 94 ms. ================================================================================
SQL Server Execution Times: CPU time = 8922 ms, elapsed time = 15994 ms.
SQL Server Execution Times: CPU time = 8922 ms, elapsed time = 19858 ms.
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 81 ms. ================================================================================
SQL Server Execution Times: CPU time = 8203 ms, elapsed time = 17882 ms.
SQL Server Execution Times: CPU time = 8844 ms, elapsed time = 19245 ms.
SQL Server Execution Times: CPU time = 94 ms, elapsed time = 86 ms. ================================================================================
SQL Server Execution Times: CPU time = 8297 ms, elapsed time = 16888 ms.
SQL Server Execution Times: CPU time = 8656 ms, elapsed time = 18983 ms.
SQL Server Execution Times: CPU time = 78 ms, elapsed time = 79 ms. ================================================================================ Batch execution completed 6 times.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 10,990,
Visits: 10,578
|
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 10:48 AM
Points: 10,990,
Visits: 10,578
|
|
Long answer:
SCHEMABINDING performs no magic with simple integer math - so we don't expect a difference with the SELECTs in your example code.
Where it does make a huge difference, is when the query optimizer (QO) must consider whether the function is deterministic or not. The QO has many fewer options when it encounters a non-deterministic function.
When marked as schema-bound, the engine analyzes the content of the function to decide on whether it is deterministic or not.
A scalar function which is not marked with SCHEMABINDING it is marked as doing data access - without analyzing the content of the function at all. This not only prevents it from be evaluated with deterministic optimizations, it also means that things like halloween protection become necessary in UPDATE statements, where they would not be otherwise. (This is the specific example in the article I linked to.)
The script below illustrates the dramatic differences that often appear.
USE tempdb; GO -- Conditional drops IF OBJECT_ID(N'dbo.ComputeNum', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNum; IF OBJECT_ID(N'dbo.ComputeNumSB', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNumSB; IF OBJECT_ID(N'dbo.TestTable', N'U') IS NOT NULL DROP TABLE dbo.TestTable; GO -- Schema-bound (will be evaluated for determinism) CREATE FUNCTION dbo.ComputeNumSB (@i INT) RETURNS INT WITH SCHEMABINDING AS BEGIN RETURN @i * 2 + 50 END; GO -- Not schema-bound, will be marked as doing data access CREATE FUNCTION dbo.ComputeNum (@i INT) RETURNS INT AS BEGIN RETURN @i * 2 + 50 END; GO -- Create a test table which we can update CREATE TABLE dbo.TestTable (A INT IDENTITY PRIMARY KEY, B INT NOT NULL) GO INSERT dbo.TestTable (B) SELECT TOP (100000) CONVERT(INT, 123456 * RAND(CHECKSUM(NEWID()))) FROM master.sys.columns C1 CROSS JOIN master.sys.columns C2 GO -- Check the properties of the functions /* SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNum', N'FN'), 'IsDeterministic') -- 0 = No SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNum', N'FN'), 'SystemDataAccess') -- 1 = Yes SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNum', N'FN'), 'UserDataAccess') -- 1 = Yes
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNumSB', N'FN'), 'IsDeterministic') -- 1 = Yes SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNumSB', N'FN'), 'SystemDataAccess') -- 0 = No SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ComputeNumSB', N'FN'), 'UserDataAccess') -- 0 = No */ GO -- -- TESTS -- DECLARE @TestSize INT; SELECT @TestSize = 100000;
SET STATISTICS IO ON;
-- Non-schema-bound version (compare the plans!) BEGIN TRANSACTION UPDATE dbo.TestTable SET B = dbo.ComputeNum(B) WHERE A BETWEEN 1 AND @TestSize ROLLBACK
--Table 'TestTable'. Scan count 1, logical reads 200213, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. --Table 'Worktable'. Scan count 1, logical reads 201856, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Schema-bound BEGIN TRANSACTION UPDATE dbo.TestTable SET B = dbo.ComputeNumSB(B) WHERE A BETWEEN 1 AND @TestSize ROLLBACK
--Table 'TestTable'. Scan count 1, logical reads 213, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SET STATISTICS IO OFF; GO IF OBJECT_ID(N'dbo.ComputeNum', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNum; IF OBJECT_ID(N'dbo.ComputeNumSB', N'FN') IS NOT NULL DROP FUNCTION dbo.ComputeNumSB; IF OBJECT_ID(N'dbo.TestTable', N'U') IS NOT NULL DROP TABLE dbo.TestTable;
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|