Why Scalar Functions Can Be Costly

  • Mickey Stuewe

    SSC Eights!

    Points: 879

    Comments posted to this topic are about the item Why Scalar Functions Can Be Costly

    Mickey Stuewe
    Sr Database Developer
    My blog
    Follow me on twitter: @SQLMickey
    Connect with me on LinkedIn
    --------------------------------------------------------------------------
    I laugh loudly and I laugh often. Just ask anyone who knows me.

  • Eirikur Eiriksson

    SSC Guru

    Points: 182344

    Thank you Mickey for this article, good Monday morning reading.

    The part I find hard to agree with is "When to Use Scalar Functions", particularly when the example given can easily be converted into a much more efficient iTVFN with hardly any changes in the syntax. In my opinion, the only time scalar UDF should be used is when that is the only choice, such as on calculated columns etc..

    😎

    The following example shows that the iTVFN version of the function is roughly 15 times faster than the scalar version, the former adding roughly 0.0006 milliseconds per row while the latter adds 0.009 milliseconds per row on my old i5 laptop. In other words the iTVFN takes 0.6 seconds on a 1,000,000 row sample set and the scalar 9 seconds on the same set.

    Test data

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.USP_CREATE_PERSON_ADDRESS') IS NOT NULL DROP PROCEDURE dbo.USP_CREATE_PERSON_ADDRESS;

    GO

    CREATE PROCEDURE dbo.USP_CREATE_PERSON_ADDRESS

    (

    @SAMPLE_SIZE INT

    ,@MSGLANGID SMALLINT = 1033 -- Default to English

    )

    AS

    /******************************************************************************

    2015 Eirikur Eiriksson

    ---------------------------------------

    Create "realistic" sample data set (SDP_ID, FirstName, LastName, Address)

    in any of the languages found in sys.syslanguages.

    ---------------------------------------

    Parameters

    1. @SAMPLE_SIZE INT, number of rows produced, max 10^9

    2. @MSGLANGID SMALLINT, msglangid of the language to use from the

    sys.syslanguages table.

    ---------------------------------------

    Note: to view the applicable languages and corresponding msglangid run the

    following query.

    ---------------------------------------

    SELECT

    SL.alias

    ,SL.msglangid

    FROM sys.syslanguages SL;

    ---------------------------------------

    SELECT DISTINCT

    SL.msglangid

    FROM sys.syslanguages SL;

    ---------------------------------------

    Output structure:

    ---------------------------------------

    SDP_ID BIGINT NOT NULL

    FirstName NVARCHAR(50) NOT NULL

    LastName NVARCHAR(50) NOT NULL

    [Address] NVARCHAR(100) NOT NULL

    ---------------------------------------

    *******************************************************************************/

    WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))

    , NUMS(N) AS (SELECT TOP (@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N

    FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    ,NAME_STRING(SXNAME) AS

    (

    SELECT

    (SELECT

    REPLACE( -- 32

    REPLACE( -- 34 "

    REPLACE( -- 37 %

    REPLACE( -- 39 '

    REPLACE( -- 40 (

    REPLACE( -- 41 )

    REPLACE( -- 42 *

    REPLACE( -- 44 ,

    REPLACE( -- 45 -

    REPLACE( -- 46 .

    REPLACE( -- 47 /

    REPLACE( -- 58 :

    REPLACE( -- 59 ;

    REPLACE( -- 61 =

    REPLACE( -- 64 @

    REPLACE( -- 95 _

    REPLACE( -- 124 |

    SM.text

    ,NCHAR(32),N'') -- 32

    ,NCHAR(34),N'') -- 34 "

    ,NCHAR(37),N'') -- 37 %

    ,NCHAR(39),N'') -- 39 '

    ,NCHAR(40),N'') -- 40 (

    ,NCHAR(41),N'') -- 41 )

    ,NCHAR(42),N'') -- 42 *

    ,NCHAR(44),N'') -- 44 ,

    ,NCHAR(45),N'') -- 45 -

    ,NCHAR(46),N'') -- 46 .

    ,NCHAR(47),N'') -- 47 /

    ,NCHAR(58),N'') -- 58 :

    ,NCHAR(59),N'') -- 59 ;

    ,NCHAR(61),N'') -- 61 =

    ,NCHAR(64),N'') -- 64 @

    ,NCHAR(95),N'') -- 95 _

    ,NCHAR(124),N'') -- 124 |

    FROM sys.messages SM

    WHERE SM.language_id = @MSGLANGID

    FOR XML PATH(''), TYPE).value('.[1]','NVARCHAR(4000)') AS SXNAME

    )

    ,BASE_DATA AS

    (

    SELECT

    NCHAR(65 + ABS(CHECKSUM(NEWID())) % 25)

    + LOWER(SUBSTRING(NS.SXNAME,ABS(CHECKSUM(NEWID())) % 3800,5 + ABS(CHECKSUM(NEWID())) % 10)) AS FirstName

    ,NCHAR(65 + ABS(CHECKSUM(NEWID())) % 25)

    + LOWER(SUBSTRING(NS.SXNAME,ABS(CHECKSUM(NEWID())) % 3800,5 + ABS(CHECKSUM(NEWID())) % 10)) AS LastName

    ,NCHAR(65 + ABS(CHECKSUM(NEWID())) % 25)

    + LOWER(SUBSTRING(NS.SXNAME,ABS(CHECKSUM(NEWID())) % 3800,7 + ABS(CHECKSUM(NEWID())) % 10))

    + NCHAR(32) + CONVERT(NVARCHAR(4),1 + (ABS(CHECKSUM(NEWID())) % 1000),0) AS [Address]

    FROM NUMS NM

    CROSS APPLY NAME_STRING NS

    )

    SELECT

    ROW_NUMBER() OVER

    (

    ORDER BY (SELECT NULL)

    ) AS SDP_ID

    ,BD.FirstName

    ,BD.LastName

    ,BD.Address

    FROM BASE_DATA BD

    ;

    GO

    IF OBJECT_ID(N'dbo.SAMPLE_Person') IS NOT NULL DROP TABLE dbo.SAMPLE_Person;

    CREATE TABLE dbo.SAMPLE_Person

    (

    SDP_ID BIGINT NOT NULL CONSTRAINT PK_DBO_PERSON_PERSON_ID PRIMARY KEY CLUSTERED

    ,FirstName NVARCHAR(50) NOT NULL

    ,LastName NVARCHAR(50) NOT NULL

    ,[Address] NVARCHAR(100) NOT NULL

    );

    GO

    INSERT INTO dbo.SAMPLE_Person (SDP_ID,FirstName,LastName,[Address])

    EXEC dbo.USP_CREATE_PERSON_ADDRESS 1000000,1033;

    The two functions

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.ITVFN_PROPER_CASE') IS NOT NULL DROP FUNCTION dbo.ITVFN_PROPER_CASE;

    GO

    CREATE FUNCTION dbo.ITVFN_PROPER_CASE

    (

    @INPUT NVARCHAR(50)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT

    STUFF(LOWER(@INPUT),1,1,UPPER(LEFT(@INPUT,1))) AS PROPER_CASED

    ;

    GO

    IF OBJECT_ID(N'dbo.SFN_PROPER_CASE') IS NOT NULL DROP FUNCTION dbo.SFN_PROPER_CASE;

    GO

    CREATE FUNCTION dbo.SFN_PROPER_CASE

    (

    @INPUT NVARCHAR(50)

    )

    RETURNS NVARCHAR(50)

    WITH SCHEMABINDING

    AS

    BEGIN

    RETURN

    (SELECT

    STUFF(LOWER(@INPUT),1,1,UPPER(LEFT(@INPUT,1))))

    ;

    END

    GO

    Test harness

    DECLARE @NCHAR_BUCKET_01 NVARCHAR(50) = N'';

    DECLARE @NCHAR_BUCKET_02 NVARCHAR(50) = N'';

    DECLARE @TIMER TABLE (T_TEXT VARCHAR(100) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

    INSERT INTO @TIMER(T_TEXT) VALUES ('DRY RUN');

    SELECT

    @NCHAR_BUCKET_01 = SP.FirstName

    ,@NCHAR_BUCKET_02 = SP.LastName

    FROM dbo.SAMPLE_Person SP;

    INSERT INTO @TIMER(T_TEXT) VALUES ('DRY RUN');

    INSERT INTO @TIMER(T_TEXT) VALUES ('SCALAR');

    SELECT

    @NCHAR_BUCKET_01 = dbo.SFN_PROPER_CASE(SP.FirstName)

    ,@NCHAR_BUCKET_02 = dbo.SFN_PROPER_CASE(SP.LastName)

    FROM dbo.SAMPLE_Person SP;

    INSERT INTO @TIMER(T_TEXT) VALUES ('SCALAR');

    INSERT INTO @TIMER(T_TEXT) VALUES ('ITVFN');

    SELECT

    @NCHAR_BUCKET_01 = X.PROPER_CASED

    ,@NCHAR_BUCKET_02 = Y.PROPER_CASED

    FROM dbo.SAMPLE_Person SP

    CROSS APPLY dbo.ITVFN_PROPER_CASE(SP.FirstName) AS X

    CROSS APPLY dbo.ITVFN_PROPER_CASE(SP.FirstName) AS Y;

    INSERT INTO @TIMER(T_TEXT) VALUES ('ITVFN');

    -- Test results

    SELECT

    T.T_TEXT

    ,DATEDIFF(millisecond,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION

    FROM @TIMER T

    GROUP BY T.T_TEXT

    ORDER BY DURATION ASC;

    Results

    T_TEXT DURATION

    ---------- -----------

    DRY RUN 230

    ITVFN 547

    SCALAR 8994

    Edit: typo

  • bjh1977

    SSChasing Mays

    Points: 630

    Small correction required - UDFs came in as part of 2000 didn't they?

  • Carlo Romagnano

    SSC-Insane

    Points: 21713

    bjh1977 (1/4/2016)


    Small correction required - UDFs came in as part of 2000 didn't they?

    I think they started earlier with stored procs ('90): sql 6.0

  • TheSQLGuru

    SSC Guru

    Points: 134017

    1) UDFs existed at least as early as SQL 2000: https://technet.microsoft.com/en-us/library/aa175085(v=sql.80).aspx

    2) This article fails to mention an EXCEPTIONALLY important flaw (among many others that aren't covered such as preventing the acquisition of good statistics) and that is single-threading. Concatenate a string like the article says using a UDF and your big-fact-table-hitting-query is going to be running without parallelism. http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for the good article.

  • x

    SSC-Insane

    Points: 23352

    I'm not understanding here, doesn't the summing operation still have to occur for each row in the "cross apply" version?

    edit: Another thing that bothers me about the article is that the author implies that the "execution" count is the ultimate measure of T-SQL performance goodness, whereas, as shown by Eirikur Eiriksson, even with the "better" execution count with that second scalar UDF example, theres obviously more to the story here.

  • Phil Parkin

    SSC Guru

    Points: 243549

    Eirikur's response rendered what I was going to write mostly redundant. But I do have some minor points:

    1) If you 'obfuscate' something, you make it unclear or difficult to interpret. The execution plan does not do that: instead it completely hides the multi-execution issue.

    2) It would be interesting to see how you created/configured that extended events session. Your 'how to identify the cost' comment made me think that you were going to find a way to identify the 'real' cost of using scalar functions within an execution plan.

    3) (typo) I think it's 'formatted' rather than 'formated'.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • Alan Burstein

    SSC Guru

    Points: 61026

    TheSQLGuru (1/4/2016)


    1) UDFs existed at least as early as SQL 2000: https://technet.microsoft.com/en-us/library/aa175085(v=sql.80).aspx

    2) This article fails to mention an EXCEPTIONALLY important flaw (among many others that aren't covered such as preventing the acquisition of good statistics) and that is single-threading. Concatenate a string like the article says using a UDF and your big-fact-table-hitting-query is going to be running without parallelism. http://sqlblog.com/blogs/paul_white/archive/2011/12/23/forcing-a-parallel-query-execution-plan.aspx

    Both Kevin and Eirikur beat me to it. As Eirikur demonstrated, an inline table valued function is almost always the way to go. Killing the possibility of a parallel query plan is the #1 problem work scalr UDF's.

    That said, great work Micky. I particularly like how you used Extended Events to show what was going on under the hood.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • lucien.jacquet

    SSC Enthusiast

    Points: 143

    The example given selects rows based on the result of a mathematical operation performed on two columns within each row. This derived result does not exist on its own; it must be calculated for each row, so it actually seems to make sense that each row must be read. It appears that the engine then creates a tally table with the same number of rows, includes the result of the calculation as a key column, then scans this table for qualifying results.

    But if the selection criteria was based on an existing column value - and if this column was properly indexed - would that not reduce the number of reads to just the number of qualifying rows? Even if the function then performed a similar computation using existing columns to derive a new result, it seems to me that the rows read should still be held to just "qualifying" rows, avoiding a full-table scan.

    My point is not to contradict your assertion that scalar functions "can be" expensive, but to suggest that your example is a worst-case scenario. Is it possible that your recommendation to never query a table or view inside of a scalar function might be overly broad?

  • TheSQLGuru

    SSC Guru

    Points: 134017

    lucien.jacquet (1/4/2016)


    My point is not to contradict your assertion that scalar functions "can be" expensive, but to suggest that your example is a worst-case scenario. Is it possible that your recommendation to never query a table or view inside of a scalar function might be overly broad?

    NO, IT IS ABSOLUTELY NOT OVERLY BROAD TO SAY NEVER QUERY A TABLE OR VIEW INSIDE OF A SCALAR FUNCTION!!!!!!!! It is actually more appropriate to say NEVER EVER USER SCALAR UDFS - PERIOD!! There are few absolutes in the world, but that is one of them for me. There is almost ALWAYS a way to avoid them. Please find and read my "Death by UDF" chapter in the SQL Server MVP Deep Dives 2 book to learn more about why they must be avoided.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Phil Parkin

    SSC Guru

    Points: 243549

    TheSQLGuru (1/4/2016)


    lucien.jacquet (1/4/2016)


    My point is not to contradict your assertion that scalar functions "can be" expensive, but to suggest that your example is a worst-case scenario. Is it possible that your recommendation to never query a table or view inside of a scalar function might be overly broad?

    NO, IT IS ABSOLUTELY NOT OVERLY BROAD TO SAY NEVER QUERY A TABLE OR VIEW INSIDE OF A SCALAR FUNCTION!!!!!!!! It is actually more appropriate to say NEVER EVER USER SCALAR UDFS - PERIOD!! There are few absolutes in the world, but that is one of them for me. There is almost ALWAYS a way to avoid them. Please find and read my "Death by UDF" chapter in the SQL Server MVP Deep Dives 2 book to learn more about why they must be avoided.

    Not even for CHECK constraints?

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

  • Eirikur Eiriksson

    SSC Guru

    Points: 182344

    Phil Parkin (1/4/2016)


    TheSQLGuru (1/4/2016)


    lucien.jacquet (1/4/2016)


    My point is not to contradict your assertion that scalar functions "can be" expensive, but to suggest that your example is a worst-case scenario. Is it possible that your recommendation to never query a table or view inside of a scalar function might be overly broad?

    NO, IT IS ABSOLUTELY NOT OVERLY BROAD TO SAY NEVER QUERY A TABLE OR VIEW INSIDE OF A SCALAR FUNCTION!!!!!!!! It is actually more appropriate to say NEVER EVER USER SCALAR UDFS - PERIOD!! There are few absolutes in the world, but that is one of them for me. There is almost ALWAYS a way to avoid them. Please find and read my "Death by UDF" chapter in the SQL Server MVP Deep Dives 2 book to learn more about why they must be avoided.

    Not even for CHECK constraints?

    That's about the only exception

    😎

    Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.

  • Eirikur Eiriksson

    SSC Guru

    Points: 182344

    bjh1977 (1/4/2016)


    Small correction required - UDFs came in as part of 2000 didn't they?

    That's correct.

    😎

  • Recce70

    Say Hey Kid

    Points: 676

    I remember that Simon Sabin did an article called "Scalar functions are evil", but whenever I needed to send someone a link to it I'd Google for "Simon Sabin is evil" 😛

Viewing 15 posts - 1 through 15 (of 54 total)

You must be logged in to reply to this topic. Login to reply