Scalar UDF versus Inline UDF

  • Hello,

    Running SQL 2008 R2 Enterprise.

    I am aware of the fundamental difference between how a scalar user defined function and in-line table valued function are called by the optimizer. I need some help interpreting if I am tackling this use case properly as I'm not sure if in-line UDF is actually the better performer.

    I am calculating currency exchange rates for an order. The requirement is to return the most recent exchange rate for a given currency. I have a very simple lookup table that holds the current exchange rate (daily refresh is good enough for our requirements, no real-time exchange rate updates... for now :-)).

    CREATE TABLE [dbo].[CurrencyExchangeRates]

    (

    [ShortName] [char](3) NOT NULL,

    [PublishDate] [date] NOT NULL DEFAULT (getdate()),

    [ExchangeRate] [decimal](19, 9) NOT NULL,

    CONSTRAINT [CLIX_CurrencyExchangeRates_ShortName_PublishDate] PRIMARY KEY CLUSTERED

    (

    [ShortName] ASC,

    [PublishDate] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [Data]

    ) ON [Data]

    I am aware my clustering key choice is a bit odd, but we will always query using the currency short name, and since this is updated by batch job nightly, page splits will unlikely cause serious performance impact. We only need support for Canadian Dollars and Mexican Pesos. This saves me additional non-clustered indexes if chose an identity column as PK and clustering key. I might change my mind if requirements change down the road, but this approach should reliably yield clustered index seeks. We have Enterprise edition so partitioning is always an option, but probably overkill for now.

    Here is a snippet I am working with to load up some sample data and test the performance of calculating exchange rate using both a scalar UDF and in-line TVF.

    USE Sandbox

    GO

    -- Create temp table

    IF OBJECT_ID('tempdb..#Shipments') IS NOT NULL

    DROP TABLE tempdb..#Shipments

    CREATE TABLE #Shipments

    (

    IdINT IDENTITYNOT NULL

    ,CostDECIMAL(19,4)NOT NULL

    ,CurrencyShortNameCHAR(3)NOT NULL

    )

    SET NOCOUNT ON;

    INSERT INTO #Shipments (Cost, CurrencyShortName)

    VALUES (

    RAND() * 1000

    ,CASE WHEN RAND() >=.5 THEN 'CAD' ELSE 'MXN' END

    )

    GO 100000

    CREATE CLUSTERED INDEX PK_Shipments_Id ON #Shipments( [Id] )

    -- Verify we have sample data

    SELECT CurrencyShortName, COUNT(*)

    FROM #Shipments

    GROUP BY CurrencyShortName

    -- Create a scalar user defined function

    CREATE FUNCTION [dbo].[fncGetCurrencyExchangeRate]

    (

    @CurrencyShortName CHAR(3)

    )

    RETURNS DECIMAL(19,9)

    AS

    BEGIN

    DECLARE @ExchangeRate DECIMAL(19,9)

    SELECT TOP 1 @ExchangeRate = [ExchangeRate]

    FROM [dbo].[CurrencyExchangeRates]

    WHERE [ShortName] = @CurrencyShortName

    ORDER BY [PublishDate] DESC

    RETURN @ExchangeRate

    END

    GO

    -- Create an inline table valued function

    CREATE FUNCTION [itvf].[CurrentExchangeRate]

    (

    @CurrencyShortName CHAR(3)

    )

    RETURNS TABLE

    AS RETURN

    (

    SELECT TOP 1 [ExchangeRate]

    FROM [dbo].[CurrencyExchangeRates]

    WHERE [ShortName] = @CurrencyShortName

    ORDER BY [PublishDate] DESC

    )

    GO

    -- Turn on for debugging

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    -- Query using inline table valued function

    SELECT TOP 1000

    s.[Id]

    ,s.[CurrencyShortName]

    ,CONVERT( DECIMAL(19,2), s.[Cost]) AS [Cost USD]

    ,CONVERT( DECIMAL(19,2), s.Cost * er.ExchangeRate) AS [Cost Converted]

    FROM #Shipments s

    CROSS APPLY [itvf].[CurrentExchangeRate] (s.CurrencyShortName) er

    -- Query using scalar function

    SELECT TOP 1000

    s.[Id]

    ,s.[CurrencyShortName]

    ,CONVERT( DECIMAL(19,2), s.[Cost]) AS [Cost USD]

    ,CONVERT( DECIMAL(19,2), s.Cost * dbo.fncGetCurrencyExchangeRate(s.[CurrencyShortName]) ) AS [Cost Converted]

    FROM #Shipments s

    -- Cleanup

    DROP TABLE #Shipments

    GO

    I tried to get this out on SQLFiddle, but it doesn't seem to like how I generate sample data. I'm probably screwing that up too.

    When I run on my local machine I observe that the inline TVF has 2000 logical reads and completes in 75-100ms

    The scalar UDF has far fewer reads, but has ~300ms elapsed time.

    Do these results make sense? Which is the winner?

    Is there a way I can optimize my inline UDF approach to get the page reads down?

    Insight is appreciated. Thanks!

  • One reason can be that scalar functions work row by row while inline functions don't.

    Or it could be that scalar functions prevent parallelism (because of the previous cause) and inline functions don't.

    Your testings seem logical, but I would change the order of the columns on your index (but I'm no expert on that).

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • So that's exactly what I'm trying to sort out. The inline function had 2000 page reads so appears to be RBAR, but less CPU. Harder to tell with the scalar, but the high CPU also suggests RBAR.

    In the real world production server I'm concerned the pages won't always be in memory and the high page reads of the inline version will translate to physical IO that will affect performance.

    If you read above carefully I explained my reasoning for the key column ordering. I won't always know the date, but will always know the currency short name. This is not a transaction heavy table so page splits are not a deal breaker.

    I suppose I could alter the indexing strategy to use a IDENTITY column and build a separate non-clustered covering index and see how that impacts performance. To cover I would also have to include the ExchangeRate and PublishDate, and now we're essentially duplicating the clustered index... so that's why I went the way I did.

  • If it's a currency exchange rate table, you'll always need the date. There's no use of a rate without the date. A non-clustered index makes no sense because the clustered should be enough for any query and a non-clustered will only duplicate data.

    Maybe someone else can give a better input. πŸ˜‰

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • You might also look into adding SCHEMABINDING to your itvf.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Tracing scalar UDFs adds enormous time to its execution. That is, just trying to compute the time vastly increases the time. Thus, scalar UDFs generally are performing much better in reality than the SET STATS shows.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (8/21/2014)


    Tracing scalar UDFs adds enormous time to its execution. That is, just trying to compute the time vastly increases the time. Thus, scalar UDFs generally are performing much better in reality than the SET STATS shows.

    Jeff Moden's article here [/url]provides evidence for this, and also shows how to remove some of the noise from the time measurement.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The biggest issue I would have with the scalar udf (and I had a similar issue when working on time zone conversion) is that the scalar UDF is being called once for every row in the result set returned. Since you are doing data access in the UDF and not just a calculation as is in Jeff's article, I'd be wary of performance degradation as the # of rows in the Exchange Rate table grows or the number of rows you are processing grows. You can see this behavior by using Profiler or Extended Events. I used Profiler with SQL:StmtStarting, SQL:StmtCompleted, and SP:Completed events to show the # of calls to the udf. If you SP:StmtCompleted you can see the reads done by the scalar udf which aren't reported in the SET STATISTICS IO or in the SQL:stmtcompleted event.

    I did a little testing and the best performance will come from joining to a view (Profiler results in comments above each query). here's what I did for test data as I adapted what you had:

    USE test;

    GO

    IF OBJECT_ID('dbo.[fncGetCurrencyExchangeRate]', 'FN') IS NOT NULL

    BEGIN;

    DROP FUNCTION dbo.[fncGetCurrencyExchangeRate];

    END;

    IF OBJECT_ID('dbo.[CurrentExchangeRate]', 'IF') IS NOT NULL

    BEGIN;

    DROP FUNCTION dbo.[CurrentExchangeRate];

    END;

    IF OBJECT_ID('dbo.LatestExchangeRates', 'V') IS NOT NULL

    BEGIN;

    DROP VIEW dbo.LatestExchangeRates;

    END;

    IF OBJECT_ID('dbo.CurrencyExchangeRates', 'U') IS NOT NULL

    BEGIN;

    DROP TABLE dbo.CurrencyExchangeRates;

    END;

    CREATE TABLE [dbo].[CurrencyExchangeRates]

    (

    [ShortName] [CHAR](3) NOT NULL,

    [PublishDate] [DATE] NOT NULL

    DEFAULT (GETDATE()),

    [ExchangeRate] [DECIMAL](19, 9) NOT NULL,

    CONSTRAINT [CLIX_CurrencyExchangeRates_ShortName_PublishDate] PRIMARY KEY CLUSTERED

    ([ShortName] ASC, [PublishDate] ASC)

    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,

    IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON)

    );

    GO

    -- Create a scalar user defined function

    CREATE FUNCTION [dbo].[fncGetCurrencyExchangeRate]

    (

    @CurrencyShortName CHAR(3)

    )

    RETURNS DECIMAL(19, 9)

    AS

    BEGIN

    DECLARE @ExchangeRate DECIMAL(19, 9);

    SELECT TOP 1

    @ExchangeRate = [ExchangeRate]

    FROM

    [dbo].[CurrencyExchangeRates]

    WHERE

    [ShortName] = @CurrencyShortName

    ORDER BY

    [PublishDate] DESC;

    RETURN @ExchangeRate;

    END;

    GO

    -- Create an inline table valued function

    CREATE FUNCTION dbo.[CurrentExchangeRate]

    (

    @CurrencyShortName CHAR(3)

    )

    RETURNS TABLE

    AS RETURN

    (

    SELECT TOP 1

    [ExchangeRate]

    FROM

    [dbo].[CurrencyExchangeRates]

    WHERE

    [ShortName] = @CurrencyShortName

    ORDER BY

    [PublishDate] DESC

    );

    GO

    CREATE VIEW LatestExchangeRates

    AS

    WITH ExchageRates

    AS (

    SELECT

    CER.ShortName,

    ROW_NUMBER() OVER (PARTITION BY CER.ShortName ORDER BY CER.PublishDate DESC) AS rowNO,

    CER.ExchangeRate

    FROM

    dbo.CurrencyExchangeRates AS CER

    )

    SELECT

    ExchageRates.ShortName,

    ExchageRates.ExchangeRate

    FROM

    ExchageRates

    WHERE

    ExchageRates.rowNO = 1;

    GO

    -- Create temp table

    IF OBJECT_ID('tempdb..#Shipments') IS NOT NULL

    DROP TABLE #Shipments;

    CREATE TABLE #Shipments

    (

    Id INT IDENTITY

    NOT NULL,

    Cost DECIMAL(19, 4) NOT NULL,

    CurrencyShortName CHAR(3) NOT NULL

    );

    SET NOCOUNT ON;

    INSERT INTO #Shipments

    (

    Cost,

    CurrencyShortName

    )

    SELECT

    CASE WHEN AO.object_id < 0 THEN -1

    ELSE 1

    END * AO.object_id,

    CASE WHEN AO.object_id % 3 = 1 THEN 'CAD'

    ELSE 'MXN'

    END

    FROM

    sys.all_objects AS AO;

    GO

    CREATE CLUSTERED INDEX PK_Shipments_Id ON #Shipments( [id] );

    INSERT INTO dbo.CurrencyExchangeRates

    (

    ShortName,

    PublishDate,

    ExchangeRate

    )

    SELECT

    CASE WHEN ROW_NUMBER() OVER (ORDER BY AC.column_id) % 3 = 1

    THEN 'CAD'

    ELSE 'MXN'

    END AS ShortName,

    CASE WHEN ROW_NUMBER() OVER (ORDER BY AC.column_id) % 2 = 1

    THEN DATEADD(DAY,

    -1 * ROW_NUMBER() OVER (ORDER BY AC.column_id),

    GETDATE())

    ELSE DATEADD(DAY,

    -2 * ROW_NUMBER() OVER (ORDER BY AC.column_id),

    GETDATE())

    END,

    RAND(ROW_NUMBER() OVER (ORDER BY AC.column_id))

    FROM

    sys.all_columns AS AC;

    -- Verify we have sample data

    SELECT

    CurrencyShortName,

    COUNT(*)

    FROM

    #Shipments

    GROUP BY

    CurrencyShortName;

    GO

    -- Query using inline table valued function

    /* From Profiler 3 Rows, 0 CPU, 2227 Reads, 167 Duration */

    SELECT TOP 1000

    s.[Id],

    s.[CurrencyShortName],

    CONVERT(DECIMAL(19, 2), s.[Cost]) AS [Cost USD],

    CONVERT(DECIMAL(19, 2), s.Cost * er.ExchangeRate) AS [Cost Converted]

    FROM

    #Shipments s

    CROSS APPLY dbo.[CurrentExchangeRate](s.CurrencyShortName) er

    ORDER BY

    S.Id;

    -- Query using scalar function

    /* From Profiler 3000 Rows, 94 CPU, 6020 Reads (I had to dump to a table and Sum Reads across all the Scalar UDF calls because BatchCompleted doesn't include those), 276 Duration */

    SELECT TOP 1000

    s.[Id],

    s.[CurrencyShortName],

    CONVERT(DECIMAL(19, 2), s.[Cost]) AS [Cost USD],

    CONVERT(DECIMAL(19, 2), s.Cost *

    dbo.fncGetCurrencyExchangeRate(s.[CurrencyShortName])) AS [Cost Converted]

    FROM

    #Shipments s

    ORDER BY

    S.Id;

    /* Joining to a view */

    /* From Profiler 3 Rows, 0 CPU, 2047 Reads, 208 Duration */

    SELECT TOP 1000

    S.[Id],

    S.[CurrencyShortName],

    CONVERT(DECIMAL(19, 2), S.[Cost]) AS [Cost USD],

    CONVERT(DECIMAL(19, 2), S.Cost * ER.ExchangeRate) AS [Cost Converted]

    FROM

    #Shipments AS S

    JOIN LatestExchangeRates AS ER

    ON ER.ShortName = S.CurrencyShortName

    ORDER BY

    S.Id;

    -- Cleanup

    DROP TABLE #Shipments;

    GO

    I didn't run each one a bunch of times to get average durations and CPU, but I think you'll find over multiple iterations the join to the view will outperform the UDF's every time.

    Oh and you can probably get better performance by at the least changing the clustered index to

    CLUSTERED

    ([ShortName] ASC, [PublishDate] DESC)

    Because you are ordering by PublishDate desc in the UDF.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply