Scalar UDF versus Inline UDF

Question

Post reply

Scalar UDF versus Inline UDF

jallmond

Ten Centuries

Points: 1181
More actions
August 21, 2014 at 8:31 am

#292487

Hello,
Running SQL 2008 R2 Enterprise.
I am aware of the fundamental difference between how a scalar user defined function and in-line table valued function are called by the optimizer. I need some help interpreting if I am tackling this use case properly as I'm not sure if in-line UDF is actually the better performer.
I am calculating currency exchange rates for an order. The requirement is to return the most recent exchange rate for a given currency. I have a very simple lookup table that holds the current exchange rate (daily refresh is good enough for our requirements, no real-time exchange rate updates... for now :-)).
CREATE TABLE [dbo].[CurrencyExchangeRates]
(
[ShortName] [char](3) NOT NULL,
[PublishDate] [date] NOT NULL DEFAULT (getdate()),
[ExchangeRate] [decimal](19, 9) NOT NULL,
CONSTRAINT [CLIX_CurrencyExchangeRates_ShortName_PublishDate] PRIMARY KEY CLUSTERED
(
[ShortName] ASC,
[PublishDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [Data]
) ON [Data]
I am aware my clustering key choice is a bit odd, but we will always query using the currency short name, and since this is updated by batch job nightly, page splits will unlikely cause serious performance impact. We only need support for Canadian Dollars and Mexican Pesos. This saves me additional non-clustered indexes if chose an identity column as PK and clustering key. I might change my mind if requirements change down the road, but this approach should reliably yield clustered index seeks. We have Enterprise edition so partitioning is always an option, but probably overkill for now.
Here is a snippet I am working with to load up some sample data and test the performance of calculating exchange rate using both a scalar UDF and in-line TVF.
USE Sandbox
GO
-- Create temp table
IF OBJECT_ID('tempdb..#Shipments') IS NOT NULL
DROP TABLE tempdb..#Shipments
CREATE TABLE #Shipments
(
IdINT IDENTITYNOT NULL
,CostDECIMAL(19,4)NOT NULL
,CurrencyShortNameCHAR(3)NOT NULL
)
SET NOCOUNT ON;
INSERT INTO #Shipments (Cost, CurrencyShortName)
VALUES (
RAND() * 1000
,CASE WHEN RAND() >=.5 THEN 'CAD' ELSE 'MXN' END
)
GO 100000
CREATE CLUSTERED INDEX PK_Shipments_Id ON #Shipments( [Id] )
-- Verify we have sample data
SELECT CurrencyShortName, COUNT(*)
FROM #Shipments
GROUP BY CurrencyShortName
-- Create a scalar user defined function
CREATE FUNCTION [dbo].[fncGetCurrencyExchangeRate]
(
@CurrencyShortName CHAR(3)
)
RETURNS DECIMAL(19,9)
AS
BEGIN
DECLARE @ExchangeRate DECIMAL(19,9)
SELECT TOP 1 @ExchangeRate = [ExchangeRate]
FROM [dbo].[CurrencyExchangeRates]
WHERE [ShortName] = @CurrencyShortName
ORDER BY [PublishDate] DESC
RETURN @ExchangeRate
END
GO
-- Create an inline table valued function
CREATE FUNCTION [itvf].[CurrentExchangeRate]
(
@CurrencyShortName CHAR(3)
)
RETURNS TABLE
AS RETURN
(
SELECT TOP 1 [ExchangeRate]
FROM [dbo].[CurrencyExchangeRates]
WHERE [ShortName] = @CurrencyShortName
ORDER BY [PublishDate] DESC
)
GO
-- Turn on for debugging
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- Query using inline table valued function
SELECT TOP 1000
s.[Id]
,s.[CurrencyShortName]
,CONVERT( DECIMAL(19,2), s.[Cost]) AS [Cost USD]
,CONVERT( DECIMAL(19,2), s.Cost * er.ExchangeRate) AS [Cost Converted]
FROM #Shipments s
CROSS APPLY [itvf].[CurrentExchangeRate] (s.CurrencyShortName) er
-- Query using scalar function
SELECT TOP 1000
s.[Id]
,s.[CurrencyShortName]
,CONVERT( DECIMAL(19,2), s.[Cost]) AS [Cost USD]
,CONVERT( DECIMAL(19,2), s.Cost * dbo.fncGetCurrencyExchangeRate(s.[CurrencyShortName]) ) AS [Cost Converted]
FROM #Shipments s
-- Cleanup
DROP TABLE #Shipments
GO
I tried to get this out on SQLFiddle, but it doesn't seem to like how I generate sample data. I'm probably screwing that up too.
When I run on my local machine I observe that the inline TVF has 2000 logical reads and completes in 75-100ms
The scalar UDF has far fewer reads, but has ~300ms elapsed time.
Do these results make sense? Which is the winner?
Is there a way I can optimize my inline UDF approach to get the page reads down?
Insight is appreciated. Thanks!

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183698 More actions · Answer 1

One reason can be that scalar functions work row by row while inline functions don't.

Or it could be that scalar functions prevent parallelism (because of the previous cause) and inline functions don't.

Your testings seem logical, but I would change the order of the columns on your index (but I'm no expert on that).

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

jallmond Ten Centuries Points: 1181 More actions · Answer 2

So that's exactly what I'm trying to sort out. The inline function had 2000 page reads so appears to be RBAR, but less CPU. Harder to tell with the scalar, but the high CPU also suggests RBAR.

In the real world production server I'm concerned the pages won't always be in memory and the high page reads of the inline version will translate to physical IO that will affect performance.

If you read above carefully I explained my reasoning for the key column ordering. I won't always know the date, but will always know the currency short name. This is not a transaction heavy table so page splits are not a deal breaker.

I suppose I could alter the indexing strategy to use a IDENTITY column and build a separate non-clustered covering index and see how that impacts performance. To cover I would also have to include the ExchangeRate and PublishDate, and now we're essentially duplicating the clustered index... so that's why I went the way I did.

Luis Cazares SSC Guru Points: 183698 More actions · Answer 3

If it's a currency exchange rate table, you'll always need the date. There's no use of a rate without the date. A non-clustered index makes no sense because the clustered should be enough for any query and a non-clustered will only duplicate data.

Maybe someone else can give a better input. 😉

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sean Lange SSC Guru Points: 286589 More actions · Answer 4

You might also look into adding SCHEMABINDING to your itvf.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

ScottPletcher SSC Guru Points: 100952 More actions · Answer 5

Tracing scalar UDFs adds enormous time to its execution. That is, just trying to compute the time vastly increases the time. Thus, scalar UDFs generally are performing much better in reality than the SET STATS shows.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

ScottPletcher (8/21/2014)
Tracing scalar UDFs adds enormous time to its execution. That is, just trying to compute the time vastly increases the time. Thus, scalar UDFs generally are performing much better in reality than the SET STATS shows.

Jeff Moden's article here [/url]provides evidence for this, and also shows how to remove some of the noise from the time measurement.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jack Corbett SSC Guru Points: 184393 More actions · Answer 7

The biggest issue I would have with the scalar udf (and I had a similar issue when working on time zone conversion) is that the scalar UDF is being called once for every row in the result set returned. Since you are doing data access in the UDF and not just a calculation as is in Jeff's article, I'd be wary of performance degradation as the # of rows in the Exchange Rate table grows or the number of rows you are processing grows. You can see this behavior by using Profiler or Extended Events. I used Profiler with SQL:StmtStarting, SQL:StmtCompleted, and SP:Completed events to show the # of calls to the udf. If you SP:StmtCompleted you can see the reads done by the scalar udf which aren't reported in the SET STATISTICS IO or in the SQL:stmtcompleted event.

I did a little testing and the best performance will come from joining to a view (Profiler results in comments above each query). here's what I did for test data as I adapted what you had:

USE test;

GO

IF OBJECT_ID('dbo.[fncGetCurrencyExchangeRate]', 'FN') IS NOT NULL

BEGIN;

DROP FUNCTION dbo.[fncGetCurrencyExchangeRate];

END;

IF OBJECT_ID('dbo.[CurrentExchangeRate]', 'IF') IS NOT NULL

BEGIN;

DROP FUNCTION dbo.[CurrentExchangeRate];

END;

IF OBJECT_ID('dbo.LatestExchangeRates', 'V') IS NOT NULL

BEGIN;

DROP VIEW dbo.LatestExchangeRates;

END;

IF OBJECT_ID('dbo.CurrencyExchangeRates', 'U') IS NOT NULL

BEGIN;

DROP TABLE dbo.CurrencyExchangeRates;

END;

CREATE TABLE [dbo].[CurrencyExchangeRates]

(

[ShortName] [CHAR](3) NOT NULL,

[PublishDate] [DATE] NOT NULL

DEFAULT (GETDATE()),

[ExchangeRate] [DECIMAL](19, 9) NOT NULL,

CONSTRAINT [CLIX_CurrencyExchangeRates_ShortName_PublishDate] PRIMARY KEY CLUSTERED

([ShortName] ASC, [PublishDate] ASC)

WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,

IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON)

);

GO

-- Create a scalar user defined function

CREATE FUNCTION [dbo].[fncGetCurrencyExchangeRate]

(

@CurrencyShortName CHAR(3)

)

RETURNS DECIMAL(19, 9)

AS

BEGIN

DECLARE @ExchangeRate DECIMAL(19, 9);

SELECT TOP 1

@ExchangeRate = [ExchangeRate]

FROM

[dbo].[CurrencyExchangeRates]

WHERE

[ShortName] = @CurrencyShortName

ORDER BY

[PublishDate] DESC;

RETURN @ExchangeRate;

END;

GO

-- Create an inline table valued function

CREATE FUNCTION dbo.[CurrentExchangeRate]

(

@CurrencyShortName CHAR(3)

)

RETURNS TABLE

AS RETURN

(

SELECT TOP 1

[ExchangeRate]

FROM

[dbo].[CurrencyExchangeRates]

WHERE

[ShortName] = @CurrencyShortName

ORDER BY

[PublishDate] DESC

);

GO

CREATE VIEW LatestExchangeRates

AS

WITH ExchageRates

AS (

SELECT

CER.ShortName,

ROW_NUMBER() OVER (PARTITION BY CER.ShortName ORDER BY CER.PublishDate DESC) AS rowNO,

CER.ExchangeRate

FROM

dbo.CurrencyExchangeRates AS CER

)

SELECT

ExchageRates.ShortName,

ExchageRates.ExchangeRate

FROM

ExchageRates

WHERE

ExchageRates.rowNO = 1;

GO

-- Create temp table

IF OBJECT_ID('tempdb..#Shipments') IS NOT NULL

DROP TABLE #Shipments;

CREATE TABLE #Shipments

(

Id INT IDENTITY

NOT NULL,

Cost DECIMAL(19, 4) NOT NULL,

CurrencyShortName CHAR(3) NOT NULL

);

SET NOCOUNT ON;

INSERT INTO #Shipments

(

Cost,

CurrencyShortName

)

SELECT

CASE WHEN AO.object_id < 0 THEN -1

ELSE 1

END * AO.object_id,

CASE WHEN AO.object_id % 3 = 1 THEN 'CAD'

ELSE 'MXN'

END

FROM

sys.all_objects AS AO;

GO

CREATE CLUSTERED INDEX PK_Shipments_Id ON #Shipments( [id] );

INSERT INTO dbo.CurrencyExchangeRates

(

ShortName,

PublishDate,

ExchangeRate

)

SELECT

CASE WHEN ROW_NUMBER() OVER (ORDER BY AC.column_id) % 3 = 1

THEN 'CAD'

ELSE 'MXN'

END AS ShortName,

CASE WHEN ROW_NUMBER() OVER (ORDER BY AC.column_id) % 2 = 1

THEN DATEADD(DAY,

-1 * ROW_NUMBER() OVER (ORDER BY AC.column_id),

GETDATE())

ELSE DATEADD(DAY,

-2 * ROW_NUMBER() OVER (ORDER BY AC.column_id),

GETDATE())

END,

RAND(ROW_NUMBER() OVER (ORDER BY AC.column_id))

FROM

sys.all_columns AS AC;

-- Verify we have sample data

SELECT

CurrencyShortName,

COUNT(*)

FROM

#Shipments

GROUP BY

CurrencyShortName;

GO

-- Query using inline table valued function

/* From Profiler 3 Rows, 0 CPU, 2227 Reads, 167 Duration */

SELECT TOP 1000

s.[Id],

s.[CurrencyShortName],

CONVERT(DECIMAL(19, 2), s.[Cost]) AS [Cost USD],

CONVERT(DECIMAL(19, 2), s.Cost * er.ExchangeRate) AS [Cost Converted]

FROM

#Shipments s

CROSS APPLY dbo.[CurrentExchangeRate](s.CurrencyShortName) er

ORDER BY

S.Id;

-- Query using scalar function

/* From Profiler 3000 Rows, 94 CPU, 6020 Reads (I had to dump to a table and Sum Reads across all the Scalar UDF calls because BatchCompleted doesn't include those), 276 Duration */

SELECT TOP 1000

s.[Id],

s.[CurrencyShortName],

CONVERT(DECIMAL(19, 2), s.[Cost]) AS [Cost USD],

CONVERT(DECIMAL(19, 2), s.Cost *

dbo.fncGetCurrencyExchangeRate(s.[CurrencyShortName])) AS [Cost Converted]

FROM

#Shipments s

ORDER BY

S.Id;

/* Joining to a view */

/* From Profiler 3 Rows, 0 CPU, 2047 Reads, 208 Duration */

SELECT TOP 1000

S.[Id],

S.[CurrencyShortName],

CONVERT(DECIMAL(19, 2), S.[Cost]) AS [Cost USD],

CONVERT(DECIMAL(19, 2), S.Cost * ER.ExchangeRate) AS [Cost Converted]

FROM

#Shipments AS S

JOIN LatestExchangeRates AS ER

ON ER.ShortName = S.CurrencyShortName

ORDER BY

S.Id;

-- Cleanup

DROP TABLE #Shipments;

GO

I didn't run each one a bunch of times to get average durations and CPU, but I think you'll find over multiple iterations the join to the view will outperform the UDF's every time.

Oh and you can probably get better performance by at the least changing the clustered index to

CLUSTERED

([ShortName] ASC, [PublishDate] DESC)

Because you are ordering by PublishDate desc in the UDF.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 8

You may want to give this blog post a quick perusal.

http://www.sqlservercentral.com/blogs/lynnpettis/2009/05/07/comparing-hardcoded-functions-in-line-tvf-s-and-scalar-functions/