Why Scalar Functions Can Be Costly

  • A few of you commented on the fact I stated the wrong edition of SQL that User Defined Scalar functions were introduced in. I apologize. I will have it corrected.

    There was also a Twitter thread on this topic with regards to my article. Kalen Delaney had an interesting fact that I didn't know. MS had announced User Defined Scalar Functions for SQL 6.5, but then they pulled them before the release. They were then introduced in SQL 2000.

    So, now when you're playing Trivial Pursuit, SQL Server edition, you'll have a fun fact under your belt.

    Mickey Stuewe
    Sr Database Developer
    My blog
    Follow me on twitter: @SQLMickey
    Connect with me on LinkedIn
    --------------------------------------------------------------------------
    I laugh loudly and I laugh often. Just ask anyone who knows me.

  • lucien.jacquet (1/4/2016)My point is not to contradict your assertion that scalar functions "can be" expensive, but to suggest that your example is a worst-case scenario. Is it possible that your recommendation to never query a table or view inside of a scalar function might be overly broad?

    Hi Lucien,

    It's actually the opposite. There are very few cases where a scalar function is the "best" solution and most of them have been commented on throughout the day. On the flip side, there are no absolutes when it comes to solving problems with SQL. There are crazy edge cases that go against best practices

    Mickey

    Mickey Stuewe
    Sr Database Developer
    My blog
    Follow me on twitter: @SQLMickey
    Connect with me on LinkedIn
    --------------------------------------------------------------------------
    I laugh loudly and I laugh often. Just ask anyone who knows me.

  • winkl (1/4/2016)


    Thanks for the article, Mickey.

    Have you tested your scalar UDF using schemabinding? We're testing that now to see if we can bring some of the cost down, but so far the improvement seems negligible. (a recent change to a sproc to use UDF to modularize a chunk of code that returns patient int age on date and used in WHERE criteria seems to have tanked efficiency). I'm just curious if you had any considerations for schemabinding.

    I came across this last night:

    https://www.mssqltips.com/sqlservertip/1692/using-schema-binding-to-improve-sql-server-udf-performance/

    We're most likely going to use your suggestion with tvf with cross apply...the case you've laid out is compelling.

    Thanks again!

    Hi winkl,

    No, I haven't tried schema bindings with UDFs. Mostly because, I avoid Scalar functions.

    When using the tvf, make sure to use the in-line tvf, instead of the multi-line tvf. I have future articles to show why, but I didn't want you to wait. πŸ™‚

    Cheers!

    Mickey

    Mickey Stuewe
    Sr Database Developer
    My blog
    Follow me on twitter: @SQLMickey
    Connect with me on LinkedIn
    --------------------------------------------------------------------------
    I laugh loudly and I laugh often. Just ask anyone who knows me.

  • Good Article, Just I was wondering about scalar function, the difference between having read from table inside and don't. In the first example where you're using dbo.fn_SumRangeOfNumbers() I guess you're reading from tail Numbers table. What will happend if you change the code and instead of that put the formula:

    (SecondNumber*(SecondNumber +1))/2-(FirstNumber*(FirstNumber +1))/2

    I think will return the same result, but what's happening with the reads then? are the same (13) or 1 like in Name Format function?

  • Eirikur Eiriksson (1/4/2016)


    Alan.B (1/4/2016)


    I been banging my head trying to beat that mTVF for several months :hehe:

    Anything to ease the pain cause that got to hurt:-D

    😎

    I created a new thread and posted my perf test and associated code here

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • gocheski (1/5/2016)


    Good Article, Just I was wondering about scalar function, the difference between having read from table inside and don't. In the first example where you're using dbo.fn_SumRangeOfNumbers() I guess you're reading from tail Numbers table. What will happend if you change the code and instead of that put the formula:

    (SecondNumber*(SecondNumber +1))/2-(FirstNumber*(FirstNumber +1))/2

    I think will return the same result, but what's happening with the reads then? are the same (13) or 1 like in Name Format function?

    If I'm reading your question correctly, you are having the tally table used in the main query and are passing in SecondNumber and FirstNumber to a Scalar Function. If that is the case, then your reads won't be affected the same way. The tally table will be part of the main query, and not queried for every row returned.

    Mickey

    Mickey Stuewe
    Sr Database Developer
    My blog
    Follow me on twitter: @SQLMickey
    Connect with me on LinkedIn
    --------------------------------------------------------------------------
    I laugh loudly and I laugh often. Just ask anyone who knows me.

  • Great post. Very clear.

    What if you modify dbo.fn_SumRangeOfNumbers() to do a loop and sum without using a tally table? Would it be the same effect as dbo.StandardNameFormat()?

  • Hi,

    I thought his article extremely enlightening. I am not a sql guru, so apologise in advance if this is a silly comment. We use scalar functions a lot as we have to use the same results set in multiple queries and if we change the logic, we don't want to rewrite every where. Would it be possibly to use the function  itself in the cross apply (or even an outer apply)? That way, following the logic of the article, the scalar function will be executed the same number of times?

    Or am I being too much a glass half full kind of guy πŸ™‚ ?

  • richard.davies 86895 - Friday, June 2, 2017 1:29 AM

    Hi,

    I thought his article extremely enlightening. I am not a sql guru, so apologise in advance if this is a silly comment. We use scalar functions a lot as we have to use the same results set in multiple queries and if we change the logic, we don't want to rewrite every where. Would it be possibly to use the function  itself in the cross apply (or even an outer apply)? That way, following the logic of the article, the scalar function will be executed the same number of times?

    Or am I being too much a glass half full kind of guy πŸ™‚ ?

    Yes you can use a TVF rather than an in-line query as in the article. This approach is covered in Simon Sabin's article http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx.
    If you ever need to refer to that article, just Google "Simon Sabin is evil"  πŸ˜›

  • I won't argue with your premise or your conclusion, but in this case a SQL Server provided function (assuming FirstNumber and SecondNumber are integers) and simple math would have solved the problem and I wonder how it would perform.

    Select  FirstNumber
     , SecondNumber
     , convert(float, FirstNumber + SecondNumber) / 2 * (SecondNumber-FirstNumber+1)

  • Useful article, thank you!!

    Thanks to the internet archive (Wayback Machine), we know that MS tried to implement "in-line scalar functions" which would not have had the cost mentioned in your article. See MS documentation on the wayback machine here

    Of course when they realised they had documented more Microsoft Vapourwareβ„’, the docs were quickly rescinded.

  • I was somewhat bewildered until I realized the scalar function used a tally table instead of an expression such as:
    (FirstNumber + SecondNumber) * (1 + SecondNumber - FirstNumber) / 2

  • what did you expect?
    you have the function in the where clause which means it will be called for each row and then once more for each output row.
    it makes perfect sense
    don't expect wonders when you write bad code

  • Phil Parkin - Monday, January 4, 2016 11:21 AM

    Alan.B (1/4/2016)


    --For me, that's been the 0.0001% exception.

    Wow. That suggests you have written/tested 10,000 (or more) TVFs :w00t:

    Wouldn't that be 1,000,000 (or more) TVFs?

  • t.ovod-everett - Friday, June 2, 2017 10:55 AM

    Phil Parkin - Monday, January 4, 2016 11:21 AM

    Alan.B (1/4/2016)


    --For me, that's been the 0.0001% exception.

    Wow. That suggests you have written/tested 10,000 (or more) TVFs :w00t:

    Wouldn't that be 1,000,000 (or more) TVFs?

    Yep! I really do like writing TVFs! :w00t:

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 15 posts - 31 through 45 (of 53 total)

You must be logged in to reply to this topic. Login to reply