Why Scalar Functions Can Be Costly

  • Alan.B - Monday, January 4, 2016 11:16 AM

    Eirikur Eiriksson (1/4/2016)


    Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.

    I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
    ...{snip}...
    Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.

    I know it's an older post but just saw this.  Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Mickey Stuewe - Sunday, January 3, 2016 11:24 PM

    Comments posted to this topic are about the item Why Scalar Functions Can Be Costly

    Good article, Mickey.  You said there would be follow-up articles on mTVFs and iTVFs.  I hope the comments on this thread didn't scare you off.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Saturday, June 3, 2017 8:48 AM

    Mickey Stuewe - Sunday, January 3, 2016 11:24 PM

    Comments posted to this topic are about the item Why Scalar Functions Can Be Costly

    Good article, Mickey.  You said there would be follow-up articles on mTVFs and iTVFs.  I hope the comments on this thread didn't scare you off.

    I'll try not to scare Mickey off 😀
    😎

  • Jeff Moden - Saturday, June 3, 2017 8:44 AM

    Alan.B - Monday, January 4, 2016 11:16 AM

    Eirikur Eiriksson (1/4/2016)


    Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.

    I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
    ...{snip}...
    Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.

    I know it's an older post but just saw this.  Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?

    It does serve it's purpose when trying to match i.e. user's input and fuzzy matching. As fuzzy matching is an Enterprise feature in SSIS then one often has to revert to Standard SQL Server alternatives.
    😎

  • Nice article, Mickey.  I missed it when it was first published, but it sounds like the stuff I talk about all the time.

  • Eirikur Eiriksson - Saturday, June 3, 2017 8:59 AM

    Jeff Moden - Saturday, June 3, 2017 8:44 AM

    Alan.B - Monday, January 4, 2016 11:16 AM

    Eirikur Eiriksson (1/4/2016)


    Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.

    I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
    ...{snip}...
    Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.

    I know it's an older post but just saw this.  Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?

    It does serve it's purpose when trying to match i.e. user's input and fuzzy matching. As fuzzy matching is an Enterprise feature in SSIS then one often has to revert to Standard SQL Server alternatives.
    😎

    Is it any good or is it horribly slow and requires a table or index scan like most on-the-fly fuzzy lookups?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Another point of optimization would be rewriting the scalar Function dbo.fn_SumRangeOfNumbers() so instead of summing integers from a Tally table the function should use the formula for the sum of an arithmetic progression.
    i.e.: (SecondNumber - FirstNumber + 1) * (FirstNumber + SecondNumber) / 2

  • Jeff Moden - Saturday, June 3, 2017 8:44 AM

    Alan.B - Monday, January 4, 2016 11:16 AM

    Eirikur Eiriksson (1/4/2016)


    Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.

    I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
    ...{snip}...
    Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.

    I know it's an older post but just saw this.  Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?

    Sorry I missed your question Jeff; I just saw it now.

    fI have never used the longest common substring to solve a business problem. I was posted on a forum few years ago and I thought it was a fun and challenging exercise. Today I use it to when showing people how to use tally tables - it seams to make a good impression on people new to the concept. 

    One thing you can do with a Longest common substring function is do some preliminary plagiarism detection.  I have a couple examples but can't find them at the moment. Let's say, for example, you have a website like SSC where authors can submit articles. You could (after some initial cleanup such as making breaks/line feeds/carriage returns uniform earlier in the routine) compare a submitted article to all existing articles in your database to check for a longest common substring longer than <user defined number>.  If the query finds that an article has a shares a, say, 90-character substring with another article, the editor is notified to review and see if the submitting author is steeling someone's work or is correctly referencing another article where they give credit.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • TheSQLGuru - Monday, January 4, 2016 7:02 PM

    Not even for Check Constraints since you can get bad data. Search for sql server udf check constraint bug and you can go here: http://sqlblog.com/blogs/tibor_karaszi/archive/2009/12/17/be-careful-with-constraints-calling-udfs.aspxAddressing someone else's comment about WITH SCHEMABINDING: that is a MUST, especially if you use UDFs in UPDATES. You can avoid a nasty table spool put in place to prevent the Halloween Problem in some cases. I recommend using SCHEMABINDING all the time it is an option though - build a box around the user, and in this case the user is YOU, the developer. 🙂

    Not even for computed columns. Another reason why scalar functions in computed columns is a bad idea

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 9 posts - 46 through 54 (of 54 total)

You must be logged in to reply to this topic. Login to reply