udf very slow?

  • I created a User Defined Function that performs some arithmetic calculations on a few columns and returns an amount.

    The UDF does not do any lookup on the database, just uses the parameters passed (and calls another UDF with similar characteristics).

    I organised my underlying table so that all required columns are available in that table.

    I expected the UDF performance to be extremely quick but it happens to be VERY slow.

    Retrieving ~300,000 rows with the UDF takes about 10mns.

    Just removing the function makes it about 15 seconds (scanning the table and displaying in SSMS).

    There is no index at all on my table and I don't specify any criteria so both Tests involve just a table scan.

    I was under the impression that I would have no performance problem as long as there was no database access within the UDF but this seems wrong.

    I suppose I could get the required performance using a CLR function but I would rather avoid that because it is way beyond the technical skills of my customer (I am a consultant).

    My bottleneck is entirely CPU

    Any idea how to improve this?

  • If you have 300,000 rows, the UDF will be called 300,000 times if you defined a scalar UDF.

    More info:

    User Defined Functions and Performance

    Using a table valued function could improve performance.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I know but I was hoping it would be fast...

    Just talked to my customer and I'll try using CLR!

    Yipee!!! 😀

  • Eric Mamet (10/31/2013)


    I know but I was hoping it would be fast...

    Just talked to my customer and I'll try using CLR!

    Yipee!!! 😀

    Never seen someone so excited about CLR 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • It is much faster to apply a table function.

    CREATE TABLE Test1 (ID INT);

    GO

    INSERT INTO Test1

    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.object_id)

    FROM sys.all_columns a

    CROSS JOIN sys.all_columns b

    GO

    CREATE FUNCTION UDF_INLINE (@Input INT)

    RETURNS INT

    AS

    BEGIN

    DECLARE @I INT;

    SET @I = @Input * 0.14;

    RETURN @I;

    END;

    GO

    CREATE FUNCTION UDF_APPLY (@Input INT)

    RETURNS TABLE

    AS

    RETURN (SELECT @Input * 0.14 AS Result);

    GO

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    -- Return the column with no function

    SELECT

    ID

    INTO

    #test1

    FROM

    Test1;

    /*

    Table 'Test1'. Scan count 1, logical reads 3345

    CPU time = 437 ms, elapsed time = 533 ms.

    */

    -- Return the inline function

    SELECT

    dbo.UDF_INLINE(ID) AS RowName

    INTO

    #test2

    FROM

    Test1;

    /*

    Table '#test2'. Scan count 0, logical reads 1001607

    Table 'Test1'. Scan count 1, logical reads 3345

    CPU time = 10389 ms, elapsed time = 13965 ms.

    */

    -- Apply the function

    SELECT

    b.Result

    INTO

    #test3

    FROM

    Test1 t

    CROSS APPLY

    dbo.UDF_APPLY(t.ID) AS b;

    /*

    Table 'Test1'. Scan count 1, logical reads 3345

    CPU time = 577 ms, elapsed time = 578 ms.

    */

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • I know but I was hoping it would be fast...

    Just talked to my customer and I'll try using CLR!

    Yipee!!!

    I don't know if a CLR is going to be the way to go. I have been a big proponent of CLRs for some things because they do have their place but, in this case, there is no reason to believe that a CLR is going to perform aggregations faster than a T-SQL function. Remember, creating and Implementing a CLR is not a trivial task and introduces new risks and overhead for your SQL environment. You may just be adding more work without any added benefits.

    Take a look at this article How to Make Scalar UDFs Run Faster (SQL Spackle)[/url]. As Sean was saying and demonstrated: you should get much better results turning your function into an inline table valued function. You will have to change your query logic to include a cross apply but that is something numerous people on this site can help you with if you get stuck.

    P.S. I believe there is a new SQL Server Central Stairway on CLRs coming soon. I expect that it will be a good read. 😉

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Actually, I am not too sure to understand what the table function is doing in all this...

    As for the speed, it's shockingly different!

    To retrieve about 1.5 million rows using a simple TSQL UDF took about 4 minutes.

    Replacing the TSQL UDF by a CLR function shrinks the 4 minutes to 18"

    Removing any function still gives me about the same (17").

    In other words, the CLR function is practically invisible in terms of performance! :w00t:

    In fact, I should have remembered because this is an experiment that Itzik Ben Gan had already demonstrated in his wonderful book "Inside SQL Server 2005 TSQL Querying"

    Silly me...

  • CLR might perform very well, but it seems to me that you're cracking nuts with a sledgehammer (or as said in spanish, killing flies with cannonballs).

    Your bottleneck was CPU using UDFs because it will limit your CPU use to one (in other words, you're not using parallelism).

    Good T-SQL should be enough for your problem, but it's all up to you.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • What's the nature of the function you're trying to apply to the data? As Koen says, using functions against large sets of rows can have serious performance implications. If you can do it all inline, you may consider a computed column on the table, which SQL might be able to optimize better than calling a function for each row.

    Executive Junior Cowboy Developer, Esq.[/url]

  • The machine I used was a dual processor and yes CPU was limited to 1 processor.

    So I had one processor flat out for 4 mns.

    I doubt that using 2 processors instead of one would improve the performance to a couple of seconds...

    I don't have access to the actual procedure now but I'll try to post it tomorrow.

  • The lack of parallelism is just a part of the reasons why udfs slow down the code performance. I'll be glad to help (and I'm sure others will be as well) when we know what is intended with the function.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sean Pearce (10/31/2013)


    It is much faster to apply a table function.

    CREATE TABLE Test1 (ID INT);

    GO

    INSERT INTO Test1

    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.object_id)

    FROM sys.all_columns a

    CROSS JOIN sys.all_columns b

    GO

    CREATE FUNCTION UDF_INLINE (@Input INT)

    RETURNS INT

    AS

    BEGIN

    DECLARE @I INT;

    SET @I = @Input * 0.14;

    RETURN @I;

    END;

    GO

    CREATE FUNCTION UDF_APPLY (@Input INT)

    RETURNS TABLE

    AS

    RETURN (SELECT @Input * 0.14 AS Result);

    GO

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    -- Return the column with no function

    SELECT

    ID

    INTO

    #test1

    FROM

    Test1;

    /*

    Table 'Test1'. Scan count 1, logical reads 3345

    CPU time = 437 ms, elapsed time = 533 ms.

    */

    -- Return the inline function

    SELECT

    dbo.UDF_INLINE(ID) AS RowName

    INTO

    #test2

    FROM

    Test1;

    /*

    Table '#test2'. Scan count 0, logical reads 1001607

    Table 'Test1'. Scan count 1, logical reads 3345

    CPU time = 10389 ms, elapsed time = 13965 ms.

    */

    -- Apply the function

    SELECT

    b.Result

    INTO

    #test3

    FROM

    Test1 t

    CROSS APPLY

    dbo.UDF_APPLY(t.ID) AS b;

    /*

    Table 'Test1'. Scan count 1, logical reads 3345

    CPU time = 577 ms, elapsed time = 578 ms.

    */

    +1. I don't know what others call the type of function you wrote for the "Apply" function but I call them "iSF" or "Inline Scalar Function". Of course, they're really just an Inline Table Valued Function (iTVF) that returns a single element.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eric Mamet (10/31/2013)


    I know but I was hoping it would be fast...

    Just talked to my customer and I'll try using CLR!

    Yipee!!! 😀

    Heh... hate to throw a wet blanket on your fire but you don't need to resort to an SQLCLR function for something so simple (although if you're comfortable with that, then fire away! It can be a great solution when done properly.:-D). Please see the example that Sean Pierce provided in his post above and please see the following article that demonstrates the problem and the fix (the "iSF").

    How to Make Scalar UDFs Run Faster (SQL Spackle)

    [/url]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/31/2013)


    +1. I don't know what others call the type of function you wrote for the "Apply" function but I call them "iSF" or "Inline Scalar Function". Of course, they're really just an Inline Table Valued Function (iTVF) that returns a single element.

    I was really struggling with terminology when I wrote that :blush:

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Indeed this is what I have done and the result is very impressive.

    I don't think I can detect any load due to the SQLCLR function.

    I have even been able to convince the DBA in charge to CLR enable the database 😛

    Perfect!

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply