Scalar functions sudden slow down

  • Alan.B (12/8/2015)


    On a side note, and I don't know if "the sudden slow down" happened when after the scalar udfs were added, but Scalar UDFs kill parallelism. Often the whole query plan (or part of it at least) can't be parallel. This will slow things down. I have seen scalar UDFs take 1 minute queries and turn them into 20 minute queries.

    "The sudden slow down" happened after indexes reorg, the UDFs have been running for about one year before it.

  • Fandy08 (12/8/2015)


    So, definitively, the reason must have been bad plans after reorg of indexes

    No. Not bad plans. Plans which run more data through the UDFs than previously, resulting in the UDFs running more often and hence slower. The root problem here is that UDFs run at least once for each row, depending on the plan chosen, they can run many, many, many more times, and the optimiser can't get an accurate cost estimation for them.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Orlando Colamatteo (12/8/2015)


    agree it is counterintuitive. rebuilding indexes amd stats (even if only a subset of stats) should not hurt things.

    any noticeable changes (increases or decreases) in data volumes on or arounf when performance changed?

    On the tables underlying the procedure performing worse I only did reorg, no data change, on other (huge) tables instead I deleted hundreds millions record and reorg, but the procedures on them, that use the same UDFs are performing more or less the same.

  • GilaMonster (12/10/2015)


    Fandy08 (12/8/2015)


    So, definitively, the reason must have been bad plans after reorg of indexes

    No. Not bad plans. Plans which run more data through the UDFs than previously, resulting in the UDFs running more often and hence slower. The root problem here is that UDFs run at least once for each row, depending on the plan chosen, they can run many, many, many more times, and the optimiser can't get an accurate cost estimation for them.

    Ok, I understood now Gail.

    Did I miss to check something before deciding to rebuild?

    In other words, could I prevent someway this issue to happen? (Other than using a copy of DB because is huge and I don't have space 🙂 )

  • Fandy08 (12/10/2015)


    Orlando Colamatteo (12/8/2015)


    agree it is counterintuitive. rebuilding indexes amd stats (even if only a subset of stats) should not hurt things.

    any noticeable changes (increases or decreases) in data volumes on or arounf when performance changed?

    On the tables underlying the procedure performing worse I only did reorg, no data change, on other (huge) tables instead I deleted hundreds millions record and reorg, but the procedures on them, that use the same UDFs are performing more or less the same.

    from this info I think it is not the UDF itself causng the issue but the query using it (what Gail is driving at). presumably the queries differ between the proc now running slower (over table where delete+reorg was done) and the one that was more or less unaffected (over table with only reorg). next is to isolate which query in the proc is running slow and provide the actual execution for that query as an attachment to this thread so we can examine it.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • UDFs are slow. They run once per row. The optimiser cannot cost them properly. The problem is the UDF usage.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • no doubt refactoring the UDFs out of the equation should be a long term goal towards better performance and more predictable scaling. woild love to get to the bottom on this particular sudden slowdown.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Fandy08 (12/11/2015)


    GilaMonster (12/10/2015)


    Fandy08 (12/8/2015)


    So, definitively, the reason must have been bad plans after reorg of indexes

    No. Not bad plans. Plans which run more data through the UDFs than previously, resulting in the UDFs running more often and hence slower. The root problem here is that UDFs run at least once for each row, depending on the plan chosen, they can run many, many, many more times, and the optimiser can't get an accurate cost estimation for them.

    Ok, I understood now Gail.

    Did I miss to check something before deciding to rebuild?

    In other words, could I prevent someway this issue to happen? (Other than using a copy of DB because is huge and I don't have space 🙂 )

    Stated that UDFs are used (I will try to change them on inline table value whenever possible), for future interventions on data could I prevent someway this issue to happen?

  • Yes. Remove the UDFs. They're a notorious performance problem, especially when they're data-accessing UDFs that run in a clause of a query, because they will run multiple times and, as you've seen, a change to the plan (caused by stats or just the current plan being aged out of memory) can result in a massive slowdown as the number of times the UDF runs changes.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Why would a UDF run more than once per row, unless it's used more than once per row?

  • depends how its used. you have to look at the execution plan tonsee ot but its in the intermediate steps of the plan.

    a linear example: imagine a UDF used in a where clause wrapping a column. all rows in the table need to be run through the UDF so the result can be compared to a scalar-valued input parameter. in this case all rows in the table are run through the UDF to find maybe 10 rows in the target table.

    a more sneaky, non-linear example: an alternate plan can singnificantly increase the number of calls to the UDF while the data volumes are mostly unchanged. an example might be if the optimizer decides to use a nested loop join against two large datasets due to out of date statistics instead of a hash join causing many more predicates uaing a UDF to be evaluated.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Nested loop joins may fetch the same row many times from the outer table, and if there are functions there they might get evaluated before the join. Also you may have a query that returns 5 rows, but the UDF runs against an intermediate resultset which could have lots more rows than the 5 rows eventually returned.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (12/14/2015)


    Nested loop joins may fetch the same row many times from the outer table, and if there are functions there they might get evaluated before the join. Also you may have a query that returns 5 rows, but the UDF runs against an intermediate resultset which could have lots more rows than the 5 rows eventually returned.

    Yes, conditional clauses and such can obviously reduce the final number of rows returned, but I guess I expected the query engine to cache results once evaluated, so it surprises me that it would run a function again for a result already computed. But I guess it doesn't always choose the most efficient plan of action. And available memory may not allow such caching, even if it was a selectable option for the engine. Probably a good thing - if the engine could figure out everything on its own, some of us might be out of work.

Viewing 13 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply