• Jeff Moden (8/5/2012)


    mmilodragovich (8/1/2012)


    Thanks for the detailed and METHODICAL analysis. I once was troubleshooting a performance problem in a large, complicated application and it came down to a "set quoted identifier on" statement in a scalar function. The statement was not needed and was probably generated by a developer who used right-click-on-object to generate the "create" statement. The function was being called a massive number of times, so even the slightest performance difference was hugely magnified. I've been wary ever since, but I know that knowledge is power, so thanks for the article.

    You're welcome and thanks for the feedback. I've had similar not-easy-to-explain things happen in this wonderful world of T-SQL.

    Actually, it's not so hard to explain.

    We all forget from time to time that SQL is an interpreter, not a compiler.

    Therefore any piece of code is at least re-evaluated (worst-case scenario - recompiled) every time it's invoked.

    A scalar function is applied to the final recordset.

    Once for each returned row.

    If you try to trace a query with a scalar function in it you'll see this:

    SP:Starting        SELECT @Result = dbo.TimesTwo(SomeNumber) FROM #TestTable ;

    SP:StmtStarting    RETURN @SomeINT * 2

    SP:StmtCompleted   RETURN @SomeINT * 2

    SP:Completed       SELECT @Result = dbo.TimesTwo(SomeNumber) FROM #TestTable ;

    repeated as many times as there are rows returned by the query.
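
    For reference, the scalar function seen in the trace above would look something like this - a minimal sketch reconstructed from the trace output, so the exact original may differ:

    ```sql
    -- Hypothetical definition of the scalar function from the trace.
    -- Every row the outer SELECT touches fires a separate
    -- SP:StmtStarting / SP:StmtCompleted pair for this RETURN statement.
    CREATE FUNCTION dbo.TimesTwo (@SomeINT INT)
    RETURNS INT
    AS
    BEGIN
        RETURN @SomeINT * 2 ;
    END
    ```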

    At the same time, a table function (a parametrised view) in a JOIN is invoked ONCE - so it's re-evaluated (recompiled) once.

    That's where the performance difference comes from.
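
    As a sketch of that alternative (the names here are made up for illustration), the same logic written as an inline table function gets expanded into the outer query's plan once, instead of being invoked row by row:

    ```sql
    -- Hypothetical inline table-valued function ("parametrised view").
    -- Its body is folded into the calling query's plan once.
    CREATE FUNCTION dbo.TimesTwoTVF (@SomeINT INT)
    RETURNS TABLE
    AS
    RETURN ( SELECT @SomeINT * 2 AS Result ) ;
    GO

    -- Used with CROSS APPLY instead of a per-row scalar call:
    SELECT t.SomeNumber, f.Result
    FROM #TestTable t
    CROSS APPLY dbo.TimesTwoTVF(t.SomeNumber) f ;
    ```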

    SQL Server is just busy creating and re-creating the scalar function in memory, that's all.

    In real life it should not make much difference, unless your system (querying) design is really stupid.

    There is no way a user wants to see more than a hundred rows on a UI.

    Therefore you should never have a million-row scenario in a real-life application.

    And if you apply a WHERE clause to all those queries, leaving under 1,000 rows in the final recordsets, you'll see that the difference is really minor - less than 5%.

    Not worth any attention, really.

    Same case as explicitly dropping #tables in SP code - if you do a test run of only the create/drop operations in an SP and execute it 10,000 times, the overhead will be really noticeable.

    But in real-life procedures the impact will be totally unnoticeable. Well, there will be a negative impact on performance when you explicitly drop a #table at the end of an SP, but it's totally forgivable for the sake of one's style (not my style! :-))
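
    For example, a minimal sketch of the pattern in question (the procedure and table names are invented for illustration):

    ```sql
    CREATE PROCEDURE dbo.SomeProc
    AS
    BEGIN
        CREATE TABLE #Work (ID INT) ;

        -- ... the actual work of the procedure goes here ...

        -- Explicit drop: a small extra cost at the end of every call.
        -- Without it, the table is cleaned up automatically when it
        -- goes out of scope at the end of the procedure.
        DROP TABLE #Work ;
    END
    ```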

    In real life you should not have million-row SELECT statements, so the overhead of a scalar function can be ignored, if it's more comfortable to code.

    Unless you're doing some kind of Data Transformation task, but even then - it's a one-off, and even a big overhead can be tolerated.

    Views used for reporting - yes, that's where you should be concerned.

    And not using UDFs in the SELECT part of such queries must be a mandatory rule.

    _____________
    Code for TallyGenerator