User Defined Functions Investigation

  • Here's the late-night (quiet server) run times for 25K rows (25,000 rows affected in every test; the second pass reverses the order of the comparisons)...

    First pass:
        Formula in SELECT    00:00:00:080
        Function in SELECT   00:00:00:157
        Formula in WHERE     00:00:00:593
        Function in WHERE    00:00:00:580

    Second pass:
        Function in SELECT   00:00:00:123
        Formula in SELECT    00:00:00:140
        Function in WHERE    00:00:00:580
        Formula in WHERE     00:00:00:547

    And here's this morning's run for a quarter-million (250K) rows (250,000 rows affected in every test):

    First pass:
        Formula in SELECT    00:00:01:453
        Function in SELECT   00:00:01:470
        Formula in WHERE     00:00:05:530
        Function in WHERE    00:00:06:170

    Second pass:
        Function in SELECT   00:00:01:390
        Formula in SELECT    00:00:01:410
        Function in WHERE    00:00:05:513
        Formula in WHERE     00:00:06:890

    You'll notice that the Function frequently wins out in speed over the clear code (Formula), as previously noted in this thread by Sergiy and me.  It kinda destroys Carl's hypothesis and refutes Greg's note about UDFs "always being harmful".  It certainly supports what Sergiy has been saying, and I believe the following is the finding of this study:

    The myth that UDFs have any significant impact on performance is just that: a myth, at least within the bounds of this brief study.  Poorly written code is what saps performance, not the mere use of UDFs, scalar or not.

    To summarily dismiss the use of UDFs without having done these initial proofs would have been a horrible mistake, especially since a well-written library of UDFs for even the simplest of tasks and conversions can save developers many hours of, if nothing else, typing and embedding documentation.  Properly named, intuitive-to-use UDFs add clarity to code, reduce "code clutter", and keep developers from having to reinvent the wheel in the heat of battle, also known as a "rush project".  It is a fact, as Greg did mention, that careful planning and the strategic deployment of UDFs to assist developers can certainly lend itself to speedy development.  However, not everything can be thought of beforehand, and the ad hoc development of needed UDFs (submitted for code review and approval) can only make the completion of any given project more expedient.

    Carl, you started this and have been amazingly quiet... are you just taking notes and waiting for the dust to settle or what?

    Sergiy, man this has been fun!  Thanks for going back and forth with me on these tests.  And, yeah, I changed my mind... I'll probably make a UDF that does nothing but strip the time off a date and add it to the "library".  

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    How can you claim that UDFs are NOT harmful?  Let me see.  If the cost of a query is high enough that parallelism would be used, a scalar UDF will prevent that parallelism, which could be bad.  I am not saying that parallelism is always a good thing, but I have seen large ETL processes suffer at the hands of scalar UDFs.  Another point is the FACT that UDFs carry an internal cursor with them.  Defending them amounts to saying that an explicit CURSOR around a large dataset is OK, or that CURSORS are OK.  I would rather say cursors have a time and a place.  Be smart and play it safe; this is what I mean by strategic planning with UDFs.  The way I see it, there is no dispute over scalar UDFs being harmful.  They are.  You may not see any direct impact, but they nonetheless have an influence on performance.  Go ahead and use your scalar UDFs on large datasets, and one day, if the set becomes large enough, you will be re-evaluating those UDFs.
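
    To make the parallelism point concrete, here is a sketch of the kind of side-by-side check I mean; the function and table names below are only illustrative, and you would need a table large enough to warrant a parallel plan in the first place.  Compare the two actual execution plans: the inline expression is free to use a parallel plan, while the scalar UDF version is not.

        -- Scalar UDF version: the optimizer cannot use a parallel plan
        SELECT dbo.DateOnly(OrderDate) AS OrderDate
          FROM dbo.Orders

        -- Same logic written inline: free to go parallel on a big enough table
        SELECT convert(datetime, convert(int, OrderDate - 0.5)) AS OrderDate
          FROM dbo.Orders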

    I know the MS guys can speak to this more clearly.

    I agree, this is fun reading everyone's thoughts and seeing what you came up with.  Most of my day consists of evaluating scalar UDFs; we just went through all of our code doing performance evaluations, and most of the UDFs in the ETL process were rewritten as straight T-SQL instead of scalar UDFs.  I have seen UDFs cause so much harm that removing a single UDF changed the run time of a load script from 4 hours to 50 minutes.

    Thanks

    ~ Greg

  • "Carl, you started this and have been amazingly quiet... are you just taking notes and waiting for the dust to settle or what?"

    Someone has been out sick for the last 2 days, so I got stuck with the "on-call" and production support. Production support mostly means answering foolish questions like:

    "Please reset my database password. The answers are:

    1. You are attempting to connect to the wrong server.

    2. You are using Windows authentication so please call the help desk.

    Second favorite request: "Can I have the SA password?"

    I will have some results later using a benchmark approach to simulate an OLTP environment.  This is done by running 24 simultaneous isql.exe command files against an 8-CPU cluster, with the output going to files.  There are 3 groups of 8 command files that do:

    select with no "Date Only" as a baseline

    select with UDF

    select with In-Line calculation

    Cheers

    SQL = Scarcely Qualifies as a Language

  • UDFs and the impact on performance.

    Abstract:

    Most SQL database management systems allow the user to implement an algorithm by writing their own functions, either in a SQL dialect or in a host language.  These functions are referred to as User Defined Functions (UDFs).  A series of tests simulating a high-OLTP workload was performed using Microsoft SQL Server 2000 on a high-end, server-class machine, measuring the performance of a SQL statement without the algorithm, with the algorithm written in-line within the statement, and with the algorithm encapsulated in a UDF.  The throughput and resource utilization of the alternatives are then analyzed.

    Background:

    Most comparisons previously performed concentrate on elapsed time when the needed algorithm is written in-line using vendor-provided functions versus encapsulating those same functions within a UDF.  Additionally, the tests were performed sequentially, which allows environmental factors, such as an undetected workload on the DBMS, to make the results questionable.  This comparison tracks elapsed time, logical reads and CPU utilization with the various alternative SQL statements running in parallel over a fixed time span.  The SQL Server tracing facility is used to capture these statistics for each statement.  To ensure that only the steady-state resource utilization is analyzed, the first 15 seconds of start-up and the last 15 seconds of shut-down of the workload run are not included in the analysis.

    Environment:

    The DBMS used is Microsoft SQL Server 2000 Enterprise Edition with Service Pack 4, running under Windows 2003 Server Enterprise Edition.  The server hardware is an 8-CPU Xeon 3.0 GHz with 4 GB of memory.  No other processes were using the server or the DBMS, and pre- and post-test checks showed CPU utilization of less than one percent.  Disk drives were removed as a factor by forcing the tables accessed to be resident in memory.  To allow others to run the same benchmark, the SQL Server demo database named Northwind was used, which contains the table "Orders" having 830 rows and three datetime columns.

    Test Case:

    The test case used an algorithm to remove the time portion of a column that holds both the date and time. For example, given a column with the value of '2005-12-07 15:46:34.373', the algorithm should return '2005-12-07 00:00:00.000'.

    The SQL Server algorithm is convert(datetime, convert(int, @DateTime - 0.5 ) )

    The User Defined Function SQL source is:

        create function dbo.DateOnly
            ( @DateTime datetime )
        RETURNS datetime
        AS
        BEGIN
            RETURN convert(datetime, convert(int, @DateTime - 0.5 ) )
        END
        go
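
    As a quick sanity check of the expression against the sample value from the test case above (an illustrative one-off, not part of the benchmark scripts):

        -- Both should return 2005-12-07 00:00:00.000
        select convert(datetime, convert(int, convert(datetime, '2005-12-07 15:46:34.373') - 0.5))
        select dbo.DateOnly( convert(datetime, '2005-12-07 15:46:34.373') )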

    Test Run SQL statements:

    Three sets of test run scripts were created.

    1. Select from the table without using any algorithm.

        select 'TC-NOT'
              ,Orders.OrderId
              ,Orders.OrderDate
              ,Orders.RequiredDate
              ,Orders.ShippedDate
          from dbo.Orders

    2. Select from the table with the algorithm in-line within the SQL statement executed.

        select 'TC-SQL'
              ,Orders.OrderId
              ,convert(datetime, convert(int, Orders.OrderDate - 0.5))    AS OrderDate
              ,convert(datetime, convert(int, Orders.RequiredDate - 0.5)) AS RequiredDate
              ,convert(datetime, convert(int, Orders.ShippedDate - 0.5))  AS ShippedDate
          from dbo.Orders

    3. Select from the table with the algorithm within a UDF.

        select 'TC-UDF'
              ,Orders.OrderId
              ,dbo.DateOnly( Orders.OrderDate )    AS OrderDate
              ,dbo.DateOnly( Orders.RequiredDate ) AS RequiredDate
              ,dbo.DateOnly( Orders.ShippedDate )  AS ShippedDate
          from dbo.Orders

    Workload scripts:

    Three sets of workload scripts were written so that each SQL statement would begin at the same time and run for a fixed duration of two minutes.  The UDF workload script has the source below, and the other two scripts are similar.  The "WAITFOR TIME" command causes a pause until the specified time is reached, and then the remaining SQL statements are executed.

        WAITFOR TIME '13:10:00'
        declare @EndTS datetime
        set @EndTS = dateadd(mi, 2, current_timestamp)
        WHILE current_timestamp < @EndTS
        begin
            select 'TC-UDF'
                  ,Orders.OrderId
                  ,dbo.DateOnly( Orders.OrderDate )    AS OrderDate
                  ,dbo.DateOnly( Orders.RequiredDate ) AS RequiredDate
                  ,dbo.DateOnly( Orders.ShippedDate )  AS ShippedDate
              from dbo.Orders
        end
        go

    Simultaneous execution of workload scripts:

    For each workload script, four Windows command files were created to invoke the script using the SQL Server isql utility, with the output written to a file.  See the Microsoft SQL Server Books Online for the meaning of the parameters.  Each command file specifies a different output file name.

    isql.exe -E -SUSAG551IOMT1-4 -dNorthwind -h-1 -w1000 -i"R:\UDF\Source\Extend_Orders_DateOnly_UDF.sql" -o"R:\UDF\OutputFiles\Extend_Orders_DateOnly_01.txt"

    Resource Utilization Statistics:

        Method           Base     SQL      UDF      SQL - Base   UDF - Base
        Executions       2401     2280     1880
        Total Duration   718685   712109   715065
        Total Reads      62426    59280    52640
        Total CPU        3235     4342     58779
        Avg Duration MS  299.327  312.329  380.354  13.00        81.03
        Average Reads    26.000   26.000   28.000   0.00         2.00
        Average CPU      1.347    1.904    31.265   0.56         29.92

    Analysis:

    The use of a UDF instead of in-line SQL resulted in a 21% increase in duration and a 15-fold increase in CPU utilization.

    Recommendation:

    SQL Server User Defined Functions should be avoided due to the huge performance impact, especially the effect on CPU utilization.

    SQL = Scarcely Qualifies as a Language

    Thanks Carl, good work.  Anyone who has been performance tuning with SQL Server and has worked with scalar UDFs would recognize the performance hit.  It is amazing how harmful they can be.  Careful planning is needed when using scalar UDFs.

    Carl, do you think you can run a test comparing a SQL Server view and an inline table-valued UDF?  I have seen similar execution plans between the two and cannot see a negative effect on performance, unless you have 20 joins, lots of column output, and nested sub-queries.
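
    For anyone who wants to set that comparison up, a minimal sketch with made-up object names might look like the following; in most cases the optimizer expands both the view and the inline table-valued UDF into the outer query, which is why the plans tend to come out looking similar.

        -- A plain view over the test table
        CREATE VIEW dbo.OrderDatesView
        AS
        SELECT OrderId, OrderDate, RequiredDate, ShippedDate
          FROM dbo.Orders
        GO

        -- The equivalent inline table-valued UDF
        CREATE FUNCTION dbo.OrderDatesTVF ()
        RETURNS TABLE
        AS
        RETURN
        (
            SELECT OrderId, OrderDate, RequiredDate, ShippedDate
              FROM dbo.Orders
        )
        GO

        -- Compare the execution plans of these two:
        SELECT * FROM dbo.OrderDatesView
        SELECT * FROM dbo.OrderDatesTVF()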

    Thanks

    Greg

    I have at least 5 other sets of comparisons to run to be included in the final article, and the results make the DATEONLY UDF look good.  A lot of the time spent so far was on infrastructure, such as setting up traces and confirming that the results reflect the actual resource utilization.

    1. Deterministic UDFs vs In-Line SQL in the SELECT section of SQL statement. These will include math, date and string algorithms.

    2. Non-Deterministic UDFs vs In-Line SQL in the SELECT section of SQL statement. These include single row, the SUM aggregation, the MAX aggregation and the LIST.

    3. Deterministic UDFs vs In-Line SQL in the WHERE section of SQL statement

    4. Non-Deterministic UDFs vs In-Line SQL in the WHERE section of SQL statement

    5. Deterministic UDFs vs In-Line SQL as a CHECK Constraint

    6. Non-Deterministic UDFs as CHECK Constraint vs In-Line SQL in a Trigger

    Cheers.

    For references, I have found:

    Ken Henderson's "The Guru's Guide to SQL Server Stored Procedures, XML and HTML", which offers some interesting test cases.  Since all of the source is on a CD, this is very convenient.

    Joseph Gama's "TSQL functions" at http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=502&lngWId=5 These UDFs appear to be all deterministic.

    Andrew Novick's UDF library at http://www.novicksoftware.com/

    SQL = Scarcely Qualifies as a Language

  • Do you actually believe that???

    Average CPU      1.347    1.904    31.265   0.56   29.92

    Analysis:

    The use of a UDF instead of in-line SQL resulted in a 21% increase in duration and a 15-fold increase in CPU utilization.

    You are making a common beginner's mistake:

    you do not take into consideration the simple fact that the process of collecting statistics takes time and resources as well.

    SQL Server does not record statistics for built-in functions, simply because they are not subject to investigation.

    But every execution of a UDF adds an extra record to the statistics log.

    It's not the UDF that is using your CPU, it's your statistics!

    Your conclusions are wrong because your experiment is badly designed.

    In a real system you do not record statistics on every executed statement, so a UDF will not have any noticeable impact on performance.

     

    _____________
    Code for TallyGenerator

  • Regarding your comments on the trace invalidating the statistics.

    "But every execution of UDF adds extra record into statistics log."

    It depends on how the trace is configured.

    The trace used to capture the statistics only includes Event ID 41 - SQL:StmtCompleted.

    If the trace had included Event ID 45 - SP:StmtCompleted, then you would be correct that each of the statements within the UDF would additionally be written to the trace file.  But this is not occurring.

    Additionally:

    1. Separate tests were run without the trace active that captured the number of SQL statements run within the two-minute window for each of the 12 connections.  These counts match the counts found by the trace.

    2. The resources used by the trace are incurred at the server level, not at the statement level.  Since all of the SQL statements being benchmarked run at the same time, they are all equally impacted by the resources the trace uses.

    I stand by the statistics reported and the conclusions.

    SQL = Scarcely Qualifies as a Language

  • An additional comment on tracing.

    If one uses SQL Profiler to capture the trace information, I have found that the tool does have an effect on the overall performance of the server.

    That is why the trace is set up to run in the background and write to a file, not to a GUI.
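
    For anyone who wants to reproduce that kind of background trace, here is a minimal sketch; the output path and maximum file size are illustrative, and the event and column IDs are the standard SQL:StmtCompleted ones documented in Books Online.

        -- Server-side trace: write SQL:StmtCompleted (event 41) to a file, no GUI attached.
        DECLARE @TraceId     int
               ,@MaxFileSize bigint
               ,@On          bit
        SET @MaxFileSize = 50      -- MB; the .trc extension is appended automatically
        SET @On = 1

        EXEC sp_trace_create @TraceId OUTPUT, 0, N'R:\UDF\Traces\StmtCompleted', @MaxFileSize, NULL

        -- Columns used in the analysis: TextData, SPID, Duration, Reads, CPU
        EXEC sp_trace_setevent @TraceId, 41, 1,  @On
        EXEC sp_trace_setevent @TraceId, 41, 12, @On
        EXEC sp_trace_setevent @TraceId, 41, 13, @On
        EXEC sp_trace_setevent @TraceId, 41, 16, @On
        EXEC sp_trace_setevent @TraceId, 41, 18, @On

        -- Start the trace; use status 0 to stop it and status 2 to close it after the run.
        EXEC sp_trace_setstatus @TraceId, 1

    The finished .trc file can then be loaded into a table with ::fn_trace_gettable for analysis.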

    SQL = Scarcely Qualifies as a Language

  • Switch off all SQL statistics and run Performance Monitor from Windows "Administrative Tools"

    And then try to catch your 15 times difference in CPU load.

    _____________
    Code for TallyGenerator

  • Greg Wrote: Go ahead and use your Scalar UDFs on large datasets, and one day if the set becomes large enough, you will be re-evaluating those UDFs.

    Allow me to quote from one of my earlier posts:  "AND, each test was on 2 MILLION records using the same table."  I really appreciate your concerns, but my question would be, how big is large enough?  An observation might also be that if you're processing more than that using OLTP in a single batch, perhaps a change in processing schedule is in order.  That notwithstanding, I'd be happy to run a test with whatever size record set you can provide once you've shown the results from the same tests.  I would also ask that you share the test code so we can run it and make our own evaluation of your tests.

    Carl wrote: Additionally, the tests were performed sequentially which allows environmental factors, such as an un-detected work load on the DBMS, to make the results questionable.

    Not in my case.  As I stated previously... here are the results from my test runs on a stand-alone, single-user, non-networked server: my little ol' 1.8GHz, 2GB RAM, IDE-disk desktop running SQL Server 2000 Developer's Edition with SP3a on an XP SP2 box... I ran it a couple of times to be sure... very similar results on all runs... Only the later runs were done on a multi-user server.  In all cases, the comparisons were reversed within each run to detect that possibility.  That notwithstanding, I'll be happy to run parallel tests on a server and a stand-alone box.

    Shifting gears a bit, folks, I don't give a rat's patooti about whether or not SQL decides to use an internal cursor.  In many cases, those internal cursors beat the clear code for performance.  Am I suggesting that someone write cursors, or that I condone the use of developer-made cursors?  Not on your life...

    I can't speak to the parallelism problems that Greg is warning us about because I've not done any testing for the alleged fault, yet.  You can bet I will.

    Sergiy, once again, I agree with you... to measure something is to change it.  That's why I wrapped each instance of code to be measured in simple elapsed-time code instead of allowing embedded features to do the measurements in an unpredictable fashion... I have noticed that SET STATISTICS TIME and traces can and frequently do change the elapsed time of otherwise good code.  I will, however, be doing some testing with tracing and stats on, just to see how much they affect identical code.
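
    For clarity, that elapsed-time wrapper is nothing fancier than this kind of sketch (variable names are illustrative):

        -- Minimal elapsed-time wrapper: no traces, no SET STATISTICS, just GETDATE()
        DECLARE @StartTime DATETIME
            SET @StartTime = GETDATE()

        -- ... code being measured goes here ...

         SELECT DATEDIFF(ms, @StartTime, GETDATE()) AS Elapsed_ms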

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    I have re-done the same performance comparison, but captured resource utilization from the sysprocesses table instead of using a trace.

    On the average CPU milliseconds, sysprocesses versus trace:

    For UDFs, 29.642 vs 31.265 for a 5% decrease in CPU without the trace.

    For SQLs, 1.720 vs 1.904 for a 10% decrease in CPU without the trace.

    For None, 1.518 vs 1.347 for a 12% INCREASE in CPU without the trace.

    The conclusions remain the same for the two-minute window:

    1. Execution count was 3,160 for the inline SQL versus 2,417 for the UDF.  This equates to a 24% drop in throughput.

    2. Average CPU usage in milliseconds was 1.72 for the inline SQL versus 29.642 for the UDF.  This equates to roughly 17 times more CPU.

    DENY CREATE FUNCTION TO PUBLIC !!!

    SQL Scripts:

        CREATE TABLE dbo.ProcessResources
            ( Condition   varchar(255) NOT NULL
            , Spid        smallint     NOT NULL
            , SQLCnt      int          NOT NULL
            , CpuMs       int          NOT NULL
            , PhysicalIo  int          NOT NULL
            )
        go

        -- Revised SQL batch to capture statistics from the sysprocesses table.
        WAITFOR TIME '9:49:00'
        declare @EndTS   datetime
               ,@SQLCnt  integer
        set @SQLCnt = 0
        set @EndTS  = dateadd(mi, 2, current_timestamp)
        while current_timestamp < @EndTS
        begin
            select 'TC-UDF'
                  ,Orders.OrderId
                  ,dbo.DateOnly( Orders.OrderDate )    AS OrderDate
                  ,dbo.DateOnly( Orders.RequiredDate ) AS RequiredDate
                  ,dbo.DateOnly( Orders.ShippedDate )  AS ShippedDate
              from dbo.Orders

            SET @SQLCnt = @SQLCnt + 1
        end

        Insert into udf_traces.dbo.ProcessResources
            (Condition, spid, SQLCnt, CpuMs, PhysicalIo)
        select 'TC-UDF'
              ,spid
              ,@SQLCnt
              ,CPU
              ,Physical_IO
          from master.dbo.sysprocesses
         where spid = @@spid
        go

        -- Summarize by test case
        select Condition, SUM(SQLCnt), SUM(CpuMs), SUM(PhysicalIo)
          FROM UDF_Traces.dbo.ProcessResources
         Group by Condition
        go

        -- Show the resources used by each process
        SELECT Condition, Spid, SQLCnt, CpuMs, PhysicalIo
          FROM UDF_Traces.dbo.ProcessResources
         order by Condition, spid

    The details for each process (SQLCnt = statements executed in the two-minute window, CPU in milliseconds, PIO = physical I/O):

        Case    SPID  SQLCnt  CPU    PIO
        TC-NOT  64    328     250    0
        TC-NOT  65    236     453    0
        TC-NOT  66    383     656    0
        TC-NOT  67    468     734    0
        TC-NOT  68    549     921    0
        TC-NOT  69    552     907    0
        TC-NOT  70    250     328    0
        TC-NOT  71    394     547    0
        TC-SQL  72    319     469    0
        TC-SQL  73    248     469    0
        TC-SQL  74    392     688    0
        TC-SQL  75    140     265    0
        TC-SQL  76    309     484    0
        TC-SQL  77    463     781    0
        TC-SQL  78    307     563    0
        TC-SQL  79    239     438    0
        TC-UDF  80    458     13547  0
        TC-UDF  81    517     15250  0
        TC-UDF  82    253     7312   0
        TC-UDF  83    510     14875  0
        TC-UDF  84    272     8219   0
        TC-UDF  85    474     13813  0
        TC-UDF  86    222     6844   0
        TC-UDF  87    383     11703  0
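
    For reference, the per-statement averages quoted above can be reproduced from ProcessResources with a query along these lines (a sketch, not one of the scripts above):

        -- Average CPU ms per statement and total statements, by test case.
        -- The * 1.0 avoids integer division.
        SELECT Condition
              ,SUM(SQLCnt)                    AS TotalStatements
              ,SUM(CpuMs) * 1.0 / SUM(SQLCnt) AS AvgCpuMsPerStatement
          FROM UDF_Traces.dbo.ProcessResources
         GROUP BY Condition
         ORDER BY Condition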

    SQL = Scarcely Qualifies as a Language

  • BANG!   I'm dead.  You guys finally proved your point with some cold, hard facts.  I didn't like Carl's test the way it was because he left out two very important statistics.... number of rows and total elapsed time for each run.  And, what I'm really interested in is elapsed time because I know it sometimes takes resources to get performance.   So, using Carl's example, but rewritten to give me what I'm looking for, I ran two separate tests using the same UDF we've been using all along... here's the code...

    Test #1 using a UDF:

        drop table MyTimes

        CREATE TABLE dbo.MyTimes
               (
               RunNum    INT  IDENTITY(1,1) PRIMARY KEY CLUSTERED,
               TimeStart DATETIME,
               TimeEnd   DATETIME,
               TheCount  INT
               )

        DECLARE @Limit      SMALLINT
            SET @Limit    = 5
        DECLARE @Counter    SMALLINT
            SET @Counter  = 1
        DECLARE @TimeStart  DATETIME
        DECLARE @TimeEnd    DATETIME
        DECLARE @MyRowCount INT

         WHILE @Counter <= @Limit
         BEGIN
                SET @TimeStart = GETDATE()

             SELECT 'TC-UDF',
                    OrderId,
                    dbo.DateOnly(OrderDate)    AS OrderDate,
                    dbo.DateOnly(RequiredDate) AS RequiredDate,
                    dbo.DateOnly(ShippedDate)  AS ShippedDate
               FROM dbo.Orders

             SELECT @TimeEnd = GETDATE(), @MyRowCount = @@ROWCOUNT

             INSERT INTO MyTimes
                    (TimeStart, TimeEnd, TheCount)
             SELECT @TimeStart, @TimeEnd, @MyRowCount

                SET @Counter = @Counter + 1
           END

        SELECT *, DATEDIFF(ms, TimeStart, TimeEnd) AS Elapsed_ms FROM MyTimes

    -------------------------------------------------------------------------

     
    Test #2 using "clear code":
     
     CREATE TABLE dbo.MyTimes

            (

            RunNum    INT  IDENTITY(1,1) PRIMARY KEY CLUSTERED,

            TimeStart DATETIME,

            TimeEnd   DATETIME,

            TheCount  INT

            )

     
    DECLARE @Limit      SMALLINT

        SET @Limit    = 5

    DECLARE @Counter    SMALLINT

        SET @Counter  = 1

    DECLARE @TimeStart  DATETIME

    DECLARE @TimeEnd    DATETIME

    DECLARE @MyRowCount INT

     
      WHILE @Counter <= @Limit

      BEGIN

                SET @TimeStart = GETDATE()

             SELECT 'TC-UDF',

                    OrderId,

                    convert(datetime, convert(int, OrderDate - 0.5))  AS OrderDate,

                    convert(datetime, convert(int, RequiredDate - 0.5)) AS RequiredDate,

                    convert(datetime, convert(int, ShippedDate - 0.5))  AS ShippedDate

               FROM dbo.Orders

     
             SELECT @TimeEnd = GETDATE(),@MyRowCount = @@ROWCOUNT

             INSERT INTO MyTimes

                    (TimeStart,TimeEnd,TheCount)

             SELECT @TimeStart,@TimeEnd,@MyRowCount

        SET @Counter = @Counter + 1

        END

     
    SELECT *,DATEDIFF(ms,TimeStart,TimeEnd) AS Elapsed_ms FROM MyTimes

     

    I also didn't like the small number of rows (only 830) offered by the Orders table in the Northwind database... I mean this IS supposed to be a performance test, isn't it?  So, I made my own and changed the TOP number to produce various sizes for the different tests... Here's the code:

        --===================================================================
        --      Create the test table (Careful!!! Has a drop!!!!)
        --===================================================================
        --===== If the test table exists, drop it (CAREFUL!!!!)
             IF OBJECT_ID('dbo.Orders') IS NOT NULL
                DROP TABLE dbo.Orders

        --===== Create and populate the test table on the fly
         SELECT TOP 2500000 --Change this number to change the final size
                IDENTITY(INT,1,1) AS OrderId,
                GETDATE() AS OrderDate,
                GETDATE() AS RequiredDate,
                GETDATE() AS ShippedDate,
                'THEY WILL BOTH PERFORM THE SAME' AS SomeString,
                'MYTH BUSTED!!!' AS SomeOtherString,
                NEWID() AS GUID
           INTO dbo.Orders
           FROM dbo.SYSCOLUMNS sc1,
                dbo.SYSCOLUMNS sc2,
                dbo.SYSCOLUMNS sc3

        --===== Add a primary key to the test table
          ALTER TABLE dbo.Orders
                ADD PRIMARY KEY CLUSTERED (OrderId)
        GO

    Here's the results for Test #1 using 2.5 million records (uses the UDF):

        RunNum  TimeStart                TimeEnd                  TheCount  Elapsed_ms
        1       2005-12-08 22:32:01.580  2005-12-08 22:33:57.393  2500000   115813
        2       2005-12-08 22:33:57.393  2005-12-08 22:35:52.580  2500000   115186
        3       2005-12-08 22:35:52.580  2005-12-08 22:37:47.300  2500000   114720
        4       2005-12-08 22:37:47.300  2005-12-08 22:39:41.613  2500000   114313
        5       2005-12-08 22:39:41.613  2005-12-08 22:41:35.893  2500000   114280

    Here's the results for Test #2 using 2.5 million records (uses clear code):

        RunNum  TimeStart                TimeEnd                  TheCount  Elapsed_ms
        1       2005-12-08 22:46:11.270  2005-12-08 22:47:06.300  2500000   55030
        2       2005-12-08 22:47:06.300  2005-12-08 22:48:00.643  2500000   54343
        3       2005-12-08 22:48:00.677  2005-12-08 22:48:54.630  2500000   53953
        4       2005-12-08 22:48:54.643  2005-12-08 22:49:48.207  2500000   53563
        5       2005-12-08 22:49:48.207  2005-12-08 22:50:41.817  2500000   53610

    Clearly, the clear code beat the pants off the UDF version.  But we saw this in the last test code... on the larger tests the UDF lost, but on the smaller tests the UDF won.  So, I reran the code for only 2,500 records, and here are the results...

    Here's the results for Test #1 using 2,500 records (uses the UDF):

        RunNum  TimeStart                TimeEnd                  TheCount  Elapsed_ms
        1       2005-12-08 22:58:44.940  2005-12-08 22:58:45.050  2500      110
        2       2005-12-08 22:58:45.050  2005-12-08 22:58:45.177  2500      126
        3       2005-12-08 22:58:45.177  2005-12-08 22:58:45.283  2500      106
        4       2005-12-08 22:58:45.283  2005-12-08 22:58:45.410  2500      126
        5       2005-12-08 22:58:45.410  2005-12-08 22:58:45.533  2500      123

    Here's the results for Test #2 using 2,500 records (uses clear code):

        RunNum  TimeStart                TimeEnd                  TheCount  Elapsed_ms
        1       2005-12-08 23:00:48.707  2005-12-08 23:00:48.753  2500      46
        2       2005-12-08 23:00:48.753  2005-12-08 23:00:48.817  2500      63
        3       2005-12-08 23:00:48.817  2005-12-08 23:00:48.880  2500      63
        4       2005-12-08 23:00:48.880  2005-12-08 23:00:48.927  2500      46
        5       2005-12-08 23:00:48.927  2005-12-08 23:00:48.987  2500      60

    Again, and very clearly, the clear code beat the pants off the UDF.  It wasn't by the 17 times that Carl claimed (I never trust CPU times, just elapsed times), but it is very consistent... the clear code is about twice as fast as the UDF.

    This test is also a bit more real-world than my previous test that showed the UDF sometimes winning... so crows beware... I've gotta eat some here.

    For the books, this was done on a stand-alone, single 1.8 GHz CPU copy of MS SQL Server 2000 SP3a Developer's Edition with 2GB of RAM and twin 80GB IDE hard drives.  In other words, there's no chance that anyone else ran something while I was doing this, and I didn't even move the mouse during the tests.  There were no scheduled tasks anywhere near the runs, either.  All the tests were run several times with nearly identical results.

    Sergiy, I don't think I'll be making that UDF after all.  It's not quite clear to me why the other tests sometimes showed the UDF winning, but it clearly did not win in these tests.  Is there anything you can see that I've done wrong that may give undue favor to the clear code in the test code above?  And, yes, I did run each test without the loop several times to ensure that the loop code wasn't what was doing it.

    Now, where's that Crow?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
