December 7, 2005 at 7:30 am
Here's the late night (quiet server) run times for 25K rows...
------------------------------
Function in SELECT
------------------------------
Formula in WHERE
------------------------------
Function in WHERE
------------------------------
==============================
Function in SELECT
------------------------------
Formula in SELECT
------------------------------
Function in WHERE
------------------------------
Formula in WHERE
------------------------------
------------------------------
Function in SELECT
------------------------------
Formula in WHERE
------------------------------
Function in WHERE
------------------------------
==============================
Function in SELECT
------------------------------
Formula in SELECT
------------------------------
Function in WHERE
------------------------------
Formula in WHERE
------------------------------
The myth that UDF's have any significant impact on performance is just that, a myth and that fact has been proven within the bounds of this brief study. Poorly written code is what saps performance, not the mere use of UDF's, Scalar or not.
To summarily dismiss the use of UDF's without having done these initial proofs would have been a horrible mistake especially since a well written library of UDF's to do even the simplest of tasks/conversions can save Developers many hours of, if nothing else, typing and embedding documentation. Properly named and intuitive-to-use UDF's add clarity to code, reduce "code clutter", and keep Developer's from having to reinvent the wheel when in the heat of battle also know as a "rush project". It is a fact, as Greg did mention somewhat, that careful planning and the strategic deployment of UDF's to assist Developers in their efforts can certainly lend itself to the speedy development efforts. However, not all things can be thought of before-hand and the ad-hoc development of needed UDF's (submitted for code review/approval) can only cause the more expediant completion of any given project.
Carl, you started this and have been amazingly quiet... are you just taking notes and waiting for the dust to settle or what?
Sergiy, man this has been fun! Thanks for going back and forth with me on these tests. And, yeah, I changed my mind... I'll probably make a UDF that does nothing but strip the time off a date and add it to the "library".
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2005 at 9:52 am
Jeff,
How can you disprove that UDFs are NOT harmful? Let me see. If the cost of a query exceeds to high and parallelism is used, a Scalar UDF will prevent parallelism, could be bad. I am not saying that parallelism is always good thing, but I have seen large ETL processes suffer at the hands of Scalar UDFs. Also, another point is the FACT that UDFs have an internal cursor to them. That means that you are saying that an explicit CURSOR around large datasets is OK. Or, CURSORS are ok. I would rather say cursors have a time and place. Be smart and play it safe. This is what I mean by Strategic planning with UDFs. The way that I see it, there is no dispute over Scalar UDFs being harmful. They are. Now, you may not see any direct impact, but nonetheless they have influence over performance. Go ahead and use your Scalar UDFs on large datasets, and one day if the set becomes large enough, you will be re-evaluating those UDFs.
I know the MS guys can speak to this more clearly.
I agree, this is fun reading everyones thoughts and seeing what you came up with. Most of my day consists of evaluating Scalar UDFs, and we just went through all of our code performing Performance Evaluations and most on the ETL process were rewritten to use straight T-SQL vs. Scalar UDFs. I have seen UDFs cause so much harm that by removing one UDF, changed the performance of load script from 4 hours to 50 minutes.
Thanks
~ Greg
December 7, 2005 at 11:45 am
"Carl, you started this and have been amazingly quiet... are you just taking notes and waiting for the dust to settle or what?"
Someone has been out sick for the last 2 days, so I got stuck with the "on-call" and production support. Production support mostly means answering foolish questions like:
"Please reset my database password. The answers are:
1. You are attempting to connect to the wrong server.
2. You are using Windows authentication so please call the help desk.
Second favorite request: "Can I have the SA password?"
I will have some results later using a benchmark approach to simulate an OLTP environment. This is done by running 24 simultaneous isql.exe command files versus an 8 CPU Cluster with the output going to files. There are 3 groups of 8 command files that do:
select with no "Date Only" as a baseline
select with UDF
select with In-Line calculation
Cheers
SQL = Scarcely Qualifies as a Language
December 7, 2005 at 3:08 pm
UDFs and the impact on performance.
Abstract:
Most SQL Database Management Systems include the capability of allowing the user to implement an algorithm by writing their own functions either in a SQL dialect or using a host language. These functions are referred to as User Defined Functions. A series of test using Microsoft SQL Server 2000 were performed that simulates a high OLTP work-load on a high-end server class machine and the performance of these SQL statement without the algorithm, with the algorithm written in-line within the statement and as a UDF. The thruput and resource utilization of the alternatives are then analyzed.
Backgound:
Most comparisons that have been previously performed concentrate on comparing elapsed time when the needed algorithm is written in-line using vendor provided function versus encapsulating the vendor provided functions within a UDF. Additionally, the tests were performed sequentially which allows environmental factors, such as an un-detected work load on the DBMS, to make the results questionable. This comparison will track elapsed time, logical reads and CPU utilization with the various alternative SQL statements running in parallel over a fixed time span. The SQL Server Tracing facility is used to capture these statistics for each statement. To insure that only the only the steady state resource utilization is analyzed, the first 15 seconds of start-up and the last 15 seconds of shut-down of the workload run are not included in the analysis.
Environment:
The DBMS used is Microsoft SQL Server 2000 Enterprise Edition with Service Pack 4 running under Windows 2003 Server Enterprise Edition. The server hardware is an 8 CPU Xeon 3.0 GHz with 4 GB of memory. No other process where using the server or the DBMS and pre and post tests showed CPU utilization of less than one percent. Disk drives were removed as a factor in the tests by forcing the tables accessed to be resident in memory. To allow others to run the same benchmark, the SQL Server demo database named Northwind was used which contains the table "Orders" having 830 rows and three datetime columns.
Test Case:
The test case used an algorithm to remove the time portion of a column that holds both the date and time. For example, given a column with the value of '2005-12-07 15:46:34.373', the algorithm should return '2005-12-07 00:00:00.000'.
The SQL Server algorithm is convert(datetime, convert(int, @DateTime - 0.5 ) )
The User Defined Function SQL source is:
create function dbo.DateOnly
( @DateTimedatetime
)
RETURNS datetime
AS
BEGIN
RETURN convert(datetime, convert(int, @DateTime - 0.5 ) )
END
go
Test Run SQL statements:
Three sets of test run scripts were created.
1.Select from the tables without using any algorithm.
select 'TC-NOT'
,Orders.OrderId
,Orders.OrderDate
,Orders.RequiredDate
,Orders.ShippedDate
from dbo.Orders
2.Select from the table with the algorithm in-line within the SQL Statement executed.
select 'TC-SQL'
,Orders.OrderId
,convert(datetime, convert(int,Orders.OrderDate - 0.5)) AS OrderDate
,convert(datetime, convert(int,Orders.RequiredDate - 0.5))AS RequiredDate
,convert(datetime, convert(int,Orders.ShippedDate - 0.5)) AS ShippedDate
from dbo.Orders
3.Select from the table with the algorithm within a UDF.
select 'TC-UDF'
,Orders.OrderId
,dbo.DateOnly( Orders.OrderDate )AS OrderDate
,dbo.DateOnly( Orders.RequiredDate )AS RequiredDate
,dbo.DateOnly( Orders.ShippedDate )AS ShippedDate
from dbo.Orders
Workload scripts:
Three sets of work load scripts were written so that each SQL statement would begin at the same times and run for a fixed duration of two minutes. The UDF work load scripts has the below source and the other two scripts are similar. The "WAITFOR TIME" command causes a pause until the time is reached and then the remaining SQL statements are executed.
WAITFOR TIME '13:10:00'
declare @EndTSdatetime
set @EndTs= dateadd(mi , 2 , current_timestamp )
WHILE current_timestamp< @EndTs
begin
select 'TC-UDF'
,Orders.OrderId
,dbo.DateOnly( Orders.OrderDate )AS OrderDate
,dbo.DateOnly( Orders.RequiredDate )AS RequiredDate
,dbo.DateOnly( Orders.ShippedDate )AS ShippedDate
from dbo.Orders
end
go
Simultaneous execution of work load scripts.
For each work load script, four Windows command files were created to invoke the script using the SQL Server ISQL utility where the output is written to file. See the Microsoft SQL Server Books OnLine for the meaning of the parameters. Each command file specifies a different output file name.
isql.exe -E -SUSAG551IOMT1-4 -dNorthwind -h-1 -w1000 -i"R:\UDF\Source\Extend_Orders_DateOnly_UDF.sql" -o"R:\UDF\OutputFiles\Extend_Orders_DateOnly_01.txt"
Resource Utilization Statistics:
The below statisicts use the exclamation mark as the delimiter:
Method!Base!SQL!UDF!SQL - Base!UDF- Base
Execution!2401!2280!1880!!
Total Duration!718685!712109!715065!!
Total Reads!62426!59280!52640!!
Total CPU!3235!4342!58779!!
Avg Duration MS!299.327!312.329!380.354!13.00!81.03
Average Reads!26.000!26.000!28.000!0.00!2.00
Average CPU!1.347!1.904!31.265!0.56!29.92
Analysis:
The use of a UDF instead of in-line SQL resulted in a 21 % increase in duration and a 15 fold increase in CPU Utilization.
Recommendation:
SQL Server User Defined function should be avoided due to the huge performance impact especially the effect on CPU utilization.
SQL = Scarcely Qualifies as a Language
December 7, 2005 at 3:17 pm
Thanks Carl, good work. Anyone who has been performance tuning with SQL Server and has worked with Scalar UDFs would reckognize the performance hit. It is amazing how harmful they can be. Careful planning should be used when using Scalar UDFs.
Carl, do you think you can run a test of a SQL Server View and an In-line table UDF? I have seen similar execution plans between the two and cannot see a negative effect on performance. Unless, you have 20 joins, lots of column output, and nested sub-queries.
Thanks
Greg
December 7, 2005 at 3:32 pm
I have at least 5 other sets of comparisons to run to be included in the final article and the results make the DATEONLY UDF look good. A lot of the time spent so far was on the infrastucture such as setting up traces and confirming that the results reflect the actual resource utilization.
1. Deterministic UDFs vs In-Line SQL in the SELECT section of SQL statement. These will include math, date and string algorithms.
2. Non-Deterministic UDFs vs In-Line SQL in the SELECT section of SQL statement. These include single row, the SUM aggregation, the MAX aggregation and the LIST.
3. Deterministic UDFs vs In-Line SQL in the WHERE section of SQL statement
4. Non-Deterministic UDFs vs In-Line SQL in the WHERE section of SQL statement
5. Deterministic UDFs vs In-Line SQL as a CHECK Constraint
6. Non-Deterministic UDFs as CHECK Constraint vs In-Line SQL in a Trigger
Cheers.
For references, I have found:
Ken Henderson's "The Guru's Guide to SQL Server Stored Procedures, XML and HTML" which offers some interesting test cases. Since all of the source is on a CD, this is very convinient.
Joseph Gama's "TSQL functions" at http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=502&lngWId=5 These UDFs appear to be all deterministic.
Andrew Novick's UDF library at http://www.novicksoftware.com/
SQL = Scarcely Qualifies as a Language
December 7, 2005 at 7:13 pm
Do you actually believe that???
Average CPU!1.347!1.904!31.265!0.56!29.92
Analysis:
The use of a UDF instead of in-line SQL resulted in a 21 % increase in duration and a 15 fold increase in CPU Utilization.
You are making common beginner's mistake:
you do not take into cinsideration the simple fact that process of collecting statistics is taking time and resources as well.
SQL Server does not compute statistics on built-in functions. Just because they are not subject for investigations.
But every execution of UDF adds extra record into statistics log.
It's not UDF to utilize your CPU, it's your statistics!
Your conlusions are wrong because your experiment is badly desined.
In real system you do not record statistics on every executed statement, so UDF will not have any noticable impact on performance.
_____________
Code for TallyGenerator
December 7, 2005 at 8:18 pm
Regarding your comments on the trace invalidating the statistics.
"But every execution of UDF adds extra record into statistics log."
It depends on how the trace is configured.
The trace used to capture the statistics only includes Event ID 45 - SQL:StmtCompleted.
If the trace had included Event Id 41 - SP:StmtCompleted, then you would be correct that each of statements within the UDF would be additionally written to the trace file. But this is not occuring.
Additionally:
1. Separate tests were run without the trace being active that captured the count of the number of SQL statement run within the two minute window for each of the 12 connections. These counts match the count found by the trace.
2. The resource used by the trace are at the server level not at the statement level. Since all of the SQL statements that are being benchmarked are running at the same time, they are all be equally impacted by the resources used by the trace.
I stand by the statistics reported and the conclusions.
SQL = Scarcely Qualifies as a Language
December 7, 2005 at 8:26 pm
An additional comment on tracing.
If one uses the SQL Profilier to capture the trace information, I have found that this tool does have an effect on the overall performance of the server.
That is why the trace is set up to run in the background and write to a file, not to a GUI.
SQL = Scarcely Qualifies as a Language
December 7, 2005 at 9:03 pm
Switch off all SQL statistics and run Performance Monitor from Windows "Administrative Tools"
And then try to catch your 15 times difference in CPU load.
_____________
Code for TallyGenerator
December 7, 2005 at 9:22 pm
Greg Wrote: Go ahead and use your Scalar UDFs on large datasets, and one day if the set becomes large enough, you will be re-evaluating those UDFs.
Allow me to quote from one of my earlier posts: "AND, each test was on 2 MILLION records using the same table." I really appreciate your concerns but my question would be, how big is large enough? An observation might also be that if you're processing more than that using OLTP in a single batch, perhaps a change in processing schedule is in order. That, not withstanding, I'd be happy to run a test with whatever size record set you can provide once you've shown the results from the same tests. I would also ask that you share the test code so we can run them and make our own evaluation of your tests.
Carl wrote: Additionally, the tests were performed sequentially which allows environmental factors, such as an un-detected work load on the DBMS, to make the results questionable.
Not in my case. As I stated previously... Here's the results from my test runs on a stand-alone single-user non-networked server on my little ol' 1.8Ghz, 2GB ram, IDE disk, desktop server running SQL Server 2000 Developer's Edition with SP3a on an XP-sp2 box... I ran it a couple of times to be sure... very similar results on all runs... Only the later runs were done on a multi-user server. In all cases, the comparisons were reversed within each run to detect that possibility. That, notwithstanding, I'll be happy to run parallel tests on a server and a stand alone box.
Shifting gears a bit folks, I don't give a rat's patooti about whether or not SQL decides to use an internal cursor or not. In many cases, those internal cursors beat the clear code for performance. Am I suggesting that someone write cursors or that I condone the use of developer made cursors? Not on your life...
I can't speak of the parallelism problems that Greg is warning us about because I've not done any testing for the alleged fault, yet. You can bet I will.
Sergiy, once again, I agree with you... to measure something is to change it. That's why I wrapped each instance of code to be measured in simple elapsed time code instead of allowing embedded features to do the measurements in an unpredictable fashion... I have noticed that setting statistics times and doing traces can and frequently does change the elapsed time of otherwise good code. I will however, be doing some testing with tracing and stats on just to see how much they affect identical code.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 8, 2005 at 1:57 pm
I have re-done the same performance comparison but captured resource utilization from the sysprocess table instead of using a trace.
On the average CPU Milliseconds with sysprocess versus trace
For UDFs, 29.642 vs 31.265 for a 5% decrease in CPU without the trace.
For SQLs, 1.720 vs 1.904 for a 10% decrease in CPU without the trace.
For None, 1.518 vs 1.347 for a 12% INCREASE in CPU without the trace.
The conclusions remain the same, that in a two minute window:
1. Execution counts was 3,160 for the Inline SQL versus 2,417 for the UDF. This equates to at 24% drop in thruput.
2. Average Cpu usage in milliseconds was 1.72 for the Inline SQL was versus 29.642 for the UDF. This equates to a 17 times more CPU resource needed.
DENY CREATE FUNCTION TO PUBLIC !!!
SQL Scripts:
CREATE TABLE dbo.ProcessResources
( Condition varchar (255) NOT NULL
, Spid smallint NOT NULL
, SQLCnt int NOT NULL
, CpuMs int NOT NULL
, PhysicalIo int NOT NULL
)
go
-- Revised SQL Batch to capture statistics from the sysprocesses table.
WAITFOR TIME '9:49:00'
declare @EndTSdatetime
,@SQLCntinteger
set@SQLCnt = 0
set @EndTs= dateadd(mi , 2 , current_timestamp )
while current_timestamp< @EndTs
begin
select 'TC-UDF'
,Orders.OrderId
,dbo.DateOnly( Orders.OrderDate )AS OrderDate
,dbo.DateOnly( Orders.RequiredDate )AS RequiredDate
,dbo.DateOnly( Orders.ShippedDate )AS ShippedDate
from dbo.Orders
SET@SQLCnt = @SQLCnt + 1
end
Insert into udf_traces.dbo.ProcessResources
(Condition,spid,SQLCnt , CpuMs,PhysicalIo)
select 'TC-UDF'
,spid
,@SQLCnt
, CPU
,Physical_IO
from master.dbo.sysprocesses
where spid = @@spid
go
-- Summarize by test case
select condition, SUM(SQLCnt), SUM(CPUMS) , SUM(PhysicalIo)
FROM UDF_Traces.dbo.ProcessResources
Group by Condition
go
-- Show the Resources used by each Process
SELECT Condition, Spid, SQLCnt, CpuMs, PhysicalIo
FROM UDF_Traces.dbo.ProcessResources
order by Condition, spid
The details for each process:
CaseSPIDSQLCPUPIO
TC-NOT643282500
TC-NOT652364530
TC-NOT663836560
TC-NOT674687340
TC-NOT685499210
TC-NOT695529070
TC-NOT702503280
TC-NOT713945470
TC-SQL723194690
TC-SQL732484690
TC-SQL743926880
TC-SQL751402650
TC-SQL763094840
TC-SQL774637810
TC-SQL783075630
TC-SQL792394380
TC-UDF80458135470
TC-UDF81517152500
TC-UDF8225373120
TC-UDF83510148750
TC-UDF8427282190
TC-UDF85474138130
TC-UDF8622268440
TC-UDF87383117030
SQL = Scarcely Qualifies as a Language
December 8, 2005 at 1:57 pm
I have re-done the same performance comparison but captured resource utilization from the sysprocess table instead of using a trace.
On the average CPU Milliseconds with sysprocess versus trace
For UDFs, 29.642 vs 31.265 for a 5% decrease in CPU without the trace.
For SQLs, 1.720 vs 1.904 for a 10% decrease in CPU without the trace.
For None, 1.518 vs 1.347 for a 12% INCREASE in CPU without the trace.
The conclusions remain the same, that in a two minute window:
1. Execution counts was 3,160 for the Inline SQL versus 2,417 for the UDF. This equates to at 24% drop in thruput.
2. Average Cpu usage in milliseconds was 1.72 for the Inline SQL was versus 29.642 for the UDF. This equates to a 17 times more CPU resource needed.
DENY CREATE FUNCTION TO PUBLIC !!!
SQL Scripts:
CREATE TABLE dbo.ProcessResources
( Condition varchar (255) NOT NULL
, Spid smallint NOT NULL
, SQLCnt int NOT NULL
, CpuMs int NOT NULL
, PhysicalIo int NOT NULL
)
go
-- Revised SQL Batch to capture statistics from the sysprocesses table.
WAITFOR TIME '9:49:00'
declare @EndTSdatetime
,@SQLCntinteger
set@SQLCnt = 0
set @EndTs= dateadd(mi , 2 , current_timestamp )
while current_timestamp< @EndTs
begin
select 'TC-UDF'
,Orders.OrderId
,dbo.DateOnly( Orders.OrderDate )AS OrderDate
,dbo.DateOnly( Orders.RequiredDate )AS RequiredDate
,dbo.DateOnly( Orders.ShippedDate )AS ShippedDate
from dbo.Orders
SET@SQLCnt = @SQLCnt + 1
end
Insert into udf_traces.dbo.ProcessResources
(Condition,spid,SQLCnt , CpuMs,PhysicalIo)
select 'TC-UDF'
,spid
,@SQLCnt
, CPU
,Physical_IO
from master.dbo.sysprocesses
where spid = @@spid
go
-- Summarize by test case
select condition, SUM(SQLCnt), SUM(CPUMS) , SUM(PhysicalIo)
FROM UDF_Traces.dbo.ProcessResources
Group by Condition
go
-- Show the Resources used by each Process
SELECT Condition, Spid, SQLCnt, CpuMs, PhysicalIo
FROM UDF_Traces.dbo.ProcessResources
order by Condition, spid
The details for each process:
CaseSPIDSQLCPUPIO
TC-NOT643282500
TC-NOT652364530
TC-NOT663836560
TC-NOT674687340
TC-NOT685499210
TC-NOT695529070
TC-NOT702503280
TC-NOT713945470
TC-SQL723194690
TC-SQL732484690
TC-SQL743926880
TC-SQL751402650
TC-SQL763094840
TC-SQL774637810
TC-SQL783075630
TC-SQL792394380
TC-UDF80458135470
TC-UDF81517152500
TC-UDF8225373120
TC-UDF83510148750
TC-UDF8427282190
TC-UDF85474138130
TC-UDF8622268440
TC-UDF87383117030
SQL = Scarcely Qualifies as a Language
December 8, 2005 at 1:59 pm
I have re-done the same performance comparison but captured resource utilization from the sysprocess table instead of using a trace.
On the average CPU Milliseconds with sysprocess versus trace
For UDFs, 29.642 vs 31.265 for a 5% decrease in CPU without the trace.
For SQLs, 1.720 vs 1.904 for a 10% decrease in CPU without the trace.
For None, 1.518 vs 1.347 for a 12% INCREASE in CPU without the trace.
The conclusions remain the same, that in a two minute window:
1. Execution counts was 3,160 for the Inline SQL versus 2,417 for the UDF. This equates to at 24% drop in thruput.
2. Average Cpu usage in milliseconds was 1.72 for the Inline SQL was versus 29.642 for the UDF. This equates to a 17 times more CPU resource needed.
DENY CREATE FUNCTION TO PUBLIC !!!
SQL Scripts:
CREATE TABLE dbo.ProcessResources
( Condition varchar (255) NOT NULL
, Spid smallint NOT NULL
, SQLCnt int NOT NULL
, CpuMs int NOT NULL
, PhysicalIo int NOT NULL
)
go
-- Revised SQL Batch to capture statistics from the sysprocesses table.
WAITFOR TIME '9:49:00'
declare @EndTSdatetime
,@SQLCntinteger
set@SQLCnt = 0
set @EndTs= dateadd(mi , 2 , current_timestamp )
while current_timestamp< @EndTs
begin
select 'TC-UDF'
,Orders.OrderId
,dbo.DateOnly( Orders.OrderDate )AS OrderDate
,dbo.DateOnly( Orders.RequiredDate )AS RequiredDate
,dbo.DateOnly( Orders.ShippedDate )AS ShippedDate
from dbo.Orders
SET@SQLCnt = @SQLCnt + 1
end
Insert into udf_traces.dbo.ProcessResources
(Condition,spid,SQLCnt , CpuMs,PhysicalIo)
select 'TC-UDF'
,spid
,@SQLCnt
, CPU
,Physical_IO
from master.dbo.sysprocesses
where spid = @@spid
go
-- Summarize by test case
select condition, SUM(SQLCnt), SUM(CPUMS) , SUM(PhysicalIo)
FROM UDF_Traces.dbo.ProcessResources
Group by Condition
go
-- Show the Resources used by each Process
SELECT Condition, Spid, SQLCnt, CpuMs, PhysicalIo
FROM UDF_Traces.dbo.ProcessResources
order by Condition, spid
The details for each process:
CaseSPIDSQLCPUPIO
TC-NOT643282500
TC-NOT652364530
TC-NOT663836560
TC-NOT674687340
TC-NOT685499210
TC-NOT695529070
TC-NOT702503280
TC-NOT713945470
TC-SQL723194690
TC-SQL732484690
TC-SQL743926880
TC-SQL751402650
TC-SQL763094840
TC-SQL774637810
TC-SQL783075630
TC-SQL792394380
TC-UDF80458135470
TC-UDF81517152500
TC-UDF8225373120
TC-UDF83510148750
TC-UDF8427282190
TC-UDF85474138130
TC-UDF8622268440
TC-UDF87383117030
SQL = Scarcely Qualifies as a Language
December 8, 2005 at 9:43 pm
BANG! I'm dead. You guys finally proved your point with some cold, hard facts. I didn't like Carl's test the way it was because he left out two very important statistics.... number of rows and total elapsed time for each run. And, what I'm really interested in is elapsed time because I know it sometimes takes resources to get performance. So, using Carl's example, but rewritten to give me what I'm looking for, I ran two separate tests using the same UDF we've been using all along... here's the code...
(
RunNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
TimeStart DATETIME,
TimeEnd DATETIME,
TheCount INT
)
SET @Limit = 5
DECLARE @Counter SMALLINT
SET @Counter = 1
DECLARE @TimeStart DATETIME
DECLARE @TimeEnd DATETIME
DECLARE @MyRowCount INT
BEGIN
SET @TimeStart = GETDATE()
SELECT 'TC-UDF',
OrderId,
dbo.DateOnly(OrderDate) AS OrderDate,
dbo.DateOnly(RequiredDate) AS RequiredDate,
dbo.DateOnly(ShippedDate) AS ShippedDate
FROM dbo.Orders
INSERT INTO MyTimes
(TimeStart,TimeEnd,TheCount)
SELECT @TimeStart,@TimeEnd,@MyRowCount
SET @Counter = @Counter + 1
END
-------------------------------------------------------------------------
(
RunNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
TimeStart DATETIME,
TimeEnd DATETIME,
TheCount INT
)
SET @Limit = 5
DECLARE @Counter SMALLINT
SET @Counter = 1
DECLARE @TimeStart DATETIME
DECLARE @TimeEnd DATETIME
DECLARE @MyRowCount INT
BEGIN
SET @TimeStart = GETDATE()
SELECT 'TC-UDF',
OrderId,
convert(datetime, convert(int, OrderDate - 0.5)) AS OrderDate,
convert(datetime, convert(int, RequiredDate - 0.5)) AS RequiredDate,
convert(datetime, convert(int, ShippedDate - 0.5)) AS ShippedDate
FROM dbo.Orders
INSERT INTO MyTimes
(TimeStart,TimeEnd,TheCount)
SELECT @TimeStart,@TimeEnd,@MyRowCount
SET @Counter = @Counter + 1
END
--===================================================================
-- Create the test Table (Careful!!! Has a drop!!!!)
--===================================================================
--===== If the test table exists, drop it (CAREFUL!!!!)
IF OBJECT_ID('dbo.Orders') IS NOT NULL
DROP TABLE dbo.Orders
--===== Create and populate the test table on the fly
SELECT TOP 2500000 --Change this number to change the final size
IDENTITY(INT,1,1) AS OrderId,
GETDATE() AS OrderDate,
GETDATE() AS RequiredDate,
GETDATE() AS ShippedDate,
'THEY WILL BOTH PERFORM THE SAME' AS SomeString,
'MYTH BUSTED!!!' AS SomeOtherString,
NEWID() AS GUID
INTO dbo.Orders
FROM dbo.SYSCOLUMNS sc1,
dbo.SYSCOLUMNS sc2,
dbo.SYSCOLUMNS sc3
--===== Add a primary key to the test table
ALTER TABLE dbo.Orders
ADD PRIMARY KEY CLUSTERED (OrderId)
GO
Here's the results for Test #1 using 2.5 million records (Uses the UDF):
RunNum TimeStart TimeEnd TheCount Elapsed_ms
1 2005-12-08 22:32:01.580 2005-12-08 22:33:57.393 2500000 115813
2 2005-12-08 22:33:57.393 2005-12-08 22:35:52.580 2500000 115186
3 2005-12-08 22:35:52.580 2005-12-08 22:37:47.300 2500000 114720
4 2005-12-08 22:37:47.300 2005-12-08 22:39:41.613 2500000 114313
5 2005-12-08 22:39:41.613 2005-12-08 22:41:35.893 2500000 114280
Here's the results for Test #2 using 2.5 million records (Uses clear code):
RunNum TimeStart TimeEnd TheCount Elapsed_ms
1 2005-12-08 22:46:11.270 2005-12-08 22:47:06.300 2500000 55030
2 2005-12-08 22:47:06.300 2005-12-08 22:48:00.643 2500000 54343
3 2005-12-08 22:48:00.677 2005-12-08 22:48:54.630 2500000 53953
4 2005-12-08 22:48:54.643 2005-12-08 22:49:48.207 2500000 53563
5 2005-12-08 22:49:48.207 2005-12-08 22:50:41.817 2500000 53610
Clearly, the clear code beat the pant's off the UDF version. But we saw this in the last test code... on the larger tests, the UDF lost but on the smaller tests, the UDF won. So, I reran the code but for only 2500 records and here's the results...
Here's the results for Test #1 using 2500 records (Uses the UDF):
RunNum TimeStart TimeEnd TheCount Elapsed_ms
1 2005-12-08 22:58:44.940 2005-12-08 22:58:45.050 2500 110
2 2005-12-08 22:58:45.050 2005-12-08 22:58:45.177 2500 126
3 2005-12-08 22:58:45.177 2005-12-08 22:58:45.283 2500 106
4 2005-12-08 22:58:45.283 2005-12-08 22:58:45.410 2500 126
5 2005-12-08 22:58:45.410 2005-12-08 22:58:45.533 2500 123
Here's the results for Test #2 using 2500 records (Uses clear code):
RunNum TimeStart TimeEnd TheCount Elapsed_ms
1 2005-12-08 23:00:48.707 2005-12-08 23:00:48.753 2500 46
2 2005-12-08 23:00:48.753 2005-12-08 23:00:48.817 2500 63
3 2005-12-08 23:00:48.817 2005-12-08 23:00:48.880 2500 63
4 2005-12-08 23:00:48.880 2005-12-08 23:00:48.927 2500 46
5 2005-12-08 23:00:48.927 2005-12-08 23:00:48.987 2500 60
Again, and very clearly still, the clear code beat the pant's off the UDF. It wasn't by the 17 times that Carl claimed (I never do trust CPU times, just elapsed times), but it is very consistant... the clear code is about twice as fast as the UDF.
This test is also a bit more real world than my previous test that showed the UDF sometimes winning... so Crow's beware... I've gotta eat some here.
For the books, this was done on a stand-alone, single 1.8 Ghz CPU, copy of MS-SQL Server 2000 sp3A Developer's Edition with 2GB of RAM and twin 80GB IDE hard-drives. In other words, there's no chance that anyone else ran something while I was doing this and I didn't even move the mouse during the tests. And, there were no scheduled tasks anywhere near the runs. All the runs were run several times with nearly identical results.
Sergiy, I don't think I'll be making that UDF after all. It's not quite clear to me why the other tests sometimes showed the UDF winning but it clearly did not in these tests. Is there anything that you can see that I've done wrong that may give undo favor to the clear code in the test code above? And, yes, I did run each test without the loop several times to ensure that the loop code wasn't what was doing it.
Now, where's that Crow?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 53 total)
You must be logged in to reply to this topic. Login to reply