The Numbers Table
Posted Thursday, November 27, 2008 9:58 AM


SSC-Insane


Group: General Forum Members
Last Login: Today @ 8:46 AM
Points: 20,747, Visits: 32,562
jacroberts (11/27/2008)
timothyawiseman (11/27/2008)
jacroberts (11/25/2008)
timothyawiseman (11/24/2008)

>>The problem is that your example did not show a case where efficiency is not a concern.

>>But, in your example you say there is no need to worry about the efficiency of a query that runs with no user waiting, but I disagree.


If you had read to the end of the sentence I wrote, you would have seen that efficiency is not a concern in this example. So I repeat:

If you have a daily report that is run in batch, with no user waiting at the other end of a button for the result, it doesn't matter if the SQL takes 3.2 seconds instead of 3.1 seconds to run.


Yes, I know what you said, but I respectfully disagree.

Once again, at that point in time, the developer-time required to optimize it may simply be a higher price than it is worth, but it will always matter how long it takes to run.

When I was newer to programming, I thought there were times when it would never matter, so I often didn't worry about it. Then the datasets would grow and small inefficiencies would become magnified until they mattered a great deal.

And just because it's normally a nightly report that runs when no one is around, as the company grows you may find that the time when no one is around begins to shrink quickly. Also, there may be times when management decides they need a refresh with the latest data in the middle of the day.

I have written procedures meant to run in the middle of the night only, but then later had someone rushing up saying they needed it produced now (in my case, it was mostly copying data from production to a reporting server).

And even if you genuinely know for whatever reason that the length of time that the procedure takes to run will actually never matter, remember that humans are creatures of habit. If a programmer is in the habit of writing efficient code, they will do it even when they are not trying to, but if they are not they may find it a very hard thing to do even when they have to.

There are times when other concerns, such as developer-time or robustness with more validation and error correction, outweigh and override the drive for efficiency, but I believe that it is always a concern.


Ok, I have recently written some code that uses an RBAR TVF to increase the efficiency and speed of a query.
Here's my example:
We had a table of dates on the system called RefDates that reports use in their SQL. The table RefDates contained every day in a 5 year period or approximately 2,000 rows. We had another table of events that had to be reported against; approximately 1 event was generated every minute. The report was generated for a period of 1 month (31 days max). The query was very slow as there were a lot of rows in the RefDates lookup table.

A typical report had the following line:

RIGHT JOIN RefDates C
ON B.Date = C.Date
WHERE C.Date >= @StartDateTime
AND C.Date < @EndDateTime

Instead I put in an RBAR table-valued function:

CREATE FUNCTION [dbo].[GenRefDates]   
(
@StartDate datetime,
@EndDate datetime
)
RETURNS @table TABLE (Date datetime)
AS BEGIN

DECLARE @tmpDate datetime
SET @tmpDate = Convert(varchar(8), @StartDate, 112) -- strip the time portion
SET @EndDate = Convert(varchar(8), @EndDate, 112)

WHILE @tmpDate <= @EndDate
BEGIN
INSERT INTO @table VALUES (@tmpDate)
SET @tmpDate = DateAdd(dd, 1, @tmpDate)
END

RETURN
END

The query was then changed to:
RIGHT JOIN GenRefDates(@StartDateTime, @EndDateTime) C
ON B.Date = C.Date

This changed the run time of the query from about a minute to a few seconds. So this shows that an RBAR table-valued function can be much more efficient than using a lookup ref table. BTW the time taken to generate the TVF with 31 days is a few microseconds and is totally insignificant compared to the rest of the query.

So there is an example where an RBAR TVF is much more efficient than a lookup table.


This is true, for a small number of rows. Your RBAR TVF will become inefficient should requirements change and the number of rows it has to generate becomes quite large. This is what we have all been talking about in this thread. Try it: see how long your RBAR TVF takes to generate 10,000 rows versus the same function written in a set-based manner.
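For anyone who wants to run that comparison, a set-based inline version of the same date-generating function might look something like the sketch below. The function name is hypothetical; it assumes SQL Server 2005+ for ROW_NUMBER and uses a system view cross join as a row source, the same way a Tally table gets built:

CREATE FUNCTION dbo.GenRefDates_Set
(
@StartDate datetime,
@EndDate datetime
)
RETURNS TABLE
AS
RETURN
-- One row per day between @StartDate and @EndDate, no loop: generate 0..N-1
-- with ROW_NUMBER and add each offset to the (time-stripped) start date.
SELECT DATEADD(dd, n.Number, DATEADD(dd, DATEDIFF(dd, 0, @StartDate), 0)) AS Date
FROM (SELECT TOP (DATEDIFF(dd, @StartDate, @EndDate) + 1)
ROW_NUMBER() OVER (ORDER BY sc1.object_id) - 1 AS Number
FROM master.sys.all_columns sc1
CROSS JOIN master.sys.all_columns sc2) n

Because it is inline, the optimizer can expand it into the calling query instead of materializing a table variable first.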




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #609898
Posted Thursday, November 27, 2008 10:03 AM
Old Hand


Group: General Forum Members
Last Login: Today @ 4:47 AM
Points: 334, Visits: 2,268
This changed the run time of the query from about a minute to a few seconds. So this shows that an RBAR table-valued function can be much more efficient than using a lookup ref table. BTW the time taken to generate the TVF with 31 days is a few microseconds and is totally insignificant compared to the rest of the query.

So there is an example where an RBAR TVF is much more efficient than a lookup table.


We have to take your word for it as it is impossible to verify the initial table was set up properly. It would need to be as small as possible and be indexed on the field you match to be any good. Also the datatypes of the field you match against must be the same as that of the variable, but I will assume you know all that.

If you don't, consider this:

What might seem a constant to us humans with respect to the query will not be so for the optimizer, as it plans at the batch level and reuses those plans. Thus any local variable or parameter is NOT a constant in your query plan! SQL Server is improving in this respect, but it remains tricky and counterintuitive. What probably happened in your case is that each row of the input table is first converted before the comparison (check the query plan)! With a TVF you only have to do such a conversion once, and the comparisons for each record take place against your generated table, which has a matching type as output.

RIGHT JOIN RefDates C
ON B.Date = C.Date
WHERE C.Date >= @StartDateTime
AND C.Date < @EndDateTime


why did you not use something like this in the original query?

RIGHT JOIN RefDates C
ON B.Date = C.Date and C.Date >= @StartDateTime AND C.Date < @EndDateTime


But in all honesty, I simply think you ran out of luck. I had it once with a stored procedure that had a parameterised query. No matter what I did, it was extremely slow compared to a test query with constants I ran before. All types matched and still I had no luck. Matching against parameters or local variables sometimes has these effects (only seen it twice, I think, with such a big effect). I did the counterintuitive thing, and created a dynamic SQL statement so all the parameters became constants and ran that instead, and it was rocking (and ugly)!
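The workaround described above, building the statement so the parameter values appear as literal constants, can be sketched like this (table and column names are hypothetical; note that this trades away plan reuse and must never be fed unvalidated user input):

DECLARE @StartDate datetime, @EndDate datetime, @sql nvarchar(4000)
SELECT @StartDate = '20080501', @EndDate = '20080601'

-- Bake the parameter values into the statement as literals, so the optimizer
-- compiles a plan for the actual constants rather than for opaque variables.
SET @sql = N'SELECT SomeCol FROM dbo.SomeTable'
         + N' WHERE SomeDate >= ''' + CONVERT(char(8), @StartDate, 112) + N''''
         + N' AND SomeDate < ''' + CONVERT(char(8), @EndDate, 112) + N''''
EXEC (@sql)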

As for RBAR defeats persistent table, that is a conclusion I would not draw based on your experience. You can of course still use a numbers table to do only one insert within your stored procedure, and it would be faster than RBAR. The true improvement comes from something else, and with some tweaking you will find that to be the case.

Also, not long ago I optimised a complex query that was using a list of numbers that needed splitting. I currently have two versions of the appropriate function: an inline version and one that creates a small table in tempdb, as it is not inline. Both use a numbers function and both were much faster than the procedural version I once used. But the non-inline version was still faster in this particular query; sometimes you have to break queries down a little to help the optimizer make sound decisions.
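An inline, numbers-based splitter of the kind described could be sketched like this (the function name is hypothetical; it assumes a dbo.Numbers table with a column N running from 1 up to at least the maximum list length, as discussed elsewhere in this thread):

CREATE FUNCTION dbo.SplitList (@List varchar(8000))
RETURNS TABLE
AS
RETURN
-- Pad the list with delimiters on both ends; every N that lands on a comma
-- marks the start of the next item, and the following comma marks its end.
SELECT SUBSTRING(',' + @List + ',', N + 1,
                 CHARINDEX(',', ',' + @List + ',', N + 1) - N - 1) AS Item
FROM dbo.Numbers
WHERE N <= LEN(',' + @List + ',') - 1
  AND SUBSTRING(',' + @List + ',', N, 1) = ','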
Post #609901
Posted Thursday, November 27, 2008 10:41 AM


Old Hand


Group: General Forum Members
Last Login: Yesterday @ 5:56 AM
Points: 338, Visits: 1,432
Lynn Pettis (11/27/2008)
jacroberts (11/27/2008)
timothyawiseman (11/27/2008)
jacroberts (11/25/2008)
timothyawiseman (11/24/2008)


This is true, for a small number of rows. Your RBAR TVF will become inefficient should requirements change and the number of rows it has to generate becomes quite large. This is what we have all been talking about in this thread. Try it: see how long your RBAR TVF takes to generate 10,000 rows versus the same function written in a set-based manner.



But it's not for 10,000 rows, it's for a max of 31 days in a month. Even if it were generating 10,000 rows, which incidentally would take about 2 seconds, it would still be insignificant compared to the run time of the query.
Post #609920
Posted Thursday, November 27, 2008 10:48 AM


SSC-Insane


Group: General Forum Members
Last Login: Today @ 8:46 AM
Points: 20,747, Visits: 32,562
jacroberts (11/27/2008)
Lynn Pettis (11/27/2008)
jacroberts (11/27/2008)
timothyawiseman (11/27/2008)
jacroberts (11/25/2008)
timothyawiseman (11/24/2008)


This is true, for a small number of rows. Your RBAR TVF will become inefficient should requirements change and the number of rows it has to generate becomes quite large. This is what we have all been talking about in this thread. Try it: see how long your RBAR TVF takes to generate 10,000 rows versus the same function written in a set-based manner.



But it's not for 10,000 rows, it's for a max of 31 days in a month. Even if it were generating 10,000 rows, which incidentally would take about 2 seconds, it would still be insignificant compared to the run time of the query.


You are missing the major points. One, requirements change, and what is efficient now won't be any longer. And two, someone is going to come along, see the code in your RBAR TVF, and use it somewhere else where it won't be efficient at all.

If you take the time to write solid, efficient, and scalable code all the time, you don't have to worry as much about those two things occurring. Something I have learned since becoming a member of this awesome site and getting tips and tricks from the likes of Jeff Moden, Gail Shaw, etc.




Lynn Pettis

Post #609922
Posted Thursday, November 27, 2008 11:10 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 8:41 AM
Points: 35,400, Visits: 31,961
Lynn Pettis (11/27/2008)
And two, someone is going to come along and see your code in your RBAR TVF and use it somewhere else and it won't be efficient at all.


Heh... job security for him... if he keeps planting timebombs like this, the customer is sure to call back for more "professional tuning services". :P


If you take the time to write solid, efficient, and scalable code all the time you don't have to worry about those two things occuring as much.


If you compare the code, which takes longer to write? The cross join method or the WHILE loop? Nah... like I said before, it takes no extra time to do it right the first time.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #609929
Posted Thursday, November 27, 2008 12:40 PM


Old Hand


Group: General Forum Members
Last Login: Yesterday @ 5:56 AM
Points: 338, Visits: 1,432
Lynn Pettis (11/27/2008)
jacroberts (11/27/2008)
Lynn Pettis (11/27/2008)
jacroberts (11/27/2008)
timothyawiseman (11/27/2008)
jacroberts (11/25/2008)
timothyawiseman (11/24/2008)


This is true, for a small number of rows. Your RBAR TVF will become inefficient should requirements change and the number of rows it has to generate becomes quite large. This is what we have all been talking about in this thread. Try it: see how long your RBAR TVF takes to generate 10,000 rows versus the same function written in a set-based manner.



But it's not for 10,000 rows, it's for a max of 31 days in a month. Even if it were generating 10,000 rows, which incidentally would take about 2 seconds, it would still be insignificant compared to the run time of the query.


You are missing the major points. One, requirements change, and what is efficient now won't be any longer. And two, someone is going to come along, see the code in your RBAR TVF, and use it somewhere else where it won't be efficient at all.

If you take the time to write solid, efficient, and scalable code all the time, you don't have to worry as much about those two things occurring. Something I have learned since becoming a member of this awesome site and getting tips and tricks from the likes of Jeff Moden, Gail Shaw, etc.



It's horses for courses. There are times when it's good and times when it's bad to use different methods. If you think using a lookup table will always make your queries run faster, then you are wrong. It is quite possible to use a lookup table on a query that runs quickly on a small amount of data; when there are requirements changes, or even just the progression of time causes more data to be added to a table, or the lookup table becomes larger, the query will become horrendously slow and will need someone to come in and optimise it.

In the example I gave, the number of days reported on never changed from the number of days in a month; what did change was the number of days in the lookup table and the number of rows in the tables to be reported on. This is the reason the report slowed down, and using an RBAR TVF fixed that particular problem. I'm not saying that you should use them indiscriminately, but there are times when they are suitable. Not only did the report run faster, but it also removed the need to regularly insert more rows into the RefDates table, as the new TVF would work for any dates, not just the ones in the lookup table.
Post #609960
Posted Thursday, November 27, 2008 8:25 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 8:41 AM
Points: 35,400, Visits: 31,961
jacroberts (11/27/2008)
Ok, I have recently written some code that uses an RBAR TVF to increase the efficiency and speed of a query.
Here's my example:
We had a table of dates on the system called RefDates that reports use in their SQL. The table RefDates contained every day in a 5 year period or approximately 2,000 rows. We had another table of events that had to be reported against; approximately 1 event was generated every minute. The report was generated for a period of 1 month (31 days max). The query was very slow as there were a lot of rows in the RefDates lookup table.

A typical report had the following line:

RIGHT JOIN RefDates C
ON B.Date = C.Date
WHERE C.Date >= @StartDateTime
AND C.Date < @EndDateTime

Instead I put in an RBAR table-valued function:

CREATE FUNCTION [dbo].[GenRefDates]   
(
@StartDate datetime,
@EndDate datetime
)
RETURNS @table TABLE (Date datetime)
AS BEGIN

DECLARE @tmpDate datetime
SET @tmpDate = Convert(varchar(8), @StartDate, 112) -- strip the time portion
SET @EndDate = Convert(varchar(8), @EndDate, 112)

WHILE @tmpDate <= @EndDate
BEGIN
INSERT INTO @table VALUES (@tmpDate)
SET @tmpDate = DateAdd(dd, 1, @tmpDate)
END

RETURN
END

The query was then changed to:
RIGHT JOIN GenRefDates(@StartDateTime, @EndDateTime) C
ON B.Date = C.Date

This changed the run time of the query from about a minute to a few seconds. So this shows that an RBAR table-valued function can be much more efficient than using a lookup ref table. BTW the time taken to generate the TVF with 31 days is a few microseconds and is totally insignificant compared to the rest of the query.

So there is an example where an RBAR TVF is much more efficient than a lookup table.


I'm not sure what other change you may have made to the code, but shifting to your RBAR function instead of using the RefDates table they had wasn't what improved the performance. No form of RBAR will ever beat proper set-based code, but let me prove it to you... first, some test code to simulate your table of events... I realize that a million rows constitutes only 1.9012852687655030 years of data according to your specs above, but it should suffice for the test...

USE TempDB
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "EventID" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique dates with midnight times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
EventID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.Event
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2

--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.Event
ADD PRIMARY KEY CLUSTERED (EventID)

CREATE INDEX IX_Event_SomeDate_SomeInt ON dbo.Event (SomeDate,SomeInt) INCLUDE (SomeMoney)

--===== Delete all the data for 15 May 2008 for testing
DELETE dbo.Event
WHERE SomeDate >= '20080515'
AND SomeDate < '20080516'

We also need a RefDate table... you said the one they had only had about 2000 rows in it and that THAT was the main source of the performance problem... what the heck, let's build one with 11,000 rows just so you don't think that's a problem anymore... :P

USE TempDB
GO
--===== Create and populate the RefDate table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
DATEADD(dd,ROW_NUMBER() OVER (ORDER BY sc1.ID),'20000101') AS Date
INTO dbo.RefDate
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2

ALTER TABLE dbo.RefDate
ALTER COLUMN Date DATETIME NOT NULL
GO
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.RefDate
ADD CONSTRAINT PK_Date_Date
PRIMARY KEY CLUSTERED (Date) WITH FILLFACTOR = 100
GO
--===== Allow the general public to use it
GRANT SELECT ON dbo.RefDate TO PUBLIC

And now, the test... each section of code is "stand-alone" with all its own variables and all... I'm using your RBAR function in the second section...

USE TempDB
GO
--===== Set based method with Date table
PRINT '===== Set based method with Date table ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME

SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1

SET NOCOUNT ON

SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.RefDate d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
--===== Method with RBAR looping function
PRINT '===== Method with RBAR looping function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME

SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1

SET NOCOUNT ON

SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
-------------------------------------------------------------
GO

I don't know about you, but on my humble 6-year-old, single 1.8 GHz CPU, 1GB RAM, IDE hard drive system running SQL Server 2005 Developer's Edition sp2, here's what I get for runtimes...

===== Set based method with Date table =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

SQL Server Execution Times:
CPU time = 390 ms, elapsed time = 930 ms.
====================================================================================================
===== Method with RBAR looping function =====

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

SQL Server Execution Times:
CPU time = 19078 ms, elapsed time = 20878 ms.

Like I said, I don't know what else you changed in the report code or what condition their RefDate table was in or whatever other improper thing they may have had going on, but your RBAR function will never be faster than proper set based code and neither will any other form of RBAR.


--Jeff Moden
Post #610043
Posted Friday, November 28, 2008 4:36 AM
Old Hand


Group: General Forum Members
Last Login: Today @ 4:47 AM
Points: 334, Visits: 2,268
Jeff Moden (11/27/2008)

I don't know about you, but on my humble 6-year-old, single 1.8 GHz CPU, 1GB RAM, IDE hard drive system running SQL Server 2005 Developer's Edition sp2, here's what I get for runtimes...

===== Set based method with Date table =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

SQL Server Execution Times:
CPU time = 390 ms, elapsed time = 930 ms.
====================================================================================================
===== Method with RBAR looping function =====

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

SQL Server Execution Times:
CPU time = 19078 ms, elapsed time = 20878 ms.

Like I said, I don't know what else you changed in the report code or what condition their RefDate table was in or whatever other improper thing they may have had going on, but your RBAR function will never be faster than proper set based code and neither will any other form of RBAR.


Considering that, for me, just writing a post (let alone a focused one) takes a lot of time, reading all the work you have put into your argument makes me feel "a bit" humble :). Did you even sleep?

Either way, you made a strong case here that something else must have been wrong in the original solution. Be it the queries themselves, the modeling/indexing, or even out-of-date statistics.
Post #610244
Posted Friday, November 28, 2008 5:51 AM


Old Hand


Group: General Forum Members
Last Login: Yesterday @ 5:56 AM
Points: 338, Visits: 1,432
Jeff Moden (11/27/2008)
jacroberts (11/27/2008)
Ok, I have recently written some code that uses an RBAR TVF to increase the efficiency and speed of a query.
Here's my example:
We had a table of dates on the system called RefDates that reports use in their SQL. The table RefDates contained every day in a 5 year period or approximately 2,000 rows. We had another table of events that had to be reported against; approximately 1 event was generated every minute. The report was generated for a period of 1 month (31 days max). The query was very slow as there were a lot of rows in the RefDates lookup table.

A typical report had the following line:

RIGHT JOIN RefDates C
ON B.Date = C.Date
WHERE C.Date >= @StartDateTime
AND C.Date < @EndDateTime

Instead I put in an RBAR table-valued function:

CREATE FUNCTION [dbo].[GenRefDates]   
(
@StartDate datetime,
@EndDate datetime
)
RETURNS @table TABLE (Date datetime)
AS BEGIN

DECLARE @tmpDate datetime
SET @tmpDate = Convert(varchar(8), @StartDate, 112) -- strip the time portion
SET @EndDate = Convert(varchar(8), @EndDate, 112)

WHILE @tmpDate <= @EndDate
BEGIN
INSERT INTO @table VALUES (@tmpDate)
SET @tmpDate = DateAdd(dd, 1, @tmpDate)
END

RETURN
END

The query was then changed to:
RIGHT JOIN GenRefDates(@StartDateTime, @EndDateTime) C
ON B.Date = C.Date

This changed the run time of the query from about a minute to a few seconds. So this shows that an RBAR table-valued function can be much more efficient than using a lookup ref table. BTW the time taken to generate the TVF with 31 days is a few microseconds and is totally insignificant compared to the rest of the query.

So there is an example where an RBAR TVF is much more efficient than a lookup table.


I'm not sure what other change you may have made to the code, but shifting to your RBAR function instead of using the RefDates table they had wasn't what improved the performance. No form of RBAR will ever beat proper set-based code, but let me prove it to you... first, some test code to simulate your table of events... I realize that a million rows constitutes only 1.9012852687655030 years of data according to your specs above, but it should suffice for the test...

USE TempDB
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "EventID" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique dates with midnight times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
EventID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.Event
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2

--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.Event
ADD PRIMARY KEY CLUSTERED (EventID)

CREATE INDEX IX_Event_SomeDate_SomeInt ON dbo.Event (SomeDate,SomeInt) INCLUDE (SomeMoney)

--===== Delete all the data for 15 May 2008 for testing
DELETE dbo.Event
WHERE SomeDate >= '20080515'
AND SomeDate < '20080516'

We also need a RefDate table... you said the one they had only had about 2000 rows in it and that THAT was the main source of the performance problem... what the heck, let's build one with 11,000 rows just so you don't think that's a problem anymore... :P

USE TempDB
GO
--===== Create and populate the RefDate table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
DATEADD(dd,ROW_NUMBER() OVER (ORDER BY sc1.ID),'20000101') AS Date
INTO dbo.RefDate
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2

ALTER TABLE dbo.RefDate
ALTER COLUMN Date DATETIME NOT NULL
GO
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.RefDate
ADD CONSTRAINT PK_Date_Date
PRIMARY KEY CLUSTERED (Date) WITH FILLFACTOR = 100
GO
--===== Allow the general public to use it
GRANT SELECT ON dbo.RefDate TO PUBLIC

And now, the test... each section of code is "stand-alone" with all its own variables and all... I'm using your RBAR function in the second section...

USE TempDB
GO
--===== Set based method with Date table
PRINT '===== Set based method with Date table ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME

SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1

SET NOCOUNT ON

SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.RefDate d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
--===== Method with RBAR looping function
PRINT '===== Method with RBAR looping function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME

SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1

SET NOCOUNT ON

SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
-------------------------------------------------------------
GO

I don't know about you, but on my humble 6-year-old, single 1.8 GHz CPU, 1GB RAM, IDE hard drive system running SQL Server 2005 Developer's Edition sp2, here's what I get for runtimes...

===== Set based method with Date table =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

SQL Server Execution Times:
CPU time = 390 ms, elapsed time = 930 ms.
====================================================================================================
===== Method with RBAR looping function =====

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.

SQL Server Execution Times:
CPU time = 19078 ms, elapsed time = 20878 ms.

Like I said, I don't know what else you changed in the report code, what condition their RefDate table was in, or whatever other improper thing they may have had going on, but your RBAR function will never be faster than proper set-based code, and neither will any other form of RBAR.
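(Editor's note for readers following along: dbo.GenRefDates is never defined anywhere in this thread, so the sketch below only illustrates the two approaches being argued about -- a hypothetical WHILE-loop "RBAR" multi-statement TVF versus a set-based inline TVF driven by a Tally/Numbers table. The function names and the assumption that a dbo.Tally table of sequential integers exists are illustrative, not taken from the posts.)

```sql
--===== Hypothetical RBAR version: a multi-statement TVF that builds
--      its result one row per loop iteration in a table variable
CREATE FUNCTION dbo.GenRefDates_RBAR (@StartDate DATETIME, @EndDate DATETIME)
RETURNS @Dates TABLE (Date DATETIME NOT NULL PRIMARY KEY)
AS
BEGIN
    WHILE @StartDate <= @EndDate
    BEGIN
        INSERT INTO @Dates (Date) VALUES (@StartDate)
        SET @StartDate = DATEADD(dd, 1, @StartDate)  -- one row per trip round the loop
    END
    RETURN
END
GO

--===== Set-based version: an inline TVF driven by a Tally/Numbers table.
--      It's a single SELECT, so the optimizer can expand it directly
--      into the calling query like a parameterized view.
CREATE FUNCTION dbo.GenRefDates_Inline (@StartDate DATETIME, @EndDate DATETIME)
RETURNS TABLE
AS
RETURN
    SELECT DATEADD(dd, t.N - 1, @StartDate) AS Date
      FROM dbo.Tally t
     WHERE t.N <= DATEDIFF(dd, @StartDate, @EndDate) + 1
GO
```

Either sketch can be dropped into the RIGHT OUTER JOIN in the test scripts above in place of dbo.GenRefDates to compare the two styles on your own hardware.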



Thank you for going to all that effort, but your query shows nothing about the speed of a RBAR TVF compared to an inline one. It differs from my query, which involves a complicated join.
My point is that if your queries are intended to show the difference between the two methods, they should be equivalent in every other way, like this:


--===== Set based method with Date table
PRINT '===== Set based method with Date table ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME

SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1

SET NOCOUNT ON

SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.RefDate d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
--===== Method with RBAR looping function
PRINT '===== Method with RBAR looping function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME

SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1

SET NOCOUNT ON

SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO


I've tested it on my machine and the results were:

===== Set based method with Date table =====

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 417 ms.
====================================================================================================
===== Method with RBAR looping function =====

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 299 ms, elapsed time = 376 ms.

The RBAR was actually quicker.


The reason for the difference you found has nothing to do with the RBAR function; it's the difference in the WHERE clause:

  RIGHT OUTER JOIN
dbo.RefDate d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart

vs.

RIGHT OUTER JOIN
dbo.GenRefDates(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date


I'm really sorry to say that your example doesn't stand up to even the quickest analysis. The performance difference you are measuring is due to a completely different matter.
Post #610297
Posted Friday, November 28, 2008 7:55 AM


SSCommitted


Group: General Forum Members
Last Login: 2 days ago @ 7:59 PM
Points: 1,945, Visits: 3,080
Let me make an appeal to simple logic.

You build a look-up table once (and if you did a proper calendar table, you also benefit from other uses). It probably has a clustered index or key to take advantage of the sequential nature of time. It probably has other constraints to ensure data integrity.

You have a table-valued function that creates a new table every time, without an index or key, and certainly without constraints for the optimizer to use.

There are (n) sessions in the DB. The Calendar table is cached in main storage -- 20, 50 or even 100 years of dates is small -- while the function is re-computed over and over in every one of them.

So before you get to the query, the look-up table has won on both speed and integrity.
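(Editor's note: a minimal sketch of the kind of look-up table described here -- clustered key exploiting the sequential nature of time, an integrity constraint, populated once. The table and column names, the date range, and the assumption of an existing dbo.Tally numbers table are illustrative, not from the post.)

```sql
--===== A minimal Calendar look-up table: built once, keyed, constrained
CREATE TABLE dbo.Calendar
(
    CalDate DATETIME NOT NULL
        CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED,   -- sequential key: time
    CONSTRAINT CK_Calendar_WholeDates
        CHECK (CalDate = DATEADD(dd, DATEDIFF(dd, 0, CalDate), 0)) -- midnight only
)
GO

--===== Populate 100 years of dates in one set-based INSERT
--      (assumes a dbo.Tally table of sequential integers)
INSERT INTO dbo.Calendar (CalDate)
SELECT DATEADD(dd, t.N - 1, '19500101')
  FROM dbo.Tally t
 WHERE t.N <= DATEDIFF(dd, '19500101', '20491231') + 1
GO
```

In practice such a table would also carry precomputed columns (fiscal period, holiday flags, day names), which is the "other uses" benefit mentioned above.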


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #610369