• Jeff Moden (11/27/2008)


    jacroberts (11/27/2008)


    Ok, I have recently written some code that uses an RBAR TVF to increase the efficiency and speed of a query.

    Here's my example:

    We had a table of dates on the system called RefDates that reports use in their SQL. The table RefDates contained every day in a 5 year period or approximately 2,000 rows. We had another table of events that had to be reported against; approximately 1 event was generated every minute. The report was generated for a period of 1 month (31 days max). The query was very slow as there were a lot of rows in the RefDates lookup table.

    A typical report had the following line:

    RIGHT JOIN RefDates C

    ON B.Date = C.Date

    WHERE C.Date >= @StartDateTime

    AND C.Date < @EndDateTime

    Instead I put a RBAR table valued function:

    CREATE FUNCTION [dbo].[GenRefDates]

    (

    @StartDate datetime,

    @EndDate datetime

    )

    RETURNS @table TABLE (Date datetime)

    AS BEGIN

    DECLARE @tmpDate datetime

    SET @tmpDate = Convert(varchar, @StartDate, 112)

    SET @EndDate = Convert(varchar, @EndDate, 112)

    WHILE @tmpDate <= @EndDate

    BEGIN

    INSERT INTO @table VALUES (@tmpDate)

    SET @tmpDate = DateAdd(dd, 1, @tmpDate)

    END

    RETURN

    END

    The query was then changed to:

    RIGHT JOIN GenRefDates(@StartDateTime, @EndDateTime) C

    ON B.Date = C.Date

    This changed the run time of the query from about a minute to a few seconds. So this shows that a RBAR table valued function can be much more efficient than using a lookup ref table. BTW the time taken to generate the TVF with 31 days is a few micro seconds and is totally insignificant compared to the rest of the query.

    So there is an example where an RBAR TVF is much more efficient than a lookup table.

    I'm not sure what other change you may have made to the code, but shifting to your RBAR function instead of using the RefDates table they had wasn't what improved the performance. No form of RBAR will ever beat proper set based code, but let me prove it to you... first, some test code to simulate your table of events... I realize that a million rows consitutes only 1.9012852687655030 years of data according to your specs above, but it should suffice for the test...

    USE TempDB

    GO

    --===== Create and populate a 1,000,000 row test table.

    -- Column "EventID" has a range of 1 to 1,000,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique dates with midnight times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    EventID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(ABS(CHECKSUM(NEWID()))%3653+36524 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.Event

    FROM Master.sys.All_Columns t1

    CROSS JOIN Master.sys.All_Columns t2

    --===== A table is not properly formed unless a Primary Key has been assigned

    ALTER TABLE dbo.Event

    ADD PRIMARY KEY CLUSTERED (EventID)

    CREATE INDEX IX_Event_SomeDate_SomeInt ON dbo.Events (SomeDate,SomeInt) INCLUDE (SomeMoney)

    --===== Delete all the data for 15 May 2008 for testing

    DELETE dbo.Event

    WHERE SomeDate >= '20080515'

    AND SomeDate < '20080516'

    We also need a RefDate table... you said the one they had only had about 2000 rows in it and that THAT was the main source of the performance problem... what the heck, let's build one with 11,000 rows just so you don't think that's a problem anymore... 😛

    USE TempDB

    GO

    --===== Create and populate the RefDate table on the fly

    SELECT TOP 11000 --equates to more than 30 years of dates

    DATEADD(dd,ROW_NUMBER() OVER (ORDER BY sc1.ID),'20000101') AS Date

    INTO dbo.RefDate

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    ALTER TABLE dbo.RefDate

    ALTER COLUMN Date DATETIME NOT NULL

    GO

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.RefDate

    ADD CONSTRAINT PK_Date_Date

    PRIMARY KEY CLUSTERED (Date) WITH FILLFACTOR = 100

    GO

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.RefDate TO PUBLIC

    And now, the test... each section of code is "stand-alone" with all it's own variables and all... I'm using your RBAR function in the second second section...

    USE TempDB

    GO

    --===== Set based method with Date table

    PRINT '===== Set based method with Date table ====='

    SET STATISTICS TIME ON

    DECLARE @MonthYear DATETIME,

    @MonthStart DATETIME,

    @MonthEnd DATETIME,

    @NextMonthStart DATETIME

    SELECT @MonthYear = 'May 2008',

    @MonthStart = @MonthYear,

    @NextMonthStart = DATEADD(mm,1,@MonthStart),

    @MonthEnd = @NextMonthStart-1

    SET NOCOUNT ON

    SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total

    FROM dbo.Event e

    RIGHT OUTER JOIN

    dbo.RefDate d

    ON e.SomeDate = d.Date

    WHERE d.Date >= @MonthStart

    AND d.Date < @NextMonthStart

    GROUP BY d.Date, e.SomeInt

    ORDER BY d.Date, e.SomeInt

    SET STATISTICS TIME OFF

    GO

    PRINT REPLICATE('=',100)

    GO

    --===== Method with RBAR looping function

    PRINT '===== Method with RBAR looping function ====='

    SET STATISTICS TIME ON

    DECLARE @MonthYear DATETIME,

    @MonthStart DATETIME,

    @MonthEnd DATETIME,

    @NextMonthStart DATETIME

    SELECT @MonthYear = 'May 2008',

    @MonthStart = @MonthYear,

    @NextMonthStart = DATEADD(mm,1,@MonthStart),

    @MonthEnd = @NextMonthStart-1

    SET NOCOUNT ON

    SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total

    FROM dbo.Event e

    RIGHT OUTER JOIN

    dbo.GenRefDates(@MonthStart,@MonthEnd) d

    ON e.SomeDate = d.Date

    GROUP BY d.Date, e.SomeInt

    ORDER BY d.Date, e.SomeInt

    SET STATISTICS TIME OFF

    GO

    -------------------------------------------------------------

    GO

    I don't know about you, but on my humble 6 year old, single 1.8 Ghz CPU, 1GB ram, IDE hard drive system running SQL Server 2005 Developer's Edition sp2, here's what I get for runtimes...

    [font="Arial Black"]===== Set based method with Date table =====[/font]

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    [font="Arial Black"]CPU time = 390 ms, elapsed time = 930 ms.[/font]

    ====================================================================================================

    [font="Arial Black"]===== Method with RBAR looping function =====[/font]

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    SQL Server Execution Times:

    [font="Arial Black"]CPU time = 19078 ms, elapsed time = 20878 ms.[/font]

    Like I said, I don't know what else you changed in the report code or what condition their RefDate table was in or whatever other improper thing they may have had going on, but your RBAR function will never be faster than proper set based code and neither will any other form of RBAR.

    Thank you for going to all that effort but your query shows nothing about the speed of RBAR TVF compared to inline. It is different from my query as there is a complicated join in my one.

    But my point is if your queries intend to show the difference between the two methods they should be equivalent in every other way like this:

    --===== Set based method with Date table

    PRINT '===== Set based method with Date table ====='

    SET STATISTICS TIME ON

    DECLARE @MonthYear DATETIME,

    @MonthStart DATETIME,

    @MonthEnd DATETIME,

    @NextMonthStart DATETIME

    SELECT @MonthYear = 'May 2008',

    @MonthStart = @MonthYear,

    @NextMonthStart = DATEADD(mm,1,@MonthStart),

    @MonthEnd = @NextMonthStart-1

    SET NOCOUNT ON

    SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total

    FROM dbo.Event e

    RIGHT OUTER JOIN

    dbo.RefDates d

    ON e.SomeDate = d.Date

    WHERE d.Date >= @MonthStart

    AND d.Date < @NextMonthStart

    GROUP BY d.Date, e.SomeInt

    ORDER BY d.Date, e.SomeInt

    SET STATISTICS TIME OFF

    GO

    PRINT REPLICATE('=',100)

    GO

    --===== Method with RBAR looping function

    PRINT '===== Method with RBAR looping function ====='

    SET STATISTICS TIME ON

    DECLARE @MonthYear DATETIME,

    @MonthStart DATETIME,

    @MonthEnd DATETIME,

    @NextMonthStart DATETIME

    SELECT @MonthYear = 'May 2008',

    @MonthStart = @MonthYear,

    @NextMonthStart = DATEADD(mm,1,@MonthStart),

    @MonthEnd = @NextMonthStart-1

    SET NOCOUNT ON

    SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total

    FROM dbo.Event e

    RIGHT OUTER JOIN

    dbo.GenRefDates(@MonthStart,@MonthEnd) d

    ON e.SomeDate = d.Date

    WHERE d.Date >= @MonthStart

    AND d.Date < @NextMonthStart

    GROUP BY d.Date, e.SomeInt

    ORDER BY d.Date, e.SomeInt

    SET STATISTICS TIME OFF

    GO

    I've tested it on my machine and the results were:

    ===== Set based method with Date table =====

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 312 ms, elapsed time = 417 ms.

    ====================================================================================================

    ===== Method with RBAR looping function =====

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 299 ms, elapsed time = 376 ms.

    The RBAR was actually quicker.

    The reason for the difference you found is nothing to do with the RBAR function but the difference in the where clause:

    RIGHT OUTER JOIN

    dbo.RefDate d

    ON e.SomeDate = d.Date

    WHERE d.Date >= @MonthStart

    AND d.Date < @NextMonthStart

    vs.

    RIGHT OUTER JOIN

    dbo.GenRefDates(@MonthStart,@MonthEnd) d

    ON e.SomeDate = d.Date

    I'm really sorry to say that your example doesn't stand up to even the quickest analysis.

    The performance difference you are measuring is due to a completely different matter.