http://www.sqlservercentral.com/blogs/lynnpettis/2010/06/03/sins-of-a-recursive-cte/

Printed 2014/04/18 11:23AM

Sins of a Recursive CTE?

By Lynn Pettis, 2010/06/03

On June 1, 2010, Steve Gray had an article, Sins of SQL: The Time Table, published on sqlservercentral.com.  I’m not going to discuss the pro’s or con’s of his approach to solving what he considers a problem with Time Dimensions, that may well be the target of a separate article.  No,, I am going to address the use of a recursive CTE in his in-line table valued function.

If you have used a recursive CTE, you know it is inherently RBAR (a Modenism for Row By Agonizing Row).  However, it seems that Steve believes that this routine is the best around.  Well, let us do some testing.  First, if you are interested in Steve’s code, you can find it in his article so I won’t reproduce it here.  I did make one change to his code for my testing, I changed the name from dbo.fn_GetTimestampRange to dbo.fn_GetTimestampRangeRCTE to indicate that it uses a Recursive CTE.

Next, I used the code for my Dynamic Tally table (you can find that code here, The Dynamic Tally or Numbers Table) as base for creating a new inline TVF called dbo.fn_GetTimestampRangeCTE.  Here is the code I wrote:

create function [dbo].[fn_GetTimestampRangeCTE]( 
@pFromDate datetime,
@pToDate datetime,
@pIncludeSeconds bit
)
returns table
as
return(
with BaseNum (
N
) as (
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1
),
L1 (
N
) as (
select
bn1.N
from
BaseNum bn1
cross join BaseNum bn2
cross join BaseNum bn3
),
L2 (
N
) as (
select
a1.N
from
L1 a1
cross join L1 a2
),
Tally (
DateKey
) as (
select top ((datediff(ss, @pFromDate, @pToDate) / isnull(nullif(cast(@pIncludeSeconds as int),0),60)) + 1)
dateadd(ss, row_number() over (order by a1.N), @pFromDate)
from
L2 a1
cross join L2 a2
)
select
Result.DateKey,
DATEPART(YEAR, Result.DateKey) AS [Year],
DATEPART(QUARTER, Result.DateKey) AS [Quarter],
DATEPART(WEEK, Result.DateKey) AS [Week],
DATEPART(WEEKDAY, Result.DateKey) AS [WeekDayNumber],
DATENAME(WEEKDAY, Result.DateKey) AS [WeekDayName],
DATEPART(DAYOFYEAR, Result.DateKey) AS [DayOfYear],
DATEPART(MONTH, Result.DateKey) AS [Month],
DATEPART(DAY, Result.DateKey) AS [Day],
DATEPART(HOUR, Result.DateKey) AS [Hour],
DATEPART(MINUTE, Result.DateKey) AS [Minute],
DATEPART(SECOND, Result.DateKey) AS [Second]
from
Tally Result
);

In comparing the two functions I used a simplistic method, SET STATISTICS IO and SET STATISTICS TIME.  Also, I decided to go ahead and use the great equalizer as well and allow the output of both funtions to be displayed in SSMS.  Here are the results of a few test runs:

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeCTE ('2010-07-01','2010-07-02', 0)

set statistics time off;
set statistics io off;

set statistics io on;
set statistics time on;


select * from dbo.fn_GetTimestampRangeRCTE ('2010-07-01','2010-07-02', 0) OPTION (MAXRECURSION 0);


set statistics time off;
set statistics io off;

 

(1441 row(s) affected)

SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 351 ms.

(1441 row(s) affected)
Table 'Worktable'. Scan count 8, logical reads 1002, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 457 ms.

 

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeCTE ('2010-07-01','2010-07-02', 1)

set statistics time off;
set statistics io off;

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeRCTE ('2010-07-01','2010-07-02', 1) OPTION (MAXRECURSION 0);

set statistics time off;
set statistics io off;

 

(86401 row(s) affected)

SQL Server Execution Times:
   CPU time = 562 ms,  elapsed time = 4261 ms.

(86401 row(s) affected)
Table 'Worktable'. Scan count 11, logical reads 7241, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 3641 ms,  elapsed time = 5484 ms.

 

Just looking at this few runs you can see that the recursive CTE is slower.  Also notice the use of a worktable by SQL Server for the function using the recursive CTE.  But for demonstration purposes, let’s take a look at the two functions when generating tables for a full month:

 

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeCTE ('2010-07-01','2010-08-01', 0)

set statistics time off;
set statistics io off;

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeRCTE ('2010-07-01','2010-08-01', 0) OPTION (MAXRECURSION 0);

set statistics time off;
set statistics io off;

 

(44641 row(s) affected)

SQL Server Execution Times:
   CPU time = 281 ms,  elapsed time = 2402 ms.

(44641 row(s) affected)
Table 'Worktable'. Scan count 8, logical reads 2682, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 1297 ms,  elapsed time = 2437 ms.

 

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeCTE ('2010-07-01','2010-08-01', 1)

set statistics time off;
set statistics io off;

set statistics io on;
set statistics time on;

select * from dbo.fn_GetTimestampRangeRCTE ('2010-07-01','2010-08-01', 1) OPTION (MAXRECURSION 0);

set statistics time off;
set statistics io off;

 

(2678401 row(s) affected)

SQL Server Execution Times:
   CPU time = 19312 ms,  elapsed time = 122609 ms.

(2678401 row(s) affected)
Table 'Worktable'. Scan count 11, logical reads 95321, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 102875 ms,  elapsed time = 150496 ms.

 

As you can see, the function using the recursive CTE uses more IO and CPU than the version based on the code for the dynamic tally table.  The only question that needs to be answered is if you were to chose one of the these two functions, which would you use?  I know which one I’d select.

 

Edit:  After additional comments in the discussion of the article, Sins of SQL: The Time Table, I found a slight problem with my own code.  Below is the updated code:

create function [dbo].[fn_GetTimestampRangeCTE]( 
    @pFromDate datetime, 
    @pToDate datetime, 
    @pIncludeSeconds bit 
) 
returns table 
as 
return( 
    with BaseNum ( 
        N 
    ) as ( 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 union all 
    select 1 
    ), 
    L1 ( 
        N 
    ) as ( 
    select 
        bn1.N 
    from 
        BaseNum bn1 
        cross join BaseNum bn2 
        cross join BaseNum bn3 
    ), 
    L2 ( 
        N 
    ) as ( 
    select 
        a1.N 
    from 
        L1 a1 
        cross join L1 a2 
    ), 
    Tally ( 
        DateKey 
    ) as ( 

select top (cast(datediff(mi, @pFromDate, @pToDate) as bigint) * isnull(nullif(1,cast(@pIncludeSeconds as bigint)),60) + 1)

dateadd(ss, (row_number() over (order by a1.N) - 1) * isnull(nullif(cast(@pIncludeSeconds as bigint),0),60), @pFromDate)

from L2 a1 cross join L2 a2 ) select Result.DateKey, DATEPART(YEAR, Result.DateKey) AS [Year], DATEPART(QUARTER, Result.DateKey) AS [Quarter], DATEPART(WEEK, Result.DateKey) AS [Week], DATEPART(WEEKDAY, Result.DateKey) AS [WeekDayNumber], DATENAME(WEEKDAY, Result.DateKey) AS [WeekDayName], DATEPART(DAYOFYEAR, Result.DateKey) AS [DayOfYear], DATEPART(MONTH, Result.DateKey) AS [Month], DATEPART(DAY, Result.DateKey) AS [Day], DATEPART(HOUR, Result.DateKey) AS [Hour], DATEPART(MINUTE, Result.DateKey) AS [Minute], DATEPART(SECOND, Result.DateKey) AS [Second] from Tally Result );

I then ran the following code as a quick test sending the output to temporay tables:

set statistics io on;
set statistics time on;

select * into #Test1 from dbo.fn_GetTimestampRangeCTE3 ('2000-01-01','2100-01-01', 0)

set statistics time off;
set statistics io off;

set statistics io on;
set statistics time on;

select * into #Test2 from dbo.fn_GetTimestampRangeRCTE ('2000-01-01','2100-01-01', 0) OPTION (MAXRECURSION 0);

set statistics time off;
set statistics io off;


Here is the results of that run:

SQL Server Execution Times:

CPU time = 432079 ms, elapsed time = 453693 ms.

(52596001 row(s) affected)

Table 'Worktable'. Scan count 8, logical reads 2046345, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1525828 ms, elapsed time = 1553694 ms.

(52596001 row(s) affected)

 


As you can see, generating 100 years of data at the minute level took approximately 7.5 minutes using my inline-TVF versus 25.8 minutes using the one from the article.

 

 

 


Copyright © 2002-2014 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.