• Steven James Gray (6/6/2010)


    Inefficiency is relative. Slow as it is, it's possible to generate enough data to cover a century at one-minute resolution in less than half an hour, and that's without any effort applied to optimising what's there. Relative to the time and effort needed to maintain and keep on top of populating a time table manually, it seems like small fry. You'd not reprocess the whole dimension every time you loaded a fact into the tables either, but rather only incrementally populate ranges as you go.
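    For a sense of scale, the row count involved is easy to check; a century at one-minute resolution comes to roughly 52.6 million rows:

```sql
-- 36,525 days x 1,440 minutes, plus one row for the closing endpoint
SELECT DATEDIFF(minute, '2000-01-01', '2100-01-01') + 1 AS minute_rows;
-- 52596001, matching the row counts in the timings later in this thread
```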

    Actually, inefficiency is not relative; it may just not be noticeable. Inefficient is still inefficient if there is another means of generating the same results that is better.

    If you look at the code, the recursion in the example given is broken into two recursive elements - the date range and the time of day - rather than a single second-level recursive process. The query plans in this case take the date/time combinations and do a table spool over them, and then, as each day value is generated, it is cross-joined to the time combinations. That's a sum total of 36,500 recursions per century! To prove this is the case, try running the following query using the TVF from the article:

    SELECT * FROM dbo.[fn_GetTimestampRange]('1 January 2006','1 February 2006', 0) OPTION(MAXRECURSION 60)

    This will produce a rowset of 44,641 rows, but never recurses more than 60 levels, thanks to the optimisations obtained by the multiple re-use of the numbers expression within the function. The only part that actually grows in cost is the recursive step per day. Your MAXRECURSION value needs to be at least 60, and at most the number of distinct days in the range covered. As a result, this is substantially more efficient than your article would imply.
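    The shape being described can be illustrated with a stripped-down sketch (hypothetical CTE names and trivial bodies, not the article's actual function): a small numbers expression is recursed once and re-used for both hours and minutes, so only the per-day recursion grows with the range.

```sql
-- Sketch only: one 60-level numbers recursion, re-used twice, cross-joined
-- to a per-day recursion. January 2006 needs just 30 day-level recursions.
WITH Numbers (n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 59      -- 59 recursive levels, re-used below
),
Days (d) AS (
    SELECT CAST('20060101' AS datetime)
    UNION ALL
    SELECT DATEADD(day, 1, d) FROM Days WHERE d < '20060131'
)
SELECT DATEADD(minute, h.n * 60 + m.n, Days.d) AS ts
FROM Days
CROSS JOIN Numbers AS h                          -- hours, filtered to 0..23
CROSS JOIN Numbers AS m                          -- minutes 0..59
WHERE h.n < 24
OPTION (MAXRECURSION 59);
```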

    It's possible to optimise this even further by applying the same trick when resolving all the possible year/month permutations, shrinking the recursive cost further still, but I thought that would hurt readability without gaining much in performance.

    Care to prove your statement? I'm pretty sure that my routine (with a slight modification, thanks to you allowing me to spot an inefficiency in my code, which I have fixed in my blog) will consistently beat your code in head-to-head competition.

    As it stands, the current design has the advantage of being an inline TVF (and most performance work would likely prevent that), which means it can be used directly in a DSV and streams rows as fast as they're produced, rather than waiting for all rows to be computed into a temp table, as the multi-statement TVF approach would.
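    For readers unfamiliar with the distinction, here is a minimal sketch of the two TVF forms (hypothetical names and trivial bodies; neither is the code under discussion):

```sql
-- Inline TVF: a single RETURN (SELECT ...). The optimiser expands the query
-- into the caller, so rows stream out as they are produced.
CREATE FUNCTION dbo.fn_InlineSketch (@from datetime, @to datetime)
RETURNS TABLE
AS
RETURN
(
    SELECT @from AS ts
    UNION ALL
    SELECT @to
);
GO

-- Multi-statement TVF: every row is materialised into the @result table
-- variable before the caller receives any of them.
CREATE FUNCTION dbo.fn_MultiStatementSketch (@from datetime, @to datetime)
RETURNS @result TABLE (ts datetime)
AS
BEGIN
    INSERT @result (ts)
    SELECT @from UNION ALL SELECT @to;
    RETURN;
END;
GO
```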

    Have you taken a close look at my code (either version)? It is also written as an inline TVF and can be used in exactly the same fashion as yours, with the added benefit of being more efficient. After making a small modification to my code, I ran both functions to generate 100 years of data at one-minute intervals, writing the output to temporary tables:

    set statistics io on;
    set statistics time on;
    select * into #Test1 from dbo.fn_GetTimestampRangeCTE ('2000-01-01','2100-01-01', 0);
    set statistics time off;
    set statistics io off;

    set statistics io on;
    set statistics time on;
    select * into #Test2 from dbo.fn_GetTimestampRangeRCTE ('2000-01-01','2100-01-01', 0) OPTION (MAXRECURSION 0);
    set statistics time off;
    set statistics io off;

    #Test1 (fn_GetTimestampRangeCTE):

    SQL Server Execution Times:
    CPU time = 432079 ms, elapsed time = 453693 ms.
    (52596001 row(s) affected)

    #Test2 (fn_GetTimestampRangeRCTE):

    Table 'Worktable'. Scan count 8, logical reads 2046345, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 1525828 ms, elapsed time = 1553694 ms.
    (52596001 row(s) affected)

    Once again, you'll see that your function, using a recursive CTE, is quite a bit slower. And even generating 52,596,001 rows of data, mine still isn't generating the additional IO that the recursive CTE does.