
Sins of SQL: The Time Table
Posted Sunday, June 6, 2010 11:42 PM
Valued Member


Group: General Forum Members
Last Login: Friday, March 4, 2011 11:38 AM
Points: 53, Visits: 99
Lynn Pettis (6/6/2010)
Steve,

You really are missing the entire point. I am not disputing what the function does but HOW it does it. Your use of a recursive CTE is inefficient REGARDLESS of the application of the overall code in this regard. What, may I ask, is so complex about my code in comparison to yours? Is it how I have formatted the code?


I'm not missing the point. The benefit you will extract from said code improvements over your lifetime has likely been exceeded by the energy you've exhausted debating and examining the matter. If this were code I was recommending for a high-volume OLTP application, I'd likely be more enthused about a performance debate, and would put as much energy into that aspect as I have into the other areas of discussion. It's not, so I won't.

I look forward to our next discussion.
Post #933339
Posted Monday, June 7, 2010 8:31 AM


SSCertifiable


Group: General Forum Members
Last Login: Friday, September 12, 2014 9:43 AM
Points: 7,284, Visits: 13,824
Lynn Pettis (6/1/2010)
Its lack of scalability is its biggest problem. A recursive CTE is still RBAR (a Modenism for Row By Agonizing Row).


Here's a little comparison between the "quirky update" and the rCTE which supports most folks' observation that the rCTE is slower. However, I'd disagree with you, Lynn, that the rCTE fails to scale: the "quirky update" is 5-7 times faster whether the dataset is 100k rows or 1 million rows. Sure, the rCTE is slower - but it does scale.

Here's the test with 1 million rows:
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers;
SELECT TOP 1000000
    n = ROW_NUMBER() OVER (ORDER BY a.name),
    CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b;
CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC);


SET STATISTICS IO ON
SET STATISTICS TIME ON

-- 'Quirky' update: the double assignment sets CalcValue to @Lastval + n,
-- then carries the current n forward in @Lastval for the next row
DECLARE @Lastval INT = 0, @CalcValue BIGINT;
UPDATE #Numbers SET
    @CalcValue = CalcValue = (@Lastval + n),
    @Lastval = n;
-- (1,000,000 row(s) affected) / CPU time = 4218 ms, elapsed time = 5719 ms.
-- Table #Numbers... Scan count 1, logical reads 3113, physical reads 6, read-ahead reads 3146, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
SELECT n.n,
CalcValue = CAST(n.n AS BIGINT)
FROM #Numbers n
WHERE n.n = 1
UNION ALL
SELECT n.n,
CalcValue = n.n + c.n
FROM #Numbers n
INNER JOIN Calculator c ON c.n + 1 = n.n -- each step joins the next row (n.n = c.n + 1)
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (1,000,000 row(s) affected) / CPU time = 32438 ms, elapsed time = 35148 ms.
-- Table 'Worktable'. Scan count 2, logical reads 6000001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

And with 100k rows:
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers;
SELECT TOP 100000
    n = ROW_NUMBER() OVER (ORDER BY a.name),
    CalcValue = CAST(NULL AS BIGINT)
INTO #Numbers
FROM master.dbo.syscolumns a, master.dbo.syscolumns b;
CREATE UNIQUE CLUSTERED INDEX CIn ON #Numbers ([n] ASC);


SET STATISTICS IO ON
SET STATISTICS TIME ON

-- 'Quirky' update: the double assignment sets CalcValue to @Lastval + n,
-- then carries the current n forward in @Lastval for the next row
DECLARE @Lastval INT = 0, @CalcValue BIGINT;
UPDATE #Numbers SET
    @CalcValue = CalcValue = (@Lastval + n),
    @Lastval = n;
-- (100000 row(s) affected) / CPU time = 454 ms, elapsed time = 526 ms.
-- Table #Numbers... Scan count 1, logical reads 314, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

-- Recursive CTE
;WITH Calculator (n, CalcValue) AS (
SELECT n.n,
CalcValue = CAST(n.n AS BIGINT)
FROM #Numbers n
WHERE n.n = 1
UNION ALL
SELECT n.n,
CalcValue = n.n + c.n
FROM #Numbers n
INNER JOIN Calculator c ON c.n + 1 = n.n -- each step joins the next row (n.n = c.n + 1)
)
SELECT n, CalcValue
FROM Calculator
OPTION (MAXRECURSION 0)
-- (100000 row(s) affected) / CPU time = 3203 ms, elapsed time = 3483 ms.
-- Table 'Worktable'. Scan count 2, logical reads 600001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

Whichever measure of "resource used" you choose for comparison, both methods appear to scale in a linear manner.

If you've absolutely got to do a job row by row, such as running totals, then a quirky update is virtually guaranteed to complete before a rCTE. That's not the whole picture though. The rCTE gives you a result set which you may then have to write back (as sketched below); the quirky update does the opposite, modifying the table in place. The rCTE is often quicker to write with a little practice, too. I use both methods, sometimes even writing both for a particular job and then choosing one or the other on merit - which might be readability over performance.
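
As an illustration of that write-back difference, here's a minimal sketch of persisting the rCTE's output with a joined UPDATE, reusing the #Numbers table from the tests above (the write-back step itself was not part of the timed tests) - the quirky update needs no such step because it modifies the rows in place:

-- Sketch only: write the rCTE output back into #Numbers
-- The quirky update skips this step entirely because it updates in place
;WITH Calculator (n, CalcValue) AS (
    SELECT n.n,
        CalcValue = CAST(n.n AS BIGINT)
    FROM #Numbers n
    WHERE n.n = 1
    UNION ALL
    SELECT n.n,
        CalcValue = n.n + c.n
    FROM #Numbers n
    INNER JOIN Calculator c ON c.n + 1 = n.n
)
UPDATE tgt SET
    CalcValue = c.CalcValue
FROM #Numbers tgt
INNER JOIN Calculator c ON c.n = tgt.n
OPTION (MAXRECURSION 0)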

Cheers

ChrisM


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #933538
Posted Tuesday, June 8, 2010 7:28 PM
SSC Rookie


Group: General Forum Members
Last Login: Thursday, September 11, 2014 8:17 AM
Points: 44, Visits: 156
The other point not mentioned is that, for DimDate and DimTime, the real reason - over and above the performance of joining via an int (and typically not going via the DimDate either) - is the additional information required.

Almost all organisations have a financial calendar that is NOT the normal calendar, and having both of these in the table is essential.

The same goes for time: shifts are NOT (legislatively, in most countries) 24 hours long, and they don't line up with the 24-hour clock anyway.
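
As a rough sketch of what I mean (the table and column names here are purely for illustration, not from the article), the extra attributes might look something like this:

-- Illustration only: typical extra attributes beyond the surrogate int keys
CREATE TABLE dbo.DimDate (
    DateKey       INT      NOT NULL PRIMARY KEY,  -- e.g. 20100607
    CalendarDate  DATE     NOT NULL,
    CalendarYear  SMALLINT NOT NULL,
    CalendarMonth TINYINT  NOT NULL,
    FiscalYear    SMALLINT NOT NULL,              -- financial calendar, often offset from the calendar year
    FiscalPeriod  TINYINT  NOT NULL,
    FiscalWeek    TINYINT  NOT NULL
)

CREATE TABLE dbo.DimTime (
    TimeKey    INT         NOT NULL PRIMARY KEY,  -- e.g. seconds since midnight
    TimeOfDay  TIME(0)     NOT NULL,
    HourOfDay  TINYINT     NOT NULL,
    ShiftName  VARCHAR(20) NOT NULL,              -- shifts rarely align with the 24-hour clock
    ShiftStart TIME(0)     NOT NULL,
    ShiftEnd   TIME(0)     NOT NULL
)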
Post #934347
Posted Tuesday, July 12, 2011 12:06 AM
Grasshopper


Group: General Forum Members
Last Login: Saturday, December 15, 2012 4:09 PM
Points: 13, Visits: 65
Hi Steve,

Great article on the fn_gettimerangestamp function. I've been using a date table for some time, at a few people's suggestion, for a variety of issues at work. I'm not in IT, just a manager with limited resources and a bit of a SQL aficionado. Anyway, I'm in the process of setting up a reporting database/cube and I'm definitely going to try to make use of your function. I work in transit and everything is measured in seconds (i.e. revenue hours, deadhead hours, layover hours, etc. all need to be calculated to at least the minute, but are stored in seconds in the source database). So this will be a godsend if it helps improve performance.

I was wondering if you've ever had any ideas on setting up time tables (not in the SQL sense, but schedules). I've been working on pivoting times on routes using SQL PIVOT (something like the sketch below), but it's a bit cumbersome. Some of the variables I encounter: I have fixed routes with fixed patterns, but routes will sometimes start service mid-trip, so the earliest time is not actually at the normal starting point. The other issue is that there are a lot of repeat stops when a route does a loop.
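
For example, here's a minimal sketch of the kind of pivot I mean - the table and column names (StopTimes, TripID, StopSequence, DepartSeconds) are made up for illustration, and pivoting on stop sequence rather than stop ID is one way to cope with the repeated stops on a loop:

-- Sketch only: one departure-time column per stop position on a trip
SELECT TripID,
    [1] AS Stop1, [2] AS Stop2, [3] AS Stop3, [4] AS Stop4
FROM (
    SELECT TripID, StopSequence, DepartSeconds
    FROM dbo.StopTimes
) src
PIVOT (
    MIN(DepartSeconds) FOR StopSequence IN ([1], [2], [3], [4])
) p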

I can provide more background if you've worked on anything like this, just curious.

Thanks for the great info!
Craig
Post #1140141