June 18, 2013 at 6:30 am
ChrisM@Work (6/18/2013)
sagesmith (6/18/2013)
Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:
http://www.sqlservercentral.com/Forums/FindPost1464526.aspx
Feel free to write your fastest cursor-based solution π
π That was really funny.
But, I think( and truly hope ) it was a typo by the poster when he said "cursor far outperforms the recursive cte".
If in case he really meant it, even I would love to be enlightened π
How to post data/code on a forum to get the best help - Jeff Moden
http://www.sqlservercentral.com/articles/Best+Practices/61537/
June 18, 2013 at 6:34 am
ChrisM@Work (6/18/2013)
sagesmith (6/18/2013)
Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:
http://www.sqlservercentral.com/Forums/FindPost1464526.aspx
Feel free to write your fastest cursor-based solution π
My comment was for the author of the article who was comparing rCTE to Cursor. Your Tally approach is great for that sample scenario, no argument. I have found, as others have mentioned, that as you attempt to scale the rCTE approach the performance seems to go in an inverse relationship to the amount of rows. In these scenarios a Cursor or the Tally approach (inline or maybe faster as a Numbers table on disk?) does much better than the rCTE.
I have seen these enlightened articles on rCTE before and I think they should come with a word of caution (unless I'm the one who is not enlightened? please prove me wrong).
June 18, 2013 at 7:45 am
If I may make a fine point:
It is more accurate to say that in a multi-statement batch, the statement (query) preceding a CTE should be terminated by a semi-colon.
I find the following just a little easier to grasp: (Change MonthCount to int)
With CTE_Base (SeqNo, Begin_Date, Month_Count, Payment)
as
(select SeqNo, Date_Field AS Begin_Date,Month_Count, Payment from dbo.abc
UNION all
select SeqNo, dateadd(mm, 1, Begin_Date), Month_Count -1, Payment from CTE_Base
where Month_Count > 1
)
SELECT SeqNo, Begin_Date, Month_Count, Payment
FROM CTE_Base
order by SeqNo, Begin_Date
June 18, 2013 at 7:53 am
your Tally kills the rCTE and hybrid Cursor/Numbers as you can see. On my box:
Inline Tally = 146 milliseconds
rCTE = 2113 milliseconds
Cursor/Numbers = 1183 milliseconds
The Cursor based solution is only partially so, I still couldn't bring myself to copy the author's while inner loop...yikes. But the point I'm trying to make is with bashing cursors:
http://stackoverflow.com/questions/287445/why-do-people-hate-sql-cursors-so-much
see most popular answer under The Size Issue heading.
--If the table exists, drop it.
if exists (select * from sys.tables
where name = 'abc'
and Schema_Name(schema_id) = 'dbo'
and [type] = 'U')
drop table dbo.abc
go
--Create the source table, dbo.abc.
create table dbo.abc
(SeqNo smallint
,Date_Field datetime
,Month_Count int
,Payment decimal(10,2))
go
--Populate the source table, dbo.abc
insert into dbo.abc (SeqNo, Date_Field, Month_Count, Payment)
values (1, '20090101', 95000, 100)
,(2, '20100101', 7000, 200)
,(3, '20110101', 5000, 300)
go
--If exists, drop the destination table dbo.def
if exists (select * from sys.tables
where name = 'def'
and Schema_NAME(schema_id) = 'dbo'
and [type] = 'U')
drop table dbo.def
go
--Create the destination table, dbo.def
create table dbo.def
(SeqNo smallint
,Date_Field smalldatetime
,Payment decimal(10,2))
go
------------ Inline Tally version ------------------
-- black hole variables:
DECLARE @SeqNo SMALLINT, @Date_Field DATETIME, @Payment DECIMAL (10,2)
DECLARE @StartTime DATETIME;
-- time store
SET @StartTime = GETDATE()
;WITH E1(n) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows
E2(n) AS (SELECT 1 FROM E1 a, E1 b),-- 10x10 rows
E3(n) AS (SELECT 1 FROM E2 a, E2 b, E2 c)-- 100x100x100 rows
SELECT
@SeqNo = SeqNo,
@Date_Field = DATEADD(MONTH,x.n,Date_Field),
@Payment = Payment
FROM dbo.abc
CROSS APPLY(SELECT TOP(Month_Count) n = -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E3) x
SELECT 'Inline Tally = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'
PRINT '-------------------------------------------------------------------------'
-------------- rCTE version ------------------------------
SET @StartTime = GETDATE()
;with CTE_Base (SeqNo, Date_Field, Month_Count, Payment, Begin_Date, End_Date, Frequency)
as
(select SeqNo, Date_Field, Month_Count, Payment, Date_Field, dateadd(mm, Month_Count-1, Date_Field), 1 from dbo.abc
union all
select SeqNo, dateadd(mm, Frequency, Date_Field), Month_Count, Payment, Begin_Date, End_Date, Frequency
from CTE_Base
where dateadd(mm, Frequency, Date_Field) between Begin_Date and End_Date)
select
@SeqNo = SeqNo,
@Date_Field = Date_Field,
@Payment = Payment
from CTE_Base
where Date_Field between Begin_Date and End_Date
order by SeqNo, Date_Field
OPTION(MAXRECURSION 0)
SELECT 'rCTE = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'
PRINT '-------------------------------------------------------------------------'
-------------- CURSOR version ------------------------------
SET @StartTime = GETDATE()
declare @l_SeqNo int
,@l_date_field DATETIME
,@l_Month_Count int
,@l_Payment decimal(10, 2)
,@l_counter SMALLINT
,@l_max_month_count INT = (SELECT MAX(Month_Count) FROM dbo.abc);
SELECT TOP (@l_max_month_count) IDENTITY(int,1,1) AS number
into #Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
select @l_counter = 0
set nocount on;
declare i_Cursor CURSOR FAST_FORWARD
FOR
select SeqNo, Date_Field, Month_Count, Payment from dbo.abc
open i_Cursor
fetch next from i_Cursor into
@l_SeqNo
,@l_date_field
,@l_Month_Count
,@l_Payment
while @@fetch_status = 0
begin
select number, dateadd(mm, number-1, @l_date_field), @l_Payment
FROM #Numbers
WHERE number<=@l_Month_Count
fetch next from i_Cursor into
@l_SeqNo
,@l_date_field
,@l_Month_Count
,@l_Payment
end
close i_Cursor
deallocate i_Cursor
set nocount off;
SELECT 'Cursor/Numbers = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'
PRINT '-------------------------------------------------------------------------'
DROP TABLE #Numbers;
June 18, 2013 at 7:55 am
sagesmith (6/18/2013)
ChrisM@Work (6/18/2013)
sagesmith (6/18/2013)
Please post the times each approach takes, this is the only basis for a conclusion. As others have already mentioned, in many scenarios the cursor far outperforms the recursive cte.Over to you then. Sample data setup is a page or two back. I've compared iTally vs rCTE here:
http://www.sqlservercentral.com/Forums/FindPost1464526.aspx
Feel free to write your fastest cursor-based solution π
My comment was for the author of the article who was comparing rCTE to Cursor. Your Tally approach is great for that sample scenario, no argument. I have found, as others have mentioned, that as you attempt to scale the rCTE approach the performance seems to go in an inverse relationship to the amount of rows. In these scenarios a Cursor or the Tally approach (inline or maybe faster as a Numbers table on disk?) does much better than the rCTE.
I have seen these enlightened articles on rCTE before and I think they should come with a word of caution (unless I'm the one who is not enlightened? please prove me wrong).
I can show you rCTE's scaling well and performing remarkably well too - threads and articles are linked in Dwain's article in my signature below. One or two will surprise you including a super-fast distinct documented by Paul White, and very fast running totals. I can tell you haven't yet read it π
I've seen rCTE's performing poorly against other coding methods too. Even posted one or two. What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 18, 2013 at 8:16 am
Jonathan AC Roberts (6/18/2013)
It can also be done even more simply with an INNER JOIN and tally table (and is faster than the rCTE version)
SELECT A.Seqno,
DATEADD(month, T.N, A.Date_field) Date_Field,
A.Payment
FROM dbo.abc A
INNER JOIN dbo.tsqlc_Tally T
ON T.N < A.Month_Count
I use this tally table it starts from zero and has just over a million rows.
That's the solution I was thinking about while reading this scenario.
I was also wondering if this was close to real code or an entirely contrived example - I don't understand the business case for creating data like this.
Also, what is the impact on logs for potentially large (10M+) insert-all-at-once versus the cursor solution? Was the target table being used in a transactional system? If yes, the locks held to do a large insert might be felt by transaction customers while the row-locks for a single insert might not. Another extreme use-case would be cross-server inserts - distributed transaction overhead can become prohibitive... and sometimes is sidestepped by calling a 4part name SP with cursor variables.
fwiw: Of course I'm an advocate of set-based solutions. These are just a few examples of why the default answer for almost any performance question is "it depends" π
June 18, 2013 at 8:47 am
If "Show Actual Execution Plan" is set prior to executing the code and then both the queries (the CURSOR based one and the recursive CTE based one) are executed, you will note that the query cost of the conventional CURSOR based approach to this problem's solution is around 91% where as it is around 9% with the recursive CTE based solution. That is almost a ten fold savings in resource utilization with the recursive CTE based solution.
As far as I understand, query costs are only comparable within the single execution plan. You can compare execution plans to determine how the optimizer handles the different queries, but the percent values are only relative to a single execution plan. 9% of a 10 minute query is more than 91% of a 10 second query. To compare different queries, you need to use more absolute metrics such as I/O or CPU time.
Wes
(A solid design is always preferable to a creative workaround)
June 18, 2013 at 8:52 am
One other thing:
Please note that I have used TOP 100 PERCENT in the SELECT statement in this function because I want to retain the ORDER BY clause, which is otherwise not allowed in InLine table-values functions.
Top 100 Percent... Order by no longer has any effect on the result. SQL 2008 ignores that combination. You can read more here.
Wes
(A solid design is always preferable to a creative workaround)
June 18, 2013 at 8:53 am
whenriksen (6/18/2013)
If "Show Actual Execution Plan" is set prior to executing the code and then both the queries (the CURSOR based one and the recursive CTE based one) are executed, you will note that the query cost of the conventional CURSOR based approach to this problem's solution is around 91% where as it is around 9% with the recursive CTE based solution. That is almost a ten fold savings in resource utilization with the recursive CTE based solution.
As far as I understand, query costs are only comparable within the single execution plan. You can compare execution plans to determine how the optimizer handles the different queries, but the percent values are only relative to a single execution plan. 9% of a 10 minute query is more than 91% of a 10 second query. To compare different queries, you need to use more absolute metrics such as I/O or CPU time.
Comparing query costs in this manner simply doesn't work with rCTE's, the relative cost is almost always wildly underestimated. In the example I posted a few back, the inline tally solution and the rCTE solution were costed at very roughly 50% each - about 30-fold out.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 18, 2013 at 9:04 am
What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?
The first article by Jeff Moden in your signature makes the point pretty well.
June 18, 2013 at 9:12 am
sagesmith (6/18/2013)
What I haven't seen is this "inverse relationship" i.e. where the rCTE fails to scale. Do you have an example?
The first article by Jeff Moden in your signature makes the point pretty well.
Forum etiquette? :blush: have I upset someone?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 18, 2013 at 9:56 am
Hi Mike,
Thanks for the nice discussion.
This was indeed a real life example in Finance domain. I just broke down its various components for creating a simple example. In the real scenario, we had around 63 million records to process like this. In other words, there were 63 million distinct SeqNo's in the source table, dbo.abc. But the value of month count would never go above 12, the number of months in a financial year. This was just a one time process though. Of course, we had other fields in our source table that made the logic more complex. But for this article, all I did was remove the unnecessary complexity out so that the concept could be brought forward. In our case, the cursor based apporach took roughly six hours for execution. On the other hand, the rCTE based solution took only 3 minutes. I agree that I could have used the Tally table as well. In our real case, I was satisfied with the rCTE usage as a solution to this problem. Tally would have been even better performance-wise but I did not want to create another physical table or a temp table to store the Tally records. My self-imposed goal was to use only existing tables.
------Thanks and Best Regards,
Suresh.
June 18, 2013 at 10:41 am
I think you missed the point that chris was making, if you look at his example he uses an Inline CTE to act as a Tally table.
As you have a fixed number of Periods (12) you could have done something like
SELECT
Tabe.columnlist
, DATEADD(MONTH,y.N,Table.aDate) Period
From
Table
CROSS APPLY (SELECT TOP(MONTH_COUNT) N-1
FROM (VALUES(1),(2),(3),(4),(5),(6)
,(7),(8),(9),(10),(11),(12))
x (N)) y
if you really wanted you could get rid of the N-1 by starting the VALUES seqence from 0-11.
If you want a large tally table then Chris's CROSS JOIN CTE method works well.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
June 18, 2013 at 11:32 am
Jason-299789 (6/18/2013)
I think you missed the point that chris was making, if you look at his example he uses an Inline CTE to act as a Tally table.As you have a fixed number of Periods (12) you could have done something like
SELECT
Tabe.columnlist
, DATEADD(MONTH,y.N,Table.aDate) Period
From
Table
CROSS APPLY (SELECT TOP(MONTH_COUNT) N-1
FROM (VALUES(1),(2),(3),(4),(5),(6)
,(7),(8),(9),(10),(11),(12))
x (N)) y
if you really wanted you could get rid of the N-1 by starting the VALUES seqence from 0-11.
If you want a large tally table then Chris's CROSS JOIN CTE method works well.
Hi Jason, I completely agree with your method above and Chris's method with Tally inside CTE. There is no doubt that both the methods would be faster. I was just replying to Mike's query above where the Tally table is being separately built and populated prior to being used in the main query.
Another reason why I chose a set-based approach rather than a row-by-row insert in the cursor-based approach is that in my scenario, I did not want to lock the destination table dbo.def around seven hundred million times, once per insert (I had 63 million distinct rows in dbo.abc, the source table). In the set-based approach, it would lock dbo.def only once and then release the lock post the insert operation. I preferred a minor inconvenience for 3 minutes over a nagging problem for 6 hours. I am sure that with Tally, the 3 minutes would get even further reduced. Great solution. Thanks to you and Chris for the insights.
----Thanks and Best Regards,
Suresh.
June 18, 2013 at 12:11 pm
Please forgive my lack of etiquette. I meant to reference this article by Jeff Moden[/url].
I am humbly submitting this code for review by the gurus here on this forum. I think it might show that the CURSOR might be unfairly cast as the villain for this problem. By using the CURSOR in a similar way and just changing the INNER WHILE loop I am able to halve the time it takes the rCTE on my box. Is the CURSOR, as it is used in the example, the bottleneck?
Viewing 15 posts - 16 through 30 (of 68 total)
You must be logged in to reply to this topic. Login to reply