SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Solve Problems Using Recursive CTE


Solve Problems Using Recursive CTE

Author
Message
bj_shenglong
bj_shenglong
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 63
Problem: Bank requires to find out number of months customs have spent money more then certain amount in consecutive month
or sales department requires to find out number of months a product has been sold more then certain amount in consecutive month.

If data is just in few months or just require for 2 or 3 consecutive months, then simple table join can do it.
But we have like years records, and need count all consecutive months, it will could be hard to hard coding all table join for 2, 3, 4 ,5 ... consecutive months.
There is an easy way using recursive CTE to solve such a problem.

For example, a table store sales records like below. And we need to list number of consecutive months for any sale who sold a product more then 2 in a month. It is can be done easily by following recursive CTE query

sample data
CREATE TABLE #sales(
[name] [varchar](50) NOT NULL,
[saledate] [datetime] NULL,
[quantity] [int] NULL
)

insert into #sales(name,saledate,quantity)
values
('A','2012-01-01',1),
('A','2012-02-01',2),
('A','2012-03-01',3),
('A','2012-04-01',4),
('A','2012-05-01',5),
('A','2012-06-01',6)

insert into #sales(name,saledate,quantity)
values
('B','2012-01-01',6),
('B','2012-02-01',2),
('B','2012-03-01',3),
('B','2012-04-01',4),
('B','2012-05-01',1),
('B','2012-06-01',6)

insert into #sales(name,saledate,quantity)
values
('C','2012-01-01',6),
('C','2012-02-01',1),
('C','2012-03-01',3),
('C','2012-04-01',1),
('C','2012-05-01',4),
('C','2012-06-01',1)

insert into #sales(name,saledate,quantity)
values
('D','2012-01-01',6),
('D','2012-02-01',3),
('D','2012-03-01',3),
('D','2012-04-01',4),
('D','2012-05-01',1),
('D','2012-06-01',6)

-- wm for all months in which product sold more 2
with wm as (
select name,saledate
from #sales
where quantity>2
),
-- only using above qualified records, not all records to do recursive join
base_cte (name,saledate ) as (
select * from wm
union all
select a.name,a.saledate
from wm a inner join base_cte b
on a.name=b.name and a.saledate=dateadd(month,1,b.saledate)
)

-- the count column indicates number of consecutive month for that month.
select b.name,b.saledate, COUNT(b.name) as cnt
from base_cte b
group by b.name,b.saledate
order by b.name,b.saledate

-- for example cnt = 2, meaning that month is the second month, from which, backward in 2
-- consecutive month, a product was sold more then 2 each month
-- cnt = 3, meaning that month is the third month, from which, backward in 3
-- consecutive month, a product was sold more then 2
CapnHector
CapnHector
Hall of Fame
Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)

Group: General Forum Members
Points: 3029 Visits: 1789
This looks like a gaps and island problem. it also from the problem description sounds like homework or an interview question.

Take a look at this for a detailed explanation and solution of the problem. http://www.manning.com/nielsen/nielsenMEAP_freeCh5.pdf Specifically part two chapter 5


For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw

Need to Split some strings? Jeff Moden's DelimitedSplit8K
Jeff Moden's Cross tab and Pivots Part 1
Jeff Moden's Cross tab and Pivots Part 2
mickyT
mickyT
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2710 Visits: 3318
Using Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/71550/ about Group Islands of Contiguous Dates as inspiration, you could do the following

;WITH cte AS (
SELECT name,
DATEADD(mm, - ROW_NUMBER() OVER (ORDER BY name, saledate), saledate) dategroup,
saledate
FROM #sales
WHERE quantity > 2
)
SELECT name, MIN(saledate) firstsaledate, COUNT(*) cnt
FROM cte
GROUP BY name, dategroup
ORDER BY name, dategroup


bj_shenglong
bj_shenglong
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 63
I know Jeff Moden's solution.
But I assume this way could be faster because there is not sorting for row number.
At least, it is a different solution.
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)

Group: General Forum Members
Points: 202813 Visits: 41944
bj_shenglong (12/5/2012)
But I assume this way could be faster...


That's how rumors of performance get started. :-)

Write code to build a million row test table and test your hypothesis. No matter which way it turns out, we'll all learn something if you post the results. See the following articles for how to do such a thing pretty easily and quickly.

http://www.sqlservercentral.com/articles/Data+Generation/87901/
http://www.sqlservercentral.com/articles/Test+Data/88964/

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
bj_shenglong
bj_shenglong
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 63
There is a particular problem, let's' say, we just want to find out who sold more then 2 each month in at least two consecutive months and when.

For the above sample, below recursion would work. Also, recursion will stop immediately when it reaches the first qualified date.

with m2_cte_f (name,saledate,quantity,ind) as (
select s.*, 0 as ind
from #sales s
where s.saledate='2012-01-01'
union all
select s.*, case when s.quantity > 2 and sc.quantity > 2 then 1 else 0 end as ind
from #sales s
inner join m2_cte_f sc
on (s.saledate = dateadd(month,1,sc.saledate) and s.name=sc.name)
where sc.ind = 0
)
select * from m2_cte_f where ind=1
mickyT
mickyT
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2710 Visits: 3318
So comparing the islands and recursive queries returning similar rows

;WITH cte AS (
SELECT name,
DATEADD(mm, - ROW_NUMBER() OVER (ORDER BY name, saledate), saledate) dategroup,
saledate
FROM #sales
WHERE quantity > 2 and saledate >= '2012-01-01'
)
SELECT name, max(saledate), COUNT(*)
FROM cte
GROUP BY name, dategroup
HAVING COUNT(*) > 1
ORDER BY name, dategroup

;with m2_cte_f (name,saledate,quantity,ind) as (
select s.*, 0 as ind
from #sales s
where s.saledate='2012-01-01'
union all
select s.*, case when s.quantity > 2 and sc.quantity > 2 then 1 else 0 end as ind
from #sales s
inner join m2_cte_f sc
on (s.saledate = dateadd(month,1,sc.saledate) and s.name=sc.name)
where sc.ind = 0
)
select * from m2_cte_f where ind=1



I get the following IO stats (timing not worth mentioning 1ms each) for the small test set
(3 row(s) affected)
Table '#sales____00000000009F'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(3 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#sales____00000000009F'. Scan count 2, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.



Upping the stakes a tiny bit by putting a moderate amount of data (3000 odd rows) into the table
INSERT INTO #sales (name, saledate)
SELECT *
FROM
(SELECT * FROM (VALUES('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T')) as sales(name)) names,
(SELECT TOP 156 dateadd(mm, N, '1999-12-01') saledate FROM Tally) as months

UPDATE #sales
SET quantity = RAND(Checksum(Newid())) * 5

CREATE CLUSTERED INDEX SALES_IDX1 ON #sales (saledate, name)



I get the following
(22 row(s) affected)
Table '#sales__0000000000A4'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(15 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 635, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#sales__0000000000A4'. Scan count 96, logical reads 193, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.


As I added more rows to the table the recursive query got very gradually slower and did more reads, while the islands query remained static. I got up to 75816 rows. Would you believe the had sales data back to 1770 for the same 26 people :-D
(27 row(s) affected)
Table '#sales__0000000000AD'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(19 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 1193, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#sales__0000000000AD'. Scan count 189, logical reads 380, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 5 ms.

Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)SSC Guru (202K reputation)

Group: General Forum Members
Points: 202813 Visits: 41944
bj_shenglong (12/5/2012)
There is a particular problem, let's' say, we just want to find out who sold more then 2 each month in at least two consecutive months and when.

For the above sample, below recursion would work. Also, recursion will stop immediately when it reaches the first qualified date.


Although they can be fast, recursive CTEs are still procedural in nature. The only way to know for sure is to do a test.

{Edit} Was distracted by a code promotion going on at work and I see that MickyT made just such a test. Thank you, good Sir!

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
thava
thava
SSC-Addicted
SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)SSC-Addicted (469 reputation)

Group: General Forum Members
Points: 469 Visits: 557
this is my contribution

USE tempdb
GO
IF OBJECT_ID('TestTbl') IS NOT NULL
DROP TABLE TestTbl

CREATE TABLE Testtbl (id INT PRIMARY KEY)

INSERT INTO Testtbl
(
id
)
SELECT TOP 1000 ROW_NUMBER() OVER ( ORDER BY c.object_id) id FROM sys.[columns] c ,sys.[columns] c2

DELETE FROM TestTbl WHERE id IN(SELECT top 100 ABS(CHECKSUM(NEWID())%1000) FROM sys.[columns] c)

DELETE FROM TestTbl WHERE id IN(SELECT top 100 ABS(CHECKSUM(NEWID())%1000) FROM sys.[columns] c)

SELECT * FROM TestTbl;

WITH S AS (
SELECT ROW_NUMBER() OVER (order by t.id) AS RN,t.id
FROM TestTbl t LEFT OUTER JOIN TestTbl t2
ON t.id -1= t2.id
WHERE t2.id IS NULL
),E AS (
SELECT ROW_NUMBER() OVER (order by t.id) AS RN, t.id
FROM TestTbl t LEFT OUTER JOIN TestTbl t2
ON t.id +1= t2.id
WHERE t2.id IS NULL
)

SELECT s.id AS [START], e.id AS [END] FROM S INNER JOIN E ON s.Rn= E.rn
GO


using this i try to solve your problem

USE tempdb
GO

IF OBJECT_ID('sales') IS NOT NULL
DROP TABLE sales
CREATE TABLE sales(
[name] [varchar](50) NOT NULL,
[saledate] [datetime] NULL,
[quantity] [int] NULL
)
GO
Insert into sales(name,saledate,quantity)
values
('A','2012-01-01',1),('A','2012-02-01',2),('A','2012-03-01',3),
('A','2012-04-01',4),('A','2012-05-01',5),('A','2012-06-01',6),

('B','2012-01-01',6),('B','2012-02-01',2),('B','2012-03-01',3),
('B','2012-04-01',4),('B','2012-05-01',1),('B','2012-06-01',6),

('C','2012-01-01',6),('C','2012-02-01',1),('C','2012-03-01',3),
('C','2012-04-01',1),('C','2012-05-01',4),('C','2012-06-01',1),

('D','2012-01-01',6),('D','2012-02-01',3),('D','2012-03-01',3),
('D','2012-04-01',4),('D','2012-05-01',1),('D','2012-06-01',6);
WITH Fil AS (
SELECT *
FROM sales
WHERE quantity > 2
),
S AS (
SELECT ROW_NUMBER() OVER (ORDER BY t.name) AS RN, t.NAME, MONTH (t.saledate) AS id
FROM Fil t
LEFT OUTER JOIN Fil t2
ON MONTH (t.saledate) -1 = MONTH (t2.saledate) AND
t2.name = t.name
WHERE t2.saledate IS NULL
),
E AS (
SELECT ROW_NUMBER() OVER (ORDER BY t.name) AS RN, t.NAME, MONTH (t.saledate) AS id
FROM Fil t
LEFT OUTER JOIN Fil t2
ON MONTH (t.saledate) + 1 = MONTH (t2.saledate) AND
t2.name = t.name
WHERE t2.saledate IS NULL
),
Gap AS(
SELECT e.NAME, s.id AS [DateStart], e.id AS [DateEnd]
FROM S
INNER JOIN E
ON s.Rn = E.rn
--AND s.id<>e.id
)

SELECT g.NAME ,g.Datestart, sum(CASE WHEN(sales.quantity>2) AND MONTH(sales.saledate) BETWEEN g.datestart AND g.dateend THEN 1 ELSE 0 END ) as Res
FROM sales INNER JOIN gap g ON sales.name =g.NAME AND month(sales.saledate)>= g.datestart
GROUP BY g.name,g.Datestart
ORDER BY g.name,g.Datestart


i miss the criteria



Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search