Recursion with a Twist

Question

Recursion with a Twist

Phil Parkin

SSC Guru

Points: 246716
More actions
November 28, 2012 at 11:54 am

#285600

Hi - can't seem to get my head around this one, so any assistance is welcome.
I am trying to write a query against a table of invoices, returning all relevant rows for a particular InvoiceId. Here's some sample data:
--Create temp table to hold the dummy data
if object_id('tempdb..#IDs') is not null
drop table #IDs
create table #IDs (
InvoiceId int not null
,BookingId int not null
) on [PRIMARY]
go
alter table #IDs add constraint PK_IDs primary key clustered (
InvoiceId
,BookingId
)
with (
STATISTICS_NORECOMPUTE = off
,IGNORE_DUP_KEY = off
,ALLOW_ROW_LOCKS = on
,ALLOW_PAGE_LOCKS = on
) on [PRIMARY]
go
insert #IDs ( InvoiceId, BookingId)
select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)
select * from #IDs
Now, imagine that we're interested in returning data for InvoiceId 1.
What I would like to return is this:
SELECT *
FROM (
VALUES
(1,9),
(1,10),
(1,11),
(2,11),
(3,11),
(3,12),
(3,13)) x
([InvoiceId],[BookingId])
The 'twist' should now be obvious: InvoiceId 1 has BookingIds which relate to other invoices, which in turn may have BookingIds which relate to other invoices. I want to return all related rows.
If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply

CapnHector SSCoach Points: 16143 More actions · Answer 1

If there is a row (3,14) would all your sample data then be returned because of (4,14) and (5,14)? from your description i think that is correct. just trying to wrap my head around it.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Phil Parkin SSC Guru Points: 246716 More actions · Answer 2

Correct.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Alan Burstein SSC Guru Points: 61136 More actions · Answer 3

This is a bit tricky and I am still working on this. Here's what I have thusfar...

--Create temp table to hold the dummy data

if object_id('tempdb..#IDs') is not null

drop table #IDs

CREATE TABLE #IDs

(

InvoiceId int not null,

BookingId int not null,

primary key clustered (InvoiceId, BookingId)

);

INSERT #IDs VALUES (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)

GO

DECLARE @val int = 1;

WITH x AS

(

SELECT i.InvoiceId, i.BookingId

from

(

SELECT InvoiceId, BookingId

from #IDs

WHERE InvoiceId = @val

) a

JOIN #IDs i ON a.BookingId = i.BookingId

)

SELECT DISTINCT i.*

FROM x

RIGHT JOIN #IDs i ON x.InvoiceId = i.InvoiceId

WHERE x.InvoiceId IS NOT NULL

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

CapnHector SSCoach Points: 16143 More actions · Answer 4

What i came up with was to turn it into an adjacency list hierarchy and then we can recourse through that. of course i could be completly off base here but here is the code to turn it into the adjacency list

CREATE TABLE #IDsParents (

InvoiceId int not null

,BookingId int not null

,ParentInvoiceID INT

)

insert #IDsParents ( InvoiceId, BookingId)

select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

;WITH Parent AS(

SELECT a.InvoiceId, b.InvoiceId AS Parent

FROM #IDsParents a

LEFT JOIN #IDsParents b

ON a.BookingId = b.BookingId

AND a.InvoiceId > b.InvoiceId

WHERE b.InvoiceId IS NOT NULL

)

UPDATE b SET ParentInvoiceID = a.Parent

FROM Parent a

RIGHT JOIN #IDsParents b

ON a.InvoiceId = b.InvoiceId

SELECT * FROM #IDsParents

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 5

capnhector (11/28/2012)
What i came up with was to turn it into an adjacency list hierarchy and then we can recourse through that. of course i could be completly off base here but here is the code to turn it into the adjacency list
[/code]

The only problem I have with that solution method is that you can't traverse in both directions. IE: Pulling Invoice 3 won't get you back to 1 (or vice versa) depending on the parenting.

The biggest problem with this recursion is that it's self-joining. For those curious:

--am trying to write a query against a table of invoices, returning all relevant rows for a particular InvoiceId. Heres some sample data:

--Create temp table to hold the dummy data

if object_id('tempdb..#IDs') is not null

drop table #IDs

create table #IDs (

InvoiceId int not null

,BookingId int not null

) on [PRIMARY]

go

alter table #IDs add constraint PK_IDs primary key clustered (

InvoiceId

,BookingId

)

with (

STATISTICS_NORECOMPUTE = off

,IGNORE_DUP_KEY = off

,ALLOW_ROW_LOCKS = on

,ALLOW_PAGE_LOCKS = on

) on [PRIMARY]

go

insert #IDs ( InvoiceId, BookingId)

select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

select * from #IDs

DECLARE @InvoiceID INT

SET @InvoiceID = 1

;WITH rCTE AS

(SELECT

InvoiceID,

BookingID,

1 AS HierarchyLevel

FROM

#IDs

WHERE

InvoiceID = @InvoiceID

UNION ALL

SELECT

#IDs.InvoiceID,

ids2.BookingID,

rCTE.HierarchyLevel + 1 AS HierarchyLevel

FROM

rCTE

JOIN

#IDs

ON#IDs.BookingID = rCTE.BookingID

JOIN

#IDs AS ids2

ON#IDs.InvoiceID = ids2.InvoiceID

WHERE

rCTE.HierarchyLevel + 1

)

SELECT * FROM rCTE

I'm trying to work out a way to exclude previously included invoices from the detection list but you can't double-reference the CTE in the recursion, so NOT IN (SELECT) clauses and the like are out. I'm actually thinking this may need to be looped as a baseline and then worked on from there.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 6

This is NOT pretty, but it IS functional.

--am trying to write a query against a table of invoices, returning all relevant rows for a particular InvoiceId. Heres some sample data:

--Create temp table to hold the dummy data

if object_id('tempdb..#IDs') is not null

drop table #IDs

if object_id('tempdb..#TempStore') is not null

drop table #TempStore

create table #IDs (

InvoiceId int not null

,BookingId int not null

) on [PRIMARY]

go

alter table #IDs add constraint PK_IDs primary key clustered (

InvoiceId

,BookingId

)

with (

STATISTICS_NORECOMPUTE = off

,IGNORE_DUP_KEY = off

,ALLOW_ROW_LOCKS = on

,ALLOW_PAGE_LOCKS = on

) on [PRIMARY]

go

insert #IDs ( InvoiceId, BookingId)

select * from (values (1,9), (1,10), (1,11), (2,11), (3,11), (3,12), (3,13), (4,14), (5,14)) data(InvoiceId,BookingId)

select * from #IDs

DECLARE @InvoiceID INT,

@Rowcount INT

SELECT @InvoiceID = 1,

@Rowcount = 1

CREATE TABLE #TempStore

(InvoiceID INT, BookingID INT)

INSERT INTO #TempStore

SELECT

InvoiceID, BookingID

FROM

#IDs

WHERE

InvoiceID = @InvoiceID

-- Set this here, might as well not hit the loop if no records to work from.

SELECT @Rowcount = @@ROWCOUNT

WHILE @Rowcount <> 0

BEGIN

INSERT INTO #TempStore

SELECT

ids2.InvoiceID, ids2.BookingID

FROM

#IDs

JOIN

(SELECT DISTINCT BookingID FROM #TempStore) AS drv

ON#IDs.BookingID = drv.BookingID

JOIN

#IDs AS ids2

ON#IDs.InvoiceID = ids2.InvoiceID

WHERE

#IDs.InvoiceID NOT IN(SELECT DISTINCT InvoiceID FROM #TempStore)

SET @Rowcount = @@ROWCOUNT

END

SELECT * FROM #TempStore

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Alan Burstein SSC Guru Points: 61136 More actions · Answer 7

Evil Kraig F (11/28/2012)
This is NOT pretty, but it IS functional.

I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 8

Alan.B (11/28/2012)
Evil Kraig F (11/28/2012)
This is NOT pretty, but it IS functional.
I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

Whoops, my apologies Alan. I didn't realize you were code complete. This comment:

Alan.B (11/28/2012)
This is a bit tricky and I am still working on this. Here's what I have thusfar...

Misled me to believe you were still working on your solution and I breezed over it before going back to trying to force the rCTE to work.

However, you will miss recursive chains. For example, I've adjusted the inclusion set here:

INSERT #IDs VALUES (1,9), (1,10), (1,11), (2,11), (2,12), (3,12), (3,13), (4,14), (5,14)

GO

You'll notice that your code will no longer pick up Invoice 3 because You need to go from 1->2 via Booking 11, then 2->3 via Booking 12.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Alan Burstein SSC Guru Points: 61136 More actions · Answer 9

Evil Kraig F (11/28/2012)
Alan.B (11/28/2012)
Evil Kraig F (11/28/2012)
This is NOT pretty, but it IS functional.
I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.
Whoops, my apologies Alan. I didn't realize you were code complete. This comment:
Alan.B (11/28/2012)
This is a bit tricky and I am still working on this. Here's what I have thusfar...
Misled me to believe you were still working on your solution and I breezed over it before going back to trying to force the rCTE to work.
However, you will miss recursive chains. For example, I've adjusted the inclusion set here:
INSERT #IDs VALUES (1,9), (1,10), (1,11), (2,11), (2,12), (3,12), (3,13), (4,14), (5,14)
GO
You'll notice that your code will no longer pick up Invoice 3 because You need to go from 1->2 via Booking 11, then 2->3 via Booking 12.

Ahhhh.... Now I see what I was doing wrong; I knew I was missing something.

Thanks & nice work.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

CapnHector SSCoach Points: 16143 More actions · Answer 10

Alan.B (11/28/2012)
Evil Kraig F (11/28/2012)
This is NOT pretty, but it IS functional.
I am trying to understand why went with a loop vs a set-based approach. I posted a more set based method earlier that gets the same results notably faster. I am not trying to be confrontational, I'm just trying to understand your approach or what I did wrong.

~~@Alan.b If i look at your method correctly it misses InvoiceID 3.~~

@Kraig your right about the issues with my method that you cant go both directions up the tree which i had forgotten about. (dont work with hierarchies much)

EDIT: i sat to long to hit the post button so struck what was said before the post.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

thava SSCrazy Points: 2275 More actions · Answer 11

how about this one

SET @InvoiceID = 1;WITH rCTE AS

(

SELECT InvoiceID, BookingID, 1 AS HierarchyLevel

FROM #IDs

WHERE InvoiceID = @InvoiceID

UNION ALL

SELECT IDs.InvoiceID, ids2.BookingID, rCTE.HierarchyLevel + 1 AS HierarchyLevel

FROM rCTE

JOIN #IDs ids

ON IDs.BookingID = rCTE.BookingID AND

ids.InvoiceId <> rcte.InvoiceID

INNER JOIN #IDs ids2

ON ids2.InvoiceId = ids.InvoiceID

WHERE rCTE.InvoiceID<ids2.InvoiceId

)

select * from Rcte

Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script [/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 12

thava (11/29/2012)
how about this one

It can't go backwards from 3 to get back to 1 because of the limitor to reduce duplication and endless recursion. It's similar to the issue above in the hierarchy model. You need to be able to go after either InvoiceID 3 or InvoiceID 1 and get the same result list.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Phil Parkin SSC Guru Points: 246716 More actions · Answer 13

First of all, thanks to everyone who has taken the time to think about this problem and contribute. I spent over an hour blowing recursion limits left, right and centre yesterday without getting anywhere close, so I appreciate the input.

I was excited to see Alan's solution working perfectly with my data, only to test it out using Craig's alternative (and valid) data and see it not work quite so well ...

It remains a problem for us here, which for the moment we are dealing with by hard-coding multiple recursion levels. But we have examples where the actual number of recursions needed to scrape all the details together is greater than what we have coded. So any further input is most welcome.

Thanks

Phil

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Phil Parkin SSC Guru Points: 246716 More actions · Answer 14

I wrote the above response before trying Craig's solution, which certainly does the business - thanks! Maybe this is one instance where a loop is the best solution. I'll leave it a while longer (no pun intended) before I implement it, to see whether anyone comes up with a set-based solution.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.