June 4, 2008 at 1:33 am
I would please like an answer to a problem I could not figure out a T-SQL query solution, although it seems trivial.
I have a database of related pairs of items of 2 same datatype columns:
Item 1 Item2
A D
A E
B F
C A
C G
C Q
I would need a table of group name and all items in related pairs from left and right columns similar to this
Group Id Item
1 A
1 D
1 E
1 C
1 G
1 Q
2 B
2 F
Thank you for a hint or a solution to that problem?
Branimir Cucek
June 20, 2008 at 7:58 am
I have got so far to this:
I have got the missing pairs from C->A dependency pair
using self joins
SELECT rawlist_1.Item2, rawlist_2.Item1
FROM rawlist INNER JOIN
rawlist rawlist_1 ON rawlist.Item2 = rawlist_1.Item2 INNER JOIN
rawlist rawlist_2 ON rawlist_1.Item1 = rawlist_2.Item2
WHERE (rawlist_1.Item2 = 'A')
then UNION with obvious first pairs
SELECT Item1, Item2
FROM rawlist
WHERE (Item1 = 'A')
plus UNION vice-versa:
SELECT Item2, Item1
FROM rawlist
WHERE (Item2 = 'A')
and then extract group members.
But it seems I should iterate in a recursive way for all other subdependencies
and for every item.
That is how it got complicated.
Any clues or comments on that?
June 20, 2008 at 8:40 am
Hmm, interesting problem, did not want to spend much time on it, so here is a solution using a temporary table:
GO
WITH cte
AS ( SELECT i.Item1
, i.Item2
FROM items AS i
WHERE Item1 <= Item2
UNION
SELECT i.Item2 AS Item1
, i.Item1 AS Item2
FROM items AS i
WHERE i.Item2 <= i.Item1
UNION ALL
SELECT i1.Item1
, i2.Item2
FROM items AS i1
JOIN items AS i2 ON i1.Item2 = i2.Item1
WHERE i1.Item1 <= i2.Item2
)
SELECT DISTINCT
c1.Item1
, dense_rank() OVER ( ORDER BY c1.Item1 ) AS treeNr
INTO #rootItems
FROM cte AS c1
WHERE NOT EXISTS ( SELECT c2.Item1
FROM cte AS c2
WHERE c2.Item2 = c1.Item1 )
GO
WITH cte
AS ( SELECT i.Item1
, i.treeNr
FROM #rootItems AS i
UNION ALL
SELECT i2.Item2
, i1.treeNr AS treeNr
FROM cte AS i1
JOIN items AS i2 ON i1.Item1 = i2.Item1
AND i2.Item1 <= i2.Item2
UNION ALL
SELECT i2.Item1
, i1.treeNr AS treeNr
FROM cte AS i1
JOIN items AS i2 ON i1.Item1 = i2.Item2
AND i2.Item1 > i2.Item2
)
SELECT *
FROM cte
ORDER BY treeNr
, Item1
To set up the whole thing:
create table items (Item1 char, Item2 char)
insert into items values('A','D')
insert into items values('A','E')
insert into items values('B','F')
insert into items values('C','A')
insert into items values('C','G')
insert into items values('C','Q')
Regards,
Andras
June 20, 2008 at 8:56 am
Ok, and here is a solution without a temptable :), just a CTE expression. Although I'm sure there is a more efficient way to do this 🙂
WITH cte
AS ( SELECT i.Item1
, i.Item2
FROM items AS i
WHERE Item1 <= Item2
UNION
SELECT i.Item2 AS Item1
, i.Item1 AS Item2
FROM items AS i
WHERE i.Item2 <= i.Item1
UNION ALL
SELECT i1.Item1
, i2.Item2
FROM items AS i1
JOIN items AS i2 ON i1.Item2 = i2.Item1
WHERE i1.Item1 <= i2.Item2
),
rootnodes_cte as
(SELECT DISTINCT
c1.Item1
, dense_rank() OVER ( ORDER BY c1.Item1 ) AS treeNr
FROM cte AS c1
WHERE NOT EXISTS ( SELECT c2.Item1
FROM cte AS c2
WHERE c2.Item2 = c1.Item1 )
),
cte2
AS ( SELECT i.Item1
, i.treeNr
FROM rootnodes_cte AS i
UNION ALL
SELECT i2.Item2
, i1.treeNr AS treeNr
FROM cte2 AS i1
JOIN items AS i2 ON i1.Item1 = i2.Item1
AND i2.Item1 <= i2.Item2
UNION ALL
SELECT i2.Item1
, i1.treeNr AS treeNr
FROM cte2 AS i1
JOIN items AS i2 ON i1.Item1 = i2.Item2
AND i2.Item1 > i2.Item2
)
SELECT *
FROM cte2
ORDER BY treeNr, Item1
The result is:
Item1 treeNr
----- --------------------
A 1
C 1
D 1
E 1
G 1
Q 1
B 2
F 2
Regards,
Andras
June 20, 2008 at 10:04 am
Mine is simpler, but it ends up reversing the group order on that sample of data:
create table #T (
Item1 char(1) not null,
Item2 char(1) not null,
constraint PK_T primary key (item1, item2))
insert into #t (item1, item2)
select 'A','D' union all
select 'A','E' union all
select 'B','F' union all
select 'C','A' union all
select 'C','G' union all
select 'C','Q'
select distinct dense_rank() over (order by item1), item1
from #t
;with
CTE (GroupID, Item) as
(select distinct dense_rank() over (order by item1), item1
from #t
where item1 not in
(select item2
from #t)
union all
select groupid, item2
from #t
inner join cte
on #t.item1 = cte.item)
select *
from cte
order by groupid, item
GroupIDItem
1B
1F
2A
2C
2D
2E
2G
2Q
If the sequence of the groups doesn't matter, then this is definitely simpler code.
To get the groups to sequence as desired, I'd have to add two more CTEs to it. One to assign the group IDs to the first entry in each group, one to join the data together and replace the original group ID from the first CTE. Easy enough to do, but is it unnecessary complexity or does it actually matter?
This query will work with up to 100 levels of grouping. If, for example, D appeared in Item1 with Z in Item2 in another row of the table, it would add that to group 2. Setting maxrecursion would allow that limit to be adjusted up or down.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 22, 2008 at 1:14 pm
Is this a correct interpretation of your original sample data?
A - D
/ E C - Q
|
G
B
F
And if I understand you corrent, you want all interconnected subgraphs ordered?
N 56°04'39.16"
E 12°55'05.25"
June 22, 2008 at 1:26 pm
What will then happen when you add the sample data
SELECT'D', 'Q' UNION ALL
and remove the sample data
SELECT'A', 'D' UNION ALL
The graph then looks like
A D
/ \ E C - Q
|
G
B
F
and the result should be similar to
Item1treeNr
A1
C1
D1
E1
G1
Q1
B2
F2
and NOT look like this?
Item1treeNr
A1
C1
E1
G1
Q1
B2
F2
D3
Q3
N 56°04'39.16"
E 12°55'05.25"
June 22, 2008 at 2:40 pm
Not that goodlooking but it will do the job.
DECLARE@Items TABLE (RowID INT IDENTITY(1, 1), Item1 CHAR, Item2 CHAR)
INSERT@Items
SELECT'D', 'Q' UNION ALL
SELECT'A', 'E' UNION ALL
SELECT'B', 'F' UNION ALL
SELECT'C', 'A' UNION ALL
SELECT'C', 'G' UNION ALL
SELECT'C', 'Q'
/*
A D
/ \ E C - Q
|
G
B
F
*/
DECLARE@Yak TABLE
(
Item CHAR PRIMARY KEY CLUSTERED,
Grp INT
)
INSERT@Yak
(
Item,
Grp
)
SELECTItem1,
MIN(RowID)
FROM(
SELECTItem1,
RowID
FROM@Items
UNION ALL
SELECTItem2,
RowID
FROM@Items
) AS d
GROUP BYItem1
DECLARE@RowID INT
SELECT@RowID = MAX(RowID)
FROM@Items
WHILE @RowID IS NOT NULL
BEGIN
UPDATEy
SETy.Grp = x.Seq
FROM@Yak AS y
INNER JOIN(
SELECTy.Item,
y.Grp,
MIN(y.Grp) OVER () AS Seq
FROM@Items AS i
INNER JOIN@Yak AS y ON y.Item IN (i.Item1, i.Item2)
WHEREi.RowID = @RowID
) AS x ON x.Item = y.Item
OR x.Grp = y.Grp
SELECT@RowID = MAX(RowID)
FROM@Items
WHERERowID < @RowID
END
SELECTItem,
Grp
FROM@Yak
ORDER BYGrp,
Item
N 56°04'39.16"
E 12°55'05.25"
June 22, 2008 at 4:06 pm
I think that this is the "connected subgraphs" problem from my Discrete Algorithms class in college. Of course it was in Pascal, not SQL, and it was 30 years ago...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 22, 2008 at 9:20 pm
This example seems very efficient:
DECLARE @Items TABLE (ID INT IDENTITY(1, 1), Item1 CHAR, Item2 CHAR, Grp bigint)
INSERT @Items (Item1, Item2)
SELECT 'D', 'Q' UNION ALL
SELECT 'A', 'E' UNION ALL
SELECT 'B', 'F' UNION ALL
SELECT 'C', 'A' UNION ALL
SELECT 'C', 'G' UNION ALL
SELECT 'C', 'Q' UNION ALL
SELECT 'H', 'I' UNION ALL
SELECT 'I', 'J' UNION ALL
SELECT 'J', 'K' UNION ALL
SELECT 'K', 'H' UNION ALL
SELECT 'L', 'M' UNION ALL
SELECT 'M', 'N' UNION ALL
SELECT 'N', 'O' UNION ALL
SELECT 'O', 'P' UNION ALL
SELECT 'P', 'R' UNION ALL
SELECT 'R', 'S' UNION ALL
SELECT 'S', 'T' UNION ALL
SELECT 'T', 'U'
/*
A D
/ \ E C - Q
|
G
B
F
I
/ H J
\ /
K
L-M-N-O-P-R-S-T-U
*/
--======Make the Connected table
Create Table #Connected (Node char(1) NOT NULL, ConnectedTo char(1) NOT NULL)
ALTER TABLE #Connected ADD CONSTRAINT [PK_#Nodes] PRIMARY KEY CLUSTERED (Node ASC, ConnectedTo ASC)
--start with the first-order connections:
INSERT into #Connected
Select Item1, Item2 from @Items
UNION
Select Item2, Item1 from @Items
UNION
Select Item1, Item1 from @Items
Declare @rows int
Set @rows = 1
--====== Loop until done
While @rows > 0
BEGIN
--======Now Extend it one Order:
Insert into #Connected
Select distinct C1.Node, C2.ConnectedTo
From #Connected C1
Join #Connected C2
ON C1.ConnectedTo = C2.Node--transitive property
Where NOT EXISTS (Select * From #Connected C11
Where C11.Node = C1.Node
And C11.ConnectedTo = C2.ConnectedTo
)
Set @rows = @@RowCount
END
--======Report Results
Select Node, Ascii(Min(Upper(ConnectedTo))) - 47 as [Group]
From #Connected
Group By Node
drop table #Connected
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
July 1, 2008 at 7:50 am
Thank you all for giving me an efficient answer to the real problem.
However, there are some other questions.
It seems that the group naming after the ASCII value of the lowest named item is inconvenient.
The letters in the rows in real database represent different items(car parts) as 6-12 digit varchars. There are just
represented as letters here for clearer picture of the problem.
Anyway, taking from what we have, it is easier if you just take a MIN(ConnectedTo) element as a unique group name.
There is no constraint on the name of the group - it just need to be unique.
I have adopted this and begun to think about maintenance of the living cross-reference table.
Indeed, if you want to join two groups into one you would just need to insert a new pair
with items from different groups and rerun the query and the cross reference table would rearange itself.
To add or remove an item from group is trivial, but the big problem is when you want to remove the item which is
actually a group name in the Group column ( MIN(ConnectedTo) element from the original query). You would need
some dummy element pair which will now qualify as group name ( MIN(ConnectedTo) element ) in the query rerun and then
remove the pair "dummy element"-"last group column". But after a while i think the cross reference table would be a mess.
Or Maybe if there is a really some unique name of the group indepedent from item names generated in the first place it would be no problem but how can this be accomplished and maintained?
Which is a better way to pursue?
Branimir Cucek
July 1, 2008 at 2:54 pm
Is there a physical relationship between the items? Like, PartA is a component of PartB? (Classic bill of materials.) Or is the connection between items arbitrary? ("Buy one shock absorber, get two spark plugs free!" kind of stuff.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 2, 2008 at 6:36 am
No, it is not classical Bill of Materials. It is more like you sell shock absorbers from many manufacturers for various types of cars , and every manufacturer/brand has it own catalog part number for a particular car. Also, the parts from the same brands can also have more than one number (old/new number with different prices, quality I or II etc.).
The aim is that using only one number you obtain from one catalog or the brand number customer gives you, you have to know what other shock absorbers of different brands you have on stock for the particular car that fits so that the customer/seller can choose without searching different catalogs for various brands.
Hope it helps.
Branimir Cucek
July 2, 2008 at 7:42 am
So, when this returns multiple groups, what is it actually giving you?
I'm envisioning a proc that has a part number as an input parameter, and which gives you all the equivalent parts. Is that what you're looking for? Or is this more of a report of everything in the system all at once?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 2, 2008 at 9:02 am
That is it,
... a procedure that has a part number as an input parameter, and which gives you all the equivalent parts.
What, in your opinion, is a best strategy for maintenance of such a table
Branimir Cucek
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply