Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««123»»

Another grouping problem Expand / Collapse
Author
Message
Posted Wednesday, May 15, 2013 2:43 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:09 AM
Points: 3,420, Visits: 5,344
Cadavre (5/13/2013)
I really don't have time to dig into this, work is a bet hectic. When I've got some spare time, I'll try to remember to revisit it. I'm fairly certain that this is a recursive CTE problem, so Dwain may well be your man (he loves recursive CTEs).


Dwain who?



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1452980
Posted Wednesday, May 15, 2013 6:22 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:09 AM
Points: 3,420, Visits: 5,344
Sorry to disappoint you guys but I wasn't able to come up with anything.

My gut was telling me an rCTE wouldn't do it and the set-based loop I tried wouldn't resolve it out properly either.

Of course, I could be wrong as I'm a bit out of practice writing rCTEs!



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1453044
Posted Wednesday, May 15, 2013 6:26 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:37 AM
Points: 6,748, Visits: 13,890
dwain.c (5/15/2013)
Sorry to disappoint you guys but I wasn't able to come up with anything.

My gut was telling me an rCTE wouldn't do it and the set-based loop I tried wouldn't resolve it out properly either.

Of course, I could be wrong as I'm a bit out of practice writing rCTEs!


I'm not surprised, Dwain. It's quite tricky. The more obvious methods like MAX() OVER() won't work when you expect them to, in the recursive part.


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1453046
Posted Wednesday, May 15, 2013 8:44 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 3:41 AM
Points: 2,379, Visits: 7,564
ChrisM@Work (5/15/2013)
dwain.c (5/15/2013)
Sorry to disappoint you guys but I wasn't able to come up with anything.

My gut was telling me an rCTE wouldn't do it and the set-based loop I tried wouldn't resolve it out properly either.

Of course, I could be wrong as I'm a bit out of practice writing rCTEs!


I'm not surprised, Dwain. It's quite tricky. The more obvious methods like MAX() OVER() won't work when you expect them to, in the recursive part.


I haven't tried, as I said I'm pretty busy, but I thought a combination of a MAX() OVER() and a recursive CTE was going to do the business. Ah well.

Thanks for taking a look Dwain, be nice to see where your efforts took you.



Not a DBA, just trying to learn

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better, quicker answers on SQL Server performance related questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



If you litter your database queries with nolock query hints, are you aware of the side effects?
Try reading a few of these links...

(*) Missing rows with nolock
(*) Allocation order scans with nolock
(*) Consistency issues with nolock
(*) Transient Corruption Errors in SQL Server error log caused by nolock
(*) Dirty reads, read errors, reading rows twice and missing rows with nolock


LinkedIn | Blog coming soon (for sufficiently large values of "soon" )!
Post #1453143
Posted Wednesday, May 15, 2013 6:16 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:09 AM
Points: 3,420, Visits: 5,344
Cadavre (5/15/2013)
Thanks for taking a look Dwain, be nice to see where your efforts took you.


Very well. For what it's worth:

DECLARE @pairs TABLE (dropped_id INT, retained_id  INT)

INSERT INTO @pairs
--select 15799 , 8388 union all
--select 52941, 8388 union all
--select 52941, 15799 union all
--select 80833, 8388 union all
--select 80833 , 15799
SELECT 34336, 14553 UNION ALL
SELECT 39352, 14553 UNION ALL
SELECT 39352, 34336 UNION ALL
SELECT 39776, 34123 UNION ALL
SELECT 47833, 14553 UNION ALL
SELECT 47833, 34336 UNION ALL
SELECT 47833, 39352 UNION ALL
SELECT 47969, 14553 UNION ALL
SELECT 47969, 34336 UNION ALL
SELECT 47969, 39352 UNION ALL
SELECT 47969, 47833 UNION ALL
SELECT 48460, 14553 UNION ALL
SELECT 48460, 34336 UNION ALL
SELECT 48460, 39352 UNION ALL
SELECT 48460, 47833 UNION ALL
SELECT 48460, 47969 UNION ALL
SELECT 48463, 14553 UNION ALL
SELECT 48463, 34336 UNION ALL
SELECT 48463, 39352 UNION ALL
SELECT 48463, 47833 UNION ALL
SELECT 48463, 47969 UNION ALL
SELECT 48463, 48460 UNION ALL
SELECT 52149, 14553 UNION ALL
SELECT 52149, 34336 UNION ALL
SELECT 52149, 39352 UNION ALL
SELECT 52149, 39776 UNION ALL
SELECT 52149, 47833 UNION ALL
SELECT 52149, 47969 UNION ALL
SELECT 52149, 48460 UNION ALL
SELECT 52149, 48463 UNION ALL
SELECT 54497, 14553 UNION ALL
SELECT 54497, 34336 UNION ALL
SELECT 54497, 39352 UNION ALL
SELECT 54497, 39776 UNION ALL
SELECT 54497, 47833 UNION ALL
SELECT 54497, 47969 UNION ALL
SELECT 54497, 48460 UNION ALL
SELECT 54497, 48463 UNION ALL
SELECT 54497, 52149 UNION ALL
SELECT 66316, 14553 UNION ALL
SELECT 66316, 34336 UNION ALL
SELECT 66316, 39352 UNION ALL
SELECT 66316, 47833 UNION ALL
SELECT 66316, 47969 UNION ALL
SELECT 66316, 48460 UNION ALL
SELECT 66316, 48463 UNION ALL
SELECT 66316, 48819 UNION ALL
SELECT 66316, 52149 UNION ALL
SELECT 66316, 54497 UNION ALL
SELECT 74054, 66316 UNION ALL
SELECT 77722, 53333

DECLARE @pairs2 TABLE (dropped_id INT, retained_id INT)
DECLARE @rows INT
INSERT INTO @pairs2
SELECT dropped_id=CASE WHEN dropped_id < retained_id THEN dropped_id ELSE retained_id END
,retained_id=CASE WHEN dropped_id < retained_id THEN retained_id ELSE dropped_id END
FROM @pairs
SELECT @rows = @@ROWCOUNT

DELETE FROM @pairs

INSERT INTO @pairs
SELECT dropped_id, retained_id
FROM @pairs2
WHERE dropped_id NOT IN (SELECT retained_id FROM @pairs2)
SELECT @rows = @@ROWCOUNT

WHILE @rows <> 0
BEGIN
INSERT INTO @pairs
--OUTPUT INSERTED.dropped_id, INSERTED.retained_id
SELECT a.dropped_id, b.retained_id
FROM @pairs a
INNER JOIN @pairs2 b ON a.retained_id = b.dropped_id
EXCEPT
SELECT dropped_id, retained_id
FROM @pairs
SELECT @rows = @@ROWCOUNT

END

INSERT INTO @pairs
SELECT a=MIN(dropped_id), b=MAX(dropped_id)
FROM @pairs
GROUP BY retained_id
HAVING COUNT(*) > 1

;WITH Pairs AS (
SELECT dropped_id, a.retained_id
,n=ROW_NUMBER() OVER (PARTITION BY a.retained_id ORDER BY dropped_id)
,b
FROM @pairs a
INNER JOIN (
SELECT retained_id, a=MAX(dropped_id), b=MIN(dropped_id)
FROM @pairs
GROUP BY retained_id
HAVING COUNT(*) > 1) b ON a.dropped_id = b.a OR a.retained_id = b.retained_id)
UPDATE a
SET dropped_id=b
FROM Pairs a
WHERE n > 1

SELECT DISTINCT *
FROM @pairs
ORDER BY dropped_id, retained_id


I tweaked it this morning and I think it may now be producing the correct rows for this case but no guarantee it always will.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1453298
Posted Thursday, May 16, 2013 1:23 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:37 AM
Points: 6,748, Visits: 13,890
Cadavre (5/15/2013)
ChrisM@Work (5/15/2013)
dwain.c (5/15/2013)
Sorry to disappoint you guys but I wasn't able to come up with anything.

My gut was telling me an rCTE wouldn't do it and the set-based loop I tried wouldn't resolve it out properly either.

Of course, I could be wrong as I'm a bit out of practice writing rCTEs!


I'm not surprised, Dwain. It's quite tricky. The more obvious methods like MAX() OVER() won't work when you expect them to, in the recursive part.


I haven't tried, as I said I'm pretty busy, but I thought a combination of a MAX() OVER() and a recursive CTE was going to do the business. Ah well.

Thanks for taking a look Dwain, be nice to see where your efforts took you.


There are at least two ways of approaching this problem, number chains and sets. I think the methods reviewed so far have employed number chains, so I had a play with sets.
Allocate each number pair a set identity, so the first row (ordered by whatever) has set identity = 1. Wherever either of the two members of set 1 appear elsewhere in the table, change the set identity for the pair to 1. Run again: wherever any of the (now many) members of set 1 appear elsewhere in the table in a different set, change the set id to 1. Rinse and repeat.
Here's the first attempt:

--DROP TABLE #Temp;

-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM SCC
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)

---------------------------------------------------------------------------
-- three different ways of performing the subquery for the update.
-- three updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;
-- (96 row(s) affected)

UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT q.SetNo, q.newset
FROM (
SELECT
d.SetNo, d.newset, rn = ROW_NUMBER() OVER(PARTITION BY d.SetNo ORDER BY d.newset)
FROM (
SELECT SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
) q
WHERE rn = 1
AND q.SetNo <> q.newset
) c ON c.SetNo = t.SetNo;
-- (66 row(s) affected)

UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT newset = MIN(t2.SetNo), t1.SetNo
FROM #Temp t1
INNER JOIN #Temp t2
ON t2.Duplicate_ID = t1.Duplicate_ID
AND t2.SetNo < t1.SetNo
GROUP BY t1.SetNo
) c ON c.SetNo = t.SetNo;
-- (2 row(s) affected)

---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;

Note that the subquery code for the three updates shown is logically the same and generates the same results with the sample data - I was playing with the subqueries in an attempt to find something which might be rCTE-compatible.
Three updates have to be performed, a fourth does no work. The results are the same as Abu Dina's and it's pretty darned quick.

I haven't yet discovered a rCTE to do the same process because you can't perform aggregations on the recursive part of a rCTE, which this process requires. You can chain ordinary CTE's though, and get the whole caboodle working in a single query like this:

;WITH DataSets AS (
SELECT
SetID = ROW_NUMBER() OVER(ORDER BY Master_ID, Duplicate_ID),
Master_ID, Duplicate_ID
FROM scc
),
FirstPass AS (
SELECT SetID = ISNULL(x.NewSetID, d.SetID), d.Master_ID, d.Duplicate_ID,
RowAffected = 0 + CASE WHEN x.NewSetID IS NOT NULL THEN 1 ELSE 0 END --
FROM DataSets d
OUTER APPLY ( -- find the lowest SetID that either number belongs to
SELECT NewSetID = MIN(di.SetID)
FROM DataSets di
WHERE di.SetID < d.SetID
AND (di.Master_ID IN (d.Master_ID, d.Duplicate_ID) OR di.Duplicate_ID IN (d.Master_ID, d.Duplicate_ID))
) x
),
SecondPass AS (
SELECT SetID = ISNULL(x.NewSetID, d.SetID), d.Master_ID, d.Duplicate_ID,
RowAffected = RowAffected + CASE WHEN x.NewSetID IS NOT NULL THEN 1 ELSE 0 END
FROM FirstPass d
OUTER APPLY (
SELECT NewSetID = MIN(di.SetID)
FROM FirstPass di
WHERE di.SetID < d.SetID
AND (di.Master_ID IN (d.Master_ID, d.Duplicate_ID) OR di.Duplicate_ID IN (d.Master_ID, d.Duplicate_ID))
) x
),
ThirdPass AS (
SELECT SetID = ISNULL(x.NewSetID, d.SetID), d.Master_ID, d.Duplicate_ID,
RowAffected = RowAffected + CASE WHEN x.NewSetID IS NOT NULL THEN 1 ELSE 0 END
FROM SecondPass d
OUTER APPLY (
SELECT NewSetID = MIN(di.SetID)
FROM SecondPass di
WHERE di.SetID < d.SetID
AND (di.Master_ID IN (d.Master_ID, d.Duplicate_ID) OR di.Duplicate_ID IN (d.Master_ID, d.Duplicate_ID))
) x
),
TweakedResults AS (
SELECT
SetID, Duplicate_ID, rn = ROW_NUMBER() OVER (PARTITION BY SetID ORDER BY Duplicate_ID)
FROM (
SELECT SetID, Duplicate_ID
FROM ThirdPass
UNION -- eliminate dupes
SELECT SetID, Master_ID
FROM ThirdPass
) d
)
SELECT
retained_id = t.Duplicate_ID,
dropped_id = x.Duplicate_ID
FROM TweakedResults t
CROSS APPLY (SELECT Duplicate_ID FROM TweakedResults ti WHERE ti.SetID = t.SetID AND ti.rn>1) x
WHERE t.rn = 1


- but the performance doesn't exactly shine. Have a look at the plan and you see why - the source table is read something like 8 times


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1453347
Posted Thursday, May 16, 2013 3:03 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:09 AM
Points: 3,420, Visits: 5,344
ChrisM@Work (5/16/2013)
... the source table is read something like 8 times


Only 8 times? I'd be surprised if my contribution is less than that.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1453378
Posted Monday, June 24, 2013 8:47 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, September 19, 2014 3:37 AM
Points: 708, Visits: 3,288
This is still giving me problems!

DECLARE @pairs TABLE (dropped_id INT, retained_id  INT)

INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011, 24014 UNION ALL
SELECT 23011, 24951 UNION ALL
SELECT 16492, 24951 UNION ALL
SELECT 16492, 29381 UNION ALL
SELECT 24951, 29381 UNION ALL
SELECT 23011, 29381 UNION ALL
SELECT 24014, 66665 UNION ALL
SELECT 3344, 66665

gives back: (using my method, Dwain's and chriM's)



---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1466775
Posted Monday, June 24, 2013 8:59 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:37 AM
Points: 6,748, Visits: 13,890
Abu Dina (6/24/2013)
This is still giving me problems!

DECLARE @pairs TABLE (dropped_id INT, retained_id  INT)

INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011, 24014 UNION ALL
SELECT 23011, 24951 UNION ALL
SELECT 16492, 24951 UNION ALL
SELECT 16492, 29381 UNION ALL
SELECT 24951, 29381 UNION ALL
SELECT 23011, 29381 UNION ALL
SELECT 24014, 66665 UNION ALL
SELECT 3344, 66665

gives back: (using my method, Dwain's and chriM's)



The last solution I posted uses a series of updates. Choose one of them and run it four times, something like this:
DECLARE @pairs TABLE (Duplicate_ID INT, Master_ID  INT)

INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011, 24014 UNION ALL
SELECT 23011, 24951 UNION ALL
SELECT 16492, 24951 UNION ALL
SELECT 16492, 29381 UNION ALL
SELECT 24951, 29381 UNION ALL
SELECT 23011, 29381 UNION ALL
SELECT 24014, 66665 UNION ALL
SELECT 3344, 66665;

-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
IF object_id('TempDB..#Temp') IS NOT NULL DROP TABLE #Temp;

WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM @pairs
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)

---------------------------------------------------------------------------
-- FOUR updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1

WHILE @RowsUpdated > 0
BEGIN
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;

SET @RowsUpdated = @@rowcount
END

---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1466782
Posted Monday, June 24, 2013 9:33 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Friday, September 19, 2014 3:37 AM
Points: 708, Visits: 3,288
ChrisM@Work (6/24/2013)
Abu Dina (6/24/2013)
This is still giving me problems!

DECLARE @pairs TABLE (dropped_id INT, retained_id  INT)

INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011, 24014 UNION ALL
SELECT 23011, 24951 UNION ALL
SELECT 16492, 24951 UNION ALL
SELECT 16492, 29381 UNION ALL
SELECT 24951, 29381 UNION ALL
SELECT 23011, 29381 UNION ALL
SELECT 24014, 66665 UNION ALL
SELECT 3344, 66665

gives back: (using my method, Dwain's and chriM's)



The last solution I posted uses a series of updates. Choose one of them and run it four times, something like this:
DECLARE @pairs TABLE (Duplicate_ID INT, Master_ID  INT)

INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011, 24014 UNION ALL
SELECT 23011, 24951 UNION ALL
SELECT 16492, 24951 UNION ALL
SELECT 16492, 29381 UNION ALL
SELECT 24951, 29381 UNION ALL
SELECT 23011, 29381 UNION ALL
SELECT 24014, 66665 UNION ALL
SELECT 3344, 66665;

-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
IF object_id('TempDB..#Temp') IS NOT NULL DROP TABLE #Temp;

WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM @pairs
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)

---------------------------------------------------------------------------
-- FOUR updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1

WHILE @RowsUpdated > 0
BEGIN
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;

SET @RowsUpdated = @@rowcount
END

---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;



Huh... I must have screwed up the code when I added it to the sproc. I swear it wasn't working before!

Anyhoo, superb work Chris. Thanks so much! That's another drink I owe you


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1466797
« Prev Topic | Next Topic »

Add to briefcase ««123»»

Permissions Expand / Collapse