June 27, 2011 at 8:19 am
Suppose I have table T1 containing columns A and B with values:
A B
- -
A B
B A
C D
D C
D E
I want to eliminate rows where there are duplicates across both A and B. So in the above example, I want a result of 3 rows:
A B
C D
D E
It could also be
B A
D C
D E
This has stumped be so far, so any help is appreciated.
Regards,
Ray Daugherty
June 27, 2011 at 8:24 am
By the way, my first attempt was
SELECT A, B
FROM Test AS T1
WHERE (NOT EXISTS
(SELECT T2.B, T2.A
FROM Test AS T2
WHERE (T1.A = T2.B) AND (T1.B = T2.A)))
However, that just gave me a one row result of (D, E), eliminating both records that were part of the duplicate instead of just one of them. It seems like a ROW_NUMBER() OVER clause may be needed, but I haven't been able to figure it out yet.
June 27, 2011 at 1:25 pm
CREATE TABLE dbo.table1 (A CHAR(1), B CHAR(1)) ;
GO
INSERT INTO dbo.table1
(A, B)
VALUES ('A', 'B'),
('B', 'A'),
('C', 'D'),
('D', 'C'),
('D', 'E') ;
GO
SELECT DISTINCT
T1.A,
T1.B
FROM table1 T1
LEFT JOIN table1 T2 ON T1.A = T2.B
AND T1.B = T2.A
WHERE T1.A < T2.A
OR T2.A IS NULL ;
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 27, 2011 at 2:04 pm
Perfect!!! Thanks.
June 27, 2011 at 2:08 pm
You're welcome 🙂
PS You did a nice job posting the problem and expected results in your initial post, but I would not have posted a solution had you not provided what you had already tried because this looks like a homework problem (which is fine as long as I see an attempt). Sample DDL and DML to build your test table would have been nice. Here is an article to help you create future posts: http://www.sqlservercentral.com/articles/Best+Practices/61537/%5B/url%5D
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 27, 2011 at 2:14 pm
It's actually a real world problem I was dealing with at work, in a table of 366,000 records with GUID's as the pair of foreign keys. I just wanted to keep the problem simple. Thanks again.
June 29, 2011 at 1:12 pm
I'm not sure if I should create it's own topic, but I see this was recently answered (a rarity when you stumble upon a post from a search engine) and it is more of an extension of a question rather than a new one entirely.
Is there a way to create rules to not allow those kinds of duplicates? I could use that solution to create a stored procedure that will eliminate duplicate pairs, but I'd like to prevent them from occurring in the first place.
June 29, 2011 at 1:16 pm
djacobsen 84059 (6/29/2011)
I'm not sure if I should create it's own topic, but I see this was recently answered (a rarity when you stumble upon a post from a search engine) and it is more of an extension of a question rather than a new one entirely.Is there a way to create rules to not allow those kinds of duplicates? I could use that solution to create a stored procedure that will eliminate duplicate pairs, but I'd like to prevent them from occurring in the first place.
TMK not with DRI. The condition you are trying to prevent can only be checked by evaluating multiple rows of data so you would be forced to use a trigger or some other type of application code (maybe in a proc) to enforce the rule.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 29, 2011 at 1:22 pm
I remember answering to a question very similar to this one. I already see that u got ur reply, but here is (are) another way of doing it:
declare @t table (column1 char(3), column2 Char(3), country varchar(10))
insert into @t values('abc', 'def', 'USA'),
('def', 'abc', 'USA'),
('xxx', 'yyy', 'Canada'),
('yyy', 'xxx', 'Canada')
insert into @t values('abc', 'yyy', 'Canada'),
('xxx', 'def', 'Canada'),
('xxx', 'xxx', 'Canada'),
('xxx', 'xxx', 'Canada')
-- For only 2 columns
; WITH CTE AS
(
SELECT RN = ROW_NUMBER() OVER ( PARTITION BY Country , ( ( column1 + column2 ) + ( column2 + column1 )) ORDER BY ( SELECT NULL ))
,Column1 , column2 , country
from @t
)
SELECT column1 , column2 , country
FROM CTE
WHERE RN = 1
-- For more than 2 columns
; WITH UnPivotData AS
(
SELECT Vals ,
RN = ROW_NUMBER() OVER (ORDER BY Vals)
FROM @t SOURCE_TABLE
UNPIVOT
( Vals FOR ColName IN ( [column1] , [column2]) ) UNPIVOT_HANDLE
-- You'll 've to hardcode your column names above if the matching sets
-- are scattered in more than 2 columns
GROUP BY Vals
),
GroupedData AS
(
SELECT SrcData.column1 , SrcData.column2 , SrcData.country,
-- Add the columns here as well
RowNum = ROW_NUMBER() OVER ( PARTITION BY Country ,SUM ( Unpvt.RN ) ORDER BY ( SELECT 0) )
FROM UnPivotData Unpvt
INNER JOIN @t SrcData
ON ( SrcData.column1 = Unpvt.Vals OR SrcData.column2 = Unpvt.Vals )
-- You'll 've to add the others columns to the OR clause
-- if the matching sets are scattered in more than 2 columns
GROUP BY
SrcData.column1 , SrcData.column2 , SrcData.country
-- Add the columns here as well
)
SELECT column1 , column2 , country
FROM GroupedData
WHERE RowNum = 1
June 29, 2011 at 1:27 pm
Thank you for the quick reply, I am not sure what TMK is, and I will read about DRI. I knew I could do it in application code, however I thought SQL would be faster then what I would put together. (Example, keep a second duplicate table, add new entry to duplicate table, run that query to output only none duplicates and compare to row count of original, post to original if row count changes delete from duplicate if it doesn't. However I can think of quite a few things wrong with that implementation)
June 29, 2011 at 1:46 pm
djacobsen 84059 (6/29/2011)
Thank you for the quick reply, I am not sure what TMK is, and I will read about DRI. I knew I could do it in application code, however I thought SQL would be faster then what I would put together. (Example, keep a second duplicate table, add new entry to duplicate table, run that query to output only none duplicates and compare to row count of original, post to original if row count changes delete from duplicate if it doesn't. However I can think of quite a few things wrong with that implementation)
No problem. And sorry for the shorthand.... TMK = To My Knowledge.
By the way, the solution Cold Coffee put up will destroy the one I put up in terms of performance, so if you're going to store this app code somewhere and run it often look at his.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply