Remove Duplicates in opposite columns

  • I have a table containing the following data:

    LinkingIDID1 ID2

    166202180659253178

    166202253178180659

    166334180380253179

    166334253179180380

    166342180380180659

    166342180659180380

    166582253179258643

    166582258643253179

    264052258642258643

    264052258643258642

    264502258643258663

    264502258643259562

    Within the LinkingID, there are duplicates in ID1 and ID2 but just in opposite columns. I have been trying to figure out a way to remove these set based. It doesn't matter which duplicate is removed. Essentially these are just endpoints and I don't care which side they are on. The solution must recognize the duplicates and not just remove based on every 2nd row. It seems that this should be very simple, but I can't figure it out. Any help is greatly appreciated.

  • SELECT

    LinkingID,

    ID1,

    ID2,

    rn = ROW_NUMBER() OVER (PARTITION BY LinkingID, IDMAX, IDMIN ORDER BY (SELECT NULL))

    FROM #Sample s

    CROSS APPLY (

    SELECT

    IDMAX = CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,

    IDMIN = CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END

    ) x

    -- sample data:

    CREATE TABLE #Sample (LinkingID INT, ID1 INT, ID2 INT)

    INSERT INTO #Sample (LinkingID, ID1, ID2) VALUES

    (166202, 180659, 253178),

    (166202, 253178, 180659),

    (166334, 180380, 253179),

    (166334, 253179, 180380),

    (166342, 180380, 180659),

    (166342, 180659, 180380),

    (166582, 253179, 258643),

    (166582, 258643, 253179),

    (264052, 258642, 258643),

    (264052, 258643, 258642),

    (264502, 258643, 258663),

    (264502, 258643, 259562)

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.

  • keith.west-982594 (1/22/2015)


    This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.

    You're welcome. You don't have to use APPLY:

    WITH Swapper AS (

    SELECT *,

    IDMAX = CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,

    IDMIN = CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END

    FROM #Sample

    )

    SELECT

    LinkingID,

    ID1,

    ID2,

    rn = ROW_NUMBER() OVER (PARTITION BY LinkingID, IDMAX, IDMIN ORDER BY (SELECT NULL))

    FROM Swapper;

    SELECT

    LinkingID,

    ID1,

    ID2,

    rn = ROW_NUMBER() OVER (PARTITION BY LinkingID,

    CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,

    CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END

    ORDER BY (SELECT NULL))

    FROM #Sample;

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • keith.west-982594 (1/22/2015)


    This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.

    Take a look at the two articles in my signature from Paul White. They provide an excellent look at APPLY.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Removed bad solution

    😎

  • Slightly different take, using ChrisM's sample table.

    SELECT s.LinkingID, MaxID

    FROM #Sample AS s

    CROSS APPLY (SELECT TOP 1 [MaxID]

    FROM (VALUES(s.ID1),

    (s.ID2) )x([MaxID])

    ORDER BY MaxID DESC) y

    GROUP BY s.LinkingID, y.MaxID

  • For the long haul, I'd suggest modifying the code to always put the smaller of the two IDs in ID1, a check constraint on ID1 that enforces ID1 <= ID2, and then a unique constraint across the 3 IDs. Of course, there should be NOT NULL constraints on all 3 columns.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just for fun, using the SQL 2012 LAG function, this query matches ChrisM's first result:

    SELECT LinkingID, ID1, ID2

    ,rn=CASE WHEN ID3=ID2 AND ID4=ID1 THEN 2 ELSE 1 END

    FROM

    (

    SELECT LinkingID, ID1, ID2

    ,ID3=LAG(ID1) OVER (PARTITION BY LinkingID ORDER BY ID1)

    ,ID4=LAG(ID2) OVER (PARTITION BY LinkingID ORDER BY ID1)

    FROM #Sample

    ) a;

    Using his sample data of course.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply