In a self-joined table, how to remove ALL records with duplicates

  • How to remove the rows 6,7, 12,13, 21 and 22?
    I intend to remove pair of records with duplicate Col3 and the ID of that record is the same as the OrigID of the other record. My query below deletes only 3 rows instead of 6.

    begin tran 
    delete FROM #Table
    WHERE ID in ( SELECT r1.id FROM #Table r1
        INNER JOIN #Table r2
        ON r1.ID= r2.OrigID
        WHERE r1.Col3 = r2.Col3
        )

    3 records are not deleted


  • Another condition here is that assuming that the Col3 of rows 1 and 2 are the same, they must not be deleted because the ID and the ORIGID
    of those records are not equal. Can anyone help? Thank you

  • vujicikm - Tuesday, April 24, 2018 9:48 PM

    Another condition here is that assuming that the Col3 of rows 1 and 2 are the same, they must not be deleted because the ID and the ORIGID
    of those records are not equal. Can anyone help? Thank you

    "In a self joined table" implies that you've posted details from a query result.
    You're likely to get a better query if you post up your source table in the form of CREATE TABLE (and INSERTs to populate) - this gives people some data to run their code against.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The duplicate value is listed in column ID or in column OrigID. So you also have to delete the rows where the duplicate value is listed in the OrigID column.

    delete FROM #Table
    WHERE ID in ( SELECT r1.id FROM #Table r1
      INNER JOIN #Table r2
      ON r1.ID= r2.OrigID
      WHERE r1.Col3 = r2.Col3
      )
    OR OrigID in ( SELECT r1.id FROM #Table r1
      INNER JOIN #Table r2
      ON r1.ID= r2.OrigID 
      WHERE r1.Col3 = r2.Col3
      )

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • Let's add the displayed data and the code to do it:CREATE TABLE #Table (
        ID int NOT NULL PRIMARY KEY CLUSTERED,
        OrigID int NOT NULL,
        Col1 int NOT NULL,
        Col2 int NOT NULL,
        Col3 char(6) NOT NULL,
        Col4 char(6) NOT NULL
    );
    INSERT INTO #Table (ID, OrigID, Col1, Col2, Col3, Col4)
        VALUES    (490465, 388007, 491367, 207114, '1A1F13', '38D741'),
                (627543, 412091, 747366, 978516, '28EEE1', 'FAB636'),
                (454046, 513930, 825512, 331307, 'B34E0C', '136041'),
                (723560, 766176, 532240, 532964, '1CB7D4', '02756D'),
                (766176, 324855, 711708, 532964, 'DAA45E', '11CD7B'),
                (861009, 816214, 595236, 525376, '094222', '144054'),
                (816214, 859732, 256486, 882616, '094222', '7CB7AE'),
                (621922, 205037, 625409, 205806, 'B451E2', '2EABB6'),
                (205037, 133244, 409998, 205806, '2F398D', '93E5F9'),
                (965119, 990016, 631267, 267954, '1B900A', 'E80542'),
                (792120, 845792, 751546, 170739, '1C0BD8', '2E9065'),
                (888927, 871529, 231089, 945847, 'EDE8DF', '8B5F10'),
                (871529, 417818, 577038, 980451, 'EDE8DF', '5C4A73'),
                (190496, 771008, 303567, 594950, 'B8CF7C', '7E5851'),
                (754771, 268480, 642805, 219194, 'FCF4E7', '293C8A'),
                (941315, 410362, 519672, 561005, '82CCC6', '3033EA'),
                (714356, 441476, 475392, 823554, '287663', '45C9D6'),
                (529539, 747572, 155897, 212880, 'AC0DEB', 'A7118D'),
                (972839, 593053, 960970, 436351, '98BD01', '999F3A'),
                (883615, 725671, 410949, 720495, '5877D2', 'CD6E46'),
                (769460, 397870, 765429, 188897, 'E1E672', '1FABA2'),
                (397870, 975048, 958499, 440291, 'E1E672', '588961'),
                (646407, 880842, 775176, 847170, '25295F', 'BE6200'),
                (882888, 909166, 468781, 346358, 'F9D790', '2C6056'),
                (172768, 461766, 284711, 215639, '80D0C1', 'C126BE'),
                (465597, 961688, 953680, 879278, '23819F', '34889B'),
                (334894, 715861, 803223, 722370, 'BCB855', 'FF4C74'),
                (379317, 768801, 937434, 382958, '86A6BA', '979B75'),
                (218346, 907134, 763072, 565305, '99F9B1', 'B443BA'),
                (563865, 981593, 711572, 224162, '2D1EC0', '8ACFE6'),
                (409029, 192003, 326449, 406291, '08EC95', '8E3611'),
                (796975, 544072, 248244, 408699, '104F4E', '63C65E');

    WITH ALL_ROWS AS (

        SELECT T1.ID AS ID1, T2.ID AS ID2
        FROM #Table AS T1
            INNER JOIN #Table AS T2
                ON T1.ID = T2.OrigID
                AND T1.Col3 = T2.Col3
                AND T1.ID <> T2.ID
    )
    DELETE T3
    FROM #Table AS T3
        INNER JOIN ALL_ROWS AS AR
            ON T3.ID IN (AR.ID1, AR.ID2)

    SELECT *
    FROM #Table;

    DROP TABLE #Table;

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • HanShi - Wednesday, April 25, 2018 3:09 AM

    The duplicate value is listed in column ID or in column OrigID. So you also have to delete the rows where the duplicate value is listed in the OrigID column.

    delete FROM #Table
    WHERE ID in ( SELECT r1.id FROM #Table r1
      INNER JOIN #Table r2
      ON r1.ID= r2.OrigID
      WHERE r1.Col3 = r2.Col3
      )
    OR ID in ( SELECT r2.id FROM #Table r1    -- changed from  'OR OrigID in ( SELECT r1.id FROM #Table r1'
      INNER JOIN #Table r2
      ON r1.ID= r2.OrigID 
      WHERE r1.Col3 = r2.Col3
      )

    Thank you Han Shi. I just made some modification to your script.

  • sgmunson - Wednesday, April 25, 2018 8:55 AM

    Let's add the displayed data and the code to do it:CREATE TABLE #Table (
        ID int NOT NULL PRIMARY KEY CLUSTERED,
        OrigID int NOT NULL,
        Col1 int NOT NULL,
        Col2 int NOT NULL,
        Col3 char(6) NOT NULL,
        Col4 char(6) NOT NULL
    );
    INSERT INTO #Table (ID, OrigID, Col1, Col2, Col3, Col4)
        VALUES    (490465, 388007, 491367, 207114, '1A1F13', '38D741'),
                (627543, 412091, 747366, 978516, '28EEE1', 'FAB636'),
                (454046, 513930, 825512, 331307, 'B34E0C', '136041'),
                (723560, 766176, 532240, 532964, '1CB7D4', '02756D'),
                (766176, 324855, 711708, 532964, 'DAA45E', '11CD7B'),
                (861009, 816214, 595236, 525376, '094222', '144054'),
                (816214, 859732, 256486, 882616, '094222', '7CB7AE'),
                (621922, 205037, 625409, 205806, 'B451E2', '2EABB6'),
                (205037, 133244, 409998, 205806, '2F398D', '93E5F9'),
                (965119, 990016, 631267, 267954, '1B900A', 'E80542'),
                (792120, 845792, 751546, 170739, '1C0BD8', '2E9065'),
                (888927, 871529, 231089, 945847, 'EDE8DF', '8B5F10'),
                (871529, 417818, 577038, 980451, 'EDE8DF', '5C4A73'),
                (190496, 771008, 303567, 594950, 'B8CF7C', '7E5851'),
                (754771, 268480, 642805, 219194, 'FCF4E7', '293C8A'),
                (941315, 410362, 519672, 561005, '82CCC6', '3033EA'),
                (714356, 441476, 475392, 823554, '287663', '45C9D6'),
                (529539, 747572, 155897, 212880, 'AC0DEB', 'A7118D'),
                (972839, 593053, 960970, 436351, '98BD01', '999F3A'),
                (883615, 725671, 410949, 720495, '5877D2', 'CD6E46'),
                (769460, 397870, 765429, 188897, 'E1E672', '1FABA2'),
                (397870, 975048, 958499, 440291, 'E1E672', '588961'),
                (646407, 880842, 775176, 847170, '25295F', 'BE6200'),
                (882888, 909166, 468781, 346358, 'F9D790', '2C6056'),
                (172768, 461766, 284711, 215639, '80D0C1', 'C126BE'),
                (465597, 961688, 953680, 879278, '23819F', '34889B'),
                (334894, 715861, 803223, 722370, 'BCB855', 'FF4C74'),
                (379317, 768801, 937434, 382958, '86A6BA', '979B75'),
                (218346, 907134, 763072, 565305, '99F9B1', 'B443BA'),
                (563865, 981593, 711572, 224162, '2D1EC0', '8ACFE6'),
                (409029, 192003, 326449, 406291, '08EC95', '8E3611'),
                (796975, 544072, 248244, 408699, '104F4E', '63C65E');

    WITH ALL_ROWS AS (

        SELECT T1.ID AS ID1, T2.ID AS ID2
        FROM #Table AS T1
            INNER JOIN #Table AS T2
                ON T1.ID = T2.OrigID
                AND T1.Col3 = T2.Col3
                AND T1.ID <> T2.ID
    )
    DELETE T3
    FROM #Table AS T3
        INNER JOIN ALL_ROWS AS AR
            ON T3.ID IN (AR.ID1, AR.ID2)

    SELECT *
    FROM #Table;

    DROP TABLE #Table;

    Thank you sir for the glamorous script

  • Another way, increasing the join conditions:

    DELETE FROM #Table
    WHERE ID in (
        SELECT r1.id
        FROM #Table r1
        INNER JOIN #Table r2
        ON r1.ID= r2.OrigID
        OR r2.ID = r1.OrigID
        WHERE r1.Col3 = r2.Col3
      )

  • Angus.Young - Friday, April 27, 2018 3:17 AM

    Another way, increasing the join conditions:

    DELETE FROM #Table
    WHERE ID in (
        SELECT r1.id
        FROM #Table r1
        INNER JOIN #Table r2
        ON r1.ID= r2.OrigID
        OR r2.ID = r1.OrigID
        WHERE r1.Col3 = r2.Col3
      )

    While that will work, it fails to prevent the table from self-joining if for any reason a row shows up where the ID values are also identical, which may be problematic, based on the original requirements.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Here's how I would do this. I always write it as a select first to verify I get the rows I want to delete. Then just comment out the select line and order by, uncomment the delete..


    SELECT T1.ID, T1.OrigID, T1.Col3
    --DELETE T1
    FROM #Table AS T1
    JOIN #Table AS T2
    ON T1.Col3 = T2.Col3 AND (T1.ID = T2.OrigID OR T2.ID = T1.OrigID)
    Order by T1.Col3

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply