Query Help - For Duplicates

  • Hi

    I need to find duplicate AID will be unique base on Name so I need to find duplicates on KeyValue column Group by AID.

    so it means in this dataset AID 1 is duplicate for AID 4 as both matching so my expected result is to get AID 1 and 4.

    create table #temp(AID int, Name varchar(500), KeyValue varchar(500))

    insert #temp values (1, 'LN', 'CP_VA_LA')
    insert #temp values (1, 'LT', 'C1')
    insert #temp values (2, 'LN', 'CP_SA_LA')
    insert #temp values (2, 'LT', 'C1')
    insert #temp values (3, 'LN', 'CP_SA_LA')
    insert #temp values (3, 'LT', 'C2')
    insert #temp values (4, 'LN', 'CP_VA_LA')
    insert #temp values (4, 'LT', 'C1')
    insert #temp values (5, 'LN', 'CP_VA_DA')
    insert #temp values (5, 'LT', 'C4')

    Thanks

  • This can be improved but here's one way: 
    WITH piv AS
    (
    SELECT
     AID,
     name1 = MAX(CASE [name] WHEN 'LN' THEN [name] END),
     name2 = MAX(CASE [name] WHEN 'LT' THEN [name] END),
     KeyValue1 = MAX(CASE [name] WHEN 'LT' THEN KeyValue END),
     KeyValue2 = MAX(CASE [name] WHEN 'LN' THEN KeyValue END)
    FROM #temp
    GROUP BY AID
    ),
    findDupes AS
    (
    SELECT piv.name1 ,piv.name2 , piv.KeyValue1 ,piv.KeyValue2
    FROM piv
    GROUP BY piv.name1, piv.name2, piv.KeyValue1, piv.KeyValue2
    HAVING COUNT(*) > 1
    )
    SELECT piv.AID
    FROM findDupes d
    JOIN piv ON
      d.name1  = piv.name1
    AND d.name2  = piv.name2
    AND d.KeyValue1 = piv.KeyValue1
    AND d.KeyValue2 = piv.KeyValue2;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Issue with this approach is I have big data set and some time I have two type of Name like (LN, LT) and some time 3 (LN, LT, LC) and some time 1 type of Name.only (LN)  so in these cases its not work for all scenarios.

    Thanks

  • inayatkhan - Thursday, April 27, 2017 8:07 AM

    Issue with this approach is I have big data set and some time I have two type of Name like (LN, LT) and some time 3 (LN, LT, LC) and some time 1 type of Name.only (LN)  so in these cases its not work for all scenarios.

    Thanks

    Ok, how about this

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp;
    GO
    create table #temp(AID int, Name varchar(500), KeyValue varchar(500));
    GO

    INSERT #temp values
    (1, 'LN', 'CP_VA_LA')
    ,(1, 'LT', 'C1')
    ,(2, 'LN', 'CP_SA_LA')
    ,(2, 'LT', 'C1')
    ,(3, 'LN', 'CP_SA_LA')
    ,(3, 'LT', 'C2')
    ,(4, 'LN', 'CP_VA_LA')
    ,(4, 'LT', 'C1')
    ,(5, 'LN', 'CP_VA_DA')
    ,(5, 'LT', 'C4')
    ,(5, 'LC', 'XXXA')
    ,(6, 'LN', 'CP_VA_DA')
    ,(6, 'LT', 'C4')
    ,(11, 'LN', 'CP_ZA_LA')
    ,(11, 'LT', 'C1')
    ,(14, 'LN', 'CP_ZA_LA')
    ,(14, 'LT', 'C1')
    ,(14, 'LC', 'XXXA')
    ,(21, 'LN', 'CP_ZA_LA')
    ,(21, 'LT', 'C5')
    ,(21, 'LC', 'XXXA')
    ,(24, 'LN', 'CP_ZA_LA')
    ,(24, 'LT', 'C5')
    ,(24, 'LC', 'XXXA')
    ;
    GO

    WITH piv AS
    (
    SELECT
     AID,
     name1 = MAX(CASE [name] WHEN 'LN' THEN [name] END),
     name2 = MAX(CASE [name] WHEN 'LT' THEN [name] END),
     name3 = MAX(CASE [name] WHEN 'LC' THEN [name] END),
     KeyValue1 = MAX(CASE [name] WHEN 'LT' THEN KeyValue END),
     KeyValue2 = MAX(CASE [name] WHEN 'LN' THEN KeyValue END),
     KeyValue3 = MAX(CASE [name] WHEN 'LC' THEN KeyValue END)
    FROM #temp
    GROUP BY AID
    ),
    findDupes AS
    (
    SELECT piv.name1, piv.name2, piv.name3, piv.KeyValue1, piv.KeyValue2, piv.KeyValue3
    FROM piv
    GROUP BY piv.name1, piv.name2, piv.name3, piv.KeyValue1, piv.KeyValue2, piv.KeyValue3
    HAVING COUNT(*) > 1
    )
    SELECT piv.*
    FROM findDupes d
    JOIN piv ON d.name1 = piv.name1
    AND ISNULL(d.name2,'')  = ISNULL(piv.name2,'')
    AND ISNULL(d.name3,'')  = ISNULL(piv.name3,'')
    AND d.KeyValue1 = piv.KeyValue1
    AND ISNULL(d.KeyValue2,'') = ISNULL(piv.KeyValue2,'')
    AND ISNULL(d.KeyValue3,'') = ISNULL(piv.KeyValue3,'');

    Which returns:

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • thanks!

  • No problem!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply