Home Forums SQL Server 2008 T-SQL (SS2K8) Rank duplicates, but only rows involved in duplicates RE: Rank duplicates, but only rows involved in duplicates

  • ChrisM@home (10/24/2014)


    sqldriver (10/24/2014)


    ChrisM@Work (10/24/2014)


    -- Changes MERGE JOIN to NESTED LOOPS, loses sort required for ROW_NUMBER()

    Would you mind explaining some more about this part?

    Thanks

    Sure. Have a good look at the actual execution plans first, I'll get back to you tomorrow.

    Oh, neat. I like that.

    What do you think about this? Borrowing a little Itzik Magik.

    WITH dupe1

    AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN ,

    a.BusinessName, a.[Address], a.Phone

    FROM TestBusiness a

    ),

    dupe2

    AS ( SELECT d.BusinessName, d.[Address], d.Phone

    FROM dupe1 d

    GROUP BY d.BusinessName, d.[Address], d.Phone

    HAVING COUNT(*) > 1

    )

    SELECT *

    FROM dupe1 d1

    INNER JOIN dupe2 d2

    ON d1.BusinessName = d2.BusinessName

    AND d1.[Address] = d2.[Address]

    AND d1.Phone = d2.Phone

    WHERE d1.RN > 1;

    The QE hates forcing a seek here, and forcing a hash join (was thinking if the actual tables are much larger, though it may use one anyway if they are) seems to result in 1 more ms CPU time, and one extra logical read.

    Compared with your examples, the IO and time output seem roughly the same, though I'd be interested to hear what you think about the far different execution plan.

    My tests include your suggested index.

    Anyway, looking forward to your response.

    Thanks