RE: Rank duplicates, but only rows involved in duplicates

SSChampion

Points: 10440

October 24, 2014 at 8:36 pm

ChrisM@home (10/24/2014)
sqldriver (10/24/2014)
ChrisM@Work (10/24/2014)
-- Changes MERGE JOIN to NESTED LOOPS, loses sort required for ROW_NUMBER()
Would you mind explaining some more about this part?
Thanks
Sure. Have a good look at the actual execution plans first, I'll get back to you tomorrow.

Oh, neat. I like that.

What do you think about this? Borrowing a little Itzik Magik.

WITH dupe1

AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RN ,

a.BusinessName, a.[Address], a.Phone

FROM TestBusiness a

),

dupe2

AS ( SELECT d.BusinessName, d.[Address], d.Phone

FROM dupe1 d

GROUP BY d.BusinessName, d.[Address], d.Phone

HAVING COUNT(*) > 1

)

SELECT *

FROM dupe1 d1

INNER JOIN dupe2 d2

ON d1.BusinessName = d2.BusinessName

AND d1.[Address] = d2.[Address]

AND d1.Phone = d2.Phone

WHERE d1.RN > 1;

The QE hates forcing a seek here, and forcing a hash join (was thinking if the actual tables are much larger, though it may use one anyway if they are) seems to result in 1 more ms CPU time, and one extra logical read.

Compared with your examples, the IO and time output seem roughly the same, though I'd be interested to hear what you think about the far different execution plan.

My tests include your suggested index.

Anyway, looking forward to your response.

Thanks