RE: Using Ranking Functions to Deduplicate Data

July 27, 2010 at 9:47 am

Don't get me wrong: this is a good example of how to use the RANK() and ROW_NUMBER() functions. My question is why I would do this instead of using DISTINCT.

There is already an article on "Ranking Functions" here...

Also, I know that DISTINCT is not the most efficient way to "de-duplicate" (nice phrase) large data sets. If the point is that the ranking-function approach is faster on larger data sets, or on heavily skewed data sets (i.e., where each row is already distinct), then I have learnt something. Otherwise, all I have learnt is yet another way to write potentially confusing code. For the example cited in the article, DISTINCT would be the obvious choice, and obvious code matters where I come from.
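To make the comparison concrete, here is a minimal sketch of both approaches side by side. It is not the article's code: it assumes a hypothetical `people` table whose duplicates are exact copies, and it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) instead of SQL Server, so treat it as an illustration only.

```python
import sqlite3

# Hypothetical sample data: every duplicate is an exact copy of another row.
ROWS = [
    ("Alice", "London"),
    ("Alice", "London"),
    ("Bob", "Paris"),
    ("Bob", "Paris"),
    ("Bob", "Paris"),
    ("Carol", "Rome"),
]


def dedupe_both_ways(rows):
    """Return (distinct_result, row_number_result), both sorted by name, city."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO people (name, city) VALUES (?, ?)", rows)

    # Approach 1: plain DISTINCT over all columns.
    distinct = conn.execute(
        "SELECT DISTINCT name, city FROM people ORDER BY name, city"
    ).fetchall()

    # Approach 2: number the rows within each (name, city) group
    # and keep only the first row of each group.
    ranked = conn.execute(
        """
        SELECT name, city
        FROM (
            SELECT name, city,
                   ROW_NUMBER() OVER (PARTITION BY name, city ORDER BY name) AS rn
            FROM people
        ) AS numbered
        WHERE rn = 1
        ORDER BY name, city
        """
    ).fetchall()

    conn.close()
    return distinct, ranked


if __name__ == "__main__":
    distinct, ranked = dedupe_both_ways(ROWS)
    print(distinct)            # [('Alice', 'London'), ('Bob', 'Paris'), ('Carol', 'Rome')]
    print(distinct == ranked)  # True
```

On exact duplicates like these, both queries return the same three rows, so for this case DISTINCT is the shorter and clearer of the two.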

So, if someone could please educate me as to when and why I would use the ranking-function method to get distinct records, I would very much appreciate it.