ROW_NUMBER(): An Efficient Alternative to Subqueries

Question

ROW_NUMBER(): An Efficient Alternative to Subqueries

Francis Rodrigues-459442

SSC Eights!

Points: 860
More actions
May 11, 2009 at 9:07 pm

#204596

Comments posted to this topic are about the item ROW_NUMBER(): An Efficient Alternative to Subqueries

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply

tony rogerson SSC Veteran Points: 242 More actions · Answer 1

Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.

Tony.

paul_ramster Ten Centuries Points: 1382 More actions · Answer 2

And if you're using SQLServer2000, then you'll need to use a "TOP n" correlated subquery, since ROW_NUMBER() is not available.. eg:

select RecentVersion.*
from(select distinct ProductId from dbo.ProductVersion) ProductList
inner join dbo.ProductVersion RecentVersion
on RecentVersion.ProductId = ProductList.ProductId
and cast(RecentVersion.Version as varchar(10)) + '.' + cast(RecentVersion.MinorVersion as varchar(10)) + '.' + cast(RecentVersion.ReleaseVersion as varchar(10)) = (
select top 1 cast(x.Version as varchar(10)) + '.' + cast(x.MinorVersion as varchar(10)) + '.' + cast(x.ReleaseVersion as varchar(10))
from dbo.ProductVersion x
where x.ProductId = RecentVersion.ProductId
order by x.Version desc, x.MinorVersion desc, x.ReleaseVersion desc
)

Madhivanan-208264 SSCertifiable Points: 7516 More actions · Answer 3

Other uses of Row_number() function

http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx

Madhivanan

Failing to plan is Planning to fail

Jason Rogers-460518 SSC Veteran Points: 226 More actions · Answer 4

Jason Rogers-460518

SSC Veteran

Points: 226

May 12, 2009 at 6:22 am

#992783

Great Article - thank you

jochemvdende SSC Enthusiast Points: 129 More actions · Answer 5

Nice example!

Alternativly you could use a cursor to make it more readable and improve performance over a nested query. That said i dont like cursors and do prefer common table expressions!

SET ROWCOUNT 0

go

IF OBJECT_ID('tempdb..#ProductVersion') IS NULL

BEGIN

CREATE TABLE #ProductVersion

(

ProductID int NOT NULL,

Version int NOT NULL,

MinorVersion int NOT NULL,

ReleaseVersion int NOT NULL,

StandardCost numeric(30, 4) NOT NULL

)

END

DECLARE @ProductId INT

DECLARE A CURSOR FOR

SELECTProductId

FROMProductVersion

GROUP BY ProductId

ORDER BY ProductId

OPEN A

FETCH NEXT FROM A

INTO @ProductId

WHILE@@FETCH_STATUS = 0

BEGIN

INSERT INTO #ProductVersion

SELECTTOP 1 ProductId,

Version,

MinorVersion,

ReleaseVersion,

StandardCost

FROMProductVersion

WHEREProductId = @ProductId

ORDER BY ProductId ASC,

Version DESC,

MinorVersion DESC,

ReleaseVersion DESC

FETCH NEXT FROM A

INTO @ProductId

END

CLOSE A

DEALLOCATE A

SELECT * FROM #ProductVersion

DROP TABLE #ProductVersion

greenantim SSC-Addicted Points: 492 More actions · Answer 6

Thanks for the great article - I thought that you did a good job of explaining how to use the row_number function and comparing its performance to subqueries.

Tim

William Hutton-462896 SSC Enthusiast Points: 173 More actions · Answer 7

How could I leverage this to get, say, the "most current" version of a contract (i.e., either the contract currently in effect or the latest contract even if it was expired) *and* the latest version of the contract (i.e., it doesn't take effect until the future? I have a query using subqueries for this that looks something like this (this is simplified a bit). The row_number() query is much more concise but only returns that max contract date for each contract, which leaves out the current dates for those that have them:

SELECT DISTINCT group_number

,subgroup_number

,ISNULL(

(SELECT MAX(contract_thru_date)

FROM contracts curr1

WHERE curr1.group_number=curr.group_number

AND GETDATE() BETWEEN contract_from_date AND contract_thru_date

,(SELECT MAX(contract_thru_date)

FROM contracts prev

WHERE prev.group_number=curr.group_number

)

) contract_thru_date FROM contracts curr

EDIT: NM, I figured it out. All I had to was add an or statement to the where in the CTE. Great article, came at a good time! :w00t:

JJ B SSCertifiable Points: 5100 More actions · Answer 8

Well done article.

Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?

Thanks.

RBarryYoung SSC Guru Points: 143329 More actions · Answer 9

tony rogerson (5/12/2009)
Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.

Afraid that I do not understand this. AFAIK, CTEs are table expressions like non-indexed views. If you write them non-deterministically (which is what you do in your blog article), then they return non-deterministic results. And if you reference a non-deterministic table expression multiple times, you are not guaranteed to get the same results. That's straight ANSI SQL to the best of my knowledge and I see no bug in Transact-SQL demonstrated here.

And as far as ROW_NUMBER() it only returns non-deterministic results if the OVER clause parameters are non-deterministic. If you use a deterministic ORDER BY (which also requires that the column set is uniquely orderable), then you will get deterministic results.

All this really demonstrates is that if you input non-deterministic expressions without instantiating them, then you will get non-deterministic results back. Garbage In, Garbage Out.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Francis Rodrigues-459442 SSC Eights! Points: 860 More actions · Answer 10

There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.

Francis Rodrigues-459442 SSC Eights! Points: 860 More actions · Answer 11

JJ B (5/12/2009)
Well done article.
Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?
Thanks.

There aren't any problems with the ROW_NUMBER() version of the query that require the NOLOCK hint. I just used it for the slight performance improvement.

JJ B SSCertifiable Points: 5100 More actions · Answer 12

Francis Rodrigues (5/12/2009)
There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.

Thanks for the clarification. I guess that's what I was trying to get at. Do you mean that ROW_NUMBER() has such bad performance that you need to resort to using NOLOCK? Or is using NOLOCK just a common practice for your queries to try to preempt performance problems?

Thanks.

wbrianwhite SSC Eights! Points: 954 More actions · Answer 13

In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a exclude1.Version

left join Production.ProductVersion exclude2

on p.ProductID = exclude2.ProductID

and p.Version = exclude2.Version

and p.MinorVersion > exclude2.MinorVersion

left join Production.ProductVersion exclude3

on p.ProductID = exclude3.ProductID

and p.Version = exclude3.Version

and p.MinorVersion = exclude3.MinorVersion

and p.ReleaseVersion > exclude3.ReleaseVersion

left join Production.ProductVersion exclude4

on p.ProductID = exclude4.ProductID

and p.Version = exclude4.Version

and p.MinorVersion = exclude4.MinorVersion

and p.ReleaseVersion = exclude4.ReleaseVersion

and p.StandardCost > exclude4.StandardCost

where exclude1.Version is null

and exclude2.MinorVersion is null

and exclude3.ReleaseVersion is null

and exclude4.StandardCost is null

[/code]

William Hutton-462896 SSC Enthusiast Points: 173 More actions · Answer 14

wbrianwhite (5/12/2009)
In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a <, I can never remember without testing it. But something like this should do the trick

I think it's just a matter of ...elegance? To me using Row_Number() statements (and CTEs in general) are easier to write and easier to follow in specific situations (provided the one reading knows what they do). And I can tell you just from testing this morning that the Row_Number() query has a performance improvement over the left joins and isnull tests I had been using previously.