ROW_NUMBER(): An Efficient Alternative to Subqueries

  • Comments posted to this topic are about the item ROW_NUMBER(): An Efficient Alternative to Subqueries

  • Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.

    Tony.

  • And if you're using SQLServer2000, then you'll need to use a "TOP n" correlated subquery, since ROW_NUMBER() is not available.. eg:

    select RecentVersion.*

    from(select distinct ProductId from dbo.ProductVersion) ProductList

    inner join dbo.ProductVersion RecentVersion

    on RecentVersion.ProductId = ProductList.ProductId

    and cast(RecentVersion.Version as varchar(10)) + '.' + cast(RecentVersion.MinorVersion as varchar(10)) + '.' + cast(RecentVersion.ReleaseVersion as varchar(10)) = (

    select top 1 cast(x.Version as varchar(10)) + '.' + cast(x.MinorVersion as varchar(10)) + '.' + cast(x.ReleaseVersion as varchar(10))

    from dbo.ProductVersion x

    where x.ProductId = RecentVersion.ProductId

    order by x.Version desc, x.MinorVersion desc, x.ReleaseVersion desc

    )

  • Other uses of Row_number() function

    http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx


    Madhivanan

    Failing to plan is Planning to fail

  • Great Article - thank you

  • Nice example!

    Alternativly you could use a cursor to make it more readable and improve performance over a nested query. That said i dont like cursors and do prefer common table expressions!

    SET ROWCOUNT 0

    go

    IF OBJECT_ID('tempdb..#ProductVersion') IS NULL

    BEGIN

    CREATE TABLE #ProductVersion

    (

    ProductID int NOT NULL,

    Version int NOT NULL,

    MinorVersion int NOT NULL,

    ReleaseVersion int NOT NULL,

    StandardCost numeric(30, 4) NOT NULL

    )

    END

    DECLARE @ProductId INT

    DECLARE A CURSOR FOR

    SELECTProductId

    FROMProductVersion

    GROUP BY ProductId

    ORDER BY ProductId

    OPEN A

    FETCH NEXT FROM A

    INTO @ProductId

    WHILE@@FETCH_STATUS = 0

    BEGIN

    INSERT INTO #ProductVersion

    SELECTTOP 1 ProductId,

    Version,

    MinorVersion,

    ReleaseVersion,

    StandardCost

    FROMProductVersion

    WHEREProductId = @ProductId

    ORDER BY ProductId ASC,

    Version DESC,

    MinorVersion DESC,

    ReleaseVersion DESC

    FETCH NEXT FROM A

    INTO @ProductId

    END

    CLOSE A

    DEALLOCATE A

    SELECT * FROM #ProductVersion

    DROP TABLE #ProductVersion

  • Thanks for the great article - I thought that you did a good job of explaining how to use the row_number function and comparing its performance to subqueries.

    Tim

  • How could I leverage this to get, say, the "most current" version of a contract (i.e., either the contract currently in effect or the latest contract even if it was expired) *and* the latest version of the contract (i.e., it doesn't take effect until the future? I have a query using subqueries for this that looks something like this (this is simplified a bit). The row_number() query is much more concise but only returns that max contract date for each contract, which leaves out the current dates for those that have them:

    SELECT DISTINCT group_number

    ,subgroup_number

    ,ISNULL(

    (SELECT MAX(contract_thru_date)

    FROM contracts curr1

    WHERE curr1.group_number=curr.group_number

    AND GETDATE() BETWEEN contract_from_date AND contract_thru_date

    ,(SELECT MAX(contract_thru_date)

    FROM contracts prev

    WHERE prev.group_number=curr.group_number

    )

    ) contract_thru_date FROM contracts curr

    EDIT: NM, I figured it out. All I had to was add an or statement to the where in the CTE. Great article, came at a good time! :w00t:

  • Well done article.

    Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?

    Thanks.

  • tony rogerson (5/12/2009)


    Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.

    Afraid that I do not understand this. AFAIK, CTEs are table expressions like non-indexed views. If you write them non-deterministically (which is what you do in your blog article), then they return non-deterministic results. And if you reference a non-deterministic table expression multiple times, you are not guaranteed to get the same results. That's straight ANSI SQL to the best of my knowledge and I see no bug in Transact-SQL demonstrated here.

    And as far as ROW_NUMBER() it only returns non-deterministic results if the OVER clause parameters are non-deterministic. If you use a deterministic ORDER BY (which also requires that the column set is uniquely orderable), then you will get deterministic results.

    All this really demonstrates is that if you input non-deterministic expressions without instantiating them, then you will get non-deterministic results back. Garbage In, Garbage Out.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.

  • JJ B (5/12/2009)


    Well done article.

    Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?

    Thanks.

    There aren't any problems with the ROW_NUMBER() version of the query that require the NOLOCK hint. I just used it for the slight performance improvement.

  • Francis Rodrigues (5/12/2009)


    There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.

    Thanks for the clarification. I guess that's what I was trying to get at. Do you mean that ROW_NUMBER() has such bad performance that you need to resort to using NOLOCK? Or is using NOLOCK just a common practice for your queries to try to preempt performance problems?

    Thanks.

  • In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a exclude1.Version

    left join Production.ProductVersion exclude2

    on p.ProductID = exclude2.ProductID

    and p.Version = exclude2.Version

    and p.MinorVersion > exclude2.MinorVersion

    left join Production.ProductVersion exclude3

    on p.ProductID = exclude3.ProductID

    and p.Version = exclude3.Version

    and p.MinorVersion = exclude3.MinorVersion

    and p.ReleaseVersion > exclude3.ReleaseVersion

    left join Production.ProductVersion exclude4

    on p.ProductID = exclude4.ProductID

    and p.Version = exclude4.Version

    and p.MinorVersion = exclude4.MinorVersion

    and p.ReleaseVersion = exclude4.ReleaseVersion

    and p.StandardCost > exclude4.StandardCost

    where exclude1.Version is null

    and exclude2.MinorVersion is null

    and exclude3.ReleaseVersion is null

    and exclude4.StandardCost is null

    [/code]

  • wbrianwhite (5/12/2009)


    In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a <, I can never remember without testing it. But something like this should do the trick

    I think it's just a matter of ...elegance? To me using Row_Number() statements (and CTEs in general) are easier to write and easier to follow in specific situations (provided the one reading knows what they do). And I can tell you just from testing this morning that the Row_Number() query has a performance improvement over the left joins and isnull tests I had been using previously.

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply