Windowing functions vs Non-deterministic ROW-NUMBER()

  • Since windowing functions make a big deal of the OVER ... ORDER BY clause, how can that be reconciled with the non-deterministic ROW_NUMBER() function, i.e. something like:

    SELECT

    ClientNo,

    SalesNo,

    ROW_NUMBER() OVER (ORDER BY ClientNo, SalesNo) AS RowNum

    FROM Sales

    ORDER BY ClientNo, SalesNo;

    where a client might appear more than once because of multiple sales for a client.

    Is ROW_NUMBER() suddenly going to produce a reproducible ordering for the results set ?

  • ROW_NUMBER() is deterministic as long as the ORDER BY clause is deterministic. This is why it's a good idea to add enough keys to the ORDER BY clause to ensure that the order is deterministic.

    For aggregates with an OVER clause that uses ROWS, the result is again deterministic if and only if the ORDER BY clause is deterministic.

    For aggregates with an OVER clause that uses RANGE (either explicitly or implicitly), the result is deterministic even if the ORDER BY clause is not deterministic.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This is a very interesting subject, one of spent a lot of time looking into and something worth really understanding when dealing with "Ranking" functions. First (in case you run across this) according to BOL Window "Ranking" functions are nondeterministic. This is mostly false. Itzik Ben-Gan, in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions (page 57) explains it best (emph mine):

    The [ANSI SQL] standard supports four window functions that deal with ranking calculations. Those are ROW_NUMBER, NTILE, RANK, and DENSE_RANK. The standard covers the first two as one category and the last two as another, probably due to determinism-related differences.

    He later describes ROW_NUMBER and NTILE as “deterministic when the window ordering is unique” (pages 60 & 64), RANK and DENSE_RANK as always deterministic(page 67).

    In other words, when the columns referenced in your the ORDER BY portion of your OVER clause are unique then then your all four "ranking" functions will produce a deterministic result set; when they aren't then only RANK and DENSE_RANK produce a deterministic result set, ROW_NUMBER and NTILE do not.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • drew.allen (4/18/2016)

    [snip]

    ROW_NUMBER() is deterministic as long as the ORDER BY clause is deterministic.

    Alan.B (4/18/2016)

    [snip]

    In other words, when the columns referenced in your the ORDER BY portion of your OVER clause are unique then then your all four "ranking" functions will produce a deterministic result set; when they aren't then only RANK and DENSE_RANK produce a deterministic result set, ROW_NUMBER and NTILE do not.

    Now, this makes sense. Books on line (msdn Microsoft) simply states: "The following built-in functions from other categories are always nondeterministic." and that includes ROW_NUMBER.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply