Fun with OVER

  • Raghavendra Mudugal (6/26/2014)


    Also, can any explain... in query A, when Jean has 1 and Anita and Mary has 3 where did the 2 go?

    In question A the default framing is RANGE UNBOUNDED PRECEDING AND CURRENT ROW. Therefore it calculates the count over all rows that have a value <= the current row. Both Anita and Mary are counting the number of rows in their window that have an ARRIVALTIME <= to their own value. In both cases that is 3 rows. There is no ARRIVALTIME that has 2 rows <= to that value. When using RANGE framing it calculates based on the values specified in the ORDER BY whereas ROWS calculates by position of the row in the window and the ORDER BY arranges the rows into position. That's why query C has a value of 2 for a row because for that position in the frame there are only two rows at or before that position.

    There are no facts, only interpretations.
    Friedrich Nietzsche

  • Louis Hillebrand (6/26/2014)


    Interesting, but when I change the order of the rows in the insert (Mary before Anita) the order of the rows in query C changes.

    The result of Query C depends on the initial order of the rows returned, which is not specified with an ORDER BY and can change.

    +1

    I knew B Only wasn't the expected answer, but it is the only one which will guarantee the result expected.

    Good question though and a nice explanation, Thanks

  • Fair point that C wouldn't *guarantee* those exact results if anything about this example changed (e.g. change the order of data entry).

    I was focusing on the framing method and how especially the default framing method changes the results with equal values so I was trying to keep everything very simple. I figured the teachable moment was more important than absolute purity. My apologies to those that I tripped up, and congratulations on your deep understanding of window functions and framing. 😎

    There are no facts, only interpretations.
    Friedrich Nietzsche

  • barry.mcconnell (6/26/2014)


    Raghavendra Mudugal (6/26/2014)


    Also, can any explain... in query A, when Jean has 1 and Anita and Mary has 3 where did the 2 go?

    In question A the default framing is RANGE UNBOUNDED PRECEDING AND CURRENT ROW. Therefore it calculates the count over all rows that have a value <= the current row. Both Anita and Mary are counting the number of rows in their window that have an ARRIVALTIME <= to their own value. In both cases that is 3 rows. There is no ARRIVALTIME that has 2 rows <= to that value. When using RANGE framing it calculates based on the values specified in the ORDER BY whereas ROWS calculates by position of the row in the window and the ORDER BY arranges the rows into position. That's why query C has a value of 2 for a row because for that position in the frame there are only two rows at or before that position.

    thank you, barry. 🙂

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Thanks Barry.

    Here is another fine article on the OVER clause

    http://www.sqlservercentral.com/articles/T-SQL/75466/

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Good question!

    I am very happy to see someone pointing out this in my opinion dangerous and misguided default.

    I just would like to add that there is a very big performance difference between using RANGE or ROWS. Which makes the default of RANGE even more stupid.

    In my presentation on the OVER clause, I recommend people to always expliclty specify the range/rows claause and never rely on the default.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Interesting question. A couple of people have pointed out the hole in the question.

    I was careless about the ordering in query C, and thus got it "right . If I had been careful I would have wanted to pick "it depends" but wouldn't have been able to as the actually correct answer isn't an option. Maybe the question should have been phrased as "which queries could possibly produce the results shown"?

    Tom

  • Ed Wagner (6/27/2014)


    Good question. Thank you for it.

    +1

    Thanks

  • nice question barry..

  • Nice question. I use the Over() function a lot so I always enjoy these.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply