Using a Window Aggregate in an Aggregate Query

  • Kathi Kellenberger

    SSChampion

    Points: 11811

    Comments posted to this topic are about the item Using a Window Aggregate in an Aggregate Query

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • palotaiarpad

    SSCarpal Tunnel

    Points: 4893

    Hmm. According to BOL, if OVER used with a SUM, then an order by clause is required...

  • Eirikur Eiriksson

    SSC Guru

    Points: 182424

    palotaiarpad (4/26/2015)


    Hmm. According to BOL, if OVER used with a SUM, then an order by clause is required...

    That is incorrect, it is the partition clause and in this case the empty over() is a shorthand for it.

    😎

  • This was removed by the editor as SPAM

  • Eirikur Eiriksson

    SSC Guru

    Points: 182424

    Stewart "Arturius" Campbell (4/26/2015)


    Never thought of doing it this way

    This is a very efficient method when aggregating with different grouping / level of details, comes close to the divide and conquer method in performance.

    😎

  • palotaiarpad

    SSCarpal Tunnel

    Points: 4893

    If we assume, that we have the latest SQL Server Version (2014), then an over by is required according to BOL (I can't try it yet)

    OVER ( [ partition_by_clause ] order_by_clause)

    order_by_clause determines the logical order in which the operation is performed. order_by_clause is required.

    Or BOL is incorrect?

  • Koen Verbeeck

    SSC Guru

    Points: 258965

    palotaiarpad (4/26/2015)


    If we assume, that we have the latest SQL Server Version (2014), then an over by is required according to BOL (I can't try it yet)

    OVER ( [ partition_by_clause ] order_by_clause)

    order_by_clause determines the logical order in which the operation is performed. order_by_clause is required.

    Or BOL is incorrect?

    Link?

    ORDER BY is not required, unless you use a frame extent.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck

    SSC Guru

    Points: 258965

    Very nice question btw.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • palotaiarpad

    SSCarpal Tunnel

    Points: 4893

    Link is already in my first post, but please: https://msdn.microsoft.com/en-us/library/ms187810.aspx

  • Koen Verbeeck

    SSC Guru

    Points: 258965

    palotaiarpad (4/27/2015)


    Link is already in my first post, but please: https://msdn.microsoft.com/en-us/library/ms187810.aspx

    BOL is incorrect. At the bottom, there are already 2 remarks stating the same.

    If you take a look at the OVER clause page where BOL links to, you can see it is explicitly stated that ORDER BY is not required.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Eirikur Eiriksson

    SSC Guru

    Points: 182424

    Koen Verbeeck (4/27/2015)


    Very nice question btw.

    Looks like BOL has an error:

    😎

    OVER ( [ partition_by_clause ] order_by_clause)

    partition_by_clause divides the result set produced by the FROM clause into partitions to which the function is applied. If not specified, the function treats all rows of the query result set as a single group. order_by_clause determines the logical order in which the operation is performed. order_by_clause is required. For more information, see OVER Clause (Transact-SQL).

    Another thing to keep in mind is the somewhat strange OVER defaults, recommend always filling those in rather than relying on the supplied defaults. Koen has a good blog post on this[/url]. Here is a quick sample:

    SELECT TOP(10)

    so.object_id

    ,SUM(so.object_id) OVER() AS EMPTY_OVER

    ,SUM(so.object_id) OVER

    (

    PARTITION BY so.object_id

    ) AS PART_BY_SELF

    ,SUM(so.object_id) OVER

    (

    ORDER BY so.object_id

    ) AS ORD_BY_SELF

    FROM sys.objects so

    WHERE so.object_id > 0

    AND so.object_id < 100;

    Results with TOP(10)

    object_id EMPTY_OVER PART_BY_SELF ORD_BY_SELF

    ----------- ----------- ------------ -----------

    3 3669 3 3

    5 3669 5 8

    6 3669 6 14

    7 3669 7 21

    8 3669 8 29

    9 3669 9 38

    17 3669 17 55

    18 3669 18 73

    19 3669 19 92

    20 3669 20 112

    Now the same code without the TOP(10)

    SELECT

    so.object_id

    ,SUM(so.object_id) OVER() AS EMPTY_OVER

    ,SUM(so.object_id) OVER

    (

    PARTITION BY so.object_id

    ) AS PART_BY_SELF

    ,SUM(so.object_id) OVER

    (

    ORDER BY so.object_id

    ) AS ORD_BY_SELF

    FROM sys.objects so

    WHERE so.object_id > 0

    AND so.object_id < 100;

    Results without TOP(10)

    object_id EMPTY_OVER PART_BY_SELF ORD_BY_SELF

    ----------- ----------- ------------ -----------

    3 3669 3 3

    5 3669 5 8

    6 3669 6 14

    7 3669 7 21

    8 3669 8 29

    9 3669 9 38

    17 3669 17 55

    18 3669 18 73

    19 3669 19 92

    20 3669 20 112

    21 3669 21 133

    22 3669 22 155

    23 3669 23 178

    24 3669 24 202

    25 3669 25 227

    27 3669 27 254

    28 3669 28 282

    29 3669 29 311

    34 3669 34 345

    41 3669 41 386

    42 3669 42 428

    43 3669 43 471

    44 3669 44 515

    45 3669 45 560

    46 3669 46 606

    47 3669 47 653

    48 3669 48 701

    49 3669 49 750

    50 3669 50 800

    51 3669 51 851

    54 3669 54 905

    55 3669 55 960

    56 3669 56 1016

    57 3669 57 1073

    58 3669 58 1131

    59 3669 59 1190

    60 3669 60 1250

    62 3669 62 1312

    63 3669 63 1375

    64 3669 64 1439

    65 3669 65 1504

    67 3669 67 1571

    68 3669 68 1639

    69 3669 69 1708

    71 3669 71 1779

    72 3669 72 1851

    73 3669 73 1924

    74 3669 74 1998

    75 3669 75 2073

    78 3669 78 2151

    79 3669 79 2230

    80 3669 80 2310

    82 3669 82 2392

    84 3669 84 2476

    85 3669 85 2561

    86 3669 86 2647

    87 3669 87 2734

    89 3669 89 2823

    90 3669 90 2913

    91 3669 91 3004

    92 3669 92 3096

    93 3669 93 3189

    94 3669 94 3283

    95 3669 95 3378

    96 3669 96 3474

    97 3669 97 3571

    98 3669 98 3669

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    palotaiarpad (4/26/2015)


    Hmm. According to BOL, if OVER used with a SUM, then an order by clause is required...

    As others already said, BOL is wrong.

    OVER() with normal aggregate functions (like SUM) works in two possible ways:

    1. Without ORDER BY - either with PARTITION BY, or just as OVER (). In this case, the PARTITION BY defines groups (no PARTITION BY means that the entire result set is considered a single group), and the aggregate is applied to all rows in the group.

    2. With ORDER BY. In this case, a ROWS or RANGE specification should also be added, allthough the system does supply a default for the latter. With ORDER BY, the results are still grouped based on the PARTITION BY clause, but the ORDER BY + ROWS or RANGE specification limits the function of the aggregate to a subset of the group. This is very useful for, for instance, running totals.

    When using ORDER BY, please do not rely on the default ROWS or RANGE specification. The default may appear to give you running totals, but it uses RANGE instead of ROWS; depending on whether the ORDER BY specification has duplicates or not, this either means that you sometimes get unexpected results, or it means that though you get the same results, you are thrashing performance.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Eirikur Eiriksson (4/27/2015)


    Another thing to keep in mind is the somewhat strange OVER defaults, recommend always filling those in rather than relying on the supplied defaults. Koen has a good blog post on this[/url]. Here is a quick sample:

    I agree that the default RANGE specification instead of ROWS specification is bad, but the demo code and results you posted appear not related to that. It appears that you expect a different behaviour by adding or removing a TOP clause. Can you please expand on that? (For the record, the results were exactly as I would expect them to be).

    Also note that your query with TOP has no ORDER BY specification for the outer query. That means that SQL Server could have returned any subset of ten rows from the full results, in any desired order. The fact that the rows returned happen to match the ORDER BY specification in the query is an artefact of how the optimizer decides to execute the query, but that is not guaranteed.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Eirikur Eiriksson

    SSC Guru

    Points: 182424

    Hugo Kornelis (4/27/2015)


    Eirikur Eiriksson (4/27/2015)


    Another thing to keep in mind is the somewhat strange OVER defaults, recommend always filling those in rather than relying on the supplied defaults. Koen has a good blog post on this[/url]. Here is a quick sample:

    I agree that the default RANGE specification instead of ROWS specification is bad, but the demo code and results you posted appear not related to that. It appears that you expect a different behaviour by adding or removing a TOP clause. Can you please expand on that? (For the record, the results were exactly as I would expect them to be).

    The code is an simple illustration that the defaults for the empty OVER are different from other defaults implied when different options are used. This affects not only for RANGE/ROWS but also for the frame and the scope.

    😎

    Also note that your query with TOP has no ORDER BY specification for the outer query. That means that SQL Server could have returned any subset of ten rows from the full results, in any desired order. The fact that the rows returned happen to match the ORDER BY specification in the query is an artefact of how the optimizer decides to execute the query, but that is not guaranteed.

    Even with the ORDER BY in the query, the results are the same.

  • Ed Wagner

    SSC Guru

    Points: 286975

    palotaiarpad (4/26/2015)


    If we assume, that we have the latest SQL Server Version (2014), then an over by is required according to BOL (I can't try it yet)

    Others have shown BOL to be incorrect, so I'll just address the version question. If you remove the FORMAT function, you should be able to run the queries in SQL 2008.

    Nice question. Thanks.

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply