RE: Group by – SQLServerCentral

SSC Guru

Points: 64780

September 6, 2010 at 1:00 am

Good question!

In addition to your explanation, here is WHY an alias can be used in the ORDER BY, but not in the GORUP BY:

The correct result of any SQL query is defined in terms of the logical(*) processing order.

1. FROM clause. Build an intermediate set as a copy of the single table in the FROM clause, or as the result of joining the multiple tables. Or supply a single-row result set with no columns if no FROM is specified.

2. WHERE clause. For each row in the output of step 1, evaluate the WHERE condition. If the result is True, add the row to the intermediate result set that is the output of this step; if it evaluates to False or Unknown (remember three-valued logic!!), discard the row. If no WHERE clause is specified, all rows are assumed to pass the test.

3. GROUP BY clause. Create a new intermediate result set by taking the output of step 2 and forming groups based on the columns or expressions in the GROUP BY expression. If no GROUP BY is present, but the query does include a HAVING clause and/or does use aggregate functions in the SELECT clause, all rows are combined into one single group (i.e., "GROUP BY ()" is assumed). Otherwise, no grouping is applied (all rows are retained as individual rows).

4. HAVING clause. Similar to the WHERE clause, the expression in the HAVING clause is evaluated - but not for individual rows, but for each group in the output of step 3. If the condition evaluates to True, the group is inserted into the intermediate result set of this step; if it's False or Unknown, the entire group gets discarded.

5. SELECT clause. Based on the remaining rows and groups in the output of step 4, the result set of the query is constructed. Each expression in the SELECT list becomes a column in the result set. Columns in the intermediate result set that are not included in the SELECT list are dropped (unless SELECT * is used, becuase that is a shorthand for "all columns in the intermediate result set").

6. ORDER BY clause. Officially a non-relational operation, since relational sets are unordered by definition. But as a service for the front-end, the database can convert the result set ("set" - meaning unordered) to a result cursor ("cursor" - ordered collection). Officially, only columns or expressions that are in the SELECT clause can be used in the ORDER BY clause (as the other columns were dropped from the result, and hence not included in the input of this step). SQL Server does allow this. Logically, you can consider SQL Server adding some extra colums to the SELECT list, with an "invisible" attribute.

Note the absence of subqueries in this list. Logically(*), whenever a subqueriy is encountered in the steps above, evaluation of the outer query is halted, the evaluation process for the subquery is evaluated (steps 1 through 5, as ORDER BY is not allowed in a subquery [for simplicity sake, I choose to not include the SQL Server dialect TOP ... ORDER BY in this description]), and when the result of the subquery is known, the processing of the outer query is resumed.

(*) Note the term "logical". There is no requirement at all that queries are actually executed in this order; as long as the results match those that would be returned by following this order. In fact, one of the key components of SQL Server (and any RDBMS) is the optimizer, a component that considers many strategies to find the desired results and picks the one likely to be the "cheapest" in terms of execution time and resource usage.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/