Order By Column

  • Revenant

    SSC-Forever

    Points: 42467

    Indeed, this highlights an interesting point. Thanks, Anoo!

  • Thomas Abraham

    SSChampion

    Points: 10761

    Nice question on an interesting topic. But agree explanation could have been better. Let's see another one Anoo.

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • Ken Wymore

    SSCoach

    Points: 16668

    A good example of how important attention to detail is. Thanks Anoo.

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Koen Verbeeck (3/4/2014)


    Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)

    The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.

    Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.

    Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • TomThomson

    SSC Guru

    Points: 104773

    Nice question.

    Hugo Kornelis (3/4/2014)


    Koen Verbeeck (3/4/2014)


    Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)

    The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.

    Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.

    Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.

    In some queries only items the order by clause can only use columns from the select list: order by in queries that use select distinct, union, intersect, or except are not permitted to use columns from tables in teh from list unless those columns are in the from list. This applies only when those features occur at the outermost level, not in subqueries - but order clauses in subqueries are meaningless anyway. With the set operators the order by clause can only use the names and aliases specified in the first select (ie the colum names of the record set returned by the query). So in these cases when the standard behaviour fails the system doesn't look at the remaining columns.

    Tom

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    TomThomson (3/4/2014)


    Nice question.

    Hugo Kornelis (3/4/2014)


    Koen Verbeeck (3/4/2014)


    Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)

    The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.

    Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.

    Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.

    In some queries only items the order by clause can only use columns from the select list: order by in queries that use select distinct, union, intersect, or except are not permitted to use columns from tables in teh from list unless those columns are in the from list. This applies only when those features occur at the outermost level, not in subqueries - but order clauses in subqueries are meaningless anyway. With the set operators the order by clause can only use the names and aliases specified in the first select (ie the colum names of the record set returned by the query). So in these cases when the standard behaviour fails the system doesn't look at the remaining columns.

    Absolutely true. And all those restirctions are totally obvious when you think about it.

    For the set operators, allowing access to different column simply opens way too many cans of worms. Think about "SELECT OrderID FROM dbo.Orders UNION ALL SELECT CustomerID FROM dbo.Customers ORDER BY OrderQty;" -assuming that the Customers table has no OrderQty column, how should SQL Server resolve this?

    For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

    Col1 | Col2

    -----+-----

    1 | 1

    2 | 2

    3 | 1

    Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Revenant

    SSC-Forever

    Points: 42467

    Hugo Kornelis (3/4/2014)


    TomThomson (3/4/2014)


    Nice question.

    Hugo Kornelis (3/4/2014)


    Koen Verbeeck (3/4/2014)


    Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)

    The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.

    Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.

    Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.

    In some queries only items the order by clause can only use columns from the select list: order by in queries that use select distinct, union, intersect, or except are not permitted to use columns from tables in teh from list unless those columns are in the from list. This applies only when those features occur at the outermost level, not in subqueries - but order clauses in subqueries are meaningless anyway. With the set operators the order by clause can only use the names and aliases specified in the first select (ie the colum names of the record set returned by the query). So in these cases when the standard behaviour fails the system doesn't look at the remaining columns.

    Absolutely true. And all those restirctions are totally obvious when you think about it.

    For the set operators, allowing access to different column simply opens way too many cans of worms. Think about "SELECT OrderID FROM dbo.Orders UNION ALL SELECT CustomerID FROM dbo.Customers ORDER BY OrderQty;" -assuming that the Customers table has no OrderQty column, how should SQL Server resolve this?

    For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

    Col1 | Col2

    -----+-----

    1 | 1

    2 | 2

    3 | 1

    Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

    Maybe I do note get something, Hugo.

    IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Revenant (3/4/2014)


    Hugo Kornelis (3/4/2014)


    For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

    Col1 | Col2

    -----+-----

    1 | 1

    2 | 2

    3 | 1

    Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

    Maybe I do note get something, Hugo.

    IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

    Yup, two distinct values for Col2, so you get two rows. And the query says they have to be returned in order of Col1. So, which order do you expect them?

    I hope that you agree with me that this question cannot be answered - and hence that SQL Server is totally right in not permitting a SELECT DISTINCT query to use ORDER BY columns that are not in the SELECT list.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Revenant

    SSC-Forever

    Points: 42467

    Hugo Kornelis (3/4/2014)


    Revenant (3/4/2014)


    Hugo Kornelis (3/4/2014)


    For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

    Col1 | Col2

    -----+-----

    1 | 1

    2 | 2

    3 | 1

    Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

    Maybe I do note get something, Hugo.

    IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

    Yup, two distinct values for Col2, so you get two rows. And the query says they have to be returned in order of Col1. So, which order do you expect them?

    I hope that you agree with me that this question cannot be answered - and hence that SQL Server is totally right in not permitting a SELECT DISTINCT query to use ORDER BY columns that are not in the SELECT list.

    Absolutely, Hugo. 🙂

  • TomThomson

    SSC Guru

    Points: 104773

    Revenant (3/4/2014)


    Hugo Kornelis (3/4/2014)


    For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

    Col1 | Col2

    -----+-----

    1 | 1

    2 | 2

    3 | 1

    Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

    Maybe I do note get something, Hugo.

    IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

    Tes, but which comes first with order by Col1? The Col1 values where Col2 is 1 are 3 and 1, and the Col1 value where Col2 is 2 is 2; so does 1<2 rule or does 2<3 rule? Yes, one could define the effect - several different ways, each of which would probably be unacceptable to the majority of people, so it was best to say "you can't do it if you specify distinct". Without distinct it's easy - you return as many rows as there are in the table, ordered as specified. With distinct, you can't return all the rows because there are fewer distinct Col2 values than there are rows.

    Tom

  • Revenant

    SSC-Forever

    Points: 42467

    TomThomson (3/4/2014)


    Revenant (3/4/2014)


    Hugo Kornelis (3/4/2014)


    For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

    Col1 | Col2

    -----+-----

    1 | 1

    2 | 2

    3 | 1

    Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

    Maybe I do note get something, Hugo.

    IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

    Tes, but which comes first with order by Col1? The Col1 values where Col2 is 1 are 3 and 1, and the Col1 value where Col2 is 2 is 2; so does 1<2 rule or does 2<3 rule? Yes, one could define the effect - several different ways, each of which would probably be unacceptable to the majority of people, so it was best to say "you can't do it if you specify distinct". Without distinct it's easy - you return as many rows as there are in the table, ordered as specified. With distinct, you can't return all the rows because there are fewer distinct Col2 values than there are rows.

    Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

    Unfortunately, our rather small example does not make my point obvious, but please consider -

    Col1 | Col2

    -----+-----

    1 | 2

    2 | 3

    3 | 1

    4 | 2

    That would give you 2,3,1 right?

    I could live with this.

    Edit: making the sequence more obvious, IMO

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Revenant (3/4/2014)


    Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

    Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.

    There is a way to be explicit about this, and that is to use the equivalent GROUP BY:

    SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Revenant

    SSC-Forever

    Points: 42467

    Hugo Kornelis (3/4/2014)


    Revenant (3/4/2014)


    Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

    Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.

    There is a way to be explicit about this, and that is to use the equivalent GROUP BY:

    SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want

    Hugo, the original question was about ORDER BY Col1. That gives us an implicit order.

    Yeah, I know, it is iffy, and as I said, I am not a computer scientist designing T-SQL, I am just an engineer who happens to have some say about whether it is implemented correctly - not the same thing.

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Revenant (3/4/2014)


    Hugo Kornelis (3/4/2014)


    Revenant (3/4/2014)


    Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

    Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.

    There is a way to be explicit about this, and that is to use the equivalent GROUP BY:

    SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want

    Hugo, the original question was about ORDER BY Col1. That gives us an implicit order.

    Yeah, I know, it is iffy, and as I said, I am not a computer scientist designing T-SQL, I am just an engineer who happens to have some say about whether it is implemented correctly - not the same thing.

    Hmmm. So suppose I modify the data as follows:

    Col1 | Col2

    -----+-----

    9999 | 2

    2 | 3

    3 | 1

    4 | 2

    9876 | 2

    1 | 2

    5555 | 2

    (Yeah, I know: extreme)

    Now if we run SELECT DISTINCT Col2 ... ORDER BY Col1; - you'd still expect the order to be 2, 3, 1?

    (That effectively means that you define the ORDER BY in this case to be interpreted as the equivalent GROUP BY query using MIN as the aggregate function)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Revenant

    SSC-Forever

    Points: 42467

    Hugo Kornelis (3/4/2014)


    Revenant (3/4/2014)


    Hugo Kornelis (3/4/2014)


    Revenant (3/4/2014)


    Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

    Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.

    There is a way to be explicit about this, and that is to use the equivalent GROUP BY:

    SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want

    Hugo, the original question was about ORDER BY Col1. That gives us an implicit order.

    Yeah, I know, it is iffy, and as I said, I am not a computer scientist designing T-SQL, I am just an engineer who happens to have some say about whether it is implemented correctly - not the same thing.

    Hmmm. So suppose I modify the data as follows:

    Col1 | Col2

    -----+-----

    9999 | 2

    2 | 3

    3 | 1

    4 | 2

    9876 | 2

    1 | 2

    5555 | 2

    (Yeah, I know: extreme)

    Now if we run SELECT DISTINCT Col2 ... ORDER BY Col1; - you'd still expect the order to be 2, 3, 1?

    (That effectively means that you define the ORDER BY in this case to be interpreted as the equivalent GROUP BY query using MIN as the aggregate function)

    Yes, Hugo - I would expect it to be 2,3,1, based on the DISTINCT values associated with the Col1 values.

    As I said, I am not a computer scientist - I am paid for enforcing the rules - sort of -, not creating them. 😉

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply