Order By Column

Question

Post reply

Order By Column

Viewing 15 posts - 16 through 30 (of 36 total)

You must be logged in to reply to this topic. Login to reply

Thomas Abraham SSChampion Points: 10761 More actions · Answer 1

Nice question on an interesting topic. But agree explanation could have been better. Let's see another one Anoo.

[font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
Connect to me on LinkedIn

Ken SSCoach Points: 17191 More actions · Answer 2

A good example of how important attention to detail is. Thanks Anoo.

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 3

Koen Verbeeck (3/4/2014)
Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)

The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.

Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.

Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

TomThomson SSC Guru Points: 104767 More actions · Answer 4

Nice question.

Hugo Kornelis (3/4/2014)
Koen Verbeeck (3/4/2014)
Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)
The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.
Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.
Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.

In some queries only items the order by clause can only use columns from the select list: order by in queries that use select distinct, union, intersect, or except are not permitted to use columns from tables in teh from list unless those columns are in the from list. This applies only when those features occur at the outermost level, not in subqueries - but order clauses in subqueries are meaningless anyway. With the set operators the order by clause can only use the names and aliases specified in the first select (ie the colum names of the record set returned by the query). So in these cases when the standard behaviour fails the system doesn't look at the remaining columns.

Tom

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 5

TomThomson (3/4/2014)
Nice question.
Hugo Kornelis (3/4/2014)
Koen Verbeeck (3/4/2014)
Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)
The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.
Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.
Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.
In some queries only items the order by clause can only use columns from the select list: order by in queries that use select distinct, union, intersect, or except are not permitted to use columns from tables in teh from list unless those columns are in the from list. This applies only when those features occur at the outermost level, not in subqueries - but order clauses in subqueries are meaningless anyway. With the set operators the order by clause can only use the names and aliases specified in the first select (ie the colum names of the record set returned by the query). So in these cases when the standard behaviour fails the system doesn't look at the remaining columns.

Absolutely true. And all those restirctions are totally obvious when you think about it.

For the set operators, allowing access to different column simply opens way too many cans of worms. Think about "SELECT OrderID FROM dbo.Orders UNION ALL SELECT CustomerID FROM dbo.Customers ORDER BY OrderQty;" -assuming that the Customers table has no OrderQty column, how should SQL Server resolve this?

For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:

Col1 | Col2

-----+-----

1 | 1

2 | 2

3 | 1

Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Revenant SSC-Forever Points: 42467 More actions · Answer 6

Hugo Kornelis (3/4/2014)
TomThomson (3/4/2014)
Nice question.
Hugo Kornelis (3/4/2014)
Koen Verbeeck (3/4/2014)
Nice question, but the explanation doesn't mention why the order of casting vs order by changes. (because of adding the T alias in the second statement)
The ORDER BY operates by default on column aliases. In the official theoretical model of query evaluation, that is even the ONLY option and sorting on anything that is not in the SELECT list is impossible.
Luckily SQL Server does allow you to order by columns from the FROM clause even if they are not in the SELECT list, but the standard behaviour is preferred. That's why SQL Server will first try to match the ORDER BY arguments with the column names or aliases from the SELECT clause. WHen that fails, it will look at the remaining columns.
Adding the prefix bypasses this behaviour, because you now explicitly specify the source of the data.
In some queries only items the order by clause can only use columns from the select list: order by in queries that use select distinct, union, intersect, or except are not permitted to use columns from tables in teh from list unless those columns are in the from list. This applies only when those features occur at the outermost level, not in subqueries - but order clauses in subqueries are meaningless anyway. With the set operators the order by clause can only use the names and aliases specified in the first select (ie the colum names of the record set returned by the query). So in these cases when the standard behaviour fails the system doesn't look at the remaining columns.
Absolutely true. And all those restirctions are totally obvious when you think about it.
For the set operators, allowing access to different column simply opens way too many cans of worms. Think about "SELECT OrderID FROM dbo.Orders UNION ALL SELECT CustomerID FROM dbo.Customers ORDER BY OrderQty;" -assuming that the Customers table has no OrderQty column, how should SQL Server resolve this?
For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:
Col1 | Col2
-----+-----
1 | 1
2 | 2
3 | 1
Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?

Maybe I do note get something, Hugo.

IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 7

Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:
Col1 | Col2
-----+-----
1 | 1
2 | 2
3 | 1
Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?
Maybe I do note get something, Hugo.
IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

Yup, two distinct values for Col2, so you get two rows. And the query says they have to be returned in order of Col1. So, which order do you expect them?

I hope that you agree with me that this question cannot be answered - and hence that SQL Server is totally right in not permitting a SELECT DISTINCT query to use ORDER BY columns that are not in the SELECT list.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Revenant SSC-Forever Points: 42467 More actions · Answer 8

Hugo Kornelis (3/4/2014)
Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:
Col1 | Col2
-----+-----
1 | 1
2 | 2
3 | 1
Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?
Maybe I do note get something, Hugo.
IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?
Yup, two distinct values for Col2, so you get two rows. And the query says they have to be returned in order of Col1. So, which order do you expect them?
I hope that you agree with me that this question cannot be answered - and hence that SQL Server is totally right in not permitting a SELECT DISTINCT query to use ORDER BY columns that are not in the SELECT list.

Absolutely, Hugo. 🙂

TomThomson SSC Guru Points: 104767 More actions · Answer 9

Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:
Col1 | Col2
-----+-----
1 | 1
2 | 2
3 | 1
Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?
Maybe I do note get something, Hugo.
IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?

Tes, but which comes first with order by Col1? The Col1 values where Col2 is 1 are 3 and 1, and the Col1 value where Col2 is 2 is 2; so does 1<2 rule or does 2<3 rule? Yes, one could define the effect - several different ways, each of which would probably be unacceptable to the majority of people, so it was best to say "you can't do it if you specify distinct". Without distinct it's easy - you return as many rows as there are in the table, ordered as specified. With distinct, you can't return all the rows because there are fewer distinct Col2 values than there are rows.

Tom

Revenant SSC-Forever Points: 42467 More actions · Answer 10

TomThomson (3/4/2014)
Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:
Col1 | Col2
-----+-----
1 | 1
2 | 2
3 | 1
Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?
Maybe I do note get something, Hugo.
IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?
Tes, but which comes first with order by Col1? The Col1 values where Col2 is 1 are 3 and 1, and the Col1 value where Col2 is 2 is 2; so does 1<2 rule or does 2<3 rule? Yes, one could define the effect - several different ways, each of which would probably be unacceptable to the majority of people, so it was best to say "you can't do it if you specify distinct". Without distinct it's easy - you return as many rows as there are in the table, ordered as specified. With distinct, you can't return all the rows because there are fewer distinct Col2 values than there are rows.

Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

Unfortunately, our rather small example does not make my point obvious, but please consider -

Col1 | Col2

-----+-----

1 | 2

2 | 3

3 | 1

4 | 2

That would give you 2,3,1 right?

I could live with this.

Edit: making the sequence more obvious, IMO

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 11

Revenant (3/4/2014)
Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.

Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.

There is a way to be explicit about this, and that is to use the equivalent GROUP BY:

SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Revenant SSC-Forever Points: 42467 More actions · Answer 12

Hugo Kornelis (3/4/2014)
Revenant (3/4/2014)
Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.
Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.
There is a way to be explicit about this, and that is to use the equivalent GROUP BY:
SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want

Hugo, the original question was about ORDER BY Col1. That gives us an implicit order.

Yeah, I know, it is iffy, and as I said, I am not a computer scientist designing T-SQL, I am just an engineer who happens to have some say about whether it is implemented correctly - not the same thing.

Hugo Kornelis SSC Guru Points: 64789 More actions · Answer 13

Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
Revenant (3/4/2014)
Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.
Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.
There is a way to be explicit about this, and that is to use the equivalent GROUP BY:
SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want
Hugo, the original question was about ORDER BY Col1. That gives us an implicit order.
Yeah, I know, it is iffy, and as I said, I am not a computer scientist designing T-SQL, I am just an engineer who happens to have some say about whether it is implemented correctly - not the same thing.

Hmmm. So suppose I modify the data as follows:

Col1 | Col2

-----+-----

9999 | 2

2 | 3

3 | 1

4 | 2

9876 | 2

1 | 2

5555 | 2

(Yeah, I know: extreme)

Now if we run SELECT DISTINCT Col2 ... ORDER BY Col1; - you'd still expect the order to be 2, 3, 1?

(That effectively means that you define the ORDER BY in this case to be interpreted as the equivalent GROUP BY query using MIN as the aggregate function)

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Revenant SSC-Forever Points: 42467 More actions · Answer 14

Hugo Kornelis (3/4/2014)
Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
Revenant (3/4/2014)
Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.
Hate to be a spoilsport, but: define "First". (Remembering that relational databases are set-based and hence by definition have no order. And also remembering that the optimizer is free to reorder the query as much as it wants to.
There is a way to be explicit about this, and that is to use the equivalent GROUP BY:
SELECT Col2 FROM ThatTable GROUP BY Col2 ORDER BY MIN(Col1); -- or MAX(Col1), or AVG(Col1), or (...) - depending on what you want
Hugo, the original question was about ORDER BY Col1. That gives us an implicit order.
Yeah, I know, it is iffy, and as I said, I am not a computer scientist designing T-SQL, I am just an engineer who happens to have some say about whether it is implemented correctly - not the same thing.
Hmmm. So suppose I modify the data as follows:
Col1 | Col2
-----+-----
9999 | 2
2 | 3
3 | 1
4 | 2
9876 | 2
1 | 2
5555 | 2
(Yeah, I know: extreme)
Now if we run SELECT DISTINCT Col2 ... ORDER BY Col1; - you'd still expect the order to be 2, 3, 1?
(That effectively means that you define the ORDER BY in this case to be interpreted as the equivalent GROUP BY query using MIN as the aggregate function)

Yes, Hugo - I would expect it to be 2,3,1, based on the DISTINCT values associated with the Col1 values.

As I said, I am not a computer scientist - I am paid for enforcing the rules - sort of -, not creating them. 😉

TomThomson SSC Guru Points: 104767 More actions · Answer 15

Revenant (3/4/2014)
TomThomson (3/4/2014)
Revenant (3/4/2014)
Hugo Kornelis (3/4/2014)
For SELECT DISTINCT, allowing ORDER BY to operate on columns not in the SELECT list would produce erratic and unpredictable behaviour. Consider this table:
Col1 | Col2
-----+-----
1 | 1
2 | 2
3 | 1
Now what do you expect to get returned if you allow "SELECT DISTINCT Col2 FROM TheTableAbove ORDER BY Col1;"?
Maybe I do note get something, Hugo.
IMO no matter how you slice it -- meaning ORDER or other clauses --, Col2 still has only two distinct values, right?
Tes, but which comes first with order by Col1? The Col1 values where Col2 is 1 are 3 and 1, and the Col1 value where Col2 is 2 is 2; so does 1<2 rule or does 2<3 rule? Yes, one could define the effect - several different ways, each of which would probably be unacceptable to the majority of people, so it was best to say "you can't do it if you specify distinct". Without distinct it's easy - you return as many rows as there are in the table, ordered as specified. With distinct, you can't return all the rows because there are fewer distinct Col2 values than there are rows.
Hmm... I am not a computer scientist and even less a programming language designer; however, I see that there could be a new rule introduced that would say that if you are selecting distinct values based on a column which is not in the select list, you present the first value encountered in the (implicit) full SELECT.
Unfortunately, our rather small example does not make my point obvious, but please consider -
Col1 | Col2
-----+-----
1 | 2
2 | 3
3 | 1
4 | 2
That would give you 2,3,1 right?
I could live with this.
Edit: making the sequence more obvious, IMO

Yes, that's one of the many approaches that would work, and in my view it's one of the better ones because it doesn't require any colision resolution mechanism. But would it be accepted?

Some people would suggest computing the average position in the full list for each value, and ordering by that (perhaps falling back on your method to resolve collisions). Others might suggest looking at whether the number of value A - value B pairs in the full list is where the value A element is before the value B element is greater than or less than the number of pairs where the opposite is the case (again resolving collisions somehow). These are just two of the methods of getting a good representation of the "average" relative position. You can complicate them by weighting elements or pairs of elements depending on how distant from the middle of the unpruned list they are, or when working with pairs by how far apart the two components are in the unpruned list.

I remember a discussion about this particular ordering problem - not in the context of databases, something completely different, way back in the 60s, where the concensus ended up being that the best procedure would be to use a random (ideally not pseudo-random, but in practice true random wasn't really going to be possible) selection of which item with each given value to retain (that would be done by looking at the items in true random order and retaining the part specified for the distinct value and the part designated for ordering for the first item with each distinct value for the specified part, sort the retained sub-items including the (unwanted) ordering part, and then throw away the unwanted part. Like your suggested method, that doesn't require any collision resolution; of course it might be difficult to live with in a relational database context, as it's certainly not deterministic.

Tom