random records

  • I just came across an article describing a method of obtaining (pseudo) random records using:

    ORDER BY newid()

    this seems to work, but no where could I find why this works. I had thought order by needed a column name, column index or expression resulting in column index. Anybody got any ideas.

    Steve

  • ORDER BY can sort by any expression, it doesn't need to use actual columns from the table. In this case the expression is just simply the method newid(). However, since SQL is a set based language, using a function in this way would not work as might be expected since the same return value for the method would be used for all rows. But somehow (I am sure someone can tell us exactly why) newid() doesnt behave this way, instead it is executed once for every row in the table, and then the result set is sorted according to those values.

    For example, compare the two following statements. The first will return 10 different rows each time it is executed, but the second will return the same 10 rows (the 10 first in the table) each time:

    select top 10 * from northwind.dbo.orders order by newid()

    select top 10 * from northwind.dbo.orders order by rand()

    You can also test these statements to see it even clearer:

    select newid() from northwind.dbo.region

    select rand() from northwind.dbo.region

    The first returns 4 different rows, the second 4 duplicate rows. I guess it is just a choice they made to always execute newid once per row, since what it does is exactly return a value that is supposed to be unique.

    --

    Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

    http://www.sql.nu

  • Thanks Chris

    Interesting what you say about the difference between newid() and rand(), newid() being executed for every row. I think my question is now more basic.

    What does it mean to use an expression in an order by clause? Taking the simple select

    SELECT EmployeeID, LastName, FirstName

    FROM Northwind.dbo.Employees

    I can specify a column name or a number refering to a column and this works fine. If I used

    ORDER BY 4

    I get the error

    The ORDER BY position number 4 is out of range of the number of items in the select list.

    as you'd expect. However if i use

    ORDER BY (4 - 1)

    I don't get an error but neither does it sort on column 3, it sorts on the employeeid column. In fact as far as I can see what ever expression I enter here, string or numeric it always seems to sort on the employeeid column. So what's going on here, am I missing something, this seems to make the ORDER BY newid() behaviour even more unusual

    Steve

  • quote:


    I can specify a column name or a number refering to a column and this works fine. If I used

    ORDER BY 4

    I get the error

    The ORDER BY position number 4 is out of range of the number of items in the select list.

    as you'd expect. However if i use

    ORDER BY (4 - 1)


    Steve, what you've found here is a 'bug' (not really a bug, but a flaw) in SQL, that has actually been deprecated (meaning it will disappear and should not be used). Specifying column ordinal position as sort key is what I mean. Just as you showed, the behavior is very strange. Actually, what happens is this:

    SELECT EmployeeID, LastName, FirstName

    FROM Northwind.dbo.Employees

    ORDER BY 3

    You didn't have this example, but I included it anyway. The result set will be sorted by the third column (FirstName).

    ORDER BY 4

    SQL Serevr tries to sort by the fourth column, that doesn't exist, therefore you get an error.

    ORDER BY (4 - 1)

    (4 - 1) is evaluated as an expression, not a column ordinal position. Since this expression will evaluate to 3 for every row in the resultset, the set will be sorted by default order (i.e the order SQL retrieves the rows). If you have a clustered index on EmployeeId (as I guess there is) it will be sorted in that order, otherwise it can actually be sorted in any order, there are no guarantee what order it will be sorted in.

    So, as you can see, specifying sort by column ordinal position can be very confusing, and therefore it is deprecated. It isn't even needed, since you can sort by aliases given to columns in the resultset anyway.

    --

    Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

    http://www.sql.nu

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply