Do you need an order by when selecting TOP N from a clustered index?

  • Lets say you have a table with 2 columns, the first, an identity primary key column, the second a foreign key linking to the primary key of a dimension.

    This table is populated by inserting the the primary key from the dimension with an order by clause such as a date field from the dimension so the identity family id should be stored in order of the date from the dimension it comes from.

    I have done a lot of testing, selecting top 20 from table where the identity > x (with an order by)and it always returns the next 20 columns in order.

    Is this safe for production though?  I can't rememberif it was always advised that without and order by, even on a clustered index, the results could be arbitrary..  I know withou the index I would need a an order by

  • Selecting data from a table without an ORDER By clause will never guarantee the order the data returns in. if you add a TOP into the mix, the data you get could be different every time you run the statement even if the underlying data hasn't changed. Having a clustered index may cause SQL SERVER to more often that not return the results in the same order, but that doesn't always mean it will. Plus, if someone (stupidly) were to drop or amend the INDEX, you could be in for a world of pain.

    If you want to always have the latest 20 records returned, then add an ORDER BY clause, it will always guarantee your results will be as expected.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I had a feeling that was the case thanks I just have never seen SQL not return the order correctly when it has a clustered index.

    No one would be able to change the index or tinker it and the table is dropped and rebuild every night so there would be no issue with the stats but I will have to use the order by to be sure.

    The issue was the tbale could potentially be 10s of millions of rows and speed could be an issue here.

  • No ORDER BY, no guarantee of order. End of story.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • rob.lewis 86087 - Wednesday, April 26, 2017 3:57 AM

    The issue was the tbale could potentially be 10s of millions of rows and speed could be an issue here.

    If you have your index's on the right columns, for example those referenced in the ORDER BY clause, then SQL will be able handle this efficiently. If you have 10M rows, and you're ordering by a column(s) with no indexing, then your performance is going to be poor; as it will need to read every single value and organise the data accordingly.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • rob.lewis 86087 - Wednesday, April 26, 2017 3:57 AM

    I had a feeling that was the case thanks I just have never seen SQL not return the order correctly when it has a clustered index.

    No one would be able to change the index or tinker it and the table is dropped and rebuild every night so there would be no issue with the stats but I will have to use the order by to be sure.

    The issue was the tbale could potentially be 10s of millions of rows and speed could be an issue here.

    ORDER BY is expensive as it requires a SORT.
    However, my understanding is that if your query goes parallel, there is no guarantee that the results will be returned in the expected order.

  • DesNorton - Wednesday, April 26, 2017 4:14 AM

    ORDER BY is expensive as it requires a SORT.

    Not necessarily. If there's an index on the ORDER BY column, SQL can use that to avoid the need for a sort. The ORDER BY is still necessary, but there's no sort operation.

    However, my understanding is that if your query goes parallel, there is no guarantee that the results will be returned in the expected order.

    Unless there's an ORDER BY on the outer query, there's no guarantee that the results will be returned in the expected order.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I believe there is ONE SCENARIO in which order IS guaranteed in TSQL without an explicit ORDER BY: GROUP BY with a ROLLUP or CUBE (perhaps GROUPING SETS, but I haven't verified this). And in fact, if you do order that output you will wind up with the NULLs at the top, which is almost certainly not where they should be in the output.

    Outside of that (as others have said) no ORDER BY no guaranteed order.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Even then - if your order by is a "partial order" you could still end up with some fuzziness when you combine this with a TOP N clause. Otherwise stated - if the order by columns yield duplicate ordering entries (multiple rows with the same ORDER BY values), you STILL run the risk of some unpredictable behavior though much smaller than without the ORDER BY.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • rob.lewis 86087 - Wednesday, April 26, 2017 3:41 AM

    Lets say you have a table with 2 columns, the first, an identity primary key column, the second a foreign key linking to the primary key of a dimension.

    This table is populated by inserting the the primary key from the dimension with an order by clause such as a date field from the dimension so the identity family id should be stored in order of the date from the dimension it comes from.

    I have done a lot of testing, selecting top 20 from table where the identity > x (with an order by)and it always returns the next 20 columns in order.

    Is this safe for production though?  I can't rememberif it was always advised that without and order by, even on a clustered index, the results could be arbitrary..  I know withou the index I would need a an order by

    Ignoring that using IDENTITY as a primary key is a totally nonrelational way of programming, you should always put an ORDER BY clause on a construct that is based on an ordering. What you are doing documents the intent of the query; if it's redundant, trust the optimizer to ignore it and not actually do a sort. However, if any of your indexes change, or some other changes in the physical storage of the data change, then the order by will be needed. It's always better to write code, even if you think you can default on it, then to not have it. The next release might be completely different. My favorite example is when the BIT type changed from a logical {0,1} , that we knew from all the other programming languages which we had ever used. Then in one release of SQL Server, it became a numeric data type {0,1,NULL}. In SQL, all datatypes must support null. The rules of the null, make it the "default default", so people that were depending on the old logical bit to default to "false" had some very unpleasant surprises in their code.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Feel free to ignore everything Joe wrote Rob.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply