Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

The Voice of the DBA

Steve Jones is the editor of SQLServerCentral.com and visits a wide variety of data related topics in his daily editorial. Steve has spent years working as a DBA and general purpose Windows administrator, primarily working with SQL Server since it was ported from Sybase in 1990. You can follow Steve on Twitter at twitter.com/way0utwest

A row has no row number

It seems that every month I have someone asking the question about ordering or row numbers for a query. Let’s get one thing clear from the start: there are no "row numbers" in a table.

You can assume that the first row you inserted is row number one, but it’s not. In fact, depending on the indexing or lack of indexing, you may or may not get that row returned first by a query. You can add an ORDER BY when you query the table, and in that case you can get the rows returned in a certain order every time, however the row number is not linked to a row.

As an example. If I have this People table:

ID Name-- -------1  Steve2  Gail

and I query:

select ID, name from people order by name

I get

ID Name-- -------2  Gail1  Steve

I could add a row number

 SELECT row_number() OVER (ORDER BY [name])       , [Name]   FROM dbo.People

and get this:

   Name-- -------1  Gail2  Steve

But "Gail" isn’t linked to "1" as a row number. If I do this:

 INSERT people SELECT 3, 'Bob'

SELECT row_number() OVER (ORDER BY [name])       , [Name]   FROM dbo.People

I now get this:

   Name-- -------1  Bob2  Gail2  Steve


Now "Bob" is 1. You can get row numbers, but they are only linked to an ORDER BY and a specific result set. If the data changes, the row numbers may move.

While it might appear in some queries that you are getting consistent ordering of results, don’t confuse coincidence with causality. You might live on those assumptions for years, building code on them, and then make a few changes and lots of things break.

If you need ordering, use ORDER BY.


Filed under: Blog Tagged: syndicated, T-SQL

Comments

Posted by tony rogerson on 9 May 2011

Probably worth mentioning the same can be said for column ordering, if people use * they are asking for trouble because the ordering of the columns is from the meta data in the tables and if somebody changes the order of the columns in the table though DDL then interesting side affects happen!

So don't use * in statements except where in EXISTS, also, always name columns on the INSERT eg. INSERT tbl ( col1, col 2 ) SELECT col1, col2 FROM .... rather than INSERT tbl1 SELECT col1, col2

Tony.

Tony.

Posted by Alex-668179 on 9 May 2011

About column order: it seems that the situation is not the same. Apparently SQL Server column order is static ("creation order"), even though in the relational model it is indeed undefined (according to Itzik Ben-Gan in the May edition of SQL Server Magazine; obviously it's a bit hard to verify this experimentally).

Now of course you can change the column order through DDL, but that's not the same as having an undefined column order like we have for rows. You would be missing the point completely if you thought that.

Alex

Posted by John.Norcott on 9 May 2011

Silly question, but if you had an identity column, wouldn't that act as a row number?  Maybe that's the row number most people are asking about? I know you can re-organize identity values if needed, but they do stay in the same order, don't they?  Also, some other database engines, the one I'm thinking about particularly is the IBM Db2 database, has a concept of a Relative Record Number which you can indeed order queries by.  The relative record number, from my understanding, is nothing more than the order the rows were inserted into the table.

Posted by tony rogerson on 9 May 2011

Hi Alex - that's what I said :)....

The meta data in sys.columns defines the column ordering, that can be changed through DDL; when writing I was more thinking about VIEW definitions which are more likely to change overtime as I've seen myself when I consult in shops.

T

Posted by Kenneth Wymore on 9 May 2011

Nice article Steve.

A question regarding Tony's "Select * " comment:

Is it better to use "SELECT * " in an EXISTS statement or "SELECT 1 "? I have seen it both ways and was once told that "SELECT 1 " was more efficient although that person did not explain why. I could see "Select 1 " being more efficient if the "Select * " in the exists statement actually returns the full column set internally even though it isn't actually needed.

Kenny

Posted by don 24688 on 9 May 2011

John - I used to think the same thing.  The first "large" database project I built has an identity column on the main table.  The query I was using to display data on the main page didn't specify an ordering.  For the first 2.5 years, the rows were showing in the same order that they had been inserted.  One day, we crossed some threshold and started getting data in some other order.

Posted by jcrawf02 on 9 May 2011

Ken, SELECT * vs SELECT 1 in an Exists makes no difference, you can actually do SELECT NULL and it works just fine. The EXISTS() statement will prompt the optimizer to ignore what's in the SELECT completely (that's my paraphrasing, in no way technical). See Gail Shaw's blog for a series on EXISTS vs IN that explains this. (sqlinthewild.co.za/.../exists-vs-in)

Posted by tfifield on 9 May 2011

Nice little article.  I once had a client that used the order the row was inserted as a guarantee that it would be returned in that order (table was a heap).  It was for an indented Bill of Materials.  I kept telling that that it would break sooner or later and that I should fix it.  They didn't was to spend the money.

Needless to say it broke and cost them even more to fix it.

Posted by YSLGuru on 9 May 2011

Speaking of Ordering does anyone know if there exists any kind of SSMS add-in or similiar that lkets you do basic ordering with the results of a query without actually having to order the resulst or use a temp table?

I frequently am dumping results to Excel so as to do some sorting just so I don't waste system resources to sort by.  It sure would be nice if there were some kind of Mini-Excel or Mini-Spreadsheet for SSMS that ran within the query tools results.

Posted by Mark on 9 May 2011

SELECT row_number() OVER (ORDER BY [name])       , [Name]   FROM dbo.People

I now get this:

  Name-- -------1  Bob2  Gail2  Steve

The row numbers here are not coming from the data in the database, they are coming the row_number() function. I would expect this to be: 1  Bob2  Gail3  Steve, because Bob is the first name alphabetically, then Gail, then Steve. and when I tried it, I got a 3 there.

Posted by tony rogerson on 10 May 2011

In terms of the EXISTS ( SELECT * ... v EXISTS ( SELECT 1

That does matter, not from a performance point of view but from a permissions; we had a discussion about that in the private MVP groups - you may not have access to all the columns in the table because of DENY so * actually fails because of the way the optimiser parses and checks the permissions :)

T

Posted by John.Norcott on 10 May 2011

Don - thanks for the response.  I'll have to do some research to find out the threshold.  More for curiosity's sake at this point, though.

Posted by Steve Jones on 10 May 2011

@tony, thanks for the note on column ordering. Definitely something to keep in mind.

@alex, the column order is defined in the creation statement. I don't know that it is guaranteed, but that can be changed in any SELECT statement.

Leave a Comment

Please register or log in to leave a comment.