Row number over guid

Question

Post reply

Row number over guid

amns

SSCrazy

Points: 2628
More actions
November 13, 2015 at 9:37 am

#313127

Hi!
I need to create some kind of integer row identifier over a table with GUID primary key, so I can select some subsets sequentially from that table.
I'm thinking to use ROW_NUMBER over the primary key column which type is GUID.
Will I receive always the same order?
Will the order that I receive match with the order that the rows were added to the table?
Something like this:
select
rowId,
column2,
column3,
column4
from (
select
row_number() over (order by column1) as row_id, -- column1 is PK GUID unique
column2,
column3,
column4
from
myTable ) myTableWithRowId
where
row_id > (@page-1)*@pagesize AND ROW_ID <= @page*@pagesize
First iteration would be with:
@page = 1
@pagesize = 10
» receive subset from 0 to 10
Second iteration:
@page = 2
@pagesize = 10
» receive subset from 11 to 20
Third iteration:
@page = 3
@pagesize = 10
» receive subset from 21 to 30
and so on..
...
Appreciate if someone could help.
Thanks

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 1

Yes the rows will be in a consistent order.

😎

Quick thought, since you are on 2012 it's probably easier to use the OFFSET FETCH rather than the WHERE clause. Further, when using a GUID as a primary key, you might want to add a numeric clustered index, helps tackling the fragmentation implied with non-sequential value nature of the GUID unless it's been sequentially generated which kind of beats the purpose.

Sean Lange SSC Guru Points: 286573 More actions · Answer 2

I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 3

Sean Lange (11/13/2015)
I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.

😎

Sean Lange SSC Guru Points: 286573 More actions · Answer 4

Eirikur Eiriksson (11/13/2015)
Sean Lange (11/13/2015)
I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.
The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.
😎

We are saying the same thing. 😛 I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 5

Sean Lange (11/13/2015)
Eirikur Eiriksson (11/13/2015)
Sean Lange (11/13/2015)
I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.
The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.
😎
We are saying the same thing. 😛 I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.

Good point and never too often stated!

😎

On a side note, adding a CHECKSUM on the GUID within the ORDER could help the performance (smaller worktable rows), especially when doing more complex aggregation/ordering/ranking on larger sets but the order might become less predictable:-P

Ed Wagner SSC Guru Points: 287024 More actions · Answer 6

Eirikur Eiriksson (11/13/2015)
Sean Lange (11/13/2015)
Eirikur Eiriksson (11/13/2015)
Sean Lange (11/13/2015)
I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.
The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.
😎
We are saying the same thing. 😛 I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.
Good point and never too often stated!
😎
On a side note, adding a CHECKSUM on the GUID within the ORDER could help the performance (smaller worktable rows), especially when doing more complex aggregation/ordering/ranking on larger sets but the order might become less predictable:-P

One note of a different side, skipping the GUID altogether and going with an integer primary key would be even better. Then again, I hate GUIDs. 😛

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Sean Lange SSC Guru Points: 286573 More actions · Answer 7

Ed Wagner (11/13/2015)
Eirikur Eiriksson (11/13/2015)
Sean Lange (11/13/2015)
Eirikur Eiriksson (11/13/2015)
Sean Lange (11/13/2015)
I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.
The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.
😎
We are saying the same thing. 😛 I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.
Good point and never too often stated!
😎
On a side note, adding a CHECKSUM on the GUID within the ORDER could help the performance (smaller worktable rows), especially when doing more complex aggregation/ordering/ranking on larger sets but the order might become less predictable:-P
One note of a different side, skipping the GUID altogether and going with an integer primary key would be even better. Then again, I hate GUIDs. 😛

I agree. GUIDs have their place and sometimes are totally unavoidable. They can be incredibly painful to your system if not handled correctly. But what I find is they are unbelievable developer hostile. Querying with them for analysis/testing/debugging purposes is nothing short of horribly painful.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 8

Just a quick test to play around with, notice the effect of adding an index

😎

USE TEEST;

GO

SET NOCOUNT ON;

DECLARE @SAMPLE_SIZE BIGINT = 10000000;

DECLARE @INT_BUCKET INT = 0;

DECLARE @BIGINT_BUCKET INT = 0;

DECLARE @timer TABLE (T_TEXT VARCHAR(50) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

--/* UNCOMMENT THIS LINE TO SKIP GENERATING THE TEST SET.

IF OBJECT_ID(N'dbo.TEST_GUID_ORDER') IS NOT NULL DROP TABLE dbo.TEST_GUID_ORDER

CREATE TABLE dbo.TEST_GUID_ORDER

(

TGO_ID INT NOT NULL CONSTRAINT PK_DBO_TEST_GUID_ORDER_TGO_ID PRIMARY KEY CLUSTERED

,TGO_GUID UNIQUEIDENTIFIER NOT NULL CONSTRAINT UNQ_DBO_TEST_GUID_ORDER_TGO_GUID DEFAULT (NEWID())

);

;WITH T(N) AS (SELECT 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))

, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

INSERT INTO dbo.TEST_GUID_ORDER(TGO_ID)

SELECT

NM.N

FROM NUMS NM;

-- */

INSERT INTO @timer(T_TEXT) VALUES('DRY RUN NO INDEX');

SELECT

@INT_BUCKET = T.TGO_ID

FROM dbo.TEST_GUID_ORDER T

ORDER BY T.TGO_GUID;

INSERT INTO @timer(T_TEXT) VALUES('DRY RUN NO INDEX');

INSERT INTO @timer(T_TEXT) VALUES('GUID NO INDEX');

SELECT

@BIGINT_BUCKET = ROW_NUMBER() OVER

(

ORDER BY T.TGO_GUID

)

FROM dbo.TEST_GUID_ORDER T;

INSERT INTO @timer(T_TEXT) VALUES('GUID NO INDEX');

INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM NO INDEX');

SELECT

@BIGINT_BUCKET = ROW_NUMBER() OVER

(

ORDER BY CHECKSUM(T.TGO_GUID)

)

FROM dbo.TEST_GUID_ORDER T;

INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM NO INDEX');

-----------------------------------

-- ADD INDEX

-----------------------------------

CREATE UNIQUE NONCLUSTERED INDEX UNQNCLIDX_DBO_TEST_GUID_ORDER_TGO_GUID ON dbo.TEST_GUID_ORDER(TGO_GUID ASC);

-----------------------------------

INSERT INTO @timer(T_TEXT) VALUES('DRY RUN INDEX');

SELECT

@INT_BUCKET = T.TGO_ID

FROM dbo.TEST_GUID_ORDER T

ORDER BY T.TGO_GUID;

INSERT INTO @timer(T_TEXT) VALUES('DRY RUN INDEX');

INSERT INTO @timer(T_TEXT) VALUES('GUID INDEX');

SELECT

@BIGINT_BUCKET = ROW_NUMBER() OVER

(

ORDER BY T.TGO_GUID

)

FROM dbo.TEST_GUID_ORDER T;

INSERT INTO @timer(T_TEXT) VALUES('GUID INDEX');

INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM INDEX');

SELECT

@BIGINT_BUCKET = ROW_NUMBER() OVER

(

ORDER BY CHECKSUM(T.TGO_GUID)

)

FROM dbo.TEST_GUID_ORDER T;

INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM INDEX');

SELECT

T.T_TEXT

,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION

FROM @timer T

GROUP BY T.T_TEXT

ORDER BY DURATION;

Sample results

T_TEXT DURATION

------------------ ----------

DRY RUN INDEX 1290002

GUID INDEX 2270003

CHECKSUM INDEX 7334148

CHECKSUM NO INDEX 7417160

DRY RUN NO INDEX 9888332

GUID NO INDEX 10582332

But obviously there is more to this than meets the eye;-)

amns SSCrazy Points: 2628 More actions · Answer 9

Thanks for all your inputs.

I agree with the numeric clustered index, it would be my choice if it was me who created this table.

But this table is in another database and I have to deal with it as it is at the moment.

Thanks for the OFFSET FETCH tip and I will add an order by in the final output.

j-1064772 SSCertifiable Points: 5316 More actions · Answer 10

I agree. GUIDs have their place and sometimes are totally unavoidable. They can be incredibly painful to your system if not handled correctly. But what I find is they are unbelievable developer hostile. Querying with them for analysis/testing/debugging purposes is nothing short of horribly painful.

One saving grace I found for the GUID was in an inherited, undocumented database I had to convert. In some cases, it was not immediately apparent which table to search in order to find "detail" info. Finding the GUID clearly flagged the proper table to use. A colleague had the same problem and unfortunately, instead of using GUIDs as primary and foreign keys, the data base he had inherited used integers - which were all starting from 1 and thus not readily distinguishable.

amns SSCrazy Points: 2628 More actions · Answer 11

Yes, that could be an advantage of having GUID columns as primary and foreign keys on a database.

Never thought of that.

One other advantage of GUIDs could be when you need to join two equal tables, for example, two tables of users which were filled in different places and keep the same user identifier. With integers this would be impossible.

Sergiy SSC Guru Points: 110206 More actions · Answer 12

sql.queries (11/18/2015)
One other advantage of GUIDs could be when you need to join two equal tables, for example, two tables of users which were filled in different places and keep the same user identifier. With integers this would be impossible.

It's actually a disadvantage rather than an advantage.

A same user can be recorded in different places, and different GUID would lead to a wrong perception that those are records for different users.

_____________
Code for TallyGenerator

j-1064772 SSCertifiable Points: 5316 More actions · Answer 13

If two offices of an organization used the same application each maintaining its own database, then at a later time the two databases could be merged without any risk of interference between the primary and foreign keys if they are GUID's. Merging the contents of the two databases would become more complicated because integer primary and foreign keys could have the same values.

Sergiy SSC Guru Points: 110206 More actions · Answer 14

j-1064772 (11/18/2015)
If two offices of an organization used the same application each maintaining its own database, then at a later time the two databases could be merged without any risk of interference between the primary and foreign keys if they are GUID's. Merging the contents of the two databases would become more complicated because integer primary and foreign keys could have the same values.

Technically - yes.

But logically - use of GUID turns the outcome of such a merge into pretty much rubbish.

Say, if both databases have a record of a location named "New York City".

When you merge those 2 databases together - should you merger those 2 records from different databases into 1? Or you're gonna leave 2 New York cities in the merged database?

Probably not.

Then you need some other logic to identify identical entries in both databases and merging them, GUID identifiers are useless for that.

After merging 2 location entries - what is gonna happen to the addresses which point to "New York City" locations by FK links after 1 of them has to be removed?

FK columns need to be all updated with a new GUID for "New York City" location, and existing GUID won't help here at all. You need to get a new identity value for the location by referring it by name or any other natural key for the entity.

It's pretty much the same way as you'd need to do if the tables would use integer identities instead of GUID:

you still need to get new ID's for FK columns using natural key as unique identifiers for entities.

GUID references can only help when you're absolutely sure you do not have repeating entries in both databases.

But at the stage when databases are about to get merged it's rarely the case.

_____________
Code for TallyGenerator