Row number over guid

  • Hi!

    I need to create some kind of integer row identifier over a table with GUID primary key, so I can select some subsets sequentially from that table.

    I'm thinking to use ROW_NUMBER over the primary key column which type is GUID.

    Will I receive always the same order?

    Will the order that I receive match with the order that the rows were added to the table?

    Something like this:

    select

    rowId,

    column2,

    column3,

    column4

    from (

    select

    row_number() over (order by column1) as row_id, -- column1 is PK GUID unique

    column2,

    column3,

    column4

    from

    myTable ) myTableWithRowId

    where

    row_id > (@page-1)*@pagesize AND ROW_ID <= @page*@pagesize

    First iteration would be with:

    @page = 1

    @pagesize = 10

    ยป receive subset from 0 to 10

    Second iteration:

    @page = 2

    @pagesize = 10

    ยป receive subset from 11 to 20

    Third iteration:

    @page = 3

    @pagesize = 10

    ยป receive subset from 21 to 30

    and so on..

    ...

    Appreciate if someone could help.

    Thanks

  • Yes the rows will be in a consistent order.

    ๐Ÿ˜Ž

    Quick thought, since you are on 2012 it's probably easier to use the OFFSET FETCH rather than the WHERE clause. Further, when using a GUID as a primary key, you might want to add a numeric clustered index, helps tackling the fragmentation implied with non-sequential value nature of the GUID unless it's been sequentially generated which kind of beats the purpose.

  • I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (11/13/2015)


    I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

    The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.

    ๐Ÿ˜Ž

  • Eirikur Eiriksson (11/13/2015)


    Sean Lange (11/13/2015)


    I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

    The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.

    ๐Ÿ˜Ž

    We are saying the same thing. ๐Ÿ˜› I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (11/13/2015)


    Eirikur Eiriksson (11/13/2015)


    Sean Lange (11/13/2015)


    I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

    The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.

    ๐Ÿ˜Ž

    We are saying the same thing. ๐Ÿ˜› I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.

    Good point and never too often stated!

    ๐Ÿ˜Ž

    On a side note, adding a CHECKSUM on the GUID within the ORDER could help the performance (smaller worktable rows), especially when doing more complex aggregation/ordering/ranking on larger sets but the order might become less predictable:-P

  • Eirikur Eiriksson (11/13/2015)


    Sean Lange (11/13/2015)


    Eirikur Eiriksson (11/13/2015)


    Sean Lange (11/13/2015)


    I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

    The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.

    ๐Ÿ˜Ž

    We are saying the same thing. ๐Ÿ˜› I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.

    Good point and never too often stated!

    ๐Ÿ˜Ž

    On a side note, adding a CHECKSUM on the GUID within the ORDER could help the performance (smaller worktable rows), especially when doing more complex aggregation/ordering/ranking on larger sets but the order might become less predictable:-P

    One note of a different side, skipping the GUID altogether and going with an integer primary key would be even better. Then again, I hate GUIDs. ๐Ÿ˜›

  • Ed Wagner (11/13/2015)


    Eirikur Eiriksson (11/13/2015)


    Sean Lange (11/13/2015)


    Eirikur Eiriksson (11/13/2015)


    Sean Lange (11/13/2015)


    I will disagree slightly with Eirikur here. The values in ROW_NUMBER will always be the same but the order of rows in the select statement are not guaranteed to be consistent because your main query does not have an order by. If you want to ensure the order of the final output there is one and only one way to ensure that, add an order by clause.

    The order of the ROW_NUMBER relative to the GUID is set in the OVER clause, the order of appearance isn't guaranteed though.

    ๐Ÿ˜Ž

    We are saying the same thing. ๐Ÿ˜› I just wanted to clarify in case the OP was also expecting the order of appearance to be the same which it wouldn't without the order by.

    Good point and never too often stated!

    ๐Ÿ˜Ž

    On a side note, adding a CHECKSUM on the GUID within the ORDER could help the performance (smaller worktable rows), especially when doing more complex aggregation/ordering/ranking on larger sets but the order might become less predictable:-P

    One note of a different side, skipping the GUID altogether and going with an integer primary key would be even better. Then again, I hate GUIDs. ๐Ÿ˜›

    I agree. GUIDs have their place and sometimes are totally unavoidable. They can be incredibly painful to your system if not handled correctly. But what I find is they are unbelievable developer hostile. Querying with them for analysis/testing/debugging purposes is nothing short of horribly painful.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 โ€“ Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Just a quick test to play around with, notice the effect of adding an index

    ๐Ÿ˜Ž

    USE TEEST;

    GO

    SET NOCOUNT ON;

    DECLARE @SAMPLE_SIZE BIGINT = 10000000;

    DECLARE @INT_BUCKET INT = 0;

    DECLARE @BIGINT_BUCKET INT = 0;

    DECLARE @timer TABLE (T_TEXT VARCHAR(50) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

    --/* UNCOMMENT THIS LINE TO SKIP GENERATING THE TEST SET.

    IF OBJECT_ID(N'dbo.TEST_GUID_ORDER') IS NOT NULL DROP TABLE dbo.TEST_GUID_ORDER

    CREATE TABLE dbo.TEST_GUID_ORDER

    (

    TGO_ID INT NOT NULL CONSTRAINT PK_DBO_TEST_GUID_ORDER_TGO_ID PRIMARY KEY CLUSTERED

    ,TGO_GUID UNIQUEIDENTIFIER NOT NULL CONSTRAINT UNQ_DBO_TEST_GUID_ORDER_TGO_GUID DEFAULT (NEWID())

    );

    ;WITH T(N) AS (SELECT 0 FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    INSERT INTO dbo.TEST_GUID_ORDER(TGO_ID)

    SELECT

    NM.N

    FROM NUMS NM;

    -- */

    INSERT INTO @timer(T_TEXT) VALUES('DRY RUN NO INDEX');

    SELECT

    @INT_BUCKET = T.TGO_ID

    FROM dbo.TEST_GUID_ORDER T

    ORDER BY T.TGO_GUID;

    INSERT INTO @timer(T_TEXT) VALUES('DRY RUN NO INDEX');

    INSERT INTO @timer(T_TEXT) VALUES('GUID NO INDEX');

    SELECT

    @BIGINT_BUCKET = ROW_NUMBER() OVER

    (

    ORDER BY T.TGO_GUID

    )

    FROM dbo.TEST_GUID_ORDER T;

    INSERT INTO @timer(T_TEXT) VALUES('GUID NO INDEX');

    INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM NO INDEX');

    SELECT

    @BIGINT_BUCKET = ROW_NUMBER() OVER

    (

    ORDER BY CHECKSUM(T.TGO_GUID)

    )

    FROM dbo.TEST_GUID_ORDER T;

    INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM NO INDEX');

    -----------------------------------

    -- ADD INDEX

    -----------------------------------

    CREATE UNIQUE NONCLUSTERED INDEX UNQNCLIDX_DBO_TEST_GUID_ORDER_TGO_GUID ON dbo.TEST_GUID_ORDER(TGO_GUID ASC);

    -----------------------------------

    INSERT INTO @timer(T_TEXT) VALUES('DRY RUN INDEX');

    SELECT

    @INT_BUCKET = T.TGO_ID

    FROM dbo.TEST_GUID_ORDER T

    ORDER BY T.TGO_GUID;

    INSERT INTO @timer(T_TEXT) VALUES('DRY RUN INDEX');

    INSERT INTO @timer(T_TEXT) VALUES('GUID INDEX');

    SELECT

    @BIGINT_BUCKET = ROW_NUMBER() OVER

    (

    ORDER BY T.TGO_GUID

    )

    FROM dbo.TEST_GUID_ORDER T;

    INSERT INTO @timer(T_TEXT) VALUES('GUID INDEX');

    INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM INDEX');

    SELECT

    @BIGINT_BUCKET = ROW_NUMBER() OVER

    (

    ORDER BY CHECKSUM(T.TGO_GUID)

    )

    FROM dbo.TEST_GUID_ORDER T;

    INSERT INTO @timer(T_TEXT) VALUES('CHECKSUM INDEX');

    SELECT

    T.T_TEXT

    ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION

    FROM @timer T

    GROUP BY T.T_TEXT

    ORDER BY DURATION;

    Sample results

    T_TEXT DURATION

    ------------------ ----------

    DRY RUN INDEX 1290002

    GUID INDEX 2270003

    CHECKSUM INDEX 7334148

    CHECKSUM NO INDEX 7417160

    DRY RUN NO INDEX 9888332

    GUID NO INDEX 10582332

    But obviously there is more to this than meets the eye;-)

  • Thanks for all your inputs.

    I agree with the numeric clustered index, it would be my choice if it was me who created this table.

    But this table is in another database and I have to deal with it as it is at the moment.

    Thanks for the OFFSET FETCH tip and I will add an order by in the final output.

  • I agree. GUIDs have their place and sometimes are totally unavoidable. They can be incredibly painful to your system if not handled correctly. But what I find is they are unbelievable developer hostile. Querying with them for analysis/testing/debugging purposes is nothing short of horribly painful.

    One saving grace I found for the GUID was in an inherited, undocumented database I had to convert. In some cases, it was not immediately apparent which table to search in order to find "detail" info. Finding the GUID clearly flagged the proper table to use. A colleague had the same problem and unfortunately, instead of using GUIDs as primary and foreign keys, the data base he had inherited used integers - which were all starting from 1 and thus not readily distinguishable.

  • Yes, that could be an advantage of having GUID columns as primary and foreign keys on a database.

    Never thought of that.

    One other advantage of GUIDs could be when you need to join two equal tables, for example, two tables of users which were filled in different places and keep the same user identifier. With integers this would be impossible.

  • sql.queries (11/18/2015)


    One other advantage of GUIDs could be when you need to join two equal tables, for example, two tables of users which were filled in different places and keep the same user identifier. With integers this would be impossible.

    It's actually a disadvantage rather than an advantage.

    A same user can be recorded in different places, and different GUID would lead to a wrong perception that those are records for different users.

    _____________
    Code for TallyGenerator

  • If two offices of an organization used the same application each maintaining its own database, then at a later time the two databases could be merged without any risk of interference between the primary and foreign keys if they are GUID's. Merging the contents of the two databases would become more complicated because integer primary and foreign keys could have the same values.

  • j-1064772 (11/18/2015)


    If two offices of an organization used the same application each maintaining its own database, then at a later time the two databases could be merged without any risk of interference between the primary and foreign keys if they are GUID's. Merging the contents of the two databases would become more complicated because integer primary and foreign keys could have the same values.

    Technically - yes.

    But logically - use of GUID turns the outcome of such a merge into pretty much rubbish.

    Say, if both databases have a record of a location named "New York City".

    When you merge those 2 databases together - should you merger those 2 records from different databases into 1? Or you're gonna leave 2 New York cities in the merged database?

    Probably not.

    Then you need some other logic to identify identical entries in both databases and merging them, GUID identifiers are useless for that.

    After merging 2 location entries - what is gonna happen to the addresses which point to "New York City" locations by FK links after 1 of them has to be removed?

    FK columns need to be all updated with a new GUID for "New York City" location, and existing GUID won't help here at all. You need to get a new identity value for the location by referring it by name or any other natural key for the entity.

    It's pretty much the same way as you'd need to do if the tables would use integer identities instead of GUID:

    you still need to get new ID's for FK columns using natural key as unique identifiers for entities.

    GUID references can only help when you're absolutely sure you do not have repeating entries in both databases.

    But at the stage when databases are about to get merged it's rarely the case.

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply