Row number over guid

  • Leaving the merky waters of international standards aside:

    Did you ever compare the performance of an INNER JOIN on integer (or binary) columns vs. on character columns (considering they are populated with the same values).

    Taking the same example with GUID: CHAR(36) more than doubles size of the index on it, it will double amount of memory required for the join.

    But main thing - it will require to run COLLATION rules on EVERY value which is being validated by a join or a WHERE clause.

    Well, if you need to justify an upgrade to a new, more powerful server, then yes, use PK's on character strings everywhere. It will certainly help your case.

    _____________
    Code for TallyGenerator

  • Sergiy (11/21/2015)


    Rendering is just what it is - rendering for your eyes.

    Those series of letters you see on the screen as GUID's are not stored in database, like series of letters representing currenciers.

    Unless, of course, if some genius decides to store GUID as char(36), not uniqueidentifier.

    Same thing as with dates: same date may be rendered to different character strings, and same string may be representing different dates, depending on application setting, localisation, etc.

    BTW, same may be said about numbers: 31OCT = 25DEC.

    So, your comparison here is not valid.

    Yes, I understand all that. My only point was that a char(3) primary key is no more alien to non-western eyes than a GUID one.

    Those cases may be "edge" for US residents.

    They are mainstream for everyone outside of US.

    You guys use to watch "World Series". The name implies that it has something to do with the world.

    But believe me, most of the people outside of US would have trouble to tell which sport play the participants of the World Series.

    So I think we kind of agree on this: some tables have an appropriate natural primary key, others don't, but not everyone will agree on which tables those are. So if you are going to merge two databases (which I think is where this discussion started), then even if they're both very well designed, there's still going to be a bit of remedial work to do.

    By the way, World Series is baseball, but that's about all I know. That's right, I'm not one of "you guys" - I'm one of the "people outside of US".

    Leaving the merky waters of international standards aside:

    Did you ever compare the performance of an INNER JOIN on integer (or binary) columns vs. on character columns (considering they are populated with the same values).

    Taking the same example with GUID: CHAR(36) more than doubles size of the index on it, it will double amount of memory required for the join.

    But main thing - it will require to run COLLATION rules on EVERY value which is being validated by a join or a WHERE clause.

    Well, if you need to justify an upgrade to a new, more powerful server, then yes, use PK's on character strings everywhere. It will certainly help your case.

    No, I didn't do such a performance comparison, but I'd be interested to see figures from anyone who did. I'm not quite sure what your point is here - are you suggesting that I advocated using a char(36) primary key? No, I suggested that two- or three- letter accepted abbreviations make perfectly good candidates. Do you know of any performance figures comparing that with a GUID?

    John

  • John Mitchell-245523 (11/22/2015)


    Yes, I understand all that. My only point was that a char(3) primary key is no more alien to non-western eyes than a GUID one.

    That is right, but the thing is - GUID was never meant to be "for a human eye".

    It is a machine-generated set of bytes serving internal machine purposes.

    And we should not use our (human oriented) references for serving those machine purposes.

    They are not good for that.

    So I think we kind of agree on this: some tables have an appropriate natural primary key, others don't, but not everyone will agree on which tables those are. So if you are going to merge two databases (which I think is where this discussion started), then even if they're both very well designed, there's still going to be a bit of remedial work to do.

    I think we are.

    Machine auto-generated identifiers which serve as unique ID's for entities in a database cannot be used to distinguish unique entities in 2 or 2 databases previously not logically connected to each other.

    We still need to work it out using business rules for the entities as natural keys and assign new internal unique ID's to the entities in merged databases.

    And use of GUID does not provide any advantage here comparing with ordinary integer ID.

    I'm not quite sure what your point is here - are you suggesting that I advocated using a char(36) primary key? No, I suggested that two- or three- letter accepted abbreviations make perfectly good candidates. Do you know of any performance figures comparing that with a GUID?

    John

    I'm sure a tinyint identity would be quite sufficient for all the states in the USA.

    2 or 3 times less storage used, and way faster processing of JOINs.

    _____________
    Code for TallyGenerator

  • Sergiy (11/22/2015)

    That is right, but the thing is - GUID was never meant to be "for a human eye".

    It is a machine-generated set of bytes serving internal machine purposes.

    And we should not use our (human oriented) references for serving those machine purposes.

    They are not good for that.

    Yes, I agree. But with a natural key, you have that as an attribute anyway, so why not use it as the key? If you don't already have it as an attribute then yes, it probably wouldn't be unreasonable to use a surrogate key instead.

    I'm sure a tinyint identity would be quite sufficient for all the states in the USA.

    2 or 3 times less storage used, and way faster processing of JOINs.

    Tinyint = 2 bytes; char(2) = 2 bytes, and I'd be surprised if you observed any noticeable difference in performance. Using a natural key also has the potential advantage of reducing the number of joins you need to do.

    John

  • John Mitchell-245523 (11/22/2015)


    Tinyint = 2 bytes; char(2) = 2 bytes, and I'd be surprised if you observed any noticeable difference in performance. Using a natural key also has the potential advantage of reducing the number of joins you need to do.

    John

    Tinyint is 1 byte.

    And it does not need to process collation rules for every value.

    So, difference in performance will be noticeable.

    I used to believe in saving on joins, when I was young and naïve, but practice proves that extra joins on well normalised database usually perform better than natural keys stored in tables as a FK.

    _____________
    Code for TallyGenerator

  • Oops, you're quite right! I'd still be interested to know what the performance difference is - I can't see it being significant. I shall post back if I ever get round to testing it.

    John

  • John Mitchell-245523 (11/23/2015)


    Oops, you're quite right! I'd still be interested to know what the performance difference is - I can't see it being significant. I shall post back if I ever get round to testing it.

    Not really a comprehensive test, but http://sqlinthewild.co.za/index.php/2011/02/15/are-int-joins-faster-than-string-joins-2/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail. That's just as comprehensive as anything I was planning! I repeated your test but I restricted the lookup tables to 250 rows, since that's probably around the size you'd expect for a table with a natural key - countries or currencies, maybe. I was surprised to see the difference in favour of integer joins: CPU and elapsed times of 3042 and 4035 for int compared to 5632 and 6723 for string. When I repeated with indexes, I got something else I didn't expect - string beat int on CPU but lost on elapsed: 62 and 1079 for int; 9 and 1342 for string.

    John

  • Elapsed time will mostly be display of data if you used SSMS (it's terrible about displaying data). Try inserting into a temp table to remove that factor.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 9 posts - 31 through 38 (of 38 total)

You must be logged in to reply to this topic. Login to reply