Are There That Many GUIDs?

  • For that reason, a sequential integer will generally prove more efficient than a GUID.

    Why did you qualify this? Wouldn't this always be true? That doesn't mean that there's not a place for GUIDs under certain circumstances, but those circumstances would have nothing to do with efficiency, right?

  • RonKyle (3/24/2015)


    For that reason, a sequential integer will generally prove more efficient than a GUID.

    Why did you qualify this? Wouldn't this always be true? That doesn't mean that there's not a place for GUIDs under certain circumstances, but those circumstances would have nothing to do with efficiency, right?

    I say generally, because I'm still open to the possibility that a GUID key can be more efficient than an integer key in some niche scenario. For example, databases modeled for ORM tools like to store GUID keys, because those keys map directly to the objects they're working with in the application. So, it's efficient for the application's data access layer, if not for the database storage engine.

    Also, perhaps some database storage engines are optimized to use GUID based key. I havn't worked with SQL Server's new Heckaton in-memory table types yet, so maybe it makes more sense there. Federated database systems tend to use GUID based keys, because of it's global uniqueness, and I predict that federation (ie: Federated ColumnStore) is in SQL Server's near future.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • tdennis-674196 (3/24/2015)


    In your reply, the implication (emphasized by me above) is that a GUID has a larger capacity than a bigint. The actual number of unique values of a GUID is 2^122; for a bigint data type it is 2^126 (using the full range from the most negative value to the most positive value).

    Actually, a bigint is a signed 8 byte or 64-bit value so the range is only -2^63 to 2^63 - 1 and, not 2^126, whereas a GUID is a 16 byte or 128 bit value.

    The nice thing about a GUID is that internally it can be stored as 4 32-bit words or 2 64-bit words, so actual comparisons of GUID values can be done extremely fast by a processor. A CPU can compare an array of 4 32-bit words of 2 64-bit words far faster than you could ever compare two separate int or bigint columns in your database table. If we ever get 128 bit processors, that will become a single CPU operation to compare values.

    The only time a GUID becomes slow is when it has to be converted from human readable format into the 16 byte internal format during parsing of your SQL query.

    TDENIS: do you have any documentation or a link that shows that SQL Server stores GUIDs internally as 4 32-bit words?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If the database fits in memory and the load is adequate, where locks and page splitting is not an issue, then GUIDs is not necessarily a practical problem, just a theological one.

    If the GUIDs make life for the developer easier and less bug prone then good, as long as the performance is good enough with some leeway for the future why should the dba worry that much. An extra minute added to the backups and the cost of another 30GB of ram can be cheap, especially when compared to time needed for fixing bugs and the loss of face with the clients/business.

    Not everyone has massive databases, not everyone has excessive loads, not every one is Google. There is a lot of snobbery about only using techniques that will scale to millions of users doing gazillion of hits every second. I wouldn't rate the GUID argument in the top set of issues for design, development or administration.

    Design for the real world is more important that delusions of grandeur.

  • Yet Another DBA (3/26/2015)


    ...

    If the GUIDs make life for the developer easier and less bug prone then good, as long as the performance is good enough with some leeway for the future why should the dba worry that much. An extra minute added to the backups and the cost of another 30GB of ram can be cheap, especially when compared to time needed for fixing bugs and the loss of face with the clients/business.

    Not everyone has massive databases, not everyone has excessive loads, not every one is Google. There is a lot of snobbery about only using techniques that will scale to millions of users doing gazillion of hits every second. I wouldn't rate the GUID argument in the top set of issues for design, development or administration.

    Design for the real world is more important that delusions of grandeur.

    In my view, it is much simpler to look at and query a few thousand integers in a small system with a few users and a few thousand records than to try and work with a few thousand GUID's.

    To some, it may seem more grandiose to be wrestling with thousands of GUID's when simple integers can more than suffice.

  • Dave62 (3/26/2015)


    Yet Another DBA (3/26/2015)


    ...

    If the GUIDs make life for the developer easier and less bug prone then good, as long as the performance is good enough with some leeway for the future why should the dba worry that much. An extra minute added to the backups and the cost of another 30GB of ram can be cheap, especially when compared to time needed for fixing bugs and the loss of face with the clients/business.

    Not everyone has massive databases, not everyone has excessive loads, not every one is Google. There is a lot of snobbery about only using techniques that will scale to millions of users doing gazillion of hits every second. I wouldn't rate the GUID argument in the top set of issues for design, development or administration.

    Design for the real world is more important that delusions of grandeur.

    In my view, it is much simpler to look at and query a few thousand integers in a small system with a few users and a few thousand records than to try and work with a few thousand GUID's.

    To some, it may seem more grandiose to be wrestling with thousands of GUID's when simple integers can more than suffice.

    I tend to write the query based on the schema and the actual entry points so to me I wouldn't often see the guids. Trying to shortcut the query cos someone has said that they know the IDs has caused so much hassle, wrong accounts being credited, security accounts being enabled etc.

    The point I was trying to make is on the majority of systems it more down to personal choice and it can be false economy to argue about the dollars/pounds/euros being saved.

    If you prefer the integer vs guids its entirely up to you. I would prefer to argue/discuss with the developers about more pressing issues than a guid vs int.

  • If the database fits in memory and the load is adequate, where locks and page splitting is not an issue, then GUIDs is not necessarily a practical problem, just a theological one.

    This is an unfortunate attitude. I've had to work with databases where this has become an issue and it's too late to fix it. We simply have to live with the GUIDs, and it's a difficult life. This is not "theological." The DBAs job is to look down the road. When a DBA allows GUIDS, varchar(max)s and so one when these are unnecessary, he/she is creating problems that will have to be either solved or worked around in the future.

  • The debate about Integer versus GUID really hinges around whether we're talking about the clustering key. For obvious reasons, whether your table is clustered on a sequential versus random key matters. It also matters because the clustering key is used for the unique RID bookmark in all non-clustered indexes, even those indexes that don't include the key column. However, if we're just talking about non-clustered keys, then it's less of an issue.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Yet Another DBA (3/26/2015)


    If the database fits in memory and the load is adequate, where locks and page splitting is not an issue, then GUIDs is not necessarily a practical problem, just a theological one.

    If the GUIDs make life for the developer easier and less bug prone then good, as long as the performance is good enough with some leeway for the future why should the dba worry that much. An extra minute added to the backups and the cost of another 30GB of ram can be cheap, especially when compared to time needed for fixing bugs and the loss of face with the clients/business.

    Not everyone has massive databases, not everyone has excessive loads, not every one is Google. There is a lot of snobbery about only using techniques that will scale to millions of users doing gazillion of hits every second. I wouldn't rate the GUID argument in the top set of issues for design, development or administration.

    Design for the real world is more important that delusions of grandeur.

    As a DBA, I agree with you, to a point.

    GUIDs are not inherently evil. We know very well what issues they cause. They lead to more fragmentation. That's an issue if you have lots of scans, but not an issue if you have lots of point look ups. They make index keys wider, taking up more space on disk in both the key where they're used and the nonclustered indexes if the GUID is a cluster. And all these problems can be worked around to a degree if they actually cause impact.

    But...

    I fall down when I hear the "we're not building Google" argument. It's an excuse to simply ignore known issues rather than address whether or not those issues are applicable in a given situation. I've worked on systems (small systems) that successfully implemented GUIDs with no issues at all. I've also worked on systems (small systems) that the GUIDs caused massive impact. And the systems where they hurt were not Google-level systems.

    GUIDs are not evil. But they are also not benign. I don't think it's even remotely unreasonable for a DBA, when presented with the proposal for a design using GUIDs, to look at it with a degree of suspicion. Evaluate the impact, test the design, ensure that, in this case, it's OK. And if it's not, propose alternatives. In short, excercise the knowledge and judgement that, at least in theory, you're paying that person for.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • So basically be wary of GUIDs and evaluate their proposed use considering what we know at the time. Seams reasonable advice.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Grant Fritchey (3/26/2015)

    I fall down when I hear the "we're not building Google" argument. It's an excuse to simply ignore known issues rather than address whether or not those issues are applicable in a given situation.

    Really, what I was saying was an excuse to ignore the issues?

    Don't think so. I wrote "...not every one is Google.", so I'm guilty of writing a truism but not necessarily guilty of ignoring the issues.

    Seen too many projects that have had the attitude; we need this to scale to X so we need to use Y technology Z coding paradigm and servers $$ size etc. But in reality the size they want is magnitudes greater than the global potential client base with 100% uptake, the faulty premise has led to project overruns and excessive costs, little ROI and the CTO having the thought that outsourcing would be a preferred route in the future.

    And, yes, I have also worked on projects where it has gone the other way.

    To me its more about seeing the big picture. I would prefer to win the war, with the least blood, against the performance gremlins and not getting into having to win every skirmish especially when it might not even be relevant in that instance.

    btw. natural keys will make the index wider than an integer. I would prefer to see a database properly normalised and applications properly thought through, but then again the real world does play silly buggers with aspirations.

    Of course we could all just put "It depends" as *the* answer to every question and no one would the wiser

  • btw. natural keys will make the index wider than an integer. I would prefer to see a database properly normalised and applications properly thought through, but then again the real world does play silly buggers with aspirations.

    Natural keys may make the index wider. Some won't. But even if they do, you also need to avoid duplicates. A DBA has a lot of things to balance, but the first duty is to guarentee the integrity of the data.

  • RonKyle (3/27/2015)


    btw. natural keys will make the index wider than an integer. I would prefer to see a database properly normalised and applications properly thought through, but then again the real world does play silly buggers with aspirations.

    Natural keys may make the index wider. Some won't. But even if they do, you also need to avoid duplicates. A DBA has a lot of things to balance, but the first duty is to guarentee the integrity of the data.

    One thing to keep in mind that the column(s) which make up the RID are also included by and covered by the non-clustered index. Consider that natural keys (unlike surrogate keys) are typically included in the output result of queries, so I wouldn't be too concerned about using something like SKU and OrderDate as part of a clustered key, even the key itself is wide, because it's something that probably needs to be covered anyhow. However, GUID or Identity keys are rarely useful as output from an application query.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I wouldn't be too concerned about using something like SKU and OrderDate as part of a clustered key, even the key itself is wide, because it's something that probably needs to be covered anyhow

    Agree completely. I've recently changed some tables that were clustered on the GUID key (not my design) and changed the cluster to the entry date. The result has been an across the board improvements.

  • When clustering on a composite natural key, perhaps something like OrderDate + StoreID + SKU, then place the most sequential column in the left most position. Not only does this keep the rows physically clustered in the table without page splits but queries also logically tend to retreive rows clustered this way.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 151 through 165 (of 169 total)

You must be logged in to reply to this topic. Login to reply