GUIDs GUIDs everywhere, but how is my data unique?

  • David Rueter

    SSCrazy

    Points: 2632

    I agree that GUIDs for clustered indexes are a bad idea, and GUIDs for primary keys are not generally ideal. But there is at least one additional consideration that hasn't been mentioned here: data security.

    If you have a situation where there is an untrusted public interface to access some data (say a list of rows in an HTML table), and that interface needs to support looking up a particular row to retrieve additional fields when clicked, or if the row is to be updated, etc., you need to have a way for the public interface to communicate a unique identifier back to the the backend database.

    While an identity integer may be optimal for performance, it is IMO not suitable for this scenario...because it is by definition sequential. That means that other valid identifiers are easily guessed by a malicious user.

    For example, if HTML included something like <a href="getRecord?id=6">Click for details</a> it would be trivial for a malicious user to call "getRecord?id=5" to retrieve data he is not supposed to see.

    If on the other hand a GUID is used, this sort of risk is largely mitigated, for <href="getrecord?id=FF7CCA0D-6921-47AF-B3F5-D004E4B89560">Click for details</a> does not provide many clues as how to guess a valid id to access information the malicious user is not supposed to see.

    For this reason, I generally use identity integers for primary keys with clustered indexes, but will add a unique indexed GUID column as an "alias" to refer to the row from an untrusted source if needed.

    Using a GUID as an "alias" has additional benefits as well: for example, you can re-generate these aliases at will without impacting your internal database referential integrity.

    (Note that the pattern I outline above is not ideal, is intentionally simplistic to demonstrate my point, and a GUID alone is not enough to secure data in the face of poor architecture--I am not recommending the pattern, just pointing out the vulnerability of a sequential ID in an untrusted environment.)

  • David.Poole

    SSC Guru

    Points: 75312

    Caveat. All answers have an implicit "It Depends"

    Q: Why add another unique key when a naturally occurring one already exists?

    A: If you have a package system that assigns keys and you wish to migrate from that system then it is useful to have keys that remain in your replacement package system. Product 'x' may have an auto-increment in your original system but a guid in your replacement system (or any other weird and wacky combination). Having a specific ever-lasting key for Product 'x' allows you to reconcile between systems during the interim state and also maintain consistency in your data warehouse system. Remember your data warehouse system will probably outlive your OLTP system by at least a couple of generations. There is nothing that says the immutable key has to be human readable.

    Q: What's wrong with an auto-increment.

    A: As a simple stand-a-lone system you will get away with it. Syncing multiple systems, each with its own auto-increments can be a real bitch. Been there, done that, learnt new Anglo Saxon. also, failed inserts also increment the auto-increment leaving gaps.

    Q: Ever tried to debug using GUIDS.

    A: I'm a big fan of the Red-Gate tool kit. I don't work for Red-Gate, I have no financial incentive to promote Red-Gate. SQL Data Compare works for me. I've also spent a lot of time thinking about automated testing and test frameworks to add to my DBA bag of tools. Not having those tools is vastly more expensive than having them. Tools are not a panacea but help greatly.

    Q: Never had a problem with GUIDs as primary keys due to their uniqueness?

    A: Really??? When assigned in the DB they can cause nightmares when the calling app has a buggy INSERT mechanism. Try inserting the same record twice and without a naturally occurring unique key then the record insert will generate another GUID to go with the duplicate record. Obviously this afflicts auto-increment records as well but at least they are easy to spot.

    When I wrote my article about a proposed Key Generator instead of a GUID one of the comments that was posted against the article was that it was somewhat similar to what Twitter do. Guids/uuids are easy for programmers to use but sometimes a key generator is more appropriate.

  • stephen.hopkins

    Grasshopper

    Points: 11

    Interesting article - got to admit I disagree with most of the arguments. GUIDs fulfil the role of a surrogate key, a concept that has been around for a long time. This give you a lot of flexibility and consistency. Some of the pieces where I disagree are:

    1. "If a GUID were to be used as the primary key, the potential to add multiple rows with Server = "PCSVSERVER060" and Service = "TraverseArchiveService" exists,"

    Well, no - just add another index to ensure uniqueness.

    2. Your ID examples - the only difference/advantage is performance - as integer IDs in your example are just surrogate keys of another sort. BUT you don't provide any detail on how significant or otherwise this performance difference is....if you are saying that the issue is cost, i.e. disk space - I think you should work an example as I think the cost difference is insignificant in most situations.

    3. SAP uses GUIDs as a key - I agree that adding a GUID to a set of data that already contains a GUID doesn't make sense - but would be surprised if people were doing that....

    Anyway - thank you for the article and your views - always good to see different peoples viewpoints.

  • fregatepallada

    SSC-Addicted

    Points: 407

    I agree that adding a GUID to a set of data that already contains a primary key does not make ANY sense (unless it used for replication) 😎

  • x

    SSC-Insane

    Points: 23548

    David Rueter (12/18/2014)


    For example, if HTML included something like <a href="getRecord?id=6">Click for details</a> it would be trivial for a malicious user to call "getRecord?id=5" to retrieve data he is not supposed to see.

    If the user were not supposed to see it, couldn't the web page calculate the permissions as such and just not return the record?

  • David Rueter

    SSCrazy

    Points: 2632

    patrickmcginnis59 10839 (12/19/2014)


    David Rueter (12/18/2014)


    For example, if HTML included something like <a href="getRecord?id=6">Click for details</a> it would be trivial for a malicious user to call "getRecord?id=5" to retrieve data he is not supposed to see.

    If the user were not supposed to see it, couldn't the web page calculate the permissions as such and just not return the record?

    In some use cases, yes--there may be an authenticated user session, and access rights could be explicitly enforced in code. In other use cases, there may not be an authenticated session at all (such as the way private link sharing works in many document management, file, and photo services: if you have the URL, you have access to the referenced resource.)

    But in either case, IMO it is better not to send sequential ID's down to an untrusted user interface: the overhead of an alternate ID is negligible, and it provides another simple layer of protection and abstraction.

  • x

    SSC-Insane

    Points: 23548

    David Rueter (12/19/2014)


    patrickmcginnis59 10839 (12/19/2014)


    David Rueter (12/18/2014)


    For example, if HTML included something like <a href="getRecord?id=6">Click for details</a> it would be trivial for a malicious user to call "getRecord?id=5" to retrieve data he is not supposed to see.

    If the user were not supposed to see it, couldn't the web page calculate the permissions as such and just not return the record?

    In some use cases, yes--there may be an authenticated user session, and access rights could be explicitly enforced in code. In other use cases, there may not be an authenticated session at all (such as the way private link sharing works in many document management, file, and photo services: if you have the URL, you have access to the referenced resource.)

    But in either case, IMO it is better not to send sequential ID's down to an untrusted user interface: the overhead of an alternate ID is negligible, and it provides another simple layer of protection and abstraction.

    URI's are a resource identifier, not a security measure.

    edit: here's a pretty good discussion about URL's and obscurity: http://stackoverflow.com/questions/4833314/are-secret-urls-truly-secure

  • fregatepllada

    SSCommitted

    Points: 1648

    From my personal experience with developers, some of them have no clue how to return auto-generated identifier after INSERT operation. 🙁

Viewing 8 posts - 31 through 38 (of 38 total)

You must be logged in to reply to this topic. Login to reply