Primary key or not to primary key, that is the question

  • Yeah, bad taste on the subject but nonetheless, you're reading my post :). I'd like to pose a scenario and see what other developers are doing out there (or would recommend at least).

    We have most of our tables defined with a PRIMARY KEY constraint over a single column that is defined as UNIQUEIDENTIFIER with a default of NEWID(). This is never clustered unless the table is small. We use this in our child tables as the foreign key reference like so (excuse the pseudo table definitions):

    Table:Parent

    ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,

    Field VARCHAR(30) NOT NULL,

    Code INT NOT NULL

    Status VARCHAR(3) NOT NULL

    Table:Child

    SurrogateID INT IDENTITY(1, 1) NOT NULL, --has a unique clustered index defined

    ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,

    Field1 VARCHAR(30) NOT NULL,

    ParentID UNIQUEIDENTIFIER NOT NULL (FK defined to Parent.ID)

    Obviously the primary key constraint maintains an index behind the scenes. When we query between the two tables it will look soemthing like this:

    SELECT P.ID, P.Field, P.Code, P.Status, C.ID, C.Field1

    FROM dbo.Parent AS P

    INNER JOIN dbo.Child AS C

    ON P.ID = C.ParentID

    Looks simple right? This query should (and does for me) do an index seek using the primary key constraint. But it also generates an expensive key lookup (clustered) over the clustered index. The only way to resolve that is to add included columns in the primary key to resolve it.

    Yes... primary keys indeed *DO NOT* allow included columns to be defined. At this point I have one of two options:

    - Drop the PRIMARY KEY and simply make a unique index with the necessary included columns

    - Create a unique index over Parent.ID with the neccessary included columns. However this results in doubling the index storage required for the Parent.ID column.

    So the question is this, do you:

    - drop the primary key and create the unique index with included columns

    - duplicate the primary key and create the unique index with included columns

    I'm leaning towards dropping the PK and creating the unique index myself. Thoughts?

  • Just a rambling here...you seem to be somewhat concerned with storage space. You also have obviously realized the challenges of using a unique identifier as your PK. You have ended up with 2 columns of unique values. Why not just drop the unique IDs all together and go with your identity column? It is faster, easier to maintain, easier to debug queries, indexes don't get fragmented etc, etc etc...

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • tafountain (3/4/2013)


    We have most of our tables defined with a PRIMARY KEY constraint over a single column that is defined as UNIQUEIDENTIFIER with a default of NEWID().

    Ow. Replicating? If so you'll have to maintain the PK on the tables. If not, why are you using a GUID?

    What is the clustered on these tables that your primary join mechanism is being relegated to an NC index? Is this a rare connection as far as the tables are concerned?

    This is working in the dark, really, as we don't know the rest of the expectations of your system. There're indicators here that there's a lot more in the overhead going on than just a simple schema exercise.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Sorry for the delayed response. I've been working on other items lately. Anyways, here I am :).

    @sean - yes I am concerned with storage space but you hit it on the head, this is a typical surrogate key issue. All-in-all I guess the decision is to live with the duplicate scenario as long as the storage requirements aren't simply ridiculous (for the surrogate key situation anyway). I would love to go the identity route, however we maintain data across environments and need the GUID values to support this. This might change in the future but not today.

    @Evil - see the last part of my response to Sean - and at one point we did have replication setup to a reporting server but have since simplified that with a simply copy-only backup.

  • 1) if you are worried with space, GUIDs have no place in your system.

    2) why aren't you using NEWSEQUENTIALID??

    3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

    4) You are fragmenting the heck out of your nc indexes on those GUIDs.

    5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • 2) why aren't you using NEWSEQUENTIALID??

    I don't know why OP doesn't, but me...

    I don't use it because it's useless! It does not guarantee sequential order after reboot as it's just a simple wrapper around UuidCreateSequential WinAPI which never had order guarantee...

    BTW, I try not to use GUID's at all, however sometimes it's unavoidable. The company I'm consulting right now had bought into 3-rd party product which has CLUSTERED GUID's PKS in all tables.

    So, I enjoy looking into index fragmentation stats every day here :w00t:...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/19/2013)


    2) why aren't you using NEWSEQUENTIALID??

    I don't know why OP doesn't, but me...

    I don't use it because it's useless! It does not guarantee sequential order after reboot as it's just a simple wrapper around UuidCreateSequential WinAPI which never had order guarantee...

    BTW, I try not to use GUID's at all, however sometimes it's unavoidable. The company I'm consulting right now had bought into 3-rd party product which has CLUSTERED GUID's PKS in all tables.

    So, I enjoy looking into index fragmentation stats every day here :w00t:...

    You are absolutely wrong in your statement. It is certainly NOT useless. It avoids fragmentation for as long as the server remains up. You currently (with NEWID) get values spread throughout the range continually. That isn't just about page splits, fragmentation, less full pages, etc either. It keeps hot (i.e. recent) data in the buffer pool much more frequently as well.

    I note that if you DO have GUIDs you MUST manage them, or suffer a variety of negative consequences. Some of those consequences you can't do anything about however (such as size of data) other than buying bigger hardware. 🙁

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (3/19/2013)


    Eugene Elutin (3/19/2013)


    2) why aren't you using NEWSEQUENTIALID??

    I don't know why OP doesn't, but me...

    I don't use it because it's useless! It does not guarantee sequential order after reboot as it's just a simple wrapper around UuidCreateSequential WinAPI which never had order guarantee...

    BTW, I try not to use GUID's at all, however sometimes it's unavoidable. The company I'm consulting right now had bought into 3-rd party product which has CLUSTERED GUID's PKS in all tables.

    So, I enjoy looking into index fragmentation stats every day here :w00t:...

    You are absolutely wrong in your statement. It is certainly NOT useless. It avoids fragmentation for as long as the server remains up. You currently (with NEWID) get values spread throughout the range continually. That isn't just about page splits, fragmentation, less full pages, etc either. It keeps hot (i.e. recent) data in the buffer pool much more frequently as well.

    I note that if you DO have GUIDs you MUST manage them, or suffer a variety of negative consequences. Some of those consequences you can't do anything about however (such as size of data) other than buying bigger hardware. 🙁

    Ok, the "not guaranteed sequence" is not a single NEWSEQUENTIALID issue.

    Actually it's doesn't guarantees even uniqueness in all cases (if your computer has no NIC installed, then GUID generetated by that function will only be unique within this computer). But, when you have NIC installed, it makes it unsecured, as it's based on the MAC address of NIC and can be predicted.

    All about beauties of it is here: http://msdn.microsoft.com/en-gb/library/windows/desktop/aa379322(v=vs.85).aspx

    Another issue you will face with this if you have .NET application which also will want to generate "sequential" GUID's, you will need to come up with a special logic to match SQL Server "byte scrambling" (http://www.jorriss.net/blog/archive/2008/04/24/unraveling-the-mysteries-of-newsequentialid).

    So, I do hold my humble opinion that the best way to manage GUID's in SQL Server database is to try avoiding them completely :hehe:

    And the last bit about

    It avoids fragmentation for as long as the server remains up.

    That is really helpful! So, as long you never reboot your server - you are fine. But if you do it at least once in its life, than you staffed, as sequential GUID's could be lower in its values than generated before reboot, so you will get your fragmentation issue back.

    So, make sure you never reboot your server in order to enjoy full set of NEWSEQUENTIALID powerful features;-)

    Sorry, it's not my cup of tea...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • TheSQLGuru,

    I did not intend to spark a debate 🙂 but here are my responses.

    TheSQLGuru (3/19/2013)


    1) if you are worried with space, GUIDs have no place in your system.

    This statement is a little unsubstantiated. GUIDs are fine, sure they're larger than INT but they fulfull requirements that an INT cannot. For example we need to maintain uniqueness across servers. A GUID does this for us. Yes, they have drawbacks but I would not totally exclude them as an option because of them.

    TheSQLGuru (3/19/2013)


    2) why aren't you using NEWSEQUENTIALID??

    These do not provide uniqueness across tables, let alone across servers.

    TheSQLGuru (3/19/2013)


    3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

    Is this a question or a statement? Not sure what you're trying to communicate.

    TheSQLGuru (3/19/2013)


    4) You are fragmenting the heck out of your nc indexes on those GUIDs.

    Yes - this is very true. However most of the indexes are defragmented in under 30 seconds (most under 10 seconds) during our nightly maintenance.

    TheSQLGuru (3/19/2013)


    5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.

    Ok, now you have my interest :). I wouldn't mind understanding your implementation.

  • tafountain (3/20/2013)


    TheSQLGuru,

    I did not intend to spark a debate 🙂 but here are my responses.

    TheSQLGuru (3/19/2013)


    1) if you are worried with space, GUIDs have no place in your system.

    This statement is a little unsubstantiated. GUIDs are fine, sure they're larger than INT but they fulfull requirements that an INT cannot. For example we need to maintain uniqueness across servers. A GUID does this for us. Yes, they have drawbacks but I would not totally exclude them as an option because of them.

    TheSQLGuru (3/19/2013)


    2) why aren't you using NEWSEQUENTIALID??

    These do not provide uniqueness across tables, let alone across servers.

    TheSQLGuru (3/19/2013)


    3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

    Is this a question or a statement? Not sure what you're trying to communicate.

    TheSQLGuru (3/19/2013)


    4) You are fragmenting the heck out of your nc indexes on those GUIDs.

    Yes - this is very true. However most of the indexes are defragmented in under 30 seconds (most under 10 seconds) during our nightly maintenance.

    TheSQLGuru (3/19/2013)


    5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.

    Ok, now you have my interest :). I wouldn't mind understanding your implementation.

    1) As I already said, you can (and I have several times) maintained uniqueness across multiple servers with an int identity and a separate ServerID field (tiny or small int) that is used as the compound PK for the table. Given that ServerA ALWAYS uses 1 for ServerID, ServerB ALWAYS uses 2 for ServerID, etc, you are GUARANTEED to NEVER get duplicate PK values with that 5 or 6 byte PK (instead of the SIXTEEN bytes of a GUID).

    2) I believe you are wrong in stating that NEWSEQUENTIALID isn't UNIQUE. The only time it would not be is if there is no NIC on the server. Got many of those lying around?? I haven't seen one in the 20+ years I have been with databases, 15 of them as a SQL Server consultant. Here is the reference, which comes directly from SQL Server Books Online: http://msdn.microsoft.com/en-us/library/aa379322(VS.85).aspx

    3) My "PK is clustered index" is a statement. If you simply define a PK on a table it is clustered by default unless you override that with explicit syntax to make it non-clustered.

    4) It may not matter (much) on your system, but the extra bytes required for a GUID over other datatype(s) or compound keys REALLY make a difference on systems with any reasonable number of rows.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Eugene Elutin (3/19/2013)


    But, when you have NIC installed, it makes it unsecured, as it's based on the MAC address of NIC and can be predicted.

    Just to be clear for other that may be reading this, "normal" GUIDs no longer use the MAC address and are nothing more than random numbers. Sequential GUIDs do.

    The current "normal" GUIDs used by SQL Server are "Type 4" GUIDs, which are nothing more than a very large random number, and even Microsoft says that although the chance of duplicating one is still incredibly small, there's no longer a guarantee that they're globally unique. Best put a unique index on it if it's a key column.

    I agree with the "scramble" on the sort, though. There is a "block" sort order but they don't sort GUIDs from left to right. I've not played with sequential GUIDs but I imagine that because of the way they sort, they're gonna cause fragmentation no matter what.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (3/20/2013)


    tafountain (3/20/2013)


    TheSQLGuru,

    I did not intend to spark a debate 🙂 but here are my responses.

    TheSQLGuru (3/19/2013)


    1) if you are worried with space, GUIDs have no place in your system.

    This statement is a little unsubstantiated. GUIDs are fine, sure they're larger than INT but they fulfull requirements that an INT cannot. For example we need to maintain uniqueness across servers. A GUID does this for us. Yes, they have drawbacks but I would not totally exclude them as an option because of them.

    TheSQLGuru (3/19/2013)


    2) why aren't you using NEWSEQUENTIALID??

    These do not provide uniqueness across tables, let alone across servers.

    TheSQLGuru (3/19/2013)


    3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

    Is this a question or a statement? Not sure what you're trying to communicate.

    TheSQLGuru (3/19/2013)


    4) You are fragmenting the heck out of your nc indexes on those GUIDs.

    Yes - this is very true. However most of the indexes are defragmented in under 30 seconds (most under 10 seconds) during our nightly maintenance.

    TheSQLGuru (3/19/2013)


    5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.

    Ok, now you have my interest :). I wouldn't mind understanding your implementation.

    1) As I already said, you can (and I have several times) maintained uniqueness across multiple servers with an int identity and a separate ServerID field (tiny or small int) that is used as the compound PK for the table. Given that ServerA ALWAYS uses 1 for ServerID, ServerB ALWAYS uses 2 for ServerID, etc, you are GUARANTEED to NEVER get duplicate PK values with that 5 or 6 byte PK (instead of the SIXTEEN bytes of a GUID).

    2) I believe you are wrong in stating that NEWSEQUENTIALID isn't UNIQUE. The only time it would not be is if there is no NIC on the server. Got many of those lying around?? I haven't seen one in the 20+ years I have been with databases, 15 of them as a SQL Server consultant. Here is the reference, which comes directly from SQL Server Books Online: http://msdn.microsoft.com/en-us/library/aa379322(VS.85).aspx

    3) My "PK is clustered index" is a statement. If you simply define a PK on a table it is clustered by default unless you override that with explicit syntax to make it non-clustered.

    4) It may not matter (much) on your system, but the extra bytes required for a GUID over other datatype(s) or compound keys REALLY make a difference on systems with any reasonable number of rows.

    Regarding this:

    3) My "PK is clustered index" is a statement. If you simply define a PK on a table it is clustered by default unless you override that with explicit syntax to make it non-clustered.

    Or if you already have a clustered index defined on the table when you define a PK. The PK will be nonclustered if there is already a clustered index on a table.

  • As far as the NEWSEQUENTIALID goes, SSCrazy hit most of the points I didn't elaborate on. We have multiple requirements and some of those include the application code generates the identifier before saving the record to the database. All in all, I agree from an implementation perspective an INT column defined with IDENTITY sounds more plausible but it, just like GUIDs, have thier own pro's and con's.

  • tafountain (3/20/2013)


    As far as the NEWSEQUENTIALID goes, SSCrazy hit most of the points I didn't elaborate on. ...

    Who are you calling SSCrazy, SSC Journeyman? 😉

    GUID's are evil! B-r-r-r-r-r! :hehe:

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (3/20/2013)


    tafountain (3/20/2013)


    As far as the NEWSEQUENTIALID goes, SSCrazy hit most of the points I didn't elaborate on. ...

    Who are you calling SSCrazy, SSC Journeyman? 😉

    GUID's are evil! B-r-r-r-r-r! :hehe:

    Oops, I copied his title and not his name lol - Eugene.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply