Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Primary key or not to primary key, that is the question Expand / Collapse
Author
Message
Posted Monday, March 04, 2013 10:58 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:33 AM
Points: 104, Visits: 385
Yeah, bad taste on the subject but nonetheless, you're reading my post :). I'd like to pose a scenario and see what other developers are doing out there (or would recommend at least).

We have most of our tables defined with a PRIMARY KEY constraint over a single column that is defined as UNIQUEIDENTIFIER with a default of NEWID(). This is never clustered unless the table is small. We use this in our child tables as the foreign key reference like so (excuse the pseudo table definitions):

Table:Parent
ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
Field VARCHAR(30) NOT NULL,
Code INT NOT NULL
Status VARCHAR(3) NOT NULL

Table:Child
SurrogateID INT IDENTITY(1, 1) NOT NULL, --has a unique clustered index defined
ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
Field1 VARCHAR(30) NOT NULL,
ParentID UNIQUEIDENTIFIER NOT NULL (FK defined to Parent.ID)

Obviously the primary key constraint maintains an index behind the scenes. When we query between the two tables it will look soemthing like this:

SELECT P.ID, P.Field, P.Code, P.Status, C.ID, C.Field1
FROM dbo.Parent AS P
INNER JOIN dbo.Child AS C
ON P.ID = C.ParentID

Looks simple right? This query should (and does for me) do an index seek using the primary key constraint. But it also generates an expensive key lookup (clustered) over the clustered index. The only way to resolve that is to add included columns in the primary key to resolve it.

Yes... primary keys indeed *DO NOT* allow included columns to be defined. At this point I have one of two options:
- Drop the PRIMARY KEY and simply make a unique index with the necessary included columns
- Create a unique index over Parent.ID with the neccessary included columns. However this results in doubling the index storage required for the Parent.ID column.

So the question is this, do you:
- drop the primary key and create the unique index with included columns
- duplicate the primary key and create the unique index with included columns

I'm leaning towards dropping the PK and creating the unique index myself. Thoughts?
Post #1426374
Posted Monday, March 04, 2013 12:10 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 2:24 PM
Points: 11,990, Visits: 11,007
Just a rambling here...you seem to be somewhat concerned with storage space. You also have obviously realized the challenges of using a unique identifier as your PK. You have ended up with 2 columns of unique values. Why not just drop the unique IDs all together and go with your identity column? It is faster, easier to maintain, easier to debug queries, indexes don't get fragmented etc, etc etc...

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1426398
Posted Monday, March 04, 2013 12:43 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: 2 days ago @ 3:16 PM
Points: 5,986, Visits: 6,931
tafountain (3/4/2013)
We have most of our tables defined with a PRIMARY KEY constraint over a single column that is defined as UNIQUEIDENTIFIER with a default of NEWID().

Ow. Replicating? If so you'll have to maintain the PK on the tables. If not, why are you using a GUID?

What is the clustered on these tables that your primary join mechanism is being relegated to an NC index? Is this a rare connection as far as the tables are concerned?

This is working in the dark, really, as we don't know the rest of the expectations of your system. There're indicators here that there's a lot more in the overhead going on than just a simple schema exercise.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1426418
Posted Monday, March 18, 2013 12:31 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:33 AM
Points: 104, Visits: 385
Sorry for the delayed response. I've been working on other items lately. Anyways, here I am :).

@Sean - yes I am concerned with storage space but you hit it on the head, this is a typical surrogate key issue. All-in-all I guess the decision is to live with the duplicate scenario as long as the storage requirements aren't simply ridiculous (for the surrogate key situation anyway). I would love to go the identity route, however we maintain data across environments and need the GUID values to support this. This might change in the future but not today.

@Evil - see the last part of my response to Sean - and at one point we did have replication setup to a reporting server but have since simplified that with a simply copy-only backup.
Post #1432306
Posted Tuesday, March 19, 2013 7:57 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 8:43 PM
Points: 4,128, Visits: 5,836
1) if you are worried with space, GUIDs have no place in your system.

2) why aren't you using NEWSEQUENTIALID??

3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

4) You are fragmenting the heck out of your nc indexes on those GUIDs.

5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.



Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1432662
Posted Tuesday, March 19, 2013 8:12 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, April 03, 2014 10:10 AM
Points: 2,792, Visits: 4,874

2) why aren't you using NEWSEQUENTIALID??


I don't know why OP doesn't, but me...
I don't use it because it's useless! It does not guarantee sequential order after reboot as it's just a simple wrapper around UuidCreateSequential WinAPI which never had order guarantee...
BTW, I try not to use GUID's at all, however sometimes it's unavoidable. The company I'm consulting right now had bought into 3-rd party product which has CLUSTERED GUID's PKS in all tables.
So, I enjoy looking into index fragmentation stats every day here ...


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1432672
Posted Tuesday, March 19, 2013 8:27 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 8:43 PM
Points: 4,128, Visits: 5,836
Eugene Elutin (3/19/2013)

2) why aren't you using NEWSEQUENTIALID??


I don't know why OP doesn't, but me...
I don't use it because it's useless! It does not guarantee sequential order after reboot as it's just a simple wrapper around UuidCreateSequential WinAPI which never had order guarantee...
BTW, I try not to use GUID's at all, however sometimes it's unavoidable. The company I'm consulting right now had bought into 3-rd party product which has CLUSTERED GUID's PKS in all tables.
So, I enjoy looking into index fragmentation stats every day here ...


You are absolutely wrong in your statement. It is certainly NOT useless. It avoids fragmentation for as long as the server remains up. You currently (with NEWID) get values spread throughout the range continually. That isn't just about page splits, fragmentation, less full pages, etc either. It keeps hot (i.e. recent) data in the buffer pool much more frequently as well.

I note that if you DO have GUIDs you MUST manage them, or suffer a variety of negative consequences. Some of those consequences you can't do anything about however (such as size of data) other than buying bigger hardware.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1432683
Posted Tuesday, March 19, 2013 8:53 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, April 03, 2014 10:10 AM
Points: 2,792, Visits: 4,874
TheSQLGuru (3/19/2013)
Eugene Elutin (3/19/2013)

2) why aren't you using NEWSEQUENTIALID??


I don't know why OP doesn't, but me...
I don't use it because it's useless! It does not guarantee sequential order after reboot as it's just a simple wrapper around UuidCreateSequential WinAPI which never had order guarantee...
BTW, I try not to use GUID's at all, however sometimes it's unavoidable. The company I'm consulting right now had bought into 3-rd party product which has CLUSTERED GUID's PKS in all tables.
So, I enjoy looking into index fragmentation stats every day here ...


You are absolutely wrong in your statement. It is certainly NOT useless. It avoids fragmentation for as long as the server remains up. You currently (with NEWID) get values spread throughout the range continually. That isn't just about page splits, fragmentation, less full pages, etc either. It keeps hot (i.e. recent) data in the buffer pool much more frequently as well.

I note that if you DO have GUIDs you MUST manage them, or suffer a variety of negative consequences. Some of those consequences you can't do anything about however (such as size of data) other than buying bigger hardware.


Ok, the "not guaranteed sequence" is not a single NEWSEQUENTIALID issue.
Actually it's doesn't guarantees even uniqueness in all cases (if your computer has no NIC installed, then GUID generetated by that function will only be unique within this computer). But, when you have NIC installed, it makes it unsecured, as it's based on the MAC address of NIC and can be predicted.
All about beauties of it is here: http://msdn.microsoft.com/en-gb/library/windows/desktop/aa379322(v=vs.85).aspx
Another issue you will face with this if you have .NET application which also will want to generate "sequential" GUID's, you will need to come up with a special logic to match SQL Server "byte scrambling" (http://www.jorriss.net/blog/archive/2008/04/24/unraveling-the-mysteries-of-newsequentialid).
So, I do hold my humble opinion that the best way to manage GUID's in SQL Server database is to try avoiding them completely

And the last bit about
It avoids fragmentation for as long as the server remains up.


That is really helpful! So, as long you never reboot your server - you are fine. But if you do it at least once in its life, than you staffed, as sequential GUID's could be lower in its values than generated before reboot, so you will get your fragmentation issue back.
So, make sure you never reboot your server in order to enjoy full set of NEWSEQUENTIALID powerful features
Sorry, it's not my cup of tea...


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1432704
Posted Wednesday, March 20, 2013 8:07 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: 2 days ago @ 7:33 AM
Points: 104, Visits: 385
TheSQLGuru,

I did not intend to spark a debate :) but here are my responses.

TheSQLGuru (3/19/2013)
1) if you are worried with space, GUIDs have no place in your system.

This statement is a little unsubstantiated. GUIDs are fine, sure they're larger than INT but they fulfull requirements that an INT cannot. For example we need to maintain uniqueness across servers. A GUID does this for us. Yes, they have drawbacks but I would not totally exclude them as an option because of them.

TheSQLGuru (3/19/2013)
2) why aren't you using NEWSEQUENTIALID??

These do not provide uniqueness across tables, let alone across servers.

TheSQLGuru (3/19/2013)
3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

Is this a question or a statement? Not sure what you're trying to communicate.

TheSQLGuru (3/19/2013)
4) You are fragmenting the heck out of your nc indexes on those GUIDs.

Yes - this is very true. However most of the indexes are defragmented in under 30 seconds (most under 10 seconds) during our nightly maintenance.

TheSQLGuru (3/19/2013)

5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.

Ok, now you have my interest :). I wouldn't mind understanding your implementation.
Post #1433238
Posted Wednesday, March 20, 2013 8:24 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Wednesday, April 16, 2014 8:43 PM
Points: 4,128, Visits: 5,836
tafountain (3/20/2013)
TheSQLGuru,

I did not intend to spark a debate :) but here are my responses.

TheSQLGuru (3/19/2013)
1) if you are worried with space, GUIDs have no place in your system.

This statement is a little unsubstantiated. GUIDs are fine, sure they're larger than INT but they fulfull requirements that an INT cannot. For example we need to maintain uniqueness across servers. A GUID does this for us. Yes, they have drawbacks but I would not totally exclude them as an option because of them.

TheSQLGuru (3/19/2013)
2) why aren't you using NEWSEQUENTIALID??

These do not provide uniqueness across tables, let alone across servers.

TheSQLGuru (3/19/2013)
3) Your PK on the parent does a clustered index on the NEWID, since you didn't specify one.

Is this a question or a statement? Not sure what you're trying to communicate.

TheSQLGuru (3/19/2013)
4) You are fragmenting the heck out of your nc indexes on those GUIDs.

Yes - this is very true. However most of the indexes are defragmented in under 30 seconds (most under 10 seconds) during our nightly maintenance.

TheSQLGuru (3/19/2013)

5) Did I mention that GUIDs SUCK yet?!? Oh, and for multiple systems, you can almost ALWAYS come up with a simple multi-part numeric arrangement that is guaranteed to be unique across all systems. I have used a tiny/smallint ServerID and integer identity construct at numerous clients to eliminate GUIDS they "HAD to have", always with GREAT effect.

Ok, now you have my interest :). I wouldn't mind understanding your implementation.



1) As I already said, you can (and I have several times) maintained uniqueness across multiple servers with an int identity and a separate ServerID field (tiny or small int) that is used as the compound PK for the table. Given that ServerA ALWAYS uses 1 for ServerID, ServerB ALWAYS uses 2 for ServerID, etc, you are GUARANTEED to NEVER get duplicate PK values with that 5 or 6 byte PK (instead of the SIXTEEN bytes of a GUID).

2) I believe you are wrong in stating that NEWSEQUENTIALID isn't UNIQUE. The only time it would not be is if there is no NIC on the server. Got many of those lying around?? I haven't seen one in the 20+ years I have been with databases, 15 of them as a SQL Server consultant. Here is the reference, which comes directly from SQL Server Books Online: http://msdn.microsoft.com/en-us/library/aa379322(VS.85).aspx

3) My "PK is clustered index" is a statement. If you simply define a PK on a table it is clustered by default unless you override that with explicit syntax to make it non-clustered.

4) It may not matter (much) on your system, but the extra bytes required for a GUID over other datatype(s) or compound keys REALLY make a difference on systems with any reasonable number of rows.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1433249
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse