Primary Key datatype Bigint vs uniqueidentifier:

Question

Primary Key datatype Bigint vs uniqueidentifier:

Saravanan_tvr

SSCarpal Tunnel

Points: 4558
More actions
November 4, 2012 at 10:39 pm

#264242

Hi Friends!
Can any one suggest me for each tables primary key should be Bigint as PrimaryKey or [uniqueidentifier] as Primarykey, ?
Most of the MS prodcuts are keeping [uniqueidentifier] as primary key
Please let me know, what motives is in?
Example:
In MS Products
Customer table has CustomerId as [uniqueidentifier] primarykey
But I doing this way
Customer table has CustomerId as Bigint Identity(1,1) primarykey
Which is the best way to perform with tables to do DML ?

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 1

Saravanan_tvr (11/4/2012)
Most of the MS prodcuts are keeping [uniqueidentifier] as primary key
Please let me know, what motives is in?

just for security prospects .. so that nobody can identify or guess what would be the id for next application page to make the page traversing impossible. 😉

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

George M Parker SSCertifiable Points: 5412 More actions · Answer 2

Which is the best way to perform with tables to do DML ?

As with so many things in SQL Server, the answer is "it depends". But as a general rule, UNIQUEIDENTIFIERS are typically a poor choice for a primary key that is also defined as a clustered index for a couple of reasons.

1. The key is included as the lookup value for all non-clustered indexes and a GUID is significantly wider than an Int or a BigInt. That will require more disk space to accomodate the increased data type size.

2. UNIQUEIDENTIFIER's as the clustered index are prone to fragmentation because they are not sequential and are generated in a random order.

You can read more on Kimberly Tripp's blog here: http://www.sqlskills.com/blogs/kimberly/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

If you're going to use a UNIQUEIDENTIFIER as a primary key / clustered index, you want to make sure you are using the NEWSEQUENTIALID() function.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

For best performance there is no "it depends" - go with bigint over uniqueidentifier. Actually, in this case you could almost certainly get away with an int, which is just 4 bytes instead of 8 and offers up 4.2 BILLION values.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

GSquared SSC Guru Points: 260824 More actions · Answer 4

Performance-wise, Int will be better than BigInt, and BigInt will be better than UniqueIdentifier. Less storage, less RAM, less I/O, etc., with the smaller datatypes.

The reasons to use GUIDs over IDs are based on business rules, technical needs, and security, not on performance.

As already mentioned, if you have an ID value (Int or BigInt) in a URL as a variable value (this is a very common situation), then someone can just change the value in the URL and get other data. Sometimes, data they shouldn't be allowed to have. Good luck doing that with a GUID in the URL. Can be done, but it's very, very labor-intensive and time consuming (even if automated). A good botnet could run through GUID permutations pretty rapidly, but it would show as a DDOS attack on your servers, and there are ways to deal with that.

If you're dealing with multi-computer replication, like Merge Replication, then GUIDs are needed to insure no colisions between values. Int/BigInt won't do that very well, if at all.

If you want to cut out a call to the DB server, you can generate GUIDs for new data in the application and pass them in as a parameter in the insert statement to a table, instead of inserting and then returning the value. Identity columns don't allow for that. On a busy system, this can actually make a difference in overall application performance.

Those are real reasons to use GUIDs. Not "performance" in the database.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 5

CELKO (11/7/2012)
Can any one suggest me for each tables primary key should be Bigint as Primary Key or [uniqueidentifier] as PRIMARY KEY ?
This is a silly question and you need to read a book -- ANY BOOK -- on RDBMS> A key is a subset of the attributes of an entity which uniquely identifies each entity in the set. It is not a particular data type!
When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today.
After that, there is a natural key in the data, such as (longitude, latitude or HTM), store, cash register, ticket_nbr, etc which can be verified and validated in the reality of the data. This is ~18% of the cases. if you have no natural key, then your schema is probably wrong.
Then, if that fails and we are looking at <2% of all situations, we invent a key with check digits, validation rules and an audit trail within our enterprise. This is a great screaming pain IF you not do it right. This is why industry standards exists -- people got tired the pain Would you like to do Retail without standard UPC barcodes on products? We did that for centuries before barcodes.
So, Newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated BECAUSE it is easier than real RDBMS design. They want to have a magical, universal "one-size-fits-all" answer that does not require you ACTUALLY understand the problem domain.
A key without validation and verification will only lead to dirty data. And after all the overhead of the “Magical Universal Elixir UID” they still have to assure that the relational key is unique.
On a scale from 1 to 10, what color is your favorite letter of the alphabet?

(emphasis added)

I have to admit, I find you writing this (the part I bolded) highly amusing. Irony scale goes off the top of the meter! Thank you for a good laugh.

But we've already had the discussion on why these are only applicable some of the time. Per you, over 66% of applicable data constitutes an "edge case", and pointing that data out somehow makes me a troll.

So, I won't go into the details about how wrong you are, again. I'll just let the OP know that I can prove you (Joe) are just plain wrong. Yes, prove it, with the certainty of the mathematical proofs in any good high school geometry textbook. With an "edge case" that constitutes the majority of the data. (In the words of Inigo Montoya, "You keep using that [phrase]. I do not think it means what you think it means.")

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Lord Of SQL SSC Veteran Points: 270 More actions · Answer 6

Natural Keys have a tendency to take inserts all over their spectrum. This fragments indexes very quickly, and unnecessarily.

Setting a PK as a surrogate key obviates the need to insert rows anywhere but at the end of the table. This also removes the tendency to have knowledge tied up in the value of the PK, which can be a disaster for performance.

BIGINT is better as a surrogate key than GUID for performance, but you may have replication needs that require a universal PK like a GUID. It is by no means "doing it wrong" to use those, given the parameters of your application needs.

Use INT where you can, and use auto-increment on your PK where you can. Save the business knowledge for the other fields, where it matters.

-----------------------------
I enjoy queries!

Lord Of SQL SSC Veteran Points: 270 More actions · Answer 7

CELKO (10/2/2014)
Natural Keys have a tendency to take inserts all over their spectrum. This fragments indexes very quickly, and unnecessarily.
No, not usually. Think about manufacturing processes. VIN, ISBN, et al all have a sequence inside them. Their prefix is constant and tells us about something important for processing. Look at your credit card numbers. An ISO/IEC 7812 card number is most commonly 16 digits in length:
a six-digit Issuer Identification Number (IIN) (previously called the "Bank Identification Number" (BIN)) the first digit of which is the Major Industry Identifier (MII),
a variable length (up to 12 digits) individual account identifier,
a single check digit calculated using the Luhn algorithm.
The ISAN is the only random identifier standard I can think of.

If I am storing the VIN numbers of cars I service, every one of them will be entered in an essentially random portion of the PK, and not at the end, where they belong. Same with ISBNs for books I sell. I don't control those, I simply have them entered. What if I am entering bank codes so I can direct deposit for my employees? Should I use that natural key?

The answer is, of course, no. In nearly all cases, a natural key is inferior to a surrogate key for maintaining order & performance in a DB, because in nearly ALL cases the people using them do NOT control their issuance. The claim that the OP should "read a book" or "Newbies prefer to do it wrong by using auto-increments" - when the real answers are RIGHT HERE - is insulting, and unneeded on a forum dedicated to SQL Server. It's a legitimate question, BIGINT vs. GUID, and rightfully belongs here. Feel free to respond, but leave the insults behind, especially when you are so clearly wrong.

-----------------------------
I enjoy queries!

MMartin1 One Orange Chip Points: 27768 More actions · Answer 8

Just a quick note here for anyone interested, surrogate keys are a data warehouse concept and not in the operational data system. Surrogate keys protect me (for instance) from the possible recycling of ID's after a dormant period of time. Operational systems only need to keep data for x amount of time. A warehouse needs historical data which can easily be in the magnitude of 5x, 10x, ..etc..longer.

So what keys type to use in your operational data system was covered well by GSquared. It depends on your particular situation. But the chances in general are that you wont need a GUID.

----------------------------------------------------

DonlSimpson SSCertifiable Points: 7043 More actions · Answer 9

MMartin1 (11/5/2014)
... surrogate keys are a data warehouse concept and not in the operational data system ...

Actually, this is not correct. E.F. Codd introduced the concept of surrogate keys in the 1970s.

Don Simpson

I'm not sure about Heisenberg.

MMartin1 One Orange Chip Points: 27768 More actions · Answer 10

DonlSimpson (11/6/2014)
MMartin1 (11/5/2014)
... surrogate keys are a data warehouse concept and not in the operational data system ...
Actually, this is not correct. E.F. Codd introduced the concept of surrogate keys in the 1970s.

Hi Don:

I guess I learned from a different book 🙂 When would a surrogate key be incorporated in the operational data? These are normally in a warehouse.

----------------------------------------------------

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

I use SID's in my database designs. Even if I don't use them as a primary key I use them as an alternate key to ensure I have another means to uniquely identify each row. In some applications that I have had to support I have had a primary key become unusable due to changes in requirements that made the key no longer unique.

MMartin1 One Orange Chip Points: 27768 More actions · Answer 12

I was thinking also there are times when a business would get accounts from a client with the client's key (account #)for that account. You would create your own key here as the pk/surrogate key here as well. Though I see S.Keys prescribed use as a alternate to another key that represents a dimensional entity (like a client or product).

----------------------------------------------------

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 13

CELKO (11/7/2012)
Can any one suggest me for each tables primary key should be Bigint as Primary Key or [uniqueidentifier] as PRIMARY KEY ?
This is a silly question and you need to read a book -- ANY BOOK -- on RDBMS> A key is a subset of the attributes of an entity which uniquely identifies each entity in the set. It is not a particular data type!
When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today.
After that, there is a natural key in the data, such as (longitude, latitude or HTM), store, cash register, ticket_nbr, etc which can be verified and validated in the reality of the data. This is ~18% of the cases. if you have no natural key, then your schema is probably wrong.
Then, if that fails and we are looking at <2% of all situations, we invent a key with check digits, validation rules and an audit trail within our enterprise. This is a great screaming pain IF you not do it right. This is why industry standards exists -- people got tired the pain Would you like to do Retail without standard UPC barcodes on products? We did that for centuries before barcodes.
So, Newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated BECAUSE it is easier than real RDBMS design. They want to have a magical, universal "one-size-fits-all" answer that does not require you ACTUALLY understand the problem domain.
A key without validation and verification will only lead to dirty data. And after all the overhead of the “Magical Universal Elixir UID” they still have to assure that the relational key is unique.
On a scale from 1 to 10, what color is your favorite letter of the alphabet?

Perfect. What would you use as the PK for either a Customer or Employee table? 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)