Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Primary Key datatype Bigint vs uniqueidentifier: Expand / Collapse
Author
Message
Posted Sunday, November 4, 2012 10:39 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Friday, November 7, 2014 3:44 AM
Points: 283, Visits: 1,240
Hi Friends!

Can any one suggest me for each tables primary key should be Bigint as PrimaryKey or [uniqueidentifier] as Primarykey, ?


Most of the MS prodcuts are keeping [uniqueidentifier] as primary key

Please let me know, what motives is in?

Example:
In MS Products
Customer table has CustomerId as [uniqueidentifier] primarykey

But I doing this way

Customer table has CustomerId as Bigint Identity(1,1) primarykey

Which is the best way to perform with tables to do DML ?





Many Thanks!
S.saravanan
“I am a slow walker, but I never walk backwards-
Abraham Lincoln”
Post #1380876
Posted Monday, November 5, 2012 3:16 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 5:03 AM
Points: 2,840, Visits: 3,975
Saravanan_tvr (11/4/2012)
Most of the MS prodcuts are keeping [uniqueidentifier] as primary key

Please let me know, what motives is in?
just for security prospects .. so that nobody can identify or guess what would be the id for next application page to make the page traversing impossible.


-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done
Post #1380950
Posted Monday, November 5, 2012 9:28 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Tuesday, November 19, 2013 8:15 AM
Points: 652, Visits: 1,428
Which is the best way to perform with tables to do DML ?


As with so many things in SQL Server, the answer is "it depends". But as a general rule, UNIQUEIDENTIFIERS are typically a poor choice for a primary key that is also defined as a clustered index for a couple of reasons.

1. The key is included as the lookup value for all non-clustered indexes and a GUID is significantly wider than an Int or a BigInt. That will require more disk space to accomodate the increased data type size.
2. UNIQUEIDENTIFIER's as the clustered index are prone to fragmentation because they are not sequential and are generated in a random order.

You can read more on Kimberly Tripp's blog here: http://www.sqlskills.com/blogs/kimberly/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

If you're going to use a UNIQUEIDENTIFIER as a primary key / clustered index, you want to make sure you are using the NEWSEQUENTIALID() function.
Post #1381158
Posted Tuesday, November 6, 2012 9:17 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 10:20 AM
Points: 4,437, Visits: 6,339
For best performance there is no "it depends" - go with bigint over uniqueidentifier. Actually, in this case you could almost certainly get away with an int, which is just 4 bytes instead of 8 and offers up 4.2 BILLION values.

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1381624
Posted Tuesday, November 6, 2012 9:53 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Monday, November 17, 2014 12:50 PM
Points: 13,872, Visits: 9,598
Performance-wise, Int will be better than BigInt, and BigInt will be better than UniqueIdentifier. Less storage, less RAM, less I/O, etc., with the smaller datatypes.

The reasons to use GUIDs over IDs are based on business rules, technical needs, and security, not on performance.

As already mentioned, if you have an ID value (Int or BigInt) in a URL as a variable value (this is a very common situation), then someone can just change the value in the URL and get other data. Sometimes, data they shouldn't be allowed to have. Good luck doing that with a GUID in the URL. Can be done, but it's very, very labor-intensive and time consuming (even if automated). A good botnet could run through GUID permutations pretty rapidly, but it would show as a DDOS attack on your servers, and there are ways to deal with that.

If you're dealing with multi-computer replication, like Merge Replication, then GUIDs are needed to insure no colisions between values. Int/BigInt won't do that very well, if at all.

If you want to cut out a call to the DB server, you can generate GUIDs for new data in the application and pass them in as a parameter in the insert statement to a table, instead of inserting and then returning the value. Identity columns don't allow for that. On a busy system, this can actually make a difference in overall application performance.

Those are real reasons to use GUIDs. Not "performance" in the database.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1381653
Posted Wednesday, November 7, 2012 7:21 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, November 18, 2014 12:29 PM
Points: 1,945, Visits: 3,121
Can any one suggest me for each tables primary key should be Bigint as Primary Key or [uniqueidentifier] as PRIMARY KEY ?


This is a silly question and you need to read a book -- ANY BOOK -- on RDBMS> A key is a subset of the attributes of an entity which uniquely identifies each entity in the set. It is not a particular data type!

When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today.

After that, there is a natural key in the data, such as (longitude, latitude or HTM), store, cash register, ticket_nbr, etc which can be verified and validated in the reality of the data. This is ~18% of the cases. if you have no natural key, then your schema is probably wrong.

Then, if that fails and we are looking at <2% of all situations, we invent a key with check digits, validation rules and an audit trail within our enterprise. This is a great screaming pain IF you not do it right. This is why industry standards exists -- people got tired the pain Would you like to do Retail without standard UPC barcodes on products? We did that for centuries before barcodes.

So, Newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated BECAUSE it is easier than real RDBMS design. They want to have a magical, universal "one-size-fits-all" answer that does not require you ACTUALLY understand the problem domain.

A key without validation and verification will only lead to dirty data. And after all the overhead of the “Magical Universal Elixir UID” they still have to assure that the relational key is unique.

On a scale from 1 to 10, what color is your favorite letter of the alphabet?


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1381972
Posted Wednesday, November 7, 2012 7:54 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Monday, November 17, 2014 12:50 PM
Points: 13,872, Visits: 9,598
CELKO (11/7/2012)
Can any one suggest me for each tables primary key should be Bigint as Primary Key or [uniqueidentifier] as PRIMARY KEY ?


This is a silly question and you need to read a book -- ANY BOOK -- on RDBMS> A key is a subset of the attributes of an entity which uniquely identifies each entity in the set. It is not a particular data type!

When you look for a key, you start with the standards of your industry. I would estimate that this covers ~80% of the properly defined systems -- VIN, ISBN, UPC, EAN, etc. which can be verified and validated via a trusted external source. In the old days, this took time; you can Google it today.

After that, there is a natural key in the data, such as (longitude, latitude or HTM), store, cash register, ticket_nbr, etc which can be verified and validated in the reality of the data. This is ~18% of the cases. if you have no natural key, then your schema is probably wrong.

Then, if that fails and we are looking at <2% of all situations, we invent a key with check digits, validation rules and an audit trail within our enterprise. This is a great screaming pain IF you not do it right. This is why industry standards exists -- people got tired the pain Would you like to do Retail without standard UPC barcodes on products? We did that for centuries before barcodes.

So, Newbies prefer to do it wrong by using auto-increments or other proprietary stuff that cannot be verified or validated BECAUSE it is easier than real RDBMS design. They want to have a magical, universal "one-size-fits-all" answer that does not require you ACTUALLY understand the problem domain.

A key without validation and verification will only lead to dirty data. And after all the overhead of the “Magical Universal Elixir UID” they still have to assure that the relational key is unique.

On a scale from 1 to 10, what color is your favorite letter of the alphabet?


(emphasis added)

I have to admit, I find you writing this (the part I bolded) highly amusing. Irony scale goes off the top of the meter! Thank you for a good laugh.

But we've already had the discussion on why these are only applicable some of the time. Per you, over 66% of applicable data constitutes an "edge case", and pointing that data out somehow makes me a troll.

So, I won't go into the details about how wrong you are, again. I'll just let the OP know that I can prove you (Joe) are just plain wrong. Yes, prove it, with the certainty of the mathematical proofs in any good high school geometry textbook. With an "edge case" that constitutes the majority of the data. (In the words of Inigo Montoya, "You keep using that [phrase]. I do not think it means what you think it means.")


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1381993
Posted Thursday, October 2, 2014 1:04 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, November 6, 2014 12:04 PM
Points: 26, Visits: 74
Natural Keys have a tendency to take inserts all over their spectrum. This fragments indexes very quickly, and unnecessarily.

Setting a PK as a surrogate key obviates the need to insert rows anywhere but at the end of the table. This also removes the tendency to have knowledge tied up in the value of the PK, which can be a disaster for performance.

BIGINT is better as a surrogate key than GUID for performance, but you may have replication needs that require a universal PK like a GUID. It is by no means "doing it wrong" to use those, given the parameters of your application needs.

Use INT where you can, and use auto-increment on your PK where you can. Save the business knowledge for the other fields, where it matters.


-----------------------------
I enjoy queries!
Post #1622087
Posted Thursday, October 2, 2014 7:08 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, November 18, 2014 12:29 PM
Points: 1,945, Visits: 3,121
Natural Keys have a tendency to take inserts all over their spectrum. This fragments indexes very quickly, and unnecessarily.


No, not usually. Think about manufacturing processes. VIN, ISBN, et al all have a sequence inside them. Their prefix is constant and tells us about something important for processing. Look at your credit card numbers. An ISO/IEC 7812 card number is most commonly 16 digits in length:

a six-digit Issuer Identification Number (IIN) (previously called the "Bank Identification Number" (BIN)) the first digit of which is the Major Industry Identifier (MII),
a variable length (up to 12 digits) individual account identifier,
a single check digit calculated using the Luhn algorithm.

The ISAN is the only random identifier standard I can think of.




Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1622178
Posted Thursday, October 2, 2014 8:20 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, November 6, 2014 12:04 PM
Points: 26, Visits: 74
CELKO (10/2/2014)
Natural Keys have a tendency to take inserts all over their spectrum. This fragments indexes very quickly, and unnecessarily.


No, not usually. Think about manufacturing processes. VIN, ISBN, et al all have a sequence inside them. Their prefix is constant and tells us about something important for processing. Look at your credit card numbers. An ISO/IEC 7812 card number is most commonly 16 digits in length:

a six-digit Issuer Identification Number (IIN) (previously called the "Bank Identification Number" (BIN)) the first digit of which is the Major Industry Identifier (MII),
a variable length (up to 12 digits) individual account identifier,
a single check digit calculated using the Luhn algorithm.

The ISAN is the only random identifier standard I can think of.




If I am storing the VIN numbers of cars I service, every one of them will be entered in an essentially random portion of the PK, and not at the end, where they belong. Same with ISBNs for books I sell. I don't control those, I simply have them entered. What if I am entering bank codes so I can direct deposit for my employees? Should I use that natural key?

The answer is, of course, no. In nearly all cases, a natural key is inferior to a surrogate key for maintaining order & performance in a DB, because in nearly ALL cases the people using them do NOT control their issuance. The claim that the OP should "read a book" or "Newbies prefer to do it wrong by using auto-increments" - when the real answers are RIGHT HERE - is insulting, and unneeded on a forum dedicated to SQL Server. It's a legitimate question, BIGINT vs. GUID, and rightfully belongs here. Feel free to respond, but leave the insults behind, especially when you are so clearly wrong.


-----------------------------
I enjoy queries!
Post #1622182
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse