|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 6:22 AM
Points: 1,214,
Visits: 559
|
|
By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity.
Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row.
It seems to me that you are also laboring under a confusion. Assignments HAS a PK, each row is uniquely identified by it's data. The fact that you don't need to define it as a database object doesn't mean that the PK doesn't exist.
-- JimFive
|
|
|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Yesterday @ 1:24 PM
Points: 306,
Visits: 205
|
|
There is a difference between a key and an index. A key is a component of the conceptual data model and does not have a physical representation. It is a concept. A "primary" key is a referential integrity construct that is implemented at the logical data model. There may be other "keys" in the logical data model including unique keys, natural keys, composite keys, candidate keys etc. There aren't ANY indexes in a conceptual or logical data model.
In the physical data model, a primary key is a constraint. An index on that primary key is optional, clustered or otherwise. As someone said earlier, the constraint is to define referential integrity, the index is to provide quick access to the data. There is no rule that says a primary key must also be a clustered index. In fact, in my experience, it rarely is. A clustered index is useful if you find that you tend to want your data sorted in a particular way in the lions share of your queries, it alleviates the overhead associated with an ORDER BY and speeds up performance of range type queries.
I always create a surrogate key as a primary key for every table. I only create a unique index if that key is going to be used as a foreign key elsewhere. If there is a natural unique key (or at least close to unique) then I will, if it makes sense, create a clustered index.
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Wednesday, October 21, 2009 9:20 AM
Points: 245,
Visits: 70
|
|
James Goodwin (12/11/2007)
By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity. Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row. It seems to me that you are also laboring under a confusion. Assignments HAS a PK, each row is uniquely identified by it's data. The fact that you don't need to define it as a database object doesn't mean that the PK doesn't exist. -- JimFive What you are describing is a candidate key, not a PK. The fact that you can find a column or set of columns that uniquely identifies a row doesn't make it a PK. PK is a role defined for a specific purpose. In my example there is no purpose for a PK on assignments.
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Wednesday, May 27, 2009 10:58 AM
Points: 767,
Visits: 174
|
|
I have seen a couple people mention that tables without children or certain logging tables may not need PKs.
I actually have a case of this. My table simply contains a log of site hits to a certain page. It contains the page id, an id to some data about why they hit the page, and the date. This table can't have a PK without creating another column specifically for it since it is absolutely possible (even if unlikely with the datetime precision) that I could have duplicate valid rows in it.
The thought of not guarantying uniqueness does cause me to cringe a little, but after sitting down and thinking about it, I can't come up with a good reason to add another column to this table just for that purpose.
|
|
|
|
|
SSChasing Mays
      
Group: General Forum Members
Last Login: Wednesday, November 18, 2009 1:51 PM
Points: 642,
Visits: 343
|
|
First off, I gotta say that you guys are really spoiled in regards to control over your data. I load into my databases from just about every kind of source. Some are a nightmare of data integrity.
The best example is that I load data from a system that allows alpha characters in the SSN column. 45a343ff3 is a valid SSN in this system. Some of these systems have never heard of Normal forms. It's actually humorous as to the degree in how poorly these databases are designed.
This has led me to have very very very few, if any PKs. I know that I probably simulate a PK in most of my tables by adding a identity column and then putting a clustered index on the table. I will admit that I have done very little research to prove that my way is better than a PK (or if it's even different)
I guess my point to this post is - What am I gaining in putting a PK on a table as opposed to putting a unique clustered index on the table?
I guess I have just came from an Anti-Access background and just haven't been using the PK cause it looked too much like an Access wizard. My databases all seem to perform well and I have little or no deadlocks. Am I missing a chance to improve performance?
Live to Throw Throw to Live Will Summers
|
|
|
|
|
SSC Eights!
      
Group: General Forum Members
Last Login: Thursday, October 22, 2009 4:54 PM
Points: 963,
Visits: 205
|
|
First of all this is a wonderful topic. We have identified logic and concerns from three sides of the data. The logical model, the physical model and the actual interface with the data through the application.
I have worked on all three sides of the data and have been involved in this discussion a number of times. At one time we determined to enforce Ref Integrity through the application. We coupled this with building any index we thought was needed. In this case we often defined a minimum uniqueness set as the clustered index and did not cluster on an identity column. We used this approach for about 4 years and have developed databases of some size. It has worked in more then one system for over a decade.
However, the was some confusion after the data modelers moved on and the implementors followed, leaving the second generation of maintainers in control of the database. It was easier for them to deal with the data using real primary keys and identity fields where needed. Simple reason, they did not have the code available to them and sone have not been able to read that code if it was available. So as we are rehosting the applications we are adding primary keys.
In the systems/databases we have developed since we stopped using the non-pk approach, we have not had to remodel the database and they are better understood by all.
All this said there is a lot of merit in Barry's discussion about associative entities, or tables that resolve many-to-many relationships. These are usually read/used from both sides of the many and there is use of the associative keys but not a generated identify field. Truly this is a candidate for not having a pk. However, if the associative entity has information beyond the keys and stores other data describing the association then it should have a pk. Attributes that would qualify this for having a pk would be date of association, a code value to define what type of association ion it is etc...
Lastly in the application use of the data the applications I have developed are driven by I hope the same business rules and needs that the database was modeled using. If this is the case the code will be reading data using the indices that were developed and the main use of the data should be centered on the minimum uniqueness set, and in most cases that is the primary key of the clustered index. I have never written a piece of code that says read this table by this index. However, I have more often then not defined the database with the indices that I know the optimizer will be using, and in more then one case have written a covered index to insure that the needed speed is there.
Again this is a great topic and has caused interest from many. Thanks for the discussion.
Miles, "Life is easier when you plow around the stump" words from an old farmer...
Not all gray hairs are Dinosaurs!
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Wednesday, November 18, 2009 1:34 PM
Points: 1,527,
Visits: 1,087
|
|
Barry McConnell (12/11/2007) The reason the PK needs to be unique is so that referential integrity can be maintained, ie the FK must point to exactly one row.
By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity. Unfortunately so many DBMS vendors and users have confused the roles of PK/FK, indices, and constraints that it is a jumbled mess. When you declare a PK, you also get automatically an index and a constraint, but that doesn't make them equivalent. Let's take a simple example and look at what we actually want and need.
I guess I am still unclear about the definition of PK here, not the fact of how PKs are implemented (or may be wrongly implemented) in real database systems.
My understanding is that a PK is one or more attributes of an entity that uniquely defines a tuple (I don't like that esoteric term but I am trying to stay theoretical) in that entity. If you have a one-entity model - theoretically possible even if it is almost never done in practice - then there is no other entity that will have FKs referencing the PK. Yet the need to uniquely define a tuple will still exist. Otherwise the entity is not a valid entity in the relational model, from what I have read - although in the real world a real table can be created without a PK.
Two entities: employees and projects, business rule = an employee can be assigned to multiple projects and vice versa Need: three tables - employees, projects, and assignments. employees and projects need PKs because they will be participating in referential integrity. Assignments will consist of two FKs and needs 3 indices (one on each FK and a combined one), and a unique constraint on the combined index. Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row. The single column indices provide fast access for the typical query/join performed on this table, i.e. given one FK for a column find all the corresponding FKs of the other column. The uniqueness constraint solves the duplicate row problem which is not the same as uniquely identifying a row.
Suppose, however, that for whatever reason, you have just an employees table. Would you not need a PK to uniquely identify each employee?
I know this sounds like splitting hairs, but I just want to understand what the real meaning of a PK is. You seem to be saying that it has nothing to do with uniquely identifying a row, which is different from what I learned.
Thanks, webrunner
Microsoft Certified Technical Specialist (SQL Server 2005)
|
|
|
|
|
SSChasing Mays
      
Group: General Forum Members
Last Login: Yesterday @ 2:48 PM
Points: 646,
Visits: 1,267
|
|
MattieNH (12/11/2007) How about a table with only one row?
MattieThis is just about the only one that I can see where you might have a good argument against an index of any kind.
We have a table with state names and abbreviations. There are not likely to be more of them. We also have about a dozen small fact tables like that. Big argument ensued during the design meetings that we did not even need a row ID on these small tables. We have row ID's and primary constraints on all of them. Even the one row table that is database status and version. I also have the black eye going along with it. Vindication came when we handed the table structure to a new developer and he got it in two minutes. Every table has certain features that you can count on. All the tables are optimized for join performance.
There are other considerations beyond RDBMS theory. While putting an index IS overkill on a one row table and a fair percentage overhead, the other considerations (like consistancy) can pay off.
Does it make sense not to have a primary key? In some cases and rare circumstances. You also have to weigh the cost of not doing it. I would put a row ID enven on Steve's report tables. It helps on those dark rainy nights when you get the panic phone call that says there is one bogus row that came from deep space that prevents the report(s) from printing. Find the row and kill it by ID. It's happened enough through the years that I now prepare for things like that.
ATB
Charles Kincaid
|
|
|
|
|
SSChasing Mays
      
Group: General Forum Members
Last Login: Yesterday @ 2:48 PM
Points: 646,
Visits: 1,267
|
|
We had a table where we just logged data from a series of production lines. Totally unnormalized. We added a row ID and made that the primary key, clustered. Any guesses why? Hmm?
Performance! Since we always inserted at the end this made the inserts faster. Reports could take their sweet time but the inserts had to be instant.
ATB
Charles Kincaid
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 6:22 AM
Points: 1,214,
Visits: 559
|
|
We have a table with state names and abbreviations. There are not likely to be more of them. We also have about a dozen small fact tables like that. Big argument ensued during the design meetings that we did not even need a row ID on these small tables. We have row ID's and primary constraints on all of them.
Are you seriously saying that you have a table that looks like: StateID, StateCode, StateName
Instead of: StateCode, StateName
-- JimFive
|
|
|
|