May 28, 2009 at 12:12 pm
All,
After the invaluable input I received on my 1 database vs. 13 database conundrum, I am once again at a cross roads. To refresh you on my situation. I am fairly new to SQL Server, and an IBM DB2 DBA by training. We hired a company to build and turn over a new system here at work. I am not 100% certain their proposed approach is the best approach. I have no problem with being wrong and hope I am, but I want to make decisions based on fact, not egos.
The question is about table construction. Our current data follows 3NF design, and all table keys are composed of attributes which describe the entity. For example the "Insurance" table is keyed by the "Policy Number" attribute.
The recommended new approach is to define identity columns on every table and let that be the primary key everywhere in the system.
We used identity columns as keys on a couple tables in DB2 a few years ago and it was a maintenance nightmare. We frequently move data between test regions and would find that the numbers were already assigned in that region. So we had to start assigning different ranges for every test region. We also pull production data down to test frequently to recreate production problems. Also to re-sequence the identity columns was a very involved process since the RI had to be maintained. Most if not all of our entities (tables) have very natural primary keys. Is it really the best to define identity columns everywhere? What does that buy us? Are all of you doing this as well?
May 28, 2009 at 12:18 pm
I personally don't like surrogate keys; when possible I prefer to use meaningful -natural keys.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.May 28, 2009 at 12:20 pm
Identity columns have their place, but not in what you're describing. Definitely not if you're having to merge data from multiple servers.
If they have to use a surrogate key, a GUID will work in that case, but an ID is a really bad idea.
If you have a good natural key, that's almost always the better idea. Something like policy numbers, which may not be inherently sequential (I don't know), will be a poor choice for the leading edge of the clustered index. If they are sequential, then they can be both PK and CI (Clustered Index). If not, make them the PK, but make something else the CI, or at least the leading edge of the CI. Maybe a DateAdded type column, or something like that would do.
But, based on the data you've given here, I would be strongly opposed to PKing off of ID columns in your case.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 28, 2009 at 12:31 pm
PaulB (5/28/2009)
I personally don't like surrogate keys; when possible I prefer to use meaningful -natural keys.
I like using surrogate keys as a primary key, but I will also identify a natural key as an alternate key when available. Part of the reason for that is I have worked with a database where changes requested by users to both data and/or schema has affected the natural key. It has either required a change to the natural key (you really should not make changes to a primary key), or requirements change the cause the natural key to no longer be unique. Having the surrogate key keeps me from having to working about the primary key changing as the users never see it.
May 28, 2009 at 12:37 pm
Lynn Pettis (5/28/2009)
PaulB (5/28/2009)
I personally don't like surrogate keys; when possible I prefer to use meaningful -natural keys.I like using surrogate keys as a primary key, but I will also identify a natural key as an alternate key when available. Part of the reason for that is I have worked with a database where changes requested by users to both data and/or schema has affected the natural key. It has either required a change to the natural key (you really should not make changes to a primary key), or requirements change the cause the natural key to no longer be unique. Having the surrogate key keeps me from having to working about the primary key changing as the users never see it.
Very true. But even given all that, IDs are often a bad idea on a database that has to consolidate data from multiple sources, compared to GUIDs.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 28, 2009 at 12:42 pm
GSquared (5/28/2009)
Lynn Pettis (5/28/2009)
PaulB (5/28/2009)
I personally don't like surrogate keys; when possible I prefer to use meaningful -natural keys.I like using surrogate keys as a primary key, but I will also identify a natural key as an alternate key when available. Part of the reason for that is I have worked with a database where changes requested by users to both data and/or schema has affected the natural key. It has either required a change to the natural key (you really should not make changes to a primary key), or requirements change the cause the natural key to no longer be unique. Having the surrogate key keeps me from having to working about the primary key changing as the users never see it.
Very true. But even given all that, IDs are often a bad idea on a database that has to consolidate data from multiple sources, compared to GUIDs.
I'll agree, but add the normal idiom "It Depends".
May 29, 2009 at 7:15 am
That's why I put "often" in there.
I've seen successful solutions that had IDs, and also a column indicating which instance of the database was the source. Made it a two-part key, but it worked just fine that way.
As always, "it depends". But I'd recommend against IDs in this guy's case, given what I know so far, because it doesn't sound like there's been sufficient thought given to handling these aspects of it. I could easily be wrong, but I'd at least investigate the heck out of it, were it me.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 29, 2009 at 9:25 am
Thanks for the replies so far. It will help when I raise this issue again with the vendor.
Lynn, could you help me understand better? Where do you put the R/I? On the surrogate or the natural key?
If R/I is on the surrogate key and we changed the natural key, we would have to manually change all child occurances of that natural key. (store proc?) I understand we would have to regardless, but our data integrity would be at the mercy of our programmers without the RDBMS verifying our R/I.
If we establish R/I on the natural key, would we have the same problems as changing the primary key?
Since the surrogate key value will be unknown to users, would we ever access the data by it, (except for using it to join tables in a referential relationship).
Do you cluster the table by this value? In our situation I think we would be better to cluster by the natural key since that is how we sequentially access our data. (ex: select all eligibility rows for this provider). Of course our new system will be more S.O.A. than transactional so perhaps I shouldn't worry as much about clustering.
I hope this doesn't come off as impolite. I really do value your input and this is the only way I am going to learn. For the most part this is all academic as I have never had a table change its primary key. We occasionally added another attribute to a composite key (such as effective date), but that was simple.
Bottom line, I am trying to determine the pros of identity columns as surrogate keys.
Pro: It will more than likely be shorter than the natural key, so a smaller index.
Con: You would still need to index the natural key as well since that is more than likely how you are going to look for your data. So the surrogate index is an additional index.
Pro: Quicker index look-up (if you know the identity columns value).
Con: No one will know the value, until they've read a table in an R/I structure and pulled it back. It will help to access related tables.
Pro: Natural key can be changed and primary key wont have to change
Con: Never changed a natural key yet, but it could happen.
Con: Moving data between regions (if identity column values are dupes of the region you are moving data to, you must develope a process to renumber all identity columns and their children (if R.I on surrogate) )
Please correct me where I am wrong, or add to what I may have missed.
June 4, 2009 at 9:25 am
troy.hodge (5/29/2009)
Thanks for the replies so far. It will help when I raise this issue again with the vendor.Lynn, could you help me understand better? Where do you put the R/I? On the surrogate or the natural key?
If R/I is on the surrogate key and we changed the natural key, we would have to manually change all child occurances of that natural key. (store proc?) I understand we would have to regardless, but our data integrity would be at the mercy of our programmers without the RDBMS verifying our R/I.
If we establish R/I on the natural key, would we have the same problems as changing the primary key?
Since the surrogate key value will be unknown to users, would we ever access the data by it, (except for using it to join tables in a referential relationship).
Do you cluster the table by this value? In our situation I think we would be better to cluster by the natural key since that is how we sequentially access our data. (ex: select all eligibility rows for this provider). Of course our new system will be more S.O.A. than transactional so perhaps I shouldn't worry as much about clustering.
I hope this doesn't come off as impolite. I really do value your input and this is the only way I am going to learn. For the most part this is all academic as I have never had a table change its primary key. We occasionally added another attribute to a composite key (such as effective date), but that was simple.
Bottom line, I am trying to determine the pros of identity columns as surrogate keys.
Pro: It will more than likely be shorter than the natural key, so a smaller index.
Con: You would still need to index the natural key as well since that is more than likely how you are going to look for your data. So the surrogate index is an additional index.
Pro: Quicker index look-up (if you know the identity columns value).
Con: No one will know the value, until they've read a table in an R/I structure and pulled it back. It will help to access related tables.
Pro: Natural key can be changed and primary key wont have to change
Con: Never changed a natural key yet, but it could happen.
Con: Moving data between regions (if identity column values are dupes of the region you are moving data to, you must develope a process to renumber all identity columns and their children (if R.I on surrogate) )
Please correct me where I am wrong, or add to what I may have missed.
Unfortunately, "It Depends". Yes, you have listed Pro's/Con's, but I really can't address them in your situation as I don't know enough about your situation and there really isn't any way for me to do so using this forum.
Some of it depends on the design of the database. Some of it depends on what the changes in the natural key affect. Does the change in the Natural key change which records are related? If yes, then there is a problem, but if not, no problem at all.
For instance, at a previous employer the primary key of a table was the document name (ie IEC 6001). The vendor changed the name (ie IEC 66001). We had to go through numerous tables (ISAM databases actually) and make changes. If, however, the links had all been on another field (ITM-S-KEY, an integer), we could have simply changed the name in one place and historical records could have retained the original name but been linked to the new name on the ITM-S-KEY.
The other aspects are really dependent on your application, so not much more I can do to assist.
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply