SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Give Up on Natural Primary Keys


Give Up on Natural Primary Keys

Author
Message
Steve Jones
Steve Jones
SSC Guru
SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)SSC Guru (602K reputation)

Group: Administrators
Points: 602380 Visits: 21101
Comments posted to this topic are about the item Give Up on Natural Primary Keys

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21186 Visits: 5792
In the data warehouse side, it makes sense to use surrogate keys and replace all natural keys. I don't really deal with PII data, but even in cases where I get keys from other systems, everything is replaced with the data warehouse surrogate keys. Everything is cleansed and controlled by the data warehouse with signatures/ownership of systems so nothing is amiss. I feel this is becoming the standard at least in this space and it makes sense to me!
Dave Poole
Dave Poole
SSC Guru
SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)SSC Guru (60K reputation)

Group: General Forum Members
Points: 60706 Visits: 3986
xsevensinzx - Sunday, September 30, 2018 5:18 PM
In the data warehouse side, it makes sense to use surrogate keys and replace all natural keys.

Definitely agree. A data warehouse will outlive the operational systems that feed it and succeeding operational systems won't necessary use the same keys or even data types. For that reason surrogate keys in the data warehouse are a necessity.
In operational systems I have learned that there are very few immutable natural keys.

3rd party keys, even from ISO standard sets tend to change. For example ISO3166 country codes change rather more frequently than you might imagine. ISO5218 Gender seems relatively safe. Thanks to the LBGTQIA community for an example to us all of how thorough testing should be done.


LinkedIn Profile
www.simple-talk.com
Andy Robertson
Andy Robertson
Hall of Fame
Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)Hall of Fame (3.3K reputation)

Group: General Forum Members
Points: 3293 Visits: 605
I completely agree with you Steve. Another very strong case for surrogate keys.
dsor
dsor
SSC-Addicted
SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)SSC-Addicted (486 reputation)

Group: General Forum Members
Points: 486 Visits: 696
Maybe I'm missing something regarding the backup example. But I would expect the right to be forgotten to include being removed from backups.
Steve Hall
Steve Hall
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16711 Visits: 13123
dsor - Monday, October 1, 2018 3:08 AM
Maybe I'm missing something regarding the backup example. But I would expect the right to be forgotten to include being removed from backups.

Depending upon how the data is backed-up it can be technically difficult to do such a thing. I certainly don't fancy the idea of restoring my 7TB databases from each of the dozen or so back ups, to remove a person's data and then re-create the backup. Just to have another request land a few minutes later.
So, one option is to keep a separate file that can be used to identify a customer, for the purposes of removal only. If a backup is restored then a process must run immediately that removes the person's data, as identified by this separate file.
When a person makes a request to have their data removed it must be made clear to them that it will still exist in backups, that the backups are secure and if the backups are used then their data will be removed before any processing against that restored data can commence.


Steve Hall
Linkedin
Blog Site
R Kingston
R Kingston
Forum Newbie
Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)Forum Newbie (7 reputation)

Group: General Forum Members
Points: 7 Visits: 5
Following our own GDPR compliancy efforts earlier this year, we concluded that keeping a list of e.g. email addresses or user ids for the purpose of removing those users post-restore was not acceptable, and that we would be better off storing a hash of the users' PK and using that to remove the users later if required.
Victor Kirkpatrick
Victor Kirkpatrick
Hall of Fame
Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)Hall of Fame (3.2K reputation)

Group: General Forum Members
Points: 3171 Visits: 532
Surrogate keys are practically always the only way to go for ease of development as well as performance, especially in large databases. A single column int or bigint will always perform better than some multi-column natural key with one or two char/varchar columns. Think of the costly index(es) on that beast and the cost of having to potentially have that combination in lower tables as well instead of the one int PK that could be down there.

The column means nothing to users and you know you can always identify one row by one int key. Programming / development is made much easier by this as well.
andrew gothard
andrew gothard
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10564 Visits: 6639
BrainDonor - Monday, October 1, 2018 3:54 AM

Depending upon how the data is backed-up it can be technically difficult to do such a thing. I certainly don't fancy the idea of restoring my 7TB databases from each of the dozen or so back ups, to remove a person's data and then re-create the backup. Just to have another request land a few minutes later.
So, one option is to keep a separate file that can be used to identify a customer, for the purposes of removal only. If a backup is restored then a process must run immediately that removes the person's data, as identified by this separate file.
When a person makes a request to have their data removed it must be made clear to them that it will still exist in backups, that the backups are secure and if the backups are used then their data will be removed before any processing against that restored data can commence.


Pretty much how things work in call centres. Anything you do, and any dataset is always matched against the national (and internal if kept) Do Not Call list, held in a separate system, and the data cleansed. It soon becomes second nature.

I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
roger.plowman
roger.plowman
SSCrazy Eights
SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)SSCrazy Eights (8.4K reputation)

Group: General Forum Members
Points: 8416 Visits: 1980
GDPR-like laws are something I hadn't considered, but it makes sense in an absolutist sort of way.

At least here in the U.S. lawyers are more than happy to try and cheat sue anyone for any possible reason unlawful trespass on their clients' rights, so I can see natural keys being the target of a scum-sucking shark over zealous legal beagle fine upstanding pillar of the legal community slavering to strike reluctantly noting a deliberately misconstrued blatant violation of the law.

Regardless, I've always been a firm believer in surrogate keys over natural ones for any number of purely mechanical reasons. First, natural keys change--a lot, meaning fragmentation of the table and indexes. Second, using something like an identity lets you append new records, and third, ala GDPR a surrogate key doesn't leak PII--at least, not by itself.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search