SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Give Up on Natural Primary Keys


Give Up on Natural Primary Keys

Author
Message
Steve Jones
Steve Jones
SSC Guru
SSC Guru (686K reputation)SSC Guru (686K reputation)SSC Guru (686K reputation)SSC Guru (686K reputation)SSC Guru (686K reputation)SSC Guru (686K reputation)SSC Guru (686K reputation)SSC Guru (686K reputation)

Group: Administrators
Points: 686496 Visits: 21594
Comments posted to this topic are about the item Give Up on Natural Primary Keys

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
xsevensinzx
xsevensinzx
SSC-Insane
SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)SSC-Insane (24K reputation)

Group: General Forum Members
Points: 24985 Visits: 6232
In the data warehouse side, it makes sense to use surrogate keys and replace all natural keys. I don't really deal with PII data, but even in cases where I get keys from other systems, everything is replaced with the data warehouse surrogate keys. Everything is cleansed and controlled by the data warehouse with signatures/ownership of systems so nothing is amiss. I feel this is becoming the standard at least in this space and it makes sense to me!
Dave Poole
Dave Poole
SSC Guru
SSC Guru (68K reputation)SSC Guru (68K reputation)SSC Guru (68K reputation)SSC Guru (68K reputation)SSC Guru (68K reputation)SSC Guru (68K reputation)SSC Guru (68K reputation)SSC Guru (68K reputation)

Group: General Forum Members
Points: 68959 Visits: 4112
xsevensinzx - Sunday, September 30, 2018 5:18 PM
In the data warehouse side, it makes sense to use surrogate keys and replace all natural keys.

Definitely agree. A data warehouse will outlive the operational systems that feed it and succeeding operational systems won't necessary use the same keys or even data types. For that reason surrogate keys in the data warehouse are a necessity.
In operational systems I have learned that there are very few immutable natural keys.

3rd party keys, even from ISO standard sets tend to change. For example ISO3166 country codes change rather more frequently than you might imagine. ISO5218 Gender seems relatively safe. Thanks to the LBGTQIA community for an example to us all of how thorough testing should be done.


LinkedIn Profile
www.simple-talk.com
Andy Robertson
Andy Robertson
SSCarpal Tunnel
SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)SSCarpal Tunnel (4.3K reputation)

Group: General Forum Members
Points: 4312 Visits: 740
I completely agree with you Steve. Another very strong case for surrogate keys.
dsor
dsor
SSChasing Mays
SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)SSChasing Mays (602 reputation)

Group: General Forum Members
Points: 602 Visits: 817
Maybe I'm missing something regarding the backup example. But I would expect the right to be forgotten to include being removed from backups.
Steve Hall
Steve Hall
SSCoach
SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)SSCoach (18K reputation)

Group: General Forum Members
Points: 18816 Visits: 13521
dsor - Monday, October 1, 2018 3:08 AM
Maybe I'm missing something regarding the backup example. But I would expect the right to be forgotten to include being removed from backups.

Depending upon how the data is backed-up it can be technically difficult to do such a thing. I certainly don't fancy the idea of restoring my 7TB databases from each of the dozen or so back ups, to remove a person's data and then re-create the backup. Just to have another request land a few minutes later.
So, one option is to keep a separate file that can be used to identify a customer, for the purposes of removal only. If a backup is restored then a process must run immediately that removes the person's data, as identified by this separate file.
When a person makes a request to have their data removed it must be made clear to them that it will still exist in backups, that the backups are secure and if the backups are used then their data will be removed before any processing against that restored data can commence.


Steve Hall
Linkedin
Blog Site
R Kingston
R Kingston
Grasshopper
Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)Grasshopper (23 reputation)

Group: General Forum Members
Points: 23 Visits: 5
Following our own GDPR compliancy efforts earlier this year, we concluded that keeping a list of e.g. email addresses or user ids for the purpose of removing those users post-restore was not acceptable, and that we would be better off storing a hash of the users' PK and using that to remove the users later if required.
Victor Kirkpatrick
Victor Kirkpatrick
Hall of Fame
Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)Hall of Fame (3.4K reputation)

Group: General Forum Members
Points: 3438 Visits: 558
Surrogate keys are practically always the only way to go for ease of development as well as performance, especially in large databases. A single column int or bigint will always perform better than some multi-column natural key with one or two char/varchar columns. Think of the costly index(es) on that beast and the cost of having to potentially have that combination in lower tables as well instead of the one int PK that could be down there.

The column means nothing to users and you know you can always identify one row by one int key. Programming / development is made much easier by this as well.
andrew gothard
andrew gothard
SSChampion
SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)

Group: General Forum Members
Points: 11893 Visits: 6797
BrainDonor - Monday, October 1, 2018 3:54 AM

Depending upon how the data is backed-up it can be technically difficult to do such a thing. I certainly don't fancy the idea of restoring my 7TB databases from each of the dozen or so back ups, to remove a person's data and then re-create the backup. Just to have another request land a few minutes later.
So, one option is to keep a separate file that can be used to identify a customer, for the purposes of removal only. If a backup is restored then a process must run immediately that removes the person's data, as identified by this separate file.
When a person makes a request to have their data removed it must be made clear to them that it will still exist in backups, that the backups are secure and if the backups are used then their data will be removed before any processing against that restored data can commence.


Pretty much how things work in call centres. Anything you do, and any dataset is always matched against the national (and internal if kept) Do Not Call list, held in a separate system, and the data cleansed. It soon becomes second nature.

I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
roger.plowman
roger.plowman
SSCrazy Eights
SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)SSCrazy Eights (9.9K reputation)

Group: General Forum Members
Points: 9907 Visits: 2039
GDPR-like laws are something I hadn't considered, but it makes sense in an absolutist sort of way.

At least here in the U.S. lawyers are more than happy to try and cheat sue anyone for any possible reason unlawful trespass on their clients' rights, so I can see natural keys being the target of a scum-sucking shark over zealous legal beagle fine upstanding pillar of the legal community slavering to strike reluctantly noting a deliberately misconstrued blatant violation of the law.

Regardless, I've always been a firm believer in surrogate keys over natural ones for any number of purely mechanical reasons. First, natural keys change--a lot, meaning fragmentation of the table and indexes. Second, using something like an identity lets you append new records, and third, ala GDPR a surrogate key doesn't leak PII--at least, not by itself.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum









































































































































































SQLServerCentral


Search