March 3, 2025 at 12:00 am
Comments posted to this topic are about the item A Poor Data Model
March 3, 2025 at 9:47 am
Agree totally.
I had to deal with UK motor data. The vehicle data has a make, model and variant code as the natural key. The only problem being that each one is a 3 character string holding numerics.
If I take a Ford Fiesta as an example, there have been more than 999 variants, so in the data model a Ford Fiesta has had to have more than one model code. The same issue applies for the Make. There are many MakeCode values for Ford.
I suspect that at the time when the model was put together not all households had a car, heavy industry dominated and people used rail and bus services to get to work. The modellers never foresaw the explosion in car ownership or the marketing activity that would seek to differentiate cars as a product.
UK Postal Codes pose another problem for keys. The key can change! A UK postcode used to identify a postman's walk, which averaged out at 15 households. As new houses are built an old postcode can do something akin to page splits in a DB. Or if wide scale demolition takes place they can defragment too. Generally, postcodes change slowly however it is not the rate of change that is the problem, it is where the change takes place. There are many 3rd party products that are keyed on postcode such as geodemographic data sets, media regions (TV, radio, local press), government data sets used for urban planning and urban transport. These develop blank spots in addition to becoming stale over time.
As with all tech debt, changing the modelling approach would be a Herculean task due to the number of dependencies that have been added over the years. It is probably more cost effective to maintain the domain knowledge which, to be fair, has resilience across many industries.
March 3, 2025 at 5:02 pm
Good examples, and that's the type of thing I run into. I worry what we'll do when we do have issues with SSNs, maybe we'll have to have an "active indicator, or maybe all systems will actually need to add an SSN2 that is more digits, or somehow flags how two of the same xxx-xx-xxxx are actually assigned to different people.
I foresee a lot of index rebuilds, going from a unique, single column index to a multi-column one.
March 3, 2025 at 8:42 pm
SSN can be used as a reliable primary key for a table called Customer_SSN. It should be linked to a CustomerID, just like phone number(s) and email address. But I think a compelling reason for most companies not to use SSN as an identifier is that a significant percentage of their customer base don't have a SSN. Even government offices, hospitals, and financial institutions service individuals who are not US citizens.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
March 4, 2025 at 7:30 am
UK National Insurance numbers (equivalent of US Social Security Number) are also said to have duplicates, especially from the early years of their issue.
March 4, 2025 at 3:08 pm
SSN can be used as a reliable primary key for a table called Customer_SSN. It should be linked to a CustomerID, just like phone number(s) and email address. But I think a compelling reason for most companies not to use SSN as an identifier is that a significant percentage of their customer base don't have a SSN. Even government offices, hospitals, and financial institutions service individuals who are not US citizens.
Not sure this works as two people can have one (prob ok), but one can be re-used as a dupe
March 7, 2025 at 11:39 am
You are right, I agree with you.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply