A Poor Data Model

Steve Jones - SSC Editor

SSC Guru

Points: 738170
March 3, 2025 at 12:00 am

#4545943

Comments posted to this topic are about the item A Poor Data Model

David.Poole SSC Guru Points: 76148 · Answer 1

Agree totally.

I had to deal with UK motor data. The vehicle data has a make, model and variant code as the natural key. The only problem is that each one is a three-character string of numeric digits.

If I take a Ford Fiesta as an example, there have been more than 999 variants, so in the data model a Ford Fiesta has had to have more than one model code. The same issue applies for the Make. There are many MakeCode values for Ford.
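To make the capacity problem concrete, here is a minimal Python sketch. It assumes the codes run from "001" to "999" with "000" unused, and the function name `allocate_codes` and the idea of a numbered "model slot" are hypothetical; the real allocation scheme is not described in this thread.

```python
CODE_WIDTH = 3
# Assumption: usable codes are "001".."999", so 999 values per code.
CAPACITY = 10 ** CODE_WIDTH - 1


def allocate_codes(variant_count: int) -> list[tuple[int, str]]:
    """Assign each variant a (model_slot, variant_code) pair.

    model_slot counts how many extra model codes had to be created
    because the three-digit variant code ran out of values.
    """
    codes = []
    for n in range(variant_count):
        model_slot, offset = divmod(n, CAPACITY)
        codes.append((model_slot, f"{offset + 1:0{CODE_WIDTH}d}"))
    return codes


codes = allocate_codes(1000)
model_slots_used = {slot for slot, _ in codes}
```

The 1000th variant no longer fits under the first model code, so a second model code is needed for what is still the same real-world model.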

I suspect that at the time when the model was put together not all households had a car, heavy industry dominated and people used rail and bus services to get to work. The modellers never foresaw the explosion in car ownership or the marketing activity that would seek to differentiate cars as a product.

UK postal codes pose another problem for keys. The key can change! A UK postcode used to identify a postman's walk, which averaged out at 15 households. As new houses are built, an old postcode can do something akin to a page split in a DB. Or, if wide-scale demolition takes place, postcodes can be defragmented too. Generally, postcodes change slowly. However, it is not the rate of change that is the problem; it is where the change takes place. There are many third-party products that are keyed on postcode, such as geodemographic data sets, media regions (TV, radio, local press), and government data sets used for urban planning and urban transport. These develop blank spots in addition to becoming stale over time.

As with all tech debt, changing the modelling approach would be a Herculean task due to the number of dependencies that have been added over the years. It is probably more cost-effective to maintain the domain knowledge which, to be fair, has resilience across many industries.

Steve Jones - SSC Editor SSC Guru Points: 738170 · Answer 2

Good examples, and that's the type of thing I run into. I worry about what we'll do when we do have issues with SSNs. Maybe we'll have to have an "active" indicator, or maybe all systems will need to add an SSN2 that has more digits, or something that flags how two of the same xxx-xx-xxxx are actually assigned to different people.

I foresee a lot of index rebuilds, going from a unique, single column index to a multi-column one.
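A rough sketch of that index change, using Python's built-in `sqlite3` as a stand-in for SQL Server. The table `person`, the column `ssn_seq` (a hypothetical discriminator for a reissued number), and the index names are all assumptions for illustration, and the SSN value is deliberately fake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " ssn TEXT NOT NULL,"
    " ssn_seq INTEGER NOT NULL DEFAULT 1,"  # hypothetical reissue counter
    " name TEXT NOT NULL)"
)
# Original design: SSN alone is assumed to be unique.
conn.execute("CREATE UNIQUE INDEX ux_person_ssn ON person (ssn)")
conn.execute("INSERT INTO person (ssn, name) VALUES ('000-00-0001', 'First holder')")

insert_second = (
    "INSERT INTO person (ssn, ssn_seq, name) "
    "VALUES ('000-00-0001', 2, 'Second holder')"
)

# A second person with the same number is rejected by the single-column index.
try:
    conn.execute(insert_second)
    accepted_before = True
except sqlite3.IntegrityError:
    accepted_before = False

# Revised design: uniqueness moves to the (ssn, ssn_seq) pair.
conn.execute("DROP INDEX ux_person_ssn")
conn.execute("CREATE UNIQUE INDEX ux_person_ssn_seq ON person (ssn, ssn_seq)")
conn.execute(insert_second)

row_count = conn.execute(
    "SELECT COUNT(*) FROM person WHERE ssn = '000-00-0001'"
).fetchone()[0]
```

Every table that referenced the SSN alone would need the same treatment, which is where the rebuild cost comes from.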

Eric M Russell SSC Guru Points: 125596 · Answer 3

SSN can be used as a reliable primary key for a table called Customer_SSN. It should be linked to a CustomerID, just like phone number(s) and email address. But I think a compelling reason for most companies not to use SSN as an identifier is that a significant percentage of their customer base doesn't have an SSN. Even government offices, hospitals, and financial institutions serve individuals who are not US citizens.
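A minimal sketch of that layout, again using Python's `sqlite3` rather than SQL Server. Only the names Customer_SSN and CustomerID come from the post; the `Customer` table, the `Name` column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Surrogate key identifies the customer; SSN is just an attribute.
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- SSN is the key of its own table and points back at the customer.
    CREATE TABLE Customer_SSN (
        SSN        TEXT PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
    );
    """
)
conn.executemany(
    "INSERT INTO Customer (CustomerID, Name) VALUES (?, ?)",
    [(1, "Has SSN"), (2, "No SSN")],
)
conn.execute(
    "INSERT INTO Customer_SSN (SSN, CustomerID) VALUES (?, ?)",
    ("000-00-0001", 1),
)

# Customers without an SSN simply have no row in Customer_SSN.
rows = conn.execute(
    "SELECT c.CustomerID, s.SSN "
    "FROM Customer AS c "
    "LEFT JOIN Customer_SSN AS s ON s.CustomerID = c.CustomerID "
    "ORDER BY c.CustomerID"
).fetchall()
```

Note that this sketch still assumes an SSN is never issued to two people, since the primary key on SSN would reject the second row.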

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

P Jones SSChampion Points: 12370 · Answer 4

UK National Insurance numbers (equivalent of US Social Security Number) are also said to have duplicates, especially from the early years of their issue.

Steve Jones - SSC Editor SSC Guru Points: 738170 · Answer 5

Eric M Russell wrote:

SSN can be used as a reliable primary key for a table called Customer_SSN. It should be linked to a CustomerID, just like phone number(s) and email address. But I think a compelling reason for most companies not to use SSN as an identifier is that a significant percentage of their customer base don't have a SSN. Even government offices, hospitals, and financial institutions service individuals who are not US citizens.

Not sure this works, as two people can have one (probably OK), but one can also be reused, which creates a dupe.

CamillaDougherty Grasshopper Points: 11 · Answer 6

A well-structured data model is crucial for efficient performance and decision-making. If the model lacks clarity or is flawed, it can lead to issues like poor data integrity, slower processing, and difficulties in scaling. I'd suggest addressing these concerns for improvement.

Eric M Russell SSC Guru Points: 125596 · Answer 7

Steve Jones - SSC Editor wrote:

Eric M Russell wrote:

SSN can be used as a reliable primary key for a table called Customer_SSN. It should be linked to a CustomerID, just like phone number(s) and email address. But I think a compelling reason for most companies not to use SSN as an identifier is that a significant percentage of their customer base don't have a SSN. Even government offices, hospitals, and financial institutions service individuals who are not US citizens.

Not sure this works as two people can have one (prob ok), but one can be re-used as a dupe

Folks can share an email account and phone number too. But two-factor authentication combined with SSN is bulletproof, unless multiple people are intentionally sharing the same ID, which can happen in the case of undocumented immigrants, for example.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

xemsomenh Newbie Points: 4 · Answer 9

These examples are very relevant to what I encounter. What concerns me is how we’ll handle issues related to SSNs when they arise. Maybe we’ll need an "active indicator" to differentiate cases, or perhaps all systems will have to introduce an SSN2 field with additional digits. Another possibility is implementing a mechanism to flag instances where the same xxx-xx-xxxx SSN is actually assigned to different individuals.

I can already foresee a significant number of index rebuilds, shifting from a unique single-column index to a multi-column one.
