SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Data Modelling advice sought for dealing with FIRST/MIDDLE/LAST names


Data Modelling advice sought for dealing with FIRST/MIDDLE/LAST names

Author
Message
Scott In Sydney
Scott In Sydney
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2066 Visits: 688
I have source data (health data) across three tables with columns as follows:

certifier_name
fathers_first_name
fathers_given_names
fathers_last_name
fathers_middle_name
fathers_surname
first_name
given_names
maiden_name
middle_name
mothers_first_name
mothers_last_name
mothers_maiden_name
mothers_middle_name
next_of_kin_first_name
next_of_kin_last_name
surname

The data lengths range from varchar(56) to varchar(400), with most as varchar(80) and varchar(150).

Data volumes range from 201M, 18M, and 2M rows.

If I select distinct on these columns the data volumes significantly decrease (say 200K - 1.8M rows depending on the column).


I'd like advice on modelling this as a normalized data store (NDS) and dimensional data store (DDS)...

NDS:
Option 1: Put ALL data in a single name column with SK. But the column would be the length of the longest column (varchar(400)).
Option 2: Create NDS tables for FIRST/MIDDLE/GIVEN/LAST names. Combine data across tables as appropriate. I'd prefer to break up given names into first/middle names, but that may be prone to error - perhaps I could just treat that as another first name? Keeping these columns separate, the column length could be better matched to the source data, i.e. "short" first names and "long" last names.

DDS:
DIM:
Option 1: Using either Option 1 or Option 2, create a DIM_PERSON table containing FIRST/MIDDLE/GIVEN/LAST name, using the NDS SK, plus other demographics such as GENDER, DOB, etc. However, if I use the NDS SKs, then the fact table will have a snowflake rather than star schema.
Option 2: Using either option 1 or Option 2, create a DIM_PERSON table containing FIRST/MIDDLE/GIVEN/LAST name, using the names rather than SKs, plus other demographics as above. There would be one row per unique combination of columns. But if I use this approach, I'm not sure of the benefit of the NDS? Perhaps I just create a staging table with the combination of names matching the schema of the DIM table, then upsert into that table?

DIM_PERSON will not be SCD2 - it is just the unique combination of all names + demographics across all the source tables.

FACT:
Use DIM_PERSON_SK to bind the SK to the fact table based on the existing data.

From the dim and fact tables, I'll eventually create very wide data mart table(s), either indexed view or physical table. The DM requirements are query results (data exports) rather than ad hoc analytics.

The design goals for this exercise is data storage optimization, query performance improvements, and maintenance/enhancement capabilities as new data sources come online or old ones change.

Your thoughts? The fact that Option 1 is atypical makes me wonder if it's a really bad idea ;-). But if I break it down, a name is a name is a name; in the NDS, should I really care whether it's first/middle/given/last/certifier/maiden name? Should I care whether "Alex" or "Sam" or "Smith" or "Jones" is the father's or mother's name within the NDS? I'll reconstitute the data using SKs in the DDS dim and fact tables.

P.S.: I note that the Kimball forums are closed down. I hope Ralph and family are enjoying retirement. Are there other forums you can recommend for posting data modelling questions? Or is this one as good as any?

Thanks...
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search