Help! How to properly normalise a table from the start

  • Hi, I have a flat table with 17 columns of contact information with columns to seperate those contacts into different business classes. Here is the table:

    USE DataCompany

    CREATE TABLE ContactInformation

    (

    ID int IDENTITY(1,1) PRIMARY KEY

    'Economic Class

    'Sector varchar (55)

    'Activity (55)

    'Date

    ,InfluenceClassification varchar (25)

    ,FirstName varchar(35)

    ,LastName varchar(70)

    ,Title varchar (120)

    ,Company varchar(150)

    ,Phone varchar (15)

    ,Email varchar (255)

    ,LinkedinID varchar (255)

    ,Website varchar (255)

    ,Tier tinyint

    )

    My question is what parts should be seperated into different tables and why. And how do I reliably associate those foreign keys if the data is already there? I can guess that the company information telephone etc should be seperate, but should the classification columns be seperate as well?(Economic, sector, activity, tier)I've done lots of reading but the basic info available is for obviously noramisable tables , products and customers etc, so Any advice here would be greatly appreciated..:-)

    Thanks

  • tomsharp85 (6/17/2015)


    Hi, I have a flat table with 17 columns of contact information with columns to seperate those contacts into different business classes. Here is the table:

    USE DataCompany

    CREATE TABLE ContactInformation

    (

    ID int IDENTITY(1,1) PRIMARY KEY

    'Economic Class

    'Sector varchar (55)

    'Activity (55)

    'Date

    ,InfluenceClassification varchar (25)

    ,FirstName varchar(35)

    ,LastName varchar(70)

    ,Title varchar (120)

    ,Company varchar(150)

    ,Phone varchar (15)

    ,Email varchar (255)

    ,LinkedinID varchar (255)

    ,Website varchar (255)

    ,Tier tinyint

    )

    My question is what parts should be seperated into different tables and why. And how do I reliably associate those foreign keys if the data is already there? I can guess that the company information telephone etc should be seperate, but should the classification columns be seperate as well?(Economic, sector, activity, tier)I've done lots of reading but the basic info available is for obviously noramisable tables , products and customers etc, so Any advice here would be greatly appreciated..:-)

    Thanks

    Well we don't have a lot of detail here to offer a lot of help but there are a few columns that might be better suited in a separate table. We don't how these are being used or what they mean so it is hard to say for sure.

    Economic Class

    Sector

    Activity

    Date --This needs a name that indicates what it is (DateCreated) or whatever. Date is just entirely too ambiguous to be a decent name, not to mention it is a datatype which will be a pain to work with.

    InfluenceClassification

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks Sean

    I hope the follwing info gives more of an idea. Economic Class relates to which sector of the economy the company fits into - (Financials, Cyclical consumer goods, Technology)

    Sector refines that with a category like Banking or Food and beverages and the activity column is close to the precise thing that the company does. Tier shows how large the company is and has four types 1-4 (1 is largest and 4 smallest)

    Influence classification relates to the persons job title, how senior it is.

    Would it be best for normalisation to have economic class, sector and activity etc with the company name, phone, website and tier in one table as that all relates to one thing.

    And then have name, title, influence classification in another table as that relates to another thing, the person themselves.

    If the table exists as a flat table now, how would I match the tables up correctly and where would the datecreated be best placed?

  • Sean Lange (6/17/2015)


    tomsharp85 (6/17/2015)


    Hi, I have a flat table with 17 columns of contact information with columns to seperate those contacts into different business classes. Here is the table:

    USE DataCompany

    CREATE TABLE ContactInformation

    (

    ID int IDENTITY(1,1) PRIMARY KEY

    'Economic Class

    'Sector varchar (55)

    'Activity (55)

    'Date

    ,InfluenceClassification varchar (25)

    ,FirstName varchar(35)

    ,LastName varchar(70)

    ,Title varchar (120)

    ,Company varchar(150)

    ,Phone varchar (15)

    ,Email varchar (255)

    ,LinkedinID varchar (255)

    ,Website varchar (255)

    ,Tier tinyint

    )

    My question is what parts should be seperated into different tables and why. And how do I reliably associate those foreign keys if the data is already there? I can guess that the company information telephone etc should be seperate, but should the classification columns be seperate as well?(Economic, sector, activity, tier)I've done lots of reading but the basic info available is for obviously noramisable tables , products and customers etc, so Any advice here would be greatly appreciated..:-)

    Thanks

    Well we don't have a lot of detail here to offer a lot of help but there are a few columns that might be better suited in a separate table. We don't how these are being used or what they mean so it is hard to say for sure.

    Economic Class

    Sector

    Activity

    Date --This needs a name that indicates what it is (DateCreated) or whatever. Date is just entirely too ambiguous to be a decent name, not to mention it is a datatype which will be a pain to work with.

    InfluenceClassification

    I agree that we need more information. I'm guessing Activity and Date, assuming Date refers to the date an activity took place, should go into a separate table, and should also include the ID, if ID its to be used as a Contact Id.

    Really need more details to be sure.

    Edit: I wrote this response before the post above. Looks like my assumptions were wrong.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Would my assumptions be near to the mark?

  • I agree with Alvin - don't use ambiguous column names like ID and Date. Instead go with Contact_ID and something like Create_Date. Economic Class can be it's own table with a Foreign Key in your ContactInformation table.

    Economic Class can be it's own table with a Foriegn Key (e.g. EconomicClass_ID) in your ContactInformation table. Also think about object where there may be a one-to-many relationship such as Email or Phone_Nbr. I sometimes like to give those things their own table.

    If Sector is a subcategory of Economic class you could snowflake that out as Category > Sector > SectorID in you ContactInformation Table. If, in your OLTP environment, you need to get the Economic Class, Sector and info from the ContactInformation table you could do this in a view and index it.

    Some other data modeling notes:

    1. Make your attributes more descriptive; again, avoid ambiguous column names such as ID or DATE

    (you also want to avoid reserved Keywords in SQL)

    2. Include your Schema when creating objects.

    3. Make all columns NOT NULLABLE unless they must be NULLABLE. If they must be NULLABLE, include a comment in your DDL

    4. For youe Primary Key, I like to include that later in the DDL as a constraint so that I can name it.

    This helps when managing your metadata. For example, instead of:

    ContactInfo_ID IDENTITY(1,1) PRIMARY KEY,

    ...

    ...

    Try:

    ContactInfo_ID IDENTITY(1,1) NOT NULL,

    ...

    ...

    CONSTRAINT pk_ContactInformation_ContactInfo_ID PRIMARY KEY(ContactInfo_ID),

    5. Use CHECK constraints, the optimizer likes them and it's another way to let other's understand what the table does.

    For Tier you could add a CHECK constraint like

    ...

    CONSTRAINT ck_ContactInformation_Tier (Tier IN (1,2,3,4))

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • tomsharp85 (6/17/2015)


    Hi, I have a flat table with 17 columns of contact information with columns to seperate those contacts into different business classes. Here is the table:

    USE DataCompany

    CREATE TABLE ContactInformation

    (

    ID int IDENTITY(1,1) PRIMARY KEY

    'Economic Class

    'Sector varchar (55)

    'Activity (55)

    'Date

    ,InfluenceClassification varchar (25)

    ,FirstName varchar(35)

    ,LastName varchar(70)

    ,Title varchar (120)

    ,Company varchar(150)

    ,Phone varchar (15)

    ,Email varchar (255)

    ,LinkedinID varchar (255)

    ,Website varchar (255)

    ,Tier tinyint

    )

    My question is what parts should be seperated into different tables and why. And how do I reliably associate those foreign keys if the data is already there? I can guess that the company information telephone etc should be seperate, but should the classification columns be seperate as well?(Economic, sector, activity, tier)I've done lots of reading but the basic info available is for obviously noramisable tables , products and customers etc, so Any advice here would be greatly appreciated..:-)

    Thanks

    I would plan on any given contact having more than one phone number, more than one email address, and working for more than one company, as well as the items previously mentioned. That means a not only a ContactInformation table (terrible name, by the way... we know the database holds data/information) but a Company table, a Phone table, and an Email table. It would likely be beneficial to have a "bridging" table between the Contact (table) and the Company table. It's not likely that you would need one for the Phone or Email table even though that would constitute a bit of renormalization. Make sure that you add a "Type" column for the phone and email tables so you can have "Home", "Work", "Fax" etc for the type of phone number and email. I'd also move the LinkedInID out of the table and treat it more like the phone table so that you can easily accommodate other social medial without changing the Contact table every time someone gets the itch to make a new social media site. The way things are today, the contacts themselves may have a special website to include an ID for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for your advice. There are a lot of points here, so to be clear. Naming to avoid ambiguity/irrelevant conventions is a must so I'll definitley do this, but have some questions about structure.

    @jeff Moden "I would plan on any given contact having more than one phone number, more than one email address, and working for more than one company, as well as the items previously mentioned." The contacts have 1 job, if the job changes the record is replaced, so I am not sure about this level of normalisation...

    @alan. B "Economic Class can be it's own table with a Foriegn Key (e.g. EconomicClass_ID) in your ContactInformation table. Also think about object where there may be a one-to-many relationship such as Email or Phone_Nbr. I sometimes like to give those things their own table.

    If Sector is a subcategory of Economic class you could snowflake that out as Category > Sector > SectorID in you ContactInformation Table. If, in your OLTP environment, you need to get the Economic Class, Sector and info from the ContactInformation table you could do this in a view and index it."

    If economic class is seperate table, is it logical to put Category, Sector, Sector ID in ContactINformation i.e. with the names?

    Are there some clear delineations that are an absolute must, as I am struggling a bit with how these would work. My mind can't get past that if info is linked to a person then that has to be in their table with their name...I do want to implement the changes I jus tneed to get my head around it. Thanks for your help 😉

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply