What Level of Normalization Would This Be?

  • What level of normalization would this be, and would it be recommended for scenarios with, say 20 000+ users? The benefits that I my boss can see with tbl_Users1 would be the [urll] schlep [/urll] of looking up addresses would be avoided. Deep down inside I feel that the architecture below is just plain wrong, but I'm not sure how to phrase it. I think it's because that there is repetitive data/data patterns that needn't be repeated.

    CREATE TABLE [dbo].[tbl_Users1](

    [userID] [int] IDENTITY(1,1) NOT NULL,

    [DMSUserName] [varchar](50) NOT NULL,

    [firstNames] [nvarchar](100) NULL,

    [surname] [nvarchar](50) NULL,

    [title] [nvarchar](100) NULL,

    [company] [nvarchar](100) NULL,

    [dateRegistered] [datetime2](7) NOT NULL,

    [dateActivated] [datetime2](7) NOT NULL,

    [isActive] [bit] NOT NULL,

    [emailAddress] [nvarchar](100) NOT NULL,

    [primaryContactNumber] [nvarchar](50) NULL,

    [secondaryContactNumber] [nvarchar](50) NULL,

    [physicalAddrLine1] [nvarchar](100) NOT NULL,

    [physicalAddrLine2] [nvarchar](100) NULL,

    [physicalAddrLine3] [nvarchar](100) NULL,

    [physicalAddrLine4] [nvarchar](100) NULL,

    [physicalAddrCity] [nvarchar](50) NOT NULL,

    [physicalAddrProvince] [nvarchar](50) NOT NULL,

    [physicalAddrCountry] [nvarchar](50) NOT NULL,

    [physicalAddrPostalCode] [nvarchar](20) NOT NULL,

    [postalAddrLine1] [nvarchar](100) NOT NULL,

    [postalAddrLine2] [nvarchar](100) NULL,

    [postalAddrLine3] [nvarchar](100) NULL,

    [postalAddrLine4] [nvarchar](100) NULL,

    [postalAddrCity] [nvarchar](50) NOT NULL,

    [postalAddrProvince] [nvarchar](50) NOT NULL,

    [postalAddrCountry] [nvarchar](50) NOT NULL,

    [postalAddrPostalCode] [nvarchar](20) NOT NULL

    ) ON [PRIMARY]

    versus

    CREATE TABLE [dbo].[tbl_Users2](

    [userID] [int] IDENTITY(1,1) NOT NULL,

    [DMSUserName] [varchar](50) NOT NULL,

    [firstNames] [nvarchar](100) NULL,

    [surname] [nvarchar](50) NULL,

    [title] [nvarchar](100) NULL,

    [company] [nvarchar](100) NULL,

    [dateRegistered] [datetime2](7) NOT NULL,

    [dateActivated] [datetime2](7) NOT NULL,

    [isActive] [bit] NOT NULL,

    [emailAddress] [nvarchar](100) NOT NULL,

    [primaryContactNumber] [nvarchar](50) NULL,

    [secondaryContactNumber] [nvarchar](50) NULL,

    [physicalAddr] INT NOT NULL,

    [postalAddr] INT NOT NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[tbl_Addresses](

    [addressID] [int] IDENTITY(1,1) NOT NULL,

    [addrLine1] [nvarchar](100) NOT NULL,

    [addrLine2] [nvarchar](100) NULL,

    [addrLine3] [nvarchar](100) NULL,

    [addrLine4] [nvarchar](100) NULL,

    [addrCity] [nvarchar](50) NOT NULL,

    [addrProvince] [nvarchar](50) NOT NULL,

    [addrCountry] [nvarchar](50) NOT NULL,

    [addrPostalCode] [nvarchar](20) NOT NULL

    ) ON [PRIMARY]

  • I guess my first question here would be whether you'll need all the addresses every time that you query the users table, or will you need them only a small portion of the time?

    nvarchar for addresses?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • It is First Normal Form and normally you would want to have your model in Third Normal Form or even higher.

    If you need some good arguments for the design discussion with your boss, have a look here: http://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • It's 1NF (first normal form).

    It'll work great for a very short while, then start causing all kinds of interesting problems. Interesting, that is, if you consider spending huge amounts of time and effort for very little gain, "interesting".

    Split e-mail into its own table. Same for phones. Same for addresses.

    I have four e-mail addresses currently. My wife has about twelve. Saving just one of them is pretty limited.

    With this structure, you have no ability to indicate whether a phone number is a land line, a cell phone, a VOIP phone, etc. You don't have a space for a fax number. You will probably need to record that at some point. A PhoneNumbers table gives you those kind of options, and is easily extensible. What happens with the one-table-two-column solution when someone tells you, "I'll be out of town for the week, my contact phone next week will be ...."?

    As for addresses, again, what happens if someone in your database has a summer address that's different than their winter address? What if they need you to ship something to a hotel they're staying at for a week?

    If your boss needs something more denormalized, well, that's what views are for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Is it first normal form?

    The requirement for being first normal forum is that there are no repeating groups. If we take the addresses as repeating groups, then this isn't even in 1st Normal Form.

    The requirements for 2nd are no partial key dependencies, so saying that it is in 1st normal form is saying that there are no repeating groups but that there are partial key dependencies.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/26/2009)


    Is it first normal form?

    The requirement for being first normal forum is that there are no repeating groups. If we take the addresses as repeating groups, then this isn't even in 1st Normal Form.

    It isn't Address1, Address2, Address[n], but rather AddressLine1,... which seems to imply a different meaning for each of these columns as being up to x parts of one and the same address and therefore is not a repeating group.

    Edit: Not arguing about the sense behind this entity and completely ignoring the domains, of course. Looks like a ported Access table. 🙂

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • There's still two addresses and two phone numbers in the table. That could (or could not) be construed as repeating groups.

    If it is considered a repeating group, this isn't even 1st normal form.

    If it isn't, then, with no partial key dependencies (key - userID - is a single field) and no obvious inter-data dependencies (though there are some implied ones), wouldn't it be 2nd or 3rd normal form?

    It's not in violation of the requirements for 2nd and it doesn't appear to be in violation of the requirements for 3rd.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Um, well. I wasn't really anticipating this amount of responses. Thanks a lot guys, I appreciate it!

    Gila

    I guess my first question here would be whether you'll need all the addresses every time that you query the users table, or will you need them only a small portion of the time?

    nvarchar for addresses?

    Right now, we're retrieving all the fields to populate the users' profile page so that they can change their details. (The project is a website/forum/stuff)

    We're using nvarchar in case the users enter something funky as their address (this will be catering to users that live in South Africa - where I am and we don't really use characters outside the normal ASCII set as our addresses - as well as other countries).

    Thanks for the link Frank.

    GSquared

    Split e-mail into its own table. Same for phones. Same for addresses.

    I have four e-mail addresses currently. My wife has about twelve. Saving just one of them is pretty limited.

    With this structure, you have no ability to indicate whether a phone number is a land line, a cell phone, a VOIP phone, etc. You don't have a space for a fax number. You will probably need to record that at some point. A PhoneNumbers table gives you those kind of options, and is easily extensible. What happens with the one-table-two-column solution when someone tells you, "I'll be out of town for the week, my contact phone next week will be ...."?

    As for addresses, again, what happens if someone in your database has a summer address that's different than their winter address? What if they need you to ship something to a hotel they're staying at for a week?

    If your boss needs something more denormalized, well, that's what views are for.

    Because "the users aren't the main focus of the database:"

    Me

    The benefits that I my boss can see with tbl_Users1 would be the schlep of looking up addresses would be avoided.

    However, I think that alternate/"out of town" contact details example that you give might actually be a possibility. How would you address it GSquared?

    Frank

    Not arguing about the sense behind this entity and completely ignoring the domains, of course. Looks like a ported Access table

    What do you mean by "completely ignoring the domains?" And no dude, it's not a ported Access table... 🙂

    It was originally only supposed to hold the username of the user data stored in a different database on a different server. They're not linked due to security concerns. The user is validated by our own custom data interface into the other server, and then his (sexist 🙂 ) session information held in this supposedly really tiny, puny database. [urll] However, scope creep/mutation, like [/urll] [urll] war, never changes. [/urll]

    If this information helps you guys, I'm using RedGate's very cool DataGenerator to populate the various tables with approximately ten times (200 000 rows) the amount of data that the company owner and my boss envisage that the database will hold and it seems like it's holding up.

  • GDI Lord (5/27/2009)


    Me

    The benefits that I my boss can see with tbl_Users1 would be the schlep of looking up addresses would be avoided.

    However, I think that alternate/"out of town" contact details example that you give might actually be a possibility. How would you address it GSquared?

    Dunno about Gus, but this is how I would do it. (very rough)

    CREATE TABLE Users (

    UserId INT IDENTITY

    ....

    )

    CREATE TABLE UserAddresses (

    UserID INT -- foreign key to Users

    AddressType

    Addressdetail

    )

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply