Joins vs More Columns

  • I've finally talked my boss in to allowing me to redesign our DB. We have several different DBs with user information in them and they are getting to complicated so we are going to create a central users database and consolidate them.

    We have quite a bit of info we want to store, including 2 different addresses (work and home) as well as up to 4 phone numbers.

    I want to set this up with 3 tables:

    Users (us_id, name, email, password, etc)

    Phone (us_id, phone_number, phone_type) (phone_type would be 'home', 'work', 'cell', etc)

    Address (us_id, street, city, state, zip, address_type) (address_type would be 'home', 'work')

    Now, my boss wants to make just one table for all of this and have it contain 4 different columns for phone numbers and 2 sets of 4 columns for 2 different addresses. He wants it this way because he says that doing the joins to the other 2 tables is going to be much slower than just pulling the data back from 1 table.

    There will be approx 50,000 users in the table(s). The phone/address info is not needed often, but the users name, email, and password are needed often.

    So, my question (finally - I know) is this: Would it be noticeably slower to have the 3 tables with joins or just the one table?

  • Here's my answer:  It depends.

    Do your users select * and pull in all fields when they really only need one or two?

    You could make your three tables, then create a view that creates the one 'table' and run some tests to see which performs better in your environment.

    Michelle



    Michelle

  • Thanks for replying Michelle!

    Answering your question: No, only the neccessary columns are pulled back.

    As for testing this out, my boss doesn't want to take the time to make a 'test' version of the 3 tables and then convert all of our data over to them.

    So, if anyone knows or has a good idea about the speed differences, I'd appreciate it!

  • I don't think it should be a speed issue, it's clearly a good design issue.  Creating multiple tables and a view over the top of that, makes the application more flexible.  Systems change all of the time, what happens when someone has three addresses or 5 phone numbers.  His 1 table design doesn't allow for this.  Your design however allows for different phone/address types. 

    Also what will happen when you need to add different attributes about your contacts.  If he wants one big flat-file you can store that in a text document, thus defeating the whole point of having a RDBMS.  At my last job it took 4 years of fighting with the old Mainframe guys about good database design, after they gave it a try, they found it was much easier to maintain the code using multiple tables and relationships.

    I would push the issue with your boss to maintain good design standards.

    Good luck

    Tom

  • Tom,

    Thanks for your input.

    Ok, my question about your suggestion of putting a view in to cover the 3 tables:

    What benefit does the view provide? The code can already look in all 3 tables, so does having a view give you a speed increase because the joins are compiled and thus faster? Or is it just making the data easier to access?

    I'm pretty good with using views to make data easier to access, but I'm not too knowledgeable about any performance benefits/problems they have so if you could enlighten me, it would be much appreciated.

  • It merely takes the complexity of the join out from the users.  That is unless you create an indexed(materialized) view, than things could speed up potentially.

    Tom

  • Hm,...is your boss the database expert or you ?

    50,000 rows are not really much, so I guess either way you won't notice a significant performance difference.

    A relational database is not all about performance. A good design can save you much headache and I would go for a normalized data model. This one might give you additional information.

    http://www.sqlservercentral.com/columnists/bkelley/normalization.asp

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Thanks Frank.

    And, no, my boss is not a DB expert (but he unfortunately has dealt with DBs and thus has some knowledge which makes him dangerous).

    As for performance, he would rather trade headache for better performance, so if he sees anywhere that multiple joins would be slower, than he will insist on removing as many joins as possible and just adding extra columns instead.

    Also, it's only about 50k records now, but that will probably be 100k within a few months and several hundred thousand in 6 months. Would that be enough to start noticing performance differences?

    Personally, I would prefer the normalized version for flexibility, but I'm afraid that it won't happen unless I can find some reputable evidence that either A) there is no performance penalty, or B) the penalty is worth it.

    So, if anyone has any other links that could help, I'd love to see them. If not, thanks for the help so far!

    Dave

  • Forgive me for this (my mood management today has failed), but you might suggest your boss to post to Microsoft SQL Server newsgroups. With a little bit of luck Joe Celko will respond. Would he be reputable enough? If so, be prepared for an offending reply.

    Even if you go up on several hundreds of thousands rows I would prefer the normalized version as this number is still considered small for a RDBMS. Actually what does your boss mean by performance? Response within subseconds? If so, you can fake this by using SQL Server's FAST n query hint.

    It has been mentioned above, that your boss won't need a RDBMS at all. He wants a flat file. Unfortunately Excel can only handle some 65,000 rows in a single spreadsheet.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Having a large table could degrade performance because when the db loads the data into memory less rows are loaded (since each takes more space). This would cause more IO to be done if you either scan many rows during one select or have a lot of queries questions different rows. of course this is true only if those queries doesn't need the address and phone data.

    I too think that for OLTP systems, normalizing is the better approach.

    Anat

     

  • If he thinks his "database performance headache" will be any less than his "difficult-to-use-database headache" then his 'limited database knowledge' will prove not to be the aspirin he wants.

     

    3 properly designed narrow tables are far better than 1 poorly designed wide kludge of a table.

    Proper use of indices on your planned tables will optimise performance.

    Growth of the database from 50k to 400-600k records will only exacerbate the poor performance of the wide table.  In order to get decent performance on the wide table, you will need indices on far more columns than on the narrow one, and would in all probability run into a larger overall database size....thus going some way to defeating his purpose...ie traversing less data.

     

    Stick to your guns.  He's paying for your skills - it's time he let you use them.

  • More points regarding the importance of flexibility:

    There are problems with the single-table format and multiple rows of data. Your opitons are to hard-code which set of data is which (first address is home, second is work - first phone is home, then work, then cell, etc.) which limits flexibility in that if there are more than four types of phone numbers, then you need to add more fields. This also makes it a bit more difficult to code, and increases processing time as the data is saved. If you allow a type field with the multiple addresses/phone numbers, then the performance penalty with the single table is at least as bad as that seen in the normalized view.

    Plus, it takes a certain amount of space to maintain the empty address and phone fields when they aren't all used. Not a lot of space, but we're already picking nits with the performance issues the boss is concerned with.

    (sorry about the version with just the first line - forum software decided I had to enter everything without using returns, which I wasn't expecting!)


    R David Francis

  • Don't forget that with a denormalized table you will be repeating the same information for each each property a person has with multiple values (multiple phone numbers and types, emails, etc.).  Is this effecient?

    I also suggest normalization.  Joins are effecient when used properly.  You may also want to utilize stored procedures. 

  • Actually, the single table approach isn't truly a denormalized approach - data isn't repeated in it any more than it is repeated in the multi-table approach. As people have pointed out, it's more an issue of flexibility - allowing additional addresses or phone numbers without changing the table, and avoiding having columns that will be empty in the vast majority of cases.


    R David Francis

  • I definitely second (and third) the opinion that the 3 tables are much more flexible. My DB has a managers table and a separate phone table. Wouldn't you know it, the customer initially requested up to 3 numbers and went with this for about a year, then changed their minds and wanted slots for up to 6 phone numbers.

    Also, they wanted to know what type of number (mobile, home, work, home fax, work fax, etc, etc.) If you had this situation that would be lots more columns. But our phones table simply has a column for phone type. They also kept adding permutations of phone types.

     

     

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply