DB Design/Normalization Question

  • Hello,

    I've read the posts on this site and elsewhere about normalization -- I have a scenario in a db I'm currently working on where I'd like to get some advice.

    I am using an Azure SQL Back End/MS Access Front End.  About 35 end users.

    The primary function of the database is to keep track of Applications received by this company to procure its services.  These applications are submitted on behalf of the applicant by a 3rd party sales organization.

    A Sales Organization often submits dozens of applications every month. Each application has potentially 80+ points of data about the applicant, and no applicant (in theory) would every apply more than once.   All of this data is displayed on the same form, as requested by the client.

    I currently have one table for the applications - tApplication, and one for the Sales Organization - tSalesOrg.

    tApplication contains the 100+ fields, which seems far too many.  I realize I could separate what is Application Data (Date Application Received, Which Sales Organization, etc) , and what is Applicant Data (Name, Address, etc), but there would still be 80+ fields of Applicant Data.

    The Applicant Data has two sets of addresses and two sets of contact info as each applicant could have two of each.  It also includes numerous boolean fields related to their business. Should I try to separate this out as well?  Any thoughts would be appreciated.  Thanks in advance.

     

     

     

     

     

  • It's hard to tell without seeing at the very least the table definition of tApplication, and a few dummy records. But 105 columns is a lot - it's possible that all the columns belong in that table, but it's impossible to tell without seeing it.

    • Yes, I would try to separate it. The Boolean/bit columns are narrow and take the least space. If there are clear ways to group them by category, try that. If not, they are like a junk dimension and may well just mash in the same misc table. Try splitting all addresses off; applicants and companies. Addresses are homogenized in terms of data shape. Keys to both the company and applicant table make sense.  Also think about putting big character columns in one table. Big char types can be a performance load on queries. So in the end ... Sales org, application, applicant, addresses, application responses misc, application responses with character columns like free text paragraphs with essays for example. Just an idea.
  • Thanks for the replies....I'm hesitant to post any actual table structure or data for security reasons, but it does seem that ultimately I should break up this table, and this has given me a few ideas as of how to do it.  Thanks.

  • Heh... Most people have a rule of "Normalize 'til it hurts then denomalize 'til it works".  I disagree with that.  My rule of thumb is "Normalize 'til it hurts... then leave it like that" because you've finally done it right. 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ha....this one is going to hurt, as so much will have to be re-written.  But better now than a year from now.

  • EsquireDeveloper wrote:

    ...The Applicant Data has two sets of addresses and two sets of contact info as each applicant could have two of each.  It also includes numerous boolean fields related to their business. Should I try to separate this out as well?  Any thoughts would be appreciated.  Thanks in advance. 

    I agree with the others about needing more normalization here.  Any data that there could be more than one of for a single applicant you would definitely want to store in separate tables.  If you have reports or something that depends on those columns being inline together, you could use something like a view or query in a stored procedure to bring that data back together.

  • Quick Follow Up Question:  The two sets of addresses and two sets of contact info for each applicant are obvious cues to normalize and create tables for each.

    But everything else are either booleans or otherwise small, simple data.  This is still 50+ fields -- all of them 1 to 1 with the Applicant and would not apply to any other applicant.   There is no obvious separation of these fields --  Would there be any benefit in arbitrarily splitting this remaining data into separate tables?

  • EsquireDeveloper wrote:

    Quick Follow Up Question:  The two sets of addresses and two sets of contact info for each applicant are obvious cues to normalize and create tables for each.

    But everything else are either booleans or otherwise small, simple data.  This is still 50+ fields -- all of them 1 to 1 with the Applicant and would not apply to any other applicant.   There is no obvious separation of these fields --  Would there be any benefit in arbitrarily splitting this remaining data into separate tables?

    Absolutely but not "arbitrarily"!  Look for columns that are usually NULLs or columns that won't actually be used frequently (putting them into a "sister" table can seriously help performance for "main" queries).  Also, look for columns that should actually be using a MAX datatype.  There's a neat trick you can do (FORCE them to live Out Of Row) to greatly improve the performance of your code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks,I'll go in that direction -- I'm intrigued -- 'Force Them to Live Out Of Row' -- would you mind explaining further or pointing me to a link

  • Offhand it's difficult to tell if separating out 1 to 1 related columns would have any benefit.  It would probably depend on how often those columns are populated, how often they are queried, etc.

  • Look for the [DOWNLOAD] button at the following link...

    https://github.com/bssug/PresentationDownloads/blob/master/20190521_JEFF_MODEN_BLACK_ARTS_INDEX_MAINT_03.zip

    I don't write PowerPoint "presentations"... rather, I write highly animated PowerPoint "Books".  If you go into the "present" mode of the PPTX, each click will advance the slide or make something new appear on the slide.  In theory, you won't need me to explain a thing because it's all explained step-by-step on the slides.  There's also a wad of attached code that I demo'd if you want to run it to see what happens.

    And, BTW.... I have updated the "IndexDNA" code and spreadsheet that's used to actually see the page density of pages in an index in graphical form.

    Your cost for this presentation is just some feedback... let me know what you think and how I can improve it to the point where no one needs me to say a thing  during the presentation.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for all the feedback -- I'll take a look at that presentation.  Thanks All.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply