Normalized Schema Design

  • Hi,

    I am working on a table design, which will store records of completed daily tasks. The table could grow very large in a short period of time.
    At the moment I am not sure if I should normalize it. If I do normalize then I have these tables...

    • Team
    • Period 
    • ParentTask
    • ChildTask
    • Status
    • ApprovalLevel
    • UserDetail
    • TaskRecord (main table)
    Should I do this? What am I gaining by this level of normalization? Is there any effect on partially denormalizing the TaskRecord table and deleting the Team, Period, Status, and Approval tbls?
    Then the TaskRecord table would store actual varchar values rather than FK integers and the joins would be reduced.

    The TaskRecord table will potentially grow very large. Would queries on it be quicker without having to JOIN to other related tables? Either way indexes would have to be created?
    Also for INSERTING or UPDATING a TaskRecord, is it better to just have one table to call rather than having to use a view or sproc to lookup related tables? My biggest concern is performance.

    Thanks!

  • Normalization allows a system to properly handle one-to-many relationship requirements so I don't quite understand your question.  You can model this as a single flattened table...which mainframers did when using tape storage.  Also if your customer is cheap and only wants to pay for a single screen then do it all flattened.  Normalization gets rid of duplication and therefore builds in data integrity.  It also reduces space needs.  Reduced data size buys you more data stored in memory for faster queries.  Rarely do I ever have to denormalize something for better performance.

    I always design normalized models.  You can always throw a cheap stored procedure or view on top of that to make it appear flattened and simple if you want to do a quick and dirty screen for it.

    As for performance, I have many tables with a dozen foreign keys with millions of rows and queries always come back in a second.  Defined foreign keys actually help the optimizer and improve performance.

  • Everything Bill has said is 100% correct. I just want to reinforce it.

    Proper normalization improves performance. You store smaller amounts of data. This means fewer reads and writes overall. It also reduces memory useage. You get better data integrity, which actually improves performance because you have to do less processing to read/write your data. Enforced referential integrity speeds performance.

    Yes, you will have to write more complex queries than with a denormalized structure. Tough. Correctly written queries against a properly normalized structure with good indexing perform extremely well.

    Unless you are in special circumstances such as IoT style data collection or something along those lines, a well structured database is the way to go. Also, just because there's only one screen now doesn't mean there won't be two, or fifteen tomorrow. You'll absolutely be kicking yourself when you're asked for a report or some other functionality that would have been very simple with a well-structured database and now won't be possible with a flat structure.

    Also, if you decide to go with a flat structure, don't use SQL Server at all. You're putting overhead on all your code from which you derive no benefit using a relational database management system. If all you want is a completely flat structure, use an ID/Value, or similar, database such as MongoDB, Cassandra or Redis. These will make your flat structure much easier to create and maintain (although, there are many implications to choosing a completely flat structure, so be sure this is right for your circumstances).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Friday, February 17, 2017 6:36 AM

    . You'll absolutely be kicking yourself when you're asked for a report or some other functionality that would have been very simple with a well-structured database and now won't be possible with a flat structure.

    Hi,

    I was thinking of a potential report request where might need to query all tasks approved by a specific user. If all in one table then just a simple select where query. But if normalized, then need multiple joins on id columns,

    Is normalized still going to perform better in this scenario?
    Thanks!

  • Both methods will perform in a second, even for millions of rows.

  • kevin.obrien 66193 - Friday, February 17, 2017 6:47 AM

    Grant Fritchey - Friday, February 17, 2017 6:36 AM

    . You'll absolutely be kicking yourself when you're asked for a report or some other functionality that would have been very simple with a well-structured database and now won't be possible with a flat structure.

    Hi,

    I was thinking of a potential report request where might need to query all tasks approved by a specific user. If all in one table then just a simple select where query. But if normalized, then need multiple joins on id columns,

    Is normalized still going to perform better in this scenario?
    Thanks!

    You're talking about 3-4 joins, max. Yes, this performs just fine. I had a query with 86 joins. It ran in less than 500ms (compile time was 3 minutes, but that's a different story). A join is not performance death. It's what a relational database management system does. They are built and designed around performing join operations.

    As for reporting, it's a lot easier on normalized structures than it is on denormalized ones (NOTE: Star schema, etc., are not "denormalized" because they're not normalized, these are different types of relational structures).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply