Table Design

  • Dear Experts,
    Please share any standard guidelines for designing table objects, apart from normalization, column domains (the ranges of values the data can accommodate), the column data types, and the infrastructure. Apart from these, is there anything else that is significant during design? Thank you.

  • No "standard" guidelines, no, but here are a few things that I stick to.  There will be exceptions to nearly all of them!

    (1) Choose the clustered index key carefully - but do include one
    (2) The table should have a primary key constraint
    (3) Referential integrity should be enforced with foreign key constraints
    (4) Use a consistent convention for naming columns, tables, constraints and so on
    (5) Don't use spaces, hyphens or a host of other characters in object and column names.  Avoid reserved keywords also
    (6) Choose data types appropriately
    (7) Make full use of CHECK and DEFAULT constraints.

    I could go on - but I'll leave it to others to add to (or disagree with parts of) the list.
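    A minimal sketch pulling several of those points together (the table, column, and constraint names here are mine, purely for illustration, and it assumes a dbo.Customer table already exists):

    ```sql
    -- Illustrates points (1)-(3), (6) and (7): a clustered primary key,
    -- an enforced foreign key, appropriate data types, and
    -- CHECK/DEFAULT constraints.
    CREATE TABLE dbo.OrderHeader
    (
        OrderID     int         NOT NULL,
        CustomerID  int         NOT NULL,
        OrderDate   date        NOT NULL
            CONSTRAINT DF_OrderHeader_OrderDate DEFAULT (GETDATE()),
        OrderStatus varchar(10) NOT NULL
            CONSTRAINT CK_OrderHeader_OrderStatus
            CHECK (OrderStatus IN ('Open', 'Shipped', 'Closed')),
        CONSTRAINT PK_OrderHeader PRIMARY KEY CLUSTERED (OrderID),
        CONSTRAINT FK_OrderHeader_Customer
            FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
    );
    ```

    Point (4) is covered by the naming pattern (PK_, FK_, CK_, DF_ prefixes), and point (5) by sticking to plain alphanumeric names.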

    John

  • Primary keys should be non-human readable like a guid...at least in pure theory.  I love them.  Never use a natural key like an SSN because your life will become miserable; there are many SSN dupes and updates will kill you.  And stay away from those pathetic identity columns.  Use Sequences or newid() instead so you can generate primary keys from the app instead of RBAR identity inserts.

    Demand alternate keys (unique constraints) for virtually every table.  Some tables may have two or three alternate key groups of columns.  If you don't get this right, development will suffer and you will be babysitting the database and cleaning up all the time and I will think you are masochistic.

    Use positive logic names throughout such as IsShipped instead of negative such as IsNotShipped.

    Always store base data and never calculated stuff like ExtendedCost.  It is cheap to recalc on the fly or add computed columns or views.

    Err on the side of making columns not null and force developers to request nullable if needed.  Performance and simplicity suffer from too many nulls.
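    As a sketch of the sequence, alternate-key, and computed-column points (all object names here are hypothetical):

    ```sql
    -- The app can call NEXT VALUE FOR dbo.InvoiceLineSeq to generate keys
    -- ahead of time, instead of relying on IDENTITY values assigned at
    -- insert time.
    CREATE SEQUENCE dbo.InvoiceLineSeq AS int START WITH 1 INCREMENT BY 1;

    CREATE TABLE dbo.InvoiceLine
    (
        InvoiceLineID int          NOT NULL
            CONSTRAINT DF_InvoiceLine_ID
            DEFAULT (NEXT VALUE FOR dbo.InvoiceLineSeq),
        InvoiceID     int          NOT NULL,
        LineNumber    smallint     NOT NULL,
        Quantity      int          NOT NULL,
        UnitCost      decimal(9,2) NOT NULL,
        -- Base data only; the extension is computed, never stored manually.
        ExtendedCost  AS (Quantity * UnitCost),
        CONSTRAINT PK_InvoiceLine PRIMARY KEY CLUSTERED (InvoiceLineID),
        -- The alternate key: the business identity of the row.
        CONSTRAINT AK_InvoiceLine UNIQUE (InvoiceID, LineNumber)
    );
    ```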

  • I love all the other answers. I'm going to repeat a few for emphasis and a couple of extra points. Almost everything I write has exceptions. However, exceptions should be exceptional.

    * You should have a clustered index on every single table.
    * Clustered indexes should be, as much as possible, unique
    * Every table should have a primary key
    * Every foreign key must be enforced through the use of WITH CHECK
    * 2016 and greater, Clustered Columnstore for analysis focused data instead of clustered index
    * The primary key should be on data that never changes
    * For all your nonclustered indexes, it's better if your clustered index never changes too.
    * The clustered index should be on the most frequently used path to the data, this may not be a PK or AK
    * Because the PK should never change, GUID or IDENTITY are usually the best bets
    * You should have an alternate key (AK) because your PK should be generated, see above, so that it never changes (cascading updates and deletes are from satan).
    * Data types are so important. Strings are strings. Dates are dates. Integers are integers, etc. Don't mix them, ever. It just leads to pain.

    Most of all, don't get overly obsessed with all these suggestions. You're going to have compromises. We all do. Go into them with your eyes open so you understand what the choices you make will mean to the behavior of the system.
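    The clustered columnstore point, as a sketch for SQL Server 2016 or later (the table is hypothetical):

    ```sql
    -- For analysis-focused tables, a clustered columnstore index replaces
    -- the traditional rowstore clustered index.
    CREATE TABLE dbo.SalesFact
    (
        SaleDate  date          NOT NULL,
        ProductID int           NOT NULL,
        Quantity  int           NOT NULL,
        Amount    decimal(19,4) NOT NULL
    );

    CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact ON dbo.SalesFact;
    ```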

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    I forgot to mention the most important point of all, a point that I've been seeing in many systems books over and over again over the past few years...

    Do not optimize to specific application requirements.  Do not design to make it easy for a developer to implement his screen by denormalizing a table.  Do not implement anti-patterns.  Tell developers "No" when they ask for special implementations.

    Always optimize to the system.  Use standard database design patterns.  Doing this will decrease the amount of custom code you have to write by 50% or more.  Doing this will allow developers to focus on business rules instead of deciphering some clever hack saving another 50% in work time.

  • Bill Talada - Wednesday, November 22, 2017 9:14 AM

    Primary keys should be non-human readable like a guid...at least in pure theory.  I love them.  Never use a natural key like an SSN because your life will become miserable; there are many SSN dupes and updates will kill you.  And stay away from those pathetic identity columns.  Use Sequences or newid() instead so you can generate primary keys from the app instead of RBAR identity inserts.

    Demand alternate keys (unique constraints) for virtually every table.  Some tables may have two or three alternate key groups of columns.  If you don't get this right, development will suffer and you will be babysitting the database and cleaning up all the time and I will think you are masochistic.

    Use positive logic names throughout such as IsShipped instead of negative such as IsNotShipped.

    Always store base data and never calculated stuff like ExtendedCost.  It is cheap to recalc on the fly or add computed columns or views.

    Err on the side of making columns not null and force developers to request nullable if needed.  Performance and simplicity suffer from too many nulls.

    There's no need to have RBAR inserts in the presence of an IDENTITY column.  And there are no duplicate SSNs issued by the SSA.  If you run across some, then people are lying.  And, no... SSNs are not and have not ever been reused by the SSA.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Remember or discover that the Clustered Index can have more than one use (didn't say more than one key) and is not necessarily the panacea of performance that people would sometimes have you believe, especially on wide tables.

    --Jeff Moden



  • Grant Fritchey - Wednesday, November 22, 2017 9:57 AM

    I love all the other answers. I'm going to repeat a few for emphasis and a couple of extra points. Almost everything I write has exceptions. However, exceptions should be exceptional.

    * You should have a clustered index on every single table.
    * Clustered indexes should be, as much as possible, unique
    * Every table should have a primary key
    * Every foreign key must be enforced through the use of WITH CHECK
    * 2016 and greater, Clustered Columnstore for analysis focused data instead of clustered index
    * The primary key should be on data that never changes
    * For all your nonclustered indexes, it's better if your clustered index never changes too.
    * The clustered index should be on the most frequently used path to the data, this may not be a PK or AK
    * Because the PK should never change, GUID or IDENTITY are usually the best bets
    * You should have an alternate key (AK) because your PK should be generated, see above, so that it never changes (cascading updates and deletes are from satan).
    * Data types are so important. Strings are strings. Dates are dates. Integers are integers, etc. Don't mix them, ever. It just leads to pain.

    Most of all, don't get overly obsessed with all these suggestions. You're going to have compromises. We all do. Go into them with your eyes open so you understand what the choices you make will mean to the behavior of the system.

    Just a quick add to Grant's really good post... 

    * Because the PK should never change, GUID or IDENTITY are usually the best bets

    IF you have to resort to using a GUID, for the love of all you hold dear, make sure you define the column with a NEWSEQUENTIALID() default and NOT NEWID().
    This one small thing will help avoid the massive index fragmentation caused by inserting completely random GUIDs into your indexes.
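    To be clear, NEWSEQUENTIALID() is only valid as a column DEFAULT, not inline in DML, so the pattern looks like this (table and column names are just for illustration):

    ```sql
    CREATE TABLE dbo.Customer
    (
        CustomerID   uniqueidentifier NOT NULL
            CONSTRAINT DF_Customer_ID DEFAULT (NEWSEQUENTIALID()),
        CustomerName varchar(100)     NOT NULL,
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
    );
    ```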

  • I would stay away from GUIDs as a primary key, especially if it gets defined as the clustered index.  For me there needs to be a good reason for using GUIDs as primary key, such as replication.
    I like synthetic keys that are invisible to the application.  I also believe that a natural key as one of the alternate keys is valid.  And if you can identify more than one alternate key that could be beneficial as well.

    Don't create single column indexes on every column.
    Try not to duplicate indexes.  Apparently there may be times when a duplicate index is a good thing, but I haven't found them useful yet (Jeff, what say you?).

  • Oh, yes, be sure to normalize your tables, to at least 3NF.

  • Jeff Moden - Wednesday, November 22, 2017 3:43 PM

    There's no need to have RBAR inserts in the presence of an IDENTITY column.  And there are no duplicate SSNs issued by the SSA.  If you run across some, then people are lying.  And, no... SSNs are not and have not ever been reused by the SSA.

    Ran into duplicate SSNs before and we had to verify it with the Social Security Administration - they weren't lies. And they've had other issues. Here is the SSA link explaining some of it - check the section on Handling SSN Assignment Problems:
    The Story of the Social Security Number

    Sue

  • Thank you Jeff, John, Lynn, Grant and Bill for the points. How about column lengths? There's a debate with the dev team, especially over varchar. Where 20 is enough, they sometimes ask for a length of 500, because SQL only uses the actual data length anyway. My argument is to keep only what is required or may realistically grow, and also, since SQL estimates and reserves half of the declared column length in anticipation of data, isn't it better to use only what is required? I believe that keeping capacity you don't require comes with a cost. What do you think?

  • Arsh - Wednesday, November 22, 2017 7:37 PM

    Thank you Jeff, John, Lynn, Grant and Bill for the points. How about column lengths? There's a debate with the dev team, especially over varchar. Where 20 is enough, they sometimes ask for a length of 500, because SQL only uses the actual data length anyway. My argument is to keep only what is required or may realistically grow, and also, since SQL estimates and reserves half of the declared column length in anticipation of data, isn't it better to use only what is required? I believe that keeping capacity you don't require comes with a cost. What do you think?

    Yes, use only what's required.  This is for two reasons.  First, as you hinted at, the memory grant for a query may be based on the declared width of a varchar column.  Second, you're more likely to prevent bad data from getting into your database if you restrict its width to the absolute maximum of what it could be.
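    As a small illustration of the second point (hypothetical table), a right-sized column both documents the domain and rejects oversized input:

    ```sql
    -- char(3) / varchar(50) reflect the real domain; an oversized
    -- varchar(500) CurrencyName would silently accept junk and inflate
    -- memory grant estimates.
    CREATE TABLE dbo.Currency
    (
        CurrencyCode char(3)     NOT NULL,  -- ISO 4217 codes are exactly 3 characters
        CurrencyName varchar(50) NOT NULL,
        CONSTRAINT PK_Currency PRIMARY KEY CLUSTERED (CurrencyCode)
    );
    ```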

    John

  • John Mitchell-245523 - Thursday, November 23, 2017 2:10 AM

    Arsh - Wednesday, November 22, 2017 7:37 PM

    Thank you Jeff, John, Lynn, Grant and Bill for the points. How about column lengths? There's a debate with the dev team, especially over varchar. Where 20 is enough, they sometimes ask for a length of 500, because SQL only uses the actual data length anyway. My argument is to keep only what is required or may realistically grow, and also, since SQL estimates and reserves half of the declared column length in anticipation of data, isn't it better to use only what is required? I believe that keeping capacity you don't require comes with a cost. What do you think?

    Yes, use only what's required.  This is for two reasons.  First, as you hinted at, the memory grant for a query may be based on the declared width of a varchar column.  Second, you're more likely to prevent bad data from getting into your database if you restrict its width to the absolute maximum of what it could be.

    John

    The following article by Aaron Bertrand illustrates the increased memory grants based on the defined data size, rather than the actual data size.
    Performance Myths : Oversizing string columns

  • Jason A. Long - Wednesday, November 22, 2017 3:57 PM

    Grant Fritchey - Wednesday, November 22, 2017 9:57 AM

    I love all the other answers. I'm going to repeat a few for emphasis and a couple of extra points. Almost everything I write has exceptions. However, exceptions should be exceptional.

    * You should have a clustered index on every single table.
    * Clustered indexes should be, as much as possible, unique
    * Every table should have a primary key
    * Every foreign key must be enforced through the use of WITH CHECK
    * 2016 and greater, Clustered Columnstore for analysis focused data instead of clustered index
    * The primary key should be on data that never changes
    * For all your nonclustered indexes, it's better if your clustered index never changes too.
    * The clustered index should be on the most frequently used path to the data, this may not be a PK or AK
    * Because the PK should never change, GUID or IDENTITY are usually the best bets
    * You should have an alternate key (AK) because your PK should be generated, see above, so that it never changes (cascading updates and deletes are from satan).
    * Data types are so important. Strings are strings. Dates are dates. Integers are integers, etc. Don't mix them, ever. It just leads to pain.

    Most of all, don't get overly obsessed with all these suggestions. You're going to have compromises. We all do. Go into them with your eyes open so you understand what the choices you make will mean to the behavior of the system.

    Just a quick add to Grant's really good post... 

    * Because the PK should never change, GUID or IDENTITY are usually the best bets

    IF you have to resort to using a GUID, for the love of all you hold dear, make sure you define the column with a NEWSEQUENTIALID() default and NOT NEWID().
    This one small thing will help avoid the massive index fragmentation caused by inserting completely random GUIDs into your indexes.

    Ah... gotta disagree there (on the GUID stuff), especially since I'm in the process of doing some extreme testing with GUIDs for other purposes.  One big thing to consider is that NEWSEQUENTIALID will create an INSERT "HotSpot" just like using an IDENTITY column or any other ever-increasing value will.  The other thing to consider is that, and I quote from BOL...

    After restarting Windows, the GUID can start again from a lower range...


    ... and it turns out that there's about a 50% chance that it WILL start at a lower number, which will cause the "bad" type of page splits that we all try to avoid (frequently using incorrect rebuild/reorg methods to do so, worsening the problem in the process).

    In contrast, using the correct FILL FACTOR (which isn't the waste you might think it is with GUIDs), you end up with 16 different INSERT points instead of just one, as you would with an IDENTITY.

    I still think that GUIDs have a couple of major "suck" factors, such as being a whole lot wider than an INT, being included in every NCI, being a real PITA to read and type, and the fact that they actually sort on what appears to be the last 6 bytes (which makes selecting a range of GUIDs a huge pain), but I no longer think of GUIDs as a form of "Death by Datatype" for performance with respect to inserts to CIs or NCIs.  With the correct FILL FACTOR AND THE RIGHT KIND OF INDEX MAINTENANCE, WHICH DOES NOT FOLLOW THE RECOMMENDATIONS IN BOL OR OTHER "BEST PRACTICES" (which is a secondary purpose of my current study), they actually have a significant performance advantage for huge numbers of singleton inserts compared to typical ever-ascending keys.

    I'm also discovering that most people's index maintenance plans are seriously flawed and may actually cause more performance problems than doing no index maintenance at all.  This isn't the result of some artificial, contrived, or hearsay recommendations.  You've probably heard me say it before... I've not rebuilt indexes on my production box for almost 2 years, and performance actually and significantly improved over the first 3 months and hasn't degraded since.  Part of my current study is to identify and determine the remedy and guidelines for the very small handful of exceptions (tables that are regularly added to and then partially deleted from, for example).  I'm also developing code so that people can duplicate the non-in-situ experiments I've been doing, both to support not doing index maintenance and to show how to actually get the most benefit if you feel compelled to do index maintenance.  I've also discovered that ALTER INDEX REORGANIZE sucks so bad that it has its own gravitational pull.

    --Jeff Moden


