Table Design

  • There are reviews on a pile of data modelling tools at databaseanswers.org.
    Some of the more expensive tools are very good but it takes quite an effort to get up and running with them.  Whether they are worth the learning curve, as well as the cost, depends on how much database design you do.   Currently, I may go weeks without any database design work.  Hence I content myself with making the most out of database diagrams.  That said, ERWin enabled me to support a very rushed year 2000 project effectively.

  • Jeff Moden - Wednesday, November 22, 2017 3:43 PM

    Bill Talada - Wednesday, November 22, 2017 9:14 AM

    Primary keys should be non-human readable like a guid...at least in pure theory.  I love them.  Never use a natural key like an SSN because your life will become miserable; there are many SSN dupes and updates will kill you.  And stay away from those pathetic identity columns.  Use Sequences or newid() instead so you can generate primary keys from the app instead of RBAR identity inserts.

    Demand alternate keys (unique constraints) for virtually every table.  Some tables may have two or three alternate key groups of columns.  If you don't get this right, development will suffer and you will be babysitting the database and cleaning up all the time and I will think you are masochistic.

    Use positive logic names throughout such as IsShipped instead of negative such as IsNotShipped.

    Always store base data and never calculated stuff like ExtendedCost.  It is cheap to recalc on the fly or add computed columns or views.

    Err on the side of making columns not null and force developers to request nullable if needed.  Performance and simplicity suffer from too many nulls.

    There's no need to have RBAR inserts in the presence of an IDENTITY column.  And there are no duplicate SSNs issued by the SSA.  If you run across some, then people are lying.  And, no... SSNs are not and have not ever been reused by the SSA.

    Duplicate or not you should NEVER use SSN as a primary key. This is the type of information that should always be encrypted. I know you know this Jeff but it is so critical that it is worth mentioning.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Jason A. Long - Wednesday, November 22, 2017 3:57 PM

    Grant Fritchey - Wednesday, November 22, 2017 9:57 AM

    I love all the other answers. I'm going to repeat a few for emphasis and a couple of extra points. Almost everything I write has exceptions. However, exceptions should be exceptional.

    * You should have a clustered index on every single table.
    * Clustered indexes should be, as much as possible, unique
    * Every table should have a primary key
    * Every foreign key must be enforced through the use of WITH CHECK
    * 2016 and greater, Clustered Columnstore for analysis focused data instead of clustered index
    * The primary key should be on data that never changes
    * For all your nonclustered indexes, it's better if your clustered index never changes too.
    * The clustered index should be on the most frequently used path to the data, this may not be a PK or AK
    * Because the PK should never change, GUID or IDENTITY are usually the best bets
    * You should have an alternate key (AK) because your PK should be generated, see above, so that it never changes (cascading updates and deletes are from satan).
    * Data types are so important. Strings are strings. Dates are dates. Integers are integers, etc. Don't mix them, ever. It just leads to pain.

    Most of all, don't get overly obsessed with all these suggestions. You're going to have compromises. We all do. Go into them with your eyes open so you understand what the choices you make will mean to the behavior of the system.

    Just a quick add to Grant's really good post... 

    * Because the PK should never change, GUID or IDENTITY are usually the best bets

    IF you have to resort to using a GUID, for the love of all you hold dear, make sure use you define the column with NEWSEQUENTIALID and NOT NEWID().
    This 1 small thing will help avoid the massive index fragmentation caused by inserting a completely random GUID into your indexes.

    If you are going to use NEWSEQUENTIALID why bother with such a wide datatype? One of the advantages of guid is the difficulty in guessing. If you use NEWSEQUENTIALID it is simple to guess the next value because it is known. I would make a strong argument against using this because an identity is so much simpler to use and takes a lot less storage. If you use sequential values like this you have lost all of the advantages of the guid datatype and maintain all of the cons.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange - Tuesday, November 28, 2017 7:29 AM

    Jason A. Long - Wednesday, November 22, 2017 3:57 PM

    Grant Fritchey - Wednesday, November 22, 2017 9:57 AM

    I love all the other answers. I'm going to repeat a few for emphasis and a couple of extra points. Almost everything I write has exceptions. However, exceptions should be exceptional.

    * You should have a clustered index on every single table.
    * Clustered indexes should be, as much as possible, unique
    * Every table should have a primary key
    * Every foreign key must be enforced through the use of WITH CHECK
    * 2016 and greater, Clustered Columnstore for analysis focused data instead of clustered index
    * The primary key should be on data that never changes
    * For all your nonclustered indexes, it's better if your clustered index never changes too.
    * The clustered index should be on the most frequently used path to the data, this may not be a PK or AK
    * Because the PK should never change, GUID or IDENTITY are usually the best bets
    * You should have an alternate key (AK) because your PK should be generated, see above, so that it never changes (cascading updates and deletes are from satan).
    * Data types are so important. Strings are strings. Dates are dates. Integers are integers, etc. Don't mix them, ever. It just leads to pain.

    Most of all, don't get overly obsessed with all these suggestions. You're going to have compromises. We all do. Go into them with your eyes open so you understand what the choices you make will mean to the behavior of the system.

    Just a quick add to Grant's really good post... 

    * Because the PK should never change, GUID or IDENTITY are usually the best bets

    IF you have to resort to using a GUID, for the love of all you hold dear, make sure use you define the column with NEWSEQUENTIALID and NOT NEWID().
    This 1 small thing will help avoid the massive index fragmentation caused by inserting a completely random GUID into your indexes.

    If you are going to use NEWSEQUENTIALID why bother with such a wide datatype? One of the advantages of guid is the difficulty in guessing. If you use NEWSEQUENTIALID it is simple to guess the next value because it is known. I would make a strong argument against using this because an identity is so much simpler to use and takes a lot less storage. If you use sequential values like this you have lost all of the advantages of the guid datatype and maintain all of the cons.

    If you are going to use NEWSEQUENTIALID why bother with such a wide datatype?


    I wouldn't use a GUID at all if I can avoid it... Sequential or otherwise...

    One of the advantages of guid is the difficulty in guessing.

    Um no... The advantage is that they avoid collisions. Difficulty of guessing the next value is simply a byproduct of that.
    The act of generating N number of GUIDs in advance, presorting them, and then using them in a sorted order, does nothing to change that... Nor does it make it possible to "guess the next one".

    If you use NEWSEQUENTIALID it is simple to guess the next value because it is known.


    I'm calling BS... If you want to make this claim, you need to prove it.

    I would make a strong argument against using this because an identity is so much simpler to use and takes a lot less storage.

    My quote starts with, "IF you have to resort to using a GUID"... I'm not sure how you translated that into me endorsing the use GUIDs over the use of integers... 
    My comment simply acknowledged that there may be circumstances where you may not have a choice in the matter... and in THAT circumstance, use NEWSEQUENTIALID. 

    If you use sequential values like this you have lost all of the advantages of the guid datatype and maintain all of the cons.

    If you consider rampant "bad" page splits that can only be mediated by setting ridiculously low fill factors an "advantage", then we clearly have different definitions of that word...

  • I wouldn't use a GUID at all if I can avoid it... Sequential or otherwise...

    Totally agree.

    One of the advantages of guid is the difficulty in guessing.

    Um no... The advantage is that they avoid collisions. Difficulty of guessing the next value is simply a byproduct of that.
    The act of generating N number of GUIDs in advance, presorting them, and then using them in a sorted order, does nothing to change that... Nor does it make it possible to "guess the next one".

    If you use NEWSEQUENTIALID it is simple to guess the next value because it is known.

    I'm calling BS... If you want to make this claim, you need to prove it.

    OK. This isn't all that hard to demonstrate that NEWSEQUENTAILID will create these values sequentially. It will increment the second character by 1 from 0 - 9 then A - F. It will then increment the first character by one and start over.


    create table GUIDTest
    (
        MyID uniqueidentifier not null default NEWSEQUENTIALID()
    )

    GO

    insert GUIDTest(MyID)
    default values

    GO 20 --20 new rows in our table each in a separate batch

    select *
    from GUIDTest

    drop table GUIDTest

    I would make a strong argument against using this because an identity is so much simpler to use and takes a lot less storage.

    My quote starts with, "IF you have to resort to using a GUID"... I'm not sure how you translated that into me endorsing the use GUIDs over the use of integers... 
    My comment simply acknowledged that there may be circumstances where you may not have a choice in the matter... and in THAT circumstance, use NEWSEQUENTIALID. 

    And my comment is that if you have to resort to uniqueidentifier but using NEWSEQUENTIALID is ok then you do not need to use uniqueidentifier at all. If NEWSEQUENTIALID is acceptable than it is the wrong datatype in the design.

    If you use sequential values like this you have lost all of the advantages of the guid datatype and maintain all of the cons.

    If you consider rampant "bad" page splits that can only be mediated by setting ridiculously low fill factors an "advantage", then we clearly have different definitions of that word...

    Of course I don't consider that to be good. Perhaps this was not communicated clearly. Clearly I said something in my post that got your feathers ruffled. I truly apologize for that because that was not my intention and by no means was I attempting to discredit you or make this a personal attack.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • It is pretty simple to see that sequential IDs are predictable.

    DECLARE @t table (nsid uniqueidentifier not NULL DEFAULT newsequentialid(), name varchar(20) NOT null)
    INSERT into @t (name) VALUES ('aaa'), ('bbb')
    SELECT * FROM @t

    nsid    name
    51FEB4B0-74D4-E711-B86B-34E6D729D69C    aaa
    52FEB4B0-74D4-E711-B86B-34E6D729D69C    bbb

    I'd like to see a showdown article on guids versus int identities since it is so hotly debated.  After 27 years of sql server programming I strongly prefer heaps+guids over the non-ANSI identities especially since I deal with sync and terabyte databases and my clients insert large batches of linked parent and child rows where keys are determined by the client.  It totally eliminates round trips fetching each parent identity to insert into the child.  I found guids to be much faster than ints perhaps because of 64bit CPUs.  And heaps give me high density hits of current data in my data cache.  I still use int identities on small projects where developers want them.  It would be nice to see someone define exactly where each solution has strengths and weaknesses.  Likely everyone is fed up with people pushing their limited and biased opinions on everyone on this subject.

  • I'd like to see a showdown article on guids versus int identities since it is so hotly debated.

    I'd like to see SEQUENCE numbers used in that evaluation.  They also eliminate the round trip as they are retrieved before the insert of the first parent.

  • Being able to guess the next NEWSEQUENTIALID is in the documentation for NEWSEQUENTIALID:
    NEWSEQUENTIALID (Transact-SQL)
    If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.

    Sue
  • Sean Lange - Tuesday, November 28, 2017 12:56 PM

    I wouldn't use a GUID at all if I can avoid it... Sequential or otherwise...

    Totally agree.

    One of the advantages of guid is the difficulty in guessing.

    Um no... The advantage is that they avoid collisions. Difficulty of guessing the next value is simply a byproduct of that.
    The act of generating N number of GUIDs in advance, presorting them, and then using them in a sorted order, does nothing to change that... Nor does it make it possible to "guess the next one".

    If you use NEWSEQUENTIALID it is simple to guess the next value because it is known.

    I'm calling BS... If you want to make this claim, you need to prove it.

    OK. This isn't all that hard to demonstrate that NEWSEQUENTAILID will create these values sequentially. It will increment the second character by 1 from 0 - 9 then A - F. It will then increment the first character by one and start over.


    create table GUIDTest
    (
        MyID uniqueidentifier not null default NEWSEQUENTIALID()
    )

    GO

    insert GUIDTest(MyID)
    default values

    GO 20 --20 new rows in our table each in a separate batch

    select *
    from GUIDTest

    drop table GUIDTest

    I would make a strong argument against using this because an identity is so much simpler to use and takes a lot less storage.

    My quote starts with, "IF you have to resort to using a GUID"... I'm not sure how you translated that into me endorsing the use GUIDs over the use of integers... 
    My comment simply acknowledged that there may be circumstances where you may not have a choice in the matter... and in THAT circumstance, use NEWSEQUENTIALID. 

    And my comment is that if you have to resort to uniqueidentifier but using NEWSEQUENTIALID is ok then you do not need to use uniqueidentifier at all. If NEWSEQUENTIALID is acceptable than it is the wrong datatype in the design.

    If you use sequential values like this you have lost all of the advantages of the guid datatype and maintain all of the cons.

    If you consider rampant "bad" page splits that can only be mediated by setting ridiculously low fill factors an "advantage", then we clearly have different definitions of that word...

    Of course I don't consider that to be good. Perhaps this was not communicated clearly. Clearly I said something in my post that got your feathers ruffled. I truly apologize for that because that was not my intention and by no means was I attempting to discredit you or make this a personal attack.

    Of course I don't consider that to be good. Perhaps this was not communicated clearly. Clearly I said something in my post that got your feathers ruffled. I truly apologize for that because that was not my intention and by no means was I attempting to discredit you or make this a personal attack.

    Sorry Scott, I owe you an apology.  My feather were ruffled by someone else (nothing to do with anyone or anything related to SSC)...
    I let that influence my interpretation of your post and, consequently, it hit me at a weird angle That's 100% on me.
    I really am sorry about that and hope you can accept my apology.

    Also, I appreciate you setting me straight on the predictability of NEWSEQUENTIALIDs... The sad part is that I had commented, earlier today, on another post, the negative consequences firmly of believing something that simply isn't true. So, massive face-palm in that regard. I honestly don't know if I had read bad information that I took at face value w/o testing or simply assumed that SQL Server used the same "caching" mechanism that it uses with identity values 

    That said, in the context of using them as surrogate primary keys, I'm not worried about their predictability as long as they can still guarantee uniqueness.

  • Jason A. Long - Tuesday, November 28, 2017 1:50 PM

    Sean Lange - Tuesday, November 28, 2017 12:56 PM

    I wouldn't use a GUID at all if I can avoid it... Sequential or otherwise...

    Totally agree.

    One of the advantages of guid is the difficulty in guessing.

    Um no... The advantage is that they avoid collisions. Difficulty of guessing the next value is simply a byproduct of that.
    The act of generating N number of GUIDs in advance, presorting them, and then using them in a sorted order, does nothing to change that... Nor does it make it possible to "guess the next one".

    If you use NEWSEQUENTIALID it is simple to guess the next value because it is known.

    I'm calling BS... If you want to make this claim, you need to prove it.

    OK. This isn't all that hard to demonstrate that NEWSEQUENTAILID will create these values sequentially. It will increment the second character by 1 from 0 - 9 then A - F. It will then increment the first character by one and start over.


    create table GUIDTest
    (
        MyID uniqueidentifier not null default NEWSEQUENTIALID()
    )

    GO

    insert GUIDTest(MyID)
    default values

    GO 20 --20 new rows in our table each in a separate batch

    select *
    from GUIDTest

    drop table GUIDTest

    I would make a strong argument against using this because an identity is so much simpler to use and takes a lot less storage.

    My quote starts with, "IF you have to resort to using a GUID"... I'm not sure how you translated that into me endorsing the use GUIDs over the use of integers... 
    My comment simply acknowledged that there may be circumstances where you may not have a choice in the matter... and in THAT circumstance, use NEWSEQUENTIALID. 

    And my comment is that if you have to resort to uniqueidentifier but using NEWSEQUENTIALID is ok then you do not need to use uniqueidentifier at all. If NEWSEQUENTIALID is acceptable than it is the wrong datatype in the design.

    If you use sequential values like this you have lost all of the advantages of the guid datatype and maintain all of the cons.

    If you consider rampant "bad" page splits that can only be mediated by setting ridiculously low fill factors an "advantage", then we clearly have different definitions of that word...

    Of course I don't consider that to be good. Perhaps this was not communicated clearly. Clearly I said something in my post that got your feathers ruffled. I truly apologize for that because that was not my intention and by no means was I attempting to discredit you or make this a personal attack.

    Of course I don't consider that to be good. Perhaps this was not communicated clearly. Clearly I said something in my post that got your feathers ruffled. I truly apologize for that because that was not my intention and by no means was I attempting to discredit you or make this a personal attack.

    Sorry Scott, I owe you an apology.  My feather were ruffled by someone else (nothing to do with anyone or anything related to SSC)...
    I let that influence my interpretation of your post and, consequently, it hit me at a weird angle That's 100% on me.
    I really am sorry about that and hope you can accept my apology.

    Also, I appreciate you setting me straight on the predictability of NEWSEQUENTIALIDs... The sad part is that I had commented, earlier today, on another post, the negative consequences firmly of believing something that simply isn't true. So, massive face-palm in that regard. I honestly don't know if I had read bad information that I took at face value w/o testing or simply assumed that SQL Server used the same "caching" mechanism that it uses with identity values 

    That said, in the context of using them as surrogate primary keys, I'm not worried about their predictability as long as they can still guarantee uniqueness.

    Absolutely mate. No hard feelings here at all. Stupid internet makes discourse challenging because things get interpreted a certain way which was not the meaning at all. Such is the written word. Pretty sure we are in the same camp on this topic either way. 🙂

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sue_H - Wednesday, November 22, 2017 5:32 PM

    Jeff Moden - Wednesday, November 22, 2017 3:43 PM

    There's no need to have RBAR inserts in the presence of an IDENTITY column.  And there are no duplicate SSNs issued by the SSA.  If you run across some, then people are lying.  And, no... SSNs are not and have not ever been reused by the SSA.

    Ran into duplicate SSNs before and we had to verify it with the Social Security Administration - they weren't lies. And they've had other issues. Here is the SSA link explaining some of it - check the section on Handling SSN Assignment Problems:
    The Story of the Social Security Number

    Sue

    I took a look at the link you provided and I stand corrected.  Thanks for that.  I knew about the wallet facsimile/secretary SSN problem but didn't know about the duplication due to poor communication early in history.  Hopefully, they have all that sorted now.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Lange - Tuesday, November 28, 2017 7:22 AM

    Jeff Moden - Wednesday, November 22, 2017 3:43 PM

    Bill Talada - Wednesday, November 22, 2017 9:14 AM

    Primary keys should be non-human readable like a guid...at least in pure theory.  I love them.  Never use a natural key like an SSN because your life will become miserable; there are many SSN dupes and updates will kill you.  And stay away from those pathetic identity columns.  Use Sequences or newid() instead so you can generate primary keys from the app instead of RBAR identity inserts.

    Demand alternate keys (unique constraints) for virtually every table.  Some tables may have two or three alternate key groups of columns.  If you don't get this right, development will suffer and you will be babysitting the database and cleaning up all the time and I will think you are masochistic.

    Use positive logic names throughout such as IsShipped instead of negative such as IsNotShipped.

    Always store base data and never calculated stuff like ExtendedCost.  It is cheap to recalc on the fly or add computed columns or views.

    Err on the side of making columns not null and force developers to request nullable if needed.  Performance and simplicity suffer from too many nulls.

    There's no need to have RBAR inserts in the presence of an IDENTITY column.  And there are no duplicate SSNs issued by the SSA.  If you run across some, then people are lying.  And, no... SSNs are not and have not ever been reused by the SSA.

    Duplicate or not you should NEVER use SSN as a primary key. This is the type of information that should always be encrypted. I know you know this Jeff but it is so critical that it is worth mentioning.

    Absolutely no problem there... it IS definitely worth repeating because I see so many companies that not only use it as a PK, but they do so in clear text.  It's just insane.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Lange - Tuesday, November 28, 2017 2:19 PM

    Absolutely mate. No hard feelings here at all. Stupid internet makes discourse challenging because things get interpreted a certain way which was not the meaning at all. Such is the written word. Pretty sure we are in the same camp on this topic either way. 🙂

    Thank you sir... You're absolutely correct... Poe's Law in full effect.

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply