Best Practices for Database Design

  • I guess naming standards fall under the same type of conversation that religion, money and politics fall under.  We all have our opinions and nobody is right, but noone has found the perfect solution.



    Dan Avsec
    MCSD, MSDBA, MCT

  • "So you change your name as you move from room to room."

    "A Rose by any other name..."

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • No, that's where you're wrong.

    I'm right.

    --
    Adam Machanic
    whoisactive

  • "If you always use word ID or CD after the field name, you always know that it is a primary/foreign keys."

    Many of my customers use CD for codes that aren't keys.  For instance a status_cd might be a foreign key or it might not.  In a fully normalized environment it will be a FK to some reference table, but that is not always the case.

    --

    "Regarding the "name" or "description" field.  Let's say you have a field in 2 tables named "name".  The tables are named item and customer.  I have found it much easier to name the fields accordingly.  I name those fields CustomerName and ItemName.  I find it confusing to have the same column name for unrelated fields throughout a database. "

    I also find it confusing.  I particularly dislike "name", "status", "type", and "desc".  Not only are they ambiguous if ever viewed apart from their table name, they're also more likely to be reserved words.  For instance "desc" is an ODBC reserved word, so is "names".  "type" is a reserved word in Oracle and "named" is a reserved word in DB2.  If you always encounter the column name with the table name it's no big deal (i.e. a SQL statement), but when looking at the column in various screen builder tools it can be very confusing to figure out just where that Description field is coming from.

     

     

  • Well its not like the name, it's like sometime people know me as programmer or gamer or dba. LOL

  • Yes, and everybody is entitled to my opinion!

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Well, I will prevent the use of 'cd' keyword other than for primary/fr key. But if i have client that already have a database setup I follow their naming standard.

  • Article was OK but here are some general thoughts:

    1) If you can use the complete name for a table, use it.  There's no need to abbreviate, it just creates more confusion. Promo should be Promotions. 

    2) You say it's ok to use views as long as there aren't  too many joins.  Not really sure where you're coming from on this one.  Views are used to limit access to data and provide end users with a "flattened" view of the data, which in a properly normalized system, could indeed involve several joins.  Views provide a way for us to let users see the data as it relates to their business.   End users are usually not sofisticated enough to write their own queries using normalized data, so views are the only real solution.

    -B

  • What he was talking about in terms of how many joins in a view is the unfortunate aspect that once you get beyond a few joins the performance of the query or view tends to go downhill rather fast. General rule of thumb is to never go beyond 3 or 4 joins in a view definition but it is a rule that can be understandably broken while working under a tight deadline or some such.

    I usually "break the rule" when doing the prototyping for a system and then go back and do some testing of the performance of a complex view vs. a stored procedure.

  • I fully understand the performance implications, however, in larger complex systems where many codes are used (via a reference table), multiple (> 4) joins are required in the view.  A view is the best alternative for end-users wishing that need ad-hoc query/reporting capabilites.

    Actually, the best alternative is to implement a datamart for the data and take periodic snapshots that provide a "pciture" of the data in a flattened state.  Reports and queries can then be run from the datamart rather than directly from the OLTP system.  The one drawback here is the frequency at which data changes in the OLTP system....that will impact how often one needs to update the datamart, which could impact performance/access to data.

     

    -B

     

  • dcpeterson:

    I just want to touch on this.

    You statements previously about

    Train Station rather than Trains Station

    Shoe Store rather than Shoes Store

    Container Ship rather than Containers Ship

    I will say you are right and as was pointed out these are containers for multiple objects or conceptually we are talking the database name.

    And then you state the following.

    "Guilty as charged!  Since it is the container (relation) that imparts meaning to the set I think the relation is where the point of focus should be when naming your tables."

    Again I agree but you should keep in mind when you are descirbing potential objects in a database there can be 0 to many in the object defined.

    So for a train station database you might have the following tables defined

    Trains

    Platforms

    Employees

    Schedules

    Ticket Counters

    Employee_Titles

    Resturants

    Locations (To cover both arrival origin point and departure destination names)

    ...

    For a shoe store database (based on the concept of the shoe store itself) you might have

    Employees

    Employee_Titles

    Checkout_Lanes

    Departments

    Sizes

    Aisles

    Brands

    Promotions

    ...

    And with the Container Ship you might have something like the following

    Cargo

    Crew

    Holds

    Hold_Types (Refirgerated, Sealed, etc)

    ...

    So a database is in itself the representation of a single concept (So a singular name) is made of 1 or more defining concepts (tables) which are defined as a set of attributes that may occur 0 or more times in the given environment (columns and rows representing 0 or more copies of the concept).

    So a table is named for the chance of multiple concepts at any time thus it is plural and the coulmns define a attribute per copy so singular.

    Just because you are viewing a single concept object doesn't mean more do not exist and like wise if the table contains only 1 now it doesn't mean later that couldn't change.

    Take for example a specific family.

    Travis_Family (Database)

    Parents ((Entity_ID))

    Children ((Entity_ID)) --Conceptually I had only 1 child until 1.5 years ago for the 1st 6 years of my marriage, didn't change the fact we had children in our family.

    Entities ((ID), Names, Birthdate, Hair_Color)

    Pets ((Entity_ID), (Pet_Types_ID))

    Pet_Types ((ID), Description)

    Now you go to variables.

    Variables are the concept of 1 unless they are an array (table variable, cursor, etc) so the name convention you use still should represent the concept itself in proper perspective.

    Then your later statement

    "On the flip-side, standards that are too rigid are often impossible to implement.  If the process of data modeling could be reduced to a set of standards "checklists" I would be out of a job.  An architect must have some leeway within the standards.  This is where experience is required to make decisions based on guidelines, or priorities that are, to one degree or another, mututally exclusive."

    I truely disagree here.

    Data modeling itself has nothing to do with standards, and vice versa standards have nothing to do with data modeling.

    The only thing standards should do are define a strong definition of conventions (naming conventions), rules (customer sign-off requirements, database must provide a transaction audit), and/or testing the design (normalization). Standards should be rigid in their requirement and should define the course of developement not the developement itself.

    Also depending on the function of the standard some should be rigid and some should not but it should rarely set the design itself.

    All and all thou you still are the ultimate factor in the decision of your naming convention so just because I do not agree with your choices doesn't mean you have to change it.

  • One more time and then I'm done...at least in this forum.

    No, I was not talking about a database name, and frankly at the logical level there is no such concept as a "database." 

    My examples were merely to show how people are used to dealing with a singular noun when refering to a container for multiple objects.  This surely fits the definition of a table.  I was countering the argument that "Employee" table inplied a single employee and that in order for most people to infer multiple employees, one should pluralize the name to "Employees".  Nobody seriously thinks that a sign that says "Employee Entrance" implies that only one employee may enter, and I have yet to ever see a sign that read "Employees Entrance", not that it couldn't exist...  There are other examples as well: "Carpool Lane" doesn't imply that there can be only one carpool, "Truck Stop" doesn't imply that it serves only one truck, etc...  Obviously, if one were to create a set of tables to hold data about these things you would need to expand on the ideas, but I was strictly addessing the idea that singular names necessarily imply singularity in the objects that those things serve or contain.  I have amply demonstrated that this is not the case.

    My whole argument is that the number of employee records in the employee table is totally irrelevant to what the table should be named.  The name of the table should, in some way, reflect the logical definition which is based entirely on a single predicate function. 

    Your whole argument pivots on the fact that a table can hold more than one record.  Once again, this is totally irrelevant because a relational table--being a variable-- can hold only one value at any given time.  The value of the variable (the set) may be very complex, but that too is irrelevant.  That we might only be interested in a certain part of that variable and express that intrest in terms of both our column list and WHERE statements in our query is also irrelevant, the table is still a variable and it contains only one value at any given point in time.  It is more correct to view the Employee table as holding the SET (singular) of employee data rather than holding a bunch of employee records (plural).  The latter view is more suited to a spreadsheet, not a relational database.  If I were to create an Excel file of employees, I would likely name it "Employees" but a relational table is a different animal altogether.

    One more hole in your argument is that you have no problem with singular column names.  This may seem like a no-brainer, at least until you understand that a relational domain (column) can be defined as another relation (table).  There is nothing that prohibits this practice except that it gets to be a real mind-bender and of course, our current set of SQL based products can't adequately deal with them.  But that doesn't change the fact that they are allowable.  So if you insist on pluralizing you table names, you should logically do the same to your columns.

    As for my discussion of variables...  Once again, relational tables ARE variables, and can contain only one value at a time.  "Learn it, live it, love it!"

    My point about standards is that as they become more prescriptive, the odds increase, perhaps exponentially, that some of those prescriptions will be mutually exclusive in some circumstances.  Also as they become more prescriptive (or proscriptive, or both) they become ever more difficult to implement and require increasing amounts of time to ensure compliance, which paradoxically becomes less and less likely because of the built-in conflicts, and round and round we go...  I'm not arguing against having standards, merely acknowledging that there is a tipping point at which they become an obstacle rather than a tool.  On the other hand standards that are too loose aren't standards at all...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I think dcpeterson summed up this thread nicely in this statement:

    My point about standards is that as they become more prescriptive, the odds increase, perhaps exponentially, that some of those prescriptions will be mutually exclusive in some circumstances.  Also as they become more prescriptive (or proscriptive, or both) they become ever more difficult to implement and require increasing amounts of time to ensure compliance, which paradoxically becomes less and less likely because of the built-in conflicts, and round and round we go...  I'm not arguing against having standards, merely acknowledging that there is a tipping point at which they become an obstacle rather than a tool.  On the other hand standards that are too loose aren't standards at all...

    --

    Naming conventions are certainly a tool, not a divine mandate.  And violating a particular standard on occassion due to a reasonable exception does not invalidate the rest of your efforts to follow the standard.  The more they can be applied the better, but taken to an unhealthy extreme they are counterproductive rather than helpful.

    Great discussion throughout.  Thanks for the article JD!

     

  • OK, I lied.  I'm not quite done. 

    I was finally able to get my hands on part of the ISO 11179 specification.  As expected, it was a tangle of muddled terms that are meant for wide applicability but truly useful for none, and particularly not for relational database concepts. 

    Be that as it may, they still got the entity naming correct.  In the document referenced below, they specifically give the example of an "Employee" table, not "Employees".  Granted I don't have access to the complete spec so there very well might be something in a later part that contradicts this example, but then what good is ISO if they can't even follow their own standards?

    http://metadata-standards.org/11179-1/ISO-IEC_11179-1_1999_IS_E.pdf

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I believe in having naming conventions, but also documentation of desing and naming would be a big help. It would be nice to see somewhere a standard developed to use accross multiple RDBMS, not just a single one. I think ISO 11179 is a good step in right direction.

Viewing 15 posts - 61 through 75 (of 145 total)

You must be logged in to reply to this topic. Login to reply