"This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man." [emphasis added]
William Shakespeare, "Hamlet", Act 1 scene 3 (1564 - 1616)
During an exchange in the SQLServerCentral.com user forums a while back, I was asked to give a summary of tips for good data modeling and db design. I answered with what I considered to be a well-thought response, but I realize that the points bear some further detail. At the risk of being compared to Hamlet's voluble antagonist quoted above (hey - at least he made an attempt at being a logician; see emphasis), I offer these 5 tips to help improve your data modeling and db design.
To start with, here are the 5 tips summarized:
1) Educate yourself on data fundamentals
2) Learn the various success criteria against which a design can be judged
3) Expose yourself to other software modeling disciplines
4) Devise techniques to ease design changes for your environment
5) Practice, practice, practice
First and foremost, educate yourself on data fundamentals. Everything else follows from the fundamentals. This can be easier said than done, however. For starters, relational theory tends to get minimal treatment in computer science degrees, and the presentation of the subject matter tends to favor vendor-specific training over general principles.
Therefore, a more accessible route (especially for the busy professional) is probably self-study using books, internet resources, and for those with the benefit of training budgets, seminars or conferences.
Rather than giving you a full list of the books I think are important, I'll limit myself to my "top 5" titles, and rely on the readers to help round out the list in the discussion postings...
- Data And Reality, by William Kent: First published in the '70's, this is a classic text that will help the practitioner think about data apart from its specific representation in a particular DBMS (e.g., SQL Server). Since it was written around the time of the relational model's rising popularity, there is enough to give the reader a sense of some of the other data models in use at the time.
- An Introduction To Database Systems, 8th Edition, by C.J. Date : Simply stated, a data management practitioner's bookshelf is incomplete without this text (or, at the very least, a recent edition).
- Applied Mathematics For Database Professionals, by Lex de Haan and Toon Koppelaars: This is a recent release that will help the reader to understand the translation from pure set theory and logic into the relational model. The book has a number of examples and exercises, and culminates in a fully formed sample project that is based upon an Oracle implementation. An excellent exercise is to work the sample for T-SQL.
- Database In Depth, by C.J. Date: A well-condensed summary of much of the thinking that has been going on with the relational model for the last 30 years.
- Practical Issues In Database Management, by Fabian Pascal: This book highlights the disparity between relational "the theory", and relational as implemented in current SQL DBMSs. Understanding where SQL falls down in this regard is essential to avoiding many common traps in database design.
A couple of online resources that I recommend include the Google newsgroup "comp.databases.theory", and the website DBDebunk.com. The latter, although not actively maintained, has a lot of food for thought, topic-specific papers written by C.J. Date and Fabian Pascal, and good references.
Learn and understand the tradeoffs that you will face during design. Any given conceptual model will have many potential logical models that satisfy the information requirements stated therein. The discipline of design is little more than making decisions about which of the various success criteria are important, and choosing a logical model accordingly. This trivialization assumes, of course, that the designer has created the options from which to choose.
A good summary article by Graeme Simsion [Simsion, 2005] outlines these tradeoffs. For a more in-depth treatment, I refer you to Simsion's book (co-authored by Graham Witt), Data Modeling Essentials, Third Edition.
Other Modeling Disciplines
For data modeling alone there are numerous approaches and notations that one needs to be familiar with. For instance, UML, ER, and Object-Role Modeling, are various notations that can be used for logical and conceptual modeling, and can even represent certain details of physical implementations. If you work primarily with OLTP systems, take some time to familiarize yourself with the modeling concepts used on the BI side of things -- OLAP, cubes, star schema, etc. (and vice-versa if your forte is BI).
Beyond the data management arena, we have UML (for OO design and other software development artifacts), state and control flow diagrams, XML hierarchies, and OO design patterns, to name a few. Although I have my own bias on what provides the most business value (it's all about the data, after all!), there's always something to be learned from other similar disciplines.
Have An Upgrade Plan
Accept that you will make mistakes, and those mistakes will make it into production. OK... even if you're perfect, yesterday's requirements will change and new requirements will arise for the applications that depend on your db design. Develop techniques to deal with this.
The constraints imposed by your deployment environment are going to play a big factor in what you can get away with. Are you shipping a database as part of a shrink-wrapped package? You'll likely need a utility that will be able to bring your customer's installation up to the latest rev no matter what state it's in. Is your production environment a 5-9's installation with db sizes in the hundreds of GB or larger? You'll need strategies for in-place updates to your design that take these scales into account.
Multiply the foregoing with all of the different types of changes we may encounter -- removing and adding keys and constraints, breaking tables down for better representing distinct concepts, etc. -- and it should be apparent that this tip alone is worthy of its own article (or series).
...practice, practice. This includes reviewing the work of others as well as trying new things out yourself.
One way to get exposure to numerous design problems is to participate in the forums here on SQLServerCentral.com. Keep in mind, however, that the real benefit in this is learning to recognize when important requirements are left unstated in a problem. For example, witness the number of replies and back-and-forth for "simple design" questions.
Another approach is to consider the work of others and think about how you might model things differently given your own hypothetical requirements. For instance, you might review the backend for an of-the-shelf software package in use at your company. Additionally, you could review the models in a db design guidance website like http://www.databaseanswers.org/data_models/. The point is to gain experience iterating over solutions more often than if you only constrain yourself to problems presented at your work.
I hope you enjoyed these tips for becoming a better data modeler. I believe that if one keeps these in mind and works at it, they will find themselves an increasingly valuable member of their development teams.
[Simsion, 2005] "Better Data Models - Today", http://www.tdan.com/view-articles/5100
About the author: Troy Ketsdever is a database specialist with over 15 years of commercial software development experience. His main objective and vision is "making the right information available to the right people at the right time".