Well I’m not a data modeler, nor do I play one in the corporate world. But it is TSQLTuesday No. 72, and I have my tales to tell. Mickey Stewe, (b|t), aka @SQLMickey, database architect, whose focus is on data modeling, has invited us all to talk on the topic of, well, no surprise, data modeling. Her invite has been posted on her blog here. Who else but a data modeling expert would know when things go wrong, and that is what we discuss today. I have worked hands-on closely with internationally renowned data architects in various places, and that is not an understatement. No names today. I’ve certainly learned many things, but none more so than what can go wrong with data modeling.
Data modeling is a distinct (no pun intended) profession, not a core responsibility of a typical DBA. I have had my hand at architecting small database structures, and improving bigger ones, as performance lags. On a basic level, everyone should know about database normalization for relational databases. We know normalization is the technique for organizing data in the database in order to minimize data redundancy. This can optimize data retrieval operations, and ensure referential integrity. The typical optimal design standards are the 1st, 2nd and 3rd normal form. Yes there are more, but then again we’re talking about things that go wrong. So here is a great primer of Data Modeling 101, that everyone attempting to design a database should see. Quick primer, on data modeling basics of the 3 forms of normalization.
First Normal Form
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
Second Normal Form
- Create separate tables for sets of values that apply to multiple records.
- Relate these tables with a foreign key.
Third Normal Form
- Eliminate fields that do not depend on the key.
So, there are times you want to actually de-normalize your database, but I’ll leave that for the experts above my DM pay-grade. However, often times, such as when developers are designing the database model, don’t follow the simple rules, database chaos ensues. It might not be apparent at the outset, but as time goes by, and your SQL Server starts to creak, and data grows exponentially, only then do you realize, wow, this thing is not designed to scale.
One of the biggest faux pas I’ve seen over the years is when a 3rd party vendor has a great app, but apparently they didn’t bother hiring a data modeler. Many of these applications start out on other platforms, such as Access, or Btrieve (remember that?), and such legacy applications are migrated to SQL Server for the data backend. This often results in dozens of tables without any relations defined and no primary or foreign keys. Half the tables are 50 columns wide and no indexes. Most of these tables have no understandable layout, no proper naming convention – ie Table001, Table002, etc. Datatypes are fudged in migration utilities, and everything comes over as a varchar (max). Oh, and by the way, you can’t place an index on a varchar(max) column, so we should ideally limit its use. This of course wastes space, and results in an awfully shoddy database design that won’t scale, and of course, performance will suffer, customers will complain, and businesses will be jeopardized.
Not too long ago at a client, one to remain nameless to protect me from any lawsuit J , was chugging along on its SQL Server, until logging in took > 10 minutes thru the front end, and updating or retrieving records in excess of >10 minutes. That’s a life time for an end-client to wait, an absolutely unacceptable for any business to survive. I even cringed at the prospect of coming close to touching it, but was answering the call of duty and plea from the CEO. We made a lot of positive changes but a long way to go. This database, as it were, could not have possible been designed by a data modeler or architect. I was told it was designed by one of the premier database architects in the country. Well, probably one of the premier developers, but again, not naming names.
Even so, what was created at the outset was changed and modified nteen times by revolving group of developers, and certainly not the same pristine database design that it started out to be. So, we cannot fault the original database architect here, it’s been completely bastardized since the original design. Therefore, the biggest point in this discussion, is that not only must an awesome database be perfectly designed, the data model must be maintained, and flexible to adapt to changing requirements. New tables, column views, stored procedures, functions, indexes, etc. may need to be added, modified, or dropped. Therefore, the database schema must be versioned, and the original model preserved.
My favorite analogy to the aforementioned client is The Nine Circles of Developer Hell, cleverly crafted by Kendra Little of Brent Ozar Unlimited. While the client’s application is still a work in progress, we categorized the database as being in the 6th circle of code hell – code heresy. We were able to bring it back from the 7th circle - violent code, but there is long way thru the River Stix, to get back to the 1st Circle, Code Limbo. There is a lot that can go wrong. Before I signed up to help this client, the contract should have read “Abandon Hope All Ye Who Enter Here”
- Write a blog post about the topic. Don’t have a blog? Start one!
- Include the T-SQL Tuesday Logo and link it back to the original invitation post.
- Publish your blog post Tuesday, November 10, 2015, between 00:00 GMT & 23:59 GMT.
- Leave a reply on the original invite with a link to your blog post (for the round-up).
- Share you post with the community! Tweet it out using the #tsql2sday hashtag
Thanks for hosting Mickey!
And of course, if anyone is interested in learning more about my book Healthy SQL – A Comprehensive Guide to Healthy SQL Server Performance, published by Apress, you can go to the url:
You can also get the book on Amazon: http://bit.ly/HealthySQLonAmazon
For all things SQL, news, events, jobs, info, and other fun tweets, follow me on twitter @Pearlknows and join the #HealthySQL campaign to keep your SQL Servers healthy!