Beginning Database Design - Spot the Flaws

  • I'd say that Harald makes some extremely valid points. There needs to be separation between design (WHAT) and implementation (HOW) decisions. And I think the original article demonstrates just that. I have had any number of conversations that started the same way as the original article. And unless the design versus implementation questions are separated and answered in order, the project either goes nowhere or ends up being difficult to use/maintain/understand. Indeed, I might argue that the point to the first article is just this type of discussion, not the mechanics involved.

    WARNING...SOAPBOX APPROACHING...With regard to the image URL storage, why not simply record a URL or file path? Why bother with storing images in SQL Server? Granted SQL can do so, but then so can the operating system. And just because you CAN store images in SQL doesn't mean you should.

    And with regard to XML, the support in SQL Server is all well and good, but if you want to store and retrieve XML data efficiently, use an XML database. END OF SOAPBOX

    ------------
    Buy the ticket, take the ride. -- Hunter S. Thompson

  • A great article idea! I think the example is similar to one that many of us may face when requested to design a "simple" system. I won't repeat what has already been said, but I wanted to comment on the physical placement of tables/indexes. Even though this example says the system is for a single user, and therefore one would assume fairly small, I make it a habit to create a database with three file groups (PRIMARY, DATA, & INDEX) -- in my case I have a stored procedure that creates a database. After adding the DATA file group, it makes DATA the default group. This way, the only thing in the PRIMARY group should be system objects. Clustered indexes and heap tables go in the DATA group, and secondary indexes go in the INDEX group. Again, maybe overkill for this example, but good design should be a habit, not a choice.

  • G Bryant McClellan (11/20/2007)

    WARNING...SOAPBOX APPROACHING...With regard to the image URL storage, why not simply record a URL or file path? Why bother with storing images in SQL Server? Granted SQL can do so, but then so can the operating system. And just because you CAN store images in SQL doesn't mean you should.

    Isn't this again an example where the "WHAT" (... requirements do we want to fulfill?) should be clarified up-front? If we want to support distributed databases; or if we want to switch to another file server, file paths may create pain, whereas "everything in the database" might be easy - OTOH, with good server naming and/or RAID drives etc., this might not be such an issue.

    Other issues are operating - e.g., backups: When data are stored in other places than the database, maybe more has to be invested into backup and restore(!) processes, especially if those simple-to-redefine virtual file paths suddenly point to another server - but then again, there are backup tools out there which can do so much and then some even in clustered and distributed systems - however, who's paying for them? etc.etc. ...

    Still, if size permits and without any more information, I'd keep all things in the big box called database as a rule of thumb.

    Regards

    Harald M.

  • Matt Lacey (11/20/2007)

    It would be useful to know what database (version) we will be targeting. This would let us know which data types and features are available.

    He does state that this was done in the late 90's so that probably means his original design was for Sql Server 7 and maybe 2000. That would eliminate tinyint, xml, etc...

    Certainly I would agree with most of the comments already made. I agree that author's should be a separate table and a book_authors table would be nice also, but for a personal web site, where I am going to be theonly one entering reviews the current design works.

  • This is really is a great teaser. If this was a real task that I had to accomplish, however, I would have to ask questions - the requirements are not specific enough. Steve states that the site would be one in which he could log the books he bought or borrowed, track when they were complete and then write a short review. He also stated that it would be a single user system and that this wasn't designed to be a full scale, searchable system.

    This sounds like good information to build requirements from, but as soon as I tried to build technical specs I realized that this information is too vague and is incomplete. For example, what is the purpose? If the purpose is to log Steve's reviews of books, that can be done in a desktop application. Is there a further purpose? We could assume because we are building a website that Steve wants to share his reviews, but that isn't actually stated. If you are building a web site, it should be fully searchable for increased usability. If it isn't fully searchable, and therefore not very useable to your average internet troller, again, why not just keep a list in a desktop application? Also, some of the statements hinted at a lack of concern for growth, but a good database design will always consider growth.

    I would say the first step in good design is to gather or provide precise and complete requirements.

  • This (and some of your others) is a great comment, Harald.

    You have written exactly what I was thinking and often think in these situations. Very nicely done! 🙂

    harald_m_mueller (11/20/2007)


    Sorry if I come in "high-level", but what irritates me most is that there is no high-level description of what is needed. Some of the previous posters' remarks show this: Old Hand wrote

    "1. The ISBN column should be char instead of varchar, since it will always be 10 characters. This saves us those two extra bytes used for the var in the char.

    2. The author could be outsourced to a different table"

    These two items are on totally different levels of abstraction: The first is a technical information (which is based on knowledege about ISBNs), whereas the second is a modelling issue, with implications like

    * do we want to unify identical authors?

    * do we want to support an m:n relationship between authors and books in the future?

    etc.

    or his last remark "But I guess someone else is happy to translate this to 3NF.": Of course this is only possible if we identify the "entities" in contrast to the "values" in the model - and this is a modelling issue. Is an author "just a string" (a value) or "a real being" (an entity)? - only when we have decided on this, can we apply normalization.

    E/R modelling has long honored the disciplines of logical and physical modelling, exactly because of such issues; and with UML and other modelling concepts, it is even clearer that "logic"/"requirements" (what do we need conceptually to solve the problem) and "physics"/"design" (what do we need technically to solve the problem well for the chosen technology) must be separated - although, of course, they are very heavily dependent on each other (and of course, I do not want to promote "waterfall", i.e., someone must "know beforehand exactly what is needed" - this is nonsense, and certainly so in small and very small projects).

    Is it worth to pursue such a clear separation of issues in such a minimalistic project? I'd argue yes - maybe even more than in a large project: Because otherwise, discussions become muddled:

    - A: "I want to source out the author"

    - B: "But why, it's so small, there is no performance gain!"

    - A: "But we need to hava consistent author data"

    - B: "But then you will lose performance because of all these JOINs!"

    ...

    Such discussions are often quite fruitless, unless there is a common understanding that these two levels of issues must be separated.

    One issue which was not yet mentioned: Nullability. Is it already (on the requirements level!) decided that all data are nullable? - I'm quite sure that this is not ok; this must be clarified on the requirements/use-case level. And technically, your scripts dont even work: a primary key can only be declared on fields that are not null (so maybe the scripts are a clumsy way of expressing requirements? - then, Steve, please quickly switch to E/R diagrams or UML or whatever ... but not DDL/SQL).

    Otherwise, I support many of the items of the previous posters - only I'd like to see them "sorted into the right buckets" (of "requirements" and "design", respectively).

    Regards

    Harald M.

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Excellent Topic. I hope to see more articles on design.

    I agree with much of what was stated above, so I won't retype it.

    Let me mention 2 things:

    1. I would create a author key in the Title table, a ISBNtoAuthor Table, and a separate Author Table. This would create a many to many relationship. I don't go for the "put all book authors in one field separated by commas" idea. That is not how you design databases. Its short sided and you will want to break out the authors someday, might as well do it right the first time.

    2. Additionally, I would create a separate table for CoverImage and a ISBNtoConverImage Table. This way, I could make money off of other book stores besides Amazon. If Amazon doesn't have the book, maybe "Barns and Nobel" or "Borders" will give me money for the referral.

  • I have to agree with most of Old Hand's suggestions. Here are a few other practical ideas:

    (1) ISBN, title, and author should be not null (title should be longer than 80, too). nvarchar for title.

    (2) The clustered index should be on title+ISBN, because most lookups will be by title. (I would also add an index for authors b/c that's another common lookup)

    (3) a unique nonclustered index can be on ISBN

    (4) Someone suggested an identity as another index. That sounds like a practical idea because, in reality, I'll bet (somehow) ISBN is not completely unique...reality is always different than theory. I also agree that the ISBN field needs to be expanded in length for foreign titles, etc.

    (5) Although you could break this thing up into 3NF, that can unnecessarily overcomplicate things. But I love Old Hand's idea of storing the author as an index into an "authors" table. This allows for multiple authors

    (6) You might need an index for image also, in case of multiple images.

    (7) It seems implied that there are tables with lists of bookstatus, owner, and rating.

    (8) Need to think of a way to allow for multiple owners, just in case.

    This was a great exercise. I look forward to reading other's ideas.

  • I find it puzzling to have the reviewdate be the clustered index. This seems like an unlikely column for a query. Wouldn't it make more sense to have the title be the clustered index and perhaps the author be a non-clustered index?

  • Assuming you want this table to store lots of data then follow these recommendations. If you aren't storing much data, then use Excel and and save your customer some bucks in licensing.

    1. Separate your non-clustered indexes in another data file.

    2. Any column with less than 10 elements make it char, not varchar (already stated elsewhere but worth repeating)

    3. Isolate your text column on another data file.

    4. Use a Function to join the tables instead of lumping everything into one table or using a View.

    5. Unless you are using foreign languages, avoid nchar and nvarchar (ignore all advice that suggests otherwise)

    6. Base your indexes entirely upon the queries that return data.

    7. Store your dates in varchar columns; 20 or 25 elements will work. It eliminates the conflicts between the different programming languages.

    This will get you a good start. Anything less is failure and does a disservice to the SQL Server community.

    z.

  • dave.baldwin (11/20/2007)


    I find it puzzling to have the reviewdate be the clustered index. This seems like an unlikely column for a query. Wouldn't it make more sense to have the title be the clustered index and perhaps the author be a non-clustered index?

    Consider a previous posters remarks of this being a write-once/read-many type of application. One could easily theorize that this is a web application, and that the front page of said app would typically be showing the reviews in some form of order. Alphabetical by title or author isn't bad, but more often than not, sorting by descending date is more relevant. Searching is a different bag of cookies altogether.

    harald_m_mueller, I think your missing the spirit of what Steve is trying to put out. Yes, there could be a better description of what requirements are, but as a generic learning exercise and discussion go, it's not that bad. If anything, this article did what it's supposed to do, and that's to generate questions. In day to day jobs, no one is going to give you a perfect requirements. There's always going to be communication happening to narrow down the focus. It's just the way it is.

    And one more thing, for someone sounding very pretentious, "Old Hand" is a board title. Not a person.

  • My 2 cents - without covering the gamut.

    char for ISBN. It indexes and searches better.

    nvarchars. When unsure, don't use it. It's not a big headache to modify it in the future.

    Identiy column. Not needed here, not yet anyway.

    Seperate table for the reviews. Not a bad idea.

    smalldatetime. I avoid it like the plague, but that's just personal.

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • I happen to think that Steve's original design is quite sound. Performance considerations aside, it meets the original requirements. It is so easy to over-engineer a database to the point where it exceeds the scope, in this case: publish your personal book reviews on the web. We don't know if there will be 100's or thousands of books, but chances are we'll tend to be on a smaller scale.

    My only suggestions would be to add fields for publisher and publication year, and to size the ISBN bigger than 10 since that piece of data is outside your control.

    BTW anybody looking for a product like this should check out the Book Collector program from collectorz.com, I use it to manage my collection of 200 technical books & it works great. You just enter the ISBN and it searches online databases like Amazon, Barnes & Noble etc. and populates all the fields, including the cover image. It will also create all the html pages for publishing to the web. Very slick, and only $40 USD.

  • DBA_Rob had a good suggestion for the physical placement of data using PRIMARY, DATA, and INDEX file groups. I would add one more if the plan is to store images in the database, IMAGE. Seperate the images in the database from the other data.

    😎

  • don_goodman (11/20/2007)


    7. Store your dates in varchar columns; 20 or 25 elements will work. It eliminates the conflicts between the different programming languages.

    What about the formatting issues of dates and times in such a situation?

    I would expect this to cause more issues in terms of regionalization and multiple languages/cultures than would be saved by forcing all programming interaction with the data to format in a specific, hard coded way.

    This would also limit, or at the very least complicate, operations and queries you may wish to perform on the data as you would have to perform a conversion on each date before you can do anything.

    This will get you a good start. Anything less is failure and does a disservice to the SQL Server community.

    Do you need a special kind of brush for such a sweeping statement?

Viewing 15 posts - 16 through 30 (of 96 total)

You must be logged in to reply to this topic. Login to reply