Learning Data Modeling

  • Comments posted to this topic are about the item Learning Data Modeling

  • Basic database normalisation is covered in a lot in introductory texts about learning SQL, such as the Sams "Teach yourself X in 24 Hours" series.  These usually go up to third normal form as this is the most an analyst will usually need to know, and whilst they cover the basic logic normalisation it is really only a brief overview.  My introduction to Relational Algebra was from a book called "Access Database Design & Programming: Creating Programmable Database Applications with Access 97, 2000, 2002 & 2003" by Steven Roman.  I purchased it for a section on Visual basic, but found the section on Database design fascinating and was much more advanced than the title of the book, or its focus on Access, would suggest.  This is as good an introduction to database design and normalisation, up to Boyce Codd Normalisation, as Itzik Ben-gan's introduction to Set Theory is to advanced query design.

  • I'm disappointed but not surprised by what Steve says in his article. It's my experience too.

    IMO "Data Modeling" is an extremely useful skill. It fits well with lots of other skills (like programming, database design and systems design) but is something different from all of these. Perhaps that is one reason why is seems to be overlooked and undervalued.

    As @alex Gay implies, it's a really "portable" skill. The ideas of data modeling are just as valid with a small Access database as they are with a multi-zillion row, hundreds of tables ERP system (and the other way round). It's application isn't even limited to relational databases. I've used the same techniques for hierarchical structures and I've even successfully applied the principles to entirely manual filing systems. Understanding the compromises implicit n the data model can help you understand the limitations of any system derived from it. Of course, the impact and cost/benefit of those compromises is greater with the larger, more complex systems.

    Data modeling doesn't require lots of fancy tools or remembering lots of complex syntax, what it does require is abstract thought and then rigorous documenting of the results (the tools are _really_ useful for maintaining the results). Maybe those are further reasons why it is unrecognised or unfashionable.

    Maybe the root of the problem is that many people don't recognise "data modeling" as an activity at all. As a result they "don't know what they don't know" and so don't even think of looking for books of the kind @alex Gay mentions. This is all rather sad, especially as we are supposed to be moving to a world of "Big Data".

    Tom Gillies LinkedIn Profilewww.DuhallowGreyGeek.com[/url]

  • Proper database modeling is more than just knowing how to draw diagrams or about normalization rules.  A huge portion is understanding the subject area, data flows, processes and goals.  I would also include the ability to look ahead at possibilities and to judge what is feasible for both the near term and future capabilities.  Besides Entity-Relationship (E-R) diagrams, I have also witnessed the decline in the use of Data Flow Diagrams (DFD).  In my experience, DFDs have been very instrumental not only to document data flows within a system, but more importantly as a means of discussion to ensure that I correctly understand the system properly or as a tool to help others to understand.

    I have often wished I could distill my 30+ years of experience in data modeling into a book, but I have been plagued with not only a lack of time but also the inability to turn my experience and knowledge into a set of discreet rules or a methodology.  Perhaps it's because I unconsciously follow a set of rules that I am not aware of and need to identify them first! 😛

  • Aaron N. Cutshall - Friday, July 7, 2017 5:56 AM

    ...
    I have often wished I could distill my 30+ years of experience in data modeling into a book, but I have been plagued with not only a lack of time but also the inability to turn my experience and knowledge into a set of discreet rules or a methodology.  Perhaps it's because I unconsciously follow a set of rules that I am not aware of and need to identify them first! 😛

    That's a really good point. It all seems so easy: "identify the things" (entities), "write down what you want to know about them" (attributes), then normalize (to whatever normal form) and draw a diagram. Simple! Nothing to it. (My tongue is firmly in cheek there)

    I guess a lot of us "unconsciously follow a set of rules" and were probably taught by other people who did the same. If you do get round to codifying your rules, I might be interested in the book. 🙂

    Tom Gillies LinkedIn Profilewww.DuhallowGreyGeek.com[/url]

  • I've found Louis Davidson's Pro SQL Server Relation Design & Implementation and the Star Schema: The Complete Reference to be excellent resources.

    Like Aaron said, you really need to be familiar with the data flow.  Logical modelling has been a big help in that area.  Database Design is also somewhat of an art in addition to science.  You can over normalize just as easily as not normalize enough.

  • I think a part of the problem is the way people approach it.  When I work with developers, they already have preconceived notions coming into a design session of what the database should be, and want to jump to the end of the process.  There are some that try to tie the data model directly to how the application screens are designed, and don't want to model the data.  There are some that just want to come up with the most normalized design they can think of and be done with it.  Then there are sill others that think the best thing is just an attribute/value list, because they think that will be the most flexible and never require changes then.

    I approach it by thinking about the logical design first, come up with the conceptual ideas of the data we want a system to represent, and the cardinality of their relationships.  Next I start doing the normalization and analysis, then produce the physical model and design the tables.

    Related to that, people often forget that normalization is part of analysis, not part of design.  Once you have a model in third normal form, you will sometimes find that there is a need to denormalize certain aspects of it to make it usable in a real world system.  I've had to work with other people's designs before where they stuck to third normal form or higher, and then to get any useful information out of the system you had to do 5 or 6 joins, for something that I think should have been 2 or 3. 

    Then of course people try to build their reporting structure directly on top of this heavily normalized system, because they don't want to do the work of designing a star schema as part of a data mart or data warehouse, and end up doing more work because of it.  Data modeling is work, but I believe that if you go through the process, then it can actually reduce the total work time spent in the end.

  • I have, sad to say, found few developers & dbas who can understand an ERD, let alone design one or modify and realise the repercussions. Luckily when I started I had decent BAs who mentored me on design.
    The main excuse is that they want to optimise the database before too much code is written.
  • Why don't they demand better documentation? If they were to ask for us to produce ER diagrams and documentation, I'd hope most of you could produce one, even using SSMS or some other tool.

    No, no, no!

    THIS is the problem. ERD is NOT (just) documentation. ERD is the bedrock of your design, it is at the same time the actual design, the complete documentation of the design and (if you're smart) grist for the tool that creates the scripts that create the database. Any alterations that are made are made to the ERD first and then fed via the ERD tool to your development database.

    Personally I use (and live by) ModelRight, an ERD tool that does all of the above. It lets me design the physical model (which for me is conceptually identical to the logical model), it creates the scripts, it lets me create the comments that describe the tables, columns, and the standardized data dictionary column types. Further, when changes happen it lets me simultaneously make the changes, document them and creates the database ALTER scripts for me.

    ModelRight enforces consistency of data types, it verifies all the t's are crossed and the i's dotted. It will verify the model's consistency. It lets you automatically create columns for foreign keys simply by creating a relation--and it can delete the FK by deleting the relationship.

    It even follows my own custom naming conventions automatically!

    There are other ERD tools, of course, like Rational Rose, but they tend to be really expensive (read thousands or 10s of thousands each copy) but I can't imagine not using a rock-solid ERD tool. They're literally CAD for databases!

    Plus you can put the ERD into a version control system. 😀

    As a lone-wolf developer/DBA/etc. I couldn't function in the SQL Server world without it. If you don't use an ERD tool (one that handles scripting and reverse-engineering) you're going for a swim wearing a 10 ton weight chained to your ankle. Why put yourself to the (man-years) of additional unnecessary drudgery?

  • Perhaps the best way to proliferate a process is to make tools for the process ubiquitous. There is a rudimentary Database Diagram tool in SSMS, but it hasn't changed in any substantive way since it's initial release 20 years ago in SQL Server Enterprise Manager. There are many limitations that limit or exclude it's use as a design tool, like: no support for cross-database relations, it's bound to the physical model and any changes to the model must be persisted to the database schema. You also can't export the diagram to a format that can be used in other design tools. About the best you can do is print the diagram to a PDF document.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • What publicly available services could we use to have a discussion of the database model? I would assume the one for Stack Exchange has been published. Open source projects like mediawiki are available as well.

    There is a good list of open source projects at https://bitnami.com/stacks to pick from.

    I always appreciated that fact that pinboard.in published the data model for his service.

    412-977-3526 call/text

  • I did neglect to mention the tool sets available.  While there are a few good ERD tools out there, they tend to be expensive.  For a free tool, I rather like the MySQL Workbench.  While it's not 100% SQL Server compatible, it's pretty good for the price! 😛  I also agree with the lament that too often the ERD is treated as an afterthought as part of documentation only.  Same is true for a data dictionary.  I've yet to find a tool that allows you to enforce your design against a maintained data dictionary.  Usually the data dictionary is generated from your database or ERD as a documentation tool after the fact.

  • "The Nimble Elephant" by John Giles.  Best, most-pertinent design book I'veread.

  • ERD tools can be expensive, but thankfully EF core ships with one that can also scaffold databases on most popular systems... and its free

  • Aaron N. Cutshall - Friday, July 7, 2017 5:56 AM

    Proper database modeling is more than just knowing how to draw diagrams or about normalization rules.  A huge portion is understanding the subject area, data flows, processes and goals.  I would also include the ability to look ahead at possibilities and to judge what is feasible for both the near term and future capabilities.  Besides Entity-Relationship (E-R) diagrams, I have also witnessed the decline in the use of Data Flow Diagrams (DFD).  In my experience, DFDs have been very instrumental not only to document data flows within a system, but more importantly as a means of discussion to ensure that I correctly understand the system properly or as a tool to help others to understand.

    I have often wished I could distill my 30+ years of experience in data modeling into a book, but I have been plagued with not only a lack of time but also the inability to turn my experience and knowledge into a set of discreet rules or a methodology.  Perhaps it's because I unconsciously follow a set of rules that I am not aware of and need to identify them first! 😛

    Steve, I hope someone does write some articles here on SQL Server Central, on database design. I believe that at best, I'm just average at this.

    Aaron, I totally agree with you. One of the projects I took over when I got to this job is an inventory system designed by someone in the past. I have no experience at all with inventory systems. I never worked in a warehouse or pharmacy or store like Costco/Sam's Club. I don't know that domain at all. I've no idea if the database that whoever did is a good one or not. I hope so :blink:

    Kindest Regards, Rod Connect with me on LinkedIn.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply