Learning Data Modeling

  • Eric M Russell

    SSC Guru

    Points: 125100

    Typical programming books like "Database Programming with C#" may cover data modeling to come extent, but it's more of an afterthought and rather sparse.

    Ralph Kimball has some excellent books on dimensional data modeling. You'll find that experts within the data warehousing wing of IT tend to philosophize on big picture concepts like data modeling more so than authors and professionals who specialize in transnational or line of business applications. 

    Also, Wiley publishes some a good number of books on data modeling, both generic concepts and industry specific.

    The Data Model Resource Book, Vol. 2: A Library of Data Models for Specific Industries
    https://www.amazon.com/Data-Model-Resource-Book-Vol/dp/0471353485

    The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
    https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802

    Joe Celko's books will also teach you plenty about relational database modeling; both in general and also how to approach some of the more complicated edge cases.
    https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=joe+celko

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • sean redmond

    SSCertifiable

    Points: 5751

    I have two points:

     1. I wish that developers (at least, our developers anyway) would regard the DB as something less than a place to dump data and as something worth learning. I've had developers tell me that they don't care about the database and they express their contempt for the database with Entity Framework and careless datatype choices. All text-fields are varchar(max), all dates are datetime2(7) and GUIDs are present in every table just in case it may need to be accessed by a webservice. With such an attitude towards DBs, I'm not surprised that DB-modelling is falling out of favour.

    2. I would be interested in more articles on DB-architecture.
    For example, I'd like to see articles on the suitability of long tables (tables with many rows) versus wide tables (tables with many columns). Which scenarios suit which table type best?
    Or articles on archiving: the removal of cold data so that it is still accessible but that it doesn't slow the access of hot data.
    Or articles on scaling out databases with the likes of AlwaysOn, replication, log-shipping and clustering — how feasible is a DB-farm with, say, 20 servers that are read from and one server that is written to with the the DBs that are read from being copies (as appropriate to the technology) of the one DB that is written to?
    Or something fundamental about the position of data at the various levels in the data hierarchy. Say I have public data like (address and postcode data) and it is used by many, many different databases, what are the pros and cons to having the data within each table (no normalisation), as a separate table within each database (normalised but any change in the data must be made to all databases), as a replicated table from one master (normalisable, one master but the joys of replication), as a table in a separate database accessed by a linked-server (no FKs etc).
    Or the pros and cons of replacing a table with a very small number of elements unlikely to change soon (like the sex of a person) with hardcoded values and check constraints (so, 'F' instead of 0 and the removal of the person.sex table and the foreign key that goes with it). Heresy in the eyes of normalisation, one must remember the check constraints and all that.

  • Tom Gillies

    SSCrazy

    Points: 2745

    Sean Redmond - Monday, July 10, 2017 12:34 AM

    I have two points:

     1. I wish that developers (at least, our developers anyway) would regard the DB as something less than a place to dump data and as something worth learning. I've had developers tell me that they don't care about the database and they express their contempt for the database with Entity Framework and careless datatype choices. All text-fields are varchar(max), all dates are datetime2(7) and GUIDs are present in every table just in case it may need to be accessed by a webservice. With such an attitude towards DBs, I'm not surprised that DB-modelling is falling out of favour.

    2. I would be interested in more articles on DB-architecture.
    For example, I'd like to see articles on the suitability of long tables (tables with many rows) versus wide tables (tables with many columns). Which scenarios suit which table type best?
    Or articles on archiving: the removal of cold data so that it is still accessible but that it doesn't slow the access of hot data.
    Or articles on scaling out databases with the likes of AlwaysOn, replication, log-shipping and clustering — how feasible is a DB-farm with, say, 20 servers that are read from and one server that is written to with the the DBs that are read from being copies (as appropriate to the technology) of the one DB that is written to?
    Or something fundamental about the position of data at the various levels in the data hierarchy. Say I have public data like (address and postcode data) and it is used by many, many different databases, what are the pros and cons to having the data within each table (no normalisation), as a separate table within each database (normalised but any change in the data must be made to all databases), as a replicated table from one master (normalisable, one master but the joys of replication), as a table in a separate database accessed by a linked-server (no FKs etc).
    Or the pros and cons of replacing a table with a very small number of elements unlikely to change soon (like the sex of a person) with hardcoded values and check constraints (so, 'F' instead of 0 and the removal of the person.sex table and the foreign key that goes with it). Heresy in the eyes of normalisation, one must remember the check constraints and all that.

    I won't comment on "1)" 😉

    As for "2)" I'd be interested in all the topics Sean mentions.

    In general, I think I would be interested in topics about how to get from "Conceptual or Logical" to "Physical". What are the things that should be considered, and how should changes in parameters influence design? I realise that a lot of this is well and truly in "it depends" territory - that's what would make it interesting. There will not be simple answers, only factors to consider and heuristics.

    Tom Gillies LinkedIn Profilewww.DuhallowGreyGeek.com[/url]

  • Robert Sterbal-482516

    SSCrazy

    Points: 2788

    Here is another publicly available schema that I find very helpful for documenting bookmarks in a database:

    https://static.pinboard.in/schema.htm

  • xsevensinzx

    One Orange Chip

    Points: 25558

    Aaron N. Cutshall - Friday, July 7, 2017 8:00 AM

    I did neglect to mention the tool sets available.  While there are a few good ERD tools out there, they tend to be expensive.  For a free tool, I rather like the MySQL Workbench.  While it's not 100% SQL Server compatible, it's pretty good for the price! 😛  I also agree with the lament that too often the ERD is treated as an afterthought as part of documentation only.  Same is true for a data dictionary.  I've yet to find a tool that allows you to enforce your design against a maintained data dictionary.  Usually the data dictionary is generated from your database or ERD as a documentation tool after the fact.

    This is how I learned it. Workbench is awesome, even for SQL Server.

    For me, when I went to design my data warehouse from scratch, I focused more on books like Kimball's methodologies to help guide the way to the model. The best experience for me was not articles though. It was trial and error on real business requirements.

  • Eric M Russell

    SSC Guru

    Points: 125100

    Sean Redmond - Monday, July 10, 2017 12:34 AM

    I have two points:

     1. I wish that developers (at least, our developers anyway) would regard the DB as something less than a place to dump data and as something worth learning. I've had developers tell me that they don't care about the database and they express their contempt for the database with Entity Framework and careless datatype choices. All text-fields are varchar(max), all dates are datetime2(7) and GUIDs are present in every table just in case it may need to be accessed by a webservice. With such an attitude towards DBs, I'm not surprised that DB-modelling is falling out of favour.
    ...
    ...

    Regarding point #1, when it comes to enforcing proper relational database modeling, DBAs should fight the good fight while yielding territory that simply has no strategic significance.

    The following are common examples of domains where a relational data store is not needed and application developers can dump their data however they see fit:

    1.  Persistence of web application session state, like shopping carts for example. Once the user clicks the final submit button, that's the point when the completed purchase order should be contain in a RDMS. Think about the accounting department of a brick and mortar grocery store chain; they don't care about shopping carts, that's just a physical container for customers to temporarily hold stuff, what they care about are actual sales. It's the job of store clerks to manage shopping carts.

    2. Maintaining error, audit, and other types of event logs.

    3. Storage of unstructured data like documents, photos, and other types of BLOBS. Relational storage is expensive in terms of cost and retrieval, so it makes more sense to contain this type of data in bulk storage with only links back to the RDMS. I'm talking about not containing BLOBS within the relational tables themselves (ie: varbinary and varchar(max) columns). Of course SQL Server has many newer features for integrating bulk storage with relational storage at the logical level, and that's all good.

    4.  Schema-less data like JSON or entity-attribute-value. Yes, you can model this in a relational database, but other database engines like MongoDB and Riak can do it better, because that's what they're designed to do. Again, SQL Server can seamlessly integrate relational tables with these other data stores. What I'm saying is don't attempt to shoehorn these things into row-store tables.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Steve Jones - SSC Editor

    SSC Guru

    Points: 720887

    Tom Gillies - Monday, July 10, 2017 1:44 AM

    Sean Redmond - Monday, July 10, 2017 12:34 AM

    I have two points:

     1. I wish that developers (at least, our developers anyway) would regard the DB as something less than a place to dump data and as something worth learning. I've had developers tell me that they don't care about the database and they express their contempt for the database with Entity Framework and careless datatype choices. All text-fields are varchar(max), all dates are datetime2(7) and GUIDs are present in every table just in case it may need to be accessed by a webservice. With such an attitude towards DBs, I'm not surprised that DB-modelling is falling out of favour.

    2. I would be interested in more articles on DB-architecture.
    For example, I'd like to see articles on the suitability of long tables (tables with many rows) versus wide tables (tables with many columns). Which scenarios suit which table type best?
    Or articles on archiving: the removal of cold data so that it is still accessible but that it doesn't slow the access of hot data.
    Or articles on scaling out databases with the likes of AlwaysOn, replication, log-shipping and clustering — how feasible is a DB-farm with, say, 20 servers that are read from and one server that is written to with the the DBs that are read from being copies (as appropriate to the technology) of the one DB that is written to?
    Or something fundamental about the position of data at the various levels in the data hierarchy. Say I have public data like (address and postcode data) and it is used by many, many different databases, what are the pros and cons to having the data within each table (no normalisation), as a separate table within each database (normalised but any change in the data must be made to all databases), as a replicated table from one master (normalisable, one master but the joys of replication), as a table in a separate database accessed by a linked-server (no FKs etc).
    Or the pros and cons of replacing a table with a very small number of elements unlikely to change soon (like the sex of a person) with hardcoded values and check constraints (so, 'F' instead of 0 and the removal of the person.sex table and the foreign key that goes with it). Heresy in the eyes of normalisation, one must remember the check constraints and all that.

    I won't comment on "1)" 😉

    As for "2)" I'd be interested in all the topics Sean mentions.

    In general, I think I would be interested in topics about how to get from "Conceptual or Logical" to "Physical". What are the things that should be considered, and how should changes in parameters influence design? I realise that a lot of this is well and truly in "it depends" territory - that's what would make it interesting. There will not be simple answers, only factors to consider and heuristics.

    I'd be interseted in these. Who wants to write some?

Viewing 7 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply