I noticed the author used an ID column of IDENTITY as the PK in every table. This appears to be a best practice based on what i've read online. Can anyone point out when one would not use this a PK in their table? In other words, is there an exception to this rule?
Thanks for posting the article; I found it informative.
For an example where I personally would not use an identity primary key, imagine building a basic sports database.
First we'll create a Player table. An identity primary key is perfect here, since it will likely participate in many foreign key relationships as well and there's no obvious natural key. Similarly, you may have a Schedule table, where each game is listed, also with an identity GameId primary key that's frequently used. The Schedule might have a natural composite key you could use, such as GameDateTime + HomeTeam, but since GameDateTime can change, this isn't a great PK choice. (Also, if you go deep enough into the weeds, it is possible for the same team to play two games at the same time in split-squad exhibition matches. There's always another edge case in sports!)
But now, we want to track performance of the players. We could create a Game_Stat table which includes the statistics for each player in each game. A natural key here would be a composite of PlayerId + GameId, and I would choose this for my primary key. You could create a GameStatId identity column for your primary key (and I'm sure many people would), but the reason I wouldn't is because I would never use it. When querying statistics from that table, I would typically expect to be querying for specific PlayerId's or GameId's, and would be sure to index both of those columns. And one of them would likely make a good clustered index as well so that the data is organized in an order useful for at least some of my queries. I would never refer to the data by the GameStatId, so why add an extra 4 bytes to every row if it is never used? It won't be used in queries or foreign keys, it doesn't make a good clustered index for my use cases, and rather than preventing duplicate data it might increase the chances of duplicate data if you forget to make a unique index/constraint on PlayerId+GameId.
All that being said, now imagine someone says "Hey, we want the ability to note any records or significant milestones reached with a player's performance in a specific game." There are lots of ways this might be accomplished, but one possibility would be to create a new table with a foreign key relationship to the Game_Stat table. All of a sudden, it might make sense again to have an identity key in that Game_Stat table to make the foreign key relationship a bit simpler. So in conclusion, as with most things in SQL, it depends and YMMV.