This article is one of the more worse ones because it is dealing with recommendations which - concerning the original requirements of the database - are nonsence:
"because if you don’t have a primary key / clustered index and your table does inserts, updates and deletes, SQL Server will fragment the table, and it will take up a lot of extra space"
This is not true because Microsoft SQL Server will scan the PFS for free space. If the percentage value in the PFS is sufficient the record will be inserted on a arbitrary page which has enough space (see my article in TECHNET WIKI here: http://social.technet.microsoft.com/wiki/contents/articles/21877.sql-server-how-does-sql-server-allocate-space-in-a-heap.aspx)
Most important sentence concerning the reason behind using a heap seems to be that one:
"This database was used as a staging area to load data into the live database."
A heap can much faster load data than a clustered index. As long as it is not used for consolidation but only staging it would be the best choice. But that is an assumption which may fail if I know the workload 🙂
"In general, if your table is doing inserts, update and deletes, it is a bad idea to have a table without a clustered index."
NO, that is nonsense - a heap will ALWAYS better perform than a clustered index because of the following reasons:
in a heap you won't create "hot spots" if you clustered key is a contigious one all INSERT will be concentrate at the end of the table!
in a heap you won't have fragmentation when you INSERT data.
AND - due to the fact that we doesn't know the meta data of the table...
- will it be fragmented if the data are only fixed lenght numerics?
- will it be fragmented if the table is using (N)CHAR instead of (N)varchar?
NO - it won't because the fixed length elements cannot be expand because - e.g. strings - will be filled with 0x20 (blank)
The autor is quite often mixing PK and CI. In one sentence he is talking about a PK (which is a constraint!) and in the next sentence about a CI (which means physical allocation for data).
"So I altered each table and made the existing identity column the primary key / clustered index. The effect was the total size of the database dropped significantly. Here is an example of one table"
Aha - and than you have reduced fragmentation, rebuild the table (and ALL other indexes) and the log has bloated!
The described values are absolute normal and have NOTHING to do with any benefit of a CI (not to mention the PK).
You describe it like a witchdoctor who wants to sell some kind of "ghost medicine". If you would have checked the sys.dm_db_index_physical_stats for the index = 0 i bet you have had thousands of forwarded records. If you don't know what it is you can get some info here: http://www.sqlskills.com/blogs/paul/forwarding-and-forwarded-records-and-the-back-pointer-size/
"After a primary key was added the data space and the unused space dropped and the index space used didn’t go up that much."
WTH! It is not because of the PK but of the creation of the Clustered Index.
"I hope this article has shown the importance of adding primary keys to large tables to reduce the overall size of the database. "
No - it didn't. This article may confuse a novice when reading this stuff.
I would recommend to the author:
- read the article again
- review it
- rewrite some passages of it to get rid of the conflicts in this article!
Microsoft Certified Master: SQL Server 2008
MVP - Data Platform (2013 - ...)
my blog: http://www.sqlmaster.de (german only!)