"Select columns that are unique such as an IDENTITY column or contain many distinct values." ???
Really? Does the fact that a column is an identity column really qualify it for an index if the identity is never used in a query? This is an honest question, NOT sarcasm. I certainly have a lot to learn about indexes still. But I ask this because I recently spent some time with a customer that was looking to upgrade his storage because of what he perceived were disk contention issues. I came in with an eye to look at tuning the SQL environment before recommending any hardware or software upgrades and I found that he had followed the advice of a SQL blog that recommended making all of his clustered indexes on the primary keys (which all happened to be IDENTITY columns), none of which were ever mentioned in queries to the database, which I know might be unusual, but this data was only used for 2 import processes into what was essentially a mostly read environment (I know that is not normal and we are discussing warehousing) which never called these specific ID columns after import.
The point I am making is that we need to be careful about the reasons we choose columns to index on and avoid focusing on a SINGLE criteria, like "it's an IDENTITY column". These statements should have caveats and qualifiers, like "index on an IDENTITY column or unique column IF you use it as part of a query. If you never or rarely query against a column, you may never see a benefit, maybe only add overhead for no good reason". The result of removing these indexes from these ID columns and only adding a couple of indexes on data that was actually queried resulted in major performance improvements and refocused the equipment purchase.
And a statement like ...
"Conceptually, one can consider an index to be a data structure containing pairs ((a1,...,an), b), where (a1,c,an) is the tuple of values for the attributes A1,c,An and b is the address of the tuples in r with these values, that is, a pointer to the set ,c, " (Mannila & Raiha, 1992)"
...is not very helpful in a level 100 post for "a junior DBA". 🙂
And if you want to describe the use of B-trees, maybe explaining what a B-tree is would be helpful. 🙂 Even though most of us who post and read here may have a strong grasp, remember the intended audience.
But thanks for bringing it up. Indexes are a very important part of what we do BUT let me add, STATISTICS maybe more important. For level 300 or 400 people you might enjoy this from the MCM series:
I highly recommend even junior DBA's review some of these videos. They give insight into some topics that may be a little too advanced, but anyone of any level can get some good info.
Thanks for the post Ron! 😀
Microsoft Certified ...(insert many literal strings here)
Microsoft Design Architect with Alexander Open Systems