• I think this is an important post for every DBA to read. I have been a DBA for almost 30 years (started way back in the ancient days on the mainframe). DBAs are notorious for thinking that averages are as reliable as moral codes chiseled in stone. One distribution that didn't get mentioned in the article is the bimodal distribution: your distribution diagram looks like a two-humped camel. If you have data like this, the average lands in the valley between the two humps, where almost none of the actual values live, and it is about as meaningful as the meeting minutes from the local Liars' Club.
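
    You can see it for yourself with a toy table. This is just a sketch in PostgreSQL syntax, and the table and column names are made up for illustration:

        CREATE TABLE response_times (ms integer);

        -- One hump around 10 ms (say, cache hits), another around 1000 ms (disk reads)
        INSERT INTO response_times
        SELECT 10 + (random() * 5)::int    FROM generate_series(1, 5000);
        INSERT INTO response_times
        SELECT 1000 + (random() * 50)::int FROM generate_series(1, 5000);

        -- The average comes out near 505 ms, a value almost no real request ever saw
        SELECT AVG(ms) FROM response_times;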

    We should all be grateful that the DBMS vendors have made the Optimizer aware of the statistics metadata kept in the system tables. For example, you might think that an index will fix a performance problem when searching on a column that has 100 unique values. The problem is, three of those values make up 95 percent of the data. Statistics help with exactly this kind of non-uniform distribution. If you are searching for one of the three common values, the Optimizer won't use the index (assuming your statistics are current), because scanning the table is cheaper than chasing most of its rows through index lookups. On the other hand, if you are searching for one of the rarely populated values, you get an enormous amount of filtering/selectivity and the index earns its keep.
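
    Here is a minimal sketch of that scenario, again in PostgreSQL syntax with made-up object names; the exact plans you get will depend on your DBMS and its cost model:

        -- 100 distinct status codes, but three of them cover ~95% of the rows
        CREATE TABLE orders (id bigint, status_code int);
        CREATE INDEX idx_orders_status ON orders (status_code);

        -- Refresh the statistics so the Optimizer knows about the skew
        ANALYZE orders;

        -- Common value: the planner will typically choose a sequential scan
        EXPLAIN SELECT * FROM orders WHERE status_code = 1;

        -- Rare value: the same predicate shape now favors the index
        EXPLAIN SELECT * FROM orders WHERE status_code = 87;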

    The rule here is to know your data, run reorgs/rebuilds on your indexes regularly, and update statistics religiously.
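
    In PostgreSQL terms that observance boils down to something like the lines below (object names are again hypothetical; other platforms have their own equivalents, such as UPDATE STATISTICS on SQL Server or RUNSTATS and REORG on DB2):

        -- Rebuild a bloated index, then refresh the table's statistics
        REINDEX INDEX idx_orders_status;
        ANALYZE orders;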