Alex is my long-time DB mentor/guru. I thought his feedback on this might be useful for others.
I'll try to elaborate on this one...
As defined in MSDN - When statistics are created, the Database Engine sorts the values of the columns on which the statistics are being built and creates a histogram based on up to 200 of these values, separated by intervals. The histogram specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.
Without getting into too much detail about the above, or about what actually triggers an auto-update of statistics (which is not necessarily time-based), the key point is that statistics do not carry the overhead of indexes, since only a statistical sampling of the column values is kept. As MSDN puts it, "... sometimes it is more efficient to use column statistics instead of an index to optimize query performance..."
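To see what the engine actually keeps, you can inspect a statistics object directly. A minimal sketch, where the table `dbo.Orders` and column `OrderDate` are hypothetical names of my own, not from the article:

```sql
-- Create a single-column statistics object explicitly so it has a known name.
CREATE STATISTICS Stat_OrderDate ON dbo.Orders (OrderDate);

-- Show the header, the density vector, and the up-to-200-step histogram.
DBCC SHOW_STATISTICS ('dbo.Orders', Stat_OrderDate);

-- Or just the histogram steps (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, ...):
DBCC SHOW_STATISTICS ('dbo.Orders', Stat_OrderDate) WITH HISTOGRAM;
```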
Another important point is that auto-created statistics on non-indexed columns, as well as the statistics behind composite indexes, build their histogram ONLY on the first column, which may not produce the most optimal execution plan with respect to the remaining columns. This is where composite indexes really don't help at all, even if rebuilt from scratch. Here's what MSDN shows as an example:
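As a sketch of that limitation (table and column names are hypothetical), you can verify it by inspecting the statistics object that a composite index creates:

```sql
-- Assumed: a table dbo.Orders with columns a and b.
CREATE INDEX IX_Orders_a_b ON dbo.Orders (a, b);

-- The index's statistics object shares the index name. The histogram section
-- of the output describes the distribution of column a only; column b shows
-- up in the density vector but gets no histogram steps of its own.
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_a_b);
```

So a predicate on `b` alone has no histogram to draw on from this index, which is the gap manual multi-column statistics are meant to narrow.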
Creating statistics manually lets you create statistics that contain multiple column densities. These are the average number of duplicates for the combination of columns. For example, a query contains the clause WHERE a = 7 AND b = 9.
Creating manual statistics on both columns together (a, b) could allow the Database Engine to make a better estimate for the query, because the statistics also contain the average number of distinct values for the combination of columns a and b.
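The MSDN example above can be sketched like this (the table and statistics names are my own, not from the article):

```sql
-- Multi-column statistics on (a, b): the histogram is still built on a,
-- but the density vector now includes the combined density of (a, b),
-- which helps cardinality estimates for WHERE a = 7 AND b = 9.
CREATE STATISTICS Stat_a_b ON dbo.Orders (a, b);

-- Optionally scan the whole table instead of sampling:
CREATE STATISTICS Stat_a_b_full ON dbo.Orders (a, b) WITH FULLSCAN;
```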
So the bottom line is that manually created statistics can benefit queries that filter on multiple columns together. In other words, even if all the columns are indexed, separately or as a composite, that structure may not produce as efficient an execution plan as manually created multi-column statistics. As with many other situations, though, there is no single solution for all cases, and one has to run controlled experiments to see what works best in each situation.
Hopefully I was able to clarify a bit 🙂
Thanks a lot,