Without going out and buying yet another book on SQL Server 2005 by Kalen Delaney, can any one provide me with a good explaination of statistics and how the work and are used by the query engine without going into a lot of detail?
Here's my brief answer...
Statistics are simply data SQL Server collects about a column's data selectivity, uniqueness and counts of particular values in that column. Stats exist on every index, however stats can also exist on columns that are not indexed.
Along these lines, how much effort should I put in to understand statistics? Would getting Inside Microsoft SQL Server 2005: Query Tuning and Optimization be a worthwhile investment? Just curious as I have invested a lot in a variety of SQL Server 2005 books over the last few months.
If you plan to be a DBA for a significant portion of time, invest the time in learning about query tuning. I would recommend you spare the week to take a class if you can get your company to pay for it.
It will do two things that are really important. First, it will teach you how to performance tune and troubleshoot slow queries in your databases. Second, it will give you a much better understanding of what SQL Server is doing with the query you have passed it so you can more intelligently make decisions on what to do and what not to do.
You may never specifically use performance tuning and query optimization, but understanding how SQL creates execution plans will be a great help in database design if you have to build anything new.
I think I have a pretty good understanding about building and tuning queries, and the use and creating of indexes. What I am not sure about is statistics. In working on slow performance of our PeopleSoft Finance system after upgrading to SQL Server 2005, I ran an hour long trace and used that with DTA. It recommended we build 35 indexes and a lot more multi-column statistics. We have implemented those recommendations and have seen a significant improvement in performance. I can understand how the indexes help, but how do the statistics? If you are doing this manually, how do you decide if a statistic is more appropriate than index? Without going into a 3 to 30 page explaination, how are statistics used to improve query performance?
Or should I invest in the book I mentioned earlier by Karen Delaney?
Also, I work for a school district, training funds are very sparse.
In my limited understanding : statistics are used by the optimiser to determine the most efficient query plan, including which indexes to use and how. Using an index to identify rows might be extremely efficient if it's a unique index on a large table, but extremely inefficient if there is only a small number of distinct values in that index. On non-indexed fields the stats presumably help the optimiser determine what kinds of join are appropriate and which order to process each element in the plan.
If you haven't done so already, take a look at the properties for some of the stats in your tables - it helps to visualise the kind of information they represent.
I think statistics are definitely one of the least-understood aspects of performance tuning. Not sure about the book as I haven't read it, but I would expect there to be some blogs about stats somewhere.
Based on that, would you say it is worth learning more about statistics? I am just curious how much time and effort I should put into some of this or if I should just "put my faith in Microsoft" and that DTA knows what's best.
I think it's handy to understand if you have the time and inclination, but for the most part you don't need to know it in detail - just knowing that they are beneficial is probably sufficient. Having said that, it's another tool to use in performance tuning and I have occasionally made queries run faster due to that knowledge. If you just want to get your toes wet, knowing when statistics become out of date is useful.
Here's a document I found:
I am just curious how much time and effort I should put into some of this or if I should just "put my faith in Microsoft" and that DTA knows what's best.
The DTA does not always know best. I've found that it tends to go overboard.
When I use it (which isn't often) what I'll generally do is implement its suggestions 1 by 1 and test the performance improvement, then I implement the ones that have the most impact.
We walk in the dark places no others will enterWe stand on the bridge and no one may pass