Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Is the AVG_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM first step 0 or 1 Expand / Collapse
Author
Message
Posted Thursday, November 14, 2013 12:09 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 6:50 PM
Points: 39, Visits: 141
Hi,

with reference to http://technet.microsoft.com/en-us/library/ms174384.aspx



B. Specifying the HISTOGRAM option


when I execute the following code

[code="sqlcode"]

USE AdventureWorks2012;
GO
DBCC SHOW_STATISTICS ("Person.Address", AK_Address_rowguid) WITH HISTOGRAM;
GO


RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
AC3973FF-355C-47B6-BD71-000E1B6F2C02 0 1 0 1
9534050D-1451-46AB-8A48-0137E7BD3997 94 1 94 1
7CCBD34E-3BB7-48C5-AB2E-02472716B5E3 61 1 61 1
34E2CD1E-FA81-4BF4-B9EB-02B4AFF6E3D1 41 1 41 1
B0F69A8C-1C4E-4A41-A710-0BB1865D85AA 607 1 607 1
[/code]

it informs me that average_range_rows is 1 yet


Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.



could any one clarified whether the AVR_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM should be 0 or 1.

thanks
Post #1514152
Posted Friday, November 15, 2013 9:28 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 9:45 AM
Points: 1,222, Visits: 2,546
mssqlnoob (11/14/2013)
Hi,

with reference to http://technet.microsoft.com/en-us/library/ms174384.aspx



B. Specifying the HISTOGRAM option


when I execute the following code

[code="sqlcode"]

USE AdventureWorks2012;
GO
DBCC SHOW_STATISTICS ("Person.Address", AK_Address_rowguid) WITH HISTOGRAM;
GO


RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
AC3973FF-355C-47B6-BD71-000E1B6F2C02 0 1 0 1
9534050D-1451-46AB-8A48-0137E7BD3997 94 1 94 1
7CCBD34E-3BB7-48C5-AB2E-02472716B5E3 61 1 61 1
34E2CD1E-FA81-4BF4-B9EB-02B4AFF6E3D1 41 1 41 1
B0F69A8C-1C4E-4A41-A710-0BB1865D85AA 607 1 607 1
[/code]

it informs me that average_range_rows is 1 yet


Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.



could any one clarified whether the AVR_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM should be 0 or 1.

thanks


I agree that the BOL page says that AVG_RANGE_ROWS for the first histogram step will always be 0 but that in actual use it seems that the value is always 1. It appears that this may be a mistake in BOL, but it's one that doesn't really matter because the AVG_RANGE_ROWS value for the first step of the histogram is meaningless (in the sense that it doesn't provide any information that could be useful to the query optimizer) anyway.

I haven't found where this is documented (if at all), but the RANGE_HI_KEY of the first step of the histogram is the first key value in the set. RANGE_ROWS and DISTINCT_RANGE_ROWS refer to the number of rows/key values in the range exclusive of the RANGE_HI_KEY value, so RANGE_ROWS and DISTINCT_RANGE_ROWS for the first step would always be 0 (there are no rows with a key value lower than the RANGE_HI_KEY value).

Since AVG_RANGE_ROWS = RANGE_ROWS/DISTINCT_RANGE_ROWS, the AVG_RANGE_ROWS for the first step of the histogram would actually be 0/0, or undefined. I suspect that Microsoft had to put an integer in the AVG_RANGE_ROWS column for the first step of the histogram so somebody told the technical writers who produced BOL that the value would be 0 but then made it 1.

It makes sense both that the RANGE_HI_KEY value for the first step of the histogram is always the first key value and that the query optimizer would never need to look at the AVG_RANGE_ROWS of the first step. Making the RANGE_HI_KEY value for the first step equal to the first key value allows the optimizer to definitively determine the lowest key value, so that if you query the table for rows where the key value is less than the RANGE_HI_KEY for the first step, the optimizer knows that there will be no such rows (notice also that the RANGE_HI_KEY of the last step of the histogram will be the highest key value). The optimizer only needs to use the AVG_RANGE_ROWS value for queries on key values in the range that are less than the RANGE_HI_KEY value. There are no such rows for the first step of the histogram, so the optimizer will never need to refer to the AVG_RANGE_ROWS for the first step and therefore it really doesn't matter what value is given there.


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1514777
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse