Is the AVG_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM first step 0 or 1

Question

Is the AVG_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM first step 0 or 1

mssqlnoob

Ten Centuries

Points: 1245
More actions
November 14, 2013 at 12:09 am

#301101

Hi,
with reference to http://technet.microsoft.com/en-us/library/ms174384.aspx
B. Specifying the HISTOGRAM option
when I execute the following code
USE AdventureWorks2012;
GO
DBCC SHOW_STATISTICS ("Person.Address", AK_Address_rowguid) WITH HISTOGRAM;
GO
RANGE_HI_KEYRANGE_ROWSEQ_ROWSDISTINCT_RANGE_ROWSAVG_RANGE_ROWS
AC3973FF-355C-47B6-BD71-000E1B6F2C020101
9534050D-1451-46AB-8A48-0137E7BD3997941941
7CCBD34E-3BB7-48C5-AB2E-02472716B5E3611611
34E2CD1E-FA81-4BF4-B9EB-02B4AFF6E3D1411411
B0F69A8C-1C4E-4A41-A710-0BB1865D85AA60716071
it informs me that average_range_rows is 1 yet
Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.
could any one clarified whether the AVR_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM should be 0 or 1.
thanks

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply

Jason Wolfkill SSCrazy Eights Points: 9832 More actions · Answer 1

mssqlnoob (11/14/2013)
Hi,
with reference to http://technet.microsoft.com/en-us/library/ms174384.aspx
B. Specifying the HISTOGRAM option
when I execute the following code
USE AdventureWorks2012;
GO
DBCC SHOW_STATISTICS ("Person.Address", AK_Address_rowguid) WITH HISTOGRAM;
GO
RANGE_HI_KEYRANGE_ROWSEQ_ROWSDISTINCT_RANGE_ROWSAVG_RANGE_ROWS
AC3973FF-355C-47B6-BD71-000E1B6F2C020101
9534050D-1451-46AB-8A48-0137E7BD3997941941
7CCBD34E-3BB7-48C5-AB2E-02472716B5E3611611
34E2CD1E-FA81-4BF4-B9EB-02B4AFF6E3D1411411
B0F69A8C-1C4E-4A41-A710-0BB1865D85AA60716071
it informs me that average_range_rows is 1 yet
Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.
could any one clarified whether the AVR_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM should be 0 or 1.
thanks

I agree that the BOL page says that AVG_RANGE_ROWS for the first histogram step will always be 0 but that in actual use it seems that the value is always 1. It appears that this may be a mistake in BOL, but it's one that doesn't really matter because the AVG_RANGE_ROWS value for the first step of the histogram is meaningless (in the sense that it doesn't provide any information that could be useful to the query optimizer) anyway.

I haven't found where this is documented (if at all), but the RANGE_HI_KEY of the first step of the histogram is the first key value in the set. RANGE_ROWS and DISTINCT_RANGE_ROWS refer to the number of rows/key values in the range exclusive of the RANGE_HI_KEY value, so RANGE_ROWS and DISTINCT_RANGE_ROWS for the first step would always be 0 (there are no rows with a key value lower than the RANGE_HI_KEY value).

Since AVG_RANGE_ROWS = RANGE_ROWS/DISTINCT_RANGE_ROWS, the AVG_RANGE_ROWS for the first step of the histogram would actually be 0/0, or undefined. I suspect that Microsoft had to put an integer in the AVG_RANGE_ROWS column for the first step of the histogram so somebody told the technical writers who produced BOL that the value would be 0 but then made it 1.

It makes sense both that the RANGE_HI_KEY value for the first step of the histogram is always the first key value and that the query optimizer would never need to look at the AVG_RANGE_ROWS of the first step. Making the RANGE_HI_KEY value for the first step equal to the first key value allows the optimizer to definitively determine the lowest key value, so that if you query the table for rows where the key value is less than the RANGE_HI_KEY for the first step, the optimizer knows that there will be no such rows (notice also that the RANGE_HI_KEY of the last step of the histogram will be the highest key value). The optimizer only needs to use the AVG_RANGE_ROWS value for queries on key values in the range that are less than the RANGE_HI_KEY value. There are no such rows for the first step of the histogram, so the optimizer will never need to refer to the AVG_RANGE_ROWS for the first step and therefore it really doesn't matter what value is given there.

Jason Wolfkill