with reference to http://technet.microsoft.com/en-us/library/ms174384.aspx
B. Specifying the HISTOGRAM option
when I execute the following code
DBCC SHOW_STATISTICS ("Person.Address", AK_Address_rowguid) WITH HISTOGRAM;
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
AC3973FF-355C-47B6-BD71-000E1B6F2C02 0 1 0 1
9534050D-1451-46AB-8A48-0137E7BD3997 94 1 94 1
7CCBD34E-3BB7-48C5-AB2E-02472716B5E3 61 1 61 1
34E2CD1E-FA81-4BF4-B9EB-02B4AFF6E3D1 41 1 41 1
B0F69A8C-1C4E-4A41-A710-0BB1865D85AA 607 1 607 1
it informs me that average_range_rows is 1 yet
Solid area left of RANGE_HI_KEY represents the range of column values and the average number of times each column value occurs (AVG_RANGE_ROWS). The AVG_RANGE_ROWS for the first histogram step is always 0.
could any one clarified whether the AVR_RANGE_ROWS in DBCC SHOW_STATISTICS WITH HISTOGRAM should be 0 or 1.
I agree that the BOL page says that AVG_RANGE_ROWS for the first histogram step will always be 0 but that in actual use it seems that the value is always 1. It appears that this may be a mistake in BOL, but it's one that doesn't really matter because the AVG_RANGE_ROWS value for the first step of the histogram is meaningless (in the sense that it doesn't provide any information that could be useful to the query optimizer) anyway.
I haven't found where this is documented (if at all), but the RANGE_HI_KEY of the first step of the histogram is the first key value in the set. RANGE_ROWS and DISTINCT_RANGE_ROWS refer to the number of rows/key values in the range exclusive of the RANGE_HI_KEY value, so RANGE_ROWS and DISTINCT_RANGE_ROWS for the first step would always be 0 (there are no rows with a key value lower than the RANGE_HI_KEY value).
Since AVG_RANGE_ROWS = RANGE_ROWS/DISTINCT_RANGE_ROWS, the AVG_RANGE_ROWS for the first step of the histogram would actually be 0/0, or undefined. I suspect that Microsoft had to put an integer in the AVG_RANGE_ROWS column for the first step of the histogram so somebody told the technical writers who produced BOL that the value would be 0 but then made it 1.
It makes sense both that the RANGE_HI_KEY value for the first step of the histogram is always the first key value and that the query optimizer would never need to look at the AVG_RANGE_ROWS of the first step. Making the RANGE_HI_KEY value for the first step equal to the first key value allows the optimizer to definitively determine the lowest key value, so that if you query the table for rows where the key value is less than the RANGE_HI_KEY for the first step, the optimizer knows that there will be no such rows (notice also that the RANGE_HI_KEY of the last step of the histogram will be the highest key value). The optimizer only needs to use the AVG_RANGE_ROWS value for queries on key values in the range that are less than the RANGE_HI_KEY value. There are no such rows for the first step of the histogram, so the optimizer will never need to refer to the AVG_RANGE_ROWS for the first step and therefore it really doesn't matter what value is given there.