Find the number of rows for a value, without querying the table

Nagaraj Venkatesan, 2017-05-11

Dear all,

Being a DBA, developers approach me with questions like can you run this query for me in production?

Select *

FROM

[dbo].[FactInternetSales]

WHERE UnitPrice = 32.6

Lets, say the table contains few billion rows and UnitPrice is not an indexed column, then query would take ages to provide the result. Not just that, it causes a massive I/O on a busy production database.  

For the above situation, I would usually ask the developer, “Would you need a accurate value or is it ok if it is approximate?”. If approximate numbers are fairly sufficient then, one can answer these type of questions without actually querying the table. How?

Simple – STATISTICS is the word 🙂

To find the statistic that will be useful for the query, please use the following script and provide the table name and column name. The script provides the statistic name we should be checking.

Declare @table_name varchar(100)

Declare @Column_name varchar(100)

SET @table_name = ‘FactInternetSales’

SET @Column_name = ‘UnitPrice’

SELECT OBJECT_NAME(s.object_id) AS object_name, 

    COL_NAME(sc.object_id, sc.column_id) AS column_name, 

    s.name AS statistics_name 

FROM sys.stats AS
JOIN sys.stats_columns AS sc 

    ON s.stats_id = sc.stats_id 
AND s.object_id = sc.object_id 

WHERE  
OBJECT_NAME(s.object_id) like @table_name

and    
COL_NAME(sc.object_id, sc.column_id) like @Column_name

ORDER BY s.name;

After finding the statistic name, just go to the table, expand “Statistics” and double click on the stat that you are interested in. The click on the details section, find the value interested in on “Range_HI_Key” Column and observe the “EQ_Rows” section to get the approximate number of rows. 



Most of you would know that statistics can be used to find query estimates but few (I guess) would use it operationally for these kind of requests. One can also use them to estimate for queries of this type too

Select count(*), 
ProductKey 
from [FactInternetSales]
Group by ProductKey

Select count(*) from

[FactInternetSales]

WHERE UnitPrice > 40  — ( Possible but can get little tricky at times, more of it on upcoming posts 🙂 )

Few quick pointers:

The above method relies on the following prerequisites

* Either Auto create Stats should be turned on or the column should be the leading column of any index for the statistic to be present. Auto Create Stats are on by default

* For the stat to be reasonably accurate, one needs to have Auto Update Statistics on ( which is also “on” by default )

* Automatically created statistics are named like ‘_WA%’. For a column to have a auto created stat, the column should have been queried at least once since database creation. Indexed column would not need auto created statistics

* To figure out when was the statistic last updated, use the following query 

SELECT 

sp.stats_id, name, filter_definition, last_updated, rows,

rows_sampled, steps, unfiltered_rows, modification_counter  

FROM sys.stats AS stat  

CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp 

WHERE stat.object_id = object_id(‘people_Data’)

and name like ‘_WA_Sys_0000000D_1273C1CD’;

If the value you are looking for doesn’t appear as a “Range_HI_Key” on the histogram then refer to the next biggest value on the “Range_Hi_Key” and observe “Avg_Range_Rows” for the estimates. Please refer to this post http://strictlysql.blogspot.sg/2016/01/sql-server-statistics-histograms.html for interpreting histograms.
There is always more to statistics. Will cover them in upcoming posts.


Original post (opens in new tab)

Rate

Share

Share

Rate

Find the number of rows for a value, without querying the table

Nagaraj Venkatesan, 2017-05-24 (first published: 2017-05-11)

Dear all,

Being a DBA, developers approach me with questions like can you run this query for me in production?

Select *

FROM

[dbo].[FactInternetSales]

WHERE UnitPrice = 32.6

Lets, say the table contains few billion rows and UnitPrice is not an indexed column, then query would take ages to provide the result. Not just that, it causes a massive I/O on a busy production database.  

For the above situation, I would usually ask the developer, “Would you need a accurate value or is it ok if it is approximate?”. If approximate numbers are fairly sufficient then, one can answer these type of questions without actually querying the table. How?

Simple – STATISTICS is the word 🙂

To find the statistic that will be useful for the query, please use the following script and provide the table name and column name. The script provides the statistic name we should be checking.

Declare @table_name varchar(100)

Declare @Column_name varchar(100)

SET @table_name = ‘FactInternetSales’

SET @Column_name = ‘UnitPrice’

SELECT OBJECT_NAME(s.object_id) AS object_name, 

    COL_NAME(sc.object_id, sc.column_id) AS column_name, 

    s.name AS statistics_name 

FROM sys.stats AS
JOIN sys.stats_columns AS sc 

    ON s.stats_id = sc.stats_id 
AND s.object_id = sc.object_id 

WHERE  
OBJECT_NAME(s.object_id) like @table_name

and    
COL_NAME(sc.object_id, sc.column_id) like @Column_name

ORDER BY s.name;

After finding the statistic name, just go to the table, expand “Statistics” and double click on the stat that you are interested in. The click on the details section, find the value interested in on “Range_HI_Key” Column and observe the “EQ_Rows” section to get the approximate number of rows. 



Most of you would know that statistics can be used to find query estimates but few (I guess) would use it operationally for these kind of requests. One can also use them to estimate for queries of this type too

Select count(*), 
ProductKey 
from [FactInternetSales]
Group by ProductKey

Select count(*) from

[FactInternetSales]

WHERE UnitPrice > 40  — ( Possible but can get little tricky at times, more of it on upcoming posts 🙂 )

Few quick pointers:

The above method relies on the following prerequisites

* Either Auto create Stats should be turned on or the column should be the leading column of any index for the statistic to be present. Auto Create Stats are on by default

* For the stat to be reasonably accurate, one needs to have Auto Update Statistics on ( which is also “on” by default )

* Automatically created statistics are named like ‘_WA%’. For a column to have a auto created stat, the column should have been queried at least once since database creation. Indexed column would not need auto created statistics

* To figure out when was the statistic last updated, use the following query 

SELECT 

sp.stats_id, name, filter_definition, last_updated, rows,

rows_sampled, steps, unfiltered_rows, modification_counter  

FROM sys.stats AS stat  

CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp 

WHERE stat.object_id = object_id(‘people_Data’)

and name like ‘_WA_Sys_0000000D_1273C1CD’;

If the value you are looking for doesn’t appear as a “Range_HI_Key” on the histogram then refer to the next biggest value on the “Range_Hi_Key” and observe “Avg_Range_Rows” for the estimates. Please refer to this post http://strictlysql.blogspot.sg/2016/01/sql-server-statistics-histograms.html for interpreting histograms.
There is always more to statistics. Will cover them in upcoming posts.


Rate

Share

Share

Rate

Related content

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

Robert Davis

2009-02-23

1,567 reads

Networking – Part 4

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I’d like to talk about social networking. We’ll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let…

Andy Warren

2009-02-17

1,530 reads

Speaking at Community Events – More Thoughts

Last week I posted Speaking at Community Events – Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I’ve got a few more thoughts on the topic this week, and I look forward to your comments.

Andy Warren

2009-02-13

360 reads