Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Fastest row counting method

SELECT COUNT(*) is most common method (and exact) how to find out how many records is in table. There is also another method which is not exact but is way faster especially when you are expecting zillion of rows to be counted. This method is based on fact that SQL Server internally tracks how many rows is maintaned by each index.

There are two situations in table which are important for what I am talking about:

  • Clustered index is present on table
  • Clustered index not present on table, records are “organized” in heap. Other non-clustered indexes doesn’t affect heap, they only point to records in the heap.

For both situations SQL Server knows how many records are contained either in the clustered index or in the heap. This information is held in sys.indexes system table and maintained by updating statistics. This is the reason why number of rows in sys.indexes table is not guaranteed – if statistics are not updated by any reason, this number is not actual.

Here’s code:

USE AdventureWorks

DECLARE @table VARCHAR(250)
SET @table = 'Person.Address' -- name of your table

SELECT rowcnt
FROM sys.sysindexes
WHERE id = OBJECT_ID(@table)
      AND (indid = 0 OR indid = 1) -- clustered index or heap

Comments

Posted by kalivaraprasadm on 28 July 2011

I Can't Understand clearly between select count(*) and Here's code. I Think there is difference between two of them. Let me explain briefly

Posted by prabhat.mohi on 29 July 2011

not working fine, As i used both Queries count(*)& the given batch. Results are different.

Posted by Koen Verbeeck on 29 July 2011

Isn't sp_spaceused faster? It immediately returns results and places no locks.

Posted by dvorix on 29 July 2011

Another way (not my code):

select ps.row_count from sys.dm_db_partition_stats ps

join sys.tables t on ps.object_id = t.object_id

where t.name = 'TABLENAME' and ps.index_id in (0,1)

Posted by Jakub Dvorak on 29 July 2011

Sys.indexes table holds information about row count in each table (it's complicated, see description in post). If you have table where you expect many millions of rows, simple SELECT COUNT(*) may takes seconds because rows are physically counted. If you are ok with not EXACT row count, you can pull it simply from sys.indexes table which takes nothing, because that simple select. Row count in sys.indexes MAY not be exact because this table is maintained through statistics. Statistics are by the way key to building correct query optimizer decisions and that's why it's worth to let them update automatically. Further description of statistics is for different post.

I think that sp_spacedused procedure internally uses same method of getting row count. Only benefit I see is that you can get row count for more tables by method described in post while with sp_spaceused only for one.

Posted by Jakub Dvorak on 29 July 2011

Thanks! I didn't know about partition statistics view.

Posted by chris.stuart on 29 July 2011

this is my results

DECLARE @table VARCHAR(250)

SET @table = 'dbo.Fact_Transactions' -- name of your table

SELECT rowcnt

FROM sys.sysindexes

WHERE id = OBJECT_ID(@table)

     AND (indid = 0 OR indid = 1) -- clustered index or heap

--  445,723,809 rows

--  0:00 time

select ps.row_count from sys.dm_db_partition_stats ps

join sys.tables t on ps.object_id = t.object_id

where t.name = 'dbo.Fact_Transactions' and ps.index_id in (0,1)

--  0 Rows

--  Correct values

SELECT COUNT(*)

FROM Fact_Transactions

--  445,729,099 rows

--  2:14 count

Posted by Peter.Frissen on 29 July 2011

I use this code below (lists the number of records of all tables, but can be easily adjusted with an additional selection on tablename).

Advantage: Does not require 'update statistics', so the number of rows is up-to-date immediately.

Disadvantage: you should execute this query when no active transaction exists on the selected table(s), because the entries in sys.partitions do not adhere to any isolation levels (dirty reads).

select object_name(object_id) as Name,

sum(rows) as Rows

from sys.partitions

where objectproperty(object_id, 'IsUserTable') = 1

 and index_id < 2

group by object_name(object_id)

order by 1

Posted by Nakul Vachhrajani on 29 July 2011

You can find a comparison of all commonly known row-counting methods in my post dated April 04, 2011 - beyondrelational.com/.../measuring-the-number-of-rows-in-a-table-are-there-any-alternatives-to-count.aspx.

In this post, I attempt to compare the following methods in terms of Performance & Accuracy.

1. Using COUNT(*)

2. Using Catalog Views

3. Using DMVs

4. Using Aggregate functions

5. Using system stored procedures

Hope you enjoy reading my post also.

Thanks & Regards,

Nakul Vachhrajani

beyondrelational.com/.../default.aspx

Posted by Jakub Dvorak on 29 July 2011

Hi Nakul,

very good and rich post.

Thanks

Posted by imSQrLy on 29 July 2011

I don't use count(*), but i do use count(ClusteredIndexColumn) often with the "with (nolock" hint. This seems to work just fine and is much faster than count(*).

Posted by Dean Cochrane on 29 July 2011

imSQRLY: You're fooling yourself if you think COUNT(some column) is faster than COUNT(*). Both queries will resolve to exactly the same plan, and take exactly the same amount of time.

Posted by Parthi on 29 July 2011

Hi,

DECLARE @table VARCHAR(250)

SET @table = 'Person.Address' -- name of your table

SELECT rowcnt

FROM sys.sysindexes

WHERE id = OBJECT_ID(@table)

     AND (indid = 0 OR indid = 1)

----------------------      

Select Count(*) from Person.Address

----------------------

For both of the queries there might be difference since the statistic might not be updated.

If you UPDATE STATISTICS and then you see the count then both the counts will be same.Because statistic has been  updated now, so the counts will be equal.

Thanks!

Posted by Jakub Dvorak on 29 July 2011

Agree with Dean. There is no difference between COUNT(*) and COUNT(1) for SQL Server query optimizer - Same query plan is built (COUNT (1) may include some scalar operation). However WITH NOLOCK hint may speed things a lot especially if you are in highly concurrent environment.

SELECT COUNT(*) statement place shared locks to pages/extent/rows and if exclusive locks (caused by e.g. update) are placed on some rows, your SELECT COUNT(*) statement must wait. WITH NOLOCK query hint tells engine to not to place even shared locks on rows so processing may be faster - your select don't have to wait until other locks are released. On the other hand, selects WITH NOLOCK "sees" also uncommited data (from currently opened transactions), which is bad if you want to get exact number of rows. It means that row count returned from WITH NOLOCK select is not guaranteed - opened transactions which was included in you row count may be rolled back.

Posted by John Fuex on 29 July 2011

I'm not crazy about this method for a number of reasons.

a) It isn't guaranteed to be accurate (should be a deal killer)

b) It involves code that elevates trickery over understandibility.

c) SQL is a declarative language. Let it do its job by describing what you want as transparently as possible and let it find the fastest way. If there were a far more efficient way to do such a common operation, don't you think the query optimizer would implement it? In fact, I seem to recall that it DOES use the statistics for things like this whenever it knows it can count on them.

Posted by Dean Cochrane on 29 July 2011

I just did a bit of testing/reading, and found some interesting results. BO says that the rowcounts in the DMVs are accurate - they are not based on stats. The testing I just did confirms this. On a table with reasonably high activity, all counts using count(*), count(*) with NOLOCK, sum(row_count) from sys.dm_db_partition_stats, sum(rows) from sys.partitions, and sp_spaceused all returned exactly the same number when run in a single batch.

sp_spaceused sums the count from sys.dm_db_partion_stats to get its rowcount.

All of the methods alleged to be faster were about an order of magnitude faster than count(*) from table (nolock) - this was on a table with about 1.7 million rows. I will try with a much larger table and see what happens.

This all applies to SQL Server 2005+.

Posted by Dean Cochrane on 29 July 2011

Ok, on a static (gets updated once per day) partitioned table with 1008714491 rows on a very busy server, the COUNT(*) methods took a varying amount of time hovering around 300000 ms. I didn't notice any difference between them even though one had the NOLOCK hint on - any efficiency realized by not placing/honoring locks is minor compared to the huge overall cost of the query.

The various DMV methods were at least 5 orders of magnitude faster - this is as expected. Even on this large table on a busy server, they all completed in under 10 ms, most in 2 or 3.

All returned exactly the same rowcount, and that table has just had 450,000 rows inserted and stats have not been updated. That will happen a bit later today.

Posted by Jakub Dvorak on 29 July 2011

To john.fuex: This post is of course about alternate, but still supported ways. It is not anything magic, just out of the box approach.

Posted by Jakub Dvorak on 29 July 2011

To Dean: This is billion rows????? Never seen such big table... Anyway, I will read something about that pretty DMV. To nolock - there must be a difference - shared lock (eg SELECT) and exclusiive locks (eg UPDATE) are mutual  exclusive hence every select requiring  access to row locked by shared lock must wait until exclusive lock is released. If you see no difference, there are probably less intensive exclusive lock traffic.

Posted by Dean Cochrane on 29 July 2011

Yes, 1 B rows, provides a good test of rowcount performance.

I agree, there would probably be a larger difference if there was any insert/update activity happening on that table at the time I ran the query, and the difference would be the amount of time the query was blocked.

I have not tested any of with an open transaction holding X locks on a table - perhaps I will find time to do that. The results are sure to be interesting.

Posted by Jakub Dvorak on 29 July 2011

Dean can you please drop me your contact on sqltreeo.com (there's contact form)? It's always worth to have contact for DBA sitting on billion rows. :) thanks

Posted by karl 93387 on 29 July 2011

Maybe it's from my Oracle days but I've also used:

  select count(1) from table

Is there a performance difference between count(*) and count(1)?

Posted by Jakub Dvorak on 29 July 2011

No, there's no performance difference. Query plan is same for count(*) and count(1).

Leave a Comment

Please register or log in to leave a comment.