What's the best way to count?

Question

What's the best way to count?

Dave62

SSCertifiable

Points: 7459
More actions
October 18, 2010 at 8:23 pm

#228196

Comments posted to this topic are about the item What's the best way to count?

Viewing 15 posts - 1 through 15 (of 57 total)

You must be logged in to reply to this topic. Login to reply

Oleg Netchaev SSCertifiable Points: 5276 More actions · Answer 1

This is a very good question, thank you Dave. The key here is the index_id: if the table has a clustered index then there will be a record in the stats view with index_id = 1. If the table is a heap then it will have index_id = 0. There is no way to have both index_id 0 and 1, so index_id < 2 or like you have it index_id = 0 or index_id = 1 should suffice. Grouping is important because it possible to have partitioned table.

Oleg

malleswarareddy_m SSCertifiable Points: 5847 More actions · Answer 2

Good question.keep on posting this type of questions.

Malleswarareddy
I.T.Analyst
MCITP(70-451)

David.Poole SSC Guru Points: 75896 More actions · Answer 3

I can't see anything that says that the figure returned will be reliable. Fast yes, blindingly so. Reliable.....not so sure.

LinkedIn Profile

vk-kirov SSCertifiable Points: 7686 More actions · Answer 4

which query is the fastest and most reliable for returning an accurate record count?

(One of) the fastest? Yes. The most reliable? I don't think so, because that query counts both committed and uncommitted records, while the "SELECT COUNT(*)" statement counts only committed records.

Here is an example.

1) Create a table:

CREATE TABLE dbo.QOTD_TEST (ID INT);

2) Insert several records into the table, don't commit them:

BEGIN TRANSACTION;

INSERT dbo.QOTD_TEST VALUES(1);

INSERT dbo.QOTD_TEST VALUES(2);

INSERT dbo.QOTD_TEST VALUES(3);

3) In another query window (e.g. in another connection) run queries 2-4:

SELECT CONVERT(bigint, [rows])

FROM sys.sysindexes

WHERE id = OBJECT_ID('dbo.QOTD_TEST') AND indid < 2

SELECT CAST(p.[rows] AS float)

FROM sys.tables AS tbl

INNER JOIN sys.indexes AS idx ON idx.[object_id] = tbl.[object_id] And idx.index_id < 2

INNER JOIN sys.partitions AS p ON p.[object_id] = CAST(tbl.[object_id] AS int) And p.index_id=idx.index_id

WHERE ((tbl.name = N'QOTD_TEST' AND SCHEMA_NAME(tbl.schema_id) = 'dbo'));

SELECT SUM (row_count)

FROM sys.dm_db_partition_stats

WHERE object_id = OBJECT_ID('dbo.QOTD_TEST') AND (index_id=0 or index_id=1);

All these queries return the value 3.

4) After that, run the 1st query:

SELECT COUNT(*) FROM dbo.QOTD_TEST;

If the database READ_COMMITTED_SNAPSHOT option is set to ON, this query will immediately return the value 0. If READ_COMMITTED_SNAPSHOT = OFF, this query will wait until the locks are removed.

5) In 1st query window, roll back the transaction and drop the table:

ROLLBACK TRANSACTION;

DROP TABLE dbo.QOTD_TEST;

I think that the 1st query is the most reliable, while others are the fastest.

Bob JH Cullen SSCrazy Points: 2082 More actions · Answer 5

I chose #1 because all the others appeared to be referencing indexes, and there was nothing in the question that stated the table had an index.

Maybe I'm missing something here...?

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 6

I liked the question - allthough with the exception of option 1 on really large (or busy) tables, all options will be blindingly fast, and you won't run this type of query often enough that a few microseconds more or less really matters.

I gave the right answer, but not for the reasons listed. The question specifically asked for "fastest and most reliable", so I focused on those aspects of the queries.

Option 1 scans the complete table (or one of the indexes). Absolutely reliable, but not fast. This answer can only be correct if all others are not.

Option 3 is not reliable. The Books Online entry for sys.partitions specifically describes the "rows" column as an "approximate number of rows in this partition". (The cast to float of the result and the cast to int of object_id are not wrong, though they are of course weird).

Options 2 and 4 are both reliable. (The explanation of the question says sysindexes can be unreliable. This is not supported by the referenced sources. The sysindexes system table was unreliable in SQL 2000 and before, but this has changed in SQL Server 2005 - here is a reference. Also, Books Online does not contain the words "approximate", "inaccurate", or anything similar for these views, as it does for sys.partitions.

Both the question explanation and the description in Books Online of sys.sysindexes mention that the sys.sysindexes compatibility view will be removed in future versions of SQL Server. This is very relevant for real work - but it was not a deciding factor in this question, so this should not be used to decide between options 2 and 4.

The same page in Books Online also contains this warning: "Partitioned tables and indexes are not fully supported in this view; use the sys.indexes catalog view instead." The table used in the example code is not partitioned, but I interpreted the code as examples for a randomly chosen sample table, and the question about the method in general. Since method 2 does not work for partitioned tables, it's not reliable.

Method 4 and method 1 are both reliable. Method 1 scans the table (or an index); method 4 uses systemm views - this one is much faster. That's why I chose method 4.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 7

Bob Cullen-434885 (10/19/2010)
I chose #1 because all the others appeared to be referencing indexes, and there was nothing in the question that stated the table had an index.
Maybe I'm missing something here...?

If the table has a clustered index, information is stored in the system views as the index with index id 1.

If the table has no clustered index, information is stored in the system views as the index with index id 0. Not a real index, but apprently Microsoft decided to take a practical approach and not use specific different system views for heaps (tables without clustered index).

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Bob JH Cullen SSCrazy Points: 2082 More actions · Answer 8

Well, at least I got something right - I had missed something!!

Thanks, Hugo 🙂

vk-kirov SSCertifiable Points: 7686 More actions · Answer 9

Hugo Kornelis (10/19/2010)
The same page in Books Online also contains this warning: "Partitioned tables and indexes are not fully supported in this view; use the sys.indexes catalog view instead." ... Since method 2 does not work for partitioned tables, it's not reliable.

The 2nd query works just fine for partitioned tables 🙂

That BOL page states what columns are not supported in case of partitioned tables: first, root, groupid, dpages, reserved, used, rowcnt, rowmodctr, and FirstAIM. The column 'rows' is not in this list, and it's supported.

Hugo Kornelis SSC Guru Points: 64780 More actions · Answer 10

vk-kirov (10/19/2010)
Hugo Kornelis (10/19/2010)
The same page in Books Online also contains this warning: "Partitioned tables and indexes are not fully supported in this view; use the sys.indexes catalog view instead." ... Since method 2 does not work for partitioned tables, it's not reliable.
The 2nd query works just fine for partitioned tables 🙂
That BOL page states what columns are not supported in case of partitioned tables: first, root, groupid, dpages, reserved, used, rowcnt, rowmodctr, and FirstAIM. The column 'rows' is not in this list, and it's supported.

I missed that - I stopped reading after seeing that quote. Thanks, vk-kirov!

With this additional information, I'd argue that options 2 and 4 are both correct. The unreliability of option 2 stopped after SQL 2000 (and the question specifically mentions SQL 2005 and later), and being deprecated was nog mentioned as a deciding factor in the question.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 11

Option 1 isn't that reliable. For really really large tables (e.g. number of records > 2^31-1) it will produce an error. To be 100% sure, COUNT_BIG should have been used.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 12

Oh yeah I forgot: great question! 😀

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

free_mascot One Orange Chip Points: 27169 More actions · Answer 13

One more challenging question, thank you Dave.

---------------------------------------------------
"Thare are only 10 types of people in the world:
Those who understand binary, and those who don't."

vk-kirov SSCertifiable Points: 7686 More actions · Answer 14

Hugo Kornelis (10/19/2010)
With this additional information, I'd argue that options 2 and 4 are both correct.

And what do you think about counting uncommitted records, is it such a reliable thing? 🙂