SELECT COUNT(*) vs DMVs to get row count

  • Is this an accurate way to get row count?

    SELECT

    t.name table_name,

    s.name schema_name,

    p.rows AS total_rows

    FROM sys.tables t

    join sys.schemas s on (t.schema_id = s.schema_id)

    join sys.partitions p on (t.object_id = p.object_id)

    WHERE p.index_id in (0,1) AND t.name='MyTable' AND p.rows=0

    vs the old and simple, SELECT COUNT(*) ? ...

    I'm about to query hundreds of databases to get row count for a particular table. The 2nd one looks more resource intensive to me, but I want to be sure the DMV provides accurate row information.

    Thanks in advance.

  • sql-lover (4/10/2015)


    Is this an accurate way to get row count?

    SELECT

    t.name table_name,

    s.name schema_name,

    p.rows AS total_rows

    FROM sys.tables t

    join sys.schemas s on (t.schema_id = s.schema_id)

    join sys.partitions p on (t.object_id = p.object_id)

    WHERE p.index_id in (0,1) AND t.name='MyTable' AND p.rows=0

    vs the old and simple, SELECT COUNT(*) ? ...

    I'm about to query hundreds of databases to get row count for a particular table. The 2nd one looks more resource intensive to me, but I want to be sure the DMV provides accurate row information.

    Thanks in advance.

    If ALL of your statistics are refreshed first this should return an accurate count...or really close in nearly every circumstance. Meaning that if the stats are fresh the row count is not likely to be off by more than a small handfull on a couple of tables. If something happens between the index refresh and your query they may be off by a few rows. If 100% accuracy is not critical this is fine. If you need pinpoint accuracy you will need to count the rows.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I use a slightly different query. I like to get the rows from the sys.dm_db_partition_stats

    SELECT

    so.name as TableName

    ,ps.row_count as TotalRows

    FROM sys.dm_db_partition_stats ps

    LEFT JOIN sys.objects so

    ON ps.object_id=so.object_id

    WHERE

    so.name='TableName'

    and ps.index_id = 1;

    James Phillips
    Sr. Consultant
    Pragmatic Works

  • Thanks for the responses so far.

    I need 100% certainty. The DMV/count is needed to identify the table(s) with an empty set. A massive DDL change will be deployed against those tables with zero rows and it will be done live. But! ... If there are rows, means an specific application module is active for that client and the DDL should not touch that table so changes should be perform on that client during the maintenance window only.

    But I had the feeling that statistics increases the accuracy of the DMV. It seems that I was right.

    So going back to my original question. If I want to be 100% sure that the table has zero rows, should I use COUNT(*) instead?

  • Quick suggestion, simplify the query, the difference between the first and the second is 50%

    😎

    SELECT

    t.name table_name,

    s.name schema_name,

    p.rows AS total_rows

    FROM sys.tables t

    join sys.schemas s on (t.schema_id = s.schema_id)

    join sys.partitions p on (t.object_id = p.object_id)

    WHERE p.index_id in (0,1) AND p.rows=0;

    SELECT

    OBJECT_NAME(SP.object_id) AS TABLE_NAME

    ,OBJECT_SCHEMA_NAME(ST.object_id) AS SCHEMA_NAME

    ,SP.rows AS TOTAL_ROWS

    FROM sys.partitions SP

    INNER JOIN sys.tables ST

    ON SP.object_id = ST.object_id

    WHERE SP.rows = 0;

  • sql-lover (4/10/2015)


    So going back to my original question. If I want to be 100% sure that the table has zero rows, should I use COUNT(*) instead?

    Yes. If you want 100% accuracy the guaranteed way is to look at the table. It may be slow on large tables. Of course if you are just trying to determine if there are rows you should EXISTS instead of COUNT.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Agree much less costly to just do a if exists (select top 1 from tablename)

    James Phillips
    Sr. Consultant
    Pragmatic Works

  • sql-lover (4/10/2015)


    Thanks for the responses so far.

    I need 100% certainty. The DMV/count is needed to identify the table(s) with an empty set. A massive DDL change will be deployed against those tables with zero rows and it will be done live. But! ... If there are rows, means an specific application module is active for that client and the DDL should not touch that table so changes should be perform on that client during the maintenance window only.

    But I had the feeling that statistics increases the accuracy of the DMV. It seems that I was right.

    So going back to my original question. If I want to be 100% sure that the table has zero rows, should I use COUNT(*) instead?

    The sys.partitions count is accurate or like Gail Shaw once put it, if it isn't then report a bug with Microsoft. The difference between these approaches is that using the sys.partitions is a single table query, using exist or count requires query on every single table, not likely to be more efficient!

    😎

  • Sean - I am interested in why you say that COUNT(*) is the only way to get a row count. I'm not disagreeing with you - just curious looking for a more detailed explanation of what's wrong with sys.partitions.

    Eirikur Eiriksson (4/10/2015)


    The difference between these approaches is that using the sys.partitions is a single table query, using exist or count requires query on every single table, not likely to be more efficient!

    😎

    If you are saying that the query posted by the OP would not be faster than SELECT COUNT(*) FROM <MyTable> for a row count then I'm going to have to respectfully disagree.

    (temp1 is my laptop test db)

    USE temp1

    GO

    IF OBJECT_ID('temp1.dbo.X') IS NOT NULL DROP TABLE dbo.X

    GO

    CREATE TABLE dbo.X (xID int not null, xTxt varchar(36) not null);

    WITH

    E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(c)),

    E3(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c)

    INSERT dbo.X

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT ($))), newid()

    FROM E3 a, E3 b;

    SET NOCOUNT ON;

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    SET STATISTICS TIME ON

    SELECT COUNT(*) FROM dbo.X;

    SELECT p.rows AS total_rows

    FROM sys.tables t

    join sys.schemas s on (t.schema_id = s.schema_id)

    join sys.partitions p on (t.object_id = p.object_id)

    WHERE p.index_id in (0,1) AND t.name='x';

    SET STATISTICS TIME OFF

    GO

    Results:

    SQL Server Execution Times:

    CPU time = 641 ms, elapsed time = 345 ms.

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 23 ms.

    Note that the SELECT COUNT(*) method is using a parallel plan to complete the task in 345 ms.

    Update/edit: I missed the updated query that you posted which completes in 0 ms on my laptop, 9ms for 4,000,000 rows.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Ok, quick update... I did some research on this - it appears that sys.partitions does not guarantee an accurate row count. According to Microsoft, the rows column in sys.partitions "Indicates the approximate number of rows in this partition."

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (4/10/2015)


    Ok, quick update... I did some research on this - it appears that sys.partitions does not guarantee an accurate row count. According to Microsoft, the rows column in sys.partitions "Indicates the approximate number of rows in this partition."

    AFAIK, sys.partitions and sys.dm_db_partition_stats are accurate but might not give exact information due to problems with transactions. Statistics don't affect them the way sysindexes was affected.

    These are other possible queries.

    SELECT SUM (row_count)

    FROM sys.dm_db_partition_stats

    WHERE object_id=OBJECT_ID('TableName')

    AND (index_id=0 or index_id=1);

    EXEC sp_spaceused 'TableName'

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Quick thought, let's not forget that tables can have more than one partition

    😎

    SELECT

    OBJECT_NAME(SP.object_id) AS TABLE_NAME

    ,OBJECT_SCHEMA_NAME(SP.object_id) AS SCHEMA_NAME

    ,SUM(SP.rows) AS TOTAL_ROWS

    FROM sys.partitions SP

    WHERE SP.index_id IN (0,1)

    GROUP BY SP.object_id

    HAVING SUM(SP.rows) = 0;

  • As far as I know, if you have to have a guaranteed, 100% accurate, count of the rows in the table, the way to do it is COUNT. Everything I know about all the other DMVs say that they'll get you close, if not spot on, most of the time. But it's that most bit that's an issue if you have to have a perfect set of counts. The thing is, why do you need a perfect set of row counts?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (4/11/2015)


    As far as I know, if you have to have a guaranteed, 100% accurate, count of the rows in the table, the way to do it is COUNT. Everything I know about all the other DMVs say that they'll get you close, if not spot on, most of the time. But it's that most bit that's an issue if you have to have a perfect set of counts. The thing is, why do you need a perfect set of row counts?

    So the script will run on empty sets only. Those are the only databases that can be done on the fly.

  • sql-lover (4/11/2015)


    Grant Fritchey (4/11/2015)


    As far as I know, if you have to have a guaranteed, 100% accurate, count of the rows in the table, the way to do it is COUNT. Everything I know about all the other DMVs say that they'll get you close, if not spot on, most of the time. But it's that most bit that's an issue if you have to have a perfect set of counts. The thing is, why do you need a perfect set of row counts?

    So the script will run on empty sets only. Those are the only databases that can be done on the fly.

    Belt and braces then?

    😎

    SELECT COUNT(*)

    FROM dbo.TABLE_NAME WITH (TABLOCKX);

Viewing 15 posts - 1 through 15 (of 38 total)

You must be logged in to reply to this topic. Login to reply