How To Get Table Row Counts Quickly And Painlessly

  • Comments posted to this topic are about the item How To Get Table Row Counts Quickly And Painlessly

    Kendal Van Dyke
    http://kendalvandyke.blogspot.com/[/url]

  • In order for the rowcounts found in sysIndexes in SQL Server 2000 to be anywhere close to accurate, you need to use DBCC UPDATEUSAGE on the table you're trying to get the rowcount on. To get the rowcounts (and a lot of other information) for all tables in a database, you need to do something like this ...

    --_______________________________________________________________________________________________________________________

    /**********************************************************************************************************************

    Purpose:

    Returns a single result set similar to sp_Space used for all user tables at once.

    Notes:

    1. May be used as a view, stored procedure, or table-valued funtion.

    2. Must comment out 1 "Schema" in the SELECT list below prior to use. See the adjacent comments for more info.

    Revision History:

    Rev 00 - 22 Jan 2007 - Jeff Moden

    - Initital creation for SQL Server 2000

    Rev 01 - 11 Mar 2007 - Jeff Moden

    - Add automatic page size determination for future compliance

    Rev 02 - 05 Jan 2008 - Jeff Moden

    - Change "Owner" to "Schema" in output. Add optional code per Note 2 to find correct schema name

    **********************************************************************************************************************/

    --===== Ensure that all row counts, etc is up to snuff

    -- Obviously, this will not work in a view or UDF and should be removed if in a view or UDF. External code should

    -- execute the command below prior to retrieving from the view or UDF.

    DBCC UPDATEUSAGE(0) WITH COUNT_ROWS, NO_INFOMSGS

    --===== Return the single result set similar to what sp_SpaceUsed returns for a table, but more

    SELECT DBName = DB_NAME(),

    --SchemaName = SCHEMA_NAME(so.UID), --Comment out if for SQL Server 2000

    SchemaName = USER_NAME(so.UID), --Comment out if for SQL Server 2005

    TableName = so.Name,

    TableID = so.ID,

    MinRowSize = MIN(si.MinLen),

    MaxRowSize = MAX(si.XMaxLen),

    ReservedKB = SUM(CASE WHEN si.IndID IN (0,1,255) THEN si.Reserved ELSE 0 END) * pkb.PageKB,

    DataKB = SUM(CASE WHEN si.IndID IN (0,1 ) THEN si.DPages ELSE 0 END) * pkb.PageKB

    + SUM(CASE WHEN si.IndID IN ( 255) THEN ISNULL(si.Used,0) ELSE 0 END) * pkb.PageKB,

    IndexKB = SUM(CASE WHEN si.IndID IN (0,1,255) THEN si.Used ELSE 0 END) * pkb.PageKB

    - SUM(CASE WHEN si.IndID IN (0,1 ) THEN si.DPages ELSE 0 END) * pkb.PageKB

    - SUM(CASE WHEN si.IndID IN ( 255) THEN ISNULL(si.Used,0) ELSE 0 END) * pkb.PageKB,

    UnusedKB = SUM(CASE WHEN si.IndID IN (0,1,255) THEN si.Reserved ELSE 0 END) * pkb.PageKB

    - SUM(CASE WHEN si.IndID IN (0,1,255) THEN si.Used ELSE 0 END) * pkb.PageKB,

    Rows = SUM(CASE WHEN si.IndID IN (0,1 ) THEN si.Rows ELSE 0 END),

    RowModCtr = MIN(si.RowModCtr),

    HasTextImage = MAX(CASE WHEN si.IndID IN ( 255) THEN 1 ELSE 0 END),

    HasClustered = MAX(CASE WHEN si.IndID IN ( 1 ) THEN 1 ELSE 0 END)

    FROM dbo.SysObjects so,

    dbo.SysIndexes si,

    (--Derived table finds page size in KB according to system type

    SELECT Low/1024 AS PageKB --1024 is a binary Kilo-byte

    FROM Master.dbo.spt_Values

    WHERE Number = 1 --Identifies the primary row for the given type

    AND Type = 'E' --Identifies row for system type

    ) pkb

    WHERE si.ID = so.ID

    AND si.IndID IN (0, --Table w/o Text or Image Data

    1, --Table with clustered index

    255) --Table w/ Text or Image Data

    AND so.XType = 'U' --User Tables

    AND PERMISSIONS(so.ID) 0

    GROUP BY so.Name,

    so.UID,

    so.ID,

    pkb.PageKB

    ORDER BY ReservedKB DESC

    Notice that there are no cursors or While loops... sp_MSForEachTable is nothing but the world's nastiest cursor behind the scenes. The code above also has the advantage of returning all the data in a single result set.

    So far as running SELECT COUNT(*) against a million row table goes.... try it out... I believe you'll be surprised at how quick it actually is and it's accurate at the time of the run...

    Here's a million row test table...

    --===== Create and populate a 1,000,000 row test table.

    -- Column "RowNum" has a range of 1 to 100,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    SELECT TOP 1000000

    SomeID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1,

    Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned

    -- Takes about 1 second to execute.

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (SomeID)

    And here's the test to run against it...

    SELECT COUNT(*) FROM dbo.JBMTest

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • "Quickly and Painlessly" doesn't seem to fit the method, when SELECT COUNT(*) FROM myTable is just four little words (ok, a function ain't a word, whatever) long, and the alternative is joining three tables together and using a WHERE clause. Point taken that the system objects hold this data, but not sure that's easier.

    Well-written article though, clear, concise and to the point. Good job!

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • I find this works well

    USE DatabaseName

    GO

    CREATE TABLE #temp (

    table_name sysname ,

    row_count INT,

    reserved_size VARCHAR(50),

    data_size VARCHAR(50),

    index_size VARCHAR(50),

    unused_size VARCHAR(50))

    SET NOCOUNT ON

    INSERT #temp

    EXEC sp_msforeachtable 'sp_spaceused ''?'''

    SELECT a.table_name,

    a.row_count,

    COUNT(*) AS col_count,

    a.data_size

    FROM #temp a

    INNER JOIN information_schema.columns b

    ON a.table_name collate database_default

    = b.table_name collate database_default

    GROUP BY a.table_name, a.row_count, a.data_size

    ORDER BY CAST(REPLACE(a.data_size, ' KB', '') AS integer) DESC

    DROP TABLE #temp

  • It's useful if you want to know which tables contain data, or contain a lot of rows. I've used the method in the past after a customer had reported unexpected growth in their database. I quickly determined that a logging table existed with lots of rows in it and that it wasn't being cleared down.

    David

  • The undocumented stored procedure is sp_MSforeachtable, not sp_foreachtable.

    So you could do something like the following to return the results quickly:

    sp_MSforeachtable 'sp_spaceused ''?'''

    There is also the undocumented stored procedure sp_MSforeachdb.

    ----------------------------------------------------------------------------------------
    Dan English - http://denglishbi.wordpress.com

  • Good article. It's good to see sp_spaceused in an article. It provides a lot of great information for one little proc.

    I still can't believe that you can get a count from SQL Server faster in FoxPro than you can in the native product. It is dumb that we have to worry about slow record counts and use JET to see how many rows there are in order to do it fast. I think the development team should have a hash like JET uses that is integral to SQL Server.

  • The post is useful to get rowcount, but for the partitioned table it does not give correct rowcount.

    For the partitioned table, it gives count on individual partition rather than count of whole table

  • I find the report of DBCC CHECKDB useful.

    Why doesn't "SELECT COUNT(*) FROM table" look at an index, which apparently it doesn't(?)

  • I ran count(*), and sp_spaceused on a table, got 9997796 with count(*) and 9997764 with sp_spaceused. I told it @updateusage = 'True'.

    My first thought is 'corrupt table'. Comments ?

  • rja.carnegie (9/2/2009)


    Why doesn't "SELECT COUNT(*) FROM table" look at an index, which apparently it doesn't(?)

    It should pick the smallest index (by page reads).

  • I ran count(*) and got the count as 1048912 and when I ran sp_spaceused I get 1048613 , the count doesnt match . Any Comments?

  • jvrakesh (9/2/2009)


    I ran count(*) and got the count as 1048912 and when I ran sp_spaceused I get 1048613 , the count doesnt match . Any Comments?

    Yes... see my previous post (2nd post in this thread).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I had the same problem, different counts from sp_spaceused versus select count(*)

    So I tried sp_spaceused with @updateusage = 'True', and I also ran DBCC UPDATEUSAGE for the table.

    And the numbers never varied. My only thought is possibly a corrupt table.

    Luckily I have little need for record counts from large tables :^)...

  • I find it baffling that the query planner can't provide a way to do count(*) quickly in the common case. It does so many other clever things; this one seems fairly straightforward to implement by comparison, and there's obviously a need for it, as the existence of this article shows.

Viewing 15 posts - 1 through 15 (of 108 total)

You must be logged in to reply to this topic. Login to reply