SELECT COUNT(*) or COUNT(1) Performance or Benefits?

  • What are the advantages of using COUNT(*) -or- COUNT(1) in the following examples:

    declare @Address1 int

    select @Address1 = count(1) from dbo.Customers where ID < 0

    select @Address1 as Address1count

    declare @Address1 int

    select @Address1 = count(*) from dbo.Customers where ID < 0

    select @Address1 as Address1count

     

    BT
  • * has the advantage that sql server will determine which is the best way to find the count(*). This will allow the server to do an index seek/scan without actually fetching the data on the hd. count(1|colname) doesn't always offer this possibility.

    I know somebody else will have a more complete explanation of this process.

  • COUNT(*) or count(1) returns a count of all records including NULLS.

    COUNT(expression) returns a count of all non-null records.

    The first two don't seem to have any performance difference or differences in the execution plan they generate.

  • I think Remi is right in that when you use COUNT(*) the optimizer chooses which index to use and returns the count faster as compared to COUNT(1) which need not necessarily be an indexed column.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • Darn good question!!! 

    I wasn't born stupid - I had to study.

  • From books online on msdn :

    count(*)

    Specifies that all rows should be counted to return the total number of rows in a table. COUNT(*) takes no parameters and cannot be used with DISTINCT. COUNT(*) does not require an expression parameter because, by definition, it does not use information about any particular column. COUNT(*) returns the number of rows in a specified table without eliminating duplicates. It counts each row separately, including rows that contain null values.

    And from a post on Oracle (ok it's not sql server but the same logic applies).

    Count(*) returns the total row count

    count(1) returns all rows where 1 is not null >> simplified version it returns all rows.

    So Oracle is optimized to use count(*) instead of count(number) because count(1) wouldn't offer any performance gain. And I can only assume that sql server will use the same logic. I've also tried many different variation of count() but unless the specified column allows nulls the plan is always the same : index scan. With a non-indexed nullable column : clustered index scan.

    You can run this and see for yourself :

    Select count(autoval) from dbo.SysColumns --cis

    Select count(*) from dbo.SysColumns --is

    Select count(1) from dbo.SysColumns --is

    Select count(id) from dbo.SysColumns --is

  • My first article at SQL Server Central discussed this: Advice on using COUNT( )

    As David says, COUNT(*) or COUNT(1) (or COUNT(9917456) or whatever) will be treated exactly the same way by the optimizer. They both simply say "count the number of rows in the set", so the optimizer is free to do that exactly as it whishes. The fastest way to count the number of rows in a table is to do a scan on the most narrow non-clustered index of the table, so that is that plan that will be chosen if possible.

  • It's upto what you are going to achieve. If the first column is NOT NULL, COUNT(*) and COUNT(1) have the same result. If it's NULLable, the result may be different.

    USE northwind

    GO

    SET NOCOUNT ON

    SET ANSI_WARNINGS ON

    select COUNT(*) FROM dbo.Customers

    select COUNT(CustomerID) FROM dbo.Customers

    select COUNT(Region) FROM dbo.Customers

    select COUNT(FAX) FROM dbo.Customers

    Result:

               

    -----------

    91

               

    -----------

    91

               

    -----------

    31

    Warning: Null value is eliminated by an aggregate or other SET operation.

               

    -----------

    69

    Warning: Null value is eliminated by an aggregate or other SET operation.

    Their execution plans are slightly different. The last one uses clustered index scan and others use index scan on my machine.

    select COUNT(*) FROM dbo.Customers

    StmtText                                                                   

    ---------------------------------------------------------------------------

      |--Compute Scalar(DEFINE[Expr1002]=Convert([Expr1003])))

           |--Stream Aggregate(DEFINE[Expr1003]=Count(*)))

                |--Index Scan(OBJECT[Northwind].[dbo].[Customers].[Region]))

    select COUNT(CustomerID) FROM dbo.Customers

    StmtText                                                                   

    ---------------------------------------------------------------------------

      |--Compute Scalar(DEFINE[Expr1002]=Convert([Expr1003])))

           |--Stream Aggregate(DEFINE[Expr1003]=Count(*)))

                |--Index Scan(OBJECT[Northwind].[dbo].[Customers].[Region]))

    select COUNT(Region) FROM dbo.Customers

    StmtText                                                                        

    --------------------------------------------------------------------------------

      |--Compute Scalar(DEFINE[Expr1002]=Convert([Expr1003])))

           |--Stream Aggregate(DEFINE[Expr1003]=COUNT_BIG([Customers].[Region])))

                |--Index Scan(OBJECT[Northwind].[dbo].[Customers].[Region]))

    select COUNT(FAX) FROM dbo.Customers

    StmtText                                                                                   

    -------------------------------------------------------------------------------------------

      |--Compute Scalar(DEFINE[Expr1002]=Convert([Expr1003])))

           |--Stream Aggregate(DEFINE[Expr1003]=COUNT_BIG([Customers].[Fax])))

                |--Clustered Index Scan(OBJECT[Northwind].[dbo].[Customers].[PK_Customers]))

     

     

     

  • >If the first column is NOT NULL, COUNT(*) and COUNT(1) have the same result. If it's NULLable, the result may be different.

    Could you elaborate on that please?

  • Maybe I didn't make me clear. What I mean is that if the Nth column can have NULL value in the table, COUNT(*) and COUNT(N) may have different value.

    In the example I given above,

    the CustomerID is NOT NULL in the table; Fax and Region can be  NULL in the table and some records REALLY have null values for the them. You can see the four COUNT() give different results.

    The table is defined as:

    CREATE TABLE [dbo].[Customers] (

     [CustomerID] [nchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [CompanyName] [nvarchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

     [ContactName] [nvarchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [ContactTitle] [nvarchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Address] [nvarchar] (60) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [City] [nvarchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Region] [nvarchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [PostalCode] [nvarchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Country] [nvarchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Phone] [nvarchar] (24) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

     [Fax] [nvarchar] (24) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Customers] WITH NOCHECK ADD

     CONSTRAINT [PK_Customers] PRIMARY KEY  CLUSTERED

     (

      [CustomerID]

    &nbsp  ON [PRIMARY]

    GO

     CREATE  INDEX [City] ON [dbo].[Customers]([City]) ON [PRIMARY]

    GO

     CREATE  INDEX [CompanyName] ON [dbo].[Customers]([CompanyName]) ON [PRIMARY]

    GO

     CREATE  INDEX [PostalCode] ON [dbo].[Customers]([PostalCode]) ON [PRIMARY]

    GO

     CREATE  INDEX [Region] ON [dbo].[Customers]([Region]) ON [PRIMARY]

    GO

     

     

  • Yes, you are right in that the queries return different results. The reasons were also mentioned earlier in this thread and in my article. As noted the reason is that COUNT(expression), where expression might simply be the name of a column, does not count occurences where expression evaluates to null. Since COUNT(*) counts every row, there is no concept of null involved (if there is a row then there is one, no matter how many of the columns on it that are null, or even might be) and therefore the results might differ. From the results we can see that there are 91 rows in the table Northwind.dbo.Customers, of which 60 (91-31) do not have a value (they are null) for column Region and 22 (91-69) do not have a value for column Fax.

    You are also correct in that whether a column allows nulls or not might affect which execution plan is chosen. If a column that does not allow nulls is used as the expression, such as COUNT(CustomerId), this has the same meaning as COUNT(*) since all rows should be counted. Therefore the same plans will be used. If it does allow null then there might be rows with null and therefore the expression must be evaluated for each row (here evaluated means that the value of the column must be checked). If there is an index on the column then a scan of that index will be used (as in the COUNT(Region) example query), and if there is not an index then a table scan/clustered index scan will be used (as with the COUNT(Fax) example).

    However, to finally get back to why I asked you to elaborate, I think you have misunderstood COUNT(1). The 1 does not indicate the ordinal number of a column, it is simply an expression that will always be true. So COUNT(*) and COUNT(1) will never return different results and will always use the same plan, since they are treated as identical queries by the optimizer.

  • i started using count(1) when count(*) appeared not to return results in a timely manner (on very large table, lots of rows/columns).  if truly implemented the same way by sql server, it must be serendipity i experienced a difference in performance. 

     

    that could be due to trying first with count(*), getting the table loaded in memory in the process, then moving on to using count(1).  i think of '1' taking up less memory internally to process, but there's no real foundation for that.

     

  • Well, it might of course turn out that I went too far when I said that "COUNT(*) and COUNT(1) will never return different results and will always use the same plan, since they are treated as identical queries by the optimizer", since I have no way of knowing that for 100% certainty. (Oh, what I wouldn't do for the SQL Server source code)

    That said, I would like more information from your example to be convinced that they were not treated the same in your specific situation. It is very probable that the reason is that the pages were fetched from disk in the first case and directly from cache in the second, just as you speculate.

    >i think of '1' taking up less memory internally to process, but there's no real foundation for that.

    In what way would it take up less memory than '*'? They are both nothing more than a character in a sql statement that is parsed by the optimizer into an execution plan. The execution plans for COUNT(*) and COUNT(1) have always been equal in the situations that I have encountered, but if they were not in your example (or any other situation) then of course there could be a difference in execution time.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply