COUNT NULLs

  • g.britton (1/21/2015)


    Just though I'd dispel the notion that only count(*) counts nulls:

    select count('null') as count_null from (select null n union all select null) countnull

    returns:

    count_null

    2

    We can have even more fun:

    with thing(I) as (

    select top 65536 NULL

    from master.sys.all_columns cross join master.sys.all_objects

    )

    select count(1/0) as NULLs from thing;

    NULLs

    65536

    Tom

  • g.britton (1/21/2015)


    Just though I'd dispel the notion that only count(*) counts nulls:

    select count('null') as count_null from (select null n union all select null) countnull

    returns:

    count_null

    2

    This code is not counting NULLs. The inner query returns two one-column rows (both with a NULL value in the column). The outer query then substitutes the string constant 'null' for every row from the inner query (disregarding the original content of that row), then counts the results. There were two rows, both with a non-NULL string value, so the result is 2.

    Do not confuse the NULL value with the string value 'null'. They are not the same.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • you may be right but the effect is exactly same so the fact that behind the scenes the process is different is probably irrelevant

    Gerald Britton, Pluralsight courses

  • g.britton (1/22/2015)


    you may be right but the effect is exactly same so the fact that behind the scenes the process is different is probably irrelevant

    The shorter explanation is that your code is not counting NULL values. It is replacing NULL values with something else and then counting that something else.

    You post that to "dispel the notion that only count(*) counts nulls", but the code you posted counts string constants. It does not dispel anything.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (1/22/2015)


    g.britton (1/22/2015)


    you may be right but the effect is exactly same so the fact that behind the scenes the process is different is probably irrelevant

    The shorter explanation is that your code is not counting NULL values. It is replacing NULL values with something else and then counting that something else.

    You post that to "dispel the notion that only count(*) counts nulls", but the code you posted counts string constants. It does not dispel anything.

    Oh yes it does! It explicitly, by example, dispels the notion that only count(*) counts nulls. In fact any constant value will count nulls. Not only that, the optimizer recognizes this condition and replaces count('null') with count(*)

    e.g. the following queries have exactly the same execution plans, right down to the scalar expression evaluated"

    select count($) from (select null) _(_)

    select count(*) from (select null) _(_)

    In both, the expression evaluated is [Expr1002] = Scalar Operator(Count(*))

    Gerald Britton, Pluralsight courses

  • g.britton (1/22/2015)


    Hugo Kornelis (1/22/2015)


    g.britton (1/22/2015)


    you may be right but the effect is exactly same so the fact that behind the scenes the process is different is probably irrelevant

    The shorter explanation is that your code is not counting NULL values. It is replacing NULL values with something else and then counting that something else.

    You post that to "dispel the notion that only count(*) counts nulls", but the code you posted counts string constants. It does not dispel anything.

    Oh yes it does! It explicitly, by example, dispels the notion that only count(*) counts nulls. In fact any constant value will count nulls. Not only that, the optimizer recognizes this condition and replaces count('null') with count(*)

    e.g. the following queries have exactly the same execution plans, right down to the scalar expression evaluated"

    select count($) from (select null) _(_)

    select count(*) from (select null) _(_)

    In both, the expression evaluated is [Expr1002] = Scalar Operator(Count(*))

    If you replace nulls with constant values, then you are counting those constant values. Not the nulls. They have been replaced.

    Your logic is similar to saying that all elephants are small (if you first replace them with mice).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • nice question...

    Manik
    You cannot get to the top by sitting on your bottom.

  • Hugo Kornelis (1/22/2015)


    g.britton (1/22/2015)


    Hugo Kornelis (1/22/2015)


    g.britton (1/22/2015)


    you may be right but the effect is exactly same so the fact that behind the scenes the process is different is probably irrelevant

    The shorter explanation is that your code is not counting NULL values. It is replacing NULL values with something else and then counting that something else.

    You post that to "dispel the notion that only count(*) counts nulls", but the code you posted counts string constants. It does not dispel anything.

    Oh yes it does! It explicitly, by example, dispels the notion that only count(*) counts nulls. In fact any constant value will count nulls. Not only that, the optimizer recognizes this condition and replaces count('null') with count(*)

    e.g. the following queries have exactly the same execution plans, right down to the scalar expression evaluated"

    select count($) from (select null) _(_)

    select count(*) from (select null) _(_)

    In both, the expression evaluated is [Expr1002] = Scalar Operator(Count(*))

    If you replace nulls with constant values, then you are counting those constant values. Not the nulls. They have been replaced.

    Your logic is similar to saying that all elephants are small (if you first replace them with mice).

    For what it's worth, Hugo, I agree with you with respect to the theory. However, what I am claiming is that SQL Server treats

    SELECT COUNT(*) FROM elephants

    the same as

    SELECT COUNT('mice') FROM elephants

    I claim two things in particular:

    1. both queries return the same results

    2. SQL Server generates the same execution plan for both.

    My investigation shows that both claims are true, at least with respect to these simple queries. Not only that, I have shown that SQL Server replaces COUNT('mice') with COUNT(*) when building the execution plan. You can see this in the execution plan. The compiler generates

    <Aggregate AggType="countstar" Distinct="0"/>

    for both.

    I am certainly open to having my claims refuted. However, a theoretical argument will not do. You need to provide a counterexample.

    Gerald Britton, Pluralsight courses

  • Very good question on the basics. 🙂

  • g.britton (1/23/2015)


    You're guessing and hoping you are right, but you are wrong. The execution plan proves it.

    In SQL, you need to make a distinction between the definition of the language, and the actual execution. The ANSI standard for SQL allows vendors to implement engines to evaluate queries in every order and every method they see fit as long as the results are the same. The execution plan shows the latter, so the only thing that can ever be proven by looking at an execution plan is how the querry was evaluated, not what the language is defined to mean.

    Execution plans can also vary between versions and even change between executions on the same server. I tested your original query on a SQL Server 2012 instance, and got this execution plan:

    Following the flow of data (right to left), the compute scalar operator produces two "empty" rows (i.e. rows with no columns). You can verify this by looking at the output property, which is empty (the outputted colums are normally listed here). You can also see this from the row size of the arrow (data flow) leaving the operator - 9 bytes, exactly the row overhead, so no real columns. The optimizer knows that whatever value is used in the inner query is not used anywhere, so it never bothers to put it in. It only cares about the amount of rows.

    The second operator from the right is a Stream Aggregate. There is no Group By property, so this does a global aggregation. The actual aggregation done is found in the Defined Values property, and on my system it reads "[Expr1004] = Scalar Operator(Count(*))". The optimizer has realized that the <b>LOGICAL</b> query ("produce two rows with some irrelevant values, then replace those irrelevant values with the string constant 'null' and count the number of non-NULL values") is equivalent to "produce two empty rows and count the number of rows". This deduction could be made because the 'null' string constant can never be NULL. If the expression used in the outer query had been complex enough that it could result in NULL (or that the optimizer could not with 100% certainty exclude that option), then this optimization would not have been done.

    The third operator from the right simply converts the result of the aggregation to integer, and the left-most operator returns it to the client.

    So, long story short:

    1. Looking at an execution plan is not a good way to draw conclusions on the meaning of SQL language elements. They are defined in the ANSI standard and in Books Onliine. The execution plan shows the actual execution, which can be different, as long as the results are the same.

    2. If you still insist on looking at the execution plan, you will see that the optimizer has transformed the query to something that in pseudo-SQL would look like "SELECT COUNT(*) FROM (SELECT <no columns> FROM <two rows>) AS countnull". So the actual execution is doing count(*).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (1/23/2015)


    g.britton (1/23/2015)


    You're guessing and hoping you are right, but you are wrong. The execution plan proves it.

    In SQL, you need to make a distinction between the definition of the language, and the actual execution. The ANSI standard for SQL allows vendors to implement engines to evaluate queries in every order and every method they see fit as long as the results are the same. The execution plan shows the latter, so the only thing that can ever be proven by looking at an execution plan is how the querry was evaluated, not what the language is defined to mean.

    of course

    Execution plans can also vary between versions and even change between executions on the same server.

    FWIW I tested it on 2005 through 2014

    So, long story short:

    1. Looking at an execution plan is not a good way to draw conclusions on the meaning of SQL language elements. They are defined in the ANSI standard and in Books Onliine. The execution plan shows the actual execution, which can be different, as long as the results are the same.

    which was my point, exactly. I said nothing about SQL language elements or ANSI standards. I only said that on SQL Server, there is no difference.

    2. If you still insist on looking at the execution plan, you will see that the optimizer has transformed the query to something that in pseudo-SQL would look like "SELECT COUNT(*) FROM (SELECT <no columns> FROM <two rows>) AS countnull". So the actual execution is doing count(*).

    Again, that is exactly what I said

    Gerald Britton, Pluralsight courses

  • g.britton (1/23/2015)


    So, long story short:

    1. Looking at an execution plan is not a good way to draw conclusions on the meaning of SQL language elements. They are defined in the ANSI standard and in Books Onliine. The execution plan shows the actual execution, which can be different, as long as the results are the same.

    which was my point, exactly. I said nothing about SQL language elements or ANSI standards. I only said that on SQL Server, there is no difference.

    2. If you still insist on looking at the execution plan, you will see that the optimizer has transformed the query to something that in pseudo-SQL would look like "SELECT COUNT(*) FROM (SELECT <no columns> FROM <two rows>) AS countnull". So the actual execution is doing count(*).

    Again, that is exactly what I said

    You wanted to "dispel the notion that only count(*) counts nulls".

    The code you posted does in the actual query and the logical execution flow not count nulls but string constants.

    The actual execution plan does not count nulls either, but empty rows. And it uses count(*).

    In my logic, your code reinforces, rather than dispells, that only count(*) counts nulls. But it is clear that I cannot conviince you, and you cannot convince me, so I will stop here.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Although I agree with you 100%, Hugo, I thinkyou are wasting your time (in much the same ways as I waste my time when commenting on this nonsense). People are so in love with this myth about COUNT not ignoring nulls that no amount of logic, no amount of evidence, no amount of demonstration will wean them of their faith in it. Evidently they have a religious belief in the concept or a NULL row.

    Tom

  • Straightforward, but true !

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply