The COUNT Function in T-SQL

  • Any thoughts/opinions on sp_spaceused?? I often use that ad hoc to get an idea of table size.

  • Great work Igor, 5 stars!. You did a great job grouping (pun not intended :-P) together the most important aspects of COUNT. Fantastic article and a great resource to point people to.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • patrickmcginnis59 10839 (7/12/2016)


    Igor Micev (7/12/2016)


    patrickmcginnis59 10839 (7/12/2016)


    An ORDER BY clause can be used explicitly in the end as well. In that case the COUNT is not deterministic. See Deterministic and Nondeterministic Functions for more.

    but the page you referenced has

    For example, specifying an ORDER BY clause in a query does not change the determinism of a function that used in that query.

    Just wondering about your description of deterministic, it doesn't seem to make sense.

    edit:

    another snippet from the web page might shed some light on this:

    Functions that call extended stored procedures are nondeterministic, because the extended stored procedures can cause side effects on the database.

    I think a simple (maybe naïve) definition of "deterministic" could be that given the same input, the function will return the same output, but your writeup doesn't even match this. Heck strictly speaking, count(<whatever>) isn't even behaving as a function and I wouldn't even call it one. Maybe call it an aggregate or something catchy like that.

    Please read carefully the reference. The aggregate functions are not deterministic when used with Over and Order By.

    Yes, I do see it now and I stand corrected. I still don't see why, but the pages back you up.

    edit: while this page doesn't address count, it does talk about a few different definitions of "deterministic".

    http://dba.stackexchange.com/questions/77639/are-rank-and-dense-rank-deterministic-or-non-deterministic

    I like this link and especially Paul White's response.

    With respect to Books Online, however, I would encourage caution. Unfortunately I think there's bogus (or, at best, highly ambiguous) info in BOL from time to time. ESPECIALLY where determinism is concerned. I have seen over the years NTILE, for example, described as deterministic and non-deterministic in BOL.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (7/12/2016)


    patrickmcginnis59 10839 (7/12/2016)


    Igor Micev (7/12/2016)


    patrickmcginnis59 10839 (7/12/2016)


    An ORDER BY clause can be used explicitly in the end as well. In that case the COUNT is not deterministic. See Deterministic and Nondeterministic Functions for more.

    but the page you referenced has

    For example, specifying an ORDER BY clause in a query does not change the determinism of a function that used in that query.

    Just wondering about your description of deterministic, it doesn't seem to make sense.

    edit:

    another snippet from the web page might shed some light on this:

    Functions that call extended stored procedures are nondeterministic, because the extended stored procedures can cause side effects on the database.

    I think a simple (maybe naïve) definition of "deterministic" could be that given the same input, the function will return the same output, but your writeup doesn't even match this. Heck strictly speaking, count(<whatever>) isn't even behaving as a function and I wouldn't even call it one. Maybe call it an aggregate or something catchy like that.

    Please read carefully the reference. The aggregate functions are not deterministic when used with Over and Order By.

    Yes, I do see it now and I stand corrected. I still don't see why, but the pages back you up.

    edit: while this page doesn't address count, it does talk about a few different definitions of "deterministic".

    http://dba.stackexchange.com/questions/77639/are-rank-and-dense-rank-deterministic-or-non-deterministic

    I like this link and especially Paul White's response.

    With respect to Books Online, however, I would encourage caution. Unfortunately I think there's bogus (or, at best, highly ambiguous) info in BOL from time to time. ESPECIALLY where determinism is concerned. I have seen over the years NTILE, for example, described as deterministic and non-deterministic in BOL.

    I see. I'll consider these remarks for the update of the article. I didn't use ORDER BY in the tests, but just took a look at it in the BOL and wrote that sentence.

    Thanks again.

    Igor Micev,My blog: www.igormicev.com

  • FWIW I dug into this a while back. For any query of the form:

    select count(<some constant>) from sometable

    the execution plan, on the Stream Aggregate operator, shows an expression aggregate value (scalar string) of "Count(*)"

    To me at least this indicates that the compiler ignores the actual constant and generates the same plan whatever the constant is.

    YMMV

    PS: you can fool people into thinking NULLs get counted like this:

    select count('NULL') from sometable

    (Note that 'NULL' is a string)

    Gerald Britton, Pluralsight courses

  • You state in the article that count(*) and count(1) perform the same but yet in the timings you give in the article they are not the same count(1) was faster. Maybe only by around 100ms but that is not an insignificant about and every little helps.

    It would have been nice to see the test performed 5 - 10 times in a row and either list those times or the average to back-up the claim. If the average had of been the same or within 10ms then maybe "the same" could be taken as fact.

  • peter.row - Friday, February 9, 2018 2:22 AM

    You state in the article that count(*) and count(1) perform the same but yet in the timings you give in the article they are not the same count(1) was faster. Maybe only by around 100ms but that is not an insignificant about and every little helps.

    It would have been nice to see the test performed 5 - 10 times in a row and either list those times or the average to back-up the claim. If the average had of been the same or within 10ms then maybe "the same" could be taken as fact.

    You will never get count(1) and count(*) the same in cases when you're testing on big tables (for millions of rows) , but in this case those 100 ms are insignificant.
    I noticed that little difference, and made a couple of tests on those queries. Any time the results were that very close, sometimes count(1) or count(*) faster for about 100 ms or less. However, it could depend on the current situation. In any case it was a tiny difference and compared to the times the queries executed, it's rally s tiny difference, insignificant.

    Igor Micev,My blog: www.igormicev.com

  • Heh, I'd strongly recommend taking a look at the query plans for the same query with COUNT(*)  and COUNT(1). Specifically, take a look at the aggregate operator to see the AggType and Scalar Operator being computed.

    You may be surprised (well, you shouldn't be, since Gerald Britton mentioned this 2 years ago, but just a few posts up).

    Ok, spoiler alert, the optimizer just turns COUNT(1) into a COUNT(*), so any differences measured are likely just noise or an unintended consequence of the specific testing methodology used. 

    I just tested 1000 executions each of a COUNT(*) vs COUNT(1) on a 1 million row heap, and the difference in execution time was 120 ms across all 1000 runs (i.e., .12 ms per execution, an amount I'm not confident can actually be reliably measured, so likely just noise).

    Repeating the test a couple times gave similar differences each run (and not always with the same syntax being "faster", sometimes it was COUNT(*), sometimes it was COUNT(1)).

    It's always possible there's more going on than the execution plan and test results show, but color me skeptical that there's any difference between the two until I get some really, really fantastically compelling evidence 🙂

    Cheers!

  • Jacob Wilkins - Friday, February 9, 2018 5:27 PM

    Heh, I'd strongly recommend taking a look at the query plans for the same query with COUNT(*)  and COUNT(1). Specifically, take a look at the aggregate operator to see the AggType and Scalar Operator being computed.

    You may be surprised (well, you shouldn't be, since Gerald Britton mentioned this 2 years ago, but just a few posts up).

    Ok, spoiler alert, the optimizer just turns COUNT(1) into a COUNT(*), so any differences measured are likely just noise or an unintended consequence of the specific testing methodology used. 

    I just tested 1000 executions each of a COUNT(*) vs COUNT(1) on a 1 million row heap, and the difference in execution time was 120 ms across all 1000 runs (i.e., .12 ms per execution, an amount I'm not confident can actually be reliably measured, so likely just noise).

    Repeating the test a couple times gave similar differences each run (and not always with the same syntax being "faster", sometimes it was COUNT(*), sometimes it was COUNT(1)).

    It's always possible there's more going on than the execution plan and test results show, but color me skeptical that there's any difference between the two until I get some really, really fantastically compelling evidence 🙂

    Cheers!

    Thanks for that Jacob. Explaining the difference rather than saying they are the same with 1 test is always a good idea.

  • You could have also included the over clause in those count examples, e.g.

    SELECT DISTINCT Column1,COUNT(*) OVER (PARTITION BY Column1 ORDER BY Column1) FROM #TmpCounts;
    SELECT DISTINCT Column1,COUNT(*) OVER (PARTITION BY Column1 ORDER BY Column1 ROWS UNBOUNDED PRECEDING) FROM #TmpCounts;
    SELECT DISTINCT Column1,COUNT(*) OVER (PARTITION BY Column1 ORDER BY Column1 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM #TmpCounts;

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply