COUNT_BIG

  • count_big(*) will count every row in table, including rows with null values on every column, probably because it gets the amount of rows out of catalog?

    Try this:

    declare @t table(i int, x int)

    insert into @t select 1, null union select 3, 1 union select 2, 1 union select null, null

    select * from @t

    select count_big(*) from @t

    select count_big(x) from @t

    count_big(*) returns 4, and count_big(x) returns 2.

  • I was confused enough by the explanation that I had to try this myself. Two statements in the explanation are contradictory (emphasis mine):

    ... COUNT_BIG(all column_2) will count all rows from the table (including duplicate & null rows). COUNT_BIG(DISTINCT column_2) will evaluate distinct non-null values.

    ALL is the default and hence COUNT_BIG(column 2) is same as COUNT_BIG(ALL column_2). This evaluates expression for each row in a group and returns the number of nonnull values. ...

    II and III are correct, by the way.

    Ed Leighton-Dick | Consultant | Microsoft Data Platform MVP | MCSE | PASS Regional Mentor

  • sjimmo (3/9/2010)


    Good question - Thought that I knew the answer, and was wrong. Now, I have a question. I ran the following against an existing database to prove/disprove the answer, and found the following:

    SELECT COUNT_BIG(*) FROM test_table

    98481

    SELECT COUNT_BIG(testfield2) FROM test_table

    98481

    SELECT COUNT_BIG(ALL testfield2) FROM test_table

    98481

    SELECT COUNT_BIG(DISTINCT testfield2) FROM test_table

    28452

    Based upon testing, though brief, and the answer, I would assume that the correct answer is 1, 2, and 3. The answer description states that ALL is the default, whether used or not thus selection 1 would use the default of ALL as shown in the above example, correct?

    No. It appears that you have no NULL values in testfield2. Here's the breakdown:

    (a) COUNT_BIG(*) is essentially a row count function -- it returns the number of rows returned, including those with NULL values in any column.

    (b) COUNT_BIG(testfield2) will return the number of rows where testfield2 is NOT NULL.

    (c) COUNT_BIG(ALL testfield2) is the same as the previous, as ALL is the default.

    (d) COUNT_BIG(DISTINCT testfield2) will return the number of unique nonnull values of testfield2

    So, (b) and (c) are always equal. If (a) is equal to (b) and (c), that means you have no NULLs in testfield2 in your result set. If (d) is equal to (b) and (c), that means you have no duplicate values in testfield2 in your result set. If they're all 4 equal, that means testfield2 is a column with all unique non-null values.

  • r5d4 (3/9/2010)


    the answer is I and II , not II and III.

    does anyone audit these questions? i want my point (joke) 🙂

    Me too, me too...

    It seems that there are more than 2 with the right answer if II and III are correct, as I is then correct too??

    Also, the question is a bit vague:From following data, which two queries will return identical results? (select all that apply)

    Do I select only 2 answers or all that apply?

    🙂

    anyone, enlighten me as why the first answer is not the same as the 3rd answer? (just seen the answer in the reply above this, OK i don't get the points)

  • I believe that this is what the explanation should have been.

    The answers are: SELECT COUNT_BIG(ALL COlumn_2) and SELECT COUNT_BIG(column_2). COUNT_BIG(*) will count all rows from the table (including duplicate & null rows). COUNT_BIG(DISTINCT column_2) will evaluate distinct non-null values.

    ALL is the default and hence COUNT_BIG(column 2) is same as COUNT_BIG(ALL column_2). This evaluates expression for each row in a group and returns the number of nonnull values.

    COUNT_BIG is works like COUNT, only difference is that COUNT_BIG returns bigint data type.

  • sknox (3/9/2010)


    sjimmo (3/9/2010)


    Good question - Thought that I knew the answer, and was wrong. Now, I have a question. I ran the following against an existing database to prove/disprove the answer, and found the following:

    SELECT COUNT_BIG(*) FROM test_table

    98481

    SELECT COUNT_BIG(testfield2) FROM test_table

    98481

    SELECT COUNT_BIG(ALL testfield2) FROM test_table

    98481

    SELECT COUNT_BIG(DISTINCT testfield2) FROM test_table

    28452

    Based upon testing, though brief, and the answer, I would assume that the correct answer is 1, 2, and 3. The answer description states that ALL is the default, whether used or not thus selection 1 would use the default of ALL as shown in the above example, correct?

    No. It appears that you have no NULL values in testfield2. Here's the breakdown:

    (a) COUNT_BIG(*) is essentially a row count function -- it returns the number of rows returned, including those with NULL values in any column.

    (b) COUNT_BIG(testfield2) will return the number of rows where testfield2 is NOT NULL.

    (c) COUNT_BIG(ALL testfield2) is the same as the previous, as ALL is the default.

    (d) COUNT_BIG(DISTINCT testfield2) will return the number of unique nonnull values of testfield2

    So, (b) and (c) are always equal. If (a) is equal to (b) and (c), that means you have no NULLs in testfield2 in your result set. If (d) is equal to (b) and (c), that means you have no duplicate values in testfield2 in your result set. If they're all 4 equal, that means testfield2 is a column with all unique non-null values.

    This is an excellent explanation!. The (a) holds true even if you have a table with records having null in every column (not that you ever want to have such table :-)).

    Consider, for example:

    declare @boloney table (column_1 int, column_2 int);

    insert into @boloney

    select 1, 1

    union

    select 2, null

    union

    select 3, 2

    union

    select 4, 2

    union

    select 5, 3

    union

    select null, null;

    select

    count_big(*) option_1,

    count_big(column_2) option_2,

    count_big(all column_2) option_3,

    count_big(distinct column_2) option_4

    from @boloney;

    go

    The result of the above will be

    opt_1 opt_2 opt_3 opt_4

    ----- ----- ----- -----

    6 4 4 3

    Even though there is a row with null in every column, the count_big(*) just like count(*) will still include it because it returns the number of rows in the table regardless of any values in any columns. This is probably because it includes the volatile rowid of the heap table below and the rowid is never null.

    Oleg

  • Nice explanation and test setup Oleg.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Oleg,

    Thanks for the explanatin. You're right - I did not have any nulls.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • CirquedeSQLeil (3/9/2010)


    Nice explanation and test setup Oleg.

    Thank you Jason.

    I wish the day will come when somebody will explain to me how to properly format the results output when including it in the post. Normal copy from SSMS results pane initially looks good but then all tabs and extra spaces get replaced by a single space (default html behavior). I tried to use the tag, but it still does not look right :hehe: The cave man attempt to cheat by typing

    the results here

    does not work either because the pre tag does not get parsed.

    Oleg

  • sjimmo (3/9/2010)


    Oleg,

    Thanks for the explanatin. You're right - I did not have any nulls.

    Looks like I am getting the credit I don't deserve :w00t:. The original explanation was posted by sknox, I just used it as a quoted reply and added the snippet to it.

    Oleg

  • I, also, learned something new. Thanks.

  • Hey pal,

    what do you mean by volatile rowid from the heap?

    Check a couple of posts upstairs, i pointed out the case of a full nulled row, but wasn't sure "why" count(*) included that kind of rows.

    Could it be just implementation desicion?

    Could the count(*) function get its return value from some sort of catalog when there is no where clause, instead of actually getting into the data?

  • sistemas 95572 (3/9/2010)


    Hey pal,

    what do you mean by volatile rowid from the heap?

    Check a couple of posts upstairs, i pointed out the case of a full nulled row, but wasn't sure "why" count(*) included that kind of rows.

    Could it be just implementation desicion?

    Could the count(*) function get its return value from some sort of catalog when there is no where clause, instead of actually getting into the data?

    What I meant is that even though you have a table with rows which have null in every column, every row in the table can still be uniquely identified by the engine, which actually stores the data on pages. If you have rows with null in every column then the odds are that this is a heap table, and from what I understand, the rowid is the only way for the engine to uniquely identify every record in such table.

    I cannot speculate whether implementation of the count(*) gets, as you say, "its return value from some sort of catalog", I just suggested that it might use the rowid (those are always unique and never null) to figure the count. In other words, I suggested that yes, it is "actually getting into the data".

    Sorry that I missed your earlier post, I apologize.

    Oleg

  • Thank you Oleg, I am curious about uniquely identifying rows in tables w/o PKs (sorry about the off topic).

    I mean, as I understand from your explanation there is some "row id" that we have no access to, and it´s managed by the dbms.

    If that is the case, let's assume a table with 2 int cols, with no primary key, and then you store 5 times the same values, ie:

    create table t2 (a int, b int)

    insert into t2 values (1,1)

    insert into t2 values (1,1)

    insert into t2 values (1,1)

    insert into t2 values (1,1)

    How can you actually uniquely identify one of those rows let's say if i want to delete only one of those?

    I've tryed doing it with enterprise manager, but got an error such as "wrong primary key or insufficcient column information".

    Sorry about the weird case, i'm a student just taking DB course :p

  • sistemas 95572 (3/9/2010)


    Thank you Oleg, I am curious about uniquely identifying rows in tables w/o PKs (sorry about the off topic).

    I mean, as I understand from your explanation there is some "row id" that we have no access to, and it´s managed by the dbms.

    If that is the case, let's assume a table with 2 int cols, with no primary key, and then you store 5 times the same values, ie:

    create table t2 (a int, b int)

    insert into t2 values (1,1)

    insert into t2 values (1,1)

    insert into t2 values (1,1)

    insert into t2 values (1,1)

    How can you actually uniquely identify one of those rows let's say if i want to delete only one of those?

    I've tryed doing it with enterprise manager, but got an error such as "wrong primary key or insufficcient column information".

    Sorry about the weird case, i'm a student just taking DB course :p

    Actually, you have 4 records in t2, but lets say there are 5 of them. Suppose you want to delete only one of those records while leaving other 4 intact. Since as lesser mortals, we do not have access to rowid, we cannot just issue the statement to, for example, delete THE_THIRD_ROW. On the other hand, if all we care is to delete one of the existing rows sparing other 4 then we could not care less which one we actually delete, because they all look the same to us. Therefore, we can do something like this (assuming that you have SQL Server 2005 or better):

    with cte (RecordNumber, a, b) as

    (

    select

    row_number() over (order by a) RecordNumber, *

    from t2

    )

    delete from cte where RecordNumber = 3;

    This will delete the row which we perceive as number 3, still giving us the result we need (One row deleted, other 4 are spared).

    Usually, the problems like this (delete the dups) arise when there is a need to get rid of all the duplicates in the existing heap table. Lets expand the question like this: I will add 3 more records into t2, 2 records with a = 2 and 1 record with a = 3.

    insert into t2 values (2,1);

    insert into t2 values (2,1);

    insert into t2 values (3,1);

    This will yield t2 to have 5 records with a = 1, 2 records with a = 2 and 1 record with a = 3. Out of those, the last record is already clean but all records before that have dups. Suppose we want to get rid of all dups while preserving all unique rows. In other words, the end result is expected to have t2 with one record with a = 1, 1 record with a = 2, and 1 record with a = 3. The statement to do this can be like this:

    with cte (PartitionedNumber, a, b) as

    (

    select

    row_number() over (partition by a order by a) PartitionedNumber, *

    from t2

    )

    delete from cte where PartitionedNumber > 1;

    Now selecting * from t2 will return just 3 leftover unique records.

    Oleg

Viewing 15 posts - 16 through 30 (of 45 total)

You must be logged in to reply to this topic. Login to reply