Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 COUNT_BIG Rate Topic Display Mode Topic Options
Author
 Message
 Posted Tuesday, March 09, 2010 10:21 AM
 SSCommitted Group: General Forum Members Last Login: Thursday, February 27, 2014 11:17 AM Points: 1,676, Visits: 1,741
 sknox (3/9/2010)sjimmo (3/9/2010)Good question - Thought that I knew the answer, and was wrong. Now, I have a question. I ran the following against an existing database to prove/disprove the answer, and found the following:SELECT COUNT_BIG(*) FROM test_table98481SELECT COUNT_BIG(testfield2) FROM test_table98481SELECT COUNT_BIG(ALL testfield2) FROM test_table98481SELECT COUNT_BIG(DISTINCT testfield2) FROM test_table28452Based upon testing, though brief, and the answer, I would assume that the correct answer is 1, 2, and 3. The answer description states that ALL is the default, whether used or not thus selection 1 would use the default of ALL as shown in the above example, correct?No. It appears that you have no NULL values in testfield2. Here's the breakdown:(a) COUNT_BIG(*) is essentially a row count function -- it returns the number of rows returned, including those with NULL values in any column.(b) COUNT_BIG(testfield2) will return the number of rows where testfield2 is NOT NULL.(c) COUNT_BIG(ALL testfield2) is the same as the previous, as ALL is the default.(d) COUNT_BIG(DISTINCT testfield2) will return the number of unique nonnull values of testfield2So, (b) and (c) are always equal. If (a) is equal to (b) and (c), that means you have no NULLs in testfield2 in your result set. If (d) is equal to (b) and (c), that means you have no duplicate values in testfield2 in your result set. If they're all 4 equal, that means testfield2 is a column with all unique non-null values.This is an excellent explanation!. The (a) holds true even if you have a table with records having null in every column (not that you ever want to have such table ).Consider, for example:`declare @boloney table (column_1 int, column_2 int);insert into @boloneyselect 1, 1unionselect 2, nullunionselect 3, 2unionselect 4, 2unionselect 5, 3unionselect null, null;select count_big(*) option_1, count_big(column_2) option_2, count_big(all column_2) option_3, count_big(distinct column_2) option_4 from @boloney;go`The result of the above will be `opt_1 opt_2 opt_3 opt_4----- ----- ----- ----- 6 4 4 3`Even though there is a row with null in every column, the count_big(*) just like count(*) will still include it because it returns the number of rows in the table regardless of any values in any columns. This is probably because it includes the volatile rowid of the heap table below and the rowid is never null.Oleg
Post #879538
 Posted Tuesday, March 09, 2010 10:24 AM
 SSC-Insane Group: General Forum Members Last Login: Today @ 7:39 AM Points: 20,235, Visits: 13,774
 Nice explanation and test setup Oleg. Jason AKA CirqueDeSQLeilI have given a name to my pain...MCM SQL ServerSQL RNNRPosting Performance Based Questions - Gail ShawPosting Data Etiquette - Jeff ModenHidden RBAR - Jeff ModenVLFs and the Tran Log - Kimberly Tripp
Post #879544
 Posted Tuesday, March 09, 2010 10:40 AM
 SSCrazy Group: General Forum Members Last Login: Thursday, March 06, 2014 8:01 AM Points: 2,917, Visits: 2,507
 Oleg,Thanks for the explanatin. You're right - I did not have any nulls. Steve JimmoSr DBA“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
Post #879553
 Posted Tuesday, March 09, 2010 10:45 AM
 SSCommitted Group: General Forum Members Last Login: Thursday, February 27, 2014 11:17 AM Points: 1,676, Visits: 1,741
 CirquedeSQLeil (3/9/2010)Nice explanation and test setup Oleg.Thank you Jason. I wish the day will come when somebody will explain to me how to properly format the results output when including it in the post. Normal copy from SSMS results pane initially looks good but then all tabs and extra spaces get replaced by a single space (default html behavior). I tried to use the [code] tag, but it still does not look right The cave man attempt to cheat by typing [pre]the results here[/pre] does not work either because the pre tag does not get parsed.Oleg
Post #879557
 Posted Tuesday, March 09, 2010 10:49 AM
 SSCommitted Group: General Forum Members Last Login: Thursday, February 27, 2014 11:17 AM Points: 1,676, Visits: 1,741
 sjimmo (3/9/2010)Oleg,Thanks for the explanatin. You're right - I did not have any nulls.Looks like I am getting the credit I don't deserve . The original explanation was posted by sknox, I just used it as a quoted reply and added the snippet to it.Oleg
Post #879561
 Posted Tuesday, March 09, 2010 11:35 AM
 Right there with Babe Group: General Forum Members Last Login: Tuesday, December 07, 2010 12:55 AM Points: 771, Visits: 504
 I, also, learned something new. Thanks.
Post #879594
 Posted Tuesday, March 09, 2010 12:31 PM
 SSC Rookie Group: General Forum Members Last Login: Tuesday, February 25, 2014 1:16 PM Points: 29, Visits: 32
 Hey pal,what do you mean by volatile rowid from the heap?Check a couple of posts upstairs, i pointed out the case of a full nulled row, but wasn't sure "why" count(*) included that kind of rows.Could it be just implementation desicion?Could the count(*) function get its return value from some sort of catalog when there is no where clause, instead of actually getting into the data?
Post #879637
 Posted Tuesday, March 09, 2010 12:53 PM
 SSCommitted Group: General Forum Members Last Login: Thursday, February 27, 2014 11:17 AM Points: 1,676, Visits: 1,741
 sistemas 95572 (3/9/2010)Hey pal,what do you mean by volatile rowid from the heap?Check a couple of posts upstairs, i pointed out the case of a full nulled row, but wasn't sure "why" count(*) included that kind of rows.Could it be just implementation desicion?Could the count(*) function get its return value from some sort of catalog when there is no where clause, instead of actually getting into the data?What I meant is that even though you have a table with rows which have null in every column, every row in the table can still be uniquely identified by the engine, which actually stores the data on pages. If you have rows with null in every column then the odds are that this is a heap table, and from what I understand, the rowid is the only way for the engine to uniquely identify every record in such table.I cannot speculate whether implementation of the count(*) gets, as you say, "its return value from some sort of catalog", I just suggested that it might use the rowid (those are always unique and never null) to figure the count. In other words, I suggested that yes, it is "actually getting into the data".Sorry that I missed your earlier post, I apologize.Oleg
Post #879657
 Posted Tuesday, March 09, 2010 1:11 PM
 SSC Rookie Group: General Forum Members Last Login: Tuesday, February 25, 2014 1:16 PM Points: 29, Visits: 32
 Thank you Oleg, I am curious about uniquely identifying rows in tables w/o PKs (sorry about the off topic).I mean, as I understand from your explanation there is some "row id" that we have no access to, and it´s managed by the dbms.If that is the case, let's assume a table with 2 int cols, with no primary key, and then you store 5 times the same values, ie:create table t2 (a int, b int)insert into t2 values (1,1)insert into t2 values (1,1)insert into t2 values (1,1)insert into t2 values (1,1)How can you actually uniquely identify one of those rows let's say if i want to delete only one of those?I've tryed doing it with enterprise manager, but got an error such as "wrong primary key or insufficcient column information".Sorry about the weird case, i'm a student just taking DB course :p
Post #879671
 Posted Tuesday, March 09, 2010 1:32 PM
 SSCommitted Group: General Forum Members Last Login: Thursday, February 27, 2014 11:17 AM Points: 1,676, Visits: 1,741
 sistemas 95572 (3/9/2010)Thank you Oleg, I am curious about uniquely identifying rows in tables w/o PKs (sorry about the off topic).I mean, as I understand from your explanation there is some "row id" that we have no access to, and it´s managed by the dbms.If that is the case, let's assume a table with 2 int cols, with no primary key, and then you store 5 times the same values, ie:create table t2 (a int, b int)insert into t2 values (1,1)insert into t2 values (1,1)insert into t2 values (1,1)insert into t2 values (1,1)How can you actually uniquely identify one of those rows let's say if i want to delete only one of those?I've tryed doing it with enterprise manager, but got an error such as "wrong primary key or insufficcient column information".Sorry about the weird case, i'm a student just taking DB course :pActually, you have 4 records in t2, but lets say there are 5 of them. Suppose you want to delete only one of those records while leaving other 4 intact. Since as lesser mortals, we do not have access to rowid, we cannot just issue the statement to, for example, delete THE_THIRD_ROW. On the other hand, if all we care is to delete one of the existing rows sparing other 4 then we could not care less which one we actually delete, because they all look the same to us. Therefore, we can do something like this (assuming that you have SQL Server 2005 or better):`with cte (RecordNumber, a, b) as( select row_number() over (order by a) RecordNumber, * from t2) delete from cte where RecordNumber = 3;`This will delete the row which we perceive as number 3, still giving us the result we need (One row deleted, other 4 are spared).Usually, the problems like this (delete the dups) arise when there is a need to get rid of all the duplicates in the existing heap table. Lets expand the question like this: I will add 3 more records into t2, 2 records with a = 2 and 1 record with a = 3. `insert into t2 values (2,1);insert into t2 values (2,1);insert into t2 values (3,1);`This will yield t2 to have 5 records with a = 1, 2 records with a = 2 and 1 record with a = 3. Out of those, the last record is already clean but all records before that have dups. Suppose we want to get rid of all dups while preserving all unique rows. In other words, the end result is expected to have t2 with one record with a = 1, 1 record with a = 2, and 1 record with a = 3. The statement to do this can be like this:`with cte (PartitionedNumber, a, b) as( select row_number() over (partition by a order by a) PartitionedNumber, * from t2) delete from cte where PartitionedNumber > 1;`Now selecting * from t2 will return just 3 leftover unique records.Oleg
Post #879694

 Permissions