Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»»

COUNT_BIG Expand / Collapse
Author
Message
Posted Tuesday, March 9, 2010 10:21 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, September 17, 2014 9:21 AM
Points: 1,676, Visits: 1,757
sknox (3/9/2010)
sjimmo (3/9/2010)
Good question - Thought that I knew the answer, and was wrong. Now, I have a question. I ran the following against an existing database to prove/disprove the answer, and found the following:

SELECT COUNT_BIG(*) FROM test_table
98481

SELECT COUNT_BIG(testfield2) FROM test_table
98481

SELECT COUNT_BIG(ALL testfield2) FROM test_table
98481

SELECT COUNT_BIG(DISTINCT testfield2) FROM test_table
28452

Based upon testing, though brief, and the answer, I would assume that the correct answer is 1, 2, and 3. The answer description states that ALL is the default, whether used or not thus selection 1 would use the default of ALL as shown in the above example, correct?


No. It appears that you have no NULL values in testfield2. Here's the breakdown:

(a) COUNT_BIG(*) is essentially a row count function -- it returns the number of rows returned, including those with NULL values in any column.
(b) COUNT_BIG(testfield2) will return the number of rows where testfield2 is NOT NULL.
(c) COUNT_BIG(ALL testfield2) is the same as the previous, as ALL is the default.
(d) COUNT_BIG(DISTINCT testfield2) will return the number of unique nonnull values of testfield2

So, (b) and (c) are always equal. If (a) is equal to (b) and (c), that means you have no NULLs in testfield2 in your result set. If (d) is equal to (b) and (c), that means you have no duplicate values in testfield2 in your result set. If they're all 4 equal, that means testfield2 is a column with all unique non-null values.


This is an excellent explanation!. The (a) holds true even if you have a table with records having null in every column (not that you ever want to have such table ).

Consider, for example:

declare @boloney table (column_1 int, column_2 int);

insert into @boloney
select 1, 1
union
select 2, null
union
select 3, 2
union
select 4, 2
union
select 5, 3
union
select null, null;

select
count_big(*) option_1,
count_big(column_2) option_2,
count_big(all column_2) option_3,
count_big(distinct column_2) option_4
from @boloney;
go

The result of the above will be

opt_1   opt_2   opt_3   opt_4
----- ----- ----- -----
6 4 4 3


Even though there is a row with null in every column, the count_big(*) just like count(*) will still include it because it returns the number of rows in the table regardless of any values in any columns. This is probably because it includes the volatile rowid of the heap table below and the rowid is never null.

Oleg
Post #879538
Posted Tuesday, March 9, 2010 10:24 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Today @ 1:14 PM
Points: 17,807, Visits: 15,725
Nice explanation and test setup Oleg.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw
Post #879544
Posted Tuesday, March 9, 2010 10:40 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 10:32 AM
Points: 2,917, Visits: 2,528
Oleg,
Thanks for the explanatin. You're right - I did not have any nulls.


Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
Post #879553
Posted Tuesday, March 9, 2010 10:45 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, September 17, 2014 9:21 AM
Points: 1,676, Visits: 1,757
CirquedeSQLeil (3/9/2010)
Nice explanation and test setup Oleg.

Thank you Jason.

I wish the day will come when somebody will explain to me how to properly format the results output when including it in the post. Normal copy from SSMS results pane initially looks good but then all tabs and extra spaces get replaced by a single space (default html behavior). I tried to use the [code] tag, but it still does not look right The cave man attempt to cheat by typing [pre]the results here[/pre] does not work either because the pre tag does not get parsed.

Oleg
Post #879557
Posted Tuesday, March 9, 2010 10:49 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, September 17, 2014 9:21 AM
Points: 1,676, Visits: 1,757
sjimmo (3/9/2010)
Oleg,
Thanks for the explanatin. You're right - I did not have any nulls.

Looks like I am getting the credit I don't deserve . The original explanation was posted by sknox, I just used it as a quoted reply and added the snippet to it.

Oleg
Post #879561
Posted Tuesday, March 9, 2010 11:35 AM
Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Tuesday, December 7, 2010 12:55 AM
Points: 771, Visits: 504
I, also, learned something new. Thanks.
Post #879594
Posted Tuesday, March 9, 2010 12:31 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, February 25, 2014 1:16 PM
Points: 29, Visits: 32
Hey pal,

what do you mean by volatile rowid from the heap?

Check a couple of posts upstairs, i pointed out the case of a full nulled row, but wasn't sure "why" count(*) included that kind of rows.

Could it be just implementation desicion?
Could the count(*) function get its return value from some sort of catalog when there is no where clause, instead of actually getting into the data?




Post #879637
Posted Tuesday, March 9, 2010 12:53 PM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, September 17, 2014 9:21 AM
Points: 1,676, Visits: 1,757
sistemas 95572 (3/9/2010)
Hey pal,

what do you mean by volatile rowid from the heap?

Check a couple of posts upstairs, i pointed out the case of a full nulled row, but wasn't sure "why" count(*) included that kind of rows.

Could it be just implementation desicion?
Could the count(*) function get its return value from some sort of catalog when there is no where clause, instead of actually getting into the data?





What I meant is that even though you have a table with rows which have null in every column, every row in the table can still be uniquely identified by the engine, which actually stores the data on pages. If you have rows with null in every column then the odds are that this is a heap table, and from what I understand, the rowid is the only way for the engine to uniquely identify every record in such table.

I cannot speculate whether implementation of the count(*) gets, as you say, "its return value from some sort of catalog", I just suggested that it might use the rowid (those are always unique and never null) to figure the count. In other words, I suggested that yes, it is "actually getting into the data".

Sorry that I missed your earlier post, I apologize.

Oleg
Post #879657
Posted Tuesday, March 9, 2010 1:11 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, February 25, 2014 1:16 PM
Points: 29, Visits: 32

Thank you Oleg, I am curious about uniquely identifying rows in tables w/o PKs (sorry about the off topic).

I mean, as I understand from your explanation there is some "row id" that we have no access to, and it´s managed by the dbms.

If that is the case, let's assume a table with 2 int cols, with no primary key, and then you store 5 times the same values, ie:

create table t2 (a int, b int)
insert into t2 values (1,1)
insert into t2 values (1,1)
insert into t2 values (1,1)
insert into t2 values (1,1)

How can you actually uniquely identify one of those rows let's say if i want to delete only one of those?
I've tryed doing it with enterprise manager, but got an error such as "wrong primary key or insufficcient column information".

Sorry about the weird case, i'm a student just taking DB course :p
Post #879671
Posted Tuesday, March 9, 2010 1:32 PM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Wednesday, September 17, 2014 9:21 AM
Points: 1,676, Visits: 1,757
sistemas 95572 (3/9/2010)

Thank you Oleg, I am curious about uniquely identifying rows in tables w/o PKs (sorry about the off topic).

I mean, as I understand from your explanation there is some "row id" that we have no access to, and it´s managed by the dbms.

If that is the case, let's assume a table with 2 int cols, with no primary key, and then you store 5 times the same values, ie:

create table t2 (a int, b int)
insert into t2 values (1,1)
insert into t2 values (1,1)
insert into t2 values (1,1)
insert into t2 values (1,1)

How can you actually uniquely identify one of those rows let's say if i want to delete only one of those?
I've tryed doing it with enterprise manager, but got an error such as "wrong primary key or insufficcient column information".

Sorry about the weird case, i'm a student just taking DB course :p


Actually, you have 4 records in t2, but lets say there are 5 of them. Suppose you want to delete only one of those records while leaving other 4 intact. Since as lesser mortals, we do not have access to rowid, we cannot just issue the statement to, for example, delete THE_THIRD_ROW. On the other hand, if all we care is to delete one of the existing rows sparing other 4 then we could not care less which one we actually delete, because they all look the same to us. Therefore, we can do something like this (assuming that you have SQL Server 2005 or better):

with cte (RecordNumber, a, b) as
(
select
row_number() over (order by a) RecordNumber, *
from t2
)
delete from cte where RecordNumber = 3;

This will delete the row which we perceive as number 3, still giving us the result we need (One row deleted, other 4 are spared).

Usually, the problems like this (delete the dups) arise when there is a need to get rid of all the duplicates in the existing heap table. Lets expand the question like this: I will add 3 more records into t2, 2 records with a = 2 and 1 record with a = 3.

insert into t2 values (2,1);
insert into t2 values (2,1);
insert into t2 values (3,1);

This will yield t2 to have 5 records with a = 1, 2 records with a = 2 and 1 record with a = 3. Out of those, the last record is already clean but all records before that have dups. Suppose we want to get rid of all dups while preserving all unique rows. In other words, the end result is expected to have t2 with one record with a = 1, 1 record with a = 2, and 1 record with a = 3. The statement to do this can be like this:

with cte (PartitionedNumber, a, b) as
(
select
row_number() over (partition by a order by a) PartitionedNumber, *
from t2
)
delete from cte where PartitionedNumber > 1;

Now selecting * from t2 will return just 3 leftover unique records.

Oleg
Post #879694
« Prev Topic | Next Topic »

Add to briefcase «««12345»»»

Permissions Expand / Collapse