JOIN for beginners

Question

JOIN for beginners

Edvard Korsbæk

SSCertifiable

Points: 5991
More actions
November 12, 2012 at 7:10 am

#266854

Hi all!
I have two tables:
Dutyrostershift, which as a field called id, which is defined as an int with identity(1,1) and is t5he primary field
TimeAccountMovement, which has a field called ownerid, which is equal to id in Dutyrostershift.
If i take
select count(a.id) from dbo.dutyrostershift A
the result is 83459
If i replace it with
select count(*) from dbo.dutyrostershift A join dbo.timeaccountmovement B ON b.ownerid = a.id
i get the result 140183
the number of dutyrostershifts has not changed, and I want a join, giving me the original number, i.e. 83459
When i look at the doc for LEFT JOIN, RIGHT JOIN and FULL JOIN, I se none, that gives me the result i want to have.
Is there a way?
Best regards
Edvard Korsbæk

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Roland Alexander STL SSCrazy Points: 2915 More actions · Answer 1

DDL for the tables would be helpful, along with sample data. Based on what you've provided, there is clearly a one-to-many relationship between the tables, so the question becomes, if you're looking for a count from the first table, and nothing more, why use a JOIN at all?

Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

Koen Verbeeck SSC Guru Points: 259085 More actions · Answer 2

Roland is right, there's probably a one-to-many relationship, resulting in possible multiple results for one Dutyrostershift.ID.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

GSquared SSC Guru Points: 260824 More actions · Answer 3

You might try "count (distinct a.id)" and see if that does what you need. That will count distinct values, instead of total rows.

But it's normal for a join from a parent table to a subtable to result in multiplication of the rows in the parent table. That's expected behavior.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Edvard Korsbæk SSCertifiable Points: 5991 More actions · Answer 4

DDL for the tables would be helpful

What does DDL mean?

There is a one to many - I expect 1 to 5 timeaccountmovements for each dutyrostershift

The definations are:

CREATE TABLE [dbo].[dutyrostershift](

[dato] [datetime] NULL,

[std] [tinyint] NULL,

[specialvagt] [tinyint] NULL,

[daekbemand] [tinyint] NULL,

[extratimer] [int] NULL,

[overarbtimer] [int] NULL,

[manuel] [tinyint] NULL,

[beskyttet] [tinyint] NULL,

[confirmed] [tinyint] NULL,

[vacationtype] [varchar](50) NULL,

[breakswish] [tinyint] NULL,

[dutyrosterid] [int] NULL,

[employeeid] [int] NULL,

[employeegroupid] [int] NULL,

[childforcaredayid] [int] NULL,

[originatingstaffingrequirementid] [int] NULL,

[shifttype] [int] NULL,

[fromtime] [int] NULL,

[totime] [int] NULL,

[id] [int] IDENTITY(1,1) NOT NULL,

[leavetype_id] [int] NULL,

[LoginID] [int] NULL,

[StatusNo] [int] NULL,

[Time_Stamp] [datetime] NULL,

[Comment] [char](120) NULL,

[Is_Free_sat] [tinyint] NULL,

[Is_Center_Opening] [tinyint] NULL,

[is_fo_day] [tinyint] NULL,

CONSTRAINT [pk_dbo_pk_dutyrostershift] PRIMARY KEY CLUSTERED

(

[id] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

and

CREATE TABLE [dbo].[timeaccountmovement](

[timeaccountid] [int] NULL,

[ownerid] [int] NULL,

[ownertype] [int] NULL,

[starttime] [int] NULL,

[days] [int] NULL,

[endtime] [int] NULL,

[minutes] [int] NULL,

[duration] [varchar](20) NULL,

[id] [int] IDENTITY(1,1) NOT NULL,

[do_not_recalculate] [tinyint] NULL,

[DATO] [datetime] NULL,

CONSTRAINT [pk_dbo_pk_timeaccountmovement] PRIMARY KEY CLUSTERED

(

[id] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY]

Edvard Korsbæk SSCertifiable Points: 5991 More actions · Answer 5

That's expected behavior - OK, but i wonder why it is so...

DISTINCT() did the trick

Thanks

Edvard Korsbæk

Roland Alexander STL SSCrazy Points: 2915 More actions · Answer 6

DDL = Data Definition Language = just what you posted: the CREATE TABLE statements.

Since there is a one-to-many relationship, again, why are you doing a JOIN to derive a count? If you needed some value from the child table, what is it? If you don't need a value from the child table to group the count by, why JOIN at all?

Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

Edvard Korsbæk SSCertifiable Points: 5991 More actions · Answer 7

Thanks for your replies!

I did not understand the expected behaviour of the JOIN - It still seems a bit weird to me.

What i want to know somethhing about is the number of Dutyrostershifts, where the timeaccountid is in the range from 3 to 5 in the timeaccountmovement

But as the count changed allready when i made the JOIN, i got a bit surprised.

Best regards

Edvard Korsbæk

Roland Alexander STL SSCrazy Points: 2915 More actions · Answer 8

Ah, now I understand. Then DISTINCT will be your best bet, as pointed out by GSquared, and as you've already seen for yourself.

Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

GSquared SSC Guru Points: 260824 More actions · Answer 9

Count() just gets the number of rows, by default.

Here's a sample of how it works. Run this in a test environment so you can see what I'm talking about:

IF OBJECT_ID(N'tempdb..#T1') IS NOT NULL

DROP TABLE #T1;

IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

DROP TABLE #T2;

CREATE TABLE #T1 (ID INT PRIMARY KEY);

CREATE TABLE #T2

(T1ID INT NOT NULL,

Col1 VARCHAR(10));

INSERT INTO #T1

(ID)

VALUES (1),

(2),

(3),

(4);

INSERT INTO #T2

(T1ID, Col1)

VALUES (1, 'A'),

(1, 'B'),

(2, 'A'),

(2, 'B'),

(2, 'C'),

(3, 'A');

SELECT *

FROM #T1;

SELECT COUNT(*)

FROM #T1;

SELECT *

FROM #T1

INNER JOIN #T2

ON #T1.ID = #T2.T1ID;

SELECT COUNT(#T1.ID)

FROM #T1

INNER JOIN #T2

ON #T1.ID = #T2.T1ID;

Now, you can easily see that #T1 has 4 rows. It inserts exactly 4, and it then selects them and selects the count on them.

But, in #T2, T1ID 1 has 2 matches, 2 has 3 matches, 3 has 1 match, and 4 has 0 matches. So, when we join them, we end up with 6 rows (2+3+1+0 = 6). Count() thus returns 6. Add the distinct keyword to the final query, COUNT(DISTINCT #T1.ID), and you get 3 as the result, because the Inner Join eliminates value 4 from the query, because it has no matches in the subtable.

Joins match rows in one table (or other dataset), to rows in another. If one of them has more rows than the other, which is normally the case, then you get more total rows, and Count() will return the increased (or decreased, in the sample of ID value = 4) number of rows.

Looking at the just plain SELECT * FROM, without the Count() function in it, should make it more clear.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 10

edvard 19773 (11/12/2012)
That's expected behavior - OK, but i wonder why it is so...
DISTINCT() did the trick

Hold on a sec...

Distinct is not a function, it takes no parameters and gets no brackets. It's a keyword that tells SQL to remove duplicate rows, that's all. Distinct within a count tells SQL to remove duplicate rows before it counts the rows.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Edvard Korsbæk SSCertifiable Points: 5991 More actions · Answer 11

ok!

Select count(distinct(a.id))

DISTINCT() was a shorthand here.

Best regards

Edvard Korsbæk

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 12

Distinct is not a function, it does not take parameters, it does not get brackets.

COUNT(DISTINCT a.id)

Putting brackets after tends to lead to a misunderstanding of how it works if you later do normal distinct like this:

SELECT DISTINCT(col1), Col2, Col3 FROM SomeTable

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass