JOIN for beginners

  • Hi all!

    I have two tables:

    Dutyrostershift, which as a field called id, which is defined as an int with identity(1,1) and is t5he primary field

    TimeAccountMovement, which has a field called ownerid, which is equal to id in Dutyrostershift.

    If i take

    select count(a.id) from dbo.dutyrostershift A

    the result is 83459

    If i replace it with

    select count(*) from dbo.dutyrostershift A join dbo.timeaccountmovement B ON b.ownerid = a.id

    i get the result 140183

    the number of dutyrostershifts has not changed, and I want a join, giving me the original number, i.e. 83459

    When i look at the doc for LEFT JOIN, RIGHT JOIN and FULL JOIN, I se none, that gives me the result i want to have.

    Is there a way?

    Best regards

    Edvard Korsbæk

  • DDL for the tables would be helpful, along with sample data. Based on what you've provided, there is clearly a one-to-many relationship between the tables, so the question becomes, if you're looking for a count from the first table, and nothing more, why use a JOIN at all?

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Roland is right, there's probably a one-to-many relationship, resulting in possible multiple results for one Dutyrostershift.ID.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • You might try "count (distinct a.id)" and see if that does what you need. That will count distinct values, instead of total rows.

    But it's normal for a join from a parent table to a subtable to result in multiplication of the rows in the parent table. That's expected behavior.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • DDL for the tables would be helpful

    What does DDL mean?

    There is a one to many - I expect 1 to 5 timeaccountmovements for each dutyrostershift

    The definations are:

    CREATE TABLE [dbo].[dutyrostershift](

    [dato] [datetime] NULL,

    [std] [tinyint] NULL,

    [specialvagt] [tinyint] NULL,

    [daekbemand] [tinyint] NULL,

    [extratimer] [int] NULL,

    [overarbtimer] [int] NULL,

    [manuel] [tinyint] NULL,

    [beskyttet] [tinyint] NULL,

    [confirmed] [tinyint] NULL,

    [vacationtype] [varchar](50) NULL,

    [breakswish] [tinyint] NULL,

    [dutyrosterid] [int] NULL,

    [employeeid] [int] NULL,

    [employeegroupid] [int] NULL,

    [childforcaredayid] [int] NULL,

    [originatingstaffingrequirementid] [int] NULL,

    [shifttype] [int] NULL,

    [fromtime] [int] NULL,

    [totime] [int] NULL,

    [id] [int] IDENTITY(1,1) NOT NULL,

    [leavetype_id] [int] NULL,

    [LoginID] [int] NULL,

    [StatusNo] [int] NULL,

    [Time_Stamp] [datetime] NULL,

    [Comment] [char](120) NULL,

    [Is_Free_sat] [tinyint] NULL,

    [Is_Center_Opening] [tinyint] NULL,

    [is_fo_day] [tinyint] NULL,

    CONSTRAINT [pk_dbo_pk_dutyrostershift] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    and

    CREATE TABLE [dbo].[timeaccountmovement](

    [timeaccountid] [int] NULL,

    [ownerid] [int] NULL,

    [ownertype] [int] NULL,

    [starttime] [int] NULL,

    [days] [int] NULL,

    [endtime] [int] NULL,

    [minutes] [int] NULL,

    [duration] [varchar](20) NULL,

    [id] [int] IDENTITY(1,1) NOT NULL,

    [do_not_recalculate] [tinyint] NULL,

    [DATO] [datetime] NULL,

    CONSTRAINT [pk_dbo_pk_timeaccountmovement] PRIMARY KEY CLUSTERED

    (

    [id] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

  • That's expected behavior - OK, but i wonder why it is so...

    DISTINCT() did the trick

    Thanks

    Edvard Korsbæk

  • DDL = Data Definition Language = just what you posted: the CREATE TABLE statements.

    Since there is a one-to-many relationship, again, why are you doing a JOIN to derive a count? If you needed some value from the child table, what is it? If you don't need a value from the child table to group the count by, why JOIN at all?

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Thanks for your replies!

    I did not understand the expected behaviour of the JOIN - It still seems a bit weird to me.

    What i want to know somethhing about is the number of Dutyrostershifts, where the timeaccountid is in the range from 3 to 5 in the timeaccountmovement

    But as the count changed allready when i made the JOIN, i got a bit surprised.

    Best regards

    Edvard Korsbæk

  • Ah, now I understand. Then DISTINCT will be your best bet, as pointed out by GSquared, and as you've already seen for yourself.

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Count() just gets the number of rows, by default.

    Here's a sample of how it works. Run this in a test environment so you can see what I'm talking about:

    IF OBJECT_ID(N'tempdb..#T1') IS NOT NULL

    DROP TABLE #T1;

    IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL

    DROP TABLE #T2;

    CREATE TABLE #T1 (ID INT PRIMARY KEY);

    CREATE TABLE #T2

    (T1ID INT NOT NULL,

    Col1 VARCHAR(10));

    INSERT INTO #T1

    (ID)

    VALUES (1),

    (2),

    (3),

    (4);

    INSERT INTO #T2

    (T1ID, Col1)

    VALUES (1, 'A'),

    (1, 'B'),

    (2, 'A'),

    (2, 'B'),

    (2, 'C'),

    (3, 'A');

    SELECT *

    FROM #T1;

    SELECT COUNT(*)

    FROM #T1;

    SELECT *

    FROM #T1

    INNER JOIN #T2

    ON #T1.ID = #T2.T1ID;

    SELECT COUNT(#T1.ID)

    FROM #T1

    INNER JOIN #T2

    ON #T1.ID = #T2.T1ID;

    Now, you can easily see that #T1 has 4 rows. It inserts exactly 4, and it then selects them and selects the count on them.

    But, in #T2, T1ID 1 has 2 matches, 2 has 3 matches, 3 has 1 match, and 4 has 0 matches. So, when we join them, we end up with 6 rows (2+3+1+0 = 6). Count() thus returns 6. Add the distinct keyword to the final query, COUNT(DISTINCT #T1.ID), and you get 3 as the result, because the Inner Join eliminates value 4 from the query, because it has no matches in the subtable.

    Joins match rows in one table (or other dataset), to rows in another. If one of them has more rows than the other, which is normally the case, then you get more total rows, and Count() will return the increased (or decreased, in the sample of ID value = 4) number of rows.

    Looking at the just plain SELECT * FROM, without the Count() function in it, should make it more clear.

    Does that help?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • edvard 19773 (11/12/2012)


    That's expected behavior - OK, but i wonder why it is so...

    DISTINCT() did the trick

    Hold on a sec...

    Distinct is not a function, it takes no parameters and gets no brackets. It's a keyword that tells SQL to remove duplicate rows, that's all. Distinct within a count tells SQL to remove duplicate rows before it counts the rows.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ok!

    Select count(distinct(a.id))

    DISTINCT() was a shorthand here.

    Best regards

    Edvard Korsbæk

  • Distinct is not a function, it does not take parameters, it does not get brackets.

    COUNT(DISTINCT a.id)

    Putting brackets after tends to lead to a misunderstanding of how it works if you later do normal distinct like this:

    SELECT DISTINCT(col1), Col2, Col3 FROM SomeTable

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply