SQL Server JOINS

  • Hi all,

    Can someone help explain to me how when I have 12 rows in table A and 10 in B and I do an inner join , I would get more rows than

    in both A and B ?

    Same with left and right joins...

    This is just a simplified example. Let me share one of my issues with you

    I have 2 views ; which was originally SQL on 2 base tables Culture and Trials.

    And then when attempting to add another table Culture Steps, one of the team members separated the SQL into 2 views

    Since this produces an error when updating(modification cannot be done as it affects multiple base tables), I would like to get

    back to changing the SQL such that I no longer use the views but achieve the same results.

    One of the views has

    SELECT some columns

    FROM dbo.Culture RIGHT JOIN

    dbo.Trial ON dbo.Culture.cultureID = dbo.Trial.CultureID LEFT OUTER JOIN

    dbo.TrialCultureSteps_view_part1 ON dbo.Culture.cultureID = dbo.TrialCultureSteps_view_part1.cultureID

    The other TrialCultureSteps_view_part1 view

    SELECT DISTINCT dbo.Culture.cultureID,

    (SELECT TOP (1) WeekNr

    FROM dbo.CultureStep

    WHERE (CultureID = dbo.Culture.cultureID)

    ORDER BY CultureStepID) AS normalstartweek

    FROM dbo.Culture INNER JOIN

    dbo.CultureStep AS CultureStep_1 ON dbo.Culture.cultureID = CultureStep_1.CultureID

    So how can I combine the joins the achieve the same results using SQL only on tables without the need for views?

  • Quick thought, could this be caused by Cartesian Product, see the example?

    😎

    USE tempdb;

    GO

    DECLARE @TBL01 TABLE

    (

    T_ID INT NOT NULL

    );

    DECLARE @TBL02 TABLE

    (

    T_ID INT NOT NULL

    );

    INSERT INTO @TBL01 (T_ID)

    VALUES (1),(2),(3),(4),(5),(6),(5),(6),(7),(8);

    INSERT INTO @TBL02 (T_ID)

    VALUES (1),(2),(3),(4),(5),(5),(6),(6),(7),(8),(9),(10);

    SELECT

    *

    FROM @TBL01 T1

    INNER JOIN @TBL02 T2

    ON T1.T_ID = T2.T_ID;

    Results

    T_ID T_ID

    ----------- ------

    1 1

    2 2

    3 3

    4 4

    5 5

    5 5

    6 6

    6 6

    5 5

    5 5

    6 6

    6 6

    7 7

    8 8

  • I don't think that's a Cartesian product. If it were, you would have 89 (8*10) rows of ouput.

    A CROSS JOIN would give you a Cartesian Product

  • Please post CREATE TABLE statements for all the tables involved, some sample data for each as INSERT INTO statements, and the output you want to see from your join.

  • Do you have enforced referential constraints on the columns you're joining on? Does the data accurately map between the two tables, or is it possible that you're getting multiple matches for your JOIN criteria? That would be the most likely reason for more rows is that more values match the JOIN criteria for any given row.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • gbritton1 (8/22/2014)


    I don't think that's a Cartesian product. If it were, you would have 89 (8*10) rows of ouput.

    A CROSS JOIN would give you a Cartesian Product

    A CROSS JOIN would produce a full Cartesian Product, equal duplicate values will produce partial Cartesian Product, regardless of the join type.

    😎

  • Eirikur Eiriksson (8/22/2014)


    gbritton1 (8/22/2014)


    I don't think that's a Cartesian product. If it were, you would have 89 (8*10) rows of ouput.

    A CROSS JOIN would give you a Cartesian Product

    A CROSS JOIN would produce a full Cartesian Product, equal duplicate values will produce partial Cartesian Product, regardless of the join type.

    😎

    Another of Jeff's articles to the rescue:

    http://www.sqlservercentral.com/articles/T-SQL/61539/[/url]

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thank you all for your replies πŸ™‚

    Before copying any create or insert data, here is an example

    Select * from Culture --2327 rows

    Select * from Trial --6793 rows

    Select * from CultureStep --13957 rows

    Select * from Culture inner join Trial on Culture.CultureID = Trial.CultureID --6785

    Select * from (Culture inner join Trial on Culture.CultureID = Trial.CultureID) inner join CultureStep

    on Culture.CultureID=CultureStep.CultureID --39661

    How come when the first join yields 6785 rows when inner joined with culturestep which has 13957 rows , this would yield row number greater than both (39661)?!

    I am not very good at this , kindly explain with details and/or refer me to a good example to clear all kinds of joins and help me understand what results to expect πŸ™‚

  • shaimaa.tarekelshoeiby (8/22/2014)


    Thank you all for your replies πŸ™‚

    Before copying any create or insert data, here is an example

    Select * from Culture --2327 rows

    Select * from Trial --6793 rows

    Select * from CultureStep --13957 rows

    Select * from Culture inner join Trial on Culture.CultureID = Trial.CultureID --6785

    Select * from (Culture inner join Trial on Culture.CultureID = Trial.CultureID) inner join CultureStep

    on Culture.CultureID=CultureStep.CultureID --39661

    How come when the first join yields 6785 rows when inner joined with culturestep which has 13957 rows , this would yield row number greater than both (39661)?!

    I am not very good at this , kindly explain with details and/or refer me to a good example to clear all kinds of joins and help me understand what results to expect πŸ™‚

    To me it looks like a many-to-many table without a filter, don't think it is a coincidence that the cardinality of Trial = CultureStep / 2 and CultureStep = Culture x 6

    😎

    Quick question, can you provide either an ERD or a create table script for the three tables?

  • Culture

    CREATE TABLE [dbo].[Culture](

    [cultureID] [int] IDENTITY(1,1) NOT NULL,

    [CultureName] [nvarchar](50) NOT NULL,

    [Process] [nvarchar](50) NULL,

    [Cropspecialist] [nvarchar](50) NULL,

    [Requestor] [nvarchar](50) NULL,

    [CuttingsOrSeed] [nvarchar](50) NULL,

    [EndLocation] [nvarchar](50) NULL,

    [StartWeek] [int] NULL,

    [StartWeekDeviation] [int] NULL,

    [DeliveryWeek] [int] NULL,

    [DeliveryWeekDeviation] [int] NULL,

    [Purpose] [nvarchar](50) NULL,

    [Crop] [nvarchar](50) NULL,

    [EntryVariation?] [nvarchar](50) NULL,

    [CultureRemarks1] [nvarchar](255) NULL,

    [CultureRemarks2] [nvarchar](255) NULL,

    [extra] [int] NULL,

    [planningGroupID1] [int] NULL,

    [extra2] [int] NULL,

    [extra3] [int] NULL,

    [croptype1] [nvarchar](255) NULL,

    [FLS/VEG] [nvarchar](255) NULL,

    [Department] [nvarchar](255) NULL,

    [subProcess] [nvarchar](255) NULL,

    [timeregCode] [nvarchar](255) NULL,

    [createdBy] [nvarchar](255) NULL,

    [dateCreated] [nvarchar](255) NULL,

    CONSTRAINT [aaaaaCulture_PK] PRIMARY KEY NONCLUSTERED

    Culture Step:

    CREATE TABLE [dbo].[CultureStep](

    [CultureStepID] [int] IDENTITY(1,1) NOT NULL,

    [CultureID] [int] NULL,

    [EntryVarNr] [int] NULL,

    [WeekNr] [int] NULL,

    [stepordernr] [int] NULL,

    [days] [int] NULL,

    [plants per entry] [int] NULL,

    [plants per fust] [decimal](18, 1) NULL,

    [extraM2] [int] NULL,

    [fustPerM2(Manual)] [float] NULL,

    [SoilType] [nvarchar](255) NULL,

    [covering soil] [nvarchar](255) NULL,

    [watering] [nvarchar](255) NULL,

    [fungicide treatm after sowing] [nvarchar](255) NULL,

    [name Fungicide] [nvarchar](255) NULL,

    [conc Fungiside] [nvarchar](255) NULL,

    [type of Hood] [nvarchar](255) NULL,

    [Pincing?] [nvarchar](50) NULL,

    [temperature dayNight C] [nvarchar](255) NULL,

    [Neg diff] [nvarchar](255) NULL,

    [Cold drop] [nvarchar](255) NULL,

    [artificial light (y/n)] [nvarchar](255) NULL,

    [daylength neutral (yes/no)] [nvarchar](255) NULL,

    [daylength] [nvarchar](255) NULL,

    [standard PGR treatment?] [nvarchar](255) NULL,

    [name PGR] [nvarchar](255) NULL,

    [conc PGR] [nvarchar](255) NULL,

    [Spraying Schedule PGR] [nvarchar](255) NULL,

    [PH] [nvarchar](255) NULL,

    [EC] [nvarchar](255) NULL,

    [EC in drench water] [nvarchar](255) NULL,

    [fertilisation schedule vegetative] [nvarchar](255) NULL,

    [fertilisation schedule generative] [nvarchar](255) NULL,

    [remarks fertilisation] [nvarchar](255) NULL,

    [Common diseases] [nvarchar](255) NULL,

    [remarks herbicides] [nvarchar](255) NULL,

    [remarks fungicides] [nvarchar](255) NULL,

    [prefLocation] [nvarchar](255) NULL,

    [year] [int] NULL,

    [dummy] [nvarchar](50) NULL,

    [plants/hour] [int] NULL,

    [EntryFactor] [decimal](18, 2) NULL,

    [fustID1] [int] NULL,

    [Actionid1] [int] NULL,

    [CultureStepNameID1] [int] NULL,

    [daynr] [int] NULL,

    [endweekmanual] [int] NULL,

    [CultureStepRemarks] [nvarchar](255) NULL,

    CONSTRAINT [aaaaaCultureStep_PK] PRIMARY KEY NONCLUSTERED

    Trial:

    CREATE TABLE [dbo].[Trial](

    [TrialID] [int] IDENTITY(1,1) NOT NULL,

    [TrialCode] [nvarchar](50) NOT NULL,

    [ActualNrOfTotalEntries] [int] NULL,

    [ActualNrOfEntriesVarNr1] [int] NULL,

    [ActualNrOfEntriesVarNr2] [int] NULL,

    [ActualNrOfEntriesVarNr3] [int] NULL,

    [ActualNrOfEntriesVarNr4] [int] NULL,

    [AtualStartweek] [int] NULL,

    [PlannedStartWeek] [int] NULL,

    [ActualStartYear] [nvarchar](50) NULL,

    [CultureID] [int] NULL,

    [weekOfTrialrequest] [int] NULL,

    [yearOfTrialrequest] [nvarchar](50) NULL,

    [remarks1] [nvarchar](200) NULL,

    [remarks2] [nvarchar](200) NULL,

    [RequestedNrOfTotalEntries] [int] NULL,

    [RequestedStartweek] [int] NULL,

    [RequestedNrOfEntriesVarNr1] [int] NULL,

    [RequestedNrOfEntriesVarNr2] [int] NULL,

    [RequestedNrOfEntriesVarNr3] [int] NULL,

    [RequestedNrOfEntriesVarNr4] [int] NULL,

    [plannedRequest] [nvarchar](255) NULL,

    [trialCodeGroup] [nvarchar](255) NULL,

    [OrigRequestedNrOfTotalEntries] [int] NULL,

    [OrigRequestedStartweek] [int] NULL,

    [OrigRequestedNrOfEntriesVarNr1] [int] NULL,

    [OrigRequestedNrOfEntriesVarNr2] [int] NULL,

    [OrigRequestedNrOfEntriesVarNr3] [int] NULL,

    [OrigRequestedNrOfEntriesVarNr4] [int] NULL,

    [PlanningAproved] [nvarchar](255) NULL,

    [EntryFactorTrial] [float] NULL,

    [FirstStepWithEntryFactor] [float] NULL,

    [ReasonEntryFactor] [nvarchar](100) NULL,

    [RequestedEntryFacorTrial] [float] NULL,

    [RequestedFirstStepWithEntryFactor] [float] NULL,

    [weekCorrection] [int] NULL,

    [StepWCorr] [int] NULL,

    [M2Correction] [int] NULL,

    [StepM2Corr] [int] NULL,

    [fixedDeliveryWeek] [int] NULL,

    [createdBy] [nvarchar](255) NULL,

    [dateCreated] [nvarchar](255) NULL,

    [trialevaluation] [nvarchar](255) NULL,

    CONSTRAINT [aaaaaTrial_PK] PRIMARY KEY NONCLUSTERED

  • Quick analysis

    😎

    First a simplified ERD

    +--------------+ +--------------+ +-----------------+

    | Trial | | Culture | | CultureStep |

    +--------------+ +--------------+ +-----------------+

    | TrialID | ,---|-| CultureID |-|--, | CultureStepID |

    | TrialCode | | | CultureName | '---|<| CultureID |

    | CultureID |>|--' | (Culture ) | | ( CultureStep ) |

    | (Trial ) | | (attributes) | | ( attributes ) |

    | (attributes) | +--------------+ +-----------------+

    +--------------+

    Looking at the ERD, we can tell that:

    Each Trial has one and only one Culture.

    One or more Trials can share the same Culture.

    And

    Each Culture has one or more CultureSteps.

    One or more Cultures can share one or more CultureSteps.

    Inspecting the Cardinality

    Table | Row Count

    ------------|-----------

    Trial | 6793

    Culture | 2327

    CultureStep | 13957

    As the query "Select * from Culture inner join Trial on Culture.CultureID = Trial.CultureID" returns 6785 rows, we know that there are 8 (6793 - 6785) Trials sharing the same Culture. We can also tell that each Culture has the average of 6 CultureSteps (13957/2327 = 5.998).

    Since we do not know the distribution of CultureID in the CultureStep table, the exact numbers cannot be produced but here is an approxymation:

    Select * from (Culture inner join Trial on Culture.CultureID = Trial.CultureID) inner join CultureStep on Culture.CultureID=CultureStep.CultureID

    Given that each CultureID appears aprox. 6 times in the CultureStep and the set Culture-Trial has 6785 entries, the expected result should have close to 6 x 6785 entries.

  • Thank you for the excellent post .

    Can you please help me become capable of doing such analysis?

    How were you able to make the simplified ERD and come up with the conclusions under it

    I want to learn this and be good at it ...

    Also , need to be good with joins and although I did read out several links with examples ...I still seem to find issues like when my brain cannot digest how an inner join would get more rows than the individual selects and so on.

  • shaimaa.tarekelshoeiby (8/25/2014)


    Thank you for the excellent post .

    Can you please help me become capable of doing such analysis?

    How were you able to make the simplified ERD and come up with the conclusions under it

    I want to learn this and be good at it ...

    Also , need to be good with joins and although I did read out several links with examples ...I still seem to find issues like when my brain cannot digest how an inner join would get more rows than the individual selects and so on.

    You are very welcome.

    My suggestion is to grab a good book on the subject, i.e. "Database Modeling and Design" or "Data Modeling for the Business" and while reading through it, use only simple tools such as pen/paper or a text editor to work through exercises and examples. Also read up on the "Relational Database Theory", "Naive Set Theory" and "Relational Algebra". This should get you well on the way.

    Another thing I find helpful, especially when tackling very complex problems, is to practice solving puzzles like Soduku by memory (not using pen and paper), keeps the grey matter functional;-)

    😎

  • Thank you ...

    But with so limited time , what can I do to help me go through until I can have more time to read the book(s) you suggested and use pen/paper?

    Any recommended set of articles or a small daily exercise ; something of the sort? πŸ™‚

  • There's no instant or magic way to build this kind of knowledge. It takes a long time. I'd add getting a copy of the book T-SQL Querying Fundamentals by Itzik Ben Gan. It's a great way to learn.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply