Understanding INNER join in detail

  • jpellman (7/8/2008)


    Please read your work before publishing content. English is America's language - God bless America!!!

    And this site is used internationally by many others than just us American's. We need to be tolerant of others for whom American English is a second language.

    😎

  • orrego.cesar (7/8/2008)


    I have a big question (I guess that is big only for me), but... 'what is the real advantage of INNER JOIN??? why not to use WHERE condition to join tables or to do al the combinations that you did in your article???'

    Thanks a lot.

    (PS: anybody knows if the 'diferences' are the same in al DBMS's???)

    The ANSI standard is to do complete the table joins in the from clause and filter your result sets in the where clasue. As this is how I started learning SQL, it just seems cleaner to me. I easily identify the criteria for joining the tables from the criteria of the filters. When both are included in the where clause, I find it to be less clear, IMHO.

    😎

  • jim.powers (7/8/2008)


    orrego.cesar (7/8/2008)


    I have a big question (I guess that is big only for me), but... 'what is the real advantage of INNER JOIN??? why not to use WHERE condition to join tables or to do al the combinations that you did in your article???'

    Thanks a lot.

    (PS: anybody knows if the 'diferences' are the same in al DBMS's???)

    To understand this, you need to understand the order in which the statement is processed. The FROM clause is processed first. SQL Server (or any RDBMS for that matter) starts with this statement to limit the amount of data returned in the query. From what I have read, I believe you'll find that using INNER JOIN rather than joining in the WHERE clause will be a substantial performance boost since you have already limited the data set before it starts to pull the actual data. This is especially critical when joining multiple tables as it limits the amount of data the query engine has to process when it gets to the WHERE statement.

    Actually if you look at the query engine at work it will generally do some translation of the query so ultimately INNER JOIN versus older ANSI join were conditions are in the WHERE clause are the same and I have seen little performance difference on most simple queries. However that said the JOIN syntaxes where created mostly for readability to reduce confusion about what you were doing but it still requires a bit of work on the person to ensure they get the expected results when they get into more complex designs. For example I have seen people do this

    SELECT

    *

    FROM

    dbo.First_Table FT

    LEFT JOIN

    dbo.Second_Table ST

    ON

    FT.Primary_Key_Value = ST.Primary_Key_Value

    WHERE

    ST.Number_Column = 1

    In this case the user has turned the net effect into an INNER JOIN as they have removed all other rows except those where Number_Column from Second_Table is equal to 1. The query ould produce the exact same results as this

    SELECT

    *

    FROM

    dbo.First_Table FT

    INNER JOIN

    dbo.Second_Table ST

    ON

    FT.Primary_Key_Value = ST.Primary_Key_Value

    WHERE

    ST.Number_Column = 1

    In the case of INNER JOIN condition placement is not as important as it is in other join types. So if they really did want all rows from First_Table and only the rows from Second_Table that matched on Primary_Key_Value and Number_Column equals 1 then they would need to write this way

    SELECT

    *

    FROM

    dbo.First_Table FT

    LEFT JOIN

    dbo.Second_Table ST

    ON

    FT.Primary_Key_Value = ST.Primary_Key_Value AND

    ST.Number_Column = 1

    Otherwise if the expected result was achieved they should consider a revise to INNER JOIN to ensure the compiler makes better decisions.

    But going back to the statement the biggest benefit from JOIN syntaxes is readbility for anyone who has to review the query. In the above demonstration old ANSI join syntax could make a really complex query much harder to determine the error in the logic.

  • orrego.cesar (7/8/2008)


    thanks for your answer jim.powers... but i'm still not clear in this case.

    I don't know if i have to ask in other topic (because i guess that is not exactly the same kind of questions) but... in measurable terms (time, reads)... a 'SELECT * FROM Table1 INNER JOIN Table2 on (Table1.id = Table2.id)' is so different to a 'SELECT * FROM Table1, Table2 WHERE Table1.id=Table2.id'

    Obviously the product of the FROM is different... the final resulset is the same too... but.. when you use a JOIN, are not doing a cartesian product an then a conditional discrimination of rows???

    I don't know HOW works really the JOIN...

    thanks a lot again.

    CΓ©sar.

    PS: I really sorry about my grammar...

    The final result set is the same with both queries. What you want to do is limit the amount of work the query engine has to do. When you do a JOIN, it is not returning a cartesian product. However, I'm going to guess that doing the "joining" in the WHERE clause initially does produce a cartesian product. The challenge I would propose is to consider how much work you are making the engine do. If you ask it to do a cartesian join first and then tell it you really didn't want all of that data to begin with, you are asking the engine to do more work than is really necessary.

    As far a the differences in the two query examples you posted here, they are different in HOW the engine processes the query but it will return the same data. When I started writing queries, I did everything in the WHERE clause as well, so I understand it is a difficult transition. I believe you'll find, in the long run, that not only will performance increase, you'll find that your queries are more readable by keeping the JOIN criteria in the FROM clause. Then, you'll only be using the WHERE clause to limit the result set as desired.

  • Using "Table1, Table2 WHERE ..." is an older syntax. The ANSI standard for quite some time has been to use "Table1 INNER JOIN Table2 ON ..." (LEFT OUTER JOIN, etc.)

    However, I thought Cesar's question was worth checking with the Execution Plan. They are identical, as expected, and the plan confirms the use of INNER JOIN.

    Cesar, you should experiment with the preferred syntax.

  • Susantha, do you have the ability to edit this article? If you can, would you like me to correct your English so that you can post an improved version?

  • jim.powers (7/8/2008)


    To understand this, you need to understand the order in which the statement is processed. The FROM clause is processed first. SQL Server (or any RDBMS for that matter) starts with this statement to limit the amount of data returned in the query. From what I have read, I believe you'll find that using INNER JOIN rather than joining in the WHERE clause will be a substantial performance boost since you have already limited the data set before it starts to pull the actual data. This is especially critical when joining multiple tables as it limits the amount of data the query engine has to process when it gets to the WHERE statement.

    I'm not sure this is true... my understanding is the actual query is pre-processed by the engine and converted via the optimizer prior to actual execution of the query in a way that the optimizer thinks will result in the best performance.

    When I take queries using ANSI-92 or later style joins and rewrite them as a list of tables with the join criteria in the WHERE clause, I get the exact same estimated and actual execution plans for the queries.

    Here's an example from the view [HumanResources].[vEmployeeDepartment] in the adventureworks OLTP database for SQL 2005.

    Here's the original query

    SELECT

    e.[EmployeeID]

    ,c.[Title]

    ,c.[FirstName]

    ,c.[MiddleName]

    ,c.[LastName]

    ,c.[Suffix]

    ,e.[Title] AS [JobTitle]

    ,d.[Name] AS [Department]

    ,d.[GroupName]

    ,edh.[StartDate]

    FROM [HumanResources].[Employee] e

    INNER JOIN [Person].[Contact] c

    ON c.[ContactID] = e.[ContactID]

    INNER JOIN [HumanResources].[EmployeeDepartmentHistory] edh

    ON e.[EmployeeID] = edh.[EmployeeID]

    INNER JOIN [HumanResources].[Department] d

    ON edh.[DepartmentID] = d.[DepartmentID]

    WHERE GETDATE() BETWEEN edh.[StartDate] AND ISNULL(edh.[EndDate], GETDATE());

    And if rewritten to remove ANSI-92 joins you get

    SELECT

    e.[EmployeeID]

    ,c.[Title]

    ,c.[FirstName]

    ,c.[MiddleName]

    ,c.[LastName]

    ,c.[Suffix]

    ,e.[Title] AS [JobTitle]

    ,d.[Name] AS [Department]

    ,d.[GroupName]

    ,edh.[StartDate]

    FROM [HumanResources].[Employee] e,

    [Person].[Contact] c ,

    [HumanResources].[EmployeeDepartmentHistory] edh,

    [HumanResources].[Department] d

    WHERE

    c.[ContactID] = e.[ContactID]AND

    e.[EmployeeID] = edh.[EmployeeID] AND

    edh.[DepartmentID] = d.[DepartmentID] AND

    GETDATE() BETWEEN edh.[StartDate] AND ISNULL(edh.[EndDate], GETDATE());

    Not only do both use the same estimated execution plan, they also use the same actual execution plans.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Hi, I agree, it is a good article, but the output for querys 4 and 5 are incorrect.

  • I couldn't even get to the guts of this article simply for the fact that I found the grammar/verbiage intolerable. Next time you should get someone who can clearly write in English to (at least) edit. It's like a resume with a word misspelled...right in the garbage.

    ps. Two words....Spell check.

  • Hmmm - I think someone should have taken a look at this article before it was published.

    Cartesian products are rarely useful and are typically only used to generate test data. Not sure it's particularly relevant to a discussion on INNER JOINS.

  • mtassin (7/8/2008)


    I'm not sure this is true... my understanding is the actual query is pre-processed by the engine and converted via the optimizer prior to actual execution of the query in a way that the optimizer thinks will result in the best performance.

    Well, I'm embarrassed to say, I may stand corrected on this issue. It may, perhaps, be that the optimizer processes the query in a particular order but then ultimately comes up with the same execution plan, regardless of how it was written. I'll have to look back at where I read this and reread it again for more clarification on my part.

    I did take one of my longer running queries which runs against a table with nearly 9 million rows. I ran it using the JOINS in the FROM clause and then changing it up to put the joins in the WHERE clause. Both ran with the same execution plan and took the same amount of time. Part of me wonders if it ran the rewrite from the procedure cache but I'll never know at this point.

    Sorry if I have introduced any confusion here...

  • As someone for whom joins have always been a weak area I was very interested in this article, but I have to admit that the quality of the grammar did let it down, making it very hard to follow the flow of the explanation. It would definitely have benefited from having someone read through and correct it before it was published.

    Not that I could do better in another language, or that I thought that the mistakes were very bad, simply that when struggling to get your head around a difficult subject it doesn't take much in the way of grammatical errors to throw off your understanding. Personally, even as a native English speaker (British English... you know, the ones who invented it! πŸ˜‰ I still always get someone to proof read any formal documents I write before publishing them, so I don't think expecting the same in this instance to be that unreasonable. The responses from some of the other posters however are just downright rude, rather than constructive, and hardly allow them to maintain the moral or literary high ground.

  • Crazy Canuck (7/8/2008)


    Hmmm - I think someone should have taken a look at this article before it was published.

    Cartesian products are rarely useful and are typically only used to generate test data. Not sure it's particularly relevant to a discussion on INNER JOINS.

    I have to disagree on this. I think the use for this case was solid just needed a bit of polish (grammar and flow as already suggested). But the cartesian product does allow them to highlight as they did the information to point out that out of all possibilities what you get. This does very well to demonstrate to someone with no/limited knowledge to grasp what they get from their choice.

    However, I am not quite sure the title lends to the article the concept of what they present versus what one may expect from it based on title. A better title might have been "Comparison operators in INNER JOINS" or something like that.

    It might help if articles could be classified based on target audience. For me it was an average article but that is based on my expirence versus if I were a beginner. I might find the article very helpfull in such a case. An long time user rating an article targeted at beginners really does not help the article at all.

  • While I do appreciate the effort that went into the article, I have to agree that it would have been better if it had been edited and the SQL checked for accuracy. While I am not completely new to SQL, I am new to SQL-Server 2005 and when I have to stop and try to figure out why I am getting errors pasting the SQL into SQL Server Management Studio, it makes it harder to focus on the concepts in the article.

    Regards,

    Eric

  • Your problems with cut & paste are not the author's fault. Everything I copy from a code block gets smooshed into one long line, even Jeff Moden's beautiful code. I forget why it happens but I remember the work-around:

    Keep an instance of MS Word open on the side.

    Copy a code block & paste into Word.

    Select and copy the code block FROM WORD.

    Paste into a QA or SSMS query window.

    The line breaks will follow along then and you won't have that problem.

    It's anoying but it works.

Viewing 15 posts - 31 through 45 (of 134 total)

You must be logged in to reply to this topic. Login to reply