SQL & the JOIN Operator

  • its great article.

  • Charles Kincaid (10/7/2009)


    Chris,

    Not exactly. The UNION / UNION ALL is kind of tough one to get wrapped around. UNION ALL combines two or more identically structured result sets into one. All rows from each result set are returned in the output. By contrast UNION (without the ALL) returns only distinct rows.

    Example: Say you were doing income tax preparation and had a table of clients for each year. CLIENT2007, CLIENT2008, CLIENT2009. Each table contains just name and taxpayer ID (SSN). Then your old buddy 'Joe Smith' '999-00-1234' came in all three years. Doing the UNION he would show up just the once but UNION ALL he wold appear three times.

    Terrible design that but I'm just trying to get over the one point.

    Hi Charles, I assume that you are referring to my post earlier responding to SQL Noob.

    You might note that I was not pointing out any differences between UNION and UNION ALL, which I am quite aware of the difference of, but if you could replicate the effects of a UNION ALL in a JOIN and whether or not that might be more efficient.

    Perhaps my use of the word sets instead of tables made it unclear, but I use that since sometimes the data being unioned in is not always a table but could be a subquery which may include data from multiple tables, hashes, xml or constant values and are not technically identically structured. That's probably just semantics because the result is obviously homogenous.

    Anyway, it's still good for people to know that UNION ALL gets all data rows and UNION gets all distinct rows. I've definitely been burned on that one in the past.

  • Since the 'or' operator has lower precedence as the 'and' operator, when you ommit the parentheses the join condition means something completely different. Without parentheses the predicate behind the 'or' is evaluated separately, which leaves out the 'a.id = b.id' part completely, thus returning all records from 'a' for each and every record where 'b.date2' is empty. This might blow up the size of the result set a lot when a has many rows and b has many rows without a date in this column. Never ommit any parenthenses whenever you use both 'and' and 'or' in a predicate. You can never use too many parenthenses in these cases, computers don't know what you mean, they just do what you tell them to, but that's generaly not what you want.

    I've seen predicates with both the 'and' and 'or' operator being the cause of 'human' errors in SQL so often, it's almost worth an article.

  • Sorry, I forgot the include the quote:

    sql.varun (10/7/2009)


    Good Article,

    I have a question on Joins.Can anyone help me.

    When I run below query it is running for 40 mins as it is using Table Spool.

    Query:

    select a.*,b.*

    from table a

    left join table b

    on a.id = b.id

    and a.date1 between b.date1 and b.date2

    or b.date2 is null

    Modifing this query is running in 5 mins.

    select a.*,b.*

    from table a

    left join table b

    on a.id = b.id

    and (a.date1 between b.date1 and b.date2 or b.date2 is null)

    But I am not sure whether both queries results same output.

    Can you explain how does it works while we use multiple join conditions using venn diagrams.

    (considering order of precedence)

    Thanks,

    Varun.

  • SQL Noob (10/7/2009)


    i have a database where i dump logs into from 50 or so domain controllers, sql servers and other servers. each server has one or more tables. to get all failed logons from the last 2 days i query all the tables and link them by union all statements.

    would it be faster to just find something to join it on? i'm even looking at combining a lot of the tables into 3-5 tables and a few lookup tables instead of using union all

    The possibility definitely exists for you to speed performance by creating lookup tables or adding appropriate indexes.

    When it comes to performance tweaking, it can be very hard to say what generally will or won't improve the speed of a query. Query analyzer will often help you find areas where the current queries are taking the longest time to perform and might help suggest where indexes might be most useful. I'm not sure if you've been using that or not.

    More information than would be practical in this forum would be needed to get an idea of what the best method might be for you. The methods used may involve tradeoffs. For instance, if you could get some of the querying done on the servers providing the data before they dump, you may be able to pair the data down a bit and reduce the overall time needed to complete your reports. This, of course, may not be practical in your environment.

  • Excellent article. I have a real-life example of the need for a Cross Join in production. Inside my stored procedure, it is necessary that I check conditions against each company/state/county combination.

    I have a table of unique companies and a table of unique state/counties. With a cross join, I can create a derived table that I use to check conditions for each unique combination.

    Keith Wiggans

  • Actually, Cris, I was referring to you (Chris.Strolia-Davis). The way UNION (and ALL) works is to take multiple result sets and merge them together. It almost does not care where the result sets come from. Look at the UNION statements done by folks on this forum when setting up example tables. They often UNION rows that come from fixed data and not tables at all. You can get results from tables and UNION that with results from a stored procedure and UNION that with results from a function or CLR procedure.

    I don't think that I'm still being clear. Let me put this statement out her and see how much fire it draws. The UNION of result sets takes place after all the JOINS have happened. In that sense UNIONs are not JOINs at all.

    ATBCharles Kincaid

  • davidpenton (10/7/2009)


    Jeff Moden (10/7/2009)


    Heh... of course, I will admit that, no matter which method someone uses to identify the column/table relationships, just about anything would be better than a format like the following (taken from a recent post)...

    ...

    from

    tbl_LeaveStatistics,tbl_leavedetails,tbl_LeaveTypeMaster,tbl_EmpMaster,tbl_UserTypeMaster,tbl_RankMaster

    I agree. How about that sickly hungarian notation for database objects? That kind of stuff makes me sad 🙂

    Agreed. I especially "love" such notation when the demands of the code require a change to a view or an inline table function (or vice versa). 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wagner this is a great article. I gave it all the stars. Then look at the other results. Seven pages on the forums. If you have been around this site for a while you may have spotted that articles do 2 things. They, in themselves, educate, illustrate, inform and provoke. The last triggers the second thing. They get conversations going.

    ATBCharles Kincaid

  • Wow... thanks for the reminder especially about Michael Cole's fine article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • vliet (10/8/2009)


    Since the 'or' operator has lower precedence as the 'and' operator, when you ommit the parentheses the join condition means something completely different. Without parentheses the predicate behind the 'or' is evaluated separately, which leaves out the 'a.id = b.id' part completely, thus returning all records from 'a' for each and every record where 'b.date2' is empty. This might blow up the size of the result set a lot when a has many rows and b has many rows without a date in this column. Never ommit any parenthenses whenever you use both 'and' and 'or' in a predicate. You can never use too many parenthenses in these cases, computers don't know what you mean, they just do what you tell them to, but that's generaly not what you want.

    I've seen predicates with both the 'and' and 'or' operator being the cause of 'human' errors in SQL so often, it's almost worth an article.

    Thanks for the information. I understood what you have explained.

    But in above case, b.date field has no nulls and I am getting same results.

    I think it was a precautionary statement. I will check the requirement and try to add parentheses.

  • Excellent article.

    I'll echo several other posts and point out I frequently use cross joins for test data or simply when I need to have a huge number of rows for some reason (such as generating a tally table).

    Also, remember that you can place indexes on views (if certain conditions are met, described in books oline and at http://www.sqlservercentral.com/articles/Indexed+Views/63963/ )

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • superlangerkerl (10/7/2009)


    It was a good article, however these are fundamental types of joins. I would like to have seen something on hash joins and merge joins.

    I had noted your post and then noted that nobody responded. There is a difference in the type you mention that almost puts them outside this kind of article.

    I was about to spout off and say that these are things available to the query optimizer only. You don't get to tell SQL Server how to join only what to join. Then I did some more research and found out that you CAN tell the query optimizer what to do. There is this thing called "Join Hints". But Microsoft says:

    Because the SQL Server query optimizer typically selects the best execution plan for a query, we recommend that hints, including <join_hint>, be used only as a last resort by experienced developers and database administrators.

    For the most part I tend to agree. Then understanding what goes on under the hood (bonnet) and why the query optimizer picks what it does can be helpful.

    From MSDN:

    Join Hints

    Understanding Hash Joins

    Understanding Nested Loops Joins

    Understanding Merge Joins

    Those links will get you started.

    ATBCharles Kincaid

  • Hello,

    Been thoroughly enjoying your article on SQL JOINs. Thank you so much for this information.

    Regards,

    Mike G.

    Seattle, WA

  • <!>

Viewing 15 posts - 61 through 75 (of 98 total)

You must be logged in to reply to this topic. Login to reply