SQL & the JOIN Operator

  • Comments posted to this topic are about the item SQL & the JOIN Operator

  • Pretty good information!

  • Wagner,

    You might want to mention in the:

    "Excluding the Intersection of the Sets" section this is similar to

    select... where not in (select... from table2) but it performs alot better w/ the join vs. where not in?

    Just my 02c. Again, well done!

    Mark

  • Excellent article Wagner!

    Thank you very much!

    There's no such thing as dumb questions, only poorly thought-out answers...
  • I always enjoy reading "back to basics" articles and this one is great - simple, detailed and comprehensive.







    **ASCII stupid question, get a stupid ANSI !!!**

  • With respect for your efforts in creating a very useful article, I have one item of criticism. The numerous, simple, grammatical errors throughout the article destroy readability, and authorial credibility.

  • Excellent article.

    You mentioned not knowing a real world application of the CROSS JOIN.

    In my experience, this is typically used for creating test data.

    Sometimes you need to test data in all sorts of different configurations. By using a cross join, you can set up the different parameters and try all combinations.

    Additionally, if you are trying to create bogus data for a test environment, this is one way of taking data from different parts of the real data and generating new data that is not actually real.

    In many cases, this type of join is used on temporary or memory based tables in a batch since the data it produces often needs to go through additional transformation and filtering before it is useful.

    I guess, technically, it isn't used in a "real world" application, but it is used for real world issues.

  • Hi, super, thanks for your comments ... (and thanks everybody for all other comments on this articles).

    I feel it's rather risky to try to find a golden rule when it comes to performance. It depends on so many variables that I'd better test things on each particular case.

    In both statements we are talking about here, we have operations that we are told to avoid... Either using comparions to NULL value when using the JOIN or using the NOT IN predicate when using the subquery.

    I guess the real advantage in using the syntax I suggested in the article is that the SELECT statement can list fields from both tables when we use the JOIN.

    Regarding performance, I was curious to check your suggestion and so I ran a "SET STATISTICS PROFILE ON;" to show performance on both statements.

    To my surprise, the Total Subtree Cost on those queries were almost exactly the same same (difference was less than 1%)

    0.0070812 for the JOIN

    0.00706536 for the subquery

    Sometimes RDBMs play a trick on us. 🙂

  • Terrific article!! I was in a meeting yesterday where we were talking about this very topic!

    I think that visualizing what is happening by use of the Venn diagrams is important -- nice inclusion in the article.

  • Great article. Re cross joins, I used to use them when I had one table with a list of products, and one table with a list of start dates and end dates (52 rows = weeks of the year). A simple cross join gave me weekly buckets for each product (which we then used for sales & purchasing forecasting).

    Rog

  • Chris.Strolia-Davis (10/7/2009)


    Excellent article.

    You mentioned not knowing a real world application of the CROSS JOIN.

    In my experience, this is typically used for creating test data.

    Sometimes you need to test data in all sorts of different configurations. By using a cross join, you can set up the different parameters and try all combinations.

    Additionally, if you are trying to create bogus data for a test environment, this is one way of taking data from different parts of the real data and generating new data that is not actually real.

    In many cases, this type of join is used on temporary or memory based tables in a batch since the data it produces often needs to go through additional transformation and filtering before it is useful.

    I guess, technically, it isn't used in a "real world" application, but it is used for real world issues.

    The real world applications for using CROSS JOINS are many and varied. Most of them revolve around the use of a Tally Table (Numbers Table) to do things like make a Tally CTE which in turn would be cross joined to a delimited column to do splits or used to generate contiguous dates, etc. When limited by Triangular self joins (about half a cross join but still uses CROSS JOIN), they can be used to generate "schedule pairs" and a whole lot more. And, you're also correct... they can be used to very quickly generate very large volumes of constrained randomized test data. It's not uncommon to see some of the frequent posters generate a million row test table to make their point about a performance problem/solution. Rog_os also pointed out a frequent use above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great article. Very good primer on the subject. When you do your next one on advanced joins please talk about moving thing from the WHERE to the ON for better performance. Case in point I needed a list of all CUSTOMERs who had 'CREDIT' type ORDERs placed in the last 30 days. So instead of:

    WHERE [ORDER].TypeId = 3 AND ...

    use

    INNER JOIN [ORDER] ON [ORDER].CustomerID = [CUSTOMER].Id AND [ORDER].TypeId = 3

    ATBCharles Kincaid

  • Nice article, Wagner! Articles of this nature should be required reading for anyone just starting out in SQL and those that enjoy a refresher. Well done. In the vein of "One picture is worth a thousand words", you did a great job with the graphics. Thanks for taking the time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • is there any performance difference doing a union all compared to a join when you need to get all the rows from two or more tables? i have a database where i do union all on some data in 20 tables or so when running a report and it seems to take a long time

  • We use "cross joins", actually no join at all, when we are creating our dimensions in our data warehouse.

    <><
    Livin' down on the cube farm. Left, left, then a right.

Viewing 15 posts - 1 through 15 (of 98 total)

You must be logged in to reply to this topic. Login to reply