SQL & the JOIN Operator

wagner crivelini

SSC Eights!

Points: 926
October 6, 2009 at 11:25 pm

#205306

Comments posted to this topic are about the item SQL & the JOIN Operator

SuperDBA-207096 SSCrazy Eights Points: 8176 · Answer 1

October 7, 2009 at 4:49 am

#1062680

Pretty good information!

SuperDBA-207096 SSCrazy Eights Points: 8176 · Answer 2

Wagner,

You might want to mention in the "Excluding the Intersection of the Sets" section that this is similar to

select... where not in (select... from table2)

but doesn't it perform a lot better with the join than with where not in?
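For readers who want to try the two forms side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables table1 and table2 and their columns are made up for illustration. It only checks that both queries return the same rows and says nothing about speed. It also shows one well-known difference: if the subquery can return a NULL, NOT IN returns no rows at all, while the join form is unaffected.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO table2 VALUES (10, 1), (11, 3);
""")

JOIN_SQL = """
    SELECT t1.id, t1.name
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t2.table1_id = t1.id
    WHERE t2.id IS NULL          -- keep only rows with no match in table2
    ORDER BY t1.id
"""
NOT_IN_SQL = """
    SELECT id, name
    FROM table1
    WHERE id NOT IN (SELECT table1_id FROM table2)
    ORDER BY id
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
not_in_rows = conn.execute(NOT_IN_SQL).fetchall()
print(join_rows, not_in_rows)

# Add a NULL to the subquery column and run both queries again.
conn.execute("INSERT INTO table2 VALUES (12, NULL)")
join_rows_with_null = conn.execute(JOIN_SQL).fetchall()
not_in_rows_with_null = conn.execute(NOT_IN_SQL).fetchall()
print(join_rows_with_null, not_in_rows_with_null)
```

Whether one form is faster depends on the engine, the indexes, and the data, so it is worth measuring on the real tables.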

Just my 2c. Again, well done!

Mark

blandry SSCarpal Tunnel Points: 4821 · Answer 3

Excellent article Wagner!

Thank you very much!

There's no such thing as dumb questions, only poorly thought-out answers...

sushila SSC-Dedicated Points: 35293 · Answer 4

I always enjoy reading "back to basics" articles and this one is great - simple, detailed and comprehensive.

**ASCII stupid question, get a stupid ANSI !!!**

jclark017 Grasshopper Points: 21 · Answer 5

With respect for your efforts in creating a very useful article, I have one item of criticism. The numerous simple grammatical errors throughout the article destroy readability and authorial credibility.

Chris.Strolia-Davis Old Hand Points: 315 · Answer 6

Excellent article.

You mentioned not knowing a real world application of the CROSS JOIN.

In my experience, this is typically used for creating test data.

Sometimes you need to test data in all sorts of different configurations. By using a cross join, you can set up the different parameters and try all combinations.

Additionally, if you are trying to create bogus data for a test environment, this is one way of taking data from different parts of the real data and generating new data that is not actually real.

In many cases, this type of join is used on temporary or memory based tables in a batch since the data it produces often needs to go through additional transformation and filtering before it is useful.

I guess, technically, it isn't used in a "real world" application, but it is used for real world issues.
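A minimal sketch of the "try all combinations" idea described above, using Python's built-in sqlite3 module as a stand-in for SQL Server. The three parameter tables (browser, locale, account_type) are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical test parameters; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE browser (name TEXT);
    CREATE TABLE locale (code TEXT);
    CREATE TABLE account_type (kind TEXT);
    INSERT INTO browser VALUES ('firefox'), ('chrome');
    INSERT INTO locale VALUES ('en'), ('pt'), ('de');
    INSERT INTO account_type VALUES ('free'), ('paid');
""")

# CROSS JOIN pairs every row with every row of the other tables,
# which gives one test case per combination of parameters.
combos = conn.execute("""
    SELECT b.name, l.code, a.kind
    FROM browser AS b
    CROSS JOIN locale AS l
    CROSS JOIN account_type AS a
""").fetchall()

print(len(combos))  # 2 browsers x 3 locales x 2 account types
for row in combos:
    print(row)
```

The row count is the product of the input row counts, so this grows quickly; in practice the result would usually be filtered or sampled before use.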

wagner crivelini SSC Eights! Points: 926 · Answer 7

Hi, super, thanks for your comments ... (and thanks everybody for all the other comments on this article).

I feel it's rather risky to look for a golden rule when it comes to performance. It depends on so many variables that I'd rather test each particular case.

Both statements we are talking about here involve operations that we are told to avoid: either comparing to NULL when using the JOIN, or using the NOT IN predicate when using the subquery.

I guess the real advantage in using the syntax I suggested in the article is that the SELECT statement can list fields from both tables when we use the JOIN.

Regarding performance, I was curious to check your suggestion, so I ran both statements with "SET STATISTICS PROFILE ON;" to compare them.

To my surprise, the Total Subtree Cost of the two queries was almost exactly the same (the difference was less than 1%):

0.0070812 for the JOIN

0.00706536 for the subquery

Sometimes RDBMSs play tricks on us. 🙂

Janus Lin SSC Enthusiast Points: 100 · Answer 8

Terrific article!! I was in a meeting yesterday where we were talking about this very topic!

I think that visualizing what is happening by use of the Venn diagrams is important -- nice inclusion in the article.

roger_os SSC Eights! Points: 891 · Answer 9

Great article. Re cross joins, I used to use them when I had one table with a list of products, and one table with a list of start dates and end dates (52 rows = weeks of the year). A simple cross join gave me weekly buckets for each product (which we then used for sales & purchasing forecasting).

Rog
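The weekly-bucket idea above can be sketched like this. It is an illustration only: Python's built-in sqlite3 module stands in for SQL Server, the product and week tables and their columns are hypothetical, and the choice of 5 January 2009 as the first week's start date is an arbitrary assumption.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE week (week_no INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "widget"), (2, "gadget"), (3, "gizmo")],
)

# 52 rows, one per week, each with a start and an end date.
first_monday = date(2009, 1, 5)  # assumed starting point
weeks = [
    (
        n,
        (first_monday + timedelta(weeks=n - 1)).isoformat(),
        (first_monday + timedelta(weeks=n - 1, days=6)).isoformat(),
    )
    for n in range(1, 53)
]
conn.executemany("INSERT INTO week VALUES (?, ?, ?)", weeks)

# One bucket per product per week.
buckets = conn.execute("""
    SELECT p.name, w.week_no, w.start_date, w.end_date
    FROM product AS p
    CROSS JOIN week AS w
    ORDER BY p.product_id, w.week_no
""").fetchall()

print(len(buckets))
print(buckets[0])
```

Sales or purchase rows could then be LEFT JOINed onto these buckets so that weeks with no activity still show up in a forecast.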

Jeff Moden SSC Guru Points: 1004749 · Answer 10

Chris.Strolia-Davis (10/7/2009)
Excellent article.
You mentioned not knowing a real world application of the CROSS JOIN.
In my experience, this is typically used for creating test data.
Sometimes you need to test data in all sorts of different configurations. By using a cross join, you can set up the different parameters and try all combinations.
Additionally, if you are trying to create bogus data for a test environment, this is one way of taking data from different parts of the real data and generating new data that is not actually real.
In many cases, this type of join is used on temporary or memory based tables in a batch since the data it produces often needs to go through additional transformation and filtering before it is useful.
I guess, technically, it isn't used in a "real world" application, but it is used for real world issues.

The real world applications for using CROSS JOINS are many and varied. Most of them revolve around the use of a Tally Table (Numbers Table) to do things like make a Tally CTE which in turn would be cross joined to a delimited column to do splits or used to generate contiguous dates, etc. When limited by Triangular self joins (about half a cross join but still uses CROSS JOIN), they can be used to generate "schedule pairs" and a whole lot more. And, you're also correct... they can be used to very quickly generate very large volumes of constrained randomized test data. It's not uncommon to see some of the frequent posters generate a million row test table to make their point about a performance problem/solution. Rog_os also pointed out a frequent use above.
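A rough sketch of three of the uses mentioned above: building a small numbers list, generating contiguous dates from it, and producing "schedule pairs" with a triangular join. This is not the fnTally function or any specific implementation from the thread. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the digit and team tables are hypothetical.

```python
import sqlite3

# Hypothetical helper tables; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE digit (d INTEGER PRIMARY KEY);
    INSERT INTO digit VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
    CREATE TABLE team (name TEXT PRIMARY KEY);
    INSERT INTO team VALUES ('A'), ('B'), ('C'), ('D');
""")

# 1) Cross join ten digits with themselves to get the numbers 0..99.
numbers = [row[0] for row in conn.execute("""
    SELECT tens.d * 10 + ones.d AS n
    FROM digit AS tens
    CROSS JOIN digit AS ones
    ORDER BY n
""")]

# 2) Turn the first seven numbers into contiguous dates.
dates = [row[0] for row in conn.execute("""
    SELECT date('2009-10-01', '+' || (tens.d * 10 + ones.d) || ' days') AS day
    FROM digit AS tens
    CROSS JOIN digit AS ones
    WHERE tens.d * 10 + ones.d < 7
    ORDER BY day
""")]

# 3) Triangular join: keep only one ordering of each pair of teams.
pairs = conn.execute("""
    SELECT home.name, away.name
    FROM team AS home
    CROSS JOIN team AS away
    WHERE home.name < away.name
    ORDER BY home.name, away.name
""").fetchall()

print(numbers[:5], numbers[-1])
print(dates)
print(pairs)
```

The date arithmetic here uses SQLite's date() function; on SQL Server the same idea would be written with DATEADD.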

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Charles Kincaid SSChampion Points: 13593 · Answer 11

Great article. Very good primer on the subject. When you do your next one on advanced joins, please talk about moving things from the WHERE to the ON for better performance. Case in point: I needed a list of all CUSTOMERs who had 'CREDIT' type ORDERs placed in the last 30 days. So instead of:

WHERE [ORDER].TypeId = 3 AND ...

use

INNER JOIN [ORDER] ON [ORDER].CustomerID = [CUSTOMER].Id AND [ORDER].TypeId = 3
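A small sketch comparing the two placements of the filter. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the customer and orders tables are simplified, hypothetical versions of the ones named above. It checks results only, not speed: with an INNER JOIN both forms return the same rows, and optimizers commonly treat them alike, so any performance gain is worth measuring rather than assuming. With a LEFT JOIN the placement changes which rows come back.

```python
import sqlite3

# Hypothetical, simplified schema; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, type_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (100, 1, 3), (101, 1, 1), (102, 2, 1);
""")

def run(join_kind, filter_in_on):
    """Run the query with the type filter either in ON or in WHERE."""
    on_clause = "o.customer_id = c.id"
    where_clause = ""
    if filter_in_on:
        on_clause += " AND o.type_id = 3"
    else:
        where_clause = "WHERE o.type_id = 3"
    sql = f"""
        SELECT c.name, o.id
        FROM customer AS c
        {join_kind} JOIN orders AS o ON {on_clause}
        {where_clause}
        ORDER BY c.name
    """
    return conn.execute(sql).fetchall()

inner_where = run("INNER", filter_in_on=False)
inner_on = run("INNER", filter_in_on=True)
left_where = run("LEFT", filter_in_on=False)
left_on = run("LEFT", filter_in_on=True)

print(inner_where, inner_on)  # same rows either way
print(left_where)             # WHERE discards the unmatched customers
print(left_on)                # ON keeps every customer, with NULLs
```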

ATB
Charles Kincaid

Jeff Moden SSC Guru Points: 1004749 · Answer 12

Nice article, Wagner! Articles of this nature should be required reading for anyone just starting out in SQL and those that enjoy a refresher. Well done. In the vein of "One picture is worth a thousand words", you did a great job with the graphics. Thanks for taking the time.

--Jeff Moden

alen teplitsky SSC-Dedicated Points: 30011 · Answer 13

Is there any performance difference between doing a UNION ALL and a JOIN when you need to get all the rows from two or more tables? I have a database where I do a UNION ALL on some data in 20 tables or so when running a report, and it seems to take a long time.
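As background to the question, a minimal sketch of what each operator returns, since the two are not interchangeable: UNION ALL stacks the rows of both inputs, while a JOIN matches rows and widens them with columns from both sides. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sales_2008 and sales_2009 tables are hypothetical. It does not measure speed.

```python
import sqlite3

# Hypothetical tables with the same shape; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_2008 (product TEXT, amount INTEGER);
    CREATE TABLE sales_2009 (product TEXT, amount INTEGER);
    INSERT INTO sales_2008 VALUES ('widget', 10), ('gadget', 20);
    INSERT INTO sales_2009 VALUES ('widget', 15), ('gizmo', 5);
""")

# UNION ALL: every row from both tables, one after the other.
stacked = conn.execute("""
    SELECT product, amount FROM sales_2008
    UNION ALL
    SELECT product, amount FROM sales_2009
""").fetchall()

# INNER JOIN: only products present in both tables, side by side.
joined = conn.execute("""
    SELECT a.product, a.amount, b.amount
    FROM sales_2008 AS a
    INNER JOIN sales_2009 AS b ON b.product = a.product
""").fetchall()

print(len(stacked), stacked)
print(joined)
```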

Tobar SSCarpal Tunnel Points: 4876 · Answer 14

We use "cross joins", actually no join at all, when we are creating our dimensions in our data warehouse.

<><
Livin' down on the cube farm. Left, left, then a right.
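On the "actually no join at all" form: listing two tables in the FROM clause separated by a comma, with no join condition, produces the same rows as an explicit CROSS JOIN. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for SQL Server and two hypothetical dimension tables:

```python
import sqlite3

# Hypothetical dimension tables; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region TEXT);
    CREATE TABLE dim_channel (channel TEXT);
    INSERT INTO dim_region VALUES ('north'), ('south');
    INSERT INTO dim_channel VALUES ('web'), ('store'), ('phone');
""")

# Comma-separated FROM list with no WHERE clause.
implicit = sorted(conn.execute(
    "SELECT region, channel FROM dim_region, dim_channel"
).fetchall())

# The same thing spelled out with CROSS JOIN.
explicit = sorted(conn.execute(
    "SELECT region, channel FROM dim_region CROSS JOIN dim_channel"
).fetchall())

print(len(implicit), implicit == explicit)
```

Writing CROSS JOIN explicitly makes it clear to the next reader that the missing join condition is intentional.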