Stairway to Advanced T-SQL Level 1: Intro to Advanced T-SQL Using a CROSS JOIN

Question

Post reply

Stairway to Advanced T-SQL Level 1: Intro to Advanced T-SQL Using a CROSS JOIN

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply

Greg Larsen SSC-Insane Points: 20966 More actions · Answer 1

Oops, cut and pasted the wrong text.

Gregory A. Larsen, MVP

kkilsby Grasshopper Points: 24 More actions · Answer 2

Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

roger.plowman SSChampion Points: 10265 More actions · Answer 3

So the only real use of the term "CROSS JOIN" vs the older seperation of two tables by a comma is simply to reassure the reader that the cross join was intentional and the author didn't simply forget a join?

That's the only reason I can see to have a specific syntax for cross joins. Mind you, it's probably a good idea to document the deliberate creation of a Cartsian product, especially given the disasterous consequences if the tables are too big! 🙂

Brain2000 Old Hand Points: 363 More actions · Answer 4

Is it possible to disable cross joins for either a specific login, database, or server (or combination) for production servers? I find cross joins to be a security risk as it is a playground for DoS attacks. Not to mention, programmers that don't know what they are doing.

Phil Parkin SSC Guru Points: 246686 More actions · Answer 5

So the only real use of the term "CROSS JOIN" vs the older seperation of two tables by a comma is simply to reassure the reader that the cross join was intentional and the author didn't simply forget a join?
That's the only reason I can see to have a specific syntax for cross joins. Mind you, it's probably a good idea to document the deliberate creation of a Cartsian product, especially given the disasterous consequences if the tables are too big! Smile

In essence, you are correct. However, CROSS JOIN is also the ANSI SQL standard.

--Edit: got hit by the reply/quote bug

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 6

kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.

No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.

For one thing, that would be grossly inefficient; for a second, it would require some tweaking for certain types of joins.

Think OUTER JOINS, particularly. Take the following simple example:

left_id

-----------

1

2

right_id

-----------

2

3

CROSS JOIN RESULTS

left_id right_id

----------- -----------

1 2

2 2

1 3

2 3

LEFT OUTER JOIN ON left_id=right_id RESULTS

left_id right_id

----------- -----------

1 NULL --This row appears nowhere in the CROSS JOIN results

2 2

That first row in the result set of the left outer join couldn't be achieved by taking the result of the cross join and filtering (at least not simply; you'd have to introduce other mechanisms into the join algorithm).

As for the inefficiency/performance problem, try the following example:

--Run the CREATE TABLE and INSERT statements as

--separately from the SELECT queries that follow.

--One of the SELECTs (hint: it's the CROSS JOIN)

--will not finish in a reasonable time.

CREATE TABLE #test_all_int (ID int);

CREATE TABLE #test_odd_int (ID int);

INSERT INTO #test_all_int

SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT 1))

FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2;

INSERT INTO #test_odd_int

SELECT ID

FROM #test_all_int ai

WHERE ai.ID%2=1;

GO

--In the following queries I declare a variable to

--which I can redirect the output of the query.

--That way we don't have to deal with the overhead

--of displaying the results.

--Here's the CROSS JOIN:

--Don't hold your breath waiting for this to finish;

--run it and wait as long as your patience

--allows. :)

DECLARE @int_bucket int;

SELECT @int_bucket=ai.id, @int_bucket=oi.id

FROM #test_all_int ai

CROSS JOIN

#test_odd_int oi;

GO

--If this INNER JOIN were implemented by doing the cross join

--and then filtering, it should take at least as long

--as the above query. This one actually finishes fairly

--quickly, though :)

DECLARE @int_bucket int;

SELECT @int_bucket=ai.id, @int_bucket=oi.id

FROM #test_all_int ai

INNER JOIN

#test_odd_int oi ON ai.id=oi.id;

GO

--Clean up after ourselves

DROP TABLE #test_all_int,#test_odd_int;

GO

Craig Freedman has a nice set of articles looking at the physical join operators used in SQL Server; they should clear up how SQL Server generally goes about implementing JOINs:

http://blogs.msdn.com/b/craigfr/archive/2006/07/26/nested-loops-join.aspx

http://blogs.msdn.com/b/craigfr/archive/2006/08/03/merge-join.aspx

http://blogs.msdn.com/b/craigfr/archive/2006/08/10/687630.aspx

Cheers!

Alan Burstein SSC Guru Points: 61136 More actions · Answer 7

Jacob Wilkins (2/19/2016)
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.
No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.

Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:

1. a Cartesian Product is created

2. The ON Filter is evaluated and unmatched rows are filtered out

3. The OUTER rows are added when applicable

When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)

Itzik Ben Gan Logical Query Processing Poster

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 8

Alan.B (2/22/2016)
Jacob Wilkins (2/19/2016)
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.
No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.
Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:
1. a Cartesian Product is created
2. The ON Filter is evaluated and unmatched rows are filtered out
3. The OUTER rows are added when applicable
When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)
Itzik Ben Gan Logical Query Processing Poster

Sure, that is a way to understand what happens from an abstract, logical perspective. However, since the original posted comment was about the lack of a performance penalty, I (in my opinion fairly, but I'm understandably biased :-)) was considering physical implementation, where that absolutely is not the way things generally happen (suggested by my use of the word "implement" in the fragment you quoted and my providing links to explanations of the the physical join operators explicitly to explain "implementing JOINs"; still, perhaps I should have been more clear).

Take the merge join operator. Unless all rows from one input match all rows from the other input on the join criteria (in which case you're just doing a thinly disguised CROSS JOIN anyway), it absolutely is not the case that a cartesian product is generated; not only is it not generated, it is not even the case that every row from one input is compared to every row in the other input.

I will take it as a fair point that I may not have been clear enough about "CROSS JOIN plus some other stuff later" being a fair understanding of what happens in an abstract sense; again, though, since the point was about the supposed lack of a performance penalty for CROSS JOINs relative to other joins, I addressed physical implementation, where what I said holds. 🙂

Cheers!

Alan Burstein SSC Guru Points: 61136 More actions · Answer 9

Jacob Wilkins (2/22/2016)
Alan.B (2/22/2016)
Jacob Wilkins (2/19/2016)
kkilsby (2/19/2016)
Doesn't the SQL DB engine first create a cross join for any query involving more than one table? So performance isn't an issue for a cross join query.
No. SQL Server does not generally implement joins by first taking the Cartesian product (CROSS JOIN) and then filtering.
Actually, yes, that is exactly what happens; note the Logical Query Processing Poster below. In a T-SQL query the FROM operator is evaluated first. For a JOIN table operator:
1. a Cartesian Product is created
2. The ON Filter is evaluated and unmatched rows are filtered out
3. The OUTER rows are added when applicable
When evaluating the query, the SQL engine is smart enough to evaluate the filter before loading all of the rows into memory. Ben Gan discusses this in great detail in a couple of his books (which I can't access at the moment.)
Itzik Ben Gan Logical Query Processing Poster
Sure, that is a way to understand what happens from an abstract, logical perspective. However, since the original posted comment was about the lack of a performance penalty, I (in my opinion fairly, but I'm understandably biased :-)) was considering physical implementation, where that absolutely is not the way things generally happen (suggested by my use of the word "implement" in the fragment you quoted and my providing links to explanations of the the physical join operators explicitly to explain "implementing JOINs"; still, perhaps I should have been more clear).
Take the merge join operator. Unless all rows from one input match all rows from the other input on the join criteria (in which case you're just doing a thinly disguised CROSS JOIN anyway), it absolutely is not the case that a cartesian product is generated; not only is it not generated, it is not even the case that every row from one input is compared to every row in the other input.
I will take it as a fair point that I may not have been clear enough about "CROSS JOIN plus some other stuff later" being a fair understanding of what happens in an abstract sense; again, though, since the point was about the supposed lack of a performance penalty for CROSS JOINs relative to other joins, I addressed physical implementation, where what I said holds. 🙂
Cheers!

I agree with everything you are saying. I could have re-worded my response to distinguish between what is happening logically and what is actually happening.

Cheers.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 10

akljfhnlaflkj

SSC Guru

Points: 76202

March 9, 2016 at 4:34 pm

#1864131

Great to see this explained. Thanks.

Jack 49290 SSChasing Mays Points: 650 More actions · Answer 11

Trying to learn SQL by working throught this article, and Listing 7 ran for over 1:24:50 and returned error about widows handle …

Dropped WHILE @I < 1000 to WHILE @I < 10

And completed in 2:35 - with no records

Why so long? / issues

please be gentle ....

neil-560592 SSCommitted Points: 1557 More actions · Answer 12

Instead of using CROSS JOIN you could instead use: INNER JOIN ON 1=1
or even: select columns from table1, table2
Is there any difference in the output between these three, either logically or in performance?
While it is true that you have to type a few more characters, it does mean you and the people who have to support your code don't have to learn about "CROSS JOIN" as a new concept.

http://90.212.51.111 domain

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 13

neil-560592 - Tuesday, December 18, 2018 6:46 AM
Instead of using CROSS JOIN you could instead use: INNER JOIN ON 1=1
or even: select columns from table1, table2
Is there any difference in the output between these three, either logically or in performance?
While it is true that you have to type a few more characters, it does mean you and the people who have to support your code don't have to learn about "CROSS JOIN" as a new concept.

IMHO opinion, neither of those methods has any advantage over CROSS JOIN and vice versa except for one thing...

If people are doing the 1=1 thing or the non-ANSI joins that you suggest, then they already know about the CROSS JOIN thing even if they've not used the "CROSS JOIN". If people are going to use SQL , then they should learn the correct JOIN syntax and use it. The 1=1 and non-ANSI JOIN methods are considered to be "code smells" in SQL and should be avoided in most cases... especially if you're trying to impress someone on an iinterview or impress an "inspection team" from one of your customers. It also makes the use of cross join functionality very obvious when troubleshooting underfire during a crisis and helps make it totally obvious that it's not a mistake just because of some missing criteria.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

stephenkeogh SSC Rookie Points: 43 More actions · Answer 14

Good article Gregory

Listing 7: TSQL to create sample data for performance test

IF @I%7 > 0

This code adds no sales data for every 7th customer ?