Do extra join conditions help, hurt, not matter?

Question

Do extra join conditions help, hurt, not matter?

jschwartz_0412

SSC Enthusiast

Points: 177
More actions
February 28, 2020 at 4:51 pm
Go to Answer
#3729644

Hi - I searched for a couple hours, but can't find this topic and am wondering if any of you has insight on it.
I have an inner join of several tables. Hypothetically, if each of those tables has a unique primary key. The first table can be joined to the second on a primary key and the second table to the third with a primary key. Also assume that other columns could be used in a join on those tables. Is it better to only use the primary key, or is it better to add in some of the other columns, or is SQL Server smart enough to ignore the superfluous columns?
To further illustrate in case I'm not properly explaining my question.
Table1 has columns col1, col2, col3, col4 (primary key is on col3 and col4)
Table2 has columns col2, col3, col4, col5 (no primary key)
If I want to do an inner join between tables 1 and 2 is it better to do this:
select * from table1 t1 inner join table2 t2 on t1.col3=t2.col3 and t1.col4=t2.col4
or
select * from table1 t1 inner join table2 t2 on t1.col3=t2.col3 and t1.col4=t2.col4 and t1.col2=t2.col2
or doesn't it matter?
My expectation is that I'll get a scan on table2 regardless since there's no PK (it's just a heap I think).
I don't typically post questions to groups, so I apologize if I've made this unclear or left out important details or made a faux pas.
Thanks for any insight you may have.
Chris Harshman

SSC-Forever

Points: 42192
More actions
February 28, 2020 at 10:07 pm
Answer
#3729742

the SQL Server engine will evaluate any condition you write in your query. Adding extra conditions will force it to do more work. I'm surprised to hear it causes an extra scan, but it definitely would consume extra CPU cycles comparing those 2 columns.

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 1

Unfortunately - the answer to your question is 'It Depends' 😉

Since table 2 does not have a defined constraint then you can have duplicate values at any level. How you join is going to depend on what results you want to get based on the data.

If you know that the HEAP was created as a one to one relationship between all 3 columns then you probably want to join using all 3 columns. If col2 can be different for each distinct value of col3/col4 - then you probably want to join using just col3/col4...again, it really depends on the data in that table and the expected results from your query.

Side note: I have seen this situation many times - and the developer decided to join on just the 2 columns. Duplicate rows were returned and instead of adding the other column to the query the developer figure it was just easier to put in a DISTINCT (or GROUP BY or even both). Not only is that incorrect...but it forces an unnecessary sort operation and affects the performance of the query.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 2

jschwartz_0412 wrote:

Table1 has columns col1, col2, col3, col4 (primary key is on col3 and col4)
Table2 has columns col2, col3, col4, col5 (no primary key)
If I want to do an inner join between tables 1 and 2 is it better to do this:
select * from table1 t1 inner join table2 t2 on t1.col3=t2.col3 and t1.col4=t2.col4
or
select * from table1 t1 inner join table2 t2 on t1.col3=t2.col3 and t1.col4=t2.col4 and t1.col2=t2.col2
or doesn't it matter?

I'd say you should only put the actual join condition in the INNER JOIN clause of your query. Since (col3,col4) is the primary key of Table1, do your join on those 2 columns, you can't get any more specific. If there are extra criteria for filtering either Table1 or Table2, put those in your WHERE clause. It makes it easier for the next person who has to read the code to understand what is happening.

jschwartz_0412 SSC Enthusiast Points: 177 More actions · Answer 3

Thank you Chris. While I hadn't thought of the next person, I agree completely and appreciate the response.

I'm still interested if there is a positive or negative impact and why.

I did run a test on my actual query with and without the extraneous columns in the join condition and I see 1 more scan in the statistics, telling me that, in my current situation, I should definitely exclude the extra columns. My gut tells me this would always be a bad thing, but my head says that the optimizer will take care of it, but my test agreed with my gut.

Thanks again for the response - I really do appreciate it.

jschwartz_0412 SSC Enthusiast Points: 177 More actions · Answer 4

Thanks Jeffrey. I appreciate the time you took to reply to me. That makes total sense. I'll keep that in mind when I'm working with our developers to fix this problem.

Thanks again!

I should mention that the results of the queries are identical. The extraneous columns in the join condition do not impact the results. I'm only curious about performance.

This reply was modified 5 years, 10 months ago by jschwartz_0412.

jschwartz_0412 SSC Enthusiast Points: 177 More actions · Answer 5

Thank you again Chris. That makes a lot of sense to me. Have a great weekend. I believe you have answered my question.

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 6

I suppose one way to think about it, is to understand what types of things that SQL Server can determine it doesn't need to do. For example, if you have a view or even a table valued function that includes a join to a table, but this table isn't used in any where clause, group by clause, or columns used in the final select results of the query, and it doesn't change the number of rows being returned, then the SQL Server engine can optimize that join out and it is noticeable in your execution plan.

When any of those conditions isn't met, then SQL Server will evaluate it and consider it pertinent to the results.