Am I taking crazy pills? Dispute over inner vs. outer join.

  • We have the following code (dumbed down):

    SELECT Customer.CustomerId, Location.Description

    FROM Customer

    LEFT JOIN Location ON Customer.LocationId = Location.LocationId

    It's a left join here because LocationId may be NULL in the Customer table. Our report designer is suggesting, on the basis of "better performance from inner joins", that we add a record to the Location table with a LocationId value of (-1), and then use the following query.

    SELECT Customer.CustomerId, Location.Description

    FROM Customer

    JOIN Location ON ISNULL(Customer.LocationId, -1) = Location.LocationId

    To me, this just seems like bad design and, even if it is faster (I estimate something like a possible .003% improvement), it's just a hackish way of doing things. What say you, colleagues?

  • Estimates are fine and well, but have the report designer demonstrate how much of a gain it is. My guess is as yours, that this will not add up to even one hour of development time over a few thousand uses.

    Also, the query may not be able to make use of an index on Customer.LocationID, due to that column being wrapped in an isnull function.

  • Set up a test environment, test both multiple times (run each at least 10 times, more is better), analyse results, conclude.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • TheGreenShepherd (2/14/2014)


    We have the following code (dumbed down):

    SELECT Customer.CustomerId, Location.Description

    FROM Customer

    LEFT JOIN Location ON Customer.LocationId = Location.LocationId

    It's a left join here because LocationId may be NULL in the Customer table. Our report designer is suggesting, on the basis of "better performance from inner joins", that we add a record to the Location table with a LocationId value of (-1), and then use the following query.

    SELECT Customer.CustomerId, Location.Description

    FROM Customer

    JOIN Location ON ISNULL(Customer.LocationId, -1) = Location.LocationId

    To me, this just seems like bad design and, even if it is faster (I estimate something like a possible .003% improvement), it's just a hackish way of doing things. What say you, colleagues?

    BWAHAAHAA!!!! That is the funniest thing I have heard so far today. Let me get this straight. We are going to add an actual row to a table as a placeholder for NULL. It is true that a LEFT JOIN may perform slower than an INNER JOIN but that is because it can return more rows. Even though I think the very thought of this is absolutely horrible, the implementation suggestion is even worse. This is going to effectively make LocationId a non nullable column. Why not make the column non nullable with a default of -1. That would at least eliminate the ridiculous ISNULL in your join.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • SQLRNNR (2/14/2014)


    I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

    Really? That just seems like such a horribly bad idea it couldn't possibly work.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • That inner join won't give you a better performance as your condition is non-SARGable. You should remain with a LEFT JOIN like this:

    SELECT Customer.CustomerId, ISNULL( Location.Description, 'Unknown Location') Description

    FROM Customer

    LEFT JOIN Location ON Customer.LocationId = Location.LocationId

    Or if you want to add the Location -1, you should update as well your Customer table and maybe make the column non nullable.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sean Lange (2/14/2014)


    SQLRNNR (2/14/2014)


    I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

    Really? That just seems like such a horribly bad idea it couldn't possibly work.

    Yeah, I know. It blew me away to see it work properly and faster.

    I think the best conclusion is to test, analyse and conclude as Gail said.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Unfortunately, this is a project still in the design phase, so it's not like we're trying to address an existing performance problem or anything. And, we only have about 100k rows of mock data. I haven't seen much difference between the two queries when I run them. My analysis was based off of the query cost in the execution plan.

  • Ask your designer about the performance implications of ISNULL around a join column - and research it yourself first if you're not already 100% sure.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • SQLRNNR (2/14/2014)


    Sean Lange (2/14/2014)


    SQLRNNR (2/14/2014)


    I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

    Really? That just seems like such a horribly bad idea it couldn't possibly work.

    Yeah, I know. It blew me away to see it work properly and faster.

    I think the best conclusion is to test, analyse and conclude as Gail said.

    I was going to throw together a quick million row table test but I took so long writing my initial response that Gail had already suggested testing. I decided to let the OP roll with it.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Yeah, I know, it's non-SARGable. That was really my last line of defense for shooting this idea down, because his next suggestion is to make that column non-nullable, and default the values to -1, which also seems like a bit of a hack.

  • TheGreenShepherd (2/14/2014)


    Yeah, I know, it's non-SARGable. That was really my last line of defense for shooting this idea down, because his next suggestion is to make that column non-nullable, and default the values to -1, which also seems like a bit of a hack.

    That is a better option than allowing nulls imho (making the column non-nullable). There are savings in the pages by not having null values and the reduced need to throw coding hacks at the database to get around the null values.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (2/14/2014)


    Sean Lange (2/14/2014)


    SQLRNNR (2/14/2014)


    I have seen that kluge query method work much faster and I have seen it be slower. It all depends. It would be well worth testing and demonstrating the performance of each.

    Really? That just seems like such a horribly bad idea it couldn't possibly work.

    Yeah, I know. It blew me away to see it work properly and faster.

    I think the best conclusion is to test, analyse and conclude as Gail said.

    I've seen it too but in VFP years ago. It worked ok. Not all of the kludges in the same system worked as well and they make the ERD and the code significantly more tricky to work with when properly-written code would have performed better in almost all cases.

    Set up a test which picks 3 customers out of a mockup sample set of a million 😉


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • The estimated cost of a query is not a guarantee to how the query will actually perform (in fact,t he cost value has an er....interesting origin...). Run statistics IO, and statistics time on the query to see how many logical reads each one will have to do.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply