A Refresher on Joins

  • Dennis, to get the result set you need, you can do the following:

    select a.f1, b.f1, c.f1

    from t1 a

    full join t2 b on a.f1 = b.f1

    full join t3 c on c.f1 = ISNULL(b.f1, a.f1)


  • I once used a Cartesian product to create a report to show all our companies instructors and all classes. I then had to match this with another query so that there would be a check mark by the classes the instructors taught. That was several years ago and haven't needed one since, but that proved it's always good to know these things because you don't know what will come up.

  • I must know -- why did the VP decree that aliases is bad?  I use them all the time in all sorts of systems, including payroll.  They're nigh unto a neccessity IMO when building complex queries.

    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • I did some checking on the three proposed types of outerjoins with no data in a second table:

    Note: If you want data from the second table when it exists, then you should use Q3

    Q1: select * from aat1

    where aat1.f1 not in (select aat2.f1 from aat2)

    --Exception joins should better be implemented like this:

    Q2: select * from aat1

    where not exists (select 1 from aat2 where aat1.f1 = aat2.f1)

    Q3: SELECT a.*

    FROM aat1 a

    LEFT JOIN aat2 b ON a.f1=b.f1

    WHERE b.f1 IS NULL

    Using DBCC FREEPROCCACHE between them, IN SQL Server 2000, Q1 and Q2 have identical execution plans and estimated costs based on the small tables provided: Cost 0.0417 using a Left Anti Semi Join.  Note: I did put pks on the tables and the results were equivalent but the cost slightly higher without the pks (no surprise)

    Q3 came in at the same cost of: 0.0417, but it used a Nested Loops Left outer Join.

    These are small tables so I tried similar queries on two parent child tables with the parent table havin a two column pk, and the child table having a 3 column pk.  Row counts: Parent table: 1707989 Child Table: 431251

    Once again Q1 and Q2 had the same estimated cost and used the same execution plan with a join type of Merge Join/Left Anti-Semi Join Cost: 32.8 and Q3 came in cost of 33.7 using a Merge Join/Left Outer Join.

    I have had a Microsoft consultant tell me not to use the NOT IN syntax, but honestly I haven't seen proof in the execution plans.


    Oh well this is some food for thought.

  • We built some complex querries and the aliases were hard to remember as to what table was doing what.  Now our querries are even longer. 

    I voted to break some of them up and process the result in code.  That got changed to breaking them up and sticking them into stored procs.

    ATBCharles Kincaid

  • Good article, especially for beginners, but useful as a simple refresher.

    We discuss joins quite often in optomizing code and it's amazing how many developers forget the basic rules.

    Waiting for the next article on joins though ...


    We passed upon the stair - and I was that man who sold the world
  • This was indeed a good refresher on joins...

    What if you had to extract data from table_1 which does not exist in table_2 but there are two or more conditions...(By conditions I mean what comes after the 'on' keyword).

  • You just add them to the where clause (if it's on the left table).  If it's on the joined table then you have to make it part of the join clause...

  • If this article is supposed to be for newbies, I think more narrative is needed for the explanation "Observe the query and output carefully...and then see if you can get the following output, which you will agree makes more sense."

    Sorry, but that provides no explanation about why it would make more sense. You should point out the differences, how to achieve the output and the reason it makes more sense.

  • Well here is a real world scenario


    receiver_line rl

    left outer join receiver r

    on (r.id=rl.receiver_id)

    left outer join receiver_line_del rld

    on (rl.receiver_id = rld.receiver_id and rld.receiver_line_no=rl.line_no)

    left outer join purc_order_line pol

    on (rl.purc_order_id = pol.purc_order_id and rl.purc_order_line_no = pol.line_no)

    left outer join purchase_order po

    on (po.id=rl.purc_order_id)

    left outer join purc_line_del pld

    on (pol.purc_order_id = pld.purc_order_id and pld.purc_order_line_no = pol.line_no)

    left outer join vendor v

    on (po.vendor_id=v.id)

    My problem is that i get dupicate records/rows when there is a pld record/row and I do not know hwo to avoid this.

    The table relationships are receivers to purchase orders

    There a header(1)/line(many)/line delivery(many) relationship hend the r,rl,rld and po,pol,pld

    How do I avoid the duplicate records/rows when pld exist?

    Each line could have many deliveries scheduled.


Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply