Left Outer Join (+ Group By) Mystery

  • Hi Guys. Someone probably has dealt with this before but here goes:

    I need to find the total number of users and number of users who placed an order. The answers are grouped according to a SegmentationCode which defines what user they are. I tried:

    select count(*) As Users

    ,count(distinct ordUserID) As Purchasers

    ,sum(ordValue * 0.86) As Value

    ,sgmSegmentationCode

    from users

    left outer join UserSegmentationCode [SegCode] on users.usrID = [SegCode].sgmUserID

    left outer join orders on orders.ordUserID = users.usrID

    where ((users.usrLastPurchace > dateadd(yy,-3,@EndDate) and users.usrLastAccess > dateadd(yy,-3,@EndDate))

    or (users.usrLastPurchace is null --and usrRegDate > dateadd(yy,-1,@EndDate)

    and users.usrLastAccess > dateadd(yy,-1,@EndDate)))

    and users.usrRegDate < @EndDate

    and (orders.ordState >= 16000 or orders.ordDateOpened is null)

    group by [SegCode].sgmSegmentationCode

    For some reason, the Users count is incorrect. I had to split it up in to 2 queries to solve this. I think it's got to do with how left joins works - note I did do a check of OrdDateOpened is null - ie if there were no orders for a particular user.

  • If you aliased you tables and then used Alias.column it would be a bit easier to help as we don't know what columns go with what tables. That being said I would guess that one of your problems is that you have criteria in your where clause that apply to one of the left outer join tables. If you want restrictions on tables in your outer joins you should put them in the ON clause. This affects how the data is returned.

    You should check out the link in my signature line on how to post schema and data so we can better help you.

  • Sorry about that. I have added the aliases now in the first post. I don't want to place any restrictions in the ON clause, because I want to count all enabled users whether they have an order or not. The where clause defines who an enabled user is, basically someone who has not purchased/logged in after a certain period. ie. Need to count distinct users if they have an orderID of null or a proper orderID. Using distinct(user.usrID) or distinct (order.ordUserID) also doesn't work

  • asaacks (11/13/2008)


    Sorry about that. I have added the aliases now in the first post. I don't want to place any restrictions in the ON clause, because I want to count all enabled users whether they have an order or not. The where clause defines who an enabled user is, basically someone who has not purchased/logged in after a certain period. ie. Need to count distinct users if they have an orderID of null or a proper orderID. Using distinct(user.usrID) or distinct (order.ordUserID) also doesn't work

    Jack's advice is sound. He's explained accurately why your query doesn't work. "I don't want to" isn't really appropriate when you've sought, and been given, the free advice of an expert.

    I don't want to place any restrictions in the ON clause, because I

    want to count all enabled users whether they have an order or not.

    Then put the restrictions on the orders table into the ON clause for the orders table, as suggested. It won't affect the count of users, whether they have an order or not, because orders is LEFT JOINed to users. Spend a little time reading about LEFT JOIN in BOL.

    The where clause defines who an enabled user is, basically someone

    who has not purchased/logged in after a certain period. ie. Need to count

    distinct users if they have an orderID of null or a proper orderID

    Referencing a left-joined table in the WHERE clause usually converts the join to a FULL JOIN, which in your case will result in users without an order being excluded from your result set. You won't get any 'users if they have an orderID of null'.

    I'd set up this query a little differently:

    [font="Courier New"]SELECT  COUNT(*) AS Users

               ,SUM(CASE WHEN o.ordUserID IS NULL THEN 0 ELSE 1 END) AS Purchasers              

              ,SUM(o.Value) AS Value

              ,sgmSegmentationCode

    FROM users u

    LEFT OUTER JOIN UserSegmentationCode [SegCode] ON u.usrID = [SegCode].sgmUserID

    LEFT OUTER JOIN (SELECT ordUserID, SUM(ordValue * 0.86) AS Value

       FROM orders

       WHERE o.ordState >= 16000 OR o.ordDateOpened IS NULL

       GROUP BY ordUserID

    ) o ON o.ordUserID = u.usrID

    WHERE  (

       (u.usrLastPurchace > DATEADD(yy,-3,@EndDate) AND u.usrLastAccess > DATEADD(yy,-3,@EndDate))

                       OR (u.usrLastPurchace IS NULL AND u.usrLastAccess > DATEADD(yy,-1,@EndDate))

       )

       AND u.usrRegDate < @EndDate

      

    GROUP BY [SegCode].sgmSegmentationCode

    [/font]

    ... because I'm lazy and this is easier for me to read and understand.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply