Count across multiple tables with left outer join

  • I inherited code that looks like the following. According to my colleagues, this used to work in a SQL 2000 database, but it doesn't work in SQL 2008. I don't have a SQL 2000 database to run this against, so I can't test to see if it ever actually gave the intended results. I feel like I'm missing something obvious here...

    Note that the below code is *much* simpler than the actual - the actual code is checking for the existence and number of transactions across about 15 tables that have a couple of million rows each.

    create table #projectList(

    projectID char(1)

    )

    insert into #projectList(projectID)

    (select 'A'

    union all

    select 'B'

    union all

    select 'C'

    )

    create table #APData(

    projectID char(1),

    tranAmt float

    )

    insert into #APData(projectID, tranAmt)

    (select 'A',101.01

    union all

    select 'B', 202.02

    union all

    select 'C', 303.03

    union all

    select 'C', 909.09

    )

    create table #ARData(

    projectID char(1),

    tranAmt float

    )

    insert into #ARData(projectID, tranAmt)

    (select 'A',44.44

    union all

    select 'A',55.55

    union all

    select 'B',66.66

    union all

    select 'B',77.77

    union all

    select 'B',88.88

    )

    /*

    the existing code that doesn't return data the way the programmer expected it to

    */

    select p.projectid, apCount = COUNT(ap.projectid), arCount = COUNT(ar.projectid)

    from #projectList p

    left outer join #APData ap

    on p.projectID = ap.projectID

    left outer join #ARData ar

    on p.projectID = ar.projectID

    group by p.projectID

    -- this shows the expected results

    select p.projectid,

    apCount = (select COUNT(ap.projectid) from #APData ap where ap.projectID = p.projectID),

    arCount = (select COUNT(ar.projectid) from #ARData ar where ar.projectID = p.projectID)

    from #projectList p

    drop table #projectList

    drop table #APData

    drop table #ARData

    Thanks in advance,

    -Ki

  • I think somebody has a faulty memory. When I run against Sql2000 and sql2008 I get the same results from your sample.

    projectidapCountarCount

    A22

    B33

    C20

    projectidapCountarCount

    A12

    B13

    C20

    Note: I did run both from Sql2008 SSMS but the functionality is at the server.

  • Hi Kiara,

    Please check the below modified code.It shows your expected result.

    select p.projectid,

    apCount = COALESCE(countap,0)

    , arCount = COALESCE(countar,0)

    from #projectList p

    left outer join (SELECT projectID,COUNT(*) AS countap FROM #APData GROUP BY projectID) AS ap

    on p.projectID = ap.projectID

    left outer JOIN (SELECT projectID,COUNT(*) AS countar FROM #ARData GROUP BY projectID) AS ar

    on p.projectID = ar.projectID

  • sql-programmers (8/13/2011)


    Hi Kiara,

    Please check the below modified code.It shows your expected result.

    select p.projectid,

    apCount = COALESCE(countap,0)

    , arCount = COALESCE(countar,0)

    from #projectList p

    left outer join (SELECT projectID,COUNT(*) AS countap FROM #APData GROUP BY projectID) AS ap

    on p.projectID = ap.projectID

    left outer JOIN (SELECT projectID,COUNT(*) AS countar FROM #ARData GROUP BY projectID) AS ar

    on p.projectID = ar.projectID

    Thanks - and very true.

    I was actually just trying to figure out if the original code *should* work - because I couldn't find any way to *get* it to.

    Your suggestion works just fine, although I'd probably run with something more like this if I do this rewrite with derived tables:

    select p.projectid,

    apCount = coalesce(apc.apCount,0),

    arCount = coalesce(arc.arCount,0)

    from #projectList p

    left outer join (select distinct ap.projectid, apCount = count(*) over (partition by ap.projectid) from #APData ap) apc

    on p.projectid = apc.projectid

    left outer join (select distinct ar.projectid, arCount = count(*) over (partition by ar.projectid) from #ARData ar) arc

    on p.projectID = arc.projectid

    Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet. I'm just more likely to use window functions when I can - but until I take a look at the execution plans, I'm not going to swear that they're faster than not using them. It just normally works out that way.

    Given what this particular section of code is trying to accomplish, I may just gut the entire section of the program and rewrite it. When I originally saw this, my reaction was "oh, cool! I've never done x that way!" followed by "but it doesn't do what it claims to do..." 🙂

    -Ki

  • Kiara (8/15/2011)


    ...Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet...

    Have a look at the actual execution plans - the OVER() version is much more expensive.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/15/2011)


    Kiara (8/15/2011)


    ...Caveat: I haven't played with this approach against my actual data, so I don't know what the performance will look like yet...

    Have a look at the actual execution plans - the OVER() version is much more expensive.

    LOL - of *course* I will! Against the actual data involved, not the temp tables I used for the example. When you're dealing with millions of rows per table, and checking for existence on multiple tables, there's no way on earth I'm making a production decision based on a quick reply to a post that I've only tested far enough to make sure I didn't make a typo in the code.

    Now I'm curious, though, so once I get back to the actual problem this is related to, I'll post a version of what actually works best in the environment it has to play nicely in...

    -Ki

  • Kiara (8/15/2011)


    ...Now I'm curious, though, so once I get back to the actual problem this is related to, I'll post a version of what actually works best in the environment it has to play nicely in...

    Cool. I've run the same code against a 5krow table and a 40mrow table, each generating the same plan for the same query. The plans for the two different queries however are wildly different, and on both of these tables, the GROUP BY is about 8 times faster than DISTINCT with OVER(). Each table has a non-unique clustered index on the aggregated column.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply