Stop joined aggregates from multiplying

  • I have three tables. A customer table, an invoice table, and a payment table. I want to pull, in a single query, the credit limit (a single value for the customer, in the customer table), the outstanding balance on invoices (an aggregate from the invoice table) and the total unapplied amounts from payments (an aggregate from the payment table)

    SELECT a.CustKey, a.CustID, a.CustName,

    ISNULL(a.CreditLimit, 0) 'CreditLimit', a.Hold,

    ISNULL(SUM(b.Balance), 0), ISNULL(SUM(c.UnappliedAmt), 0)

    FROM tarCustomer a (NOLOCK)

    LEFT JOIN tarInvoice b (NOLOCK) ON a.CustKey = b.CustKey

    LEFT JOIN tarCustPmt c (NOLOCK) ON a.CustKey = c.CustKey

    GROUP BY a.CustKey, a.CustID, a.CustName, ISNULL(a.CreditLimit, 0)

    I ran this query, and got incorrect results. There are two matching records in the invoice table, and 11 matching records in the payment table. The value returned for the invoice table was 11 times what it should have been, and the vale returned for the payment table was 2 times what it should have been. So clearly, they're multiplying one another.

    Is there any way to get the joins not to result in this multiplication? I feel like there's something obvious that I'm missing.

    Thanks,

    Lisa

  • Something like this (untested) should get you started.

    with invs

    as (

    select CustKey

    ,sum(balance) invbalance

    from tarInvoice

    group by CustKey

    )

    ,payments

    as (

    select CustKey

    ,Sum(UnappliedAmt) UnappliedAmt

    from tarCustPmt

    group by CustKey

    )

    select c.CustKey

    ,c.CustId

    ,etc etc

    ,IsNull(invs.invbalance, 0)

    ,isNull(payments.UnappliedAmt)

    from tarCustomer c

    left join invs on c.CustKey = invs.CustKey

    left join payments on c.CustKey = payments.CustKey

    Your problem is that you are doing a many-to-many join. Avoid that by using subqueries which do separate aggregations and then joining on the results.

    --Edit: I also recommend that you use meaningful aliases (not a,b,c etc) makes your query easier to read. And do you understand the implications of using the NOLOCK hint?

    --Edit 2 - fixed typo

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Thanks, Phil. I thought NOLOCK prevented table locks, and made the code run faster. Is that not the case?

    Also... doing the subqueries with "With X as" instead of doing subqueries... does that prevent SQL from running that query over and over? Is that new?

    Thanks again,

    Lisa

    Phil Parkin (12/14/2012)


    Something like this (untested) should get you started.

    with invs

    as (

    select CustKey

    ,sum(balance) invbalance

    from tarInvoice

    group by CustKey

    )

    ,payments

    as (

    select CustKey

    ,Sum(UnappliedAmt) UnappliedAmt

    from tarCustPmt

    group by CustKey

    )

    select c.CustKey

    ,c.CustId

    ,etc etc

    ,IsNull(invs.invbalance, 0)

    ,isNull(payments.UnappliedAmt)

    from tarCustomer c

    left join invs on c.CustKey = invs.CustKey

    left join payments on c.CustKey = payments.CustKey

    Your problem is that you are doing a many-to-many join. Avoid that by using subqueries which do separate aggregations and then joining on the results.

    --Edit: I also recommend that you use meaningful aliases (not a,b,c etc) makes your query easier to read. And do you understand the implications of using the NOLOCK hint?

    --Edit 2 - fixed typo

  • Lisa Liel (12/14/2012)


    Thanks, Phil. I thought NOLOCK prevented table locks, and made the code run faster. Is that not the case?

    Also... doing the subqueries with "With X as" instead of doing subqueries... does that prevent SQL from running that query over and over? Is that new?

    Thanks again,

    Lisa[/Quote]

    NOLOCK will no doubt take away some of your locking pain, but it comes at a price. It also performs what are known as 'dirty' reads - reads of data from uncommitted transactions. If any of those transactions is rolled back just after your query runs, your query will have returned the wrong data and you'll find it difficult to say why. I suggest you do a little more research, just so that you definitely understand the implications.

    The "With [subq] as select ..." syntax was new in SQL Server 2005. The technical term for this construction is Common Table Expression (CTE). In the above instance, there will be no performance difference between the solution I quoted and one which uses old-style subqueries. The main reason I use them is that I find the syntax more intuitive. Getting a little more advanced, they can also do some stuff that subqueries cannot (eg, recursion).

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I will second Phil on the NOLOCK. That query hint is by no means a go fast pill. It is a get your data quick as long as you don't care about duplicates, missing rows, rolled back transactions and various other quirks that are nearly impossible to debug.

    Here are a few links on the topic that I keep handy.

    http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    http://sqlblog.com/blogs/andrew_kelly/archive/2009/04/10/how-dirty-are-your-reads.aspx

    http://blogs.msdn.com/b/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx

    Now this isn't to say that the NOLOCK hint does not have its place. It just frequently gets overused/abused in the name of "making it faster". It does indeed make it faster but that speed comes at a cost. Understanding the full ramifications is very important. 🙂

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (12/14/2012)


    I will second Phil on the NOLOCK. That query hint is by no means a go fast pill. It is a get your data quick as long as you don't care about duplicates, missing rows, rolled back transactions and various other quirks that are nearly impossible to debug.

    Here are a few links on the topic that I keep handy.

    Thanks, guys. I'll read through the links. The fact is, in my particular context, speed is more important, and things like rolled back transactions are rare enough not to outweigh that. But there may be other circumstances, and I'm glad to know that it isn't as simple as I'd thought.

  • Lisa Liel (12/15/2012)


    Sean Lange (12/14/2012)


    I will second Phil on the NOLOCK. That query hint is by no means a go fast pill. It is a get your data quick as long as you don't care about duplicates, missing rows, rolled back transactions and various other quirks that are nearly impossible to debug.

    Here are a few links on the topic that I keep handy.

    Thanks, guys. I'll read through the links. The fact is, in my particular context, speed is more important, and things like rolled back transactions are rare enough not to outweigh that. But there may be other circumstances, and I'm glad to know that it isn't as simple as I'd thought.

    You might consider looking into isolation instead. Something like snapshot isolation sounds like it would be a good fit for what you are describing.

    http://msdn.microsoft.com/en-us/library/tcbchxcb%28v=vs.80%29.aspx

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply