Summing distinct records

  • Hi all,

    Just a quick question which I'm fairly sure I already know the answer to.

    Let's say I have a table which stores a clientID and the purchases the client made, as well as a table that stored the clientID and his different accounts.

    Also, let's say the client had multiple purchases and multiple accounts:

    CREATE TABLE #ClientPurchases

    (

    ID INT IDENTITY PRIMARY KEY,

    ClientID INT,

    Purchase DECIMAL(18, 2)

    )

    CREATE TABLE #ClientAccounts

    (

    ID INT IDENTITY PRIMARY KEY,

    ClientID INT,

    AccountNumber BIGINT

    )

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (1, 10)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (1, 20)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (1, 50)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (1, 100)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (2, 100)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (2, 500)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (2, 400)

    INSERT INTO #ClientPurchases (ClientID, Purchase)

    VALUES (2, 400)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (1, 1234567)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (1, 1512512)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (1, 2362632)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (1, 412421)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (2, 235236262)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (2, 141241)

    INSERT INTO #ClientAccounts (ClientID, AccountNumber)

    VALUES (2, 124124)

    Now let's say I want to get a count of the total number of accounts, purchases, and the total value of the purchases. Normally I'd write a request like this:

    SELECT

    #ClientPurchases.ClientID,

    COUNT(DISTINCT AccountNumber) AS NumAccounts,

    COUNT(DISTINCT #ClientPurchases.ID) AS NumPurchases,

    SUM(DISTINCT Purchase) AS AmtPurchases

    FROM #ClientPurchases

    JOIN #ClientAccounts ON #ClientPurchases.ClientID = #ClientAccounts.ClientID

    GROUP BY #ClientPurchases.ClientID

    This works for the number of accounts and purchases, but clearly doesn't work for the value of the purchases, because the SUM(DISTINCT Purchase) part is looking at a distinct value of purchases, and not a distinct purchase record.

    Obviously there's ways I can get this to work, such as for example splitting the request into two:

    WITH CountAccounts AS

    (

    SELECT

    ClientID,

    COUNT(*) AS NumAccounts

    FROM #ClientAccounts

    GROUP BY ClientID

    ),

    ClientPurchases AS

    (

    SELECT

    ClientID,

    COUNT(*) AS NumPurchases,

    SUM(Purchase) AS AmtPurchases

    FROM #ClientPurchases

    GROUP BY ClientID

    )

    SELECT

    ISNULL(CountAccounts.ClientID, ClientPurchases.ClientID) AS ClientID,

    ISNULL(NumAccounts, 0) AS NumAccounts,

    ISNULL(NumPurchases, 0) AS NumPurchases,

    ISNULL(AmtPurchases, 0) AS AmtPurchases

    FROM CountAccounts

    FULL JOIN ClientPurchases ON CountAccounts.ClientID = ClientPurchases.ClientID

    My question is just whether there is a simpler way of doing this that I'm forgetting about, since my real query is considerably bigger and I'd rather not take the time to figure out how to split it up. Basically is there a simple way I can get a SUM of distinct values, but where the "distinct-itude" is not decided by the value, but rather by a different key.

  • Unfortunately not that I'm aware of. Because the Join will distribute the two halves of the join into both counts, you have to use correllated subqueries (basically the CTE you built) for each value.

    Personally, I'd probably do something like:

    SELECT

    clientID,

    (SELECT COUNT(*) FROM #ClientAccount WHERE ClientID = d.ClientID)

    ...

    FROM

    (SELECT DISTINCT ClientID FROM tblClients) AS d


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Based on your sample data, what should the expected output look like? Manually build us a table that represents the expected results.

  • The second query provides the correct results from my example set, but here they are anyway:

    ClientIDNumAccountsNumPurchasesAmtPurchases

    144180.00

    2341400.00

  • I would do this with a correlated subquery, thusly:

    SELECT ClientID

    ,(SELECT COUNT(AccountNumber)

    FROM #ClientAccounts ca

    WHERE ca.ClientID = cp.ClientID

    GROUP BY ClientID) AS NumAccounts

    ,NumPurchases, AmtPurchases

    FROM (

    SELECT ClientID, COUNT(ClientID) AS NumPurchases, SUM(Purchase) AS AmtPurchases

    FROM #ClientPurchases

    GROUP BY ClientID) cp

    IMHO, this version looks somewhat simpler than your version that you said works and its query plan cost shows less. Actual performance may vary though so run a timing test to be sure.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Actually, I have a better idea. If you need to worry about ClientIDs that have accounts but no purchases (the above query won't work), use this.

    ;WITH CTE (ClientID, AccountNumber, NumPurchases, AmtPurchases) AS (

    SELECT ClientID, 1, 0, 0 FROM #ClientAccounts

    UNION ALL

    SELECT ClientID, 0, 1, Purchase FROM #ClientPurchases

    )

    SELECT ClientID, SUM(AccountNumber) AS NumAccounts

    ,SUM(NumPurchases) AS NumPurchases, SUM(AmtPurchases) AS AmtPurchases

    FROM CTE

    GROUP BY ClientID

    Still pretty simple and has the same query plan cost as my prior suggestion.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply