sum every two number combination

  • Let's say your coworker gave you the results of a query, and noted he/she did not include SalesOrderID numbers 43674 and 44295. However, you run that query: (from AdventureWorks)

    SELECT [SalesOrderID]

    , a.City

    ,[TotalDue]

    FROM [AdventureWorks].[Sales].[SalesOrderHeader] s

    INNER JOIN AdventureWorks.Person.Address a

    ON s.BillToAddressID = a.AddressID

    WHERE City = 'Ottawa'

    AND SalesOrderID NOT IN ('43674','44295')

    And find that you're grand total for orders = $98689.28, whereas the results the coworker sent has as grand total in sales of $109276.67. So you know he/she made a mistake and excluded the wrong two SalesOrderIDs. Now, obviously, the solution is to call and ask them to open the query and verify the excuded IDs. That's exactly what I did, and issue is resolved. But I got it into my head that it would be cool to be able to somehow figure out the two excluded IDs by summing all two number combinations until you found the sum that matched the difference.

    This is more of a pet project or brain teaser I've given myself, but I'm not getting very far. A gentle nudge in the right direction would be nice. πŸ˜‰

  • A wild guess just for fun! Cue all smarter folks to line up and show how inefficient this query is!

    SELECT TAB1.SalesOrderId AS ID1,

    TAB1.TotalDue AS DUE1,

    TAB2.SalesOrderId AS ID2,

    TAB2.TotalDue AS DUE2

    FROM SALESTABLE TAB1

    CROSS JOIN

    SALESTABLE TAB2

    WHERE TAB1.SalesOrderID <> TAB2.SalesOrderID AND

    TAB1.TotalDue + TAB2.Totaldue = @SPECIFIED_DIFFERENCE

  • You know, it worked. It tool 3.58 minutes, but it worked. πŸ™‚

  • Can TotalDue every be less than 0?

  • I don't have anything to test with, but I believe this could be better.

    However, the problem is that it's still using a cartesian product to obtain the result.

    SELECT TAB1.SalesOrderId AS ID1,

    TAB1.TotalDue AS DUE1,

    TAB2.SalesOrderId AS ID2,

    TAB2.TotalDue AS DUE2

    FROM SALESTABLE TAB1

    JOIN SALESTABLE TAB2 ON TAB1.SalesOrderID < TAB2.SalesOrderID --Avoid half of the rows.

    WHERE TAB1.TotalDue < @SPECIFIED_DIFFERENCE --This are useless unless you have negatives.

    AND TAB2.TotalDue < @SPECIFIED_DIFFERENCE --Same as above.

    AND TAB1.TotalDue + TAB2.Totaldue = @SPECIFIED_DIFFERENCE

    I'm not sure if the conditions added will help but I'm sure the condition for the join must help the performance.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Lynn Pettis (9/24/2012)


    Can TotalDue every be less than 0?

    No. And there are no nulls.

  • Luis Cazares (9/24/2012)


    I don't have anything to test with, but I believe this could be better.

    However, the problem is that it's still using a cartesian product to obtain the result.

    SELECT TAB1.SalesOrderId AS ID1,

    TAB1.TotalDue AS DUE1,

    TAB2.SalesOrderId AS ID2,

    TAB2.TotalDue AS DUE2

    FROM SALESTABLE TAB1

    JOIN SALESTABLE TAB2 ON TAB1.SalesOrderID < TAB2.SalesOrderID --Avoid half of the rows.

    WHERE TAB1.TotalDue < @SPECIFIED_DIFFERENCE --This are useless unless you have negatives.

    AND TAB2.TotalDue < @SPECIFIED_DIFFERENCE --Same as above.

    AND TAB1.TotalDue + TAB2.Totaldue = @SPECIFIED_DIFFERENCE

    I'm not sure if the conditions added will help but I'm sure the condition for the join must help the performance.

    What you have here is what I was thinking. If the total due for each order can not be less than zero, then it makes sense to restrict the testing to orders where the total due for each order is less than or equal to the total difference. If you could have a negative total, then this check would not be worthwhile.

  • Or we could put an index on the totaldue column πŸ˜‰

    SELECT TAB1.SalesOrderId AS ID1,

    TAB1.TotalDue AS DUE1,

    TAB2.SalesOrderId AS ID2,

    TAB2.TotalDue AS DUE2

    FROM SALESTABLE TAB1

    JOIN

    SALESTABLE TAB2

    ON TAB1.Totaldue = @SPECIFIED_DIFFERENCE - TAB2.Totaldue

  • And of course if the exact rows matter this can get a LOT more challenging because there may be more than 1 pair where the sum is the total you are looking for.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (9/24/2012)


    And of course if the exact rows matter this can get a LOT more challenging because there may be more than 1 pair where the sum is the total you are looking for.

    I can do that! And I can examine triplet combinations (or deeper) as well!

    DECLARE @Orders Table

    (OrderID VARCHAR(4), Amount MONEY)

    INSERT INTO @Orders

    SELECT 2122, 5400

    UNION ALL SELECT '2123', 5500

    UNION ALL SELECT '2124', 1500

    UNION ALL SELECT '2125', 5700

    UNION ALL SELECT '2126', 4500

    UNION ALL SELECT '2126', 1500

    UNION ALL SELECT '2127', 5200

    UNION ALL SELECT '2129', 1000

    DECLARE @Amount MONEY = 7000

    ;WITH UNIQUEnTuples (n, Tuples, Amount) AS (

    SELECT 1

    -- Add brackets around the OrderID to make later string comparison unique

    , CAST('[' + OrderID + ']' AS VARCHAR(max)) COLLATE Latin1_General_BIN

    ,Amount

    FROM @Orders

    UNION ALL

    SELECT 1 + n.n, a.OrderID + n.Tuples, n.Amount + t.Amount

    FROM @Orders t

    CROSS APPLY (

    SELECT '[' + t.OrderID + ']') a(OrderID)

    JOIN UNIQUEnTuples n ON a.OrderID < n.Tuples

    WHERE CHARINDEX(a.OrderID, n.Tuples) = 0 AND n < 3 AND

    n.Amount + t.Amount <= @Amount

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE Amount = @Amount -- The sum you're intereseted in identifying

    For an explanation of this approach, you can take a look at this article: http://www.sqlservercentral.com/articles/sql+n-Tuples/89809/

    I've found that the COLLATE I added improves performance slightly. You can modify n < 3 to account for the depth of nTuples you want to examine.

    Edit: Added the @Amount variable so I could use it to limit the rows examined in the recursive leg of the rCTE (potentially improves performance).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thank you for this solution. I had originally tried to do a recursive cte, but though I'm beginning to understand how they work, I haven't been able to succussfully create my own from scratch. The logic of pulling data from something that doesn't really exist yet is a little much for my linear brain to fully embrace. I'm trying though!

  • Hi!

    If you are interested in how recursive cte is working, you may check out this blog post by Rob Farley


    I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
    Blog: http://somewheresomehow.ru[/url]
    Twitter: @SomewereSomehow

  • dwain.c (9/24/2012)


    Sean Lange (9/24/2012)


    And of course if the exact rows matter this can get a LOT more challenging because there may be more than 1 pair where the sum is the total you are looking for.

    I can do that! And I can examine triplet combinations (or deeper) as well!

    ...

    When you wrote that excellent article Dwain, did you have any idea how often you would have to quote it? I think you would have said "about once a year". In practice it seems like once a week πŸ˜€

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (9/25/2012)


    dwain.c (9/24/2012)


    Sean Lange (9/24/2012)


    And of course if the exact rows matter this can get a LOT more challenging because there may be more than 1 pair where the sum is the total you are looking for.

    I can do that! And I can examine triplet combinations (or deeper) as well!

    ...

    When you wrote that excellent article Dwain, did you have any idea how often you would have to quote it? I think you would have said "about once a year". In practice it seems like once a week πŸ˜€

    I knew that article from Dwain would have an answer. Of course my point got lost in the shuffle. The OP said something about knowing WHICH IDs made the sum. My point was that because of the possibility of multiple pairs getting the EXACT IDs would be impossible. Of course you would have to use Dwain's technique to find them. πŸ˜‰

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ChrisM@Work (9/25/2012)


    dwain.c (9/24/2012)


    Sean Lange (9/24/2012)


    And of course if the exact rows matter this can get a LOT more challenging because there may be more than 1 pair where the sum is the total you are looking for.

    I can do that! And I can examine triplet combinations (or deeper) as well!

    ...

    When you wrote that excellent article Dwain, did you have any idea how often you would have to quote it? I think you would have said "about once a year". In practice it seems like once a week πŸ˜€

    Chris - It's interesting that you should say that. When I stumbled upon both the scripts in that article (entirely by accident) I just thought they were neat and banged out the article really quickly. I then later realized they were kinda like a solution looking for a problem. The shortfall of the article (which I remedied to a certain extent in the discussion thread) is that I didn't delve very deeply into their applicability.

    And yes, I am a bit surprised at the number of times I've found an opportunity to use them. Apparently there aren't too many methods in SQL to arrive at all combinations like they do (at least not very easily).


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply