Selecting rows in a table.. not so simple!

  • Table 1:

    A B C

    1 1 4.7

    2 1 4.7

    3 1 4.7

    Table 2:

    A D E

    1 1a 1.2

    1 1b 1.8

    2 2a 2.1

    2 2b 2.3

    3 3a 0.6

    I would like to filter the second table taking one row per ID of Field A (first table) and selecting the rows whose sum of E is equal to the value in field C; in this example the resulting table should be:

    Table 3:

    A D E

    1 1b 1.8

    2 2b 2.3

    3 3a 0.6

    Total field E = value in field C = 4.7

    Thanks a lot in advance!!

  • What you are describing sounds a lot like the classic bin packing problem. http://en.wikipedia.org/wiki/Bin_packing_problem

    The reason you couldn't figure this out is because it is incredibly difficult. You have to try all combinations until it matches.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi Sean,

    thank you for the clarification..

    so do you think there is no SQL solutions?

  • marcipesa (2/6/2014)


    something like this?:

    http://stackoverflow.com/questions/17604893/how-can-i-match-the-sum-of-a-subset-of-items-to-a-target-number%5B/quote%5D

    Yeah something like that. Notice the looping, this is a performance killer in sql. You have the added complexity of a number of value sets to use. The example posted there is a straight line of numbers but you have to use only 1 value from each set. Any chance you can use CLR? I suspect you will get better performance in .NET than you will in straight t-sql.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (2/6/2014)


    marcipesa (2/6/2014)


    something like this?:

    http://stackoverflow.com/questions/17604893/how-can-i-match-the-sum-of-a-subset-of-items-to-a-target-number%5B/quote%5D

    Yeah something like that. Notice the looping, this is a performance killer in sql. You have the added complexity of a number of value sets to use. The example posted there is a straight line of numbers but you have to use only 1 value from each set. Any chance you can use CLR? I suspect you will get better performance in .NET than you will in straight t-sql.

    DwainC is the expert on this kinda stuff. If he's not already seen this, I'll draw his attention to it tomorrow. He's flying right now 🙂

    Then we can all play 🙂 It's time this problem was put to bed.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • ChrisM@home (2/6/2014)


    Sean Lange (2/6/2014)


    marcipesa (2/6/2014)


    something like this?:

    http://stackoverflow.com/questions/17604893/how-can-i-match-the-sum-of-a-subset-of-items-to-a-target-number%5B/quote%5D

    Yeah something like that. Notice the looping, this is a performance killer in sql. You have the added complexity of a number of value sets to use. The example posted there is a straight line of numbers but you have to use only 1 value from each set. Any chance you can use CLR? I suspect you will get better performance in .NET than you will in straight t-sql.

    DwainC is the expert on this kinda stuff. If he's not already seen this, I'll draw his attention to it tomorrow. He's flying right now 🙂

    Then we can all play 🙂 It's time this problem was put to bed.

    Agreed!!!

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • marcipesa (2/6/2014)


    Table 3:

    A D E

    1 1b 1.8

    2 2b 2.3

    3 3a 0.6

    In the above result set, are you using Column B as your 'grouping' control?

    The reason I ask is I would have expected you to want a result set that looked like this:

    1 1b 1.8

    1 2b 2.3

    1 3a 0.6

    2 1b 1.8

    2 2b 2.3...


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Column B is just a grouping of the first table associated to the values of column C.

    Thank you very much for the attention guys!

  • How big is Table B?

    I know of one way for sure but if you've got lot's of rows in Table B it will be quite slow.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This way should be a little faster than what I was originally thinking about.

    CREATE TABLE #Table1

    (

    A INT PRIMARY KEY

    ,B INT

    ,C DECIMAL(10,2)

    );

    INSERT INTO #Table1 (A, B, C)

    SELECT 1, 1, 4.7

    UNION ALL SELECT 2, 1, 4.7

    UNION ALL SELECT 3, 1, 4.7;

    CREATE TABLE #Table2

    (

    A INT

    ,D VARCHAR(10)

    ,E DECIMAL(10,2)

    ,PRIMARY KEY(A, D)

    );

    INSERT INTO #Table2 (A, D, E)

    SELECT 1, '1a', 1.2

    UNION ALL SELECT 1, '1b', 1.8

    UNION ALL SELECT 2, '2a', 2.1

    UNION ALL SELECT 2, '2b', 2.3

    UNION ALL SELECT 3, '3a', 0.6;

    DECLARE @SQL NVARCHAR(MAX);

    WITH Tally (n) AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)

    )

    SELECT @SQL = N'

    SELECT d.A, d.D, d.E

    FROM #Table2 a1 ' +

    (

    SELECT ' ' + '

    CROSS APPLY

    (

    SELECT *

    FROM #Table2 x

    WHERE a' + CAST(n AS VARCHAR(5)) + '.A < x.A

    ) a' + CAST(n+1 AS VARCHAR(5))

    FROM (SELECT TOP ((SELECT COUNT(*) FROM #Table1)-1) n FROM Tally) a

    ORDER BY n

    FOR XML PATH(''), TYPE

    ).value('.', 'VARCHAR(8000)') +

    N' CROSS APPLY (VALUES ' + STUFF(

    (

    SELECT ',(a' + CAST(n AS VARCHAR(5)) + '.A, a' + CAST(n AS VARCHAR(5)) +

    '.D, a' + CAST(n AS VARCHAR(5)) + '.E)'

    FROM (SELECT TOP ((SELECT COUNT(*) FROM #Table1)) n FROM Tally) a

    FOR XML PATH(''), TYPE

    ).value('.', 'VARCHAR(8000)'), 1, 1, '') +

    N') d(A, D, E) WHERE ' + STUFF(

    (

    SELECT '+a' + CAST(n AS VARCHAR(5)) + '.E'

    FROM (SELECT TOP ((SELECT COUNT(*) FROM #Table1)) n FROM Tally) a

    FOR XML PATH(''), TYPE

    ).value('.', 'VARCHAR(8000)'), 1, 1, '') + N' = (SELECT MIN(C) FROM #Table1);'

    PRINT @SQL;

    EXEC sp_executesql @SQL;

    GO

    DROP TABLE #Table1;

    DROP TABLE #Table2;

    It will probably still have serious performance issues if you've got many, many rows in your #Table2.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • To clarify a bit on my "black box" dynamic SQL solution, it builds a query that looks like this (with not the same indentation of course):

    SELECT d.A, d.D, d.E

    FROM #Table2 a1

    CROSS APPLY

    (

    SELECT *

    FROM #Table2 x

    WHERE a1.A < x.A

    ) a2

    CROSS APPLY

    (

    SELECT *

    FROM #Table2 x

    WHERE a2.A < x.A

    ) a3

    CROSS APPLY

    (

    VALUES (a1.A, a1.D, a1.E),(a2.A, a2.D, a2.E),(a3.A, a3.D, a3.E)

    ) d(A, D, E)

    WHERE a1.E+a2.E+a3.E = (SELECT MIN(C) FROM #Table1);

    There is precisely one less CA against #Table2 than you have rows in #Table1. It then does an unpivot (using the CROSS APPLY VALUES approach to UNPIVOT described in one of the articles linked in my signature) to retrieve just the 3 columns that you want to display.

    This is a higher performing method than using a rCTE to generate the n-Tuples (combinations) like I did here:

    Generating n-Tuples with SQL [/url]. That is the solution I was thinking about at first, and probably what Sean and ChrisM were alluding to.

    Hope this helps.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • BTW. If you're going to use this solution, you're going to need to understand the dynamic SQL anyway. So I suggest that you do so by attempting to make the resulting query look like this:

    DECLARE @Total DECIMAL(10,2) = (SELECT MIN(C) FROM #Table1);

    SELECT d.A, d.D, d.E

    FROM #Table2 a1

    CROSS APPLY

    (

    SELECT *

    FROM #Table2 x

    WHERE a1.A < x.A AND a1.E + x.E <= @Total

    ) a2

    CROSS APPLY

    (

    SELECT *

    FROM #Table2 x

    WHERE a2.A < x.A AND a1.E + a2.E + x.E <= @Total

    ) a3

    CROSS APPLY

    (

    VALUES (a1.A, a1.D, a1.E),(a2.A, a2.D, a2.E),(a3.A, a3.D, a3.E)

    ) d(A, D, E)

    WHERE a1.E+a2.E+a3.E = @Total;

    This should pretty significantly improve the overall speed because it ensures that combinations don't go over the assigned total. It probably will only work if you don't have negative values for E.

    Edit: Note that the DECLARE of @Total needs to be in the dynamic SQL string to be in scope to use in the subsequent query. Alternatively, you could pass it in as a parameter to the dynamic SQL.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Regardless of the number of rows in the table, this can be done incredibly quickly IF you're only looking for one set that adds up to the desired number.

    So... are you looking for just one set or more than one set that match the number?

    It can also be done fairly quickly, regardless of the number of rows, if you want exclusive sets... that is, if something is used in one set, they'll never be used again in any other set.

    I just need to know which you want.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/10/2014)


    Regardless of the number of rows in the table, this can be done incredibly quickly IF you're only looking for one set that adds up to the desired number.

    So... are you looking for just one set or more than one set that match the number?

    It can also be done fairly quickly, regardless of the number of rows, if you want exclusive sets... that is, if something is used in oe set, they'll never be used again in any other set.

    I just need to know which you want.

    Are you thinking this is a special case of relational division?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply