Aggregate Query question

  • This is something that may be better handled in an application of perhaps a CLR aggregate function of some sort.

    Reminds me of something used for packaging parts, looking for the max number of items that would fit in a specific size box for example.

  • And so, a link to Hugo Kornelis's bin packing series, which may spark an idea or 2.

    http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/30/bin-packing-part-1-setting-a-baseline.aspx


    And then again, I might be wrong ...
    David Webb

  • Les Cardwell (6/19/2012)


    Lynn,

    It's a redistricting (route) requirement, where we're looking to contain each district to roughly the same number of meters (service locations). Urban growth and all that.

    That's interesting, because that sounds like it would impose some reasonably important links between data points. After all you wouldn't want to superimpose districts, so you wouldn't be looking for ANY set, but sets of "adjacent" data points. This would severely reduce the amount of combinations to evaluate.

    Is there adjacency data missing from the description?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Les Cardwell (6/19/2012)


    Chasing,

    >>All the possible combinations of those rows that added to 5?

    All the possible DISTINCT combinations.

    In the vein of being pedantic (because that's the only way I can contribute on this one πŸ˜‰ ), probabilistically speaking, combinations are distinct. Permutations have repeats, in all possible orders.

    So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200

    That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.

    I hope this is being printed on green-bar paper on a dot-matrix printer.

    I'm going to go to my corner now.

  • >>And so, a link to Hugo Kornelis's bin packing series, which may spark an idea or 2.

    Thanks. It does have similarities. I'll have to give it a thorough read.

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • >>Is there adjacency data missing from the description?

    The adjacencies have already been resolved.

    Thx,

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • There's a simple solution for this. Anyone remember the challenge of obtaining every possible combination of the letters of the alphabet, starting A, AB, AC.....BC, BD, BE......ABCDEFGHIJKLMNOPQRSTUVWXYZ....ZY, ZZ?

    Generate instead a list of integer rowIDs of the original data. Think CROSS APPLY and IN, and you've cracked it.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200

    That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.

    LOL...a math major I am not, but that would explain the problem, since there are ~240 possible combinations to resolve to 30. I do have a 'good enough' result by breaking it into smaller sections, but was curious about other possible algorithms to the seemingly simple problem initially stated, or maybe overlooking the obvious.

    Thx,

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Chris,

    Oddly...I haven't seen that one. Any idea where I might track it down?

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Les Cardwell (6/19/2012)


    Lynn,

    It's a redistricting (route) requirement, where we're looking to contain each district to roughly the same number of meters (service locations). Urban growth and all that.

    How many "routes" do you have to do this problem for, Les? To be sure, how many routes do you have and how many to you want? Also, just in case I have an epiphany, what are the values for the smallest and largest routes?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • While I am loathe to quote myself (well, not really, just saying), have you seen this article?

    http://www.sqlservercentral.com/articles/sql+n-Tuples/89809/

    A minor modification to calculate the value of each n-tuple and then select for those that equal 10 I believe gives you the result you're looking for.

    DECLARE @t TABLE (Prod CHAR(10), value INT)

    INSERT INTO @t

    SELECT '1000', 3 UNION ALL SELECT '1001', 5 UNION ALL SELECT '1002', 6

    UNION ALL SELECT '1003', 5 UNION ALL SELECT '1004', 8 UNION ALL SELECT '1005', 6

    ;WITH UNIQUEnTuples (n, Tuples, value) AS (

    SELECT DISTINCT 1, CAST(Prod AS VARCHAR(max)), value

    FROM @t

    UNION ALL

    SELECT 1 + n.n, t.Prod + ',' + n.Tuples, t.value + n.value

    FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples

    WHERE CHARINDEX(t.Prod, n.Tuples) = 0

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE value = 10

    ORDER BY n, Tuples

    If your Prod column is not-unique, just remove the DISTINCT.

    Also, with 240 combinations, you'll probably need to set a leveling factor to limit to 1, 2, 3, etc. combinations of items, otherwise this algorithm will run a long time. Like this:

    WHERE CHARINDEX(t.Prod, n.Tuples) = 0 AND n <= 3

    Can't seem to make it say n less than or equal 3 in the SQL above.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I keep associating districts with a geographical area, so I cannot help but think that using distance alone is not going to serve you well in this case. Would it make more sens to turn it into a map-reduce type problem?

    As in:

    - figure out your total coverage area

    - split your area in 2. and calculate the density. if the density it substantially higher on one side, split it again.

    - keep running a split until you have the number of districts needed or the right average size of districts needed.

    - normalize once you have the rough cut.

    With coordinates for each point, should be easy enough to do the rough groupings.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • OK sports fans, I've now read through this entire thread and I'm now 90% the solution I posted before will work for the OP. I was a bit concerned about speed though, so I ran a little test with the following (modified) version of the code.

    Constraints:

    1. Max tuples combined is 3

    2. Value must sum to 10 for combination

    3. Total number of items = 250

    Here is the test harness and the code.

    DECLARE @t TABLE (Prod VARCHAR(4), value INT)

    ;WITH Tally (n) AS (

    SELECT TOP 250 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns)

    INSERT INTO @t

    SELECT RIGHT('000' + CAST(n AS VARCHAR), 4), ABS(CHECKSUM(NEWID())) % 10 + 1

    FROM Tally

    ;WITH UNIQUEnTuples (n, Tuples, value, [check]) AS (

    SELECT 1, CAST(Prod AS VARCHAR(max)), value, CAST(value AS VARCHAR(MAX))

    FROM @t

    UNION ALL

    SELECT 1 + n.n, t.Prod + ',' + n.Tuples, n.value + t.value, [check]+'+'+CAST(t.value AS VARCHAR(MAX))

    FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples

    WHERE CHARINDEX(t.Prod, n.Tuples) = 0 AND n <= 3 AND t.value + n.value <= 10

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE value = 10

    ORDER BY n, Tuples

    Results:

    1. Rows returned = 122000 give or take

    2. Elapsed time = ~ 2 minutes

    Improvements to the code:

    1. I eliminated the DISTINCT clause on the anchor leg of the CTE because I didn't think it would be necessary.

    2. Limited the number of levels examined to 3 or less

    3. Removed rows where the value of the Tuples is > 10 (no longer need to be considered at level = n+1)

    4. Added a check column to show you the amounts added together

    By the way, this is similar to the traditional OR Knapsack problem (http://en.wikipedia.org/wiki/Knapsack_problem) except that the constraint on the KP is less than or equal, whereas the constraint here is equal.

    The KP is traditionally solved using a Dynamic programming algorithm, which is also recursive. In this case, my solution is brute-force but sufficient for a problem of this small size.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Sheesh...quite the brain trust up here. It's going to take me a bit to wade through these. I'll report the results after I come up for air πŸ™‚

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • David Webb-200187 (6/19/2012)


    Hmmmm....

    Let's say you had 50 rows, all with a quantity of 1. The target quantity was 5.

    How would you want that brought back? All the possible combinations of those rows that added to 5?

    Just the rows that could possibly be a component of the 5? You'd end up bringing them all back. What would that tell you? Would you want all the combinations grouped by some artificial designator (like a solution_number)?

    I agree with the idea that t-sql isn't the best vehicle for generating solution combinations like this. A number crunching package like SAS or SPSS would be a better bet ( or find someone who has an old APL compiler).

    1,125,899,906,842,624 distinct combinations in 50 rows:

    DROP TABLE #Sample

    CREATE TABLE #Sample (Item INT, Qty INT)

    INSERT INTO #Sample (Item, Qty)

    SELECT 1, 3 UNION ALL

    SELECT 2, 5 UNION ALL

    SELECT 3, 6 UNION ALL

    SELECT 4, 5 UNION ALL

    SELECT 5, 8 UNION ALL

    SELECT 6, 6

    -----------------------------------------------------

    -- Generate every row combination from the input set, concatenate into a string:

    -- SELECT UniquePerms = POWER(2,n)-1

    -- FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) d (n)

    -- where n = number of rows (six, in this case)

    -- the rCTE generates about a million rows in 30 seconds, corresponding to 20 input rows.

    -----------------------------------------------------

    ;WITH Calculator AS(

    SELECT

    [Level]= 1,

    MyString= CAST('01' AS VARCHAR(40)),

    LastDigit= CAST(1 AS BIGINT),

    Pos= 1,

    MaxLevels= (SELECT COUNT(*) FROM #Sample)

    UNION ALL

    SELECT

    [Level]= [Level] + 1,

    MyString= y.MyString,

    LastDigit= y.NewDigit,

    Pos= y.Newpos,

    MaxLevels

    FROM Calculator c

    CROSS APPLY (

    SELECT Newpos, NewDigit,

    MyString = CAST(LEFT(c.MyString,d.Newpos-1) + RIGHT('0'+CAST(d.NewDigit AS VARCHAR),2) AS VARCHAR(40))

    FROM (

    SELECT Newpos, NewDigit = CASE

    WHEN c.LastDigit = c.MaxLevels THEN 1+CAST(SUBSTRING(c.MyString,x.Newpos,2) AS BIGINT)

    ELSE 1+c.LastDigit END

    FROM (SELECT Newpos = CASE WHEN c.LastDigit = c.MaxLevels THEN c.Pos-2 ELSE c.Pos+2 END) x (Newpos)

    ) d (Newpos, NewDigit)

    ) y (Newpos, NewDigit, MyString)

    WHERE c.[Level] < POWER(2,MaxLevels)-1

    )

    SELECT

    c.MyString,

    s.SUMQuantity

    FROM Calculator c

    CROSS APPLY ( -- unpivot the string MyString and use to probe table #Sample

    SELECT SUM(qty)

    FROM #Sample s

    INNER JOIN (

    SELECT n, RowID = CAST(SUBSTRING(c.MyString,(n*2)-1,2) AS INT)

    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) d (n)

    WHERE n <= Pos+1

    ) d ON d.RowID = s.Item

    ) s (SUMQuantity)

    WHERE s.SUMQuantity = 10

    OPTION (MAXRECURSION 0);

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 16 through 30 (of 109 total)

You must be logged in to reply to this topic. Login to reply