Aggregate Query question

  • Like I said, great work Chris! And thanks for taking the time to look closer.

    Now if we can just figure out a way to apply this to the bin packing problem, we can kill two birds with one stone.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • dwain.c (6/22/2012)


    ...The only thing I don't like about your version, and I'm really splitting hairs here, is the hardcoding of the 4 on the LEFT. If you use this query over different problems (pulling it from your toolbox), it would be really easy to forget that the 4 needs to correspond to the length of your item string.

    A local variable holding that value (defined right after the table) can easily avoid that issue though.

    The LEFT isn't necessary, and neither is a local variable. You're comparing IDs in the source table to the last ID joined to the left-hand side of the string. If you save that last ID as a new column in the CTE, you can compare directly:

    ;WITH UNIQUEnTuples (n, Prod, Tuples, value)
    AS (
        SELECT 1, Prod, Tuples = CAST(Prod AS VARCHAR(8000)), value
        FROM @t
        UNION ALL
        SELECT n.n + 1, t.Prod, t.Prod + ',' + n.Tuples, n.value + t.value
        FROM @t t
        JOIN UNIQUEnTuples n ON n.Prod > t.Prod
        WHERE t.value + n.value <= 10
    )
    SELECT *
    FROM UNIQUEnTuples
    WHERE value = 10
    ORDER BY n, Tuples
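
    For comparison, the LEFT-based join from the earlier version, with the length held in a local variable as dwain suggested, might look roughly like the following. This is only a sketch: @ItemLen is a hypothetical variable, and @t with fixed-width Prod values is assumed from earlier in the thread.

    -- A sketch, not the posted solution: @ItemLen is hypothetical, and @t with
    -- fixed-width Prod values is assumed from earlier in the thread.
    DECLARE @ItemLen int = 4;   -- length of each Prod value; set per problem

    WITH UNIQUEnTuples (n, Tuples, value)
    AS (
        SELECT 1, Tuples = CAST(Prod AS VARCHAR(8000)), value
        FROM @t
        UNION ALL
        SELECT n.n + 1, t.Prod + ',' + n.Tuples, n.value + t.value
        FROM @t t
        JOIN UNIQUEnTuples n
          ON t.Prod < LEFT(n.Tuples, @ItemLen)  -- compare against the leftmost item only
        WHERE t.value + n.value <= 10
    )
    SELECT *
    FROM UNIQUEnTuples
    WHERE value = 10
    ORDER BY n, Tuples;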

    "Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms." - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

    Thanks folks. Brilliant! It made mincemeat out of the Route problem. It worked so well, though, that I'm now tossing it at a much bigger NP problem that we attempted in '98 on a challenge by JC. At that time, WITH and CTEs weren't available, so we weren't able to resolve the final sets.

    Stay tuned...

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

    After putting the route issue to bed, I was mulling over how well this performed. So, to escalate its application, I tossed this at the Jan. '98 Dr. Dobb's 'Sweden Redistricting' challenge, which Celko had challenged the Database Users forum to attempt using SQL. With his help, I arrived at a SQL solution using negation and divide/conquer, but the permutations for resolving the final set were far more than my measly IBM NetPro2 server could handle. It's haunted me for years, since I haven't found a way to apply a multidimensional key to the problem, or found any other SQL solution. :crazy:

    Using Dwain's solution I ran a test against a subset of 27, and it resolved out to sets of 5 in short order, then ran it against sets of 10 which resolved while I was working on another project. Rather than try the divide/conquer method first, I just staged the algorithm with the Candidates (151) where the total population of the set = 167,976. I figured it would take a long time to run, and even with the algorithm, might still not be feasible without some level of divide/conquer. I thought I'd let it run the weekend and report back since there is far more to resolve than the smaller set...

    Sweden Challenge...

    * Total 'old' number of Districts was 66.

    * Total Sweden Population = 167,976.

    * New District population set requirement = between 5,900 and 6,000 each.

    * Each New District had to be created using neighbors of Old Districts.

    * Approx. number of 'new' districts = 11 or 12 = ((167,976 / 5,900) / 2.5 old districts to each new district)

    * Possible Candidates after resolving 'neighbors' for the redistricting = 151 new districts, each containing 1, 2, or 3 'old' districts (based on population requirements).

    If it resolves, it would not only return 1 solution, but ALL solutions. Something the two dev teams that solved the problem couldn't do at that time using linear algorithms, even with the supercomputers they had at their disposal. Even if it doesn't resolve the 151, the divide and conquer method only reduces the largest subset to 107... though even 107 is a statistically significant reduction.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • ChrisM@Work (6/22/2012)


    dwain.c (6/22/2012)


    ...The only thing I don't like about your version, and I'm really splitting hairs here, is the hardcoding of the 4 on the LEFT. If you use this query over different problems (pulling it from your toolbox), it would be really easy to forget that the 4 needs to correspond to the length of your item string.

    A local variable holding that value (defined right after the table) can easily avoid that issue though.

    The LEFT isn't necessary, and neither is a local variable. You're comparing IDs in the source table to the last ID joined to the left-hand side of the string. If you save that last ID as a new column in the CTE, you can compare directly:

    ;WITH UNIQUEnTuples (n, Prod, Tuples, value)
    AS (
        SELECT 1, Prod, Tuples = CAST(Prod AS VARCHAR(8000)), value
        FROM @t
        UNION ALL
        SELECT n.n + 1, t.Prod, t.Prod + ',' + n.Tuples, n.value + t.value
        FROM @t t
        JOIN UNIQUEnTuples n ON n.Prod > t.Prod
        WHERE t.value + n.value <= 10
    )
    SELECT *
    FROM UNIQUEnTuples
    WHERE value = 10
    ORDER BY n, Tuples

    Drat! Thought of that last night and you beat me to it. Did you check to see if this solution is any faster?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Les Cardwell (6/22/2012)


    After putting the route issue to bed, I was mulling over how well this performed. So, to escalate its application, I tossed this at the Jan. '98 Dr. Dobb's 'Sweden Redistricting' challenge, which Celko had challenged the Database Users forum to attempt using SQL. With his help, I arrived at a SQL solution using negation and divide/conquer, but the permutations for resolving the final set were far more than my measly IBM NetPro2 server could handle. It's haunted me for years, since I haven't found a way to apply a multidimensional key to the problem, or found any other SQL solution. :crazy:

    Using Dwain's solution I ran a test against a subset of 27, and it resolved out to sets of 5 in short order, then ran it against sets of 10 which resolved while I was working on another project. Rather than try the divide/conquer method first, I just staged the algorithm with the Candidates (151) where the total population of the set = 167,976. I figured it would take a long time to run, and even with the algorithm, might still not be feasible without some level of divide/conquer. I thought I'd let it run the weekend and report back since there is far more to resolve than the smaller set...

    Sweden Challenge...

    * Total 'old' number of Districts was 66.

    * Total Sweden Population = 167,976.

    * New District population set requirement = between 5,900 and 6,000 each.

    * Each New District had to be created using neighbors of Old Districts.

    * Approx. number of 'new' districts = 11 or 12 = ((167,976 / 5,900) / 2.5 old districts to each new district)

    * Possible Candidates after resolving 'neighbors' for the redistricting = 151 new districts, each containing 1, 2, or 3 'old' districts (based on population requirements).

    If it resolves, it would not only return 1 solution, but ALL solutions. Something the two dev teams that solved the problem couldn't do at that time using linear algorithms, even with the supercomputers they had at their disposal. Even if it doesn't resolve the 151, the divide and conquer method only reduces the largest subset to 107... though even 107 is a statistically significant reduction.

    ~Les

    Les - I'm very glad that I read the posts properly and was able to suggest a reasonable approach to solve your problem. Let's not forget Chris' contribution to chopping the mincemeat even finer!

    Please do let us know how this works on your Sweden redistricting problem. I am infinitely interested in hearing the ways the basic algorithm can be applied.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Les Cardwell (6/22/2012)


    Thanks folks. Brilliant! It made mincemeat out of the Route problem. It worked so well, though, that I'm now tossing it at a much bigger NP problem that we attempted in '98 on a challenge by JC. At that time, WITH and CTEs weren't available, so we weren't able to resolve the final sets.

    Stay tuned...

    Les - One other thing. Would you mind posting your final row counts (input and output), CPU and elapsed times to solution?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • dwain.c (6/22/2012) Did you check to see if this solution is any faster?

    To answer my own question:

                   CPU      Elapsed
    With LEFT      1566.2   2156
    Original       1741     2306.6
    Add Column     1616.2   2251.8

    This was the average over 5 runs using 50 items.
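
    For anyone who wants to reproduce this kind of comparison, the figures above look like SET STATISTICS TIME output averaged by hand; a minimal sketch of that harness (the actual test script isn't shown in the thread):

    -- A sketch only: the posted test script isn't shown in the thread.
    SET STATISTICS TIME ON;

    -- run the candidate query here, e.g. one of the UNIQUEnTuples variants above

    SET STATISTICS TIME OFF;
    -- CPU time and elapsed time (in ms) are reported on the Messages tab;
    -- average them over several runs, as was done above.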


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • >>Les - One other thing. Would you mind posting your final row counts (input and output), CPU and elapsed times to solution?

    Will do. It's still running. If it doesn't finish by Monday a.m., I'll perform a divide/conquer on another server. P != NP, but we'll see how far we can get 😎

    I did something similar for my dissertation using Phil Factor's 'Subscriptions' challenge as a litmus for testing multidimensional keys. It was enlightening. I just haven't found a way to do the same with this challenge. Well, not yet anyways :alien:

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • dwain.c (6/22/2012)


    dwain.c (6/22/2012) Did you check to see if this solution is any faster?

    To answer my own question:

                   CPU      Elapsed
    With LEFT      1566.2   2156
    Original       1741     2306.6
    Add Column     1616.2   2251.8

    This was the average over 5 runs using 50 items.

    You need to increase the value sought to increase the signal-to-noise ratio:

    --2a FROM @t t JOIN UNIQUEnTuples n ON t.Prod < LEFT(n.Tuples,4)

    Table 'Worktable'. Scan count 2, logical reads 27389583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 165283 ms, elapsed time = 208140 ms.

    --2 FROM UNIQUEnTuples n JOIN @t t ON t.Prod < LEFT(n.Tuples,4)

    Table 'Worktable'. Scan count 2, logical reads 27389583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 164628 ms, elapsed time = 207793 ms.

    -- 6 FROM UNIQUEnTuples n JOIN @t t ON t.Prod < n.Prod

    Table 'Worktable'. Scan count 2, logical reads 27399734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 182849 ms, elapsed time = 231285 ms.

    -- 5 FROM UNIQUEnTuples n JOIN @t t ON n.Prod > t.Prod

    Table 'Worktable'. Scan count 2, logical reads 27399734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 183145 ms, elapsed time = 230665 ms.

    -- 4 FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Prod

    Table 'Worktable'. Scan count 2, logical reads 27399734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 184502 ms, elapsed time = 238268 ms.

    -- 3 FROM @t t JOIN UNIQUEnTuples n ON n.Prod > t.Prod

    Table 'Worktable'. Scan count 2, logical reads 27399734, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 186172 ms, elapsed time = 234086 ms.

    -- 1a FROM UNIQUEnTuples n JOIN @t t ON t.Prod < n.Tuples WHERE CHARINDEX(t.Prod, n.Tuples) = 0

    Table 'Worktable'. Scan count 2, logical reads 27389583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 228619 ms, elapsed time = 272246 ms.

    -- 1 FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples WHERE CHARINDEX(t.Prod, n.Tuples) = 0

    Table 'Worktable'. Scan count 2, logical reads 27389583, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#3DB3258D'. Scan count 2, logical reads 9120073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 229166 ms, elapsed time = 273776 ms.

    These results are for a value sought of 20.


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • FWIW, with some interesting results, I did solve the challenge, but ironically, it was with my original SQL on a screaming database server using the divide and conquer method. What a difference 20yrs makes. I want to try the CTE using the same (it's still running against the entire set), and will post all the findings up here once it's decomposed. The feedback for improvements should be interesting.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • For those who want to explore this a bit further, I'm attaching the SQL that JC provided to create the 'Sweden' table, and a 'Map' table to associate all old districts with their neighbors. I'll follow up with the solution, though I may not have time to clean it all up as much as I'd like.

    FWIW, the 'set' solution did complete on my tweaked out server, but the CTE solution never did, which I find interesting. But I'll comment more on it later.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Here's the entire SQL set that solves the challenge using negation. SW1.doc and SW2.doc are separated versions of the previous attachment. I can now process the entire set in under 15 minutes using these scripts on this server. It generates the new districts in three files...

    1. NewDistricts - those districts that are isolated by location/population.

    2. T2 - new districts that are made up of 2 old districts.

    3. T3 - new districts that are made up of 3 old districts.

    I used Population sums from Sweden (gross), and the three new tables as the proof...

    Population Counts

    Sweden                    167,976
    T2           59,928
    D2           71,930
    ND           36,118
    ---------------------------------
    Totals      167,976       167,976
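
    A rough version of that cross-check in SQL might look like the following; the table and column names (Sweden, T2, D2, ND, Population) are taken from the counts above and are otherwise assumptions (elsewhere in the thread the doubles/triples tables are called T2 and T3):

    -- A sketch only: table and column names are assumed from the counts above.
    SELECT  (SELECT SUM(Population) FROM dbo.Sweden) AS SwedenTotal,
            (SELECT SUM(Population) FROM dbo.T2)
          + (SELECT SUM(Population) FROM dbo.D2)
          + (SELECT SUM(Population) FROM dbo.ND)     AS PartsTotal;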

    All possible new districts can be generated by cross joining (T2 X T3) X all NewDistricts. I don't think any of the supercomputer models using linear algorithms was able to do the same.
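
    In SQL terms that enumeration is just a Cartesian product of the three result sets; a sketch only, with a hypothetical Districts column, since the actual scripts are in the attachments:

    -- A sketch only: assumes each table carries a Districts list column;
    -- the real scripts are in the attachments mentioned above.
    SELECT  nd.Districts AS SingleDistricts,
            d2.Districts AS DoubleDistricts,
            d3.Districts AS TripleDistricts
    FROM    dbo.NewDistricts nd
    CROSS JOIN dbo.T2 d2
    CROSS JOIN dbo.T3 d3;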

    I'm going to run some tests with the CTE, but so far, it seems to bottleneck at some level of complexity.

    As always, I might have missed something in my excitement over finally solving this NP problem after 22yrs :hehe:

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Dwain,

    I must be doing something wrong here. I'm using the CTE to attempt the same logic as SW7 to resolve those new Sweden districts with two old districts (doubles), but it doesn't complete, or maybe takes a very long time to complete.

    ;WITH UNIQUEnTuples (n, Tuples, [Population], [check])
    AS (
        SELECT  1,
                CAST(OID AS VARCHAR(MAX)),
                [Population],
                CAST([Population] AS VARCHAR(MAX))
        FROM dbo.Remaining
        UNION ALL
        SELECT  1 + n.n,
                t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)),
                n.[Population] + t.[Population],
                [check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))
        FROM dbo.Remaining t
        JOIN UNIQUEnTuples n ON t.OID < n.Tuples
        WHERE CHARINDEX(t.OID, n.Tuples) = 0
          AND n <= 12                                      --24 doubles / 2 per new district
          AND t.[Population] + n.[Population] <= 71930     --population of doubles (D1)
          AND t.District3 = 0                              --only select doubles
    )
    SELECT *
    FROM UNIQUEnTuples
    WHERE [Population] = 71930                             --population of doubles (D1)
    ORDER BY n, Tuples

    --SELECT SUM(population) FROM dbo.D1 AS d = 71930

    Does anything jump out?

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Jeff,

    jeffem (6/19/2012)


    In the vein of being pedantic (because that's the only way I can contribute on this one 😉), probabilistically speaking, combinations are distinct (unordered), while permutations count the same selection repeated in all possible orders.

    So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200

    That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.

    So, given 66 original districts (Sweden) that resolve into 151 candidates (population count between 5,900 and 6,000), resulting in 28 new districts (1 single, 16 doubles, and 11 triples), what would be the permutation count for all possible orders if calculated without divide and conquer? From the outset, one could estimate the number of new districts to be ~28 by dividing the Sweden population by the desired population of the new districts:

    167,976 / 5,950 = 28.23

    So would it be...

    66!/(66-28)! = 1.0407674943659650459224651367354e+48

    (???)

    All possible combinations resulted in...

    1 - unique set (NewDistricts)

    58 - possible combinations of doubles

    162 - possible combinations of triples

    ...162x58x1 ...for a total possible number of correct result sets = 9,396.

    AAR, just curious.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD
