Aggregate Query question

  • Les Cardwell (7/11/2012)


    Ha! I'm glad I could send you on such an adventure 🙂

    I'll run both for time comparisons this weekend when there's not much activity on this server (this has been one of those server nightmares where the vendor's specs we're not even close to what it took to bring things under control).

    On the plus side, I learned a great deal about CTE's.

    Left wanting is the elusive MK solution, but JC's bin packing methodology opened a door. More fodder for meditation in waiting rooms.

    What is the "elusive MK solution?" Is this another potential challenge for me? 🙂 I too am interested in the bin packing problems (URL provided by someone earlier in this thread). Been looking and thinking - sounds interesting.

    FYI. I have improved the speed of the step that populates NewDistricts TABLE by about 60%. That is actually one of the more expensive steps in the solution.

    I also created SQL to perform two checks:

    1. That each solution contains no duplicated NewDistricts

    2. That there are no duplicated solutions in the final results set

    Both checks passed in the results set I posted last time. Unfortunately, I did not need to utilize my fancy dynamic SQL SP to accomplish either of these. Oh well!

    I can provide these if you are interested.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I am...might as well have it all before I run it up 🙂

    I'll send you a paper on MK models via email.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • First check: No duplicated tuples in any solutions:

    ;WITH BaseDistricts AS (

    -- Stuff the Tuple for the Special Districts into a single string and calculate

    -- the population for that combination.

    SELECT Tuple, b.[population]

    FROM (

    SELECT Tuple=STUFF((

    SELECT ';' + Tuple

    FROM dbo.NewDistricts

    WHERE SpecialDistrict = 1

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '')) a

    CROSS APPLY (

    SELECT [Population]=SUM([Population])

    FROM dbo.NewDistricts

    WHERE SpecialDistrict = 1) b

    ),

    SolutionSet AS (

    SELECT SolutionID=ROW_NUMBER() OVER (ORDER BY Solution)

    ,Solution

    ,[Population]=a.[Population] + b.[Population] + c.[Population]

    FROM BaseDistricts a

    -- Now apply the Triplets

    CROSS APPLY (SELECT Tuples, [Population] FROM dbo.OptimalTuples WHERE n=10) b

    -- Now apply the Doublets

    CROSS APPLY (SELECT Tuples, [Population] FROM dbo.OptimalTuples WHERE n=12) c

    -- Create the combined Tuple

    CROSS APPLY (SELECT Solution=a.Tuple + ';' + b.Tuples + ';' + c.Tuples) d

    )

    SELECT SolutionID, Item

    FROM (

    SELECT SolutionID, Item

    ,rn=ROW_NUMBER() OVER (PARTITION BY SolutionID, Item ORDER BY (SELECT NULL))

    FROM SolutionSet

    CROSS APPLY DelimitedSplit8K(Solution, ';')) a

    WHERE rn<>1

    Second check: No Duplicate Solutions

    ;WITH BaseDistricts AS (

    -- Stuff the Tuple for the Special Districts into a single string and calculate

    -- the population for that combination.

    SELECT Tuple, b.[population]

    FROM (

    SELECT Tuple=STUFF((

    SELECT ';' + Tuple

    FROM dbo.NewDistricts

    WHERE SpecialDistrict = 1

    FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 1, '')) a

    CROSS APPLY (

    SELECT [Population]=SUM([Population])

    FROM dbo.NewDistricts

    WHERE SpecialDistrict = 1) b

    ),

    SolutionSet AS (

    SELECT SolutionID=ROW_NUMBER() OVER (ORDER BY Solution)

    ,Solution

    ,[Population]=a.[Population] + b.[Population] + c.[Population]

    FROM BaseDistricts a

    -- Now apply the Triplets

    CROSS APPLY (SELECT Tuples, [Population] FROM dbo.OptimalTuples WHERE n=10) b

    -- Now apply the Doublets

    CROSS APPLY (SELECT Tuples, [Population] FROM dbo.OptimalTuples WHERE n=12) c

    -- Create the combined Tuple

    CROSS APPLY (SELECT Solution=a.Tuple + ';' + b.Tuples + ';' + c.Tuples) d

    )

    SELECT SolutionID, Solution

    FROM (

    SELECT SolutionID, Solution

    ,rn=ROW_NUMBER() OVER (PARTITION BY Solution ORDER BY (SELECT NULL))

    FROM SolutionSet) a

    WHERE rn<>1

    Both checks return 0 rows if the solution is a valid optimal solution.

    I've had a couple of additional ideas with respect to the NewDistricts creation process and to the 2 SPs that I want to check out this weekend and I'll pass those along with my findings afterwards.

    This is going to be a busy weekend.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This script should create the NewDistricts TABLE in under 5 seconds on a machine with the same spec as the one I ran my prior timing tests on (bringing the overall solution to well under 5 minutes run time).

    SET STATISTICS TIME ON

    ;WITH NewDistricts AS (

    SELECT n, d1, d2, d3

    ,s1='[' + RIGHT('00' + CAST(d1 AS VARCHAR), 2) + ']'

    ,s2='[' + RIGHT('00' + CAST(d2 AS VARCHAR), 2) + ']'

    ,s3='[' + RIGHT('00' + CAST(d3 AS VARCHAR), 2) + ']'

    ,Tuple='[' + RIGHT('00' + CAST(d1 AS VARCHAR), 2) + ',' +

    RIGHT('00' + CAST(d2 AS VARCHAR), 2) + ',' + RIGHT('00' + CAST(d3 AS VARCHAR), 2) + ']'

    ,[population]

    FROM (

    -- Create the triplets

    SELECT n=3, e.d1, e.d2, e.d3

    ,[population]=b.[population] + c.[population] + d.[population]

    ,rn=ROW_NUMBER() OVER (PARTITION BY e.d1, e.d2, e.d3 ORDER BY (SELECT NULL))

    FROM (

    SELECT c.d1, c.d2, c.d3

    FROM dbo.Map a

    INNER JOIN dbo.Map b

    ON a.district = b.district OR

    a.district = b.neighbor OR

    a.neighbor = b.district OR

    a.neighbor = b.neighbor

    CROSS APPLY (

    VALUES (b.district, a.district, a.neighbor)

    ,(b.neighbor, a.district, a.neighbor)

    ) c (d1, d2, d3)

    GROUP BY c.d1, c.d2, c.d3) a

    CROSS APPLY (

    SELECT d1=CASE WHEN d1 > d3 THEN d2 WHEN d1 > d2 THEN d2 ELSE d1 END

    ,d2=CASE WHEN d1 > d3 THEN d3 WHEN d1 > d2 THEN d1 ELSE d2 END

    ,d3=CASE WHEN d1 > d3 THEN d1 WHEN d1 > d2 THEN d3 ELSE d3 END) e

    LEFT JOIN dbo.Sweden b ON a.d1 = b.district

    LEFT JOIN dbo.Sweden c ON a.d2 = c.district

    LEFT JOIN dbo.Sweden d ON a.d3 = d.district

    WHERE b.[population] + c.[population] + d.[population] BETWEEN 5900 AND 6100 AND

    e.d1 <> e.d2 AND e.d2 <> e.d3 AND e.d1 <> e.d3) x

    WHERE rn=1

    UNION ALL

    -- Create the doublets

    SELECT 2, a.district, neighbor, 0

    ,'[' + RIGHT('00' + CAST(a.district AS VARCHAR), 2) + ']'

    ,'[' + RIGHT('00' + CAST(neighbor AS VARCHAR), 2) + ']'

    ,NULL

    ,Tuple='[' + RIGHT('00' + CAST(a.district AS VARCHAR), 2) + ',' +

    RIGHT('00' + CAST(neighbor AS VARCHAR), 2) + ']'

    ,b.[population] + c.[population]

    FROM dbo.Map a

    LEFT JOIN dbo.Sweden b ON a.district = b.district

    LEFT JOIN dbo.Sweden c ON a.neighbor = c.district

    WHERE b.[population] + c.[population] BETWEEN 5900 AND 6100

    UNION ALL

    -- Add the singlet

    SELECT 1, district, 0, 0

    ,'[' + RIGHT('00' + CAST(district AS VARCHAR), 2) + ']'

    ,NULL

    ,NULL

    ,Tuple='[' + RIGHT('00' + CAST(district AS VARCHAR), 2) + ']'

    ,[population]

    FROM dbo.Sweden

    WHERE [population] BETWEEN 5900 AND 6100

    ),

    SpecialDistricts AS (

    SELECT d

    FROM NewDistricts

    CROSS APPLY (VALUES (d1), (d2), (d3)) a(d)

    GROUP BY d

    HAVING COUNT(d) = 1

    )

    INSERT INTO dbo.NewDistricts

    SELECT n, d1, d2, d3, s1, s2, s3, Tuple, [Population]

    ,SpecialDistrict=CASE a+b+c WHEN 0 THEN 0 ELSE 1 END

    FROM NewDistricts

    CROSS APPLY (

    SELECT a=CASE WHEN d1 IN (SELECT d FROM SpecialDistricts) THEN 1 ELSE 0 END

    ,b=CASE WHEN d2 IN (SELECT d FROM SpecialDistricts) THEN 1 ELSE 0 END

    ,c=CASE WHEN d3 IN (SELECT d FROM SpecialDistricts) THEN 1 ELSE 0 END) a

    ORDER BY CASE a+b+c WHEN 0 THEN 0 ELSE 1 END DESC, n, d1, d2, d3

    SET STATISTICS TIME OFF

    Note that this, and the SPs I provided all require SQL 2008, mainly because of the CROSS APPLY VALUES I use to UNPIVOT.

    It would be nice if you could verify what I've done here to make sure that the solutions I'm generating are valid and that I didn't miss something.

    The other ideas I mentioned are experimental in nature but I have a feeling they'll improve it further.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Well Les, I have good news. Apparently my latest ideas panned out and I've gotten the overall solution run time down to 1.5 minutes.

    Looking forward to how this stacks up against the past.

    That's probably about as far as I can take it. Can't imagine getting any additional new ideas to improve it further.

    The write up and all code is attached.

    Good luck! I'll check back after you've had a go with it.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 5 posts - 106 through 109 (of 109 total)

You must be logged in to reply to this topic. Login to reply