tuning query using max() over(partition by) clause

  • Hi ,

    I have a procedure that loads about 60 million rows using code similar to below. I just wrote up this sample code and loaded with about 7 million rows to test the execution plan. It seems like it uses a sort operation to return a few max() over(partition by) different columns. what would be the best way to rewrite the query or process to get better performance. The code used to create the sample tables and load them is commented.

    Please let me know any suggestions . Thank you

    --DROP TABLE TestExecPlans

    --CREATE TABLE TestExecPlans

    --(Usr INT NOT NULL IDENTITY(1,1) PRIMARY KEY,

    --val1 INT NOT NULL,

    --val2 INT NOT NULL,

    --val3 INT NOT NULL,

    --pal1 INT NOT NULL,

    --pal2 INT NOT NULL,

    --pal3 INT NOT NULL,

    --)

    --CREATE CLUSTERED INDEX IX_TestEx_Usr on TestExecPlans(Usr)

    --DROP INDEX IX_TestEx_Usr on TestExecPlans

    --DROP TABLE DestTable

    --CREATE TABLE pparpati.DestTable

    --(Usr INT NOT NULL IDENTITY(1,1) PRIMARY KEY,

    --val1 INT NULL,

    --val2 INT NULL,

    --val3 INT NULL,

    --pal1 INT NULL,

    --pal2 INT NULL,

    --pal3 INT NULL,

    --)

    --CREATE NONCLUSTERED INDEX IX_val1 ON Desttable(val1);

    --INSERT INTO TestExecPlans

    -- ( val1, val2, val3, pal1, pal2, pal3 )

    --VALUES

    --(1,2,5,1,2,5),

    --(3,3,3,3,3,3),

    --(0,1,0,0,1,0)

    --INSERT INTO TestExecPlans

    --SELECT val1, val2, val3, pal1, pal2, pal3 FROM TestExecPlans

    --SELECT count(*) FROM TestExecPlans

    ;WITH CTE1 AS

    (

    SELECT Usr,val1,val2+1 AS val2, val3, pal1 FROM TestExecPlans WHERE val1=1

    ),

    CTE2 AS

    (

    SELECT Usr,val1,val2+1 AS val2, val3, pal1 FROM TestExecPlans WHERE val1=3

    ),

    CTE3 AS

    (

    SELECT Usr,val1,val2+1 AS val2, val3, pal1 FROM TestExecPlans WHERE val1=0

    )

    INSERT INTO DestTable

    SELECT c1.val1, c2.val2, c3.val3,

    (SELECT MIN(mv)FROM (VALUES (c1.val1), (c2.val2), (c3.val3)) AS t(mv)) AS MinSimilarity,

    (SELECT MAX(mv)FROM (VALUES (c1.val1), (c2.val2), (c3.val3)) AS t(mv)) AS MaxSimilarity,

    MAX(c1.val1) OVER (PARTITION BY ISNULL(c1.Usr, ISNULL(c2.Usr,c3.Usr))) AS maxval

    --(SELECT MIN(mv)FROM (VALUES (pal1), (pal2), (pal3)) AS t(mv)) AS MinSimilarity,

    --(SELECT MAX(mv)FROM (VALUES (pal1), (pal2), (pal3)) AS t(mv)) AS MaxSimilarity

    FROM CTE1 c1 FULL OUTER JOIN CTE2 c2 ON c1.Usr=c2.Usr FULL OUTER JOIN CTE3 c3 ON ISNULL(c1.usr,c2.usr)=c3.Usr

  • anyone can think of anything for this one. I have n't been able to figure it. Thanks

  • I am looking at this. Is it possible for you to post the actual execution plan that is created when you run this query?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • This can still be optimized (I'm out of time here). But this should be an enormous improvement.

    WITH CTE AS

    (

    SELECT Usr,

    CASE WHEN val1=1 THEN 1 END AS val1,

    CASE WHEN val1=3 THEN val2+1 END AS val2,

    CASE WHEN val1=0 THEN val3 END AS val3

    FROM TestExecPlans

    )

    INSERT INTO DestTable

    SELECT *,

    (SELECT MIN(mv)

    FROM (VALUES (val1), (val2), (val3)) AS t(mv)) AS MinSimilarity,

    (SELECT MAX(mv)

    FROM (VALUES (val1), (val2), (val3)) AS t(mv)) AS MaxSimilarity,

    MAX(val1) OVER (PARTITION BY Usr) AS maxval

    FROM CTE;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • venus.pvr (1/31/2014)


    anyone can think of anything for this one. I have n't been able to figure it. Thanks

    Your query almost certainly isn't doing what you are expecting. The column Usr is unique. There's no point in looking for three rows for the same usr and different values of val1 and full-outer-joining them because there will always be only one, even in 60 million rows. Your query is equivalent to this:

    SELECT

    val1 = CASE WHEN val1=1 THEN val1 END,

    val2 = CASE WHEN val1=3 THEN val2+1 END,

    val3 = CASE WHEN val1=0 THEN val3 END,

    MinSimilarity = CASE WHEN val1=1 THEN val1 WHEN val1=3 THEN val2+1 WHEN val1=0 THEN val3 END,

    MaxSimilarity = CASE WHEN val1=1 THEN val1 WHEN val1=3 THEN val2+1 WHEN val1=0 THEN val3 END,

    maxval = CASE WHEN val1=1 THEN val1 END

    FROM TestExecPlans

    - which is a heck of a lot cheaper, but still pointless.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Hi Alan/Chris,

    Sorry, I did not post my original query since I didn't want to post the orig code. But all those parts are needed for my case. I wrote something similar. But I think what I wrote doesn't make sense at all. So it will be hard to interpret all this..

    I think I will try to write up some code which makes sense and also post the execution plan.

    thank you very much for your time.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply