tuning query using max() over(partition by) clause

Question

Post reply

tuning query using max() over(partition by) clause

venus.pvr

SSCarpal Tunnel

Points: 4813
More actions
January 27, 2014 at 2:16 pm

#287739

Hi ,
I have a procedure that loads about 60 million rows using code similar to below. I just wrote up this sample code and loaded with about 7 million rows to test the execution plan. It seems like it uses a sort operation to return a few max() over(partition by) different columns. what would be the best way to rewrite the query or process to get better performance. The code used to create the sample tables and load them is commented.
Please let me know any suggestions . Thank you
--DROP TABLE TestExecPlans
--CREATE TABLE TestExecPlans
--(Usr INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
--val1 INT NOT NULL,
--val2 INT NOT NULL,
--val3 INT NOT NULL,
--pal1 INT NOT NULL,
--pal2 INT NOT NULL,
--pal3 INT NOT NULL,
--)
--CREATE CLUSTERED INDEX IX_TestEx_Usr on TestExecPlans(Usr)
--DROP INDEX IX_TestEx_Usr on TestExecPlans
--DROP TABLE DestTable
--CREATE TABLE pparpati.DestTable
--(Usr INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
--val1 INT NULL,
--val2 INT NULL,
--val3 INT NULL,
--pal1 INT NULL,
--pal2 INT NULL,
--pal3 INT NULL,
--)
--CREATE NONCLUSTERED INDEX IX_val1 ON Desttable(val1);
--INSERT INTO TestExecPlans
-- ( val1, val2, val3, pal1, pal2, pal3 )
--VALUES
--(1,2,5,1,2,5),
--(3,3,3,3,3,3),
--(0,1,0,0,1,0)
--INSERT INTO TestExecPlans
--SELECT val1, val2, val3, pal1, pal2, pal3 FROM TestExecPlans
--SELECT count(*) FROM TestExecPlans
;WITH CTE1 AS
(
SELECT Usr,val1,val2+1 AS val2, val3, pal1 FROM TestExecPlans WHERE val1=1
),
CTE2 AS
(
SELECT Usr,val1,val2+1 AS val2, val3, pal1 FROM TestExecPlans WHERE val1=3
),
CTE3 AS
(
SELECT Usr,val1,val2+1 AS val2, val3, pal1 FROM TestExecPlans WHERE val1=0
)
INSERT INTO DestTable
SELECT c1.val1, c2.val2, c3.val3,
(SELECT MIN(mv)FROM (VALUES (c1.val1), (c2.val2), (c3.val3)) AS t(mv)) AS MinSimilarity,
(SELECT MAX(mv)FROM (VALUES (c1.val1), (c2.val2), (c3.val3)) AS t(mv)) AS MaxSimilarity,
MAX(c1.val1) OVER (PARTITION BY ISNULL(c1.Usr, ISNULL(c2.Usr,c3.Usr))) AS maxval
--(SELECT MIN(mv)FROM (VALUES (pal1), (pal2), (pal3)) AS t(mv)) AS MinSimilarity,
--(SELECT MAX(mv)FROM (VALUES (pal1), (pal2), (pal3)) AS t(mv)) AS MaxSimilarity
FROM CTE1 c1 FULL OUTER JOIN CTE2 c2 ON c1.Usr=c2.Usr FULL OUTER JOIN CTE3 c3 ON ISNULL(c1.usr,c2.usr)=c3.Usr

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

venus.pvr SSCarpal Tunnel Points: 4813 More actions · Answer 1

anyone can think of anything for this one. I have n't been able to figure it. Thanks

Alan Burstein SSC Guru Points: 61141 More actions · Answer 2

I am looking at this. Is it possible for you to post the actual execution plan that is created when you run this query?

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61141 More actions · Answer 3

This can still be optimized (I'm out of time here). But this should be an enormous improvement.

WITH CTE AS

(

SELECT Usr,

CASE WHEN val1=1 THEN 1 END AS val1,

CASE WHEN val1=3 THEN val2+1 END AS val2,

CASE WHEN val1=0 THEN val3 END AS val3

FROM TestExecPlans

)

INSERT INTO DestTable

SELECT *,

(SELECT MIN(mv)

FROM (VALUES (val1), (val2), (val3)) AS t(mv)) AS MinSimilarity,

(SELECT MAX(mv)

FROM (VALUES (val1), (val2), (val3)) AS t(mv)) AS MaxSimilarity,

MAX(val1) OVER (PARTITION BY Usr) AS maxval

FROM CTE;

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 4

venus.pvr (1/31/2014)
anyone can think of anything for this one. I have n't been able to figure it. Thanks

Your query almost certainly isn't doing what you are expecting. The column Usr is unique. There's no point in looking for three rows for the same usr and different values of val1 and full-outer-joining them because there will always be only one, even in 60 million rows. Your query is equivalent to this:

SELECT

val1 = CASE WHEN val1=1 THEN val1 END,

val2 = CASE WHEN val1=3 THEN val2+1 END,

val3 = CASE WHEN val1=0 THEN val3 END,

MinSimilarity = CASE WHEN val1=1 THEN val1 WHEN val1=3 THEN val2+1 WHEN val1=0 THEN val3 END,

MaxSimilarity = CASE WHEN val1=1 THEN val1 WHEN val1=3 THEN val2+1 WHEN val1=0 THEN val3 END,

maxval = CASE WHEN val1=1 THEN val1 END

FROM TestExecPlans

- which is a heck of a lot cheaper, but still pointless.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

venus.pvr SSCarpal Tunnel Points: 4813 More actions · Answer 5

Hi Alan/Chris,

Sorry, I did not post my original query since I didn't want to post the orig code. But all those parts are needed for my case. I wrote something similar. But I think what I wrote doesn't make sense at all. So it will be hard to interpret all this..

I think I will try to write up some code which makes sense and also post the execution plan.

thank you very much for your time.

tuning query using max() over(partition by) clause

Cookies on SQLServerCentral