September 21, 2011 at 7:23 am
Won't matter if it returns too much data. Assuming that what you posted is the full list of columns, a table scan is actually a very good plan.
September 21, 2011 at 7:31 am
What I have posted is a simplification of the real case. I can't post the real stuff as it is owned by the company. But the real case isn't returning many columns at all, just a few more. The point of it all is that when I added the CASE/WHEN statements, the execution time went up quite a bit. I wondered if I was completely off with my method.
September 21, 2011 at 7:36 am
It adds some overhead, but nothing majorly noticeable.
I use this to death in my reports and there's no real difference (once I can get a seek and/or at least a covering index).
Can you post the actual execution plan? Maybe we're missing something really obvious that's killing that query.
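If it helps, the quickest way to grab it in SSMS is Query > Include Actual Execution Plan (Ctrl+M), then right-click the plan and save it as a .sqlplan file. One way to capture the same thing from T-SQL (just a convenience sketch, not required):
-- Returns the actual XML plan alongside the results
SET STATISTICS XML ON;
-- ...run the query in question here...
SET STATISTICS XML OFF;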
September 21, 2011 at 8:07 am
September 21, 2011 at 8:38 am
Rokh (9/21/2011)
http://www.sqlservercentral.com/Forums/Attachment9832.aspx
--Before changing the query, there was this missing index in the plan
--I'm not 100% sure it would be the best after the change,
--I would try both versions: #1 as is, and #2 with Sold_To_Country moved out of the INCLUDE and into the key after ToCity (a sketch of #2 follows the query below)
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Sales_City] ([ToCity])
INCLUDE ([Sold_To_Country],[SalesAmount])
*/
SELECT
b.Sold_To_Country
, SUM(CASE WHEN b.ToCity = 'New York' THEN SalesAmount
ELSE 0
END) Sales_New_York
, SUM(CASE WHEN b.ToCity = 'Detroit' THEN SalesAmount
ELSE 0
END) Sales_Detroit
, SUM(CASE WHEN b.ToCity = 'Los Angeles' THEN SalesAmount
ELSE 0
END) Sales_Los_Angeles
, SUM(CASE WHEN b.ToCity = 'Paris' THEN SalesAmount
ELSE 0
END) Sales_Paris
, SUM(CASE WHEN b.ToCity = 'Lyon' THEN SalesAmount
ELSE 0
END) Sales_Lyon
, SUM(CASE WHEN b.ToCity = 'Bonn' THEN SalesAmount
ELSE 0
END) Sales_Bonn
, SUM(CASE WHEN b.ToCity = 'Hamburg' THEN SalesAmount
ELSE 0
END) Sales_Hamburg
, SUM(CASE WHEN b.ToCity = 'Frankfurt' THEN SalesAmount
ELSE 0
END) Sales_Frankfurt
FROM
--I don't think you need this join, that would whack 60% of the query cost
-- dbo.CountriesInvolved a INNER JOIN
dbo.Sales_City b
-- ON a.Sold_To_Country = b.Sold_To_Country
WHERE
ToCity IN ( 'New York' , 'Detroit' , 'Los Angeles' , 'Paris' , 'Lyon' ,
'Bonn' , 'Hamburg' , 'Frankfurt' )
GROUP BY
b.Sold_To_Country
--You don't need the GROUP BY on ToCity
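For reference, the '#2' index variant mentioned in the comment above would be something along these lines (the index name is just a placeholder, and like #1 it is left commented out):
/*
CREATE NONCLUSTERED INDEX IX_Sales_City_ToCity_Country
ON [dbo].[Sales_City] ([ToCity],[Sold_To_Country])
INCLUDE ([SalesAmount])
*/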
September 22, 2011 at 7:24 am
{Edit} Post withdrawn... still testing and posted too early.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2011 at 10:40 am
Ok, keeping the thread alive :).
It's not as easy as it looks, I think. My theory for now is that I just need to accept the performance. I hope you'll come up with something better.
September 22, 2011 at 10:47 am
Rokh (9/22/2011)
Ok, keeping the thread alive :). It's not as easy as it looks, I think. My theory for now is that I just need to accept the performance. I hope you'll come up with something better.
Is it possible & have you tried removing the join from the query?
Combined with Jeff's excellent pre-aggregate idea this should fly WAY faster than it is now.
September 22, 2011 at 12:42 pm
No I didn't.
As stated before, the inner join is not to be removed. It represents a real-life situation. I've only trimmed it down for this forum. Call it respect 🙂
But the inner join should stay.
Of course, without the inner join things will speed up. However, the big picture is that I noticed a serious performance decrease using the CASE/WHEN statements. As I'm not an expert, I can imagine you guys have a better alternative. If not, then I'm still grateful for your persistence and help.
Gr,
~R
September 22, 2011 at 2:02 pm
Rokh (9/22/2011)
No I didn't. As stated before, the inner join is not to be removed. It represents a real-life situation. I've only trimmed it down for this forum. Call it respect 🙂
But the inner join should stay.
Of course, without the inner join things will speed up. However, the big picture is that I noticed a serious performance decrease using the CASE/WHEN statements. As I'm not an expert, I can imagine you guys have a better alternative. If not, then I'm still grateful for your persistence and help.
Gr,
~R
Never had an issue with using CASE WHEN to pivot data, so I don't have a better option aside from doing it client-side.
September 22, 2011 at 2:40 pm
Not a biggy. Thanks for the help. I think Jeff is still researching. So let's not close this case yet.
September 23, 2011 at 7:39 pm
I apologize for the delay but yes, indeed, I am doing some testing in my "spare" time. This has become quite interesting.
Conventional wisdom would indicate that a single "SUM" column using conventional pre-aggregation methods should be faster than the multi-column set of "SUMs" found both in the "Basic Cross Tab" and in the "Original" method posted by Rokh. So much for conventional wisdom. 😉 There are some surprising results in the testing I've done, but they also support what Rokh has already stated... his original method is faster than all the other methods, including my (now poorly) conceived notion of what pre-aggregation is. The bottom line is that it appears Rokh has actually discovered a better, faster, and less resource-intensive method of "pre-aggregation" to be used prior to "pivoting".
I'm not finished with my analysis of the execution plan (I'm using SQL Server 2005 on an older single-CPU 32-bit machine for this testing and need to do similar tests on other, more modern boxes), but I thought I'd share my test code (especially the data creation code) in case someone else wants to do some testing. After all, one simple test is worth a thousand expert opinions, even if one of them is mine. 😛
I've included some code to create a million rows of test data below. My indexing recommendations are also included in that code. Make sure that you don't run the code anywhere except in TempDB because it drops "real" tables. I used real tables because I wanted to see what DTA would do (which wasn't much). Of course, there should be a PK on the Sales_City table but none of the known columns were worthy.
--=============================================================================================================
-- Create the test environment. Nothing in this section is a part of the solution.
--=============================================================================================================
--===== Identify a nice, safe place for these tests that everyone has.
USE tempdb
;
--===== Conditionally drop the tables to make reruns easier in SSMS.
IF OBJECT_ID('tempdb..#CountryCity','U') IS NOT NULL DROP TABLE #CountryCity;
IF OBJECT_ID('dbo.CountriesInvolved','U') IS NOT NULL DROP TABLE dbo.CountriesInvolved;
IF OBJECT_ID('dbo.Sales_City','U') IS NOT NULL DROP TABLE dbo.Sales_City;
--===== Create the given tables
CREATE TABLE dbo.CountriesInvolved
(
Sold_To_Country NVARCHAR(100) NOT NULL
)
;
CREATE TABLE dbo.Sales_City
(
Sold_To_Country NVARCHAR(100) NULL,
ToCity NVARCHAR(100) NULL,
SalesAmount FLOAT NULL
)
;
--===== Build and populate a table to hold countries and cities for the random data generator
SELECT RowNum = CAST(RowNum AS INT),
CountryName = CAST(CountryName AS NVARCHAR(100)),
CityName = CAST(CityName AS NVARCHAR(100))
INTO #CountryCity
FROM (
SELECT 1,'United States of America','New York' UNION ALL
SELECT 2,'United States of America','Detroit' UNION ALL
SELECT 3,'United States of America','Los Angeles' UNION ALL
SELECT 4,'France','Paris' UNION ALL
SELECT 5,'France','Lyon' UNION ALL
SELECT 6,'Germany','Bonn' UNION ALL
SELECT 7,'Germany','Hamburg' UNION ALL
SELECT 8,'Germany','Frankfurt' UNION ALL
SELECT 9,'SomeCountry1','SomeCity1' UNION ALL
SELECT 10,'SomeCountry1','SomeCity2' UNION ALL
SELECT 11,'SomeCountry2','SomeCity1' UNION ALL
SELECT 12,'SomeCountry2','SomeCity2'
) d (RowNum, CountryName, CityName)
;
--===== Add a Clustered Index for speed
CREATE CLUSTERED INDEX ix_#CountryCity_RowNum
ON #CountryCity (RowNum) WITH FILLFACTOR = 100
;
--===== Populate and index the given tables
INSERT INTO dbo.CountriesInvolved
(Sold_To_Country)
SELECT DISTINCT
Sold_To_Country = CountryName
FROM #CountryCity
WHERE CountryName NOT LIKE 'SomeCountry%'
;
ALTER TABLE dbo.CountriesInvolved
ADD CONSTRAINT PK_CountriesInvolved
PRIMARY KEY CLUSTERED (Sold_To_Country) WITH FILLFACTOR = 100
;
WITH
cteDataGenerator AS
(
SELECT TOP 1000000
RowNum = ABS(CHECKSUM(NEWID()))%12+1,
SalesAmount = RAND(CHECKSUM(NEWID()))*99+1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
)
INSERT INTO dbo.Sales_City
(Sold_To_Country, ToCity, SalesAmount)
SELECT Sold_To_Country = cc.CountryName,
ToCity = cc.CityName,
gen.SalesAmount
FROM cteDataGenerator gen
LEFT JOIN #CountryCity cc
ON gen.RowNum = cc.RowNum
;
CREATE NONCLUSTERED INDEX ix_Sales_City_Sold_To_Country_Cover01
ON dbo.Sales_City (Sold_To_Country, ToCity) INCLUDE (SalesAmount)
;
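As a side note (not part of the original post), a quick sanity check along these lines can confirm that the generator spread the million rows across all twelve country/city combinations:
-- Hypothetical sanity check: row counts per country/city after the data load
SELECT Sold_To_Country, ToCity, RowsLoaded = COUNT(*)
FROM dbo.Sales_City
GROUP BY Sold_To_Country, ToCity
ORDER BY Sold_To_Country, ToCity
;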
I ran 4 different pieces of code with SQL Profiler on completed batches. The description of each is in the comments in the code. Here's that code...
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== The Original Code ===============================================================
select
Sold_To_Country,
sum(Sales_New_York) Sales_New_York,
sum(Sales_Detroit) Sales_Detroit,
sum(Sales_Los_Angeles) Sales_Los_Angeles,
sum(Sales_Paris) Sales_Paris,
sum(Sales_Lyon) Sales_Lyon,
sum(Sales_Bonn) Sales_Bonn,
sum(Sales_Hamburg) Sales_Hamburg,
sum(Sales_Frankfurt) Sales_Frankfurt
from
(
select
b.Sold_To_Country,
(case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
group by b.Sold_To_Country, ToCity
) c
group by Sold_To_Country
GO
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== Basic Cross Tab =================================================================
SELECT
b.Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM dbo.CountriesInvolved a INNER JOIN dbo.Sales_City b
ON a.Sold_To_Country = b.Sold_To_Country
GROUP BY b.Sold_To_Country
;
GO
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== Pre-Aggregated Cross Tab using CTE ==============================================
WITH
ctePreAgg AS
(
SELECT sc.Sold_To_Country,
sc.ToCity,
SalesAmount = SUM(sc.SalesAmount)
FROM dbo.Sales_City sc
RIGHT JOIN dbo.CountriesInvolved ci
ON sc.Sold_To_Country = ci.Sold_To_Country
WHERE ci.Sold_To_Country > ''
GROUP BY sc.Sold_To_Country, sc.ToCity
)
SELECT Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM ctePreAgg
GROUP BY Sold_To_Country
;
GO
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== REAL pre-aggregated Cross Tab using Temp Table ==================================
IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL DROP TABLE #MyHead
SELECT sc.Sold_To_Country,
sc.ToCity,
SalesAmount = SUM(sc.SalesAmount)
INTO #MyHead
FROM dbo.Sales_City sc
RIGHT JOIN dbo.CountriesInvolved ci
ON sc.Sold_To_Country = ci.Sold_To_Country
GROUP BY sc.Sold_To_Country, sc.ToCity
SELECT Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM #MyHead
GROUP BY Sold_To_Country
;
GO
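As a side note (not part of the original test harness), if Profiler isn't available, wrapping each of the four batches like this gives comparable duration and I/O numbers:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
-- ...run one of the four test batches here...
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;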
... and here are the run results (two sets of output from two runs)...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2011 at 1:21 pm
Reply noted. I'll study it as soon as I've settled down a bit. Thanks in advance for your effort.
September 25, 2011 at 7:55 am
First off... I agree with Jeff's test results, having run them on both 2005 Std and 2008 R2 Dev.
Personally, I really don't like hardcoding pivot columns and, more often than not, I will be pulling these types of results from SSAS.
However, having some 'spare time' and being keen to learn, I have been playing around with dynamic pivot queries and indexed views... I am quite pleased with the results. 🙂
Using an indexed view to pre-aggregate the data seems to work well.
Also, I found that another indexed view significantly reduced a bottleneck in the creation of the required dynamic 'IN' clause... I was testing on 20M rows with 30 countries, each with 100 cities.
So, for your kind review and critique... some code follows.
(Please use Jeff's original setup from his earlier post.)
USE tempdb
;
--===== Conditionally drop the views
IF Object_id('vw_PreAg', 'V') IS NOT NULL
DROP VIEW vw_PreAg;
IF Object_id('vw_ListCols', 'V') IS NOT NULL
DROP VIEW vw_ListCols;
GO
--===== Create an indexed view that will speed up the creation of the dynamic 'IN' clause (@listCol).
--===== This was found to be very beneficial in testing on 20M rows with 30 countries, each with 100 cities.
CREATE VIEW [dbo].[VW_ListCols]
WITH SCHEMABINDING
AS
SELECT dbo.Sales_City.ToCity AS SC,
Count_big(*) AS Cnt
FROM dbo.CountriesInvolved
INNER JOIN dbo.Sales_City
ON dbo.CountriesInvolved.Sold_To_Country = dbo.Sales_City.Sold_To_Country
GROUP BY dbo.Sales_City.ToCity
GO
--===== Create a unique clustered index on vw_ListCols
SET ARITHABORT ON
GO
SET CONCAT_NULL_YIELDS_NULL ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
SET ANSI_PADDING ON
GO
SET ANSI_WARNINGS ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
CREATE UNIQUE CLUSTERED INDEX [CI_IX_ListCols] ON [dbo].[VW_ListCols] ( [SC] ASC )
--WITH (
-- PAD_INDEX = OFF,
-- STATISTICS_NORECOMPUTE = OFF,
-- SORT_IN_TEMPDB = OFF,
-- IGNORE_DUP_KEY = OFF,
-- DROP_EXISTING = OFF,
-- ONLINE = OFF,
-- ALLOW_ROW_LOCKS = ON,
-- ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
--===== Create an indexed view that efficiently pre-aggregates the data
CREATE VIEW [dbo].[vw_PreAg]
WITH SCHEMABINDING
AS
SELECT dbo.sales_city.Sold_To_Country,
dbo.sales_city.ToCity,
SUM(Isnull(dbo.sales_city.SalesAmount, 0)) AS SumAmt,
Count_big(*) AS cnt
FROM dbo.Sales_city
GROUP BY dbo.sales_city.Sold_To_Country,
dbo.sales_city.ToCity
GO
--===== Create a unique clustered index on vw_PreAg
SET ARITHABORT ON
GO
SET CONCAT_NULL_YIELDS_NULL ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
SET ANSI_PADDING ON
GO
SET ANSI_WARNINGS ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
CREATE UNIQUE CLUSTERED INDEX [CI_iX_preag] ON [dbo].[vw_PreAg] ( [Sold_To_Country] ASC, [ToCity] ASC )
--WITH (
-- PAD_INDEX = OFF,
-- STATISTICS_NORECOMPUTE = OFF,
-- SORT_IN_TEMPDB = OFF,
-- IGNORE_DUP_KEY = OFF,
-- DROP_EXISTING = OFF,
-- ONLINE = OFF,
-- ALLOW_ROW_LOCKS = ON,
-- ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
The actual query:
--===== The 'Dynamic' pivot query
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @listCol VARCHAR(MAX)
DECLARE @query VARCHAR(MAX)
---- The following builds the dynamic string used in the PIVOT 'IN' clause (an example of the resulting string is shown after the query).
SELECT @listCol = Stuff((SELECT DISTINCT '],[' + SC
FROM VW_ListCols WITH ( NOEXPAND ) --NOEXPAND is required to force indexed view usage on non-Enterprise editions
ORDER BY '],[' + SC
FOR XML PATH('')), 1, 2, '') + ']'
--PRINT @ListCol ---uncomment to review
--===== the actual PIVOT code
SET @query =
'
SELECT *
FROM (SELECT vw_PreAg.Sold_To_Country,
vw_PreAg.ToCity,
SUM(SumAmt) AS Total
FROM vw_PreAg WITH (NOEXPAND)
INNER JOIN
CountriesInvolved ON vw_PreAg.Sold_To_Country = CountriesInvolved.Sold_To_Country
GROUP BY vw_PreAg.Sold_To_Country,
vw_PreAg.ToCity) AS src PIVOT ( SUM(total) FOR ToCity IN ( ' + @listCol + ' )
) AS pvt
ORDER BY pvt.Sold_to_Country
'
--PRINT @query ---uncomment to review
EXECUTE ( @query )
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
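For reference (not from the original post), with the eight cities in Jeff's sample data the commented PRINT @ListCol should produce something like:
--[Bonn],[Detroit],[Frankfurt],[Hamburg],[Los Angeles],[Lyon],[New York],[Paris]
and @query then expands into an ordinary static PIVOT over those columns.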
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 25, 2011 at 8:46 am
J Livingston SQL (9/25/2011)
First off... I agree with Jeff's test results, having run them on both 2005 Std and 2008 R2 Dev. Personally, I really don't like hardcoding pivot columns and, more often than not, I will be pulling these types of results from SSAS.
However, having some 'spare time' and being keen to learn, I have been playing around with dynamic pivot queries and indexed views... I am quite pleased with the results. 🙂
Using an indexed view to pre-aggregate the data seems to work well.
Also, I found that another indexed view significantly reduced a bottleneck in the creation of the required dynamic 'IN' clause... I was testing on 20M rows with 30 countries, each with 100 cities.
So, for your kind review and critique... some code follows.
(Please use Jeff's original setup from his earlier post.)
Do you have the code that built the 20M*30*100 data for both tables so we can test at the level you did?
Did you do a comparison of the original code against the Indexed View/Dynamic method? If so, can you show us the results as differences in duration and resource usage?
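For reference while we wait, here's a rough sketch (purely hypothetical, not your actual code) of how my earlier generator could be scaled to that shape:
-- Hypothetical sketch only: a 30-country x 100-city lookup (3,000 combinations)
IF OBJECT_ID('tempdb..#CountryCity','U') IS NOT NULL DROP TABLE #CountryCity;
SELECT RowNum      = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       CountryName = CAST('TestCountry' + CAST(c.N AS VARCHAR(10)) AS NVARCHAR(100)),
       CityName    = CAST('TestCity' + CAST(c.N AS VARCHAR(10)) + '_' + CAST(t.N AS VARCHAR(10)) AS NVARCHAR(100))
  INTO #CountryCity
  FROM (SELECT TOP 30  N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns) c
 CROSS JOIN (SELECT TOP 100 N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns) t
;
-- ...then, in the INSERT into dbo.Sales_City, change TOP 1000000 to TOP 20000000, change the
-- modulus from %12+1 to %3000+1, and repopulate dbo.CountriesInvolved with all 30 countries.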
--Jeff Moden
Change is inevitable... Change for the better is not.