September 15, 2011 at 11:51 am
So after my fail in the first post I recreated the problem. The goal is to optimize this query in two ways:
- Do I really need a nested select statement?
- The "CASE WHEN" statements are really slow on a large dataset. Is there a better way?
This is the query:
select
Sold_To_Country,
sum(Sales_New_York) Sales_New_York,
sum(Sales_Detroit) Sales_Detroit,
sum(Sales_Los_Angeles) Sales_Los_Angeles,
sum(Sales_Paris) Sales_Paris,
sum(Sales_Lyon) Sales_Lyon,
sum(Sales_Bonn) Sales_Bonn,
sum(Sales_Hamburg) Sales_Hamburg,
sum(Sales_Frankfurt) Sales_Frankfurt
from
(
select
b.Sold_To_Country,
(case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
group by b.Sold_To_Country, ToCity
) c
group by Sold_To_Country
September 15, 2011 at 12:00 pm
i think you can just do it with inline SUM..CASE...
you almost had the syntax in your example:
does this do what you wanted?
SELECT
b.Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM dbo.CountriesInvolved a
INNER JOIN dbo.Sales_City b
ON a.Sold_To_Country = b.Sold_To_Country
GROUP BY b.Sold_To_Country, ToCity
Lowell
September 15, 2011 at 1:11 pm
It's not the desired result. Here's the deal:
Using this query:
select
b.Sold_To_Country,
(case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
group by b.Sold_To_Country, ToCity
Results in this:
So I built another query around it (which can be seen in the first post,) sum again and have only one group left (without a subgroup):
And that's the desired result. So that's challenge 1.
The second challenge is to replace the the "case when's". They force a table scan. It's very inefficient execution time wise.
September 15, 2011 at 8:36 pm
Rokh (9/15/2011)
It's not the desired result. Here's the deal:Using this query:
select
b.Sold_To_Country,
(case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
group by b.Sold_To_Country, ToCity
Results in this:
So I built another query around it (which can be seen in the first post,) sum again and have only one group left (without a subgroup):
And that's the desired result. So that's challenge 1.
The second challenge is to replace the the "case when's". They force a table scan. It's very inefficient execution time wise.
Lutz made a minor error (copy'n'paste error, most likely) and I've crossed it out below but he was definitely on the right track... Also, which table is "ToCity" and "Sales Amount" in? I ask because I think you might not even need the join to the CountriesInvolved table. And, no... the CASE WHEN's aren't the problem for such code. The fact that you're trying to SUM and PIVOT at the same time is the problem. Lemme know about the things I asked about above and I'll show you how to write this for max performance.
SELECT
b.Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM dbo.CountriesInvolved a
INNER JOIN dbo.Sales_City b
ON a.Sold_To_Country = b.Sold_To_Country
GROUP BY b.Sold_To_Country, ToCity
--Jeff Moden
Change is inevitable... Change for the better is not.
September 15, 2011 at 8:52 pm
Also, if your table has more cities in it than what you have in the query, add the following WHERE clause just before the GROUP BY...
WHERE ToCity IN ('New York', 'Detroit', 'Los Angeles', 'Paris', 'Lyon', 'Bonn', 'Hamburg', 'Frankfurt')
--Jeff Moden
Change is inevitable... Change for the better is not.
September 15, 2011 at 11:35 pm
Jeff Moden (9/15/2011)
Also, if your table has more cities in it than what you have in the query, add the following WHERE clause just before the GROUP BY...
WHERE ToCity IN ('New York', 'Detroit', 'Los Angeles', 'Paris', 'Lyon', 'Bonn', 'Hamburg', 'Frankfurt')
The original query (I merely simulated the structure to keep things simple here) has this where clause 🙂
. Also, which table is "ToCity" and "Sales Amount" in? I ask because I think you might not even need the join to the CountriesInvolved table.
The join is needed as CountriesInvolved is the resultset of a CTE, deciding which coutries I need to analyse.
So it all starts like this:
;WITH CountriesInvolved AS
(
Sold_To_Country,
-- Other calculations and stuff
),
ExtendedDetails AS
(
select
--stuff mainly consisting of sum(case when else end) cases
from
CountriesInvolved a inner join Sales_City b
on a.a.Sold_To_Country = b.Sold_To_Country
)
-- more sql here to bind the resultset of above with 3 other tables
I appreciate the help I'm getting here by the way. Thanks guys.
September 16, 2011 at 6:05 am
I still need to know which table ToCity and SalesAmount are in to optimize this to the tune of about a 300%+ performance gain. Are they in the Sales_City table (educated guess but still just a guess)? If not, please let me know where.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2011 at 10:35 am
Yes, they're in the Sales_City table! In the process of making screenies I forgot to post the answer.
September 16, 2011 at 12:48 pm
Well I'll be dammed. The first query is faster then the more readable and shorter query...
I've tested it using 19 million records.
select
Sold_To_Country,
sum(Sales_New_York) Sales_New_York,
sum(Sales_Detroit) Sales_Detroit,
sum(Sales_Los_Angeles) Sales_Los_Angeles,
sum(Sales_Paris) Sales_Paris,
sum(Sales_Lyon) Sales_Lyon,
sum(Sales_Bonn) Sales_Bonn,
sum(Sales_Hamburg) Sales_Hamburg,
sum(Sales_Frankfurt) Sales_Frankfurt
from
(
select
b.Sold_To_Country,
(case when ToCity = 'New York'then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit'then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris'then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon'then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn'then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg'then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt'then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
group by b.Sold_To_Country, ToCity
) c
group by Sold_To_Country
-- this one below produces the same output but is twice as slow...
SELECT
b.Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM dbo.CountriesInvolved a INNER JOIN dbo.Sales_City b
ON a.Sold_To_Country = b.Sold_To_Country
GROUP BY b.Sold_To_Country
September 18, 2011 at 12:38 pm
Rokh (9/16/2011)
Well I'll be dammed. The first query is faster then the more readable and shorter query...I've tested it using 19 million records.
In that case, could you post the CREATE TABLE statements for the two tables involved here... I've got an idea to really make it sail...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 19, 2011 at 10:39 am
CREATE TABLE [dbo].[CountriesInvolved](
[Sold_To_Country] [nvarchar](255) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Sales_City](
[Sold_To_Country] [nvarchar](255) NULL,
[ToCity] [nvarchar](255) NULL,
[SalesAmount] [float] NULL
) ON [PRIMARY]
September 20, 2011 at 8:24 pm
Thanks,
Try this... it's a technique called "Pre-Aggregation" (phrase coined by Peter Larsson). It usually runs quite a bit faster than many straight forward Cross-Tab methods and is a bit less code to write than the faster method you found.
WITH
ctePreAgg AS
(
SELECT sc.Sold_To_Country,
sc.ToCity,
SalesAmount = SUM(sc.SalesAmount)
FROM dbo.Sales_City sc
INNEr JOIN dbo.CountriesInvolved ci
ON sc.Sold_To_Country = ci.Sold_To_Country
GROUP BY sc.Sold_To_Country, sc.ToCity
)
SELECT Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM ctePreAgg
GROUP BY Sold_To_Country
OPTION (RECOMPILE)
;
Of course, I tested with NonClustered indexes on both tables where "Sold_To_Country" appears.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 21, 2011 at 2:59 am
Funny, has about the same execution speed as my original query. have to deal with the fact that implementing CASE/WHEN statements like this implies a serious performance hit. It forces table scans. So be it 🙂
Thanks for your help! Appreciate it a lot!
September 21, 2011 at 5:06 am
Rokh (9/21/2011)
Funny, has about the same execution speed as my original query. have to deal with the fact that implementing CASE/WHEN statements like this implies a serious performance hit. It forces table scans. So be it 🙂Thanks for your help! Appreciate it a lot!
No it does NOT. You have a scan here because you have no where clause.
September 21, 2011 at 7:10 am
Oki, so I added a where clause. The execution plan still shows 39% table scan on the Sales_City table.
The final query looks like this then:
select
Sold_To_Country,
sum(Sales_New_York) Sales_New_York,
sum(Sales_Detroit) Sales_Detroit,
sum(Sales_Los_Angeles) Sales_Los_Angeles,
sum(Sales_Paris) Sales_Paris,
sum(Sales_Lyon) Sales_Lyon,
sum(Sales_Bonn) Sales_Bonn,
sum(Sales_Hamburg) Sales_Hamburg,
sum(Sales_Frankfurt) Sales_Frankfurt
from
(
select
b.Sold_To_Country,
(case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
where
-- decreases table scan with 12%.
ToCity IN ('New York', 'Detroit', 'Los Angeles', 'Paris', 'Lyon', 'Bonn', 'Hamburg', 'Frankfurt')
group by b.Sold_To_Country, ToCity
) c
group by Sold_To_Country
Viewing 15 posts - 1 through 15 (of 39 total)
You must be logged in to reply to this topic. Login to reply