September 21, 2011 at 7:23 am
Won't matter if it returns too much data. Assuming that what you posted is the full list of columns, a table scan is actually a very good plan.
September 21, 2011 at 7:31 am
What I have posted is a simplification of the real case. I can't post the real stuff as it is owned by the company. But the real case isn't returning many columns at all, just a few more. The point of it all is that when I added the CASE/WHEN statements, the execution time went up quite a bit. I wondered if I was completely off with my method.
September 21, 2011 at 7:36 am
It adds some overhead, but nothing majorly noticeable.
I use this to death in my reports and there's no real difference (once I can get a seek and/or at least a covering index).
Can you post the actual execution plan? Maybe we're missing something really obvious that's killing that query.
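If it helps, the quickest way to grab it in SSMS is Query > Include Actual Execution Plan (Ctrl+M), then right-click the plan and save it as a .sqlplan file. One way to capture the same thing from T-SQL (just a convenience sketch, not required):
-- Returns the actual XML plan alongside the results
SET STATISTICS XML ON;
-- ...run the query in question here...
SET STATISTICS XML OFF;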
September 21, 2011 at 8:07 am
September 21, 2011 at 8:38 am
Rokh (9/21/2011)
http://www.sqlservercentral.com/Forums/Attachment9832.aspx
--Before changing the query, there was this missing index in the plan
--I'm not 100% sure it would be the best after the change,
--I would try both versions: #1 as is, and #2 with Sold_To_Country moved out of the INCLUDE and into the key after ToCity (a sketch of #2 follows the query below)
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Sales_City] ([ToCity])
INCLUDE ([Sold_To_Country],[SalesAmount])
*/
SELECT
b.Sold_To_Country
, SUM(CASE WHEN b.ToCity = 'New York' THEN SalesAmount
ELSE 0
END) Sales_New_York
, SUM(CASE WHEN b.ToCity = 'Detroit' THEN SalesAmount
ELSE 0
END) Sales_Detroit
, SUM(CASE WHEN b.ToCity = 'Los Angeles' THEN SalesAmount
ELSE 0
END) Sales_Los_Angeles
, SUM(CASE WHEN b.ToCity = 'Paris' THEN SalesAmount
ELSE 0
END) Sales_Paris
, SUM(CASE WHEN b.ToCity = 'Lyon' THEN SalesAmount
ELSE 0
END) Sales_Lyon
, SUM(CASE WHEN b.ToCity = 'Bonn' THEN SalesAmount
ELSE 0
END) Sales_Bonn
, SUM(CASE WHEN b.ToCity = 'Hamburg' THEN SalesAmount
ELSE 0
END) Sales_Hamburg
, SUM(CASE WHEN b.ToCity = 'Frankfurt' THEN SalesAmount
ELSE 0
END) Sales_Frankfurt
FROM
--I don't think you need this join, that would whack 60% of the query cost
-- dbo.CountriesInvolved a INNER JOIN
dbo.Sales_City b
-- ON a.Sold_To_Country = b.Sold_To_Country
WHERE
ToCity IN ( 'New York' , 'Detroit' , 'Los Angeles' , 'Paris' , 'Lyon' ,
'Bonn' , 'Hamburg' , 'Frankfurt' )
GROUP BY
b.Sold_To_Country
--You don't need the GROUP BY on ToCity
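For reference, the '#2' index variant mentioned in the comment above would be something along these lines (the index name is just a placeholder, and like #1 it is left commented out):
/*
CREATE NONCLUSTERED INDEX IX_Sales_City_ToCity_Country
ON [dbo].[Sales_City] ([ToCity],[Sold_To_Country])
INCLUDE ([SalesAmount])
*/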
September 22, 2011 at 7:24 am
{Edit} Post withdrawn... still testing and posted too early.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2011 at 10:40 am
Ok, keeping the thread alive :).
It's not as easy as it looks, I think. My theory for now is that I just need to accept the performance. I hope you'll come up with something better.
September 22, 2011 at 10:47 am
Rokh (9/22/2011)
Ok, keeping the thread alive :). It's not as easy as it looks, I think. My theory for now is that I just need to accept the performance. I hope you'll come up with something better.
Is it possible & have you tried removing the join from the query?
Combined with Jeff's excellent pre-aggregate idea this should fly WAY faster than it is now.
September 22, 2011 at 12:42 pm
No I didn't.
As stated before, the inner join is not to be removed. It represents a real-life situation. I've only trimmed it down for this forum. Call it respect 🙂
But the inner join should stay.
Of course, without the inner join things will speed up. However, the big picture is that I noticed a serious performance decrease using the CASE/WHEN statements. As I'm not an expert, I can imagine you guys have a better alternative. If not, then I'm still grateful for your persistence and help.
Gr,
~R
September 22, 2011 at 2:02 pm
Rokh (9/22/2011)
No I didn't. As stated before, the inner join is not to be removed. It represents a real-life situation. I've only trimmed it down for this forum. Call it respect 🙂
But the inner join should stay.
Of course, without the inner join things will speed up. However, the big picture is that I noticed a serious performance decrease using the CASE/WHEN statements. As I'm not an expert, I can imagine you guys have a better alternative. If not, then I'm still grateful for your persistence and help.
Gr,
~R
Never had an issue with using CASE WHEN to pivot data, so I don't have a better option aside from doing it client-side.
September 22, 2011 at 2:40 pm
Not a biggy. Thanks for the help. I think Jeff is still researching. So let's not close this case yet.
September 23, 2011 at 7:39 pm
I apologize for the delay but yes, indeed, I am doing some testing in my "spare" time. This has become quite interesting.
Conventional wisdom would indicate that a single "SUM" column using conventional pre-aggregation methods should be faster than the multi-column set of "SUMs" found both in the "Basic Cross Tab" and in the "Original" method posted by Rokh. So much for conventional wisdom. 😉 There are some surprising results in the testing I've done, but they also support what Rokh has already stated... his original method is faster than all the other methods, including my (now poorly) conceived notion of what pre-aggregation is. The bottom line is that it appears Rokh has actually discovered a better, faster, and less resource-intensive method of "pre-aggregation" to be used prior to "pivoting".
I'm not finished with my analysis of the execution plan (I'm using SQL Server 2005 on an older single-CPU 32-bit machine for this testing and need to do similar tests on other, more modern boxes), but I thought I'd share my test code (especially the data creation code) in case someone else wants to do some testing. After all, one simple test is worth a thousand expert opinions, even if one of them is mine. 😛
I've included some code to create a million rows of test data below. My indexing recommendations are also included in that code. Make sure that you don't run the code anywhere except in TempDB because it drops "real" tables. I used real tables because I wanted to see what DTA would do (which wasn't much). Of course, there should be a PK on the Sales_City table but none of the known columns were worthy.
--=============================================================================================================
-- Create the test environment. Nothing in this section is a part of the solution.
--=============================================================================================================
--===== Identify a nice, safe place for these tests that everyone has.
USE tempdb
;
--===== Conditionally drop the tables to make reruns easier in SSMS.
IF OBJECT_ID('tempdb..#CountryCity','U') IS NOT NULL DROP TABLE #CountryCity;
IF OBJECT_ID('dbo.CountriesInvolved','U') IS NOT NULL DROP TABLE dbo.CountriesInvolved;
IF OBJECT_ID('dbo.Sales_City','U') IS NOT NULL DROP TABLE dbo.Sales_City;
--===== Create the given tables
CREATE TABLE dbo.CountriesInvolved
(
Sold_To_Country NVARCHAR(100) NOT NULL
)
;
CREATE TABLE dbo.Sales_City
(
Sold_To_Country NVARCHAR(100) NULL,
ToCity NVARCHAR(100) NULL,
SalesAmount FLOAT NULL
)
;
--===== Build and populate a table to hold countries and cities for the random data generator
SELECT RowNum = CAST(RowNum AS INT),
CountryName = CAST(CountryName AS NVARCHAR(100)),
CityName = CAST(CityName AS NVARCHAR(100))
INTO #CountryCity
FROM (
SELECT 1,'United States of America','New York' UNION ALL
SELECT 2,'United States of America','Detroit' UNION ALL
SELECT 3,'United States of America','Los Angeles' UNION ALL
SELECT 4,'France','Paris' UNION ALL
SELECT 5,'France','Lyon' UNION ALL
SELECT 6,'Germany','Bonn' UNION ALL
SELECT 7,'Germany','Hamburg' UNION ALL
SELECT 8,'Germany','Frankfurt' UNION ALL
SELECT 9,'SomeCountry1','SomeCity1' UNION ALL
SELECT 10,'SomeCountry1','SomeCity2' UNION ALL
SELECT 11,'SomeCountry2','SomeCity1' UNION ALL
SELECT 12,'SomeCountry2','SomeCity2'
) d (RowNum, CountryName, CityName)
;
--===== Add a Clustered Index for speed
CREATE CLUSTERED INDEX ix_#CountryCity_RowNum
ON #CountryCity (RowNum) WITH FILLFACTOR = 100
;
--===== Populate and index the given tables
INSERT INTO dbo.CountriesInvolved
(Sold_To_Country)
SELECT DISTINCT
Sold_To_Country = CountryName
FROM #CountryCity
WHERE CountryName NOT LIKE 'SomeCountry%'
;
ALTER TABLE dbo.CountriesInvolved
ADD CONSTRAINT PK_CountriesInvolved
PRIMARY KEY CLUSTERED (Sold_To_Country) WITH FILLFACTOR = 100
;
WITH
cteDataGenerator AS
(
SELECT TOP 1000000
RowNum = ABS(CHECKSUM(NEWID()))%12+1,
SalesAmount = RAND(CHECKSUM(NEWID()))*99+1
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
)
INSERT INTO dbo.Sales_City
(Sold_To_Country, ToCity, SalesAmount)
SELECT Sold_To_Country = cc.CountryName,
ToCity = cc.CityName,
gen.SalesAmount
FROM cteDataGenerator gen
LEFT JOIN #CountryCity cc
ON gen.RowNum = cc.RowNum
;
CREATE NONCLUSTERED INDEX ix_Sales_City_Sold_To_Country_Cover01
ON dbo.Sales_City (Sold_To_Country, ToCity) INCLUDE (SalesAmount)
;
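As a side note (not part of the original post), a quick sanity check along these lines can confirm that the generator spread the million rows across all twelve country/city combinations:
-- Hypothetical sanity check: row counts per country/city after the data load
SELECT Sold_To_Country, ToCity, RowsLoaded = COUNT(*)
FROM dbo.Sales_City
GROUP BY Sold_To_Country, ToCity
ORDER BY Sold_To_Country, ToCity
;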
I ran 4 different pieces of code with SQL Profiler on completed batches. The description of each is in the comments in the code. Here's that code...
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== The Original Code ===============================================================
select
Sold_To_Country,
sum(Sales_New_York) Sales_New_York,
sum(Sales_Detroit) Sales_Detroit,
sum(Sales_Los_Angeles) Sales_Los_Angeles,
sum(Sales_Paris) Sales_Paris,
sum(Sales_Lyon) Sales_Lyon,
sum(Sales_Bonn) Sales_Bonn,
sum(Sales_Hamburg) Sales_Hamburg,
sum(Sales_Frankfurt) Sales_Frankfurt
from
(
select
b.Sold_To_Country,
(case when ToCity = 'New York' then SUM(SalesAmount) else 0 end) Sales_New_York,
(case when ToCity = 'Detroit' then SUM(SalesAmount) else 0 end) Sales_Detroit,
(case when ToCity = 'Los Angeles' then SUM(SalesAmount) else 0 end) Sales_Los_Angeles,
(case when ToCity = 'Paris' then SUM(SalesAmount) else 0 end) Sales_Paris,
(case when ToCity = 'Lyon' then SUM(SalesAmount) else 0 end) Sales_Lyon,
(case when ToCity = 'Bonn' then SUM(SalesAmount) else 0 end) Sales_Bonn,
(case when ToCity = 'Hamburg' then SUM(SalesAmount) else 0 end) Sales_Hamburg,
(case when ToCity = 'Frankfurt' then SUM(SalesAmount) else 0 end) Sales_Frankfurt
from
dbo.CountriesInvolved a inner join dbo.Sales_City b
on a.Sold_To_Country = b.Sold_To_Country
group by b.Sold_To_Country, ToCity
) c
group by Sold_To_Country
GO
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== Basic Cross Tab =================================================================
SELECT
b.Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM dbo.CountriesInvolved a INNER JOIN dbo.Sales_City b
ON a.Sold_To_Country = b.Sold_To_Country
GROUP BY b.Sold_To_Country
;
GO
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== Pre-Aggregated Cross Tab using CTE ==============================================
WITH
ctePreAgg AS
(
SELECT sc.Sold_To_Country,
sc.ToCity,
SalesAmount = SUM(sc.SalesAmount)
FROM dbo.Sales_City sc
RIGHT JOIN dbo.CountriesInvolved ci
ON sc.Sold_To_Country = ci.Sold_To_Country
WHERE ci.Sold_To_Country > ''
GROUP BY sc.Sold_To_Country, sc.ToCity
)
SELECT Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM ctePreAgg
GROUP BY Sold_To_Country
;
GO
-----------------------------------------------------------------------------------------
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
--===== REAL pre-aggregated Cross Tab using Temp Table ==================================
IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL DROP TABLE #MyHead
SELECT sc.Sold_To_Country,
sc.ToCity,
SalesAmount = SUM(sc.SalesAmount)
INTO #MyHead
FROM dbo.Sales_City sc
RIGHT JOIN dbo.CountriesInvolved ci
ON sc.Sold_To_Country = ci.Sold_To_Country
GROUP BY sc.Sold_To_Country, sc.ToCity
SELECT Sold_To_Country,
SUM(CASE WHEN ToCity = 'New York' THEN (SalesAmount) ELSE 0 END) Sales_New_York,
SUM(CASE WHEN ToCity = 'Detroit' THEN (SalesAmount) ELSE 0 END) Sales_Detroit,
SUM(CASE WHEN ToCity = 'Los Angeles' THEN (SalesAmount) ELSE 0 END) Sales_Los_Angeles,
SUM(CASE WHEN ToCity = 'Paris' THEN (SalesAmount) ELSE 0 END) Sales_Paris,
SUM(CASE WHEN ToCity = 'Lyon' THEN (SalesAmount) ELSE 0 END) Sales_Lyon,
SUM(CASE WHEN ToCity = 'Bonn' THEN (SalesAmount) ELSE 0 END) Sales_Bonn,
SUM(CASE WHEN ToCity = 'Hamburg' THEN (SalesAmount) ELSE 0 END) Sales_Hamburg,
SUM(CASE WHEN ToCity = 'Frankfurt' THEN (SalesAmount) ELSE 0 END) Sales_Frankfurt
FROM #MyHead
GROUP BY Sold_To_Country
;
GO
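As a side note (not part of the original test harness), if Profiler isn't available, wrapping each of the four batches like this gives comparable duration and I/O numbers:
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
-- ...run one of the four test batches here...
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;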
... and here are the run results (two sets of output from two runs)...
--Jeff Moden
Change is inevitable... Change for the better is not.
September 24, 2011 at 1:21 pm
Reply noted. I'll study it as soon as I've settled down a bit. Thanks in advance for your effort.
September 25, 2011 at 7:55 am
First off... I agree with Jeff's test results, having run them on both 2005 Std and 2008 R2 Dev.
Personally, I really don't like hardcoding pivot columns and, more often than not, I will be pulling these types of results from SSAS.
However, having some 'spare time' and being keen to learn, I have been playing around with dynamic pivot queries and indexed views... I am quite pleased with the results. 🙂
Using an indexed view to pre-aggregate the data seems to work well.
Also, I found that another indexed view significantly reduced a bottleneck in the creation of the required dynamic 'IN' clause... I was testing on 20M rows with 30 countries, each with 100 cities.
So, for your kind review and critique... some code follows.
(Please use Jeff's original setup from his earlier post.)
USE tempdb
;
--===== Conditionally drop the views
IF Object_id('vw_PreAg', 'V') IS NOT NULL
DROP VIEW vw_PreAg;
IF Object_id('vw_ListCols', 'V') IS NOT NULL
DROP VIEW vw_ListCols;
GO
--===== Create an indexed view that will speed up the creation of the dynamic 'IN' clause (@listCol).
--===== This was found to be very beneficial in testing on 20M rows with 30 countries, each with 100 cities.
CREATE VIEW [dbo].[VW_ListCols]
WITH SCHEMABINDING
AS
SELECT dbo.Sales_City.ToCity AS SC,
Count_big(*) AS Cnt
FROM dbo.CountriesInvolved
INNER JOIN dbo.Sales_City
ON dbo.CountriesInvolved.Sold_To_Country = dbo.Sales_City.Sold_To_Country
GROUP BY dbo.Sales_City.ToCity
GO
--===== Create a unique clustered index on vw_ListCols
SET ARITHABORT ON
GO
SET CONCAT_NULL_YIELDS_NULL ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
SET ANSI_PADDING ON
GO
SET ANSI_WARNINGS ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
CREATE UNIQUE CLUSTERED INDEX [CI_IX_ListCols] ON [dbo].[VW_ListCols] ( [SC] ASC )
--WITH (
-- PAD_INDEX = OFF,
-- STATISTICS_NORECOMPUTE = OFF,
-- SORT_IN_TEMPDB = OFF,
-- IGNORE_DUP_KEY = OFF,
-- DROP_EXISTING = OFF,
-- ONLINE = OFF,
-- ALLOW_ROW_LOCKS = ON,
-- ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
--===== Create an indexed view that efficiently pre-aggregates the data
CREATE VIEW [dbo].[vw_PreAg]
WITH SCHEMABINDING
AS
SELECT dbo.sales_city.Sold_To_Country,
dbo.sales_city.ToCity,
SUM(Isnull(dbo.sales_city.SalesAmount, 0)) AS SumAmt,
Count_big(*) AS cnt
FROM dbo.Sales_city
GROUP BY dbo.sales_city.Sold_To_Country,
dbo.sales_city.ToCity
GO
--===== Create a unique clustered index on vw_PreAg
SET ARITHABORT ON
GO
SET CONCAT_NULL_YIELDS_NULL ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO
SET ANSI_PADDING ON
GO
SET ANSI_WARNINGS ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
CREATE UNIQUE CLUSTERED INDEX [CI_iX_preag] ON [dbo].[vw_PreAg] ( [Sold_To_Country] ASC, [ToCity] ASC )
--WITH (
-- PAD_INDEX = OFF,
-- STATISTICS_NORECOMPUTE = OFF,
-- SORT_IN_TEMPDB = OFF,
-- IGNORE_DUP_KEY = OFF,
-- DROP_EXISTING = OFF,
-- ONLINE = OFF,
-- ALLOW_ROW_LOCKS = ON,
-- ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
GO
The actual query:
--===== The 'Dynamic' pivot query
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @listCol VARCHAR(MAX)
DECLARE @query VARCHAR(MAX)
---- The following builds the dynamic string used in the PIVOT 'IN' clause (an example of the resulting string is shown after the query).
SELECT @listCol = Stuff((SELECT DISTINCT '],[' + SC
FROM VW_ListCols WITH ( NOEXPAND ) --NOEXPAND is required to force indexed view usage on non-Enterprise editions
ORDER BY '],[' + SC
FOR XML PATH('')), 1, 2, '') + ']'
--PRINT @ListCol ---uncomment to review
--===== the actual PIVOT code
SET @query =
'
SELECT *
FROM (SELECT vw_PreAg.Sold_To_Country,
vw_PreAg.ToCity,
SUM(SumAmt) AS Total
FROM vw_PreAg WITH (NOEXPAND)
INNER JOIN
CountriesInvolved ON vw_PreAg.Sold_To_Country = CountriesInvolved.Sold_To_Country
GROUP BY vw_PreAg.Sold_To_Country,
vw_PreAg.ToCity) AS src PIVOT ( SUM(total) FOR ToCity IN ( ' + @listCol + ' )
) AS pvt
ORDER BY pvt.Sold_to_Country
'
--PRINT @query ---uncomment to review
EXECUTE ( @query )
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
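For reference (not from the original post), with the eight cities in Jeff's sample data the commented PRINT @ListCol should produce something like:
--[Bonn],[Detroit],[Frankfurt],[Hamburg],[Los Angeles],[Lyon],[New York],[Paris]
and @query then expands into an ordinary static PIVOT over those columns.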
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 25, 2011 at 8:46 am
J Livingston SQL (9/25/2011)
First off... I agree with Jeff's test results, having run them on both 2005 Std and 2008 R2 Dev. Personally, I really don't like hardcoding pivot columns and, more often than not, I will be pulling these types of results from SSAS.
However, having some 'spare time' and being keen to learn, I have been playing around with dynamic pivot queries and indexed views... I am quite pleased with the results. 🙂
Using an indexed view to pre-aggregate the data seems to work well.
Also, I found that another indexed view significantly reduced a bottleneck in the creation of the required dynamic 'IN' clause... I was testing on 20M rows with 30 countries, each with 100 cities.
So, for your kind review and critique... some code follows.
(Please use Jeff's original setup from his earlier post.)
Do you have the code that built the 20M*30*100 data for both tables so we can test at the level you did?
Did you do a comparison of the original code against the Indexed View/Dynamic method? If so, can you show us the results as differences in duration and resource usage?
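For reference while we wait, here's a rough sketch (purely hypothetical, not your actual code) of how my earlier generator could be scaled to that shape:
-- Hypothetical sketch only: a 30-country x 100-city lookup (3,000 combinations)
IF OBJECT_ID('tempdb..#CountryCity','U') IS NOT NULL DROP TABLE #CountryCity;
SELECT RowNum      = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       CountryName = CAST('TestCountry' + CAST(c.N AS VARCHAR(10)) AS NVARCHAR(100)),
       CityName    = CAST('TestCity' + CAST(c.N AS VARCHAR(10)) + '_' + CAST(t.N AS VARCHAR(10)) AS NVARCHAR(100))
  INTO #CountryCity
  FROM (SELECT TOP 30  N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns) c
 CROSS JOIN (SELECT TOP 100 N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns) t
;
-- ...then, in the INSERT into dbo.Sales_City, change TOP 1000000 to TOP 20000000, change the
-- modulus from %12+1 to %3000+1, and repopulate dbo.CountriesInvolved with all 30 countries.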
--Jeff Moden
Change is inevitable... Change for the better is not.