query performance

  • Hi,

    I have this execution plan and a query which runs for more than 10 hours. Please help me with some performance tuning recommendations.

    UPDATE h SET
        [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
        , [DataSourceIsComprehensive] = ISNULL(g.[DataSourceIsComprehensive], h.[DataSourceIsComprehensive])
        , [DataSourceCoverage] = ISNULL(g.[DataSourceCoverage], h.[DataSourceCoverage])
        , [Multiplier] = ISNULL(g.[Multiplier], h.[Multiplier])
        , [Nominator] = ISNULL(g.[Nominator], h.[Nominator])
        , [Denominator] = ISNULL(g.[Denominator], h.[Denominator])
        , [YValue] = ISNULL(g.[YValue], h.[YValue])
        , [N] = ISNULL(g.[N], h.[N])
        , [NMissing] = ISNULL(g.[NMissing], h.[NMissing])
    FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons h
    LEFT JOIN (
        SELECT
            f.[TimeId] AS [TimeId]
            , f.[GeoId] AS [GeoId]
            , f.[XValue]
            , CAST(MIN(CAST(ISNULL(f.[DataSourceIsNational], 0) AS TINYINT)) AS BIT) AS [DataSourceIsNational]
            , CAST(MIN(CAST(ISNULL(f.[DataSourceIsComprehensive], 0) AS TINYINT)) AS BIT) AS [DataSourceIsComprehensive]
            , AVG(f.[DataSourceCoverage]) AS [DataSourceCoverage]
            , AVG(f.[Multiplier]) AS [Multiplier]
            , SUM(f.[Nominator]) AS [Nominator]
            , CASE WHEN SUM(f.[YValue]) = 0 THEN SUM(f.[Denominator]) ELSE SUM(f.[Nominator]) / SUM(f.[YValue]) END AS [Denominator]
            , SUM(f.[YValue]) AS [YValue]
            , SUM(f.[N]) AS [N]
            , SUM(f.[NMissing]) AS [NMissing]
        FROM (
            SELECT DISTINCT
                a.[TimeId] AS [TimeId]
                , a.[GeoId] AS [GeoId]
                , c.[DataSourceIsNational]
                , c.[DataSourceIsComprehensive]
                , c.[DataSourceCoverage]
                , c.[Multiplier]
                , c.[Nominator]
                , c.[Denominator]
                , a.[XValue]
                , c.[YValue]
                , c.[N] AS [N]
                , c.[NMissing] AS [NMissing]
                , c.[id]
            FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons a
            INNER JOIN [DM_Ref].[ref].[Time] b
                ON a.[TimeId] = b.[id]
            INNER JOIN (
                SELECT *, ROW_NUMBER() OVER (ORDER BY [TimeId], [GeoId], [XValue]) AS [id]
                FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons
            ) c
                ON a.[GeoId] = c.[GeoId]
                AND a.[XValue] = c.[XValue]
            INNER JOIN [DM_Ref].[ref].[Time] d
                ON c.[TimeId] = d.[id]
                AND (
                    (b.[Unit] = 'Y' AND d.[NumYearInAll] <= b.[NumYearInAll])
                    OR (b.[Unit] = 'Q' AND d.[NumQuarterInAll] <= b.[NumQuarterInAll])
                    OR (b.[Unit] = 'M' AND d.[NumMonthInAll] <= b.[NumMonthInAll])
                    OR (b.[Unit] = 'W' AND d.[NumWeekInAll] <= b.[NumWeekInAll])
                    OR (b.[Unit] = 'D' AND d.[NumDayInAll] <= b.[NumDayInAll])
                )
            INNER JOIN #OUT_DBD__calculateMeasureByAlgorithm_X_BASICTIMENUM_YearSeasons e
                ON b.[DatetimeStart] >= e.[LastSeasonDatetimeStart]
                AND b.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
                AND d.[DatetimeStart] >= e.[DatetimeStart]
                AND d.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
        ) f
        GROUP BY f.[TimeId], f.[GeoId], f.[XValue]
    ) g
        ON h.[GeoId] = g.[GeoId]
        AND h.[TimeId] = g.[TimeId]
        AND h.[XValue] = g.[XValue]

    Thanks,

    BR,

    Hadrian

  • MSSQL 2016 SP1

  • We need the actual execution plan, please - you've posted the estimated plan.  How many rows are in your temp tables?

    John

  • 1. What join order do you want for the subquery?
    2. I think you can remove DISTINCT, because you select c.[id], which is not duplicated.
    3. We need more information.

  • The first thing to check is the joins; it looks like something is missing there, as the hash match inner join is producing 3.18 x 10^9 rows!
    😎

    I would also test the query with OPTION (MAXDOP 1), as parallelism can be very expensive on this type of query.
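
    A minimal sketch of where the hint goes (the UPDATE itself is unchanged; the body is elided here):

    UPDATE h SET
        [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
        -- ... remaining SET clauses, FROM and joins exactly as in the original query ...
    OPTION (MAXDOP 1);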

  • The query contains three instances of #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons, which are joined to [DM_Ref].[ref].[Time] twice.

    I suspect there is a better way to retrieve required data.

    I also suspect the temp tables don't have any keys/indexes, which means every row from one table has to be matched to every row of another table (or another instance of the same table).

    Which gives those billions of rows in the hash join Eirikur was talking about.

    Review the design of the query; it obviously was not thought through.
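
    For example, if the temp table really is a heap, a clustered index on the join/grouping key is a cheap experiment (the index name here is arbitrary; column types must match the actual table):

    CREATE CLUSTERED INDEX IX_TimeGeoX
        ON #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons
            ([TimeId], [GeoId], [XValue]);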

    _____________
    Code for TallyGenerator

  • Thanks, everyone, for helping me understand some performance issues with this query.

  • Pull the timing section out into a separate query and test to see if you have a missing join somewhere:

    -- outer references b.[id], d.[id]
    SELECT
     [b_id] = b.[id],
     [d_id] = d.[id]
    INTO #TimingSection
    FROM [DM_Ref].[ref].[Time] b
    INNER JOIN [DM_Ref].[ref].[Time] d
     ON  (
      (b.[Unit] = 'Y' AND d.[NumYearInAll] <= b.[NumYearInAll])
      OR (b.[Unit] = 'Q' AND d.[NumQuarterInAll] <= b.[NumQuarterInAll])
      OR (b.[Unit] = 'M' AND d.[NumMonthInAll] <= b.[NumMonthInAll])
      OR (b.[Unit] = 'W' AND d.[NumWeekInAll] <= b.[NumWeekInAll])
      OR (b.[Unit] = 'D' AND d.[NumDayInAll] <= b.[NumDayInAll])
     )
    INNER JOIN #OUT_DBD__calculateMeasureByAlgorithm_X_BASICTIMENUM_YearSeasons e
      ON b.[DatetimeStart] >= e.[LastSeasonDatetimeStart]
      AND b.[DatetimeEndExcl] <= e.[DatetimeEndExcl]

      AND d.[DatetimeStart] >= e.[DatetimeStart]
      AND d.[DatetimeEndExcl] <= e.[DatetimeEndExcl]

    UPDATE h SET
     [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
     , [DataSourceIsComprehensive] = ISNULL(g.[DataSourceIsComprehensive], h.[DataSourceIsComprehensive])
     , [DataSourceCoverage] = ISNULL(g.[DataSourceCoverage], h.[DataSourceCoverage])
     , [Multiplier] = ISNULL(g.[Multiplier], h.[Multiplier])
     , [Nominator] = ISNULL(g.[Nominator], h.[Nominator])
     , [Denominator] = ISNULL(g.[Denominator], h.[Denominator])
     , [YValue] = ISNULL(g.[YValue], h.[YValue])
     , [N] = ISNULL(g.[N], h.[N])
     , [NMissing] = ISNULL(g.[NMissing], h.[NMissing])
    FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons h
    LEFT JOIN ( -- g
     SELECT
      f.[TimeId] AS [TimeId]
      , f.[GeoId] AS [GeoId]
      , f.[XValue]
      , CAST(MIN(CAST(ISNULL(f.[DataSourceIsNational], 0) AS TINYINT)) AS BIT) AS [DataSourceIsNational]
      , CAST(MIN(CAST(ISNULL(f.[DataSourceIsComprehensive], 0) AS TINYINT)) AS BIT) AS [DataSourceIsComprehensive]
      , AVG(f.[DataSourceCoverage]) AS [DataSourceCoverage]
      , AVG(f.[Multiplier]) AS [Multiplier]
      , SUM(f.[Nominator]) AS [Nominator]
      , CASE WHEN SUM(f.[YValue]) = 0 THEN SUM(f.[Denominator]) ELSE SUM(f.[Nominator]) / SUM(f.[YValue]) END AS [Denominator]
      , SUM(f.[YValue]) AS [YValue]
      , SUM(f.[N]) AS [N]
      , SUM(f.[NMissing]) AS [NMissing]
     FROM ( -- f
      SELECT -- DISTINCT probably not required
        -- if it is, then check your joins!!
       a.[TimeId] AS [TimeId]
       , a.[GeoId] AS [GeoId]
       , a.[XValue]
       , c.[DataSourceIsNational]
       , c.[DataSourceIsComprehensive]
       , c.[DataSourceCoverage]
       , c.[Multiplier]
       , c.[Nominator]
       , c.[Denominator]
       , c.[YValue]
       , c.[N] AS [N]
       , c.[NMissing] AS [NMissing]
       , c.[id]
      FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons a
    ----------------------------------------------------------------------------------------------------------------
    ---- outer references b.[id], d.[id]
    --  INNER JOIN [DM_Ref].[ref].[Time] b
    --   ON a.[TimeId] = b.[id] -- outer ref
    --  INNER JOIN [DM_Ref].[ref].[Time] d
    --   ON  (
    --    (b.[Unit] = 'Y' AND d.[NumYearInAll] <= b.[NumYearInAll])
    --    OR (b.[Unit] = 'Q' AND d.[NumQuarterInAll] <= b.[NumQuarterInAll])
    --    OR (b.[Unit] = 'M' AND d.[NumMonthInAll] <= b.[NumMonthInAll])
    --    OR (b.[Unit] = 'W' AND d.[NumWeekInAll] <= b.[NumWeekInAll])
    --    OR (b.[Unit] = 'D' AND d.[NumDayInAll] <= b.[NumDayInAll])
    --   )
    --  INNER JOIN #OUT_DBD__calculateMeasureByAlgorithm_X_BASICTIMENUM_YearSeasons e
    --    ON b.[DatetimeStart] >= e.[LastSeasonDatetimeStart]
    --     AND b.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
    --    AND d.[DatetimeStart] >= e.[DatetimeStart]
    --    AND d.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
    ----------------------------------------------------------------------------------------------------------------
      INNER JOIN ( -- c
       SELECT *, ROW_NUMBER() OVER (ORDER BY [TimeId], [GeoId], [XValue]) AS [id]
       FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons
      ) c
       ON  c.[GeoId] = a.[GeoId]
       AND c.[XValue] = a.[XValue]
      INNER JOIN #TimingSection t
       ON c.[TimeId] = t.d_id  -- outer ref
       AND a.[TimeId] = t.b_id -- outer ref
     ) f
     GROUP BY f.[TimeId], f.[GeoId], f.[XValue]
    ) g
     ON h.[TimeId] = g.[TimeId]
     AND h.[GeoId] = g.[GeoId]
     AND h.[XValue] = g.[XValue]
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi,

    Also, this query ran in a few minutes on SQL 2012 and now runs for hours on 2016.
    Thanks,

    Hadrian

  • Hadrian - Wednesday, June 20, 2018 6:55 AM

    Hi,

    Also, this query ran in a few minutes on SQL 2012 and now runs for hours on 2016.
    Thanks,

    Hadrian

    This may well be due to changes in the query optimizer that took place in SQL 2014. You can run the query using the older version of the optimizer by adding an option to your query to enable that, but I don't have that info handy. Someone who's familiar may post it, or you can search the web for that option. There is also a trace flag that can enable the old optimizer server-wide, and you'll need to look that up too. However, that's not usually a good long-term solution, as the newer optimizer tends to choke more on code that is "less than optimal" than on fairly good code. At least with SQL 2016, you have an option to enable the old version on a per-query basis, whereas with 2014, the only option was to go server-wide.
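
    For reference, the options described above look roughly like this on SQL 2016 (syntax from the documented hints and settings; verify against your build before relying on it):

    -- per query, SQL 2016 SP1 and later:
    OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

    -- per database, SQL 2016 and later:
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

    -- per query via trace flag (requires elevated permissions):
    OPTION (QUERYTRACEON 9481);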

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Hadrian - Wednesday, June 20, 2018 6:55 AM

    Hi,

    Also, this query ran in a few minutes on SQL 2012 and now runs for hours on 2016.
    Thanks,

    Hadrian

    There were changes to the cardinality estimator between those versions. Generally they cause more "poorly" written SQL to run worse, which does force better writing standards; better-written queries run better. 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • A couple of points about the query.

    The query updates the table with values from a copy of the same table, grouped by the unique key [TimeId], [GeoId], [XValue].

    Every record with the same combination of the key values gets identical values for every other column.

    Would it be logical to remove duplicates?

    The query uses LEFT JOIN and then updates the table with original values where there is no match for the join.

    Would it be better to use INNER JOIN and update only those records which are actually getting updated?
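
    A sketch of that change (only the join type differs; the aggregated subquery g stays exactly as it is, and the ISNULL fallbacks remain useful because g's aggregates can still be NULL):

    UPDATE h SET
        [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
        -- ... other SET clauses as before ...
    FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons h
    INNER JOIN ( /* same aggregated subquery as before */ ) g
        ON h.[TimeId] = g.[TimeId]
        AND h.[GeoId] = g.[GeoId]
        AND h.[XValue] = g.[XValue];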

    _____________
    Code for TallyGenerator
