query performance

  • Hi,

    I have this execution plan and a query which runs for more than 10 hours. Please help me with some performance tuning recommendations.

    UPDATE h SET
        [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
        , [DataSourceIsComprehensive] = ISNULL(g.[DataSourceIsComprehensive], h.[DataSourceIsComprehensive])
        , [DataSourceCoverage] = ISNULL(g.[DataSourceCoverage], h.[DataSourceCoverage])
        , [Multiplier] = ISNULL(g.[Multiplier], h.[Multiplier])
        , [Nominator] = ISNULL(g.[Nominator], h.[Nominator])
        , [Denominator] = ISNULL(g.[Denominator], h.[Denominator])
        , [YValue] = ISNULL(g.[YValue], h.[YValue])
        , [N] = ISNULL(g.[N], h.[N])
        , [NMissing] = ISNULL(g.[NMissing], h.[NMissing])
    FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons h
    LEFT JOIN (
        SELECT
            f.[TimeId] AS [TimeId]
            , f.[GeoId] AS [GeoId]
            , f.[XValue]
            , CAST(MIN(CAST(ISNULL(f.[DataSourceIsNational], 0) AS TINYINT)) AS BIT) AS [DataSourceIsNational]
            , CAST(MIN(CAST(ISNULL(f.[DataSourceIsComprehensive], 0) AS TINYINT)) AS BIT) AS [DataSourceIsComprehensive]
            , AVG(f.[DataSourceCoverage]) AS [DataSourceCoverage]
            , AVG(f.[Multiplier]) AS [Multiplier]
            , SUM(f.[Nominator]) AS [Nominator]
            , CASE WHEN SUM(f.[YValue]) = 0 THEN SUM(f.[Denominator]) ELSE SUM(f.[Nominator]) / SUM(f.[YValue]) END AS [Denominator]
            , SUM(f.[YValue]) AS [YValue]
            , SUM(f.[N]) AS [N]
            , SUM(f.[NMissing]) AS [NMissing]
        FROM (
            SELECT DISTINCT
                a.[TimeId] AS [TimeId]
                , a.[GeoId] AS [GeoId]
                , c.[DataSourceIsNational]
                , c.[DataSourceIsComprehensive]
                , c.[DataSourceCoverage]
                , c.[Multiplier]
                , c.[Nominator]
                , c.[Denominator]
                , a.[XValue]
                , c.[YValue]
                , c.[N] AS [N]
                , c.[NMissing] AS [NMissing]
                , c.[id]
            FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons a
            INNER JOIN [DM_Ref].[ref].[Time] b
                ON a.[TimeId] = b.[id]
            INNER JOIN (
                SELECT *, ROW_NUMBER() OVER (ORDER BY [TimeId], [GeoId], [XValue]) AS [id]
                FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons
            ) c
                ON a.[GeoId] = c.[GeoId]
                AND a.[XValue] = c.[XValue]
            INNER JOIN [DM_Ref].[ref].[Time] d
                ON c.[TimeId] = d.[id]
                AND (
                    (b.[Unit] = 'Y' AND d.[NumYearInAll] <= b.[NumYearInAll])
                    OR (b.[Unit] = 'Q' AND d.[NumQuarterInAll] <= b.[NumQuarterInAll])
                    OR (b.[Unit] = 'M' AND d.[NumMonthInAll] <= b.[NumMonthInAll])
                    OR (b.[Unit] = 'W' AND d.[NumWeekInAll] <= b.[NumWeekInAll])
                    OR (b.[Unit] = 'D' AND d.[NumDayInAll] <= b.[NumDayInAll])
                )
            INNER JOIN #OUT_DBD__calculateMeasureByAlgorithm_X_BASICTIMENUM_YearSeasons e
                ON b.[DatetimeStart] >= e.[LastSeasonDatetimeStart]
                AND b.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
                AND d.[DatetimeStart] >= e.[DatetimeStart]
                AND d.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
        ) f
        GROUP BY f.[TimeId], f.[GeoId], f.[XValue]
    ) g
        ON h.[GeoId] = g.[GeoId]
        AND h.[TimeId] = g.[TimeId]
        AND h.[XValue] = g.[XValue]

    Thanks,

    BR,

    Hadrian

  • MSSQL 2016 SP1

  • We need the actual execution plan, please - you've posted the estimated plan.  How many rows are in your temp tables?

    John

  • 1. What join order do you want for the subquery?
    2. I think you can remove DISTINCT, because you select c.[id], which is not duplicated.
    3. We need more information.

  • The first thing to check is the joins; it looks like something is missing there, as the hash match inner join is producing 3.18 x 10^9 rows!
    😎

    I would also test the query with OPTION (MAXDOP 1), as parallelism can be very expensive on this type of query.
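
    A minimal sketch of where the hint goes (the UPDATE itself is unchanged; the body is elided here):

    UPDATE h SET
        [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
        -- ... remaining SET clauses, FROM and joins exactly as in the original query ...
    OPTION (MAXDOP 1);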

  • The query contains three instances of #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons, which are joined to [DM_Ref].[ref].[Time] twice.

    I suspect there is a better way to retrieve required data.

    I also suspect the temp tables don't have any keys/indexes, which means every row from one table has to be matched to every row of another table (or another instance of the same table).

    Which gives those billions of rows in the hash join Eirikur was talking about.

    Review the design of the query; it obviously was not thought through.
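
    For example, if the temp table really is a heap, a clustered index on the join/grouping key is a cheap experiment (the index name here is arbitrary; column types must match the actual table):

    CREATE CLUSTERED INDEX IX_TimeGeoX
        ON #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons
            ([TimeId], [GeoId], [XValue]);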

    _____________
    Code for TallyGenerator

  • Thanks, everyone, for helping me understand some performance issues with this query.

  • Pull the timing section out into a separate query and test to see if you have a missing join somewhere:

    -- outer references b.[id], d.[id]
    SELECT
     [b_id] = b.[id],
     [d_id] = d.[id]
    INTO #TimingSection
    FROM [DM_Ref].[ref].[Time] b
    INNER JOIN [DM_Ref].[ref].[Time] d
     ON  (
      (b.[Unit] = 'Y' AND d.[NumYearInAll] <= b.[NumYearInAll])
      OR (b.[Unit] = 'Q' AND d.[NumQuarterInAll] <= b.[NumQuarterInAll])
      OR (b.[Unit] = 'M' AND d.[NumMonthInAll] <= b.[NumMonthInAll])
      OR (b.[Unit] = 'W' AND d.[NumWeekInAll] <= b.[NumWeekInAll])
      OR (b.[Unit] = 'D' AND d.[NumDayInAll] <= b.[NumDayInAll])
     )
    INNER JOIN #OUT_DBD__calculateMeasureByAlgorithm_X_BASICTIMENUM_YearSeasons e
      ON b.[DatetimeStart] >= e.[LastSeasonDatetimeStart]
      AND b.[DatetimeEndExcl] <= e.[DatetimeEndExcl]

      AND d.[DatetimeStart] >= e.[DatetimeStart]
      AND d.[DatetimeEndExcl] <= e.[DatetimeEndExcl]

    UPDATE h SET
     [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
     , [DataSourceIsComprehensive] = ISNULL(g.[DataSourceIsComprehensive], h.[DataSourceIsComprehensive])
     , [DataSourceCoverage] = ISNULL(g.[DataSourceCoverage], h.[DataSourceCoverage])
     , [Multiplier] = ISNULL(g.[Multiplier], h.[Multiplier])
     , [Nominator] = ISNULL(g.[Nominator], h.[Nominator])
     , [Denominator] = ISNULL(g.[Denominator], h.[Denominator])
     , [YValue] = ISNULL(g.[YValue], h.[YValue])
     , [N] = ISNULL(g.[N], h.[N])
     , [NMissing] = ISNULL(g.[NMissing], h.[NMissing])
    FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons h
    LEFT JOIN ( -- g
     SELECT
      f.[TimeId] AS [TimeId]
      , f.[GeoId] AS [GeoId]
      , f.[XValue]
      , CAST(MIN(CAST(ISNULL(f.[DataSourceIsNational], 0) AS TINYINT)) AS BIT) AS [DataSourceIsNational]
      , CAST(MIN(CAST(ISNULL(f.[DataSourceIsComprehensive], 0) AS TINYINT)) AS BIT) AS [DataSourceIsComprehensive]
      , AVG(f.[DataSourceCoverage]) AS [DataSourceCoverage]
      , AVG(f.[Multiplier]) AS [Multiplier]
      , SUM(f.[Nominator]) AS [Nominator]
      , CASE WHEN SUM(f.[YValue]) = 0 THEN SUM(f.[Denominator]) ELSE SUM(f.[Nominator]) / SUM(f.[YValue]) END AS [Denominator]
      , SUM(f.[YValue]) AS [YValue]
      , SUM(f.[N]) AS [N]
      , SUM(f.[NMissing]) AS [NMissing]
     FROM ( -- f
      SELECT -- DISTINCT probably not required
        -- if it is, then check your joins!!
       a.[TimeId] AS [TimeId]
       , a.[GeoId] AS [GeoId]
       , a.[XValue]
       , c.[DataSourceIsNational]
       , c.[DataSourceIsComprehensive]
       , c.[DataSourceCoverage]
       , c.[Multiplier]
       , c.[Nominator]
       , c.[Denominator]
       , c.[YValue]
       , c.[N] AS [N]
       , c.[NMissing] AS [NMissing]
       , c.[id]
      FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons a
    ----------------------------------------------------------------------------------------------------------------
    ---- outer references b.[id], d.[id]
    --  INNER JOIN [DM_Ref].[ref].[Time] b
    --   ON a.[TimeId] = b.[id] -- outer ref
    --  INNER JOIN [DM_Ref].[ref].[Time] d
    --   ON  (
    --    (b.[Unit] = 'Y' AND d.[NumYearInAll] <= b.[NumYearInAll])
    --    OR (b.[Unit] = 'Q' AND d.[NumQuarterInAll] <= b.[NumQuarterInAll])
    --    OR (b.[Unit] = 'M' AND d.[NumMonthInAll] <= b.[NumMonthInAll])
    --    OR (b.[Unit] = 'W' AND d.[NumWeekInAll] <= b.[NumWeekInAll])
    --    OR (b.[Unit] = 'D' AND d.[NumDayInAll] <= b.[NumDayInAll])
    --   )
    --  INNER JOIN #OUT_DBD__calculateMeasureByAlgorithm_X_BASICTIMENUM_YearSeasons e
    --    ON b.[DatetimeStart] >= e.[LastSeasonDatetimeStart]
    --     AND b.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
    --    AND d.[DatetimeStart] >= e.[DatetimeStart]
    --    AND d.[DatetimeEndExcl] <= e.[DatetimeEndExcl]
    ----------------------------------------------------------------------------------------------------------------
      INNER JOIN ( -- c
       SELECT *, ROW_NUMBER() OVER (ORDER BY [TimeId], [GeoId], [XValue]) AS [id]
       FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons
      ) c
       ON  c.[GeoId] = a.[GeoId]
       AND c.[XValue] = a.[XValue]
      INNER JOIN #TimingSection t
       ON c.[TimeId] = t.d_id  -- outer ref
       AND a.[TimeId] = t.b_id -- outer ref
     ) f
     GROUP BY f.[TimeId], f.[GeoId], f.[XValue]
    ) g
     ON h.[TimeId] = g.[TimeId]
     AND h.[GeoId] = g.[GeoId]
     AND h.[XValue] = g.[XValue]
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi,

    Also, this query ran in a few minutes on SQL 2012 and now runs for hours on 2016.
    Thanks,

    Hadrian

  • Hadrian - Wednesday, June 20, 2018 6:55 AM

    Hi,

    Also, this query ran in a few minutes on SQL 2012 and now runs for hours on 2016.
    Thanks,

    Hadrian

    This may well be due to changes in the query optimizer that took place in SQL 2014. You can run the query using the older version of the optimizer by adding an option to your query to enable that, but I don't have that info handy. Someone who's familiar may post it, or you can search the web for that option. There is also a trace flag that can enable the old optimizer server-wide, and you'll need to look that up too. However, that's not usually a good long-term solution, as the newer optimizer tends to choke more on code that is "less than optimal" than on fairly good code. At least with SQL 2016, you have an option to enable the old version on a per-query basis, whereas with 2014, the only option was to go server-wide.
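
    For reference, the options described above look roughly like this on SQL 2016 (syntax from the documented hints and settings; verify against your build before relying on it):

    -- per query, SQL 2016 SP1 and later:
    OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

    -- per database, SQL 2016 and later:
    ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

    -- per query via trace flag (requires elevated permissions):
    OPTION (QUERYTRACEON 9481);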

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Hadrian - Wednesday, June 20, 2018 6:55 AM

    Hi,

    Also, this query ran in a few minutes on SQL 2012 and now runs for hours on 2016.
    Thanks,

    Hadrian

    There were changes to the cardinality estimator between those versions. Generally they cause more "poorly" written SQL to run worse, which does force better writing standards; better-written queries run better. 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • A couple of points about the query.

    The query updates the table with values from a copy of the same table, grouped by the unique key [TimeId], [GeoId], [XValue].

    Every record with the same combination of the key values gets identical values for every other column.

    Would it be logical to remove duplicates?

    The query uses LEFT JOIN and then updates the table with original values where there is no match for the join.

    Would it be better to use INNER JOIN and update only those records which are actually getting updated?
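
    A sketch of that change (only the join type differs; the aggregated subquery g stays exactly as it is, and the ISNULL fallbacks remain useful because g's aggregates can still be NULL):

    UPDATE h SET
        [DataSourceIsNational] = ISNULL(g.[DataSourceIsNational], h.[DataSourceIsNational])
        -- ... other SET clauses as before ...
    FROM #OUT_DBD__calculateMeasureByAlgorithm_Q_WNF_AFFECTEDAREA_CumulativeCasesAllSeasons h
    INNER JOIN ( /* same aggregated subquery as before */ ) g
        ON h.[TimeId] = g.[TimeId]
        AND h.[GeoId] = g.[GeoId]
        AND h.[XValue] = g.[XValue];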

    _____________
    Code for TallyGenerator
