Different query plan on read only seconday replica

  • Hi, a user asked me why a query finishes in  <1 second on an Availability Group secondary replica but takes 4 minutes on the primary replica.

    SELECT job_name, count(*) As JobRunEvents
    FROM dbo.table1 A, dbo.table2 B, dbo.table3 C
    where a.joid = b.joid and a.job_ver = b.job_ver
    and a.joid = c.joid and a.job_ver = c.job_ver
    group by a.job_name
    order by count(*) desc

    I noticed that the secondary replica is using a non parallel plan and finishes quickly.  The primary replica is picking a parallel plan and performing very poorly.

    Both servers are the same version of SQL Server (11.0.6248).
    Both servers have cost threshold for parallelism is set to 50
    Estimated plan cost on both servers is 32 so I don't think it should be opting for parallel.
    Estimated number of rows on both servers is the same.  Stats have been recently updated.
    I've run it with (option recompile) with the same results so don't think it is a bad cached plan.
    The two servers have a different max degree of parallelism. I suspect that would change the time to completion, not the selection of a plan.

    Any ideas?

    Howard

  • PHXHoward - Wednesday, March 15, 2017 12:47 PM

    The two servers have a different max degree of parallelism. I suspect that would change the time to completion, not the selection of a plan.

    That difference can lead to different plans - and exactly in the area you saw.
    If SQL Server sees the query cost as being more than the Cost Threshold for Parallelism setting, it qualifies to potentially use multiple processors for the query.

    Sue

  • PHXHoward - Wednesday, March 15, 2017 12:47 PM

    ...The two servers have a different max degree of parallelism. I suspect that would change the time to completion, not the selection of a plan.

    This difference could directly affect the selection of a plan:
    https://www.brentozar.com/archive/2013/12/q-can-high-maxdop-make-a-query-slower/
    Even though the plan that it settles on is less than the 50 cost threshold for parallelism, the threshold is evaluated against other determined possible plans, so it still may go parallel:
    https://www.brentozar.com/archive/2014/11/sql-server-cost-threshold-for-parallelism/

  • Sue_H - Wednesday, March 15, 2017 1:37 PM

    PHXHoward - Wednesday, March 15, 2017 12:47 PM

    The two servers have a different max degree of parallelism. I suspect that would change the time to completion, not the selection of a plan.

    That difference can lead to different plans - and exactly in the area you saw.
    If SQL Server sees the query cost as being more than the Cost Threshold for Parallelism setting, it qualifies to potentially use multiple processors for the query.

    Sue

    Sue,  the server with maxdop of 8 is choosing a parallel plan and the server with maxdop of 2 is picking a non parallel plan. Interestingly the parallel plan takes 4 minutes to complete and then non parallel plan takes less than 1 second.

     It seems that maxdop of 8 is too high for a busy server with only 16 cores. It might be having a hard time getting 8 cores for the query.  I changed it to maxdop 4 and the query now completes quickly.

    Thanks a lot for the reply.

    Howard

  • PHXHoward - Wednesday, March 15, 2017 2:06 PM

    Sue_H - Wednesday, March 15, 2017 1:37 PM

    PHXHoward - Wednesday, March 15, 2017 12:47 PM

    The two servers have a different max degree of parallelism. I suspect that would change the time to completion, not the selection of a plan.

    That difference can lead to different plans - and exactly in the area you saw.
    If SQL Server sees the query cost as being more than the Cost Threshold for Parallelism setting, it qualifies to potentially use multiple processors for the query.

    Sue

    Sue,  the server with maxdop of 8 is choosing a parallel plan and the server with maxdop of 2 is picking a non parallel plan. Interestingly the parallel plan takes 4 minutes to complete and then non parallel plan takes less than 1 second.

     It seems that maxdop of 8 is too high for a busy server with only 16 cores. It might be having a hard time getting 8 cores for the query.  I changed it to maxdop 4 and the query now completes quickly.

    Thanks a lot for the reply.

    Howard

    I answered that way too quickly - I apologize. Chris took more time and provided some good references to check out..
    Parallelism is weird sometimes. It's worth mentioning that one thing that can make a difference is the load on the system at the time. And loads can fluctuate at any given moment on some systems so you can see inconsistencies. Many times if the CPUs are getting pegged, the optimizer will chose a serial plan. And when you get a mix of a lot of parallel and serial plans all running, sometimes the optimizer can have a hard time figuring out the lowest cost plan. It's generally pretty good but not always.

    Sue

  • Thanks Chris and Sue.  Very good information.

    Howard

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply