High context switching

  • Posting here after a long time :).

    I came across an interesting issue. I have 2 servers that are part of an AG. The 2 servers are EXACTLY the same: same cores, storage, etc. I have a simple query joining two very large tables; on the read-only replica it comes back in 10 secs, and on the primary it takes forever. A few things to keep in mind:

    i)  I have checked and rechecked; there are no I/O or blocking issues on the primary.
    ii) Just to rule out contention, I shut down all other processes; CPU was ~25% and the query was still slow.
    iii) Flushed the buffer pool and plan cache so that the query picks up a new plan.
    iv) I do notice very high context switching on the primary for that query. I am using Adam's sp_whoisactive, and for that long-running spid I see up to 250K context switches.
    v) MAXDOP settings are also the same on both servers.

    Things that are different:
    i) The primary has CU5 for SP2 and the secondary has CU4 for SP2; not sure if this is causing any issue.
    ii) I am checking whether the BIOS updates differ.

    Has anyone experienced this?

  • In reply to curious_sqldba - Monday, March 4, 2019 9:24 AM:

    Check to make sure lightweight pooling isn't enabled, and check for differences in the processor affinity settings.
    You mention MAXDOP is the same, but you would also want to compare cost threshold for parallelism.
    You may also want to query sys.dm_os_schedulers on both servers and compare the output (beyond just the context switches) to see if you notice any glaring differences.
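
    The checks above could be sketched roughly as the two queries below, run on each replica and compared side by side. This is only a minimal sketch: the column selection is an illustrative choice rather than a prescribed diagnostic, and it assumes you have VIEW SERVER STATE permission on both instances.

    ```sql
    -- 1) Instance-level settings: lightweight pooling, affinity masks,
    --    MAXDOP and cost threshold for parallelism.
    --    Compare value_in_use between the primary and the secondary.
    SELECT name, value, value_in_use
    FROM sys.configurations
    WHERE name = N'lightweight pooling'
       OR name LIKE N'affinity%'
       OR name IN (N'max degree of parallelism',
                   N'cost threshold for parallelism')
    ORDER BY name;

    -- 2) Scheduler state. Scheduler ids >= 1048576 are internal
    --    (for example the DAC scheduler), so they are excluded here.
    SELECT scheduler_id,
           cpu_id,
           parent_node_id,
           status,
           is_online,
           current_tasks_count,
           runnable_tasks_count,
           current_workers_count,
           active_workers_count,
           work_queue_count,
           context_switches_count,
           yield_count
    FROM sys.dm_os_schedulers
    WHERE scheduler_id < 1048576
    ORDER BY scheduler_id;
    ```

    Things that might stand out in the second result set are schedulers that are offline on one server only, or a very uneven spread of runnable_tasks_count across schedulers. Note that the counters are cumulative since the instance started, so compare them relative to uptime rather than as raw totals.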

    Sue
