CPU and Memory issues

  • I am a DBA  trying to get to the bottom of some instances when we have hit high CPU for a sustained period impacting performance.

    On our spotlight monitoring tool we use, when we hit 100 CPU for a sustained time I see that the procedure cache has dropped significantly from 5GB down to MBs, this usually goes down over a period of a hour or two beforehand although sometimes has been quicker.  The Proc cache of course being where being where reusable execution plans are stored.

    I beleive that if plans are not in the cache to be reused they have to be recompiled which takes a great degree of CPU to do.  I also run sp_blitzcache to see what plans are left and often it comes up saying that some plans have issues with compilation timeouts, which generally mean a bad plan has been generated.

    I also notice on the memory drilldown in Spotlight that it shows big spike for memory areas.  It is always on the Sort, Hash, Index counter which is memory used for each of these operations in SQL Server and I’m assuming if there is not enough memory it takes it from the proc cache. Whenever the proc cache drops I see significant spikes here, sometimes the spikes up to 30GB used in an action of somekind.

    The developers say that the queries that are running at these times always for the most part seem to be standard queries such as.

    (@param0 int,@param1 bit,@param10 int,@param11 int,@param12 int,@param13 int,@param14 int,@param15 int,@param16 int,@param17 int,@param18 int,@param19 int,@param2 bit,@param20 int,@param21 int,@param22 int,@param23 int,@param24 int,@param25 int,@param26 int,@param27 int,@param28 int,@param29 int,@param3 bigint,@param30 int,@param31 int,@param32 int,@param33 int,@param34 int,@param35 int,@param36 int,@param37 int,@param38 int,@param39 int,@param4 int,@param40 int,@param41 int,@param42 int,@param43 int,@param44 int,@param45 int,@param5 int,@param6 int,@param7 int,@param8 int,@param9 int)WITH

    data AS (

    SELECT

    list_requests.createdOn,

    attribute_30.attribute_30_textValue,

    attribute_31.attribute_31_textValue,

    attribute_26.attribute_26_textValue,

    attribute_27.attribute_27_textValue,

    attribute_73.attribute_73_textValue,

    dictionary_51.description AS dictionary_51_description,

    attribute_29.attribute_29_textValue,

    list_requests.description,

    list_requests.contact_name,

    list_requests.contact_location_fullName,

    list_requests.requestNumber,

    dictionary_54.description AS dictionary_54_description,

    dictionary_55.description AS dictionary_55_description,

    list_requests.assignee_name,

    list_requests.requestId,

    list_requests.requestType_acronym,

    list_requests.requestType_baseRequestTypeId,

    list_requests.isMajorIncident,

    ROW_NUMBER() OVER(ORDER BY dictionary_51.description ASC,list_requests.createdOn ASC) AS rowNumber,

    @param0 AS queryUserId

    FROM

    list_requests

    LEFT OUTER JOIN (SELECT requestAttributeValue.requestId, requestAttributeValue.requestAttributeTypeId, requestAttributeValue.textValue AS attribute_30_textValue FROM requestAttributeValue) attribute_30 ON attribute_30.requestId = list_requests.requestId AND attribute_30.requestAttributeTypeId = 30

    LEFT OUTER JOIN (SELECT requestAttributeValue.requestId, requestAttributeValue.requestAttributeTypeId, requestAttributeValue.textValue AS attribute_31_textValue FROM requestAttributeValue) attribute_31 ON attribute_31.requestId = list_requests.requestId AND attribute_31.requestAttributeTypeId = 31

    LEFT OUTER JOIN (SELECT requestAttributeValue.requestId, requestAttributeValue.requestAttributeTypeId, requestAttributeValue.textValue AS attribute_26_textValue FROM requestAttributeValue) attribute_26 ON attribute_26.requestId = list_requests.requestId AND attribute_26.requestAttributeTypeId = 26

    LEFT OUTER JOIN (SELECT requestAttributeValue.requestId, requestAttributeValue.requestAttributeTypeId, requestAttributeValue.textValue AS attribute_27_textValue FROM requestAttributeValue) attribute_27 ON attribute_27.requestId = list_requests.requestId AND attribute_27.requestAttributeTypeId = 27

    LEFT OUTER JOIN (SELECT requestAttributeValue.requestId, requestAttributeValue.requestAttributeTypeId, requestAttributeValue.textValue AS attribute_73_textValue FROM requestAttributeValue) attribute_73 ON attribute_73.requestId = list_requests.requestId AND attribute_73.requestAttributeTypeId = 73

    LEFT OUTER JOIN (SELECT requestAttributeValue.requestId, requestAttributeValue.requestAttributeTypeId, requestAttributeValue.textValue AS attribute_29_textValue FROM requestAttributeValue) attribute_29 ON attribute_29.requestId = list_requests.requestId AND attribute_29.requestAttributeTypeId = 29

    LEFT OUTER JOIN requestClassification as requestClassification_51 ON (requestClassification_51.dictionaryId = 51 AND requestClassification_51.requestId = list_requests.requestId)

    LEFT OUTER JOIN classification AS dictionary_51 ON dictionary_51.classificationId = requestClassification_51.classificationId

    LEFT OUTER JOIN requestClassification as requestClassification_54 ON (requestClassification_54.dictionaryId = 54 AND requestClassification_54.requestId = list_requests.requestId)

    LEFT OUTER JOIN classification AS dictionary_54 ON dictionary_54.classificationId = requestClassification_54.classificationI

     

    It seems to me that the workload of Marval hits this bad plan(s) for a what is usual query enough times to create memory pressure leading to a high number of recompiles, the server hits a tipping point and it struggles to create new plans needed and cope with the workload and the CPU/performance takes the hit.

    As the product is third party I have little scope to tune queries or Indexes and I have suggested adding memory to the server but the business is unwilling.  If it is a bad query, how do I show proof to the third party for them to look at it.

    Is there anythign else I could do?  The DB compatability level is set to 2008 but the sql server version its running on is 2016, would changing the compat level of the DB upwards help create better plans (if that is the problem).  Would setting LPIM help?

    Can this be a correct assumption?  I am at a loss how else to solve this problem.

    • This topic was modified 2 months ago by  garryha.
    Attachments:
    You must be logged in to view attached files.
  • The query doesn't have any kind of filtering through a WHERE clause? It's likely that running the query forces SQL SErver to move all the data in question through memory, which will absolutely cause memory pressure clearing out the plan cache. Yeah, you're then going to see lots and lots of recompiles. I'd say you already have enough to show the vendor that the query is problematic. Now, whether they listen or not is a different matter entirely.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • No there is no WHERE clause in the statement.

    I have tried suggesting this to the vendor before but they dismissed it and said that they query could not be optimised anymore, but now I have noticed this and sent them more information hopefully they will take another look.

    My company is looking to me to fix the issue, but without the vendors assistance theres little that I can influence, hence my suggestion of changing the compat level of the DB to 2016 level to hopefully force the query to create a better enough plan to not continually kill my cache.

  • Yeah, compat level might help, but it could also burn things down. They're probably trying to avoid dealing the change to the cardinality estimation engine that occurred in 2014 by leaving it at the old compatibility. Switch it over, but be ready to switch it back, right away, if things look like they're going badly.

    But, if you can't filter the query in any way, yeah, it's going to use a ton of memory and that will absolutely affect the plans in cache. No way around it really as long as you can't change the query. The only option is really more hardware.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Grant Fritchey wrote:

    But, if you can't filter the query in any way, yeah, it's going to use a ton of memory and that will absolutely affect the plans in cache. No way around it really as long as you can't change the query. The only option is really more hardware.

    Be aware that more hardware won't eliminate the problem - it may delay the problem long enough to allow the system to get past the issue without causing too many issues, but the buffer cache will still be affected and you will still see an effect.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I think the current query can be tuned up and that it should definitely help performance.  Will it help it enough to meet your needs?  You'll just have to test it and see.

    Note that including the schema names on the table names is not just cosmetic, it can help cut down on recompiles for potential security reasons.

    SELECT

    list_requests.createdOn,

    attributes.attribute_30_textValue,
    attributes.attribute_31_textValue,
    attributes.attribute_26_textValue,
    attributes.attribute_27_textValue,
    attributes.attribute_73_textValue,
    dictionaries.dictionary_51_description,
    attributes.attribute_29_textValue,

    list_requests.description,
    list_requests.contact_name,
    list_requests.contact_location_fullName,
    list_requests.requestNumber,

    dictionaries.dictionary_54_description,
    dictionaries.dictionary_55_description,

    list_requests.assignee_name,
    list_requests.requestId,
    list_requests.requestType_acronym,
    list_requests.requestType_baseRequestTypeId,
    list_requests.isMajorIncident,

    ROW_NUMBER() OVER(ORDER BY dictionary_51.description ASC,list_requests.createdOn ASC) AS rowNumber,

    @param0 AS queryUserId

    FROM

    dbo.list_requests lr

    OUTER APPLY (
    SELECT
    MAX(CASE WHEN rAV.requestAttributeTypeId = 26 THEN rAV.textValue END) AS attribute_26_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 27 THEN rAV.textValue END) AS attribute_27_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 29 THEN rAV.textValue END) AS attribute_29_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 30 THEN rAV.textValue END) AS attribute_30_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 31 THEN rAV.textValue END) AS attribute_31_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 73 THEN rAV.textValue END) AS attribute_73_textValue
    FROM dbo.requestAttributeValue rAV
    WHERE rAV.requestId = list_requests.requestId AND
    rAV.requestAttributeTypeId IN ( 26, 27, 29, 30, 31, 73 )
    ) AS attributes
    /* My best guess is that an OUTER APPLY will perform better here, so I coded that first,
    but a LEFT OUTER JOIN could perform better instead. There's no way to be sure which
    will actually be better without looking at the query plan and/or testing each method. *//*LEFT OUTER JOIN (
    SELECT
    rAV.RequestId,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 26 THEN rAV.textValue END) AS attribute_26_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 27 THEN rAV.textValue END) AS attribute_27_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 29 THEN rAV.textValue END) AS attribute_29_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 30 THEN rAV.textValue END) AS attribute_30_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 31 THEN rAV.textValue END) AS attribute_31_textValue,
    MAX(CASE WHEN rAV.requestAttributeTypeId = 73 THEN rAV.textValue END) AS attribute_73_textValue
    FROM dbo.requestAttributeValue rAV
    WHERE rAV.requestAttributeTypeId IN ( 26, 27, 29, 30, 31, 73 )
    GROUP BY rAV.RequestId
    ) AS attributes ON attributes.requestId = list_requests.requestId
    */
    OUTER APPLY (
    SELECT
    MAX(CASE WHEN rC.dictionaryId = 51 THEN c.description END) AS dictionary_51_description,
    MAX(CASE WHEN rC.dictionaryId = 54 THEN c.description END) AS dictionary_54_description,
    MAX(CASE WHEN rC.dictionaryId = 55 THEN c.description END) AS dictionary_55_description
    FROM (
    SELECT rC.dictionaryId, rC.classificationId
    FROM dbo.requestClassification rC
    WHERE rC.requestId = list_requests.requestId AND
    rC.dictionaryId IN ( 51, 54, 55 )
    ) AS rC
    LEFT OUTER JOIN dbo.classification c ON c.classificationId = rC.classificationId
    ) AS dictionaries

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Hi,

    what is the output from the execution plan. We got a high cpu usage, because there was an explicit type conversion from nvarchar(4000) to char(29).

    We were able to fix this, and there was no more cpu pressure. First time we needed about 8h, and after we changed the type we only needed 30min.

    Kind regards,

    Andreas

  • If you've got QueryStore enabled, you could confirm if you are in fact experiencing regressed plans. From there you can force SQL Server to use a plan that worked better than others, but depending on what parameters are being passed, that might cause other issues as well.

    Regardless, forcing a plan likely means that query needs optimizing as others have said. I know arguing with vendors can be a pain, but I would document very carefully everything you've tried and get your leadership onboard with the idea that this isn't something you can fix by throwing more resources at it.

     

    Good luck!

  • It looks like the query got truncated.... the common-table-expression data does get completed (there is no closing bracket and no other T-SQL statement using it), the rest of the query may or may not be important.

    What version of MSM are you running? For v14.18, the release notes state that SQL Server versions 2012 / 2012 R2 / 2014 / 2016 are supported.   Have you been told to run the database in 2008 compatibility mode?

     

     

    • This reply was modified 1 month, 3 weeks ago by  tripleAxe.
    • This reply was modified 1 month, 3 weeks ago by  tripleAxe.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply