CPU Spikes Caused by Periodic Scheduled Jobs

  • Comments posted to this topic are about the item CPU Spikes Caused by Periodic Scheduled Jobs

    Igor Micev, My blog: www.igormicev.com

  • Two things cause CPU spikes from scheduled jobs:

    1. Poor design that creates a requirement for a job to run too frequently.

    2. The delusion that a task's function is so critical that it must run that frequently.

    Once implemented, both of these are extremely difficult to convince anyone to correct.

    ‘When people want to believe something bad enough

    facts and logic never prove to be difficult obstacles.’

    David Baldacci: The Whole Truth

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • skeleton567 (12/19/2016)


    The jobs must run every minute because it's a very busy and fast-moving environment. We get updates a couple of times per minute.

    Igor Micev, My blog: www.igormicev.com

  • Great insight, thanks.

  • Igor Micev (12/20/2016)


    The jobs must run every minute because it's a very busy and fast-moving environment. We get updates a couple of times per minute.

    OK, I think you just illustrated Baldacci's point AND my second point above, so I refer you to my first point. Job initiation and termination can be VERY expensive, so you need to attack that part of the design. This is a classic case for thinking OUTSIDE the box. Get over the 'we've always done it this way' thing.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • skeleton567 (12/20/2016)


    Job initiation and termination can be VERY expensive, so you need to attack that part of the design.

    I don't know the particular aspects of your running job, and I have never used it myself, but I would look at Service Broker and message queuing as a possible alternative.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )

  • Nice article, thanks for sharing. We have a number of systems that have some high “peakedness” for CPU. It hasn’t become an issue yet, but I’ll look into this approach you’ve outlined.

    In relation to skeleton567's comments, I would suggest a different approach to your job processing. I consider Agent a "batch" system meant to deal with larger operations. Maintenance plans and such make sense in Agent, as do the larger "functions" of your database application. If you are scheduling a 1 (or 5) minute job to process application data, I think you should consider other approaches.

    I’ve been spending a lot of time lately with Service Broker (SSB), and the concept of asynchronous triggers. The design approach is to use a trigger on a table (in your case, the one being updated sub-minute) that first INSERTs data to another table, then sends a message to a SSB queue. Depending on the data involved (size, complexity, etc) you could put the data in the SSB message itself. Then your “processing” code that currently runs in Agent can be run from within an “activation” procedure.
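    A rough, hypothetical T-SQL sketch of that asynchronous-trigger pattern follows. Every object name (dbo.Orders, OrderMsg, OrderQueue, dbo.ProcessOrderQueue) is a placeholder, and a real design would need error handling and poison-message protection:

    ```sql
    -- Service Broker plumbing (names are placeholders)
    CREATE MESSAGE TYPE OrderMsg VALIDATION = WELL_FORMED_XML;
    CREATE CONTRACT OrderContract (OrderMsg SENT BY INITIATOR);
    CREATE QUEUE OrderQueue;
    CREATE SERVICE OrderService ON QUEUE OrderQueue (OrderContract);
    GO
    -- Trigger: capture the change and hand it off asynchronously
    CREATE TRIGGER trg_Orders_Async ON dbo.Orders AFTER UPDATE
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER,
                @payload XML = (SELECT * FROM inserted FOR XML AUTO, TYPE);

        BEGIN DIALOG CONVERSATION @h
            FROM SERVICE OrderService TO SERVICE 'OrderService'
            ON CONTRACT OrderContract WITH ENCRYPTION = OFF;

        SEND ON CONVERSATION @h MESSAGE TYPE OrderMsg (@payload);
    END;
    GO
    -- Activation procedure: the work that used to live in the Agent job
    CREATE PROCEDURE dbo.ProcessOrderQueue
    AS
    BEGIN
        DECLARE @h UNIQUEIDENTIFIER, @msg XML;
        RECEIVE TOP (1) @h = conversation_handle,
                        @msg = CAST(message_body AS XML)
            FROM OrderQueue;
        -- ... former job logic here, driven by @msg ...
        END CONVERSATION @h;
    END;
    GO
    ALTER QUEUE OrderQueue
        WITH ACTIVATION (STATUS = ON,
                         PROCEDURE_NAME = dbo.ProcessOrderQueue,
                         MAX_QUEUE_READERS = 1,
                         EXECUTE AS OWNER);
    ```

    The trigger finishes quickly because the heavy work happens later, in the activation procedure, on a background queue reader rather than on a fixed schedule.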

    If you're interested in this approach, these two sites, www.davewentzel.com and www.sqlnotes.info, are probably the best I've found so far.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • skeleton567 (12/20/2016)


    Job initiation and termination can be VERY expensive, so you need to attack that part of the design. This is a classic case for thinking OUTSIDE the box. Get over the 'we've always done it this way' thing.

    "Job initiation and termination can be VERY expensive" - even granting that this is true, the article shows how to reduce the spikes caused by that expensive initiation and termination.

    Anyway, that design is imposed by the application developers, so it would be a bit difficult for me to make them change it. All I can do for them at the moment is reduce the spikes that result from their decision to go that way. I will keep this in mind...

    Thanks.

    Igor Micev, My blog: www.igormicev.com

  • DEK46656 (12/20/2016)


    I've been spending a lot of time lately with Service Broker (SSB), and the concept of asynchronous triggers.

    Agreed, the design could be improved by using SSB.

    Igor Micev, My blog: www.igormicev.com

  • Spreading out jobs to avoid spikes is a good idea BUT... the fact remains that even after all you did, you're still taking spikes to 40% across ALL the CPUs. Someone needs to fix that code.
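    For anyone wanting to try the spreading-out approach, the offsets can be applied with msdb's schedule procedures. A hypothetical sketch, assuming each job has its own every-minute schedule (schedule names are placeholders):

    ```sql
    USE msdb;

    -- Job A keeps its schedule; shift Job B and Job C by 20 and 40 seconds
    -- so the three jobs no longer fire at the same instant.
    EXEC dbo.sp_update_schedule
        @name = N'EveryMinute_JobB',
        @active_start_time = 20;   -- HHMMSS format: 00:00:20

    EXEC dbo.sp_update_schedule
        @name = N'EveryMinute_JobC',
        @active_start_time = 40;   -- 00:00:40
    ```

    The start-time offset only staggers the runs; it does not shrink the work each job performs, which is Jeff's point above.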

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (12/20/2016)


    Spreading out jobs to avoid spikes is a good idea BUT... the real fact is that even after all you did, you're still taking spikes to 40% of ALL the CPUs. Someone needs to fix that code.

    Hi Jeff,

    You're right. The difference is that now the spikes are smaller and narrower.

    Igor Micev,My blog: www.igormicev.com

  • Let me suggest a much simpler way of avoiding the CPU spike caused by running multiple jobs at once: combine them into one job, with the contents of each current job converted to a step in the single job. Spacing out the start times certainly helps with the spikes, but there is still a possibility of the jobs overlapping if one takes longer than anticipated. Individual job steps run sequentially, with no possibility of overlap.
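    A minimal sketch of this idea, with hypothetical job, step, and procedure names:

    ```sql
    USE msdb;

    -- One Agent job whose steps run the former jobs' work in sequence,
    -- so the pieces can never overlap. All names are placeholders.
    EXEC dbo.sp_add_job @job_name = N'CombinedMaintenance';

    EXEC dbo.sp_add_jobstep @job_name = N'CombinedMaintenance',
        @step_name = N'Former Job A', @subsystem = N'TSQL',
        @command = N'EXEC dbo.FormerJobA_Proc;',
        @on_success_action = 3;   -- 3 = go to the next step

    EXEC dbo.sp_add_jobstep @job_name = N'CombinedMaintenance',
        @step_name = N'Former Job B', @subsystem = N'TSQL',
        @command = N'EXEC dbo.FormerJobB_Proc;',
        @on_success_action = 1;   -- 1 = quit, reporting success

    EXEC dbo.sp_add_jobserver @job_name = N'CombinedMaintenance';
    ```

    One schedule then drives the whole sequence, and a still-running step simply delays the next one instead of competing with it for CPU.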

  • gfish@teamnorthwoods.com (12/21/2016)


    Simply combine them into one job, with the contents of all of the current job converted to steps in the single job.

    That does make it much more difficult to temporarily suspend just one of the jobs, though.

    --Jeff Moden



  • Igor Micev (12/21/2016)


    You're right. The difference is that now the spikes are smaller and narrower.

    Understood, but they also occur over a longer period of time, possibly causing other problems for longer. It's a tradeoff.

    --Jeff Moden

  • gfish@teamnorthwoods.com (12/21/2016)


    Simply combine them into one job, with the contents of each current job converted to a step in the single job.

    Now that is what I referred to as thinking outside the box. I think this is the best solution proposed so far in this discussion, what we used to call the KISS method: Keep It Simple, Stupid. I don't remember from my active days, but I don't think a running job will be started again while it is still executing. And especially if this is that original task that runs every minute, skipping a minute won't hurt a thing, as long as you don't tell anybody.

    Rick
    Disaster Recovery = Backup ( Backup ( Your Backup ) )
