Get alerts if the job runs 1 hour longer than average time

  • Experts, I am not good with writing SQL codes but I found this script (shared below) which checks for average job time. I am wondering if anyone has a full script which also sends out alert if the job runs 1 hour longer than the average time. Any help is highly appreciated.

    select
    d.jobname
    ,d.servername
    , avgDurationMinutes=avg(d.durationMinutes)
    , daydate=convert(char(10),startdatetime,101)
    from (
    select
    jobname=j.name
    ,servername=server
    ,startdatetime=
    CONVERT (DATETIME, RTRIM(run_date))
    + (
    run_time * 9
    + run_time % 10000 * 6
    + run_time % 100 * 10
    ) / 216e4
    , durationMinutes=
    (CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),1,3) AS INT) * 60 * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),4,2) AS INT) * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),6,2) AS INT)
    )/60.

    ,enddatetime =
    dateadd
    (ss,
    (CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),1,3) AS INT) * 60 * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),4,2) AS INT) * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),6,2) AS INT)
    )
    ,
    (CONVERT (DATETIME, RTRIM(run_date))
    + (
    run_time * 9
    + run_time % 10000 * 6
    + run_time % 100 * 10
    ) / 216e4 )
    )
    , retries_attempted
    from msdb.dbo.sysjobs j (nolock)
    join msdb.dbo.sysjobhistory h on
    h.job_id = j.job_id
    and h.step_id = 0 -- look only at the job outcome step for the total job runtime

    ) d
    where
    datepart(dw,startdatetime)=7 -- Set your day of week here if desired. 7=Saturday
    group by
    d.jobname
    ,servername
    ,convert(char(10),startdatetime,101)
    order by
    d.jobname
    ,servername
    ,cast(convert(char(10),startdatetime,101)as datetime) desc;
  • This is a multi-step process and depends on your requirements, but one approach would be to use what you have but turn the AVG into a windowing function and store the results in a table. A query similar to this:

    SELECT
    d.jobname
    ,d.servername
    , avgDurationMinutes=avg(d.durationMinutes) OVER (PARTITION BY jobname, servername)
    , [d].[durationMinutes]
    , durationAlert=CASE WHEN avg(d.durationMinutes) OVER (PARTITION BY jobname, servername)+60 <= durationMinutes THEN 'BAD' ELSE 'GOOD' END
    , daydate=convert(char(10),startdatetime,101)
    from (
    select
    jobname=j.name
    ,servername=server
    ,startdatetime=
    CONVERT (DATETIME, RTRIM(run_date))
    + (
    run_time * 9
    + run_time % 10000 * 6
    + run_time % 100 * 10
    ) / 216e4
    , durationMinutes=
    (CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),1,3) AS INT) * 60 * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),4,2) AS INT) * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),6,2) AS INT)
    )/60.

    ,enddatetime =
    dateadd
    (ss,
    (CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),1,3) AS INT) * 60 * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),4,2) AS INT) * 60
    + CAST(SUBSTRING((right('0000000' + convert(varchar(7), run_duration), 7)),6,2) AS INT)
    )
    ,
    (CONVERT (DATETIME, RTRIM(run_date))
    + (
    run_time * 9
    + run_time % 10000 * 6
    + run_time % 100 * 10
    ) / 216e4 )
    )
    , retries_attempted
    from msdb.dbo.sysjobs j (nolock)
    join msdb.dbo.sysjobhistory h on
    h.job_id = j.job_id
    and h.step_id = 0 -- look only at the job outcome step for the total job runtime

    ) d
    --where
    -- datepart(dw,startdatetime)=7 -- Set your day of week here if desired. 7=Saturday
    order by
    d.jobname
    ,servername
    ,cast(convert(char(10),startdatetime,101)as datetime) desc;

    I pulled out that filter as it seemed unneeded to me, but it may be needed based on some of your requirements that I am not aware of.

    Now, with the above, you likely only want to send an alert once, so if you write the results of the above to a table, it can be reviewed and investigated going forward to look for a pattern (like if job ABC runs longer than average by 1 hour every Tuesday).  That is why I'd log it to a table.  Now for sending the email, you have a second job step that pulls from wherever you stored the above results looking for the value "BAD" in the durationAlert column and compares it to dayDate that you wish to check.  If you put this as a job step at the end of every job to update the above table and then pull information about itself for todays date, it would be pretty easy to send out alerts as soon as the job runs longer than expected.

    Now, a better (in my opinion) approach to this is to buy a SQL Monitoring tool as those can come with this type of monitoring.  Does baselining and then tells you when things start to deviate from the baseline and can send out alerts immediately.  Writing it yourself, you are going to need to bake it into each job unless you only want to be notified daily of abnormally long running jobs.  If you have an alerting solution such as RedGate SQL Monitor (the solution I have in place), you could adapt the above so that it is just looking at today and if it has a count of more than 1 "BAD" result, it could send out an alert.  You would need to configure it for each SQL instance, but it would be able to raise a flag when some job went long, or if you build enough alerts, you could have it monitor each job instead of each instance.  It would just result in a lot more custom alerts being configured.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I dunt know if it's just me, but the notifications seem to be pretty slow. I'm seeing some 6-10 minutes after a post has been made or a modmail sent.

     

    PrepaidGiftBalance

    • This reply was modified 2 years, 6 months ago by  Gretchen558.
  • Gretchen558 wrote:

    I dunt know if it's just me, but the notifications seem to be pretty slow. I'm seeing some 6-10 minutes after a post has been made or a modmail sent.

    How does this relate to long-running SQL jobs?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • This was removed by the editor as SPAM

  • When you have to manage hundreds of scheduled jobs on busy production servers, inevitably situations arise when jobs take too long to complete, causing substantial waits or affecting the performance of other processes.

     

     

    securitasepay

  • securitasepay wrote:

    When you have to manage hundreds of scheduled jobs on busy production servers, inevitably situations arise when jobs take too long to complete, causing substantial waits or affecting the performance of other processes.

    securitasepay

    True, but what is your point? Are you suggesting that such alerting is pointless?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply