How can we help you today?

Can't SQL Monitor distinguish between a job failing and a job being stopped?

I have several SQL Agent Jobs that need to run on both replicas of an availability group. The first step of these jobs checks to see if it is primary, and if not, it stops itself. SQL Monitor generates a "Job Failing" alert for these even though the job outcome message clearly states "The job was stopped".
LyleG

Comments

22 comments

  • Russell D
    What version of SQL are you using?

    For Job Failing, we poll the msdb.sysjobhistory table for job outcomes where the run_status field is 0, which corresponds to job failing.

    When a job is stopped, the exit code we see is the same as for cancelled (3); Microsoft don't provide a separate exit code for Stopped. See run_status here: https://docs.microsoft.com/en-us/sql/relational-databases/system-tables/dbo-sysjobhistory-transact-sql.

    For any SQL version >= 2008 the run_status should be 3 if the job is stopped. If the run status is 3, indicating cancelled, we (should) completely ignore the job. Are you seeing run_status 3 jobs being reported?
  • LyleG
    SQL 2016 Standard

    Yes, run_status is 3.

    8p1w9u52vcf5.png
    443yrqzincmf.png
  • Russell D
    Ok, can you email the logs into support please?
  • Russell D
    Just in case anyone else hits this problem, we've determined that there is an issue here and have put this onto the backlog for investigation.
  • MckMurray
    Thanks, yes, same issue here
  • KathrynM
    I can see a fix went into V9.0.8 (SRP-11786 Getting notifications on Job Failing alerts ended). Is this the fix for this issue?
  • Russell D
    No, this fixes an issue where alerts were being incorrectly raised again. The above is something we've actually just discussed in sprint planning, but we haven't really reached a conclusion, so your thoughts on https://sqlmonitor.uservoice.com/forums/91743-suggestions/suggestions/37287307-sql-agent-jobs-with-a-cancelled-status-are-incor would be appreciated.

    Should this be toggle-able? Would you want a separate Job Cancelled alert? Do you just not care at all about cancelled jobs?

  • KathrynM
    I will use your link and add some feedback, but I was wondering if you had an issue number or something I can use to monitor the progress of this fix? We are monitoring a few availability groups so getting these alerts is causing us a lot of headaches. We have to follow up on the alert in case it is genuine, but it is nearly always a cancelled job rather than a failed job. 
  • Russell D
    SRP-11799.
  • KathrynM
    That's perfect. Thanks for all your help.
  • Russell D
    No problem at all. I will just point out that it's an enhancement rather than a fix, which is an important distinction to make. It definitely is something we want to do; we just do not know the best/most useful approach.
  • KathrynM
    If it were a case of SQL Monitor alerting on a stopped job all of the time, or none of the time, then I would agree that this is an enhancement. But because the alert only fires some of the time, I'd be more inclined to say this is a fix, because the functionality that is in place isn't consistent in its behaviour.
  • Russell D
    We might be talking at cross purposes then. What is your experience of this? The SRP number above is definitely an enhancement.
  • KathrynM
    We have a lot of jobs that check if they are running on a replica that is running the database in read-only mode. If the database is read-only, the job is stopped. (This is very similar to the primary replica check used by most people within an availability group.) Some of the jobs are stopping and triggering an alert through SQL Monitor, even though the run_status is 3, but not all of the jobs are doing this. It seems to be very hit and miss, and I can't find any reason why some would fire alerts but others don't.
  • Russell D
    Yeah ok that makes sense, we just don't handle the alert properly in the case of read only replicas.
  • KathrynM
    So looping back, is this an enhancement or a fix?
  • Russell D
    That's a question of semantics really, which I don't have a good answer for, but I will speak with the other developers and come back to you. I'm inclined to agree it's a bug, incidentally.

    I guess what I'd ask in relation to an enhancement is do you want a toggleable option for sending this alert or a separate alert for Job cancelled (that can be disabled at will as per the other alerts).
  • KathrynM
    To keep things clean and simple, a separate alert for cancelled jobs would probably be my preference.
  • Russell D
    The Job Cancelled alert will be released in 9.0.13.

    Hopefully it fits the bill.
  • KathrynM
    Fantastic, thanks Russell. 
  • Russell D
  • KathrynM
    Thanks Russell. I'll get the update in next week :)
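
For anyone following the run_status discussion in this thread, here is a minimal Python sketch of how job outcomes could be classified so that a stopped job (run_status 3) is not reported as a failure. It is not SQL Monitor's implementation. The status codes are the ones Microsoft documents for msdb.dbo.sysjobhistory; the `alert_for` function, the `separate_cancelled_alert` flag, and the sample job names are hypothetical and exist only for illustration.

```python
from enum import IntEnum


class RunStatus(IntEnum):
    """run_status values documented for msdb.dbo.sysjobhistory."""
    FAILED = 0
    SUCCEEDED = 1
    RETRY = 2
    CANCELED = 3      # also what a stopped job reports
    IN_PROGRESS = 4


def alert_for(run_status, separate_cancelled_alert=True):
    """Return a hypothetical alert name for a job outcome, or None.

    separate_cancelled_alert is an assumed switch modelling the two
    options raised in the thread: a dedicated Job Cancelled alert, or
    ignoring cancelled jobs entirely.
    """
    status = RunStatus(run_status)
    if status is RunStatus.FAILED:
        return "Job Failing"
    if status is RunStatus.CANCELED and separate_cancelled_alert:
        return "Job Cancelled"
    return None


# Hypothetical job outcome rows: (job_name, run_status)
history = [
    ("IndexMaintenance", 1),
    ("BackupLogs", 0),
    ("PrimaryOnlyJob", 3),  # stopped itself on a secondary replica
]

alerts = [(job, alert_for(status)) for job, status in history]
for job, alert in alerts:
    print(job, "->", alert)
```

With the flag set to False, a run_status of 3 raises nothing at all, which corresponds to the "ignore cancelled jobs" behaviour described earlier in the thread.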
