Activity overview
Latest activity by fifthrace
Awesome, I never saw that option.
Thanks a ton,
Robbie / comments
Awesome, I never saw that option.
Thanks a ton,
Robbie
Clear 100K alerts
I have an issue where over 155K alerts were fired off for a "SQL Server error log entry". This was all due to a service account's password expiring on one of our application servers. Don't get me...
There are more than 20 jobs running on this particular server.
The "SQL Server instance unreachable" lasted for 14 seconds and has only occurred one other time for this server. That was on August 7th. I think this is probably a coincidence and not related to this issue at all. The number of job failures seems to be significantly higher than the number of alerts I'm seeing.
None of the alert settings are disabled from what I can tell. There are a couple jobs that I manually changed the recipient of the email that gets sent out.
Thanks for continuing to look into this.
Robbie / comments
There are more than 20 jobs running on this particular server.
The "SQL Server instance unreachable" lasted for 14 seconds and has only occurred one other time for this server. That was on August ...
Here are the results (I hope I got the UTC stuff figured out right, but can't guarantee it. Either way the number of alerts is quite a bit lower.)
The number of unsuccessful job runs in Monitor database since 9/4
390
The number of unsuccessful runs in sysjobhistory since 9/4
447
The number of each type of alerts which has been raised since 9/4 grouped by server. SERVER3 is the one in question here:
SERVER1 Long-running query 23
SERVER1 Schema modified - ISDB01 34
SERVER1 Instance1 Long-running query 18
SERVER1 Instance1 Schema modified - ISDB01 3
SERVER1 Instance2 Long-running query 19
SERVER1 Instance2 Schema modified - ISDB01 13
SERVER2 Long-running query 3 SERVER3 Blocked process 27
SERVER3 Job failed 22
SERVER3 Long-running query 71
SERVER3 SQL Server error log entry 2
SERVER3 SQL Server instance unreachable 1
SERVER3 Instance1 Long-running query 17
SERVER4 Deadlock 2
SERVER4 Long-running query 40
SERVER4 Monitoring error (SQL Server data collection) 1
SERVER5 Deadlock 31
SERVER5 Instance1 Long-running query 3
SERVER6 Blocked process 1
SERVER6 Deadlock 1
SERVER6 Log backup overdue 1
SERVER6 Long-running query 1
SERVER6 Instance1 Deadlock 1
SERVER6 Instance1 Long-running query 15
SERVER7 Backup overdue 46
SERVER7 Deadlock 17
SERVER7 Job failed 8
SERVER7 Long-running query 3 / comments
Here are the results (I hope I got the UTC stuff figured out right, but can't guarantee it. Either way the number of alerts is quite a bit lower.)
The number of unsuccessful job runs in Monitor da...
Basic machine data: 2 weeks
Windows process data: 1 week
Basic SQL Server data: 2 weeks
SQL process data: 1 week
Top 10 queries data: 1 week
Database performance counter data: 2 weeks
Custom metric data: 1 week
Basic alert data: 1 month
SQL Trace Data: 3 days / comments
Basic machine data: 2 weeks
Windows process data: 1 week
Basic SQL Server data: 2 weeks
SQL process data: 1 week
Top 10 queries data: 1 week
Database performance counter data: 2 weeks
Custom metric...
I changed your 2nd query a little bit. Changed the > to >=. If I ran your original code I wasn't getting any records from 8/21.
select count(*)
from msdb.dbo.sysjobhistory
where run_date >= '20120821'
and step_name <> '(Job outcome)' and job_id = 'jobid'
Anyway, with the revised code, I got 38 for both queries. So, they seem to match at least.
I dug into the data a bit. For this particular job there were 37 times that it had a run_status of 3 (stopped) and 1 time run_status of 0 (failed). Both msdb and the SQL Monitor database match on these values.
I don't know how to dig into the alert.Alert table in the SQL Monitor database, but I did look through the UI. I only received 3 alerts for the job in question.
Thanks for digging into this with me.
Robbie / comments
I changed your 2nd query a little bit. Changed the > to >=. If I ran your original code I wasn't getting any records from 8/21.
select count(*)
from msdb.dbo.sysjobhistory
where run_date ...
Priya,
Thanks for the reply. If what you are saying is true, then SQL Monitor is misbehaving for me. As stated in my original post, we are stopping the jobs via another job. This should result in a run_status of 3 for the jobs that were stopped. When I use this query I get 188 rows returned:
select step_name, run_status, run_date, run_time
from sysjobhistory
where run_status <> 1 and run_date = '20120822'
and step_name <> '(Job outcome)'
SQL Monitor only registered 22 job failures yesterday (22nd of August). I can look at some of the jobs that SQL Monitor is showing as failed and see that the run_status was indeed 3....however, there are many jobs with a run_status = 3 that did not get alerts yesterday.
When SQL Monitor does report on a run_status = 3, it tends to report on all the jobs that were just set to that status as a result of the last stop jobs run. Most of the time SQL Monitor only reports on jobs that have a run_status = 0 (not sure on 2, since we don't get a lot of those).
Is there anything in the code that would explain this behavior? Could this be a timing issue of some sort?
Thanks for your help,
Robbie / comments
Priya,
Thanks for the reply. If what you are saying is true, then SQL Monitor is misbehaving for me. As stated in my original post, we are stopping the jobs via another job. This should result i...
Brian,
Are you guys taking a look at this any more?
Thanks,
Robbie / comments
Brian,
Are you guys taking a look at this any more?
Thanks,
Robbie
Thanks for the reply.
I really don't see anything different between the runs that show up in sql monitor and those that don't from a job status perspective. Could you tell me about how the software queries for this data?
I'm just looking at the sysjobhistory table and seeing a 3 for run_status. Is the software looking at something different than this for it's information.
Thanks for all your help,
Robbie / comments
Thanks for the reply.
I really don't see anything different between the runs that show up in sql monitor and those that don't from a job status perspective. Could you tell me about how the softwar...
Intermittent Job Failure alert for stopped jobs
We have a series of SSIS SQL Agent jobs that are running. Every hour another job runs that recycles the SSIS jobs by running the following:
USE msdb ;
GO
EXEC dbo.sp_stop_job N'JOB_NAME' ;
GO
USE...