Comments
The reason we use the 1204 trace flag is that it is supported on SQL Server 2000 as well as 2005/2008. Whilst there were probably very good technical reasons for this at the time, we are definitely planning to "upgrade" to the 1222 flag.
I can imagine the trace flag we use being chosen automatically depending on the version of SQL Server, but I have added your suggestion of a user option to the enhancement request.
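For reference, if you wanted to experiment with the two flags yourself, both can be switched on globally with standard T-SQL (this is plain SQL Server functionality rather than anything specific to our product):
-- Legacy deadlock reporting, supported from SQL Server 2000 onwards
DBCC TRACEON (1204, -1);
-- Richer XML-style deadlock output, available from SQL Server 2005
DBCC TRACEON (1222, -1);
The -1 argument applies the flag globally rather than just to the current session.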
Thanks again
Chris
Hi
This is almost certainly a limitation of the trace flag we are currently using (1204) and the amount of information available in the SQL Server error logs.
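If you want to confirm which deadlock trace flags are currently active on an instance, this standard command works on SQL Server 2000 as well as 2005/2008:
-- Lists all globally enabled trace flags
DBCC TRACESTATUS (-1);
A row showing TraceFlag 1204 (or 1222) with Status 1 means that flag is enabled globally.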
We already have an enhancement request raised internally (ref: SRP-3080) to review our current implementation. This would probably be done for the v3.0 release, for which we have no timescales at the moment, sorry.
Hope this helps
Chris
Hi
I am currently investigating this issue and will hopefully be able to update you later today with some details.
Regards
Chris
Hi
Sorry for the delay in replying. I think that 300 active connections could very well account for the large amount of data. This kind of high activity is something we've tried to simulate on our servers for testing purposes, but there will always be surprises in store with live systems.
Is it possible that you have a trace switched on for this server? That would certainly increase the amount of data, more so with busy servers.
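One quick way to check, assuming the monitored instance is SQL Server 2005 or later, is the sys.traces catalog view (on SQL Server 2000 you would use ::fn_trace_getinfo(default) instead):
-- Lists any traces defined on the server, including the default trace
SELECT id, path, start_time, event_count
FROM sys.traces;
Bear in mind the default trace will normally appear here; it's additional user-defined traces on a busy server that would be of interest.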
We are aware that the way we store SQL Processes needs to improve and our developers already have some good ideas. In the meantime v2.3 will hopefully contain the extra purge option specifically for SQL Processes.
Regards
Chris
'2011-02-09 20:26:11.063' looks old for a 7-day purge period, but it would depend on the timezone of your base monitor machine. I would expect the oldest row to be no older than (base monitor time - 7 days - 1 hour).
It would be interesting to know if this particular row is purged if you ran that query again. We purge every hour or when the base monitor service is restarted. The purge is done gradually in chunks rather than attempting to remove all the data at once. So it's possible that the purge is still ongoing.
Having said all that, 11.5 million rows for a 7-day purge period is probably higher than we would expect. Are your monitored servers very busy, or do they contain an unusually large number of objects (databases, tables, jobs, SPIDs, etc.)?
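If it helps, here is a rough sketch for re-running that check in one go; I'm assuming the table in question is the SQL Process samples table mentioned elsewhere in this thread, and that it stores CollectionDate as ticks like the job history table does:
-- Total row count plus the oldest sample currently in the table (assumed schema)
SELECT COUNT(*) AS [TotalRows] ,
RedGateMonitor.Utils.TicksToDateTime(MIN(CollectionDate)) AS [OldestRow]
FROM [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples]
If OldestRow keeps moving forward each time you run it, the purge is still working through the backlog.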
Regards
Chris
Hi
Are you sure that the table you're having issues with is Cluster_SqlServer_Services_StableSamples and not Cluster_SqlServer_SqlProcess_UnstableSamples?
The Cluster_SqlServer_SqlProcess_UnstableSamples table can get very large and it is one of our top priorities to add a distinct purge policy for tables that hold SQL Process data in version 2.3.
I'm still surprised that your table contains 11.5 million rows after being left overnight. Is this row count reducing gradually or still increasing?
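To answer that question you could log the count at intervals; a minimal sketch, assuming your data repository database is named RedGateMonitor as in the queries earlier in this thread:
-- Run every few minutes and compare the results
SELECT GETDATE() AS [CheckedAt] ,
COUNT(*) AS [SampleRows]
FROM [RedGateMonitor].[data].[Cluster_SqlServer_SqlProcess_UnstableSamples]
A steadily falling count would suggest the hourly purge is keeping up; a rising one would mean collection is outpacing it.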
Regards
Chris
I will double-check that the _RunStatus field is the one we use to trigger the Job Failed alerts.
In the meantime I've created some SQL to check the alerts tables:
SELECT alert.AlertId ,
alert.TargetObject ,
RedGateMonitor.Utils.TicksToDateTime(severity.Date) AS [SeverityDate]
FROM [RedGateMonitor].[alert].[Alert] alert
JOIN [RedGateMonitor].[alert].[Alert_Severity] severity ON alert.AlertId = severity.AlertId
JOIN [RedGateMonitor].[alert].[Alert_Type] type ON alert.AlertType = type.AlertType
WHERE type.Name = 'Job failed'
ORDER BY alert.AlertId DESC
The SeverityDate column should be the time that the alert was raised, as there is usually only one possible severity for Job Failed alerts. It would be interesting to know if there are any records for the minutes after a job failure was reported on the server.
Regards
Chris
rmrussell1970 wrote:
I actually emailed you Friday and this morning on this.
That's strange - I don't appear to have received any emails. I've double checked and the email address linked above is definitely correct.
rmrussell1970 wrote:
So what would cause me to not get a warning about a failed job?
I'm not 100% sure. The job failed alert is relatively uncomplicated and triggers on seeing a job failure in the job history. We do collect this data and any failed jobs should be present in the SQL Monitor data repository.
This SQL should show any failures:
SELECT RedGateMonitor.Utils.TicksToDateTime(CollectionDate) AS [CollectionDateTime]
,[_Message]
,[_RunStatus]
FROM [RedGateMonitor].[data].[Cluster_SqlServer_Agent_Job_History_Instances]
WHERE _RunStatus = 0
It would be worth checking if a row exists at the specific point in time that your job failed. I could probably cobble together some more complicated SQL that displays the job name, etc., if that helps.
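As a starting point, here is the same query narrowed to a time window; the dates below are placeholders that you would replace with the period around your job failure:
SELECT RedGateMonitor.Utils.TicksToDateTime(CollectionDate) AS [CollectionDateTime]
,[_Message]
,[_RunStatus]
FROM [RedGateMonitor].[data].[Cluster_SqlServer_Agent_Job_History_Instances]
WHERE _RunStatus = 0
AND RedGateMonitor.Utils.TicksToDateTime(CollectionDate) BETWEEN '2011-02-11 09:00' AND '2011-02-11 10:00'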
Regards
Chris
Hi
We only use WMI to collect the following information:
• Cluster configuration and status
• Total amount of physical memory
• OS version and service pack
• Windows process user
• Host name and DNS name of the machine
We mostly use perfmon, and the recent issues appear to be related to this. It is possible to see what the monitoring error is by going to the Monitored Servers page and clicking the Show Log link for the relevant server. However, this will only show the last 5 minutes of logging.
Regards
Chris
Hi
It is possible to see the error that causes monitoring to stop by clicking the Show Log link for the relevant server on the Monitored Servers configuration page. However, this only displays the last five or so minutes' worth of logging, so it wouldn't help here.
In most cases the base deployment log files for the time period in question would be the best place to look. These are located at "C:\ProgramData\Red Gate\Logs\SQL Monitor 2" or "C:\Documents and Settings\All Users\Application Data\Red Gate\Logs\SQL Monitor 2" depending on your operating system. If you send them to chris.spencer@red-gate.com I would gladly look through them and see if anything unusual is getting logged.
Regards
Chris