DBAmon |
Home |
Index/DBAmon Doc. |
|
Category | Event | Metrics | What DBAmon Monitors | Metric Data Source | AutoFix Action Taken (If Any) Backup
| 1. Database Backup Age
| (AutoFix)
BCKAGE_D (DB Backups - Full or Differential) (Hours)
| BCKAGE_L (DB Backups - TLog) (Hours) BCKAGE_FULL_DAYS (DB Backup - Full) (Days)
This Metric monitors The number of hours since the last successful D (Differential) or F (Full) backup is
measured. This metric will be measured for databases listed in the
DBC "Backup_Check_DBs:" parameter. If the
DBC "Backup_Age:"
number of hours is exceeded, then this event will occur.
| Data Dictionary Table: msdb..backupset
|
If specified, the DBC "Backup_Command:" is run.
| Availability
| 2. Instance Running
| INS
|
The DB is verified to be running and can be connected to.
| "net start" Is run to see which services are running +
Data Dictionary View: master..sysdatabases +
An ODBC connect is attempted
|
| Availability
| 3. Alerts
|
|
Any MSSQL Alert (SQL Studio Management -> SQL Server Logs) with a severity of 17 or higher will cause a DBAmon Critical Event.
the xp_readerrorlog procedure is used to read the log.
| System Proc: master..xp_readerrorlog
|
| Availability
| 4. Agent Running
|
|
A check is done to ensure that the SQL Agent is running.
| "net start"
|
| Availability
| 5. Transaction Log Full
| TLOG_PCT_FULL
|
The
DBC "T_Tlog_Full:"
parameter is used for the WARNING and CRITICAL thresholds. If the SQL Transaction Log:
|
DBCC sqlperf (logspace) + sp_helpdb
|
| Availability
| 6. Disk Drive Full
| DRIVE_SPACE
|
All disks which contain MSSQL DB Files are checked for capacity and freespace. If any are >= the
DBC "T_Disk_Full:" percent full, an event will occur.
| dbamon_mssql_di.exe + sp_spaceused
|
| Availability
| 7. Filegroup Full
| FILEGROUP_FULL
|
If ALL of the DB files in the file group DO NOT have MAX SIZE set to UNLIMITED, then
this check will take place. It will compare the TOTAL SIZE of all DB file to the TOTAL MAX SIZE
of all DB files. If the
DBC "T_FG_Full:" percent full is exceeded (warning
and critical threhsolds) are exceeded, a
DBA850 event will occur.
Again, this check will not occur
if ANY of the DB files are MAX SIZE UNLIMTED. In that case, the DRIVE FULL monitoring will be
used.
| sp_helpdb + sp_helpfilegroup
|
| Availability
| 8. Disk Potential Growth
| DISK_POTENTIAL
|
In the FILEGROUP check above, the POTENTIAL GROWTH for each DB file (the MAX size minus the CURRENT size)
is calculated for ALL DB file on ALL DISKS. If the total potential growth for ALL DB file, by disk, exceeds
the FREE SPACE for that drive, then a
DBA851 event will occur.
| sp_helpdb + sp_helpfilegroup
|
| Availability
| 9. DBMail Running OK
| DBMAIL_STATUS
|
A check is performed to endure that DBMail is working properly.
if the proper status is not detected, then a
DBA853 event will occur.
| msdb..sysmail_allitems
|
| Availability
| 10. Database SUSPECT PAGES
|
|
SQL Server will populate MSDB..SUSPECT_PAGES when a corrupt page is detected
in SQL 2005+. This check looks for rows in this table, which would indicate
DB CORRUPTION. The resulting
DBA856 event will occur in this case.
| msdb..suspect_pages
|
| Availability
| 11. Database Recovery Model Inconsistent With TLOG Backups
|
|
This metric checks for the condition where a SQL Database has been created with
RECOVERY MODEL set to FULL or BULK-LOGGED, but there are not any TLOG backups on
record (during the last 30 days). See
DBA519 for details.
|
SELECT name
FROM master.sys.databases
WHERE recovery_model_desc != 'SIMPLE'
AND is_read_only = 0
AND name NOT IN (
SELECT DISTINCT
database_name
FROM [msdb].[dbo].[backupset]
WHERE type = 'L'
AND backup_start_date > DATEADD(dd, -30, GETDATE()) )
|
| Availability
| 12. STACK DUMP Messages found in MSSQL Log
|
|
While reading the MSSQL Log, DBAmon looks for STACK DUMPs. If any are
found then a Critical
DBA948 event occurs.
|
xp_readerrorlog
|
| Management
| 13. TLog Size vs. DB Size Ratio
| TLOG_RATIO
|
If it is found for any MSSQL DB that the SQL Transaction Log is at least 1gb in size
and at least 5 times the total size of all Database files, a
DBA656 event will occur.
| sp_helpdb
|
| Management
| 14. Zero Non-System Databases
|
|
If a MSSQL instance is found to contain ZERO non-system databases, a
DBA663 event will occur.
| master..sysdatabases
|
| Management
| 15. OFFLINE Database(s) Found
|
|
If at least 1 database is OFFLINE, a
DBA804 event will occur.
| An ODBC connect at attempted
|
| Management
| 16. SQL Jobs Running
|
|
It is sometimes necessary in a SQL Instance to ensure that certain jobs are always running. An example
may be a SQL Distribution agent job, etc. By specifying the new
DBC "MSSQL_DDP_Job_Cat_MBR:"
and
DBC "MSSQL_DDP_Job_Name_MBR:"
parameters, you can specify the JOB CATEGORY or JOB NAME of jobs that should always be running.
if a job that is configured in this way is NOT running, then a
DBA351 event will occur.
| sp_help_job
|
| Management
| 17. SQL Job Failure
|
|
If the most recent exection of an Enabled SQL Job ended with a Failed status, the
DBA354 event will occur. This
check will occur if the
DBC "MSSQL_Job_Fail_Check:" parameter
is set to Y.
| sp_help_job
|
| Management
| 18. Hung SQL Backups
|
|
If a SQL backup is found to be executing for at least 24 hours, an event will occur. This is usually
due to a backup process being HUNG. In this case a
DBA854 event will occur.
| master..sysprocesses
|
| Management
| 19. MSSQL DB Ownership
|
|
This may be a matter of opinion, but it is (or can be) a poor practice for a production MSSQL database
to be owned by an individual. If that person were to be removed from DNS, some functionality would be
lost, as well as the obvious security concern when a person changes jobs, etc. The remedy is to have
all DBs owned by SA or some other generic ID (which you can specify), which is a good practice.
DBAmon checks all DBs to see that they are either owned
by SA or by a userid specified in the
DBC "MSSQL_DB_OK_Owners:" parameter.
If any DBs are found not owned by SA or the userids that you specify in the MSSQL_DB_OK_Owners:
DBC parameter, a
DBA353 event will occur. Corrective DDL is supplied in
the event log.
| sp_helpdb
|
| Management
| 20. MSSQL Job Ownership
|
|
This may be a matter of opinion, but it is (or can be) a poor practice for a production MSSQL job
to be owned by an individual. If that person were to be removed from DNS, some functionality would be
lost, as well as the obvious security concern when a person changes jobs, etc. The remedy is to have
all Jobs owned by SA or some other generic ID (which you can specify), which is a good practice.
DBAmon checks all Jobs to see that they are either owned
by SA or by a userid specified in the
DBC "MSSQL_DB_Job_Owners:" parameter.
If any Jobs are found not owned by SA or the userids that you specify in the MSSQL_Job_OK_Owners:
DBC parameter, a
DBA352 event will occur. Corrective DDL is supplied in
the event log.
| sp_helpjob
|
| Management
| 21. SQL Log String Checking
|
|
It is possible to tell DBAmon to check for a string of your choice in the SQL Log, and then to set
a threshold on the number of occurrences of this string.
The
DBC "MSSQL_String_Check:" parameter.
Look at the desciption of this DBC parameter (URL above) for more information.
| xp_readerrorlog
|
| Management
| 22. MSSQL Max Memory Setting
|
|
DBAmon will create a
DBA860 event if
a MSSQL instance is running with MAX SERVER MEMORY (and the 2000 version of that) set to the default
setting of 2^31-1. We feel that it is a poor practice to let a production MSSQL instance run without
deliberately setting MAX SERVER MEMORY. See the event DBA860 documentation above for details.
|
2005+: master.sys.configurations
| 2000: master..sysconfigures
| Management
| 23. MSSQL Replication Lag
| PERF_SQL_REPL_MAX_LAG_HRS
|
This performance metric measures the maximum number of hours of LAG for any subscription. This metric
is present on any MSSQL Instance with a replication distribution database.
|
[distribution*]..[MSdistribution_history]
| [distribution*]..[MSdistribution_agents]
| Management
| 24. MSSQL DB/TLOG Files on C: Drive
|
|
If any DB or TLOG files are found on the C: drive a
DBA807 event will occur.
|
sys.master_files
|
| Management
| 25. MSSQL DB/TLOG Files With Percent Growth
|
|
If any DB or TLOG files are found to have AutoGrow to set a percentge, a
DBA806 event will occur.
|
sys.master_files
|
| Management
| 26. MSSQL Service Account Expiration
|
|
DBAmon captures the names of Service Accounts used to run MSSQL. It is possible, after you the user determines when
the passwords for these accounts will expire, to monitor the number of days until Service Account Password Expiration.
|
(Active Directory)
|
| Management
| 27. MSSQL Non-Default Replication Distribution Agent Profile In Use
|
|
DBAmon examines the currenly in-use Agent Profile for every enabled Replication Distribution Agents.
If the current profile is not Default Agent Profile, then a
DBA862 event will occur. This is a tool that we use
to never forget to switch back to the default profile after using the non-default profile
to solve a problem.
|
SELECT jo.name ,
replace(pr.profile_name , ' ', '_')
FROM [distribution].[dbo].[MSdistribution_agents] ag WITH ( NOLOCK ) ,
msdb..msagent_profiles pr WITH ( NOLOCK ) ,
msdb..sysjobs jo WITH (NOLOCK)
WHERE ag.profile_id = pr.profile_id
AND ag.job_id = jo.job_id
AND jo.enabled = 1
AND pr.profile_name != 'default agent profile'
|
| Management
| 28. SSRS Report Failure
|
|
If, since the last DBAmon iteration, an SSRS report has failed, a
DBA933 event will occur.
|
ReportServer..ExecutionLog2
|
| Management
| 29. *SQL* Windows Service Not Running
|
|
All Windows services whose name contain SQL are checked to see if they are currently
running (AUTOSTART processes only).
|
(WMI Query of Windows services)
|
| Performance
| 30. *ANY* MSSQL or Server Performance Counter
|
|
Performance monitoring for SQL server and the servers that it runs on is
simple because of DBAmon's ability to monitor any Windows Performance Counter .
This Windows feature allows the OS or any underlying program to expose performance data
as a Performance Counter. Do a Google search for more information. With DBAmon
it is possible to take reading from any Performance Counter and to set a threshold.
A DBA852 event will occur if a
counter sample exceeds the threshold that you have set.
| *ANY* MSSQL or Windows Performance Counter.
|
| Performance
| 31. Known BAD DB Options
|
|
If it is found that any DBs have options set which are known to cause performance
problems (now just AUTOCLOSE and AUTOSHRINK), then a new
DBA855 event will occur.
| DATABASEPROPERTYEX
|
| Performance
| 32. "I/O Delay" Messages in SQL Log
|
|
Any messages with the text "I/O Delay" in the SQL Log indicate that the is I/O slowness. A
DBA665 Performance event will occur in this case.
| xp_readerrorlog
|
| Performance
| 33. "A significant part of SQL Memory has been paged out" Messages in SQL Log
|
|
Any messages with the text "A significant part of SQL Memory has been paged out" in the SQL Log indicate that the is I/O slowness. A
DBA670 Performance event will occur in this case.
| xp_readerrorlog
|
| Performance
| 34. TEMPDB Database File Count
| PERF_TEMPDB_FILE_COUNT
|
To prevent PAGE ALLOCATION contention, it is a good practice to have 1 TEMPDB Datbase File per usable CPU (CORE), up to
a maximum of 8. See the documentation for DBAmon (P)erformance event:
DBA857 for more information.
| sp_helpdb
|
| Performance
| 35. SQL Server Instance "Signal Wait" Percent
| PERF_SQL_SIGNALWAIT_PCT
|
SQL DMO's are queried to determine the current SIGNAL WAIT PERCENT. This basically is the percent of ALL wait time
which is spent WAITING FOR CPU. A high value here (over the 25% threshold) over a long period of time indicates a
need for additional CPUs. See DBAmon event:
DBA858 for more information.
| master.sys.dm_os_wait_stats
|
| Performance
| 36. SQL Server VLF (Virtual Log File) Count Per Database
|
PERF_SQL_VLF_COUNT
|
Having too many (you can set the threshold) VLFs can cause performance problems with SQL Updates/Inserts/Deletes and
with DB recovery at startup. A good article of VLFs can be found at:
Kimberly Tripp's BLOG Entry .
DBAmon simply counts the number VLF in each SQL database by running DBCC LOGINFO. The could of VLFs for each
DB is then compared to the
DBC "T_MSSQL_VLF_Count:" parameter. If the
number of VLFs is greater then or equal to the T_MSSQL_VLF_Count: value, then a
DBA859 event will occur. The text for this event
contains URLs to good internet articles that will show you how to reduce the number of VLFs.
| DBCC LOGINFO
|
| Performance
| 37. SQL Server Processes Blocked by Other Process
| PERF_SQL_ | BLOCKEDPROCSOTHERPROC
The value that SQL exposes to the performance counter for BLOCKED PROCESSES is misleading (and NOT THERE AT ALL for
SQL 2000). When an application uses MSSQL Parallel Processing (you know; MAXDOP), it is common for a process to be
waiting for its own parallel subprocesses. This metric better give a picture of SPIDs that are waiting for other
processes.
| select count(*) from master..sysprocesses (nolock) where blocked != 0 and blocked != spid
|
| Performance
| 38. SQL Server "Lock Pages In Memory" Setting
| PERF_SQL_LPIM_LOCKED_KB
|
It seems common that when we do a SQL install that we grant the server role to enable LOCK PAGES IN MEMORY (LPIM), but
for a variety of reasons it is not really enabled. A NON-ZERO value for this metric indicates that LPIM is enabled.
| select locked_page_allocations_kb from master.sys.dm_os_process_memory
|
| Performance
| 39. SQL Server "Runnable Requests"
| PERF_SQL_RUNNABLE_REQS
|
This metric shows the number of RUNNABLE requests. A non-zero value here indicates a possible CPU bottleneck.
| select count(*) from master.sys.dm_exec_requests where status = 'runnable'
|
| Performance
| 40. SQL Server "Top Waits"
| PERF_SQL_Top_Wait_Rank1-5
|
This metric displays the top 5 WAITs (from sys.dm_os_wait_stats) in descending WAIT_TIME_S.
|
sys.dm_os_wait_stats
|
| Performance
| 41. SQL Server Memory Vs. SQL DB Size
| SQL_SQLMem_Vs_DBSize_Pct
|
This metric displays amount of SQL Memory (the SQL_TotMemKB value) divided by the Size of the SQL Instance (SQL_DBSizeKB) expressed as a percentage.
We found a few cases of a SQL instances that were over 400GB, with SQL memory at only 1.7GB (32 bit no AWE). This will now catch this conditon.
|
$perf_readings[$thishost]{"SQL_TotMemKB"} / $perf_readings[$thishost]{"SQL_DBSizeKB"}
|
| Performance
| 42. SQL Known GOOD DB Options Set to FALSE
|
|
If you have known GOOD DB Options such as AutoCreateStatistics and AutoUpdateStatistics set to FALSE, a DBAmon 861 User Event will occur.
|
DATBASEPROPERTYEX
|
| Performance
| 43. SQL Job Duration
| SQL_Job_Max_Dur_Hrs
|
DBAmon runs SQL (to the right) to see how long the all currently
executing SQL jobs have been running. Any jobs that have been running for
at least 12 hours are reported as a DBAmon event. The current maximum Job Duration (in hours)
is saved as performance metric SQL_Job_Max_Dur_Hours.
|
SELECT sj.name AS Running_Job_Name ,
cat.NAME AS Job_Category ,
DATEDIFF(MI, aj.start_execution_date, GETDATE()) / 60.0 AS Duration_Hrs,
DATEDIFF(DD, aj.start_execution_date, GETDATE()) AS Duration_Days
FROM msdb..sysjobactivity aj WITH ( NOLOCK )
JOIN msdb..sysjobs sj WITH ( NOLOCK ) ON sj.job_id = aj.job_id
JOIN msdb..syscategories cat WITH ( NOLOCK ) ON sj.category_id = cat.category_id
WHERE aj.stop_execution_date IS NULL -- job hasn't stopped running
AND aj.start_execution_date IS NOT NULL -- job is currently running
AND cat.name NOT LIKE 'REPL%'
AND DATEDIFF(HH, aj.start_execution_date, GETDATE()) >= $joblr_hours
AND NOT EXISTS ( -- make sure this is the most recent run
SELECT 1
FROM msdb..sysjobactivity new
WHERE new.job_id = aj.job_id
AND new.start_execution_date > aj.start_execution_date )
ORDER BY DURATION_HRS
|
| Performance
| 44. SQL Pending I/O Requests
| SQL_Pending_IO_Requests
|
DBAmon samples the number of MSSQL Pending I/O Requests. This would be handy in diagnosing
long term I/O performance or server load trends.
|
SELECT count(*) from sys.dm_io_pending_io_requests
|
| Performance
| 45. Long Running SQL Transactions
| SQL_MaxTran_Duration_Hrs, SQL_MaxTran_TLOG_GB
|
DBAmon checks for long running SQL transaction. It measure the duration any active transaction
in HOURS and GB of TLOG SPACE (Reserved Space).
|
SELECT
st.session_id AS SPID ,
DATEDIFF(mi, at.database_transaction_begin_time, GETDATE()) AS Trans_Run_Mins ,
DATEDIFF(hh, at.database_transaction_begin_time, GETDATE()) AS Trans_Run_Hrs ,
at.database_transaction_log_bytes_reserved / ( 1024 * 1024 ) AS TLog_MB_Rsvd ,
at.database_transaction_log_bytes_used / ( 1024 * 1024 ) AS TLog_MB_Used ,
sp.cmd AS Sysprocesses_Cmd ,
st.transaction_id AS TransID ,
DB_NAME(at.database_id) AS DBName ,
at.database_transaction_begin_time DB_Trans_Begin_Time ,
CASE at.database_transaction_type
WHEN 1 THEN '1:Read/Write'
WHEN 2 THEN '2:Read Only'
WHEN 3 THEN '3:System'
END AS TransType ,
CASE at.database_transaction_state
WHEN 1 THEN '1:Not Initialized'
WHEN 3 THEN '3:Trans No Log'
WHEN 4 THEN '4:Trans Run w/Log Data'
WHEN 5 THEN '5:Trans Prepared'
WHEN 10 THEN '6:Commited'
WHEN 11 THEN '7:Rolled Back'
WHEN 12 THEN '8:Commited/Log Gend'
END AS TransState ,
SUBSTRING(hostname, 1, 30) AS Hostname ,
cpu ,
physical_io ,
waittime AS WtTim_MS ,
sp.blocked,
SUBSTRING(sp.program_name, 1, 24) AS Program_Name,
sp.nt_username as sysprocesses_nt_username,
sp.loginame as sysprocesses_loginame
FROM [sys].[dm_tran_session_transactions] st
INNER JOIN sys.dm_tran_database_transactions at ON st.transaction_id = at.transaction_id
INNER JOIN sysprocesses sp ON sp.spid = st.session_id
WHERE at.database_transaction_log_bytes_reserved > 0
AND at.database_transaction_begin_time IS NOT NULL
|
| Performance
| 46. MSSQL NTLM (Non-Kerberos) Remote Connections
| SQL_Sessions_NTLM_Remote
|
It is preferable to use Kerberos for remote MSSQL connection versus NTLM. This Performance Metric
is populated with the number of Remote MSSQL NTLM-authenticated connections. If this is non-zero,
then you are not using Kerberos for remote Windows-authenticated MSSQL connections.
|
select
count(*) Ses_Count
from
master.sys.dm_exec_connections dc,
master..sysprocesses sp
where
dc.session_id = sp.spid
and dc.auth_scheme = 'NTLM'
and client_net_address != '<local machine>'
and client_net_address != '127.0.0.1'
group by
sp.hostname, dc.auth_scheme
|
| Performance
| 47. MSSQL Long-Running Commands during Prime Shift/Weekday
|
|
If DBAmon finds any CHECKDBs or FULL or DIFFERENTIAL backups which have been running for at least
5 minutes are running during Prime-Shift (0700-1700 on a weekday),
then the new Sev=U DBA809 event will occur.
This is of interest to us because we do sometimes see performance issues when CHECKDBs or DB backups are running
during our prime shift.
| SQL_Prime_LRC_Count
SELECT
upper(cmd) AS Command,
login_time AS CMD_Start_Time,
DATEDIFF(hh,login_time,GETDATE()) AS Process_Age_Hours,
DATEDIFF(mi,login_time,GETDATE()) AS Process_Age_Mins,
spid as SPID
FROM
master..sysprocesses
WHERE
spid > 50
and DATEDIFF(mi,login_time,GETDATE()) >= $lrc_min_t
and
(UPPER(cmd) LIKE '%BACKUP%' or
UPPER(cmd) LIKE '%DBCC%CHECK%')
and UPPER(cmd) not LIKE '%BACKUP%LOG%'
ORDER BY 1
|
| Performance
| 48. Performance Metric Trend Analysis
|
|
For a few select DBAmon Metrics, trend analysis is performed. This means that the most
recent 3 days AVG readings are compared to the last 180 days AVG readings. If the current is >= 200%
of the historical value,
then the new Sev=U DBA950 DBAmon Trend event will occur.
Initially, we will perform this analysis on DB_SPC_RESERVED_MB to compare recent instance (all DBs) growth
versus historical values.
| SQL_Prime_LRC_Count
(Query of DBAmon repository)
|
| Performance
| 49. SQL Server Lingering Connections
| SQL_Sessions_Week_Old
|
DBAmon monitors for user sessions that have been connected for at least 7 days.
|
SELECT COUNT(*) AS WeekOldSesCount
FROM master..sysprocesses
WHERE spid > 50
AND DB_NAME(dbid) != 'msdb'
AND loginame NOT LIKE '%ntservice%'
AND program_name NOT LIKE '%jobstep%'
AND DATEDIFF(dd, login_time, GETDATE()) >= 7
|
| Performance
| 50. SQL Server Error Log "Untrusted Domain" Messages
| SQL_Untrusted_Domain_Events
|
DBAmon checks the SQL Error Log for "Untrusted Domain" messages.
We are in the process of migrating all Windows-based SQL authentication to Kerberos,
so we care about these errors which are unique to NTLM.
|
SELECT COUNT(*) AS WeekOldSesCount
FROM master..sysprocesses
WHERE spid > 50
AND DB_NAME(dbid) != 'msdb'
AND loginame NOT LIKE '%ntservice%'
AND program_name NOT LIKE '%jobstep%'
AND DATEDIFF(dd, login_time, GETDATE()) >= 7
|
| Performance
| 51. MSSQL TempDB Contention - Waiting Sessions
| SQL_TempDB_Contention_{pagetype)
|
DBAmon looks for sessions waiting for Latch Contention on a TEMPDB Allocation Page.
If any such sessions are found, then this metric will be non-zero with a count of the
sessions waiting. There are many internet articles on the corrective action if you
find non-zero values here.
|
SELECT
CASE
WHEN CAST(RIGHT(a.resource_description,
LEN(a.resource_description)
- CHARINDEX(':', a.resource_description, 3)) AS INT)
- 1 % 8088 = 0 THEN 'PFS_Page'
WHEN CAST(RIGHT(a.resource_description,
LEN(a.resource_description)
- CHARINDEX(':', a.resource_description, 3)) AS INT)
- 2 % 511232 = 0 THEN 'GAM_Page'
WHEN CAST(RIGHT(a.resource_description,
LEN(a.resource_description)
- CHARINDEX(':', a.resource_description, 3)) AS INT)
- 3 % 511232 = 0 THEN 'SGAM_Page'
ELSE 'Other_Page'
END as resourcetype,
a.session_id,
a.wait_type,
a.wait_duration_ms,
a.blocking_session_id,
a.resource_description
--c.text AS SQLText
FROM sys.dm_os_waiting_tasks a
INNER JOIN sys.sysprocesses b
ON
a.session_id = b.spid
OUTER APPLY sys.dm_exec_sql_text(b.sql_handle) c
WHERE a.wait_type LIKE 'PAGE%LATCH_%'
AND a.resource_description LIKE '2:%';
|
| Performance
| 52. MSSQL TempDB I/O Imbalance
| PERF_SQL_TEMPDB_IOBALANCE_PCT, PERF_SQL_TEMPDB_IOBALANCE_Max, PERF_SQL_TEMPDB_IOBALANCE_Min
|
If you size your TEMPDB files unequally, then you could experience most of the TEMPDB I/O going to one file.
This DBAmon metric measures the degree to which this is happening on your instance. The value is the number of I/Os (writes)
for the busiest (highest number of writes) file versus the value for the least busy file. That number is expressed
as a percentage. So, if you have 2 TEMPDB files - one with 60 writes and one with 40 writes, you would have a value of 150 (%)
for this metric. If you have high values for this, then make the TEMPDB files the same size (the SIZE, not MAX SIZE)
which should help to balance the I/O among all of your TEMPDB DB files.
|
SELECT ( 1.00 * max.max_value / min.min_value ) * 100 AS Max_vs_Min_as_Percent ,
MAX.max_value AS Max_IO_Count ,
MIN.min_value AS Min_IO_Count
FROM ( SELECT MAX(num_of_writes) AS max_value
FROM sys.dm_io_virtual_file_stats(NULL, NULL) dm ,
sys.master_files mf
WHERE mf.database_id = dm.database_id
AND mf.FILE_ID = dm.file_id
AND dm.database_id = 2
AND mf.type_desc = 'ROWS'
) max ,
( SELECT MIN(num_of_writes) AS min_value
FROM sys.dm_io_virtual_file_stats(NULL, NULL) dm ,
sys.master_files mf
WHERE mf.database_id = dm.database_id
AND mf.FILE_ID = dm.file_id
AND dm.database_id = 2
AND mf.type_desc = 'ROWS'
) min
|
| Performance
| 53. Database with Multiple TLOG Files
|
|
There are times when it does make sense to have multiple TLOG files. For example, if your SQL 3rd party backup
tool is down and your DB is not in SIMPLE recovery model, your TLOG will grow until your backup tool is working.
We have heard from MS (during a support call) that the LOG READER (transactional replication) must read all
TLOG files, which doesn't make sense if you have a 2nd TLOG file which is never used.
Therefore, it is a best practice of ours to only keep the 2nd TLOG file around as long as it is actually needed.
If DBAmon finds a database with more than 1 TLOG file, then an event
DBA946 will occur.
|
SELECT DB_NAME(database_id) AS DB_Name,
COUNT(*) Num_TLog_Files
FROM sys.master_files
WHERE type_Desc = 'LOG'
GROUP BY database_id
HAVING COUNT(*) > 1
|
| Performance
| 54. MSSQL Buffer Cache Performance Metrics
|
PERF_SQL_BUFFERCACHE_MOD_PAGES
PERF_SQL_BUFFERCACHE_PAGES
PERF_SQL_BUFFERCACHE_ROWS
PERF_SQL_BUFFERCACHE_SPACE_MB
|
There are some additional MSSQL Performance Metrics that pertain to the Buffer Cache.
It is very interesting how the total size of the objects in the buffer varies, depending on
current activity.
|
SELECT
COUNT(*) AS Pages ,
COUNT(*) / 128 AS Space_MB ,
SUM(ROW_COUNT) AS Rows ,
SUM(CASE is_modified
WHEN 1 THEN 1
END) AS Mod_Pages
FROM sys.dm_os_buffer_descriptors WITH ( NOLOCK )
|
| Performance
| 55. MSSQL Plan Cache Performance Metrics
|
PERF_SQL_PLANCACHE_OBJCNT_ADHOC
PERF_SQL_PLANCACHE_OBJCNT_ALL
PERF_SQL_PLANCACHE_OBJCNT_PREPARED
PERF_SQL_PLANCACHE_OBJCNT_PROC
PERF_SQL_PLANCACHE_TOTSIZEMB_ADHOC
PERF_SQL_PLANCACHE_TOTSIZEMB_ALL
PERF_SQL_PLANCACHE_TOTSIZEMB_PREPARED
PERF_SQL_PLANCACHE_TOTSIZEMB_PROC
PERF_SQL_PLANCACHE_TOTUSECNT_ADHOC
PERF_SQL_PLANCACHE_TOTUSECNT_ALL
PERF_SQL_PLANCACHE_TOTUSECNT_PREPARED
PERF_SQL_PLANCACHE_TOTUSECNT_PROC
|
These metrics pertain to the MSSQL Plan Cache.
It is also interesting how the total size of the objects in the buffer varies, depending on
current activity.
|
SELECT
COUNT(*) ObjCnt_All,
CONVERT(BIGINT, ( SUM(CAST (size_in_bytes AS BIGINT)) )) / ( 1024* 1024 ) AS TotSizeMB_All,
SUM(CAST (usecounts AS BIGINT)) as TotUseCnt_All
FROM sys.dm_exec_cached_plans
| ... And ... SELECT objtype as AAA_ObjType, COUNT(*) ObjCnt , CONVERT(BIGINT, ( SUM(CAST (size_in_bytes AS BIGINT)) )) / ( 1024* 1024 ) AS TotSizeMB, SUM(CAST (usecounts AS BIGINT)) as TotUseCnt FROM sys.dm_exec_cached_plans where cacheobjtype = 'Compiled Plan' GROUP BY cacheobjtype , objtype ORDER BY 1 , 2
| Performance
| 56. SSRS Performance and Availability Metrics
|
PERF_SSRS_ACTIVE_CONNECTIONS
PERF_SSRS_BYTES_RCVD_PS
PERF_SSRS_BYTES_SENT_PS
PERF_SSRS_ERRORS_PS
PERF_SSRS_ERRORS_TOT
PERF_SSRS_MEM_PRESSURE_STATE
PERF_SSRS_MEM_SHRINK_BYTES
PERF_SSRS_REQS_AUTHFAIL
PERF_SSRS_REQS_EXECUTING
PERF_SSRS_REQS_PS
PERF_SSRS_REQS_REJECTED_503
PERF_SSRS_REQS_TOTAL
PERF_SSRS_TASKS_QUEUED
|
Windows performance counters pertaining to SSRS are the source for these DBAmon Metrics.
See DBAmon SSRS Monitoring for details.
|
(Windows Performance Counters)
|
| Performance
| 57. VMware PVSCSI Controller Count
|
|
DBAmon monitors the number of VMware PVSCSI controllers on busy VMware servers running
MSSQL to
ensure that you have VMware configured for peak performance. See the documentation
for DBAmon
Event DBA960 for details.
|
(Win32_SCSIController WMI Query)
|
| Performance
| 58. TempDB Space Usage Breakdown
|
SQL_Perf_TempDB_SPC_*
|
DBAmon monitors how MSSQL TempDB Space is being used. This way, you can track why
you are having TempDB Space issues - by knowing how the space is being used.
|
SELECT
FileCount = count(*),
AlllocatedSizeMB = FLOOR(SUM(allocated_extent_page_count)/128.0),
UnalllocatedSizeMB = FLOOR(SUM(unallocated_extent_page_count)/128.0),
TotalSizeMB = FLOOR(SUM(total_page_count)/128.0),
VersionStoreMB = FLOOR(SUM(version_store_reserved_page_count)/128.0),
InternalObjectsMB = FLOOR(SUM(internal_object_reserved_page_count)/128.0),
UserObjectsMB = FLOOR(SUM(user_object_reserved_page_count)/128.0),
MixedExtentMB = FLOOR(SUM(mixed_extent_page_count)/128.0)
FROM sys.dm_db_file_space_usage
|
| Security
| 59. SQL Failed Login Attempts
|
|
The SQL Error Log is read for occurrences of 18456 (Failed SQL Login) errors.
|
(SQL Error Log)
|
| Security
| 60. SA Sessions
| PERF_SQL_Sessions_SA
|
An MSSQL best practice is to minimize or eliminate the use of the SA SQL Login. This metric shows
the number of SQL User Sessions where the SQL Login is SA. This can help to show you the extent to
which this is happening for an MSSQL instance.
|
select count(*) from sysprocesses where loginame = 'SA' and spid > 50
|
DBAmon also have an
Event DBA961 that occurs when the number of
concurrent MSSQL SA logins over the last 24 hours exceed the T_SA_LOGINS_MAX: DBC parameter.
| RJ
| 61 - 63. RJ Metrics
|
|
DBAmon monitors Sesame Relational Junction. See DBAmon RJ Monitoring page
for additional details.
the number of SQL User Sessions where the SQL Login is SA. This can help to show you the extent to
which this is happening for an MSSQL instance.
|
|
| Management
| 64. MSSQL CheckDB Age
| PERF_SQL_CHECKDB_MAX_AGE_DAYS
|
DBAmon monitors how many days have elapsed since the last successful CheckDB for each MSSQL database.
| Declare @dbinfo table (
ParentObject varchar(255),
[Object] varchar(255),
[Field] varchar(255),
[Value] varchar(255) )
Insert into @dbinfo
execute('dbcc dbinfo(''DBAMON'') with tableresults')
Select DISTINCT
Field,
Value,
cast( datediff(mi, value, getdate()) / 60.0 as decimal(12,2) ) as LastGoodHrsAgo
From @dbinfo
Where Field = 'dbi_dbccLastKnownGood'
|
| Management
| 65. MSSQL Logins With Missing Default Database
| PERF_SQL_Logins_Missing_Default_DB
|
DBAmon monitors for SQL Logins (SQL and AD based logins) which have their Default DB set to a non-existent Database.
| SELECT Name, Default_Database_Name FROM sys.server_principals
WHERE default_database_name NOT IN (SELECT name FROM sys.databases)
ORDER BY 1;
|
| Management
| 66. MSSQL Transactional Replication Undistributed Commands
| PERF_SQL_REPL_MAX_UNDIST_CMDS
|
DBAmon monitors for Undistributed Commands on Transactional Subscriptions. Note that execution of this is only for DBC
files where T_Repl_Undistributed_Cmds: is specified.
| SELECT
ag.name ,
SUM(st.UndelivCmdsInDistDB) AS Sum_UndelivCmdsInDistDB
FROM distribution.dbo.MSdistribution_status st WITH ( NOLOCK ) ,
distribution.dbo.MSdistribution_agents ag WITH ( NOLOCK )
WHERE ag.id = st.agent_id
AND ag.subscriber_db != 'virtual'
GROUP BY ag.name
ORDER BY 1;
|
| Management
| 67. MSSQL "Missing Indexes"
| PERF_SQL_MISSING_INDEXES_HIGHIMPACT
|
MSSQL keeps track of "missing indexes" (do a Google search). DBAmon queries SQL Server for high impact missing indexes - and displays them in
the Instance drill-down WWW page.
| SELECT
FLOOR(migs.avg_total_user_cost * ( migs.avg_user_impact / 100.0 )
* ( migs.user_seeks + migs.user_scans )) AS [1_ImprovementMeasure] ,
DB_NAME(mid.database_id) AS DB_Name ,
mid.statement ,
migs.unique_compiles ,
migs.user_seeks ,
-- migs.user_scans, (always zero)
migs.last_user_seek ,
migs.avg_user_impact ,
'CREATE INDEX [NCX_Missing_DBAmon_' + CONVERT(CHAR(8), GETDATE(), 112)
+ '_' + CONVERT (VARCHAR, mig.index_group_handle) + '_'
+ CONVERT (VARCHAR, mid.index_handle) + '_'
+ LEFT(PARSENAME(mid.STATEMENT, 1), 32) + ']' + ' ON ' + mid.STATEMENT
+ ' (' + ISNULL(mid.equality_columns, '')
+ CASE WHEN mid.equality_columns IS NOT NULL
AND mid.inequality_columns IS NOT NULL THEN ','
ELSE ''
END + ISNULL(mid.inequality_columns, '') + ')' + ISNULL(' INCLUDE ('
+ mid.included_columns
+ ')', '') AS CreateNCXStatement
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * ( migs.avg_user_impact / 100.0 )
* ( migs.user_seeks + migs.user_scans ) > N
ORDER BY 1 DESC;
|
| Management
| 68. MSSQL "Database Compatibility Mismatch"
| PERF_SQL_DBS_COMPAT_DIFFCOUNT
|
DBAmon checks for databases whose Compatibility setting is less than that of the MSSQL instance.
You may have done this intentionally for a good reason, but most often in my experience this seems to occur as an oversight.
Setting this incorrectly can result in a performance hit due to optimizer enhancements in the MSSQL
engine that you won't be able to take advantage of.
| SELECT * from Master.sys.databases;
|
| Management
| 69. MSSQL Snapshot Backups Running in AlwaysOn Cluster
| PERF_SQL_BACKUPS_VSS_CNT_L24H
|
DBAmon checks to see if any VSS (msdb.backupset is_snapshot=1) backups have occurred on an AlwaysOn cluster. In our
env, these cause FCM crashes and failovers.
| SELECT * from MSDB..backupset;
|
| Security
| 70. SA Login Enabled?
| PERF_SQL_SA_LOGIN_DISABLED
|
DBAmon checks the enabled/disabled state of the MSSQL SA login.
| Select is_disabled, name from sys.server_principals where sid = 0x01;
|
| Management
| 71. MSSQL Schedulers Offline
| PERF_SQL_SCHEDULER_STATUS_ | VISIBLE_[ON|OFF]LINE
DBAmon checks for Offline Schedulers.
See DBA867 for details.
| SELECT * from SYS..DM_OS_Schedulers;
|
| |
---|