DBAmon |
Home |
Index/DBAmon Doc. |
|
Event Message Format - Example: DBA210W
Full Error Fail Consistency Check failed PANIC
dynamically allocated new shared memory segment
ALTER TABLE OWNER.TABLE STORAGE ( NEXT ?M );... so that the next extent size is less than the largest freespace area.
-Or
SQL Server has encountered n occurrence(s) of I/O requests taking longer than 15 seconds to complete on file... that were found during the most recent batch of SQL Log messages that were scanned.in database
A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 32992, committed (KB): 64696, memory utilization: 50%.This means that Lock Pages in Memory is not enabled.
Alter login [sa] disable;
Some good articles on TEMPDB:
Corrective Action:
Add TEMPDB files to get to the optimum DB file count mentioned in this message. Make
sure that they are of a uniform size and that the size is preallocated.
A high number of SQL Virtual Log Files (VLFs) can cause DB update problems and can elongate DB recovery time. Having a high number of VLFs is caused
by having a GROWTH setting on the TLOG with a low growth increment.
Here are some good articles:
A good script to consolidate VLFs can be found in the ADVENTURESINSQL article in the URLs above.
The steps:
This event is telling you that you are using the out-of-the-box default instance setting for "Max Server Memory (MB)".
This means that MSSQL is free to grab as much memory as it needs from the OS for the MSSQL Buffer Pool. Although it
may be a matter of opinion, it is a good practice to deliberately set the MSSQL Max Server Memory setting, rather than
let it default. As with any DBAmon event, you can suppress it with the Suppress_Events:
DBC parameter if you disagree.
Corrective Action:
I suggest Googling "sql server max server memory". You will see MANY hits.
Our pratice on a server that is ONLY running MSSQL is to leave 500MB-750MB for Windows, and up to 1GB for other MSSQL
memory (The max server memory parameter only specifies the size of the buffer pool), and then give what's left to max server memory.
Of course if your server is running other applications in addition to MSSQL, your mileage will vary (you will have to reserve some
memory for your app).
Here is a good article from Paul Randall on this:
http://www.sqlskills.com/blogs/paul/multiple-log-files-and-why-theyre-bad/ .
Corrective Action:
Here are some steps which should help you to remove additional TLOG files (secondary TLOG files):
DBA858 - MSSQL Instance SIGNAL_WAIT_PCT=$a (Threshold=$b%),Signal-Wait-${this_signal_wait_pct}%
Description:
This performance event indicates that the current SQL SIGNAL WAIT % value exceeds the threshold (defaults to 25%).
A good internet article explaining the concept can be found:
Here
(Pinal Dave's SQLAuthority WWW site).
Corrective Action:
A sustained high SQL SIGNAL WAIT % indicates a need for additional CPUs.
DBA859 - $zvlf_event_cnt MSSQL Database(s) VLF Count >= DBC Threshold: $t_mssql_vlf_count[$thishost]","VLF-Count"
Description:
Corrective Action:
DBA860 - "MSSQL Instance Max Memory ($F3) is set to INFINITE ($sql_config_msm)","MaxMemINFINITE"
Description:
DBA861 - Database(s) Found With GOOD-For-Performance Options Set to OFF
Description:
DBAmon has found that there are one or more MSSQL databases which do not have OPTIONS specified which are good for MSSQL Performance.
For example, the AUTO CREATE STATISTICS option is known to be good for performance. So, if this OPTION is not specified, then this
event will occur.
Corrective Action:
Set the options mentioned in this event to ON or TRUE. If you have a good reason to have these turned off, then of course you can
suppress this event by specifying it in the "Suppress_Events" parameter of the DBC file for this instance.
DBA862 - Replication Distribution Agent(s) Found Using Non-Default Agent Profile
Description:
DBAmon has found that there are one or more MSSQL Replication Distribution Agents that are not using the DEFAULT agent profile.
This is monitored because I have found times when we need (for example) skip a certain type of error. The danger is in forgetting
to switch back to the DEFAULT profile after the problem has been resolved. Not using the DEFAULT profile, IMHO, is a poor practice
as you can skip errors that you really want to know about.
Corrective Action:
If applicable to your environment, switch back to the "Default Agent Profile" profile. As with any DBAmon event, if this is normal
for your environment, you may
suppress this event by specifying it in the "Suppress_Events" parameter of the DBC file for this instance.
DBA863 - Too Many Undistributed Replication Commands - Critical Threshold of ($t_repl_undist_cmds_c[$thishost]) Exceeded
Description:
DBAmon has found that one or more MSSQL Replication Distribution Agents have an excessive number of undistibuted SQL commands (the number
exceeds the T_Repl_Undistributed_Cmds: DBC parameter.
Corrective Action:
Since one or more Distribution Agents are not distributing commands, read the long text of this report - and restart the
appropriate Distribution Agent(s). It could be that distribution agents are failing or not running at all.
DBA867 - Found $perf_val MSSQL Schedulers which are VISIBLE OFFLINE
Description:
In MSSQL it is possible for a Windows server to have more CPUs that the engine can use - for a variety of reasons. This event
occurs when at least 1 row in SYS.DM_OS_Schedulers which indicates a VISIBLE OFFLINE scheduler.
Corrective Action:
You may search Google for yourself - this is a good hit:
https://www.mssqltips.com/sqlservertip/4801/sql-server-does-not-use-all-assigned-cpus-on-vm/ .
DBA901 - "Oracle/NT Not Active - Status=$dbstatus"
Description:
Oracle is down.
Corrective Action:
Start Oracle.
DBA903 - "Oracle/NT Listener Not Active - Status=$lsnr_status"
Description:
DBAmon attempted to run 'tnsping $ORACLE_SID'. It failed.
Corrective Action:
Determine the cause of the problem and fix it!
DBA904 - "Found $login_missingdefdb_cnt SQL Login(s) Where Default DB Does Not Exist"
Description:
DBAmon found one or more SQL Logins whose Default Datbase does not exist. This will prevent this login from connecting to SQL, and this
problem is challenging to diagnose. This can happen when a login is created using a default that does exist, and that DB is dropped.
Corrective Action:
In the body of this event you will see syntacticaly correct DDL to set the default DBs of Logins with this problem to Default DB=Master.
DBA905 - "Connect Logic Error - Probe dbamon_orant.pl Did Not Finish"
Description:
A DBAmon probe module failed.
Corrective Action:
Examine accompanying messages. Contact
DBAmon Support.
DBA909 - "DBAMON.TIMESTAMP Has > $timestamp_rowlimit rows ($timestamp_rows[$thishost])"
Description:
The number of rows in this table exceeds the threshold. The purge process must not be working.
Corrective Action:
Contact BB.
DBA910 - "db_block_buffer Read Hit Ratio of $bufhitratio[$thishost] < threshold of $t_readhit[$thishost]"
Description:
The Oracle db_block_buffer hit ratio (specifed in message) is below the threshold specified for this instance by
the T_Read_Hit: DBC parameter (or the default). This DB is performing poorly.
Corrective Action:
Increase the db_block_buffers init.ora parameter.
DBA911 - "db_block_buffer Read Hit Ratio of $bufhitratio[$thishost] is invalid (< 0 or > 100)"
Description:
While examining the Oracle db_block_buffer hit ratio, the value was found to be
invalid. Check the accompanying text to see if some error occured while querying
the Oracle dictionary.
Corrective Action:
Solve the problem which was causing the query to return invalid data.
DBA912 - RMANHUNG: Found rman process (pid=$) that has been running for $ days (threshold=2 days) !!!
Description:
A Unix process containing the string rman was found to have been running for
more than the THRESHOLD-DAYS number of days long. It is probably a dead process
which may be consuming resources even though it is not doing anything useful.
Corrective Action:
Kill the hung process at the UX level. If RMAN was invoked from a backup script,
also make sure that you kill the script.
DBA913 - $num_otrace[$thishost] ORACLE_HOME/otrace/admin/*.dat Files Found - OTRACE Is ON Which Causes Performance Problems!!!
Description:
OTRACE is on for this DB because .dat files were found in ORACLE_HOME/otrace/admin.
OTRACE can be bad for performance, so it should be turned off. You turn it off by
rm'ing the .dat files in ORACLE_HOME/otrace/admin and restarting Oracle.
Corrective Action:
Turn off OTRACE. You turn it off by rm'ing the .dat files in ORACLE_HOME/otrace/admin and restarting Oracle.
DBA914 - Instance SQL_TRACE=TRUE - This Causes Performance Problems!!!
Description:
SQL_TRACE set to true at the instance level will cause serious performance problems.
Corrective Action:
Turn off SQL_TRACE. You turn it off by running ALTER SYSTEM SET SQL_TRACE=FALSE And/Or removing this setting
from init.ora.
DBA915 - $dfltsys_cnt DB Users Found With DEFAULT TABLESPACE Set To SYSTEM !!!
Description:
DBAmon found DB users whose default tablespace is SYSTEM. This is very bad for performance.
Corrective Action:
Alter the user so that their default tablespace is not SYSTEM.
DBA916 - $tempsys_cnt DB Users Found With TEMPORARY TABLESPACE Set To SYSTEM !!!
Description:
DBAmon found DB users whose temporary tablespace is SYSTEM. This is very bad for performance.
Corrective Action:
Alter the user so that their temporary tablespace is not SYSTEM.
DBA917 - $tempperm_cnt DB Users Found Whose TEMPORARY TABLESPACE Is a PERMANENT Tablespace !!!
Description:
DBAmon found DB users whose temporary tablespace is permanent tablespace. This is very bad for performance.
Corrective Action:
Alter the user so that their temporary tablespace is a TEMP tablespace, or alter their temp
tablespace to be a type=TEMP tablespace.
DBA918 - *** oraUp() DBA_REGISTRY Shows that component (Oracle9i Catalog Views ) is at version 9.2.0.2.0 which is less than DB engine version 9.2.0.3.0(64) - rg_status=VALID ***
Description:
Starting in 9i, the Oracle dictionary catalog contains components that are registered
products. This event can also occur for DB internal components like Java. The event
means that the version of the product mentioned is lower than the version of the
DB engine. What probably happened is that the DB was upgraded within the same version
(9i for example) and catpatch was not run.
Corrective Action:
Run:
Next, verify that the versions of the internal components match the DB engine version.
Run orastat -rg.
DBA919 - MTS is being used for this Non-RAC/OPS instance - Bad for performance - mts_queue=$mts_queue[$thishost]
Description:
This instance is not using RAC or OPS, but MTS is being used. This can cause major
performance problems. So, it would be best to only use DEDICATED SERVER connections.
Corrective Action:
An easy way to effectively disable MTS is to set USE_DEDICATED_SERVER=ON in sqlnet.ora.
DBA920 - TEMP Tablespace is a PERMANENT Tablespace (It Should Be TEMPORARY) !!!
Description:
DBAmon found that your tablespace named TEMP is a permanent tablespace.
This can be very bad for performance. Any user which has this tablespace
specified for its TEMPORARY TABLESPACE will perform poorly when doing disk
sorts.
Corrective Action:
Alter the TEMP tablespace so that it is a type=TEMP tablespace. SQL:
ALTER TABLESPACE TEMP TEMPORARY;
DBA921 - Library Cache Hit Ratio is $libhitratio[$thishost]% (Should Be >= 90%) - Increase shared_pool_size !!!
Description:
The Oracle library cache hit ratio was found to be < 90%.
Corrective Action:
Increase the init.ora shared_pool_size parameter.
DBA922 - Dictionary Cache Hit Ratio is $dicthitratio[$thishost]% (Should Be >= 90%) - Increase shared_pool_size !!!
Description:
The Oracle dictionary cache hit ratio was found to be < 90%.
Corrective Action:
Increase the init.ora shared_pool_size parameter.
DBA923 - Rollback Segment Header Waits Are Too High (Gets/Waits Ratio > 1.00%) - Add Rollback Segments
Description:
The total number of Rollback Segments Header Waits exceed 1% of the total number of Rollback
Segment Header Gets.
Corrective Action:
Add more rollback segments.
DBA924 - There are fewer than 10 Free DB Cache Buffers (free_buffers=$free_buffers[$thishost]) - It would be beneficial to increase DB Cache Size
Description:
The total number of FREE DB Cache buffers is less than 10. So, this DB would probably benefit
from you specifying a large DB buffer cache.
Corrective Action:
If you have enough unused memory, increase the size of the DB Buffer Cache.
DBA925 - UNDO_MANAGEMENT Is not set to AUTO - It should be - undo_mgmt=$undo_mgmt[$thishost]
Description:
In Oracle 9i and higher, SMU (System Managed UNDO) is a GOOD THING. It should always be on.
For this 9i+ instance, it is not turned on.
Corrective Action:
Turn on SMU. This will require DB downtime.
DBA926 - FORCE LOGGING Should be turned on - force_logging=$force_logging[$thishost]
Description:
There is a very nice feature in Oracle 9 and higher called FORCE LOGGING. If this DB option is ON, then
NOLOGGING operations are all automatically disallowed.
Corrective Action:
Run: ALTER DATABASE FORCE LOGGING
DBA927 - ? Dictionary Objects have been ANALYZED - This is bad for performance
Description:
Analyzing the Oracle Dictionary can cause some very serious and hard to diagnose performance problems.
One symptom is high "recursive CPU" in a statspack report. If any SYS objects other than DUAL or
PLAN_TABLE have been analyzed, this event will occur.
Corrective Action:
Remove the analyze data for the SYS objects.
Run: execute dbms_stats.delete_schema_stats('SYS');
DBA928 - DB Cache is only 1 granule in size - It was probably underspecified - granule_size=$granule_size[$thishost] db_cache _size=$db_cache_size[$thishost]?
Description:
It was determined that this 9i or higher instance has a DB Buffer Cache that is only 1 granule
in size. This can occur when Oracle sees that you have specifed a db_cache_size that is less
than 1 granule. In this case Oracle will round up to 1 granule.
Corrective Action:
It is better to intentionally specify the cache size. And the default is 48M, so a good
minimum db_cache_size is 50M. Change the init.ora file and bounce the instance.
DBA929 - Server Memory is $hw_memory_pct_full[$thishost] used (t=$hw_memory_threshold_w/$hw_memory_threshold_c physmem=$h
w_memory_size_gb[$thishost](gB) memfree=$hw_memory_free[$thishost](gB))
Description:
This server has very little free memory. On HP-UX, this is bad for performance (paging
and swapping increase).
Corrective Action:
Reduce memory usage or increase the amount of memory. If you have any Oracle
instances with overallocated SGA memory, reduce memory consumption.
DBA930 - The instance default PERM tablespace is set to SYSTEM - Could cause performance problems
Description:
The DEFAULT TEMPORARY or PERMANENT tablespace is set to SYSTEM. If non-dictionary
objects are create in SYSTEM, performance problem probably will result.
Corrective Action:
ALTER DATABASE DEFAULT TABLESPACE tsname; (10g+ only)-or-
ALTER DATABASE DEFAULT TEMPORARY TABLESPACE tsname; (9i+ only)
DBA931 - RMAN Process $F5 was automatically KILLED
Description:
An RMAN process was found running on this server which:
This is indicative of an orphan RMAN process which is consuming CPU resources and is not accomplishing anything useful. DBAmon
has issed the OS kill command against this process.
Corrective Action:
(None)
DBA932 - UX NUSERPROC (HP-UX maxuprc Kernel Value) $os_nuserproc_pct[$thishost]% Used - MAXUPRC=$os_maxuprc[$thishost] OSUserProcCount=$os_ nuserproc[$thishost] (Thresholds: $t_nuserproc_w[$thishost]%/$t_nuserproc_c[$thishost]%)
Description:
This event is unique to HP-UX. There is a UX kernel parameter maxuprc. This parameter
controls the number of OS processes that any 1 UX userid can have running concurrently. If
this is exceeded, the OS will not fork any new processes until the process count is reduced
below this value. This can be VERY BAD for a running DB. DBAmon monitors the current OS process
count against the configured maxuprc kernel value as a percentage.
Corrective Action:
DBA934 - "Found $ssrs_failed_job_cnt Unsuccessful SSRS Report(s)"
Description:
For MSSQL instances that are running SSRS, DBAmon checks ReportServer..ExecutionLog2 for failed reports.
So, if any reports that have run since the last iteration have a STATUS of something other than
"rsSuccess", this event will occur.
Corrective Action:
There are many good resources on the internet on how to diagnose SSRS Report failures. Google away!
DBA934 - Complex DB User Passwords Enforced
Description:
Information only event. This DB has a UTLPWDMG routine active (password_verify_funtion)
in the DEFAULT profile.
Corrective Action:
No action required.
DBA940 - DB Block Corruption - $F3 Block segments
Description:
Rows were found in V$DATABASE_BLOCK_CORRUPTION.
Corrective Action:
Restore the corrupted blocks or datafiles or drop corrupt datafile.
DBA941 - DB Has $autoextend_cnt[$thishost] AUTOEXTEND=YES datafiles
Description:
This DB has at least 1 datafile with Autoextend set to YES. This makes it impossible for DBAmon to monitor for full tablespaces.
Corrective Action:
This event does not indicate a problem with your DB, but DBAmon will only monitor for
tablespace full if you disable this attribute for all tablespaces.
DBA942 - MSSQL Error=18456 Invalid Login(s) Found in Error Log
Description:
DBAmon has detected one or more 18456 SQL Login Failure events during this DBAmon iteration.
Corrective Action:
Depends on your shop. For details of where the failed login attempt originated and the login name, look at the SQL Error Log.
If you're looking for a good guide on how to troubleshoot MSSQL 18456 errors, try:
Aaron Bertrand's Excellent Article.
DBA943 - Found $lr_job_cnt Long Running SQL Job(s)
Description:
DBAmon monitors MSSQL for long running SQL Jobs. This event means that one or more SQL Jobs has been running for at least
12 hours.
Corrective Action:
If this job should be running for more than 12 hours, then add all or part of the Job name to the DBC parameter:
MSSQL_Job_LR_Check_Name_Excl_String: .
DBA944 - Large DB Transaction(s): $tr_reason
Description:
DBAmon monitors MSSQL for long running SQL Transactions. This event means that one or both of these conditions have occurred:
Corrective Action:
If this job should be running for more than 12 hours, then add all or part of the Job name to the DBC parameter:
MSSQL_Job_LR_Check_Name_Excl_String: .
DBA945 - MSSQL SvcAcct=$mssql_service_account[$thishost] Exception - Password Will Expire in $svcacct_return Days at: $svcacct_ts (LE $svcacct_t_crit)
Description:
DBAmon monitors the number of days until the Password for the MSSQL AD Service Account expires. This event
means that either the WARNING or CRITICAL thresholds have been met.
Corrective Action:
You must change the MSSQL Service Account password before it expires, but you also must change the SERVICE Login Credentials.
DBA946 - Found $tlogmultifile_cnt Database(s) with Multi-File TLOGs
Description:
DBAmon monitors for MSSQL databases with more than 1 TLOG file. While there is one good reason to have multiple TLOG files
(space problems - due to a variety of reasons - which force you to create a 2nd TLOG file on a different drive)
it is a poor practice to keep them after the space problems are resolved. This check falls under the "good housekeeping"
category. So, if a DB has more than one TLOG file, this event will occur.
-- This process assumes that the DB in question is in FULL recovery model
-- 1. Take 2 TLOG backups. You want to see "BACKUP LOG successfully processed 0 pages" for the TLOG file that you're deleting.
BACKUP LOG [bbtest3] TO DISK = N'C:\MSSQL\MSSQL11.MSSQLSERVER\MSSQL\Backup\bbtest3.bak'
WITH NOFORMAT, NOINIT, NAME = N'bbtest3-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
-- 2. Shrink the log to EMPTY. Look at the output to ensure that this worked.
DBCC SHRINKFILE ('bbtest3_log2', EMPTYFILE )
-- 3. Delete a file from a database, run this:
ALTER DATABASE bbtest3 REMOVE FILE bbtest3_log2;
-- 4. You will still see the file here (sys.master_file), but the STATE_DESC should show OFFLINE.
SELECT * FROM sys.master_files
-- 5. Run another TLOG backup.
BACKUP LOG [bbtest3] TO DISK = N'C:\MSSQL\MSSQL11.MSSQLSERVER\MSSQL\Backup\bbtest3.bak'
WITH NOFORMAT, NOINIT, NAME = N'bbtest3-Transaction Log Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
-- 6. Now, you should NOT see the deleted TLOG file in sys.master_files:
SELECT * FROM sys.master_files
There was a SQLPASS Session in 2011 given by Wanda He (it was DBA-310) which deals with MSSQL on VMware. This talk had some excellent performance recommendations, one of which was to use multiple vSCSI (actually PVSCSI) contollers for servers running MSSQL and have a significant I/O load.
Our goal for our production environment therefore is to implement this recommendation. However, we only want to do this for *busy* servers running MSSQL (there is a downside to doing this for every server running MSSQL - including relatively *idle* servers. So, starting with DBAmon version 5.49.B, we monitoring for the following conditions:
Some URLs which speak to this:
Corrective Action:
I am not a VMware administrator, but configure multiple PVSCSI adapters and balance your I/O load among these controllers.
Corrective Action:
It is a best practice to use the SA MSSQL login as little as possible (or not at all). If you have applications connecting using the SA login,
then you should correct his by giving each application a distinct MSSQL Login, preferabley a Windows-authenticated MSSQL Login.
Here is the text that is produced with this event:
It is an MSSQL Best Practice to use the SA login only when absolutely necessary and
only by authorized (DBA) personnel. An Excessive number of concurrent SA logins may
indicate that an application is connecting to MSSQL using the SA login. If this is the
case, you should create a Window-autheticated MSSQL Login which is used exclusively by
only one application. If this is not possible, create an MSSQL Login that is used by
only one application.
Corrective Action:
Set the Compatibility value to the highest possible value - in SSMS -> Database Properties.
DBA961 - "MSSQL Login=SA Max Concurrent Logins: $tr_sa_maxsalogins (Last $tr_sa_days Day(s)) Exceeds Threshold=$t_sa_max_logins[$thishost]"
Description:
The maximum number of concurrent MSSQL Login=SA logins (sessions - from SYSPROCESSES) found during preceeding 24 hours
exceed the threshold specified in the T_SA_Logins_Max DBC parameter. If this
parameter is not specified, then the default is 2.
DBA970 - "Found X Databases With Compatibility Set to Lower than Instance Version: Y"
Description:
If the Compatibility setting of a database is less than that of the MSSQL instance, this event will occur. For me,
this most often occurs when restoring a DB from another server - and I forget to set this to the value of the new instance.
If you have this set incorrectly (lower than the instance), you could take a performance hit because you will not be able
to take advantage of MSSQL optimizer improvements in the current version.
DBA980 - "Instance DBAmon Duration: $instance_duration[$i_rm] Sec (GE Threshold: $config_parm{'default_instance_dur_max_secs'})"
Description:
This means that it took DBAmon too long to perform it checks for this instance.
Corrective Action:
Check the probeout directory and the /opt/dbamon/log/instances directory for this instance to see where it is spending too much time.
DBA990 - ","Invalid Data: Drive=($dr_drive) shows a total size of ZERO - Something may be wrong with WMI?","DriveNG","DBA990"
Description:
This is a critical event which is telling you about a near-fatal data problem. The message will vary depending on the problem.
Corrective Action:
Depends. :@) The error should give you a starting place.
DBA998 - "DB=$tl_dbname - TLog with UNLIMITED growth - Drive $tl_drive is $tl_drivepct full ($event_sev_long Threshold of $event_pct% exceeded - TLSize=$tl_tlsize(mB) TLPath=$tl_filename)"
Description:
The transaction log is full or almost full. This particular TLOG does not have a size
limit. So, it is >= 90% full internally, and the disk where the TLOG resides is
full or almost full.
Corrective Action:
Backup the transaction log.
This Document:
http://dbamon.com/errors.shtml