DBAmon
Event/Error Message Reference


Event Message Format - Example: DBA210W


DBA005 - "DBAmon Governor Is Active"

Description: Due to a high ticket rate, the DBAmon Governor has activated itself. See Governor Overview for more information.
Corrective Action: DBAmon will automatically inhibit monitoring of all instances for 1-2 hours. Once the problem is solved, you can manually remove the contents of the ALL inhibit file to resume monitoring sooner; otherwise the situation will correct itself.

DBA010 - "Logic Error - Invoked dbamon_orant.pl Did Not Finish"

Description: While attempting to check Oracle on NT, the DBAmon probe script failed. Check the accompanying error message on the "D" WWW page or in the DBAmon event email.
Corrective Action: Solve the problem indicated by the event diagnostics.

DBA020 - "Error(s) Found in DBC File: $dbcfilename[$i]"

Description: You have specified a DBC parameter which does not exist.
Corrective Action: The list of valid DBC parameters can be found at DBC Reference. Specify a valid DBC parameter.

DBA101 - "$applhost - Missing $F3 process(es) - Number Found: $F5 - Should Be: $F6"

Description: SAP/UX - There are missing SAP work processes of the type indicated.
Corrective Action:

DBA101 - "$applhost - SAP Down - sapstart Process Not Running - Number Found: $F5 - Should Be: $F6"

Description: DBAmon did not find the sapstart process running. It should be if SAP is up.
Corrective Action: Start SAP and ensure the sapstart process is running.

DBA102 - "$applhost - SAP (R3check) Cannot Connect to DB"

Description: DBAmon ran R3check to test the connection to the DB and it was not able to connect.
Corrective Action: See the accompanying messages for details. Solve the problem so that R3check -d works.

DBA165 - "$applhost - /usr/sap/SID Filesystem is $bdfpct full ($bdfaval MB Free)"

Description: The named filesystem is >= 99% full.
Corrective Action: Remove unneeded files or resize filesystem.

DBA166 - "$applhost - Error running bdf /usr/sap/SID"

Description: The bdf command used to check how full the filesystem is failed.
Corrective Action: Check the accompanying messages and act accordingly.

DBA200 - "DBName=$fg_dbname FileGroup=$fg_fgname Has At Least 1 File With MAXSIZE=Unlimited"

Description: MSSQL: This event occurs when at least 1 Database File in the specified DB and Filegroup is MAXSIZE Unlimited. This is sometimes seen as a poor DBA practice.
Corrective Action: Set the maximum size of the Database File.

DBA209 - "No onbar/ontape archive data retrieved (No backup ever or DBAmon pgm. error)"

Description: No onarchive or onbar/ontape archive data was found, which indicates that no database archive has ever been run for this instance.
Corrective Action: Run an onbar/ontape level 0 backup.

DBA210 - "(Backup Not Running) $rounded1-$rounded2 Hours since last good onarchive backup (threshold=$backup_age[$thishost])"

Description: This event is generated when the age of the most recent onarchive backup for an Informix instance exceeds the Backup_Age specified for this host in dbamonrc.
Corrective Action: Run an onarchive backup.

DBA211 - "$rounded1-$rounded2 Hours since last good onbar/ontape backup (threshold=$backup_age[$thishost])"

Description: This event is generated when the age of the most recent onbar/ontape backup for an Informix instance exceeds the Backup_Age specified for this host in dbamonrc.
Corrective Action: Run a backup.

DBA212 - "ovtbls Value: $ovtbls Non-Zero"

Description: The OVTBLS section of onstat -p is non-zero. This means that the TBLSPACES onconfig parameter has been exceeded.
Corrective Action: Increase the TBLSPACES onconfig value.

DBA213 - "ovlock Value: $ovlock Non-Zero"

Description: The OVLOCK section of onstat -p is non-zero. This means that the LOCKS onconfig parameter has been exceeded.
Corrective Action: Increase the LOCKS onconfig value.

DBA214 - "ovuserthread Value: $ovuser Non-Zero"

Description: The OVUSERTHREAD section of onstat -p is non-zero. This means that the USERTHREADS onconfig parameter has been exceeded.
Corrective Action: Increase the USERTHREADS onconfig value.

DBA215 - "ovbuff Value: $ovbuff Non-Zero"

Description: The OVBUFF section of onstat -p is non-zero. This means that the BUFFERS onconfig parameter has been exceeded.
Corrective Action: Increase the BUFFERS onconfig value.

DBA216 - "Hours since last backup ($this_age) is a negative value"

Description: The number of hours between the current datetime (on the server being monitored) and the datetime of the newest backup on that server is a negative value. This could be caused if:
  1. A system (possibly for Y2K testing) has its date set to a date in the future.
  2. Backup(s) are completed.
  3. The date is set back to the current date.

Corrective Action: From the error message, determine which backup (L0 or L1) occurred in the future. Rerun that backup.
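
The backup-age events (DBA210, DBA211 and DBA216) all compare the time of the newest backup with the current time on the monitored server. The following Python sketch illustrates that comparison only; the function names and the returned labels are hypothetical and are not taken from the DBAmon source.

```python
from datetime import datetime


def backup_age_hours(now, last_backup):
    """Return the hours elapsed between the newest backup and 'now'."""
    return (now - last_backup).total_seconds() / 3600.0


def classify_backup_age(age_hours, threshold_hours):
    """Classify a backup age. The labels are illustrative, not DBAmon output."""
    if age_hours < 0:
        return "negative-age"   # the DBA216 situation: backup dated in the future
    if age_hours > threshold_hours:
        return "too-old"        # the DBA210/DBA211 situation: Backup_Age exceeded
    return "ok"


# A backup taken while the system clock was set ahead, checked after the
# clock was set back (dates are made up for the example).
now = datetime(2000, 1, 10, 12, 0)
future_backup = datetime(2000, 1, 11, 0, 0)
print(classify_backup_age(backup_age_hours(now, future_backup), 24))
```

In this example the computed age is -12 hours, so the check reports the negative-age case rather than treating the backup as recent.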

DBA220 - "Critical Informix Message Log Message(s) Found"

Description: After issuing the onstat -m command, one (or more) of the following strings were found (the strings are specified in /opt/dbamon/adm/dbamon.msg_critical):
Full
Error
Fail Consistency Check
failed
PANIC

Corrective Action: Research the cause of the message and any accompanying onstat -m messages.
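
As a rough illustration of this kind of check, the Python sketch below scans message-log lines for the strings listed above. It assumes a plain, case-sensitive substring match, which may differ from what DBAmon actually does; the function name and the sample log lines are invented for the example.

```python
# Strings taken from the list above; in DBAmon they live in the
# dbamon.msg_critical file.
CRITICAL_STRINGS = ["Full", "Error", "Fail Consistency Check", "failed", "PANIC"]


def find_critical_lines(log_lines, patterns=CRITICAL_STRINGS):
    """Return the log lines that contain at least one of the patterns.

    Assumption: case-sensitive substring matching.
    """
    return [line for line in log_lines if any(p in line for p in patterns)]


# Made-up sample of 'onstat -m' style output.
sample = [
    "12:00:01 Checkpoint Completed: duration was 0 seconds.",
    "12:05:17 Logical Log Files are Full -- Backup is Needed",
    "12:06:00 PANIC: Attempting to bring system down",
]

for hit in find_critical_lines(sample):
    print(hit)
```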

DBA221 - "DBS Reporting Error: $error_text"

Description: After issuing the onstat -m command, one (or more) of the following strings were found:
dynamically allocated new shared memory segment

Corrective Action: Research the cause of the message and any accompanying onstat -m messages.

DBA222 - "TEMP DBSpace Full - DBS Probe Error"

Description: DBAmon was attempting to see how full your dbspaces are, but it encountered an error because the TEMP dbspace is full.
Corrective Action: Solve the TEMP space problem and allow DBAmon to rerun the probe.

DBA223 - "Informational Informix Message Log Message(s) Found"

Description: DBAmon looks in the Informix message log for all strings found in /opt/oracle/adm/dbamon.msg_warn. If it finds one or more of the strings, this event occurs.
Corrective Action: Solve the Informix problem causing the message(s).

DBA230 - "Total llog files=$num_llogs Full llog files=$full_llogs ($logpct percent)"

Description: From onstat -l command output, it was determined that > 60% of the log files are full.
Corrective Action: If this system archives logs (onconfig LTAPEDEV not equal to /dev/null), determine why the log archive task is not executing. For systems where LTAPEDEV is /dev/null, this is probably indicative of a long transaction (see the Informix System Admin guide for info on long transactions).
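
The percentage in this message is the share of logical log files that are full. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the 60% default simply mirrors the description above.

```python
def llog_percent_full(num_llogs, full_llogs):
    """Return the percentage of logical log files that are full."""
    if num_llogs <= 0:
        raise ValueError("num_llogs must be positive")
    return 100.0 * full_llogs / num_llogs


def llogs_over_threshold(num_llogs, full_llogs, threshold_pct=60.0):
    """True when more than threshold_pct of the log files are full."""
    return llog_percent_full(num_llogs, full_llogs) > threshold_pct


print(llog_percent_full(10, 7), llogs_over_threshold(10, 7))
```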

DBA231 - "Fatal dbaccess Error Checking Logical Logs"

Description: Check accompanying messages.
Corrective Action: Solve problem.

DBA232 - "Fatal dbaccess Error Checking DBSpaces"

Description: Check accompanying messages.
Corrective Action: Solve problem.

DBA233 - "Fatal dbaccess Error Checking Locks"

Description: Check accompanying messages.
Corrective Action: Solve problem.

DBA234 - "Fatal Error running onstat -"

Description: Check accompanying messages.
Corrective Action: Solve problem.

DBA236 - "$F2 (DBC Parm) File=$F3 Not Found -or- Not Readable"

Description: DBAmon was looking for the filename mentioned here because of the DBC parameters that you specified. The file was not found or DBAmon lacks permission to open it.
Corrective Action: Solve problem.

DBA237 - "SQL Will Not Run"

Description: DBAmon tried to select from sysdatabase to see if Informix is up. That SQL statement failed.
Corrective Action: Solve problem.

DBA238 - "Informix Object(s) Found Whose Size Exceeds 25gB"

Description: There is a hard limitation (as of 7.x) that no table (or table fragment) may grow beyond approximately 32gB in size. The limit isn't exactly 32gB and it depends on a number of factors (number of indices, ...). The objects listed in this event are at least 25gB in size.
Corrective Action: Reduce the size of these objects, or if that is not possible, fragment the objects.

DBA239 - "ROOTPATH: $F2 Device File Is Not Readable"

Description: In order to properly monitor your Informix DB, the DBAmon probe must be able to read the header pages of the first root chunk. This event means that DBAmon does not have UX read-access to the device file for the first root chunk.
Corrective Action: The problem may be that the Userid: DBC parameter is not equal to the userid that this Informix instance is running with. You must specify a UX userid in the Userid: DBC parameter which can read the first root chunk.

DBA240 - "Read Hit Ratio $rh < Threshold of $t_readhit[$thishost]"

Description: From onstat -p output, the %cached (reads) is less than the T_Read_Hit.
Corrective Action: The standard remedy for this is to increase the BUFFERS onconfig parameter. However, some applications will always have a small read hit ratio (due to large row sizes). The general rule of thumb is to increase the size of BUFFERS until you reach a diminishing return.

DBA241 - "Write Hit Ratio $wh < Threshold of $t_writehit[$thishost]"

Description: From onstat -p output, the %cached (writes) is less than the T_Write_Hit.
Corrective Action: The standard remedy for this is to increase the BUFFERS onconfig parameter. However, some applications will always have a small write hit ratio (due to large row sizes). The general rule of thumb is to increase the size of BUFFERS until you reach a diminishing return.

DBA250 - "DBSpace $dbspace (${dbspct}% full ${dbsfree}MB free) exceeds critical threshold of $t_diskcrit[$thishost]"

Description: From the onstat -d command, it was determined that one or more dbspaces exceed the T_Disk_Full dbamonrc critical value.
Corrective Action: If it is not possible to remove any data (tempdbs), add a chunk of space to the dbspace.

DBA251 - "DBSpace $dbspace (${dbspct}% full ${dbsfree}MB free) exceeds warning threshold of $t_diskwarn[$thishost]"

Description: From the onstat -d command, it was determined that one or more dbspaces exceed the T_Disk_Full dbamonrc warning value.
Corrective Action: If it is not possible to remove any data (tempdbs), add a chunk of space to the dbspace.
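
DBA250 and DBA251 differ only in which threshold the dbspace has exceeded. The Python sketch below shows one way to express that two-level check; the function name, the labels and the example thresholds are assumptions made for illustration, and the parsing of onstat -d output is not shown.

```python
def dbspace_status(total_mb, free_mb, warn_pct, crit_pct):
    """Return (label, percent_full) for a dbspace.

    'critical' corresponds to the DBA250 case and 'warning' to DBA251.
    """
    pct_full = 100.0 * (total_mb - free_mb) / total_mb
    if pct_full > crit_pct:
        return "critical", pct_full
    if pct_full > warn_pct:
        return "warning", pct_full
    return "ok", pct_full


# Hypothetical thresholds: warning above 85%, critical above 95%.
print(dbspace_status(1000, 20, 85, 95))
print(dbspace_status(1000, 100, 85, 95))
```

The critical test runs first so that a dbspace over both thresholds is reported once, at the higher severity.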

DBA260 - "Informix Instance Not On-Line/Read-Only"

Description: An instance which has been designated Must_be_up: = y in dbamonrc is not On-Line. DBAmon runs the onstat -i command to check the status of an Informix instance.
Corrective Action: Inspect Informix log and bring system to On-Line (multiuser) mode.

DBA261 - "Informix Instance Not On-Line/Read-Only"

Description: An instance which has NOT been designated Must_be_up: = y in dbamonrc is not On-Line. DBAmon runs the onstat -i command to check the status of an Informix instance.
Corrective Action: Inspect Informix log and bring system to On-Line (multiuser) mode.

DBA265 - "INFORMIXDIR Filesystem is $bdfpct full ($bdfaval MB Free)."

Description: The filesystem that INFORMIXDIR resides on is >= 99% full.
Corrective Action: Remove unneeded files, or expand the filesystem.

DBA266 - "Error running 'bdf \$INFORMIXDIR'"

Description: While attempting to check the filesystem that INFORMIXDIR resides on, there was an error.
Corrective Action: Solve the problem from the diagnostic messages that were displayed with this error.

DBA270 - "Table(s) found with >= 200 extents"

Description: The tables listed in this message are in 200 or more extents. There is an Informix limitation of ~219 extents per table. If a table reaches this limit, it will not be able to grow.
Corrective Action: Reorganize the table with a large extent size.
Primary Support Action: Inform the customer of the situation and the need to reorg table(s) in this state.

DBA271 - "Table(s) found with >= $max_extents[$thishost] extents"

Description: The tables listed in this message are in more extents than what you specified in the dbamonrc parm Max_Extents. There is an Informix limitation of ~219 extents per table. If a table reaches this limit, it will not be able to grow.
Corrective Action: Reorganize the table with a large extent size.
Primary Support Action: Inform the customer of the situation and the need to reorg table(s) in this state.

DBA280 - "$offline_chunks Offline Chunk(s) Found"

Description: From onstat -d, it was determined that there are chunks that are NOT in a PO/MO state.
Corrective Action: This usually means that some kind of disk hardware error has occurred. Check the Informix log for messages indicating the error that caused the chunks to go offline.

DBA290 - "Connect Failure: ($conshort) After $remsh_try Attempt(s) - Server $pingable - Err=($remsherr) RC=$rc"

Description: DBAmon uses remsh to execute commands on all systems. This error means that the remsh command failed.
Corrective Action: This error can mean that the system in question is down (UX is down). It can also indicate network problems between the system running DBAmon and this system.

DBA291 - "Downloaded DBAmon Probe Software Not Found - Will retry download on next iteration"

Description: DBAmon expected the DBAmon probe software to be on this server, but it was gone. Possible causes:
  • Someone deleted the software from the server
  • For Windows servers, perhaps the default directory location is something other than \. It must be the root dir of any drive.

Corrective Action: Find out why it was deleted. For Windows servers, check to make sure that the default dir of the remsh service is \.

DBA292 - "Connect resulted in compilation error"

Description: When DBAmon tried to run a program on the server corresponding to this instance, a PERL compilation error resulted.
Corrective Action: Probably a DBAmon bug, or incorrect installation (does /usr/local/bin/perl point to a valid version of perl5?).

DBA293 - "Probe Connect TIMEOUT (dbamonrc:Probe_Timeout=$probe_timeout)"

Description: This event will only occur if you have specified the dbamonrc Probe_Timeout: parameter. The number of seconds that the connection to this instance took exceeds the Probe_Timeout: value. The connection was killed.
Corrective Action: The DB instance or server is probably hung. Respond accordingly.

DBA299 - "Informix BUG Encountered (lockspct > 100) - Used Locks=$locksused Max Locks=$locksmax ($lockspct percent)"

Description: There is a bug in some versions of Informix 7.30 where the LOCKS column of the syssesprof SMI table has incorrect values. That must be the case here.
Corrective Action: Ignore - This bug will (someday) be fixed by Informix.

DBA300 - "Used Locks=$locksused Max Locks=$locksmax ($lockspct percent)"

Description: This instance of Informix is in danger of using all available LOCKS.
Corrective Action: If possible (due to shared memory constraints), increase the LOCKS ONCONFIG parameter (Informix restart required).

DBA301 - "ALL - DBAmon checking inhibited Date=$nowdate Hour=$nowhour"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA301 - "DBAmon checking inhibited Day=$nowday Monitor_Days=$mon_days[$h]"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA301 - "DBAmon checking inhibited DayHour=$dayhour Monitor_Excl=$mon_excl[$h]"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA301 - "DBAmon checking inhibited Hour=$nowhour Monitor_Hours=$mon_hours[$h]"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA301 - "DBAmon checking inhibited by ora_dbshut"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA301 - "This Server - DBAmon checking inhibited Date=$nowdate Hour=$nowhour"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA301 - "This Server_Instance - DBAmon checking inhibited Date=$nowdate Hour=$nowhour"

Description: Monitoring for this instance has been inhibited.
Corrective Action:

DBA302 - "FILECHECK File=$fc_file Not Found"

Description: You have specified the FILECHECK parameter in the /opt/dbamon/bin/download/dbamon_[inf|ora].cfg file. The filename that you specified was not found.
Corrective Action: Place the file on the DB server.

DBA303 - "FILECHECK File=$fc_file Is > 28 Hours Old"

Description: You have specified the FILECHECK parameter in the /opt/dbamon/bin/download/dbamon_[inf|ora].cfg file. This means that > 28 hours have elapsed since this file was last updated.
Corrective Action: Update the file on the DB server.
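
The two FILECHECK events (DBA302 and DBA303) amount to an existence test followed by a modification-time test. The following Python sketch shows that sequence under the assumption that "last updated" means the file's modification time; the function name and return labels are hypothetical.

```python
import os
import tempfile
import time


def filecheck(path, max_age_hours=28.0, now=None):
    """Return 'not-found', 'too-old' or 'ok' for the given file."""
    if not os.path.exists(path):
        return "not-found"                       # the DBA302 case
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(path)) / 3600.0
    return "too-old" if age_hours > max_age_hours else "ok"   # DBA303 case


# Demonstration with a scratch file whose timestamp is set 30 hours back.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    demo_path = fh.name
stale = time.time() - 30 * 3600
os.utime(demo_path, (stale, stale))
demo_result = filecheck(demo_path)
os.remove(demo_path)
demo_missing = filecheck(demo_path)
print(demo_result, demo_missing)
```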

DBA304 - "DBAmon checking inhibited by AUTO-INHIBIT"

Description: Some processes (like standby refresh tools) bring down a database regularly. DBAmon allows tools to create a file called /tmp/DBAmon_Lock_{ORACLE_SID}.txt which will prevent DBAmon from monitoring. This event means that this file was found to exist.
Corrective Action: If the tool which touched the lock file removes the lock file, then this event will correct itself.
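
A tool that wants to suppress monitoring while it takes a database down would create the lock file and remove it afterwards. The Python sketch below illustrates the file-name convention and the existence test; the helper names are hypothetical, and the demonstration uses a scratch directory instead of /tmp.

```python
import os
import tempfile


def lock_file_path(oracle_sid, directory="/tmp"):
    """Build the lock file name described for this event."""
    return os.path.join(directory, "DBAmon_Lock_%s.txt" % oracle_sid)


def monitoring_inhibited(oracle_sid, directory="/tmp"):
    """True when the lock file for this SID exists."""
    return os.path.exists(lock_file_path(oracle_sid, directory))


# Demonstration in a scratch directory so that /tmp is left alone.
scratch = tempfile.mkdtemp()
before = monitoring_inhibited("PROD", scratch)
open(lock_file_path("PROD", scratch), "w").close()   # what a refresh tool would do
during = monitoring_inhibited("PROD", scratch)
os.remove(lock_file_path("PROD", scratch))           # the tool removes the lock when done
after = monitoring_inhibited("PROD", scratch)
os.rmdir(scratch)
print(before, during, after)
```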

DBA305 - "There are $userpw_cnt Non-System USERs whose userid=password"

Description: Some Non-System Oracle users were found with the userid equal to the password. This is a breach of security.
Corrective Action: Alter the password for these users to an unpredictable value.

DBA306 - "There are $userpwchanged_cnt System USERs whose userid=password - password changed"

Description: Some System Oracle users were found with predictable passwords. They were changed to the value that you specified.
Corrective Action: None.

DBA306 - "There are $userpwsys_cnt System USERs whose userid=password"

Description: Some System Oracle users were found with obvious passwords. This is a breach of security.
Corrective Action: Alter the password for these users to an unpredictable value.

DBA308 - "DB is in ARCHIVELOG mode, but the archiver is STOPPED"

Description: The database is "doomed" because it is in archivelog mode, but the archiver is STOPPED. The online redo logs will fill and hang the instance. If you run orastat -l you should see that "Automatic Archival" is disabled.
Corrective Action:
  • To correct this dynamically, ALTER SYSTEM ARCHIVE LOG START.
  • To correct this permanently, in init.ora change log_archive_start to true.

DBA309 - "Instance is HUNG waiting for ARCHIVER process to finish (All ONLINE REDO LOGS are full)"

Description: The DB is hung and requires immediate attention. The online redo logs all need to be archived and for some reason the ARCHIVER is not clearing them out. THIS IS NOT A BACKUP PROBLEM. It is an archiver problem.
Corrective Action: See if the archiver is running. Run archive log list and look at the Automatic archival line. If it is Disabled, then you need to start the archiver (see DBA308).

DBA310 - "Instance has $shmvseg[$thishost] V shmem segments (critical threshold=$shmcrit)"

Description: This Informix instance has > $shmcrit shmem segments. This can be bad for performance.
Corrective Action: Consolidate the total size of all segments into a smaller number of larger segments.

DBA311 - "Instance has $shmvseg[$thishost] V shmem segments (warning threshold=$shmcrit)"

Description: This Informix instance has > $shmcrit shmem segments. This can be bad for performance.
Corrective Action: Consolidate the total size of all segments into a smaller number of larger segments.

DBA312 - "There are $dbarole_cnt Non-System USERs who have been granted the DBA role"

Description: The Oracle IDs listed have been granted the DBA role. They therefore have the authority to do nasty things like stopping Oracle, changing the SYS/SYSTEM password, etc.
Corrective Action: If desired, revoke DBA from the user. First, ensure the ORADBA has been granted to the user. Then revoke DBA from the user.

DBA313 - "There are $znum Possible Orphan Datafile(s) - $orphan_size gB (Threshold: 14 Days Since Last Update)"

Description: DBAmon was looking for datafiles that once belonged to this instance which are no longer mentioned in any data dictionary table. In order to qualify, the datafile must not have been touched for the last 14 days.
Corrective Action: "rm" or gzip these datafiles.

DBA314 - "There are $sec_fileperm_cnt[$thishost] Oracle file(s) with incorrect permission(s)"

Description: An Oracle file has file permission that violates security policy.
Corrective Action: chmod the file(s) to the proper permission.

DBA315 - "There are $sec_dbarole_cnt[$thishost] Non-System USERs who have been granted the DBA role"

Description: The DBA group is typically defined as the UX group which has "connect internal" access. Any user who belongs to this group can therefore perform "connect internal". So, the "oracle" or ora* userids are the only ones which should belong to the DBA group.
Corrective Action: Un-enroll these users from the DBA group.

DBA316 - "The init.ora parm ? is (?) - It must be ? for security reasons"

Description: It has been determined that this init.ora parameter, set the way that you have chosen to set it, poses a security risk.
Corrective Action: Change the setting.

DBA317 - "Tablespace(s) Found With NO Datafiles: $F3"

Description: The tablespaces listed have no datafiles (or tempfiles).
Corrective Action: This should not happen under normal circumstances. Add a datafile or recover the missing datafile.

DBA318 - "There are 1 Non-System USERs who have been granted the ALTER USER priv"

Description: The named privilege was granted to a non-SYSTEM user.
Corrective Action: Revoke this privilege as appropriate.

DBA319 - "There are $dbarole_cnt Non-System USERs who have been granted the IMP_FULL_DATABASE role"

Description: The named privilege was granted to a non-SYSTEM user.
Corrective Action: Revoke this privilege as appropriate.

DBA320 - "DST2007 ($sw_version[$thishost]) JVM Is installed"

Description: This event was coded for the 2007 DST Change. This instance DOES have the JVM installed.
Corrective Action: This instance must be patched.

DBA321 - "DST2007 ($sw_version[$thishost]) SYS TZ Columns Founds"

Description: This event was coded for the 2007 DST Change. This instance DOES have the SYS-owned *TIME ZONE* columns.
Corrective Action: This instance must be patched.

DBA322 - "DST2007 ($sw_version[$thishost]) Non-SYS TZ Columns Found"

Description: This event was coded for the 2007 DST Change. This instance DOES have the Non-SYS-owned *TIME ZONE* columns.
Corrective Action: This instance must be patched.

DBA323 - "DST2007 ($sw_version[$thishost]) TZ-Column/JVM Patch Required and Missing"

Description: This event was coded for the 2007 DST Change. It was determined that for either "TZ-Columns" or JVM that the DST2007 patching is required, but not installed.
Corrective Action: This instance must be patched.

DBA330 - "CLEANERS ($cleaners[$thishost]) must be at least 75% of the number of disks ($num_disks[$thishost]) and >= (LRUS/2) (LRUS=$lrus[$thishost])"

Description: For good performance, the number of CLEANERS should be >= 75% of the number of disks that contain chunks. Also, the number of CLEANERS should be >= LRU's/2.
Corrective Action: Set the CLEANERS value to the correct setting and bounce Informix.
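
The rule in the description combines two lower bounds on CLEANERS. A small Python sketch of that rule follows; the function names are hypothetical, and rounding the minimum up to a whole number is an assumption.

```python
import math


def cleaners_ok(cleaners, num_disks, lrus):
    """True when CLEANERS is >= 75% of the disks and >= LRUS/2."""
    return cleaners >= 0.75 * num_disks and cleaners >= lrus / 2


def min_cleaners(num_disks, lrus):
    """Smallest whole CLEANERS value that satisfies both conditions."""
    return max(math.ceil(0.75 * num_disks), math.ceil(lrus / 2))


# Example: 8 disks holding chunks, LRUS=8.
print(cleaners_ok(5, 8, 8), min_cleaners(8, 8))
```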

DBA330 - "HDR Not Active - Type=$hdr_type State=$hdr_state Name=hdr_name"

Description: DBAmon looked at the 'sysdri' SMI table and found that HDR was configured but not active.
Corrective Action: Restart HDR.

DBA340 - "Duplicate ONCONFIG Parameters Found"

Description: The ONCONFIG parms listed were specified twice. This probably means that you meant to change something, but did so in the wrong place.
Corrective Action: Remove the duplicate parameter(s).

DBA350 - "Current Average Checkpoint Duration is $ckptavg[$thishost] Seconds"

Description: The most recent average checkpoint duration was > 120 seconds.
Corrective Action: Probably due to too much data to write at checkpoint time, or too few cleaners. Try increasing CLEANERS and decreasing LRU_MIN_DIRTY and LRU_MAX_DIRTY to reduce the number of dirty pages when the checkpoint time arrives.

DBA351 - "JobName=($this_job_name) Category=($this_job_category) Should be EXECUTING"

Description: You specified the DBC MSSQL_DDP_Job_Cat_MBR: or MSSQL_DDP_Job_Name_MBR: parameter to ensure that certain MSSQL Jobs are always running. This event indicates that the specified JOB NAME or job(s) of the specified JOB CATEGORY are not running.
Corrective Action: Start the job.

DBA352 - "Job Owner - Found $job_owner_counter Job(s) with Incorrect JOB OWNER (Owner should be $job_good_owner)"

Description: The Job(s) displayed in this message are owned by a SQL Login other than the designated GOOD owner (probably SA).
Corrective Action: Run the supplied SQL to alter the job(s). The SQL will correct the JOB Owner.

DBA353 - "DB Owner - Found $db_owner_counter DB(s) with Incorrect DB OWNER (Owner should be $db_good_owner)"

Description: The DB(s) displayed in this message are owned by a SQL Login other than the designated GOOD owner (probably SA).
Corrective Action: Run the supplied SQL to alter the DB(s). The SQL will correct the DB Owner.

DBA354 - "Job Status - Found $job_failed_counter Job(s) with LastRunOutcome=Failed (Enabled Jobs only)"

Description: This check only applies to ENABLED jobs. If the most recent execution Failed, then this will occur.
Corrective Action: Correct the problem which caused this job to fail, and rerun the job.

DBA360 - "(KAIO on) - NUMAIOVPS value of $numaiovps[$thishost] is greater than recommended value of 1 or 2"

Description: If KAIO is on, then you only need to specify 1 or 2 NUMAIOVPS to do file I/O.
Corrective Action: Reduce the number of NUMAIOVPS to 2.

DBA370 - "(KAIO off) - AIO Least to Most Ratio $aiorat[$thishost] > Threshold of 40 - NUMAIOVPS value: $numaiovps[$thishost] too small"

Description: In an attempt to help you tune the correct number of AIO VP's, DBAmon runs onstat -g iov to see how many I/O's have been issued by the most and least busy AIO VP. If the most busy is >= 40 times busier than the least busy, then more AIO VP's should be configured.
Corrective Action: The message contains a recommended number of NUMAIOVPS. Change the ONCONFIG file to this value and bounce Informix.
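
The ratio test described above can be sketched in Python as follows. The sketch assumes the per-VP I/O counts have already been extracted from onstat -g iov (that parsing is not shown), uses the ">= 40" comparison from the description, and treats an idle VP as an infinite ratio; the function names are hypothetical.

```python
def aio_ratio(io_counts):
    """Ratio of I/Os issued by the busiest AIO VP to the least busy one."""
    busiest, least = max(io_counts), min(io_counts)
    if least == 0:
        return float("inf")   # assumption: an idle VP counts as maximally skewed
    return busiest / least


def needs_more_aio_vps(io_counts, threshold=40):
    """True when the busiest VP is at least 'threshold' times busier."""
    return aio_ratio(io_counts) >= threshold


# Made-up I/O totals for three AIO VPs.
print(aio_ratio([4000, 900, 100]), needs_more_aio_vps([4000, 900, 100]))
```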

DBA380 - "Normal Backup Schedule Not In Place"

Description: This Informix HDR Primary has no registered backups.
Corrective Action: Run a LVL0 backup.

DBA390 - "(Backup Running Now) $rounded1-$rounded2 Hours since last good onbar/ontape backup (threshold=$backup_age[$thishost])"

Description: You have specified the Backup_Age: parameter to turn on backup age checking. The age of the most recent ontape-style backup exceeds the threshold that you specified. However, a backup is running now, so this event is a WARNING.
Corrective Action: This event will go away when this backup ends successfully.

DBA390 - "(Backup Was Rerun) $rounded1-$rounded2 Hours since last good onbar/ontape backup (threshold=$backup_age[$thishost])"

Description: You have specified the Backup_Age: parameter to turn on backup age checking. The age of the most recent onbar/ontape-style backup exceeds the threshold that you specified. However, since you also specified Backup_Command: DBAmon has automatically launched a backup according to the command that you specified.
Corrective Action: This event will go away when this backup ends successfully.

DBA391 - "(Backup Running Now) $rounded1-$rounded2 Hours since last good onarchive backup (threshold=$backup_age[$thishost])"

Description: You have specified the Backup_Age: parameter to turn on backup age checking. The age of the most recent onarchive-style backup exceeds the threshold that you specified. However, a backup is running now, so this event is a WARNING.
Corrective Action: This event will go away when this backup ends successfully.

DBA391 - "(Backup Was Rerun) $rounded1-$rounded2 Hours since last good onarchive backup (threshold=$backup_age[$thishost])"

Description: You have specified the Backup_Age: parameter to turn on backup age checking. The age of the most recent onarchive-style backup exceeds the threshold that you specified. However, since you also specified Backup_Command: DBAmon has automatically launched a backup according to the command that you specified.
Corrective Action: This event will go away when this backup ends successfully.

DBA399 - "This Oracle instances uses SPFILE - spfile=$spfile[$thishost]"

Description: This instance has a non-null value for the SPFILE init.ora parameter. This event only occurs if the Run_SPFile_Check: dbamonrc parameter is set to Y.
Corrective Action: This event will go away when this backup ends successfully.

DBA400 - " $mrl0days[$thishost] Days since last good Level=0 backup (threshold=$l0_age[$thishost])"

Description: You have specified the Backup_Age: parameter to turn on backup age checking. The age of the most recent onarchive-style backup exceeds the threshold that you specified.
Corrective Action: Run a backup.

DBA410 - "Possible Long TX Detected - HWMPCT: $hwmpct (threshold=$t_longtx[$thishost])"

Description: See Long Transaction Detection.
Corrective Action:

DBA460 - "$msg"

Description: This event means that the number of UX file descriptors used is >= 90% of the kernel configured value. This will kill Informix if it reaches 100%.
Corrective Action: Stop processes that are using file descriptors, or increase the appropriate kernel parm.

DBA461 - "UX NFILEs %s%% Used - Current Value=%s Kernel Maximum=%s (Critical Threshold=95%%)"

Description: This event means that the number of UX file descriptors used is either >= 90% of the kernel configured value (Warning event) or >= 95% (Critical event). This will cause Oracle to crash if it reaches 100%.
Corrective Action: Stop processes that are using file descriptors, or increase the appropriate kernel parm.
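
The warning and critical levels described for DBA461 can be expressed as a simple percentage check. The Python sketch below uses the 90% and 95% values from the description; the function name and labels are hypothetical, and how the current and maximum values are obtained from the kernel is not shown.

```python
def nfile_status(used, kernel_max, warn_pct=90.0, crit_pct=95.0):
    """Return (label, percent_used) for file descriptor usage."""
    pct_used = 100.0 * used / kernel_max
    if pct_used >= crit_pct:
        return "critical", pct_used
    if pct_used >= warn_pct:
        return "warning", pct_used
    return "ok", pct_used


# Made-up values: 920 descriptors in use out of a kernel maximum of 1000.
print(nfile_status(920, 1000))
```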

DBA470 - "The Oracle Autostart software (/sbin/init.d/oracle) is obsolete - it does not invoke oraadmin"

Description: The current /sbin/init.d/oracle (as of 06/2005) invokes oraadmin. This file that was found does not invoke oraadmin.
Corrective Action: Run /usr/local/dba/tools/oracle_autostart/setup as root.

DBA471 - "This server does not have Oracle Autostart configured - Start and/or Stop symlink missing"

Description: The HP-UX autostart files (/sbin/rc2.d/K800oracle and /sbin/rc3.d/S200oracle) don't exist. So autostart is not configured.
Corrective Action: Run /usr/local/dba/tools/oracle_autostart/setup as root.

DBA472 - "The Oracle Listener Log $F5 is $F3 MB (exceeds threshold of $F4 MB)"

Description: The $ORACLE_HOME/network/log/listener.log file is TOO BIG.
Corrective Action: Remove or compress file.

DBA502 - "Undiagnosed rcp Error - Userid=$userid[$thishost]"

Description: DBAmon was attempting to download the DBAmon "Probe" software to this host, but the rcp command failed.
Corrective Action: Read the accompanying messages and solve the problem accordingly.

DBA502 - "rcp Error - Userid=$userid[$thishost] account disabled"

Description: DBAmon was attempting to download the DBAmon "Probe" software to this host, but the rcp command failed. The problem is that the account that we are rcp'ing to is disabled.
Corrective Action: Re-enable the account and retry.

DBA502 - "rcp Error - Userid=$userid[$thishost] login incorrect"

Description: DBAmon was attempting to download the DBAmon "Probe" software to this host, but the rcp command failed. The problem is that the .rhosts file on the remote for this userid does not have a correct entry for the DBAmon master.
Corrective Action: Add the appropriate .rhosts entry to the userid's account on the remote host.

DBA502 - "rcp Error - Userid=$userid[$thishost] password expired"

Description: DBAmon was attempting to download the DBAmon "Probe" software to this host, but the rcp command failed. The problem is that the password for this userid has expired.
Corrective Action: Set a new password.

DBA502 - "rcp error - userid=$userid[$thishost] Undiagnosable"

Description: DBAmon was attempting to download the DBAmon "Probe" software to this host, but the rcp command failed.
Corrective Action: Correct the remsh connectivity problem.

DBA503 - "$perlpath[$thishost] Does Not Invoke Perl Version 5"

Description: DBAmon checks the Perl version on the remote host to ensure that it is >= version 5.
Corrective Action: Upgrade Perl.

DBA503 - "/usr/local/bin/perl Does Not Invoke Perl Version 5"

Description: DBAmon checks the Perl version on the remote host to ensure that it is >= version 5.
Corrective Action: Upgrade Perl.

DBA504 - "Unable to find ORACLE_SID=$orasid[$thishost] in /etc/oratab - Unable to convert * to value"

Description: You specified ORACLE_HOME: of * in the DBC file. While DBAmon was attempting to resolve this to a specific ORACLE_HOME value, it looked in /etc/oratab for an entry with the SID that you specified. This entry did not exist.
Corrective Action: Create the correct entry in /etc/oratab on the DB server.

DBA505 - "Filecheck: File=$filecheck Not Found"

Description: You must have specified the Filecheck: dbamonrc parm. DBAmon did not find the file that you specified.
Corrective Action: Place the file on the DB server.

DBA511 - "$rounded1-$rounded2 Hours since last good ora_backup backup - threshold: $backup_age[$thishost] $running_str"

Description: You specified the Backup_Age: DBC parameter for this instance. The number of hours since the last 'ora_backup' backup exceeds the threshold that you specified.
Corrective Action: Run a backup.

DBA512 - "$bckmsg - threshold: $backup_age[$thishost] $running_str"

Description: Similar to DBA511.
Corrective Action: Run a backup of the correct type.

DBA513 - "Backup Method for this DB is $backup_method_long[$thishost] - Backup_Age: requires that Backup Method be RMAN; EXP; TBS or FULL"

Description: You specified the Backup_Age: DBC parameter for this instance. DBAmon then tried to determine the DB backup type for this instance. It must be one of the types listed above.
Corrective Action: Run a backup of the correct type.

DBA514 - "MSSQL DB=$db - Backup Was Invoked - Method=$amethod"

Description: You specified the Backup_Age: DBC parameter for this MSSQL instance. The number of hours since the last backup exceeds the threshold that you specified.
Corrective Action: Run a backup.

DBA515 - "MSSQL DB=$db - Invoked Backup Failed - Method=$amethod"

Description: You specified the Backup_Command: DBC parameter. DBAmon tried to invoke a backup, but it failed.
Corrective Action: Examine the accompanying error messages and act accordingly.

DBA516 - "Backup Method for this DB is $backup_method_long[$thishost]"

Description: The backup type for this Oracle DB is 'NONE'. If you specify Backup_Age: in the DBC file, then you should have backups scheduled.
Corrective Action: Schedule backups.

DBA517 - "$bckmsg - Threshold: $backup_age_lvl0[$thishost] (Hours) $t_lvl0_days (Days) $running_str_lvl0"

Description: The number of hours since the last successful RMAN LVL0 backup exceeds the threshold. The threshold is calculated by multiplying the Backup_Age: DBC parameter (specified in hours) by 7. If the resulting number is > 15 days it is set to 15 days.
Corrective Action: Run an RMAN LVL0 backup. To prevent this event, set the Backup_Command: DBC parameters to run a LVL0 backup. Then DBAmon will automatically run a LVL0 backup when this threshold is exceeded.
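
The threshold arithmetic described for DBA517 can be sketched as follows. This is only an illustration in Python, not DBAmon's own code; the function name lvl0_threshold_hours is hypothetical, and the sketch assumes the 15-day cap is applied in hours (15 x 24 = 360).

```python
def lvl0_threshold_hours(backup_age_hours: float) -> float:
    """Return the LVL0 backup-age threshold in hours.

    Follows the rule in the DBA517 description: Backup_Age: (hours)
    multiplied by 7, capped at 15 days. The function name is hypothetical.
    """
    cap_hours = 15 * 24  # 15 days expressed in hours
    return min(backup_age_hours * 7, cap_hours)


if __name__ == "__main__":
    for age in (24, 48, 72):
        hours = lvl0_threshold_hours(age)
        print(f"Backup_Age: {age} -> LVL0 threshold {hours} hours ({hours / 24} days)")
```

With Backup_Age: 24 the LVL0 threshold works out to 168 hours (7 days); any Backup_Age: above roughly 51 hours runs into the 15-day cap.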

DBA518 - "There have been $unrecdf_cnt UNRECOVERABLE Datafile Changes since the last RMAN LVLx backup"

Description: This event occurs when an UNRECOVERABLE (NOLOGGING) change has been made to the database since the last LVL0 backup. You will not be able to roll the entire database forward past the time of the unrecoverable change without corrupting data if you have to RECOVER the database.
Corrective Action: Run an RMAN LVL0 backup and stop making unrecoverable changes to the database.

DBA519 - "Found $F3 Database(s) in Full/Bulk-Logged Recovery Model with no TLOG backups (last 30 days) - DBs: $F4"

Description: This event occurs when this SQL instance has at least 1 database with its recovery model set to FULL or BULK-LOGGED, but there have not been any TLOG backups within the last 30 days.
Corrective Action: Either set the recovery model for the database to SIMPLE, or start running regular TLOG backups.

DBA520 - "Redo Log Switch Rate for the last 24 hours is $redo_1day_count[$thishost] Switches/Hour (Threshold: $df_redo_rate) - Excessive log switches are BAD for DB performance - Increase Online Redo Log size"

Description: DBAmon measures the redo log switch rate for the last 24 hours. That number is compared to the REDO SWITCHES PER HOUR threshold. If that threshold is exceeded, then this event occurs. Excessive log switches are BAD for DB performance.
Corrective Action: To reduce the number of Online Redo Log switches, increase the size of the Online Redo Logs.

DBA521 - "Peak Last-30-Days Redo Rate ($redo_30day_max_gb[$thishost] GB) vs. Archivelog FS Size ($redo_archv_fs_gb[$thishost] GB)Ratio is $redo_archv_ratio[$thishost] (Threshold=$df_redo_fs_ratio) - Increase Archivelog FS Size to ${z} GB "

Description: It is a good practice to size the archivelog FS of an Oracle instance to hold 1 peak day's worth of redo data. It was determined that the archivelog FS of this instance does not meet this criterion (it is too small).
Corrective Action: Increase the size of the archivelog filesystem.

DBA530 - "NZ Not Online"

Description: DBAmon runs the command nzstate to query the status of the NZ query environment. If the result of this command is anything but Online then this event will occur.
Corrective Action: Start the NZ environment - so that the nzstate command returns Online.

DBA531 - "Too Long Since Last Good Groom/Reorg (T=$t_reorg_age_hrs[$thishost] hours)"

Description: This event occurs when, for the databases named in the body of this event, the number of hours since the last groom exceeds the Reorg_Age_Hrs: DBC parameter.
Corrective Action: Run (or rerun) the groom command(s).

DBA532 - "Too Long Since Last Good Database Backup (t=$backup_age[$thishost] hours) - WorstDB=$bckage_worstdb"

Description: DBAmon has determined that the backup age (the number of hours since the last successful backup) for at least one NZ database exceeds the number of hours that you specified in the Backup_Age: DBC parameter. The body of this event will contain the details of the age of the most recent backup for all databases.
Corrective Action: Run (or rerun) the NZ backup(s).

DBA601 - "Oracle Max Processes Exceeded: $reason"

Description: SQL could not run because the Oracle "processes" limit has been exceeded.
Corrective Action: Solve the cause of the excessive processes.

DBA602 - "Oracle Not Active/DB Not Open: $reason"

Description: Oracle is down.
Corrective Action: Restart Oracle.

DBA603 - "Oracle Crashed - $z"

Description: DBAmon found from the end of the alert log that Oracle has crashed. Depending on the error and the setting of the DBC parameter Must_Be_Up: DBAmon may have tried to restart Oracle.
Corrective Action: Restart Oracle if DBAmon did not already do this for you.

DBA604 - "Online Redo Log $F2 $F3 In Exception Status"

Description: The online redo log mentioned is not in normal status.
Corrective Action: Examine the alert log for the cause of the problem.

DBA605 - "Oracle PROCESSES - Current Count: $current - INIT.ORA Value: $config - Percent Used: ${procpct}%"

Description: The current number of processes (from v$process) is this percent of the init.ora "processes" parameter.
Corrective Action: Get rid of some sessions before you use them all up.

DBA606 - "Oracle DB_FILES - Current Count: $f_current - INIT.ORA Value: $f_config - Percent Used: ${f_pct}%"

Description: The current number of datafiles (from v$datafile) is this percent of the DB_FILES init.ora parameter.
Corrective Action: Drop unneeded tablespaces or increase DB_FILES (this requires an instance bounce).

DBA607 - "Tool ora_oddjob not found in crontab"

Description: The ora_oddjob tool was not found in cron (crontab -l was run).
Corrective Action: Add a correct entry to cron for ora_oddjob.

DBA608 - "Tool ora_backup_sched not found in crontab"

Description: The ora_backup_sched tool was not found in cron (crontab -l was run).
Corrective Action: Add a correct entry to cron for ora_backup_sched.

DBA610 - "Critical Messages Found in Oracle Alert Log"

Description: DBAmon looks for certain strings in the last 20 lines of the alert log. It found at least one of these strings there. The strings that DBAmon looks for can be found in /opt/dbamon/bin/download/dbamon_ora.cfg
Corrective Action: Solve the Oracle problem.
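
The kind of check described for DBA610 can be sketched roughly as below. This is an assumption-laden illustration in Python, not the DBAmon probe itself: the function name is hypothetical, plain substring matching is assumed, and the sample strings (ORA-00600, ORA-07445) are only examples, not the actual contents of dbamon_ora.cfg.

```python
from collections import deque
from typing import Iterable, List


def find_critical_lines(alert_log_lines: Iterable[str],
                        patterns: List[str],
                        tail: int = 20) -> List[str]:
    """Return those of the last `tail` lines that contain any of `patterns`.

    Hypothetical helper: assumes plain substring matching.
    """
    # A bounded deque keeps only the final `tail` lines of the log.
    last_lines = deque(alert_log_lines, maxlen=tail)
    return [line for line in last_lines
            if any(pattern in line for pattern in patterns)]


if __name__ == "__main__":
    # Example strings only; the real list lives in dbamon_ora.cfg.
    sample_patterns = ["ORA-00600", "ORA-07445"]
    log = ["Thread 1 advanced to log sequence 42"] * 30
    log.append("ORA-00600: internal error code")
    print(find_critical_lines(log, sample_patterns))
```

Note that a matching line which has already scrolled out of the last 20 lines is not reported by a check of this shape.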

DBA611 - "Listener Not Active: $lsnrerror"

Description: When DBAmon checked to see if the listener was running, it issued: lsnrctl status. This is what failed.
Corrective Action: Get the listener running for this DB.

DBA612 - "Tnsping failed: $tnserror"

Description: When DBAmon checked to see if the listener was running, it issued: tnsping $ORACLE_SID.world. This is what failed. The accompanying error message should give a hint as to what the problem is.
Corrective Action: Get tnsping $ORACLE_SID.world to work for this DB.

DBA613 - "Listener Was Restarted"

Description: DBAmon found that the default listener was not running. It attempted to issue 'lsnrctl start' and it worked.
Corrective Action: This is an informational message.

DBA614 - "Listener Restart Failed"

Description: DBAmon found that the default listener was not running. It attempted to issue 'lsnrctl start' and it failed.
Corrective Action: Examine the accompanying error messages and act accordingly.

DBA615 - "Permissions changed on listener.ora to 700"

Description: DBAmon changed the file permissions of listener.ora to 700.
Corrective Action: Informational Message.

DBA616 - "Password established on listener.ora"

Description: DBAmon established a non-encrypted password in the listener.ora file.
Corrective Action: Informational Message.

DBA621 - "Oracle TS: $ts - ${pct}% Full - Critical ($mbfree MB Free - Threshold: ${t_diskcrit[$thishost]}%)"

Description: The tablespace mentioned is full or almost full.
Corrective Action: Add a datafile.

DBA622 - "Oracle TS: $ts - ${pct}% Full - Warning ($free MB Free - Threshold: ${t_diskcrit[$thishost]}%)"

Description: The tablespace mentioned is full or almost full.
Corrective Action: Add a datafile.

DBA623 - "Add Datafile Command Failed - rc=$F2 cmd=$ts_command[$thishost] ts=$ts pc=$pc ad=$ad"

Description: You have specified the T_TS_Command: in the DBC file for this DB. At least one tablespace was at least the Warning Threshold full, so DBAmon invoked this command that you specified. This event means that this command ended with a non-zero return code.
Corrective Action: Solve the problem which is preventing your Add Datafile command from working.

DBA624 - "Added Datafile to TS: $ts - Was: ${pc}% Full - Added: $ad MB"

Description: You have specified the T_TS_Command: in the DBC file for this DB. At least one tablespace was at least the Warning Threshold full, so DBAmon invoked this command that you specified. This event means that this command ended with a zero return code.
Corrective Action: None. This event is notification that your Add Datafile command worked.

DBA625 - "Oracle Error encountered while checking tablespaces"

Description: DBAmon encountered a critical error while trying to check your tablespaces. Look at the accompanying diagnostic messages.
Corrective Action: Solve the problem which is preventing this from working.

DBA626 - "Tablespace $F2 Was Coalesced"

Description: This tablespace was full or almost full - DBAmon automatically coalesced it.
Corrective Action: None. If this did not free space, then you will have to add space.

DBA627 - "RBS Segments In Tablespace $F2 Were Shrunk "

Description: The RBS tablespace was full or almost full. DBAmon automatically shrank the RBS segments that reside there.
Corrective Action: None. If you don't want this to happen, don't let RBS fill!

DBA630 - "Object(s) Found With Extents >= ${t_extents[$thishost]}% Of Max_Extents"

Description: You specified the "T_Extents:" DBC parameter of X. The tables in question have >= X percent of maxextents.
Corrective Action: You can reorg the table to 1 extent, or ALTER TABLE x MAXEXTENTS UNLIMITED, or let DBAmon do this for you by specifying:
T_Extents: x Fix

DBA631 - "Oracle In RESTRICTED SESSION Mode:"

Description: Oracle is in RESTRICTED SESSION mode.
Corrective Action: Run ALTER SYSTEM DISABLE RESTRICTED SESSION.

DBA632 - "Found $foundtbls Whose Next_Extent Will Not Fit"

Description: The objects listed have a next extent size that will not fit in the indicated tablespace. The issue is that there is not an area of CONTIGUOUS freespace large enough in the tablespace. This can also be caused by PCTINCREASE being set to non-zero. It has been my experience that it is a bad practice to ever set PCTINCREASE to a non-zero value. It causes runaway extent size and non-uniform "holes" in tablespaces.
Corrective Action: One of:
  • Reduce the size of the next extent:
    ALTER TABLE OWNER.TABLE STORAGE ( NEXT ?M ); 
    ... so that the next extent size is less than the largest freespace area.

    -Or

  • Add an amount of space to the tablespace that is greater than the size of the next extent.
Also, if PCTINCREASE is non-zero, set it to 0. ALTER the object in question so that the next extent size is less than the largest contiguous piece of freespace. Then, contact the business partner (BP) to inform them about what you have done. Advise the BP that if they have an issue with our taking this action, they need to open a ticket to us to discuss alternatives. Also, if the object has PCTINCREASE set to non-zero, inform them that you intend to set it to 0. The reason for asking the BP is that they may have intentionally specified a large NEXT EXTENT size.
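
To illustrate why CONTIGUOUS freespace matters for DBA632, here is a small Python sketch. It is not DBAmon's code; the function name and the sample sizes are hypothetical. It shows that a tablespace can have plenty of total free space and still be unable to hold the next extent.

```python
from typing import List


def next_extent_fits(next_extent_mb: float, free_chunks_mb: List[float]) -> bool:
    """Return True if at least one contiguous free chunk can hold the next extent.

    Hypothetical helper: `free_chunks_mb` lists the sizes of the separate
    contiguous freespace areas in the tablespace.
    """
    return any(chunk >= next_extent_mb for chunk in free_chunks_mb)


if __name__ == "__main__":
    chunks = [60.0, 60.0, 60.0]  # 180 MB free in total, in three separate pieces
    print(sum(chunks), next_extent_fits(100.0, chunks))  # total is enough, no piece is
    print(next_extent_fits(50.0, chunks))                # a smaller NEXT does fit
```

This is why both corrective actions work: reducing NEXT below the largest free piece, or adding new space larger than the next extent.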

DBA640 - "Oracle ORACLE_HOME FS Is Full/Almost Full ({$homefull[$thishost]}%)"

Description: The disk/filesystem where ORACLE_HOME resides is full or almost full.
Corrective Action: Free some disk space.

DBA641 - "Oracle Archive Log Dir: $arcdir FS is ${arcdirfull}% Full - Warning (Threshold: ${t_arclog_w[$thishost]}%)"

Description: The disk/filesystem containing the archive log destination is almost full.
Corrective Action: Make some space on the disk before it fills!

DBA642 - "Oracle Archive Log Dir: $arcdir FS is ${arcdirfull}% Full - Critical (Threshold: ${t_arclog_c[$thishost]}%)"

Description: The filesystem containing the archive log destination is almost full.
Corrective Action: Make some space on the disk before it fills!

DBA643 - "Oracle Archive Logging is On; But Auto Archiving is Off"

Description: Having archivelog mode on and auto archiving off doesn't make sense.
Corrective Action: Either turn on auto archiving (set log_archive_start = true in init.ora) or turn off archivelog mode.

DBA644 - "Either: (1) NT srvinfo Command Did Not Run -or- (2) NT srvinfo Command Did Not Find The ARCLOG Disk -- Something Is Wrong"

Description: DBAmon ran the Toolkit srvinfo command to see how full the disk where the archive logs reside is. It failed.
Corrective Action: If srvinfo is not found, install the Microsoft NT Resource Kit.

DBA645 - "Oracle Archive Log Dir is NULL: Something went wrong with svrmgrl"

Description: While DBAmon was checking the value of log_archive_dest, a null value was returned. This could be because Oracle is DOWN.
Corrective Action: The next time that DBAmon checks to see if Oracle is up, this condition will be further diagnosed.

DBA646 - "Oracle Mandatory Archive Log Destination(s): $arcdestlist in ERROR Status"

Description: An Archive Destination with a binding of MANDATORY was found to be in ERROR status.
Corrective Action: Correct the cause of the error and issue the appropriate ALTER SYSTEM command to cause Oracle to reopen this destination.

DBA647 - "Oracle Archive Log OPTIONAL Destination(s) (with reopen > 0): $arcdestlist in ERROR Status"

Description: An Archive Destination with a binding of OPTIONAL and REOPEN > 0 was found to be in ERROR status.
Corrective Action: Correct the cause of the error and wait for Oracle to automatically reopen this destination.

DBA648 - "OFFLINE Datafile(s) Found"

Description: For some reason, you have datafiles that are not ONLINE.
Corrective Action: Examine the alert log and act accordingly.

DBA641 - "Oracle Archive Log Dir: $arcdir FS $arcfs is ${arcdirfull}% Full - Warning (Threshold: ${t_arclog_w[$thishost]}%) $rerun"

Description: The archivelog FS named here is above the warning threshold full.
Corrective Action: Run the appropriate process to reduce the amount of space used in this filesystem.

DBA649 - "init.ora Error: $z"

Description: A contradiction was found in your init.ora file.
Corrective Action: Fix it!

DBA650 - "Archivelog $F3 has invalid format - Should be arch%t_%s.dbf"

Description: At least 1 archivelog was found in one of your archivelog destinations whose format was either:
  • filesystemname/1_NNNNN.dbf
  • filesystemname/archarchv1_NNNNN.dbf
... which violates our standard of /arch%t_%s.dbf.
Corrective Action: Change init.ora log_archive_dest* and/or log_archive_format. This can be done dynamically with alter system in 8i+.
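
A filename check along the lines of DBA650 might look like the sketch below. This is an illustration in Python, not DBAmon's actual logic: the function name and sample paths are hypothetical, and it assumes %t (thread) and %s (sequence) expand to plain decimal digits.

```python
import posixpath
import re

# arch%t_%s.dbf with %t and %s assumed to expand to decimal digits.
ARCHIVELOG_NAME = re.compile(r"^arch\d+_\d+\.dbf$")


def has_standard_format(path: str) -> bool:
    """Return True if the file name part of `path` looks like arch%t_%s.dbf."""
    return ARCHIVELOG_NAME.match(posixpath.basename(path)) is not None


if __name__ == "__main__":
    for sample in ("/oraarch/arch1_12345.dbf",
                   "/oraarch/1_12345.dbf",
                   "/oraarch/archarchv1_12345.dbf"):
        print(sample, has_standard_format(sample))
```

The second and third sample paths correspond to the two invalid formats listed in the description.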

DBA651 - "MSSQL Eventlog Alert(s) Found"

Description: While DBAmon was examining the MSSQL Alert Log, it found this message with a severity of 17 or higher.
Corrective Action: Act according to the error message text.

DBA652 - "$dbmsout[$thishost] Version $sw_version[$thishost] is older than the minumum 'good' version ($this_minver) for this family ($this_family)"

Description: This message originates with DBAmon DBMS Version Oversight. It only appears if you have configured it from the DBAmon Console. This message means that the indicated DBMS instance is running a version of the vendor DBMS software which is lower than the version that you have specified as the "Minimum Good" version for this version family. For example, if you have configured Oracle 8.1.7.4 as the "Minimum Good" version for the 8.1.7 family, any instance that is running 8.1.7, but less than 8.1.7.4 will receive this event.
Corrective Action: Upgrade the instance to a higher software version.
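
The "Minimum Good" comparison in the DBA652 example can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not how DBAmon necessarily implements it: the function names are hypothetical, and version strings are assumed to be purely numeric and dot-separated.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '8.1.7.4' into (8, 1, 7, 4)."""
    return tuple(int(part) for part in text.split("."))


def is_below_minimum(running: str, family: str, minimum_good: str) -> bool:
    """Return True when `running` is in `family` but older than `minimum_good`."""
    run = parse_version(running)
    fam = parse_version(family)
    minimum = parse_version(minimum_good)
    in_family = run[:len(fam)] == fam  # e.g. 8.1.7.3 belongs to the 8.1.7 family
    return in_family and run < minimum  # tuples compare component by component


if __name__ == "__main__":
    for version in ("8.1.7", "8.1.7.3", "8.1.7.4", "9.2.0.1"):
        print(version, is_below_minimum(version, "8.1.7", "8.1.7.4"))
```

Comparing integer tuples rather than strings avoids the usual trap where "8.1.10" sorts before "8.1.7" as text.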

DBA653 - "The filesystem(s) that match $fs_check_mask[$thishost] are $pct_fsused[$thishost]% full (threshold : $fs_check_threshold"

Description: This event is created by FS Full Checking. The filesystems which match the FS_Check_Mask: are >= the FS_Check_Threshold: .
Corrective Action: Remove files from the filesystems or add space.

DBA654 - "All [controlfiles/redo logs] are one 1 Drive - They should be spread out to multiple drives"

Description: It is a poor practice to place all controlfiles/redo logs on 1 disk.
Corrective Action: Make sure that the controlfiles/redo logs are placed on different disks.

DBA655 - "All redo logs are one 1 Drive - They should be spread out to multiple drives"

Description: It is a poor practice to place all redo logs on 1 disk.
Corrective Action: Make sure that the redo logs are placed on different disks.

DBA656 - "DB=$tl_dbname - TLog is $tl_ratio times DB size and TLog is >= 1gB (DBSize=$tl_dbsize (mB) TLSize=$tl_tlsize (mB) Threshold=$tl_ratio_threshold)"

Description: The transaction log has grown to at least 1gB in size and is at least 5 times the size of the database datafiles. As this does not make sense, it indicates a problem where, probably, the transaction log is not getting backed up and cleaned out.
Corrective Action: You need to back up and shrink the transaction log AND ensure that backups start running on a regular basis so as to prevent this from recurring.
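
The two conditions behind DBA656 can be sketched like this. It is a Python illustration only, not DBAmon's code; the function name is hypothetical, the default ratio of 5 comes from the description above, and 1gB is assumed to mean 1024 mB.

```python
def tlog_oversized(db_size_mb: float, tlog_size_mb: float,
                   ratio_threshold: float = 5.0,
                   min_tlog_mb: float = 1024.0) -> bool:
    """Return True when the TLog is >= 1gB and >= ratio_threshold times the DB size.

    Hypothetical helper; assumes 1gB = 1024 mB.
    """
    if db_size_mb <= 0:
        return False  # no meaningful ratio; this sketch simply does not alert
    ratio = tlog_size_mb / db_size_mb
    return tlog_size_mb >= min_tlog_mb and ratio >= ratio_threshold


if __name__ == "__main__":
    print(tlog_oversized(200, 1500))   # 7.5 times the DB size and over 1gB
    print(tlog_oversized(100, 600))    # 6 times the DB size, but under 1gB
    print(tlog_oversized(1000, 2000))  # over 1gB, but only 2 times the DB size
```

Requiring both conditions keeps tiny databases with proportionally large but still small logs from raising the event.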

DBA657 - "DB=$tl_dbname - TLog with LIMITED size ${this_tl_full_vs_limit}% full ($event_sev_long Threshold of $event_pct% exceeded - TLSize=${tl_tlsize}(mB) TLLimit=${tl_growth}(mB) Log_Reuse_Wait_Desc=$z)"

Description: The transaction log is full or almost full. This particular TLOG has a size limit. So, it is >= 90% full internally, and it has reached or almost reached its limit.
Corrective Action: Depends on the Log_Reuse_Wait_Desc value. I suggest a Google search on Log_Reuse_Wait_Desc.

DBA658 - "MSSQL Agent Eventlog Alert(s) Found"

Description: The string /error/i or /unable to/i was found in the SQL Agent Error Log file.
Corrective Action: Act according to the error message text.
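
The two patterns named in the DBA658 description (/error/i and /unable to/i) behave like the following Python sketch. The function name and sample log lines are hypothetical; only the two case-insensitive patterns come from the description.

```python
import re
from typing import Iterable, List

# Case-insensitive match for either string, mirroring /error/i and /unable to/i.
AGENT_LOG_PATTERN = re.compile(r"error|unable to", re.IGNORECASE)


def agent_log_alerts(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain 'error' or 'unable to' in any case."""
    return [line for line in lines if AGENT_LOG_PATTERN.search(line)]


if __name__ == "__main__":
    sample = [
        "SQLServerAgent started",
        "Unable to connect to server",
        "Job step failed with ERROR 229",
    ]
    for hit in agent_log_alerts(sample):
        print(hit)
```

Because the match is a simple substring test, any line that merely mentions the word "error" will also be picked up.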

DBA659 - "You only have $cf_rows - You should have at least $df_cf_min (cf_rows=$cf_rows)"

Description: In Oracle it is a good practice to have >1 controlfile (there's no good reason not to). So DBAmon will monitor the number of controlfiles for you. If you have the Default_Min_CF: dbamonrc parameter set, then the actual number of controlfiles will be compared against the value that you specify. If the number of controlfiles is less, then this event will occur.
Corrective Action: Add controlfile(s). To do this, stop the instance, copy the existing controlfile to the new controlfile filenames, change init.ora to include your new controlfile in the "control_files" parameter, and start the instance.

DBA660 - "Oracle SGA is $sga_pct[$thishost]% Full ($sev threshold: $t_sga_c[$thishost]%)"

Description: The Oracle SGA is full or nearly full.
Corrective Action:
  • Workaround: Run "alter system flush shared_pool". This will clear all data from the shared pool. However, if the activity on the DB is similar afterwards to the activity that filled the shared pool, then the condition will likely return. Flushing the shared pool does de-fragment it.
  • Long Term Fix: Increase the "shared_pool_size" init.ora parameter.

DBA661 - "User=$userid[$thishost] crontab is empty"

Description: This crontab is empty.
Corrective Action: Populate cron.

DBA662 - "Archivelog destination(s) ($F3) are in ORACLE_HOME filesystem"

Description: The default archivelog destination is $ORACLE_HOME/dbs. It is not a good practice to put archivelogs into the $ORACLE_HOME filesystem.
Corrective Action: Set the appropriate log_archive_dest_N init.ora parameter. In the case of a standby database, you will need to set standby_archive_dest.

DBA663 - "Non-System DB Count is Zero"

Description: There aren't any non-system databases (that is, databases other than msdb, master, model, and tempdb). Why have an instance if you don't have any data in it?
Corrective Action: Create a Database.

DBA664 - "Found $F3 I/O Delay Message(s) in SQL Log"

Description: This event displays the number of SQL Log messages:
SQL Server has encountered n occurrence(s) of I/O requests taking longer than 15 seconds to complete on file  in database 
... that were found during the most recent batch of SQL Log messages that were scanned.
Corrective Action: These can indicate a problem with your I/O subsystem, or in some cases a shortage of SQL memory. Do a GOOGLE search on the error for some good advice.

DBA665 - "This Standby DB is $delta Minutes Behind Primary (Threshold is $is_threshold[$thishost])"

Description: You have invoked Standby In-Sync checking by specifying the In_Sync* parameters in the DBC file for this instance. DBAmon compares the CONTROLFILE_TIME value in v$database on the standby database to the same value on the primary database. If these two values differ by more than the In_Sync_Age: value that you specified (in minutes) then this event will occur. In other words, the amount of time since the standby DB has been refreshed exceeds the In_Sync_Age: parameter that you specified.
Corrective Action: Your process for updating the standby DB is not working. Fix it.
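
The In_Sync_Age: comparison for DBA665 can be sketched as below. This is a Python illustration, not DBAmon's implementation; the function names are hypothetical and the two timestamps stand in for the CONTROLFILE_TIME values read from the primary and the standby.

```python
from datetime import datetime


def standby_lag_minutes(primary_cf_time: datetime, standby_cf_time: datetime) -> float:
    """Return how many minutes the standby's CONTROLFILE_TIME trails the primary's."""
    return (primary_cf_time - standby_cf_time).total_seconds() / 60


def standby_out_of_sync(primary_cf_time: datetime, standby_cf_time: datetime,
                        in_sync_age_minutes: float) -> bool:
    """Return True when the lag exceeds the In_Sync_Age: value (in minutes)."""
    return standby_lag_minutes(primary_cf_time, standby_cf_time) > in_sync_age_minutes


if __name__ == "__main__":
    primary = datetime(2024, 1, 1, 12, 0)
    standby = datetime(2024, 1, 1, 11, 0)
    print(standby_lag_minutes(primary, standby))      # lag in minutes
    print(standby_out_of_sync(primary, standby, 30))  # compared against In_Sync_Age: 30
```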

DBA666 - "$nologobj NOLOGGING Object(s) Found On Primary DB $is_prihost[$thishost]/$is_prisid[$thishost]"

Description: You have invoked Standby In-Sync checking by specifying the In_Sync* parameters in the DBC file for this instance. In order for a standby database to be kept in sync with its primary, there cannot be any tables or indexes in the primary DB which were created with the NOLOGGING parameter. This event means that DBAmon has detected the existence of at least one NOLOGGING table or index on the primary database.
Corrective Action: Just because there are NOLOGGING objects on the primary DB, this does not necessarily mean that there has been a NOLOGGING operation on the primary DB which would invalidate the standby DB. You must alter the NOLOGGING object(s) to LOGGING. A DBA668 event will occur if an UNRECOVERABLE change was made to the primary.

DBA667 - "Primary DB $is_prihost[$thishost]/$is_prisid[$thishost] is DOWN (SQL will not run)"

Description: You have invoked Standby In-Sync checking by specifying the In_Sync* parameters in the DBC file for this instance. In order for a standby database to be kept in sync with its primary, the primary DB must be up. The primary DB, whose server and ORACLE_SID you have specified in the DBC file, is down.
Corrective Action: Start the primary DB, or correct the DBC parameters that specify the primary DB.

DBA668 - "Primary DB Has $priunrec datafile(s) with UNRECOVERABLE changes since the last rebuild"

Description: You have invoked Standby In-Sync checking by specifying the In_Sync* parameters in the DBC file for this instance. DBAmon has detected that an unrecoverable change has occurred on the primary which therefore was not transmitted to the standby. This unrecoverable change has occurred after the most recent rebuild of the standby database.
Corrective Action: Rebuild the standby DB.

DBA670 - "Found $F3 Memory_Paged_Out Message(s) in SQL Log"

Description: This message was found in the SQL Log:
A significant part of sql server process memory has been paged out. This may result in a performance degradation. 
  Duration: 0 seconds. Working set (KB): 32992, committed (KB): 64696, memory utilization: 50%.
This means that Lock Pages in Memory is not enabled.
Corrective Action: It may be that you see this message even if you have granted the SQL Service User the Lock Pages in Memory privilege. This is a good article on how to troubleshoot this problem:
MS KB Article: 918483

DBA671 - "$dbs_restoring DB(s) In RESTORING State"

Description: This event simply means that at least 1 MSSQL DB is in a RESTORING state.
Corrective Action: Complete the restore of the DB(s) and this event will magically go away.

DBA680 - "Tablespace $zts Datafile Count $zdfc Approaching Maxiumum of 1022 (${zpct}%) - Thresholds W/C $t_dfcount_pct_w[$thishost]/$t_dfcount_pct_c[$thishost] (?)"

Description: There is a tablespace whose datafile count is approaching the 1022 datafile per tablespace Oracle limitation (for non-bigfile tablespaces).
Corrective Action: DO NOT allow this tablespace to hit 1022 datafiles.
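
The percentage in the DBA680 message can be reproduced with a sketch like the one below. It is illustrative Python, not DBAmon's code: the function name is hypothetical, and the use of >= against the warning and critical thresholds is an assumption. Only the 1022 limit comes from the description.

```python
from typing import Optional

MAX_DATAFILES_PER_TABLESPACE = 1022  # Oracle limit for non-bigfile tablespaces


def datafile_count_severity(datafile_count: int,
                            warn_pct: float,
                            crit_pct: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for a tablespace's datafile count.

    Hypothetical helper; the >= comparisons are an assumption.
    """
    pct = 100.0 * datafile_count / MAX_DATAFILES_PER_TABLESPACE
    if pct >= crit_pct:
        return "critical"
    if pct >= warn_pct:
        return "warning"
    return None


if __name__ == "__main__":
    for count in (500, 818, 920):
        print(count, datafile_count_severity(count, warn_pct=80, crit_pct=90))
```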

DBA690 - "LAN Interface $lancard is in Half-Duplex mode ($lanmsg)"

Description: By running 'lanadmin -x ?' DBAmon found that this lan interface is running 100mb Half-Duplex.
Corrective Action: The UX sysadmin needs to reconfigure this lan interface to run full-duplex.

DBA701 - "Program Error: $F0 - $msg"

Description: While trying to check the status of your OracleApps instances, an error was encountered.
Corrective Action: Depends on text of message.

DBA701 - "OracleApps $proctype Process(es) Missing - Found: $proccnt MinThreshold: $procthr"

Description: The number of OracleApps processes of the specified type was less than the threshold minimum number of processes that you specified in your DBC file.
Corrective Action: Restart missing processes or reduce minimum process threshold in DBC file.

DBA702 - "Critical OracleApps Processes: $errorprocesses Not Active"

Description: Process(es) of the specified type should be running, but they are not.
Corrective Action: Restart the process.

DBA710 - "AlwaysOn Sync Health is $ao_noun - Should be HEALTHY"

Description: DBAmon monitors the Health of an MSSQL AlwaysOn Cluster. One of the items that we check is Sync Health. When an AO cluster is running normally, the value for Sync Health is HEALTHY. There is something wrong with your AO cluster if this value is something other than HEALTHY.
Corrective Action: It's complicated. Rebooting both servers in the cluster may solve your problem, but that will obviously cause an outage which goes against the purpose of AlwaysOn.

DBA740 - "EM/iSQLPlus emctl Was restarted OK"

Description: DBAmon automatically restarted EM or iSQLPlus.

DBA801 - "MSSQL Not Active: $dbstatus $dbmsg"

Description: MSSQL was found to be down.
Corrective Action: Restart MSSQL.

DBA802 - "$oldperl_cnt Old Perl process(es) found - Max process age: $oldperl_maxage hours - Threshold: ? hours"

Description: DBAmon looks for Perl.exe processes that were started by the same userid that the DBAmon probe runs under and that have been running for at least 24 hours. Any such processes will be listed in the long text of this event. It is assumed that any Perl process that has been running for at least 24 hours is hung and will not finish without "help". If you do have any long-running Perl jobs, such as services, run them under a different userid than the one that the remsh service runs under.
Corrective Action: Run the "pskill" commands that appear with the long text to kill these processes.
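
The age test behind DBA802 can be sketched as follows. This is a Python illustration only: the function name and the (pid, start time) input format are hypothetical, and the sketch does not query Windows for real process data.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def old_processes(processes: List[Tuple[int, datetime]],
                  now: datetime,
                  max_age_hours: float = 24.0) -> List[int]:
    """Return the PIDs of processes that have run for at least `max_age_hours`.

    `processes` is a hypothetical list of (pid, start_time) pairs.
    """
    cutoff = timedelta(hours=max_age_hours)
    return [pid for pid, started in processes if now - started >= cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    running = [
        (101, datetime(2024, 1, 1, 11, 0)),  # started 25 hours ago
        (102, datetime(2024, 1, 2, 9, 0)),   # started 3 hours ago
    ]
    print(old_processes(running, now))
```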

DBA803 - "MSSQL Agent Service Not Active"

Description: The MSSQL Agent process is not active.
Corrective Action: Start the MSSQL Agent.

DBA804 - "MSSQL Active But DB $F2 Is OFFLINE"

Description: A DB is offline.
Corrective Action: Bring the specified DB online.

DBA805 - "Drive $dr_drive is $dr_pct full ($sev threshold: $t_disk$sev[$thishost]%)"

Description: The Drive mentioned contains at least one MSSQL database file and is full or almost full.
Corrective Action: Add space to this drive or remove unneeded files.

DBA806 - "Found $zfilecnt FileType=$zfiletype DB Files with PERCENT GROWTH"

Description: We believe that it is poor practice to set DB or TLOG files to PERCENT Growth. It is a better practice to "know your instance" and set the growth increment to a known value.
Corrective Action: In the long text of this event is a query which you can run which will find all of the DB/TLOG files in your instance with PERCENT GROWTH specified. After you have identified the files, you can use SQL Studio to alter the growth increment.

DBA807 - "Found $zfilecnt DB/LOG Files on C: Drive"

Description: We also believe that it is poor practice to place DB or TLOG files on the C: drive. The usage of C: is unpredictable so it's better to place the critical SQL files onto other drives. Of course on some servers this may not be possible, so you can suppress this event for this instance by specifying this event in the
Suppress_Events: parameter. Note that the idea for this check came from a PASS talk by Brent Ozar.
Corrective Action: In the long text of this event is a query which you can run which will find all of the DB/TLOG files in your instance which live on the C: drive. You can move them with DETACH/ATTACH or in the case of SYSTEM DB files, there are many scripts on the internet on how to move SYSTEM DB files.

DBA808 - "Found $checkdb_event_cnt Database(s) where Days Since Last Good CheckDB Exceeds Threshold of $t_checkdb_age_days[$thishost] (Days)"

Description: This event displays MSSQL databases which need to be CheckDB'd. It is an MSSQL best practice to CheckDB all databases on a regular basis.
Corrective Action: Run CheckDB on the databases listed in this event - on a regular basis.

DBA809 - "Found $mssql_lrc_cnt Long Running SQL Command(s) during Weekday/PrimeShift"

Description: This tells you that a CHECKDB or FULL/DIFFERENTIAL backup is running on a weekday between 0700 and 1700, and has been running for at least 30 minutes.
Corrective Action: This is just an FYI. In your environment it may be non-disruptive to run CHECKDBs and FULL BACKUPS during prime shift. It is not in ours, so we do want to know.
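
The Weekday/PrimeShift window for DBA809 can be sketched as below. This is an illustrative Python sketch, not DBAmon's code: the function names are hypothetical, and treating 0700 as inclusive and 1700 as exclusive is an assumption.

```python
from datetime import datetime, timedelta


def is_prime_shift(now: datetime) -> bool:
    """Return True on Monday-Friday between 0700 and 1700.

    Boundary handling (0700 inclusive, 1700 exclusive) is an assumption.
    """
    return now.weekday() < 5 and 7 <= now.hour < 17


def long_running_in_prime_shift(started: datetime, now: datetime,
                                min_minutes: int = 30) -> bool:
    """Return True if a command has run >= min_minutes and it is prime shift now."""
    return is_prime_shift(now) and (now - started) >= timedelta(minutes=min_minutes)


if __name__ == "__main__":
    wednesday = datetime(2024, 1, 3, 10, 0)
    saturday = datetime(2024, 1, 6, 10, 0)
    print(long_running_in_prime_shift(wednesday - timedelta(hours=1), wednesday))
    print(long_running_in_prime_shift(saturday - timedelta(hours=1), saturday))
```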

DBA810 - "MSSQL DB=$db - Backup Never Run - Threshold=$backup_age[$thishost] $bckrerun_msg"

Description: A backup was never run for the DB listed.
Corrective Action: Run a backup for this DB.

DBA811 - "MSSQL DB=$db - $rounded1-$rounded2 Hours Since Last Good Backup - Threshold=$backup_age[$thishost] $bckrerun_msg"

Description: The number of hours since the most recent successful backup exceeds what you specified in the "Backup_Age:" DBC parameter.
Corrective Action: Run a backup.

DBA811 - "$rounded1-$rounded2 Hours Since Last Good Backup (Threshold=$backup_age[$thishost])"

Description: The number of hours since the most recent successful backup exceeds what you specified in the "Backup_Age:" DBC parameter.
Corrective Action: Run a backup.

DBA813 - "$bckage_full_days Days Since Last Good Full DB Backup For DB=$bckage_full_db - Threshold=$backup_age_full[$thishost] days"

Description: The number of days since the most recent successful FULL DB backup exceeds what you specified in the "Backup_Age_Full_Days:" DBC parameter. Note that this check is different from the DBA811 check, which looks at hours since any good backup. This check looks at the number of days since the last successful FULL DB backup. We saw the need for this with a DB that was getting a good Differential backup every day, but the FULL DB backup was failing every week. What can happen in this case is that the most recent FULL DB backup will age off of your backup media, which makes the recent successful Differential backups worthless.
Corrective Action: Run a FULL DB backup.
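
To make the difference between the DBA811 and DBA813 checks concrete, here is a minimal Python sketch. It is not DBAmon's implementation; the function name and arguments are hypothetical, and the two thresholds simply stand in for the Backup_Age: (hours) and Backup_Age_Full_Days: (days) DBC parameters.

```python
from datetime import datetime, timedelta
from typing import List


def backup_age_events(now: datetime,
                      last_good_any: datetime,
                      last_good_full: datetime,
                      backup_age_hours: float,
                      backup_age_full_days: float) -> List[str]:
    """Return the event IDs that the two backup-age checks would raise."""
    events = []
    # DBA811: too many hours since the last good backup of any type.
    if now - last_good_any > timedelta(hours=backup_age_hours):
        events.append("DBA811")
    # DBA813: too many days since the last good FULL backup.
    if now - last_good_full > timedelta(days=backup_age_full_days):
        events.append("DBA813")
    return events
```

In the scenario from the description (a good Differential 6 hours ago, but no good FULL backup for 20 days), only DBA813 would be returned with thresholds of 24 hours and 8 days.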

DBA814 - "Found $bt_vss_cnt_L24H VSS Backup(s) During L24H on AO Cluster Instance"

Description: We have found that VMware VSS backups against AlwaysOn instances cause problems. This event occurs when there have been any VSS DB backups during the last 24 hours on an AO instance.
Corrective Action: Stop running VSS backups for an AlwaysOn instance.

DBA817 - There are $zcount *SQL* Windows Services set to AUTOSTART which are Not Running: $zservices

Description: DBAmon (using WMI) checks all Windows Services with the string SQL and set to AUTOSTART to see if they're running. If they are not, then this DBAmon event will occur.
Corrective Action: If the Services mentioned in the event should be running, start them in Windows Services. If the service(s) mentioned are ones that you don't want to have running (possibly like the SQL Browser service), then just change them from AUTO Start to MANUAL Start, so that DBAmon won't care if they're running or not.

DBA819 - MSSQL Login=01 ($zsa_name) Is Enabled

Description: It is a fairly well known MSSQL best practice to disable the SA login (Google for yourself for more information). If this is not the case at your shop, then suppress this event.
Corrective Action: Disable the SA login - for improved instance security:
Alter login [sa] disable; 

DBA850 - "DB=$fg_dbname Filegroup=$fg_fgname - DB File(s) are $fg_pctfull_vs_max % Full vs. Filegroup Total MAXSIZE"

Description: For a SQL DB file, you can specify a MAX size, or you can set MAX to UNLIMITED. All DB files are part of a SQL FILEGROUP. If you do not specify a FILEGROUP, then SQL places the file into a FILEGROUP called PRIMARY. This event states that, for the named FILEGROUP, the total size of all DB files is approaching the total MAX filesize for all files in this FILEGROUP. Note that if you have ANY DB files in a filegroup with MAX size set to UNLIMITED, then this event will never occur. In that case, you need to monitor the DRIVES for fullness.
Corrective Action: Either increase the MAX size of any or all files in this FILEGROUP, or add an additional file to this FILEGROUP. This check is controlled by the
T_FG_Full: parameter. To disable this check, specify N for the T_FG_Full: parameter.
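
The percentage reported by DBA850 can be illustrated with a short Python sketch. It is not taken from DBAmon itself; the function name, the (current size, MAX size) tuple layout, and the use of None to represent UNLIMITED are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

# One DB file: (current_size_mb, max_size_mb). max_size_mb=None means UNLIMITED.
DbFile = Tuple[float, Optional[float]]


def filegroup_pct_full(files: Sequence[DbFile]) -> Optional[float]:
    """Percent full of a filegroup versus its total MAXSIZE.

    Returns None when the check cannot apply: an empty filegroup, or any
    file with MAX set to UNLIMITED (there is no finite total to compare to).
    """
    if not files or any(max_mb is None for _, max_mb in files):
        return None
    total_size = sum(size_mb for size_mb, _ in files)
    total_max = sum(max_mb for _, max_mb in files)
    if total_max <= 0:
        return None
    return 100.0 * total_size / total_max
```

For two files of 900/1000 MB and 450/500 MB the sketch returns 90.0; adding a single UNLIMITED file makes it return None, which mirrors the note that the event never occurs in that case.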

DBA851 - "DB Files On Drive ${dg_drive}: Have the Potential to Grow: $dg_potgrowth_mb (MB) Which is At or Near Drive Freespace: $dg_drivefree_mb (MB) - % of Freespace: ${dg_pct}% Warning/Critical Threshold=${t_dg_critical}%"

Description: See event DBA850 above. DBAmon adds up the MAX DB file sizes for all files that reside on each DRIVE. If the total GROWTH POTENTIAL (the MAX size of each file minus the current size of the file) exceeds the amount of freespace on that drive, then this event will occur. The reason for the event is that if a file attempts to GROW beyond the capacity of the DRIVE, then SQL errors occur.
Corrective Action: Either increase the size of the DRIVE or reduce the MAX size which is causing this event (the possibility of DRIVE OVERFLOW). This check can be disabled by specifying the
T_FG_Full: parameter as N.
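
The GROWTH POTENTIAL arithmetic behind DBA851 can be sketched as follows. This is an illustrative Python example under stated assumptions, not DBAmon's code: the function name and the (current size, MAX size) tuples are hypothetical, and files are assumed to have a finite MAX size.

```python
from typing import Iterable, Tuple


def drive_growth_potential(files: Iterable[Tuple[float, float]],
                           drive_free_mb: float) -> Tuple[float, float]:
    """Return (potential_growth_mb, pct_of_freespace) for one drive.

    files: (current_size_mb, max_size_mb) for every DB file on the drive.
    """
    # Growth potential of a file is its MAX size minus its current size.
    potential = sum(max(max_mb - size_mb, 0) for size_mb, max_mb in files)
    if drive_free_mb <= 0:
        # No free space at all: any potential growth is an overflow risk.
        return potential, float("inf") if potential > 0 else 0.0
    return potential, 100.0 * potential / drive_free_mb
```

With files of 500/1000 MB and 200/400 MB on a drive that has 1000 MB free, the potential growth is 700 MB, or 70% of freespace. A result at or above 100% means the files could outgrow the drive.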

DBA852 - "PerfCounter $this_metric=$perf_value - WarningThreshold=$zwarn","PerfCounter-${this_metric}","DBA852"

Description: DBMS=MSSQL - This event will only occur if you have set the T_PerfCounter DBC parameter. The current reading of this Performance Counter exceeds either the Warning or Critical threshold that you specified.
Corrective Action: Depends on the Performance Counter.

DBA853 - "DBMail Status NG - Should be 'sent' - Status: $dbmail_status[$thishost]","DBMail_NG","DBA853"

Description: This event only applies to SQL 2005+ (DBMail). The view MSDB..SYSMAIL_ALLITEMS was examined to determine the current status of DBMail. Specifically, the last (most recently created) row was examined. The SENT_STATUS column of this row did not contain Sent, so something is wrong.
Corrective Action: Fix DBMail. Look at the most recently created row in MSDB..SYSMAIL_ALLITEMS to see the same data that DBAmon examines.

DBA854 - "Longest Running DB Backup Has Been Running for $lrbackup_age_hrs[$thishost] hours (>= Threshold: $t_lrbackup_age_hrs[$thishost] hours) - Process Status: $lrbackup_message[$thishost]","LongRunningDBBackup","DBA854"

Description: DBAmon checks MASTER..SYSPROCESSES for any CMD which contains (caseless) BACKUP. If that process has been running more than (the threshold value displayed here) hours, then this event will occur. This indicates that there is possibly/probably a HUNG BACKUP.
Corrective Action: If the backup is HUNG, then KILL the SPID listed in this event message.

DBA855 - "Poorly Performing DB Parms Specified = DB=$F3 Parm(s)=($F4)","BadDBParms","DBA855"

Description: DBAmon checks the DB Attributes (options) for all DBs in a SQL instance >= 2005. If any of these parms are set to TRUE, this event will occur. At the present time, we are checking for AUTOCLOSE and AUTOSHRINK.
Corrective Action: These parms are BAD for performance. Turn them off. :) Here is a good article on AUTOSHRINK and AUTOCLOSE:
http://support.microsoft.com/kb/315512.

DBA856 - SUSPECT PAGES Found (N rows) - Investigate immediately

Description: The MSDB..SUSPECT_PAGES system table is checked for rows. A row in this table indicates DB CORRUPTION. A Google search will show you more details.
Corrective Action: You will need to restore these broken page(s). A good article is
http://msdn.microsoft.com/en-us/library/ms175168.aspx

DBA857 - TEMPDB DB File Count ($a) less than CPU Count ($this_cpucount) - Optimum TEMPDB DB File Count: $b,TEMPDB-File-Count

Description: For performance reasons, it is a good practice to have multiple TEMPDB database files on multi-core servers. The problem is that page allocation waits will occur on TEMPDB when multiple concurrent processes try to allocate pages. This
Article (mssqltips.com) explains the benefits of having multiple TEMPDB files. DBAmon recommends that you have 1 TEMPDB database file per usable CORE, up to a maximum of 4 database files.

Some good articles on TEMPDB:

  • A good article from bradmcgehee.com
  • Another good article from Jonathan Kehayias

    Corrective Action: Add TEMPDB files to get to the optimum DB file count mentioned in this message. Make sure that they are of a uniform size and that the size is preallocated.
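
The recommendation above (1 TEMPDB database file per usable core, up to a maximum of 4) amounts to a one-line calculation. The Python sketch below is only an illustration; the function names are hypothetical, and raising the event whenever the current count is below the optimum is an assumption about how the comparison is made.

```python
def optimum_tempdb_files(usable_cores: int, cap: int = 4) -> int:
    """One TEMPDB data file per usable core, capped (4 per the text above)."""
    return max(1, min(usable_cores, cap))


def needs_more_tempdb_files(current_files: int, usable_cores: int) -> bool:
    """True when the instance has fewer TEMPDB files than the optimum."""
    return current_files < optimum_tempdb_files(usable_cores)
```

For example, a 16-core server would be advised to have 4 files, while a 2-core server would be advised to have 2.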


    DBA858 - MSSQL Instance SIGNAL_WAIT_PCT=$a (Threshold=$b%),Signal-Wait-${this_signal_wait_pct}%

    Description: This performance event indicates that the current SQL SIGNAL WAIT % value exceeds the threshold (defaults to 25%). A good internet article explaining the concept can be found:
    Here (Pinal Dave's SQLAuthority WWW site).
    Corrective Action: A sustained high SQL SIGNAL WAIT % indicates a need for additional CPUs.

    DBA859 - $zvlf_event_cnt MSSQL Database(s) VLF Count >= DBC Threshold: $t_mssql_vlf_count[$thishost]","VLF-Count"

    Description:

    A high number of SQL Virtual Log Files (VLFs) can cause DB update problems and can elongate DB recovery time. Having a high number of VLFs is caused by having a GROWTH setting on the TLOG with a low growth increment. Here are some good articles:


    Corrective Action:

    A good script to consolidate VLFs can be found in the ADVENTURESINSQL article in the URLs above.

    The steps:

    1. Count VLFs by running: DBCC LOGINFO (DBNAME)
    2. In DB Properties, note the size of the TLOG (used size, and MAX).
    3. The reason that this happens (too many VLFs) is that AUTOGROW is ON with a small increment. Increase the AUTOGROW increment.
    4. If the DB that you're working with is in FULL recovery model, make these changes immediately following a successful TLOG backup.
    5. Shrink the TLOG as much as possible (down to almost 0 if possible).
    6. Again count VLFs by running: DBCC LOGINFO (DBNAME). The number should now be much smaller if the SHRINK worked.
    7. In DB Properties, Files set the current size (on the left) to what you noted above before you began.

    DBA860 - "MSSQL Instance Max Memory ($F3) is set to INFINITE ($sql_config_msm)","MaxMemINFINITE"

    Description:

    This event is telling you that you are using the out-of-the-box default instance setting for "Max Server Memory (MB)". This means that MSSQL is free to grab as much memory as it needs from the OS for the MSSQL Buffer Pool. Although it may be a matter of opinion, it is a good practice to deliberately set the MSSQL Max Server Memory setting, rather than let it default. As with any DBAmon event, you can suppress it with the Suppress_Events: DBC parameter if you disagree.

    Corrective Action:

    I suggest Googling "sql server max server memory". You will see MANY hits.

    Our practice on a server that is ONLY running MSSQL is to leave 500MB-750MB for Windows, and up to 1GB for other MSSQL memory (the max server memory parameter only specifies the size of the buffer pool), and then give what's left to max server memory. Of course, if your server is running other applications in addition to MSSQL, your mileage will vary (you will have to reserve some memory for your app).
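
The sizing rule above is simple subtraction, and a small Python sketch may make it easier to apply. The function name and defaults are assumptions for illustration: 750 MB for Windows and 1024 MB for other MSSQL memory are the upper ends of the ranges quoted above, and other_apps_mb is a hypothetical knob for servers that also run other applications.

```python
def suggested_max_server_memory_mb(total_ram_mb: int,
                                   windows_reserve_mb: int = 750,
                                   other_sql_reserve_mb: int = 1024,
                                   other_apps_mb: int = 0) -> int:
    """Memory left for the buffer pool after the reservations are taken out."""
    remaining = (total_ram_mb - windows_reserve_mb
                 - other_sql_reserve_mb - other_apps_mb)
    if remaining <= 0:
        raise ValueError("reservations exceed total server memory")
    return remaining
```

On a dedicated 16 GB (16384 MB) server this gives 14610 MB, which is the value you would then enter for "Max Server Memory (MB)".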


    DBA861 - Database(s) Found With GOOD-For-Performance Options Set to OFF

    Description: DBAmon has found that there are one or more MSSQL databases which do not have OPTIONS specified which are good for MSSQL Performance. For example, the AUTO CREATE STATISTICS option is known to be good for performance. So, if this OPTION is not specified, then this event will occur.
    Corrective Action: Set the options mentioned in this event to ON or TRUE. If you have a good reason to have these turned off, then of course you can suppress this event by specifying it in the "Suppress_Events" parameter of the DBC file for this instance.

    DBA862 - Replication Distribution Agent(s) Found Using Non-Default Agent Profile

    Description: DBAmon has found that there are one or more MSSQL Replication Distribution Agents that are not using the DEFAULT agent profile. This is monitored because I have found times when we need to (for example) skip a certain type of error. The danger is in forgetting to switch back to the DEFAULT profile after the problem has been resolved. Not using the DEFAULT profile, IMHO, is a poor practice as you can skip errors that you really want to know about.
    Corrective Action: If applicable to your environment, switch back to the "Default Agent Profile" profile. As with any DBAmon event, if this is normal for your environment, you may suppress this event by specifying it in the "Suppress_Events" parameter of the DBC file for this instance.

    DBA863 - Too Many Undistributed Replication Commands - Critical Threshold of ($t_repl_undist_cmds_c[$thishost]) Exceeded

    Description: DBAmon has found that one or more MSSQL Replication Distribution Agents have an excessive number of undistributed SQL commands (the number exceeds the T_Repl_Undistributed_Cmds: DBC parameter).
    Corrective Action: Since one or more Distribution Agents are not distributing commands, read the long text of this report - and restart the appropriate Distribution Agent(s). It could be that distribution agents are failing or not running at all.

    DBA867 - Found $perf_val MSSQL Schedulers which are VISIBLE OFFLINE

    Description: In MSSQL it is possible for a Windows server to have more CPUs than the engine can use - for a variety of reasons. This event occurs when at least 1 row in SYS.DM_OS_Schedulers indicates a VISIBLE OFFLINE scheduler.
    Corrective Action: You may search Google for yourself - this is a good hit:
    https://www.mssqltips.com/sqlservertip/4801/sql-server-does-not-use-all-assigned-cpus-on-vm/ .

    DBA901 - "Oracle/NT Not Active - Status=$dbstatus"

    Description: Oracle is down.
    Corrective Action: Start Oracle.

    DBA903 - "Oracle/NT Listener Not Active - Status=$lsnr_status"

    Description: DBAmon attempted to run 'tnsping $ORACLE_SID'. It failed.
    Corrective Action: Determine the cause of the problem and fix it!

    DBA904 - "Found $login_missingdefdb_cnt SQL Login(s) Where Default DB Does Not Exist"

    Description: DBAmon found one or more SQL Logins whose Default Database does not exist. This will prevent the login from connecting to SQL, and the problem is challenging to diagnose. This can happen when a login is created using a default DB that does exist, and that DB is later dropped.
    Corrective Action: In the body of this event you will see syntactically correct DDL to set the default DBs of Logins with this problem to Default DB=Master.

    DBA905 - "Connect Logic Error - Probe dbamon_orant.pl Did Not Finish"

    Description: A DBAmon probe module failed.
    Corrective Action: Examine accompanying messages. Contact
    DBAmon Support.

    DBA909 - "DBAMON.TIMESTAMP Has > $timestamp_rowlimit rows ($timestamp_rows[$thishost])"

    Description: The number of rows in this table exceeds the threshold. The purge process must not be working.
    Corrective Action: Contact BB.

    DBA910 - "db_block_buffer Read Hit Ratio of $bufhitratio[$thishost] < threshold of $t_readhit[$thishost]"

    Description: The Oracle db_block_buffer hit ratio (specified in the message) is below the threshold specified for this instance by the T_Read_Hit: DBC parameter (or the default). This DB is performing poorly.
    Corrective Action: Increase the db_block_buffers init.ora parameter.

    DBA911 - "db_block_buffer Read Hit Ratio of $bufhitratio[$thishost] is invalid (< 0 or > 100)"

    Description: While examining the Oracle db_block_buffer hit ratio, the value was found to be invalid. Check the accompanying text to see if some error occurred while querying the Oracle dictionary.
    Corrective Action: Solve the problem which was causing the query to return invalid data.

    DBA912 - RMANHUNG: Found rman process (pid=$) that has been running for $ days (threshold=2 days) !!!

    Description: A Unix process containing the string rman was found to have been running for more than the threshold number of days. It is probably a dead process which may be consuming resources even though it is not doing anything useful.
    Corrective Action: Kill the hung process at the UX level. If RMAN was invoked from a backup script, also make sure that you kill the script.

    DBA913 - $num_otrace[$thishost] ORACLE_HOME/otrace/admin/*.dat Files Found - OTRACE Is ON Which Causes Performance Problems!!!

    Description: OTRACE is on for this DB because .dat files were found in ORACLE_HOME/otrace/admin. OTRACE can be bad for performance, so it should be turned off. You turn it off by rm'ing the .dat files in ORACLE_HOME/otrace/admin and restarting Oracle.
    Corrective Action: Turn off OTRACE. You turn it off by rm'ing the .dat files in ORACLE_HOME/otrace/admin and restarting Oracle.

    DBA914 - Instance SQL_TRACE=TRUE - This Causes Performance Problems!!!

    Description: SQL_TRACE set to true at the instance level will cause serious performance problems.
    Corrective Action: Turn off SQL_TRACE. You turn it off by running ALTER SYSTEM SET SQL_TRACE=FALSE And/Or removing this setting from init.ora.

    DBA915 - $dfltsys_cnt DB Users Found With DEFAULT TABLESPACE Set To SYSTEM !!!

    Description: DBAmon found DB users whose default tablespace is SYSTEM. This is very bad for performance.
    Corrective Action: Alter the user so that their default tablespace is not SYSTEM.

    DBA916 - $tempsys_cnt DB Users Found With TEMPORARY TABLESPACE Set To SYSTEM !!!

    Description: DBAmon found DB users whose temporary tablespace is SYSTEM. This is very bad for performance.
    Corrective Action: Alter the user so that their temporary tablespace is not SYSTEM.

    DBA917 - $tempperm_cnt DB Users Found Whose TEMPORARY TABLESPACE Is a PERMANENT Tablespace !!!

    Description: DBAmon found DB users whose temporary tablespace is a permanent tablespace. This is very bad for performance.
    Corrective Action: Alter the user so that their temporary tablespace is a TEMP tablespace, or alter their temp tablespace to be a type=TEMP tablespace.

    DBA918 - *** oraUp() DBA_REGISTRY Shows that component (Oracle9i Catalog Views ) is at version 9.2.0.2.0 which is less than DB engine version 9.2.0.3.0(64) - rg_status=VALID ***

    Description: Starting in 9i, the Oracle dictionary catalog contains components that are registered products. This event can also occur for DB internal components like Java. The event means that the version of the product mentioned is lower than the version of the DB engine. What probably happened is that the DB was upgraded within the same version (9i for example) and catpatch was not run.
    Corrective Action: Run:
    • shutdown immediate
    • startup migrate
    • @?/rdbms/admin/catpatch
    • shutdown immediate
    • startup
    Next, verify that the versions of the internal components match the DB engine version. Run orastat -rg.

    DBA919 - MTS is being used for this Non-RAC/OPS instance - Bad for performance - mts_queue=$mts_queue[$thishost]

    Description: This instance is not using RAC or OPS, but MTS is being used. This can cause major performance problems. So, it would be best to only use DEDICATED SERVER connections.
    Corrective Action: An easy way to effectively disable MTS is to set USE_DEDICATED_SERVER=ON in sqlnet.ora.

    DBA920 - TEMP Tablespace is a PERMANENT Tablespace (It Should Be TEMPORARY) !!!

    Description: DBAmon found that your tablespace named TEMP is a permanent tablespace. This can be very bad for performance. Any user which has this tablespace specified for its TEMPORARY TABLESPACE will perform poorly when doing disk sorts.
    Corrective Action: Alter the TEMP tablespace so that it is a type=TEMP tablespace. SQL:
    ALTER TABLESPACE TEMP TEMPORARY;

    DBA921 - Library Cache Hit Ratio is $libhitratio[$thishost]% (Should Be >= 90%) - Increase shared_pool_size !!!

    Description: The Oracle library cache hit ratio was found to be < 90%.
    Corrective Action: Increase the init.ora shared_pool_size parameter.

    DBA922 - Dictionary Cache Hit Ratio is $dicthitratio[$thishost]% (Should Be >= 90%) - Increase shared_pool_size !!!

    Description: The Oracle dictionary cache hit ratio was found to be < 90%.
    Corrective Action: Increase the init.ora shared_pool_size parameter.

    DBA923 - Rollback Segment Header Waits Are Too High (Gets/Waits Ratio > 1.00%) - Add Rollback Segments

    Description: The total number of Rollback Segment Header Waits exceeds 1% of the total number of Rollback Segment Header Gets.
    Corrective Action: Add more rollback segments.
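
The DBA923 test is a simple ratio of waits to gets. The following Python sketch shows the arithmetic only; the function names are hypothetical and this is not DBAmon's own code.

```python
def rollback_header_wait_pct(gets: int, waits: int) -> float:
    """Rollback segment header waits as a percentage of header gets."""
    if gets == 0:
        return 0.0  # no activity yet, so nothing to report
    return 100.0 * waits / gets


def too_many_header_waits(gets: int, waits: int,
                          threshold_pct: float = 1.0) -> bool:
    """True when waits exceed the threshold (1% per the description)."""
    return rollback_header_wait_pct(gets, waits) > threshold_pct
```

For instance, 1,500 waits against 100,000 gets is 1.5% and would trip the check, while 500 waits (0.5%) would not.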

    DBA924 - There are fewer than 10 Free DB Cache Buffers (free_buffers=$free_buffers[$thishost]) - It would be beneficial to increase DB Cache Size

    Description: The total number of FREE DB Cache buffers is less than 10. So, this DB would probably benefit from you specifying a larger DB buffer cache.
    Corrective Action: If you have enough unused memory, increase the size of the DB Buffer Cache.

    DBA925 - UNDO_MANAGEMENT Is not set to AUTO - It should be - undo_mgmt=$undo_mgmt[$thishost]

    Description: In Oracle 9i and higher, SMU (System Managed UNDO) is a GOOD THING. It should always be on. For this 9i+ instance, it is not turned on.
    Corrective Action: Turn on SMU. This will require DB downtime.

    DBA926 - FORCE LOGGING Should be turned on - force_logging=$force_logging[$thishost]

    Description: There is a very nice feature in Oracle 9 and higher called FORCE LOGGING. If this DB option is ON, then NOLOGGING operations are all automatically disallowed.
    Corrective Action: Run: ALTER DATABASE FORCE LOGGING

    DBA927 - ? Dictionary Objects have been ANALYZED - This is bad for performance

    Description: Analyzing the Oracle Dictionary can cause some very serious and hard to diagnose performance problems. One symptom is high "recursive CPU" in a statspack report. If any SYS objects other than DUAL or PLAN_TABLE have been analyzed, this event will occur.
    Corrective Action: Remove the analyze data for the SYS objects. Run: execute dbms_stats.delete_schema_stats('SYS');

    DBA928 - DB Cache is only 1 granule in size - It was probably underspecified - granule_size=$granule_size[$thishost] db_cache_size=$db_cache_size[$thishost]?

    Description: It was determined that this 9i or higher instance has a DB Buffer Cache that is only 1 granule in size. This can occur when Oracle sees that you have specified a db_cache_size that is less than 1 granule. In this case Oracle will round up to 1 granule.
    Corrective Action: It is better to intentionally specify the cache size. And the default is 48M, so a good minimum db_cache_size is 50M. Change the init.ora file and bounce the instance.

    DBA929 - Server Memory is $hw_memory_pct_full[$thishost] used (t=$hw_memory_threshold_w/$hw_memory_threshold_c physmem=$hw_memory_size_gb[$thishost](gB) memfree=$hw_memory_free[$thishost](gB))

    Description: This server has very little free memory. On HP-UX, this is bad for performance (paging and swapping increase).
    Corrective Action: Reduce memory usage or increase the amount of memory. If you have any Oracle instances with overallocated SGA memory, reduce memory consumption.

    DBA930 - The instance default PERM tablespace is set to SYSTEM - Could cause performance problems

    Description: The DEFAULT TEMPORARY or PERMANENT tablespace is set to SYSTEM. If non-dictionary objects are created in SYSTEM, performance problems will probably result.
    Corrective Action: ALTER DATABASE DEFAULT TABLESPACE tsname; (10g+ only)-or- ALTER DATABASE DEFAULT TEMPORARY TABLESPACE tsname; (9i+ only)

    DBA931 - RMAN Process $F5 was automatically KILLED

    Description: An RMAN process was found running on this server which:
    1. Has a Parent Pid of 1 (this indicates that it has become orphaned)
    2. Has been running for at least 5 minutes
    3. Is consuming >= 50% of 1 CPU
    This is indicative of an orphan RMAN process which is consuming CPU resources and is not accomplishing anything useful. DBAmon has issued the OS kill command against this process.
    Corrective Action: (None)
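
The three DBA931 conditions listed above combine with a logical AND. Here is a minimal Python sketch of that test; the ProcInfo class and its field names are hypothetical stand-ins for whatever process data the probe collects, and the sketch deliberately does not kill anything.

```python
from dataclasses import dataclass


@dataclass
class ProcInfo:
    command: str            # command line of the process
    ppid: int               # parent process ID
    elapsed_minutes: float  # how long the process has been running
    cpu_pct: float          # CPU use as a percentage of 1 CPU


def is_orphan_rman(p: ProcInfo) -> bool:
    """True only when all three conditions from the description hold."""
    return ("rman" in p.command.lower()
            and p.ppid == 1              # orphaned: adopted by init
            and p.elapsed_minutes >= 5   # running for at least 5 minutes
            and p.cpu_pct >= 50)         # consuming >= 50% of 1 CPU
```

An RMAN process that still has a live parent, or one that is idle, would not match and would be left alone.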

    DBA932 - UX NUSERPROC (HP-UX maxuprc Kernel Value) $os_nuserproc_pct[$thishost]% Used - MAXUPRC=$os_maxuprc[$thishost] OSUserProcCount=$os_nuserproc[$thishost] (Thresholds: $t_nuserproc_w[$thishost]%/$t_nuserproc_c[$thishost]%)

    Description: This event is unique to HP-UX. There is a UX kernel parameter maxuprc. This parameter controls the number of OS processes that any 1 UX userid can have running concurrently. If this is exceeded, the OS will not fork any new processes until the process count is reduced below this value. This can be VERY BAD for a running DB. DBAmon monitors the current OS process count against the configured maxuprc kernel value as a percentage.
    Corrective Action:
    • Short Term: Kill any unneeded processes that are owned by this user.
    • Long Term: Increase the maxuprc HP-UX kernel parameter.
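
The percentage in the DBA932 message is the user's current process count divided by maxuprc. A short Python sketch of that comparison follows; the function name and the returned labels are hypothetical, and the warning and critical thresholds are passed in because their defaults are not stated here.

```python
from typing import Tuple


def nuserproc_status(proc_count: int, maxuprc: int,
                     warn_pct: float, crit_pct: float) -> Tuple[float, str]:
    """Return (percent of maxuprc used, "OK" / "WARNING" / "CRITICAL")."""
    pct = 100.0 * proc_count / maxuprc
    if pct >= crit_pct:
        return pct, "CRITICAL"
    if pct >= warn_pct:
        return pct, "WARNING"
    return pct, "OK"
```

With maxuprc=500 and thresholds of 80%/90%, a user with 400 processes is at the warning level and a user with 450 processes is at the critical level.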

    DBA934 - "Found $ssrs_failed_job_cnt Unsuccessful SSRS Report(s)"

    Description: For MSSQL instances that are running SSRS, DBAmon checks ReportServer..ExecutionLog2 for failed reports. So, if any reports that have run since the last iteration have a STATUS of something other than "rsSuccess", this event will occur.
    Corrective Action: There are many good resources on the internet on how to diagnose SSRS Report failures. Google away!

    DBA934 - Complex DB User Passwords Enforced

    Description: Information only event. This DB has a UTLPWDMG routine active (password_verify_function) in the DEFAULT profile.
    Corrective Action: No action required.

    DBA940 - DB Block Corruption - $F3 Block segments

    Description: Rows were found in V$DATABASE_BLOCK_CORRUPTION.
    Corrective Action: Restore the corrupted blocks or datafiles, or drop the corrupt datafile.

    DBA941 - DB Has $autoextend_cnt[$thishost] AUTOEXTEND=YES datafiles

    Description: This DB has at least 1 datafile with Autoextend set to YES. This makes it impossible for DBAmon to monitor for full tablespaces.
    Corrective Action: This event does not indicate a problem with your DB, but DBAmon will only monitor for tablespace full if you disable this attribute for all tablespaces.

    DBA942 - MSSQL Error=18456 Invalid Login(s) Found in Error Log

    Description: DBAmon has detected one or more 18456 SQL Login Failure events during this DBAmon iteration.
    Corrective Action: Depends on your shop. For details of where the failed login attempt originated and the login name, look at the SQL Error Log. If you're looking for a good guide on how to troubleshoot MSSQL 18456 errors, try:
    Aaron Bertrand's Excellent Article.

    DBA943 - Found $lr_job_cnt Long Running SQL Job(s)

    Description: DBAmon monitors MSSQL for long running SQL Jobs. This event means that one or more SQL Jobs has been running for at least 12 hours.
    Corrective Action: If this job should be running for more than 12 hours, then add all or part of the Job name to the DBC parameter: MSSQL_Job_LR_Check_Name_Excl_String: .

    DBA944 - Large DB Transaction(s): $tr_reason

    Description: DBAmon monitors MSSQL for long running SQL Transactions. This event means that one or both of these conditions have occurred:
  • An active transaction has been running for at least DBC: MSSQL_Tran_Duration_Hrs: hours.
  • An active transaction is consuming at least DBC: MSSQL_Tran_TLOG_GB: of TLOG space ("Reserved" space).
    Corrective Action: If this job should be running for more than 12 hours, then add all or part of the Job name to the DBC parameter: MSSQL_Job_LR_Check_Name_Excl_String: .
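
The two DBA944 conditions in the description are independent, and either one is enough to raise the event. The Python sketch below illustrates this; the function name and reason strings are hypothetical, and the two limits stand in for the MSSQL_Tran_Duration_Hrs: and MSSQL_Tran_TLOG_GB: DBC parameters.

```python
from typing import List


def large_transaction_reasons(duration_hrs: float, tlog_reserved_gb: float,
                              max_duration_hrs: float,
                              max_tlog_gb: float) -> List[str]:
    """Return the reasons (possibly both, possibly none) for flagging
    one active transaction."""
    reasons = []
    if duration_hrs >= max_duration_hrs:
        reasons.append("duration")
    if tlog_reserved_gb >= max_tlog_gb:
        reasons.append("tlog_space")
    return reasons
```

An empty list means the transaction is within both limits; a non-empty list corresponds to the $tr_reason portion of the event message.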

    DBA945 - MSSQL SvcAcct=$mssql_service_account[$thishost] Exception - Password Will Expire in $svcacct_return Days at: $svcacct_ts (LE $svcacct_t_crit)

    Description: DBAmon monitors the number of days until the Password for the MSSQL AD Service Account expires. This event means that either the WARNING or CRITICAL thresholds have been met.
    Corrective Action: You must change the MSSQL Service Account password before it expires, but you also must change the SERVICE Login Credentials.

    DBA946 - Found $tlogmultifile_cnt Database(s) with Multi-File TLOGs

    Description: DBAmon monitors for MSSQL databases with more than 1 TLOG file. While there is one good reason to have multiple TLOG files (space problems - due to a variety of reasons - which force you to create a 2nd TLOG file on a different drive) it is a poor practice to keep them after the space problems are resolved. This check falls under the "good housekeeping" category. So, if a DB has more than one TLOG file, this event will occur.

    Here is a good article from Paul Randal on this: http://www.sqlskills.com/blogs/paul/multiple-log-files-and-why-theyre-bad/ .

    Corrective Action: Here are some steps which should help you to remove additional TLOG files (secondary TLOG files):

    -- This process assumes that the DB in question is in FULL recovery model
    
    -- 1. Take 2 TLOG backups. You want to see "BACKUP LOG successfully processed 0 pages" for the TLOG file that you're deleting.
    BACKUP LOG [bbtest3] TO  DISK = N'C:\MSSQL\MSSQL11.MSSQLSERVER\MSSQL\Backup\bbtest3.bak' 
    WITH NOFORMAT, NOINIT,  NAME = N'bbtest3-Transaction Log  Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
    GO
    
    -- 2. Shrink the log to EMPTY. Look at the output to ensure that this worked.
    DBCC SHRINKFILE ('bbtest3_log2', EMPTYFILE )  
    
    -- 3. To delete the file from the database, run this:
    ALTER DATABASE bbtest3  REMOVE FILE bbtest3_log2;
    
    -- 4. You will still see the file here (sys.master_files), but the STATE_DESC should show OFFLINE.
    SELECT * FROM sys.master_files
    
    -- 5. Run another TLOG backup. 
    BACKUP LOG [bbtest3] TO  DISK = N'C:\MSSQL\MSSQL11.MSSQLSERVER\MSSQL\Backup\bbtest3.bak' 
    WITH NOFORMAT, NOINIT,  NAME = N'bbtest3-Transaction Log  Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
    GO
    
    -- 6. Now, you should NOT see the deleted TLOG file in sys.master_files:
    SELECT * FROM sys.master_files
    

    DBA947 - STACK DUMP(s) Found in SQL Log

    Description: DBAmon found occurrence(s) of STACK DUMP in the SQL Log. This indicates a serious problem with your MSSQL instance.
    Corrective Action: I would first try a Google search. If you don't find a good hit, then open a support case with the vendor.

    DBA948 - AlwaysOn Status Change

    Description: This event occurs when DBAmon finds SQL Event 35278 in the SQL Error Log.
    Corrective Action: Informational, not actionable.

    DBA949 - COMPATIBLE Version ($this_compat) is < DBMS software version ($this_ver)

    Description: The COMPATIBLE parameter is set a full version lower than the version of the DBMS software.
    Corrective Action: Set COMPATIBLE to the same version as the DBMS software.

    DBA950 - sqlTrends: Metric ZZZ - Recent Reading is $nnn Percent versus Historical Reading (T=$tr_threshold)

    Description: Trend analysis logic within DBAmon has determined for this Metric that the recent reading (last N days) versus this historical reading (last N2 days) exceeds the threshold of $tr_threshold percent.
    Corrective Action: Possibly nothing - depends on your environment. This is just to make you aware of a change in behavior for this Metric.

    DBA951 - Found $autoreorg_cnt Database(s) Requiring Auto-Reorg/Reindex

    Description: The MSSQL database(s) mentioned in this event have not had an Auto-Reorg/Reindex from our tool.
    Corrective Action: Configure these databases to be Auto-Reorg'd at least on a weekly basis.

    DBA955 - "IO Slave Count Of $numioslave Is >= $maxioslave_pct% of $maxioslave_cnt Maximum"

    Description: The number of I/O slave processes is approaching 40. This is probably caused by hung RMAN processes, or dbwr_io_slaves set near 40 (the maximum).
    Corrective Action: If 40 is reached, you will not be able to run any RMAN backups. In that case, bounce the instance.

    DBA956 - "Server cron Daemon does not appear to be running - num_cron=$num_cron[$thishost]"

    Description: There is no cron daemon owned by root running on this server.
    Corrective Action: The cron daemon needs to be started. Assign this ticket to the OS group.

    DBA960 - "VMware SQL Server With High Historical I/O Rate Should Have Multiple PVSCSI Controllers"

    Description: This event has to do with MSSQL Performance.

    There was a SQLPASS Session in 2011 given by Wanda He (it was DBA-310) which deals with MSSQL on VMware. This talk had some excellent performance recommendations, one of which was to use multiple vSCSI (actually PVSCSI) controllers for servers that run MSSQL and have a significant I/O load.

    Our goal for our production environment therefore is to implement this recommendation. However, we only want to do this for *busy* servers running MSSQL (there is a downside to doing this for every server running MSSQL - including relatively *idle* servers. So, starting with DBAmon version 5.49.B, we monitoring for the following conditions:

    • The server is running on VMware.
    • The average I/O rate over the last 30 days is at least 200 I/O's per second.
    • The server has fewer than 2 PVSCSI controllers.
    If all of these conditions are true, then a DBA960 event will result.
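
    The three conditions above combine as a simple AND. A minimal sketch, with a hypothetical function name and argument names (how DBAmon actually gathers these values is not shown here):

```python
def needs_dba960(on_vmware, avg_iops_30d, pvscsi_controllers):
    """True when all three DBA960 conditions hold:
    running on VMware, 30-day average I/O rate of at least 200 per second,
    and fewer than 2 PVSCSI controllers."""
    return bool(on_vmware) and avg_iops_30d >= 200 and pvscsi_controllers < 2
```

    For example, a VMware guest averaging 350 I/Os per second with a single PVSCSI controller qualifies, while the same server with two controllers does not.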

    Some URLs which speak to this:

  • A VMware blog posting on how to achieve 350k I/Os per second using PVSCSI
  • A possible issue when using the PVSCSI controller
  • A VMware White Paper on how to achieve 1M I/Os per second using the PVSCSI controller

    Corrective Action: I am not a VMware administrator, but configure multiple PVSCSI adapters and balance your I/O load among these controllers.


    DBA961 - "MSSQL Login=SA Max Concurrent Logins: $tr_sa_maxsalogins (Last $tr_sa_days Day(s)) Exceeds Threshold=$t_sa_max_logins[$thishost]"

    Description: The maximum number of concurrent MSSQL Login=SA logins (sessions - from SYSPROCESSES) found during the preceding 24 hours exceeds the threshold specified in the T_SA_Logins_Max DBC parameter. If this parameter is not specified, then the default is 2.

    Corrective Action: It is a best practice to use the SA MSSQL login as little as possible (or not at all). If you have applications connecting using the SA login, then you should correct this by giving each application a distinct MSSQL Login, preferably a Windows-authenticated MSSQL Login.

    Here is the text that is produced with this event:

    It is an MSSQL Best Practice to use the SA login only when absolutely necessary and only by authorized (DBA) personnel. An excessive number of concurrent SA logins may indicate that an application is connecting to MSSQL using the SA login. If this is the case, you should create a Windows-authenticated MSSQL Login which is used exclusively by only one application. If this is not possible, create an MSSQL Login that is used by only one application.
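
    The threshold test for DBA961 can be sketched as below. The default of 2 comes from the description above. The function name and the idea of passing in a list of per-sample SA session counts are assumptions for illustration, since the way DBAmon stores its samples is not described here.

```python
DEFAULT_T_SA_LOGINS_MAX = 2  # default when T_SA_Logins_Max is not specified

def sa_logins_exceeded(samples, t_sa_logins_max=None):
    """Return (exceeded, peak) for a list of concurrent SA session counts.

    samples: one count per monitoring pass over the period (assumed shape).
    The event condition is peak > threshold.
    """
    limit = DEFAULT_T_SA_LOGINS_MAX if t_sa_logins_max is None else t_sa_logins_max
    peak = max(samples, default=0)
    return peak > limit, peak
```

    A peak of exactly 2 with the default threshold does not raise the event; only a peak of 3 or more does.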


    DBA970 - "Found X Databases With Compatibility Set to Lower than Instance Version: Y"

    Description: If the Compatibility setting of a database is less than that of the MSSQL instance, this event will occur. For me, this most often occurs when restoring a DB from another server - and I forget to set this to the value of the new instance. If you have this set incorrectly (lower than the instance), you could take a performance hit because you will not be able to take advantage of MSSQL optimizer improvements in the current version.

    Corrective Action: Set the Compatibility value to the highest possible value - in SSMS -> Database Properties.
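
    To illustrate what is being counted, here is a small sketch that picks out the databases whose compatibility level is below the instance's level. The database names, the dictionary input and the function name are hypothetical. The numeric levels follow the usual MSSQL convention (for example, 150 for SQL Server 2019).

```python
def low_compat_databases(db_levels, instance_level):
    """Names of databases whose compatibility level is lower than
    the instance level, sorted for stable output."""
    return sorted(name for name, level in db_levels.items()
                  if level < instance_level)

# Hypothetical instance at level 150 with one restored database left at 110.
levels = {"Sales": 150, "Legacy": 110, "HR": 150}
found = low_compat_databases(levels, 150)
```

    Here "found" contains only Legacy, so the event would report 1 database.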


    DBA980 - "Instance DBAmon Duration: $instance_duration[$i_rm] Sec (GE Threshold: $config_parm{'default_instance_dur_max_secs'})"

    Description: This means that it took DBAmon too long to perform its checks for this instance.
    Corrective Action: Check the probeout directory and the /opt/dbamon/log/instances directory for this instance to see where it is spending too much time.

    DBA990 - "Invalid Data: Drive=($dr_drive) shows a total size of ZERO - Something may be wrong with WMI?"

    Description: This is a critical event which is telling you about a near-fatal data problem. The message will vary depending on the problem.
    • Drive=(?) shows a total size of zero - It obviously is a contradiction for a Windows drive to have a total size of zero. Something is wrong with either WMI or the DBAmon-supplied command that queries the drives.

    Corrective Action: Depends. :@) The error should give you a starting place.

    DBA998 - "DB=$tl_dbname - TLog with UNLIMITED growth - Drive $tl_drive is $tl_drivepct full ($event_sev_long Threshold of $event_pct% exceeded - TLSize=$tl_tlsize(mB) TLPath=$tl_filename)"

    Description: The transaction log is full or almost full. This particular TLOG does not have a size limit. So, it is >= 90% full internally, and the disk where the TLOG resides is full or almost full.
    Corrective Action: Back up the transaction log.
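
    The two-part condition in the description (TLOG >= 90% full internally, and the drive at or past its threshold) can be sketched like this. The function and argument names are hypothetical, and treating the drive threshold as "greater than or equal" is an assumption, since the message only says the threshold was exceeded.

```python
def tlog_unlimited_alert(tlog_used_pct, drive_used_pct, event_pct,
                         tlog_internal_pct=90):
    """True when an unlimited-growth TLOG is nearly full internally
    AND the drive holding it has reached the event threshold.

    tlog_internal_pct=90 comes from the description; the >= comparison
    on the drive is an assumption.
    """
    return tlog_used_pct >= tlog_internal_pct and drive_used_pct >= event_pct
```

    A TLOG that is 95% full on a drive that is only half full would not raise this event, because the log can still grow.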
    DBAmon.com
    This Document: http://dbamon.com/errors.shtml