DBAmon
What DBAmon Monitors

What DBAmon Monitors: Oracle/UX

Category | Event | What DBAmon Monitors | Action Taken
Availability 1. Oracle Instance Running
(AutoFix)
  • Database OPEN
  • Background processes: pmon, smon, lgwr, dbw* running
  • Able to run SQL via svrmgrl/sqlplus
  • (AutoFix: Oracle Crash)
    When DBAmon finds the pmon task dead, it will attempt to start up Oracle if ALL of the following conditions are true:
  • The DBC Must_Be_Up: parameter is not set to N
  • The last 2 lines of the Oracle Alert Log are in the format:
    PMON: terminating instance due to error 474
    Instance terminated by PMON, pid = 7485 
  • The oerr output for the above error (474 in this case) contains the string Warm start instance.
    00474, 00000, "SMON process terminated with error"
    // *Cause:  The system cleanup process died
    // *Action: Warm start instance 
    If all of these conditions are true, DBAmon will attempt to start up the instance using svrmgrl/sqlplus. Note that even if DBAmon does successfully restart Oracle, a Critical Event will occur. You always want to know when Oracle is crashing, even if you do not have to restart it yourself, so that you can diagnose the problem and prevent it from recurring.
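The three restart conditions above can be sketched as a small decision function. This is an illustrative sketch only, not DBAmon's actual code: the function names are hypothetical, and ignoring blank lines at the end of the alert log is an assumption.

```python
import re

# Patterns taken from the two alert log lines quoted above.
PMON_LINE = re.compile(r"^PMON: terminating instance due to error (\d+)$")
TERMINATED_LINE = re.compile(r"^Instance terminated by PMON, pid = \d+$")


def crash_error_code(alert_log_lines):
    """Return the Oracle error number if the last two non-blank alert log
    lines match the PMON crash pattern, otherwise None."""
    tail = [line.strip() for line in alert_log_lines if line.strip()][-2:]
    if len(tail) < 2:
        return None
    first = PMON_LINE.match(tail[0])
    if first and TERMINATED_LINE.match(tail[1]):
        return int(first.group(1))
    return None


def should_attempt_restart(must_be_up, alert_log_lines, oerr_output):
    """Apply the three AutoFix conditions; all of them must hold."""
    if must_be_up == "N":
        return False
    if crash_error_code(alert_log_lines) is None:
        return False
    # oerr_output is the text printed by oerr for the error found above.
    return "Warm start instance" in oerr_output
```

The caller would pass the error number from crash_error_code to oerr and feed the resulting text back in as oerr_output.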
  • Availability 2. Listener Running
    (AutoFix)
  • Is Listener running
  • Successful "status" command
  • (AutoFix: Listener Down)
    Listener automatically started (lsnrctl start).
  • Availability 3. Tablespace Full
    (AutoFix)
  • Tablespaces are monitored against DBC-Specified T_TS* Thresholds
  • TEMP (tempfile) tablespaces are monitored using v$sort_usage
  • (AutoFix: Tablespace Full or Almost Full)
  • The DBC T_TS_Command: command is invoked when a tablespace reaches the Warning severity threshold. This will cause space to be added to this tablespace.
  • Availability 4. Object Extents
    (AutoFix)
  • Objects (Tables and Indices) extent count, versus MAXEXTENTS, is monitored against DBC-specified T_Extents: Thresholds
  • (AutoFix: Object at or near Max-Extents)
  • The affected objects are altered:
    ALTER {OBJECT} storage ( maxextents unlimited );
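As a rough illustration of the fix above, the following sketch builds the ALTER statement for each object that is close to its extent limit. The function names are hypothetical, and treating the threshold as "extents remaining before MAXEXTENTS" is an assumption; the actual meaning of the T_Extents: value is defined in the DBC documentation.

```python
def near_max_extents(extents, max_extents, remaining_threshold):
    """True when the object has remaining_threshold or fewer extents left.

    remaining_threshold is a hypothetical reading of the T_Extents: setting.
    """
    return (max_extents - extents) <= remaining_threshold


def maxextents_fix_sql(object_type, owner, name):
    """Build the ALTER statement for one affected table or index."""
    kind = object_type.upper()
    if kind not in ("TABLE", "INDEX"):
        raise ValueError(f"unsupported object type: {object_type}")
    return f"ALTER {kind} {owner}.{name} STORAGE ( MAXEXTENTS UNLIMITED );"


def fix_statements(objects, remaining_threshold):
    """objects: iterable of (type, owner, name, extents, max_extents) tuples."""
    return [
        maxextents_fix_sql(kind, owner, name)
        for kind, owner, name, extents, max_extents in objects
        if near_max_extents(extents, max_extents, remaining_threshold)
    ]
```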
  • Availability 5. Archivelog Filesystem Full
    (AutoFix)
  • The UX filesystem for each Archive Log destination is examined. If any are found to be at least the DBC-specified T_Arclog: percent full, then an event occurs.
  • The number of hours since the last successful Archive Log backup is optionally measured. If the DBC-specified Backup_Age: number of hours is exceeded, then a Backup Age Event occurs.
  • The logic to record the timestamp of each backup (ARC, ARCFSCHECK, ARCKEEPn, ARCEMERGENCY) is incorporated into backup tools.
  • (AutoFix: Archive Log filesystem full or Too much elapsed time since last successful Archive Log backup)
  • If specified, the DBC-specified Backup_Command: is invoked (in the background) to run a backup of the correct type.
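The filesystem check for Archive Log destinations might look roughly like the sketch below. The function names are hypothetical, and the percentage is computed as used/total, which can differ slightly from what df reports on filesystems that reserve blocks; that is an assumption, not a description of DBAmon's own calculation.

```python
import shutil


def filesystem_percent_full(path):
    """Percent of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def arclog_destinations_over(destinations, t_arclog_pct):
    """Return (path, percent) pairs for destinations at or above the threshold."""
    results = []
    for path in destinations:
        pct = filesystem_percent_full(path)
        if pct >= t_arclog_pct:
            results.append((path, pct))
    return results
```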
  • Availability 6. Listener Log
    (AutoFix)
  • The $ORACLE_HOME/network/log/listener.log can become very large, even causing the $ORACLE_HOME filesystem to fill. If the size exceeds 50M, an event occurs.
  • (AutoFix: DB Listener Log > 50M)
  • The $ORACLE_HOME/network/log/listener.log is automatically gzipped.
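A minimal sketch of the size check and gzip step follows. It assumes "50M" means 50 MiB and that the log is truncated in place after being compressed; both are choices made for this example, and the function name is hypothetical.

```python
import gzip
import os
import shutil

LISTENER_LOG_LIMIT_BYTES = 50 * 1024 * 1024  # "50M", read here as 50 MiB (assumption)


def gzip_if_oversized(log_path, limit_bytes=LISTENER_LOG_LIMIT_BYTES):
    """Compress log_path to log_path + '.gz' when it exceeds limit_bytes.

    Returns the path of the compressed copy, or None if no action was taken.
    """
    if os.path.getsize(log_path) <= limit_bytes:
        return None
    gz_path = log_path + ".gz"
    with open(log_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Truncate rather than delete, so the original path keeps existing.
    with open(log_path, "w"):
        pass
    return gz_path
```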
  • Backup 7. Database Backup Age
    (AutoFix)
  • The number of hours since the last successful backup is measured. If the DBC-specified Backup_Age: number of hours is exceeded, then a Backup Age Event occurs.
  • The logic to record the timestamp of each backup (RMAN, Full, Export, TBS, BCV) is incorporated into each of our backup tools.
  • (AutoFix: Too much elapsed time since last successful DB backup)
  • If specified, the DBC-specified Backup_Command: is invoked (in the background) to run a backup of the correct type.
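The backup age test reduces to a timestamp comparison. The sketch below assumes the last successful backup time has already been read from wherever the backup tools record it; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def backup_age_exceeded(last_backup, now, backup_age_hours):
    """True when more than backup_age_hours have elapsed since last_backup."""
    return (now - last_backup) > timedelta(hours=backup_age_hours)


# Example: a backup taken 25 hours ago against a 24 hour Backup_Age: limit.
overdue = backup_age_exceeded(
    datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 2, 1, 0), 24
)
```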
  • Backups 8. Hung RMAN OS Processes Consuming CPU
    (AutoFix)
  • An event occurs if any rman OS processes meet all of the following:
    • Have a parent pid of 1
    • Are consuming >= 75% of 1 CPU
    • Have been running for at least 5 minutes
    RMAN processes with these attributes are always orphan processes which will never die on their own.
  • (AutoFix: Hung, orphaned RMAN processes)
  • These processes are automatically killed.
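The three criteria above translate into a simple filter over a process listing. In this sketch, ProcInfo is a hypothetical record type; how the fields are collected (for example, by parsing ps output) is left out, and matching on the substring "rman" in the command is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProcInfo:
    pid: int
    ppid: int
    command: str
    cpu_percent: float    # share of one CPU
    elapsed_seconds: int  # time since the process started


def orphaned_rman(processes):
    """Return the rman processes that match all three orphan criteria."""
    return [
        p for p in processes
        if "rman" in p.command
        and p.ppid == 1
        and p.cpu_percent >= 75.0
        and p.elapsed_seconds >= 5 * 60
    ]
```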
  • Security 9. DB "System" Users With Obvious Passwords
    (AutoFix)
  • If a "system" user (SYS, SYSTEM, OUTLN, ...) is found with a default password, an event occurs.
  • (AutoFix: DB System user found with insecure password)
  • (HP-UX Only) The password of this user is ALTERed to one of your choosing.
  • Security 10. Listener Password
    (AutoFix)
  • The password needs to be set for the listener (/etc/listener.ora).
  • (AutoFix: Listener Password Not Set)
  • (HP-UX Only) The password in the listener file is automatically set.
  • Availability 11. Alert Log
  • The alert log is checked for the occurrence of certain strings (this is user-configurable). If any of these strings are found (and not EXCLUDED by the exclude strings), then a Critical Event will occur. Instance-specific EXCLUDE strings can be specified in the file /home/oracle/.dbamon_ORACLESID_alert_exclude.txt.
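The match-then-exclude logic can be sketched as follows. Plain substring matching is an assumption here; the source does not say whether DBAmon uses substrings or patterns, and the function name is hypothetical.

```python
def alert_log_hits(lines, match_strings, exclude_strings):
    """Return alert log lines that contain a match string and no exclude string."""
    hits = []
    for line in lines:
        if not any(m in line for m in match_strings):
            continue
        if any(x in line for x in exclude_strings):
            continue
        hits.append(line.rstrip("\n"))
    return hits
```

With match strings such as ORA- and an exclude string such as ORA-01555, a line reporting ORA-00600 would raise an event while a line reporting ORA-01555 would not.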
  • Availability 12. Object Next Extent Size
  • If the next extent of an object will not fit in its tablespace, an event will occur. (This feature is available if Extent Checking is enabled for an instance.)
  • Availability 13. SGA Full
  • If the SGA is 100% full, a critical event will occur.
  • If the SGA is >= 99% full, a warning event will occur.
  • Availability 14. ORACLE_HOME Filesystem Full
  • If this occurs, Oracle can hang. If this filesystem is >= 99% full, an event will occur.
  • Availability 15. RESTRICTED SESSION Enabled
  • If Oracle is in RESTRICTED SESSION, an event will occur.
  • Availability 16. Offline Datafile(s)
  • If any datafiles are not ONLINE (from v$datafile) an event will occur.
  • Availability 17. Archive Destination Status
  • If any Archive Log destination (v$archive_dest) is in error status, an event will occur.
  • Availability 18. Redo Log Member Status
  • If any Redo Log member has a non-null status in v$logfile, an event will occur.
  • Availability 19. Process Table
  • If the current number of DB processes is close to the INIT.ORA processes parameter value, an event will occur.
  • Availability 20. I/O Slave Count
  • The maximum number of I/O slaves (dbwr or tape) is 40. If we are close to this count, an event will occur.
  • Availability 21. UX File Descriptors
  • If the current number of UX file descriptors is close to the kernel configured value, this event will occur.
  • Availability 22. Orphan Datafiles
  • If any unused (not in v$datafile) datafiles are found in the location where database datafiles should be located, and these files are of the same naming convention as active datafiles, this event will occur. This could happen if a tablespace is dropped and the datafiles are not manually removed (pre-9i).
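The orphan check amounts to a set difference between the files on disk and the files known to the database. In this sketch the "same naming convention" test is reduced to a .dbf suffix, which is an assumption, and active_datafiles stands in for the file names returned from v$datafile.

```python
import os


def orphan_datafiles(datafile_dir, active_datafiles, suffix=".dbf"):
    """Return files in datafile_dir that look like datafiles but are not active."""
    active = {os.path.abspath(p) for p in active_datafiles}
    orphans = []
    for name in sorted(os.listdir(datafile_dir)):
        path = os.path.abspath(os.path.join(datafile_dir, name))
        if os.path.isfile(path) and name.endswith(suffix) and path not in active:
            orphans.append(path)
    return orphans
```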
  • Availability 23. Non-Duplexed Controlfile
  • If you only have one controlfile, an event will occur. This is dangerous.
  • Availability 24. Tablespaces With No Datafiles
  • If any tablespaces are found which have no datafiles, an event will occur. Even in the case of a TEMP tablespace for a standby it is a good practice to create at least 1 tempfile.
  • Availability 25. UX "maxuprc" Process Limit
  • If the current OS process count for the UX userid running the DB is nearing the maxuprc HP-UX Kernel value, an event will occur.
  • Backups 26. Hung RMAN OS Processes
  • If there are any rman OS processes that have been running for at least 24 hours, an event will occur.
  • Backups 27. Backups - Unrecoverable Changes
  • If there are any unrecoverable changes since the most recent RMAN LVL0 backup, an event will occur.
  • DRP 28. Standby DB - Primary Delta
  • The update delta (in minutes) between this Standby DB and its Primary DB is measured. The DBC-specified InSync* Parameters specify the threshold.
  • DRP 29. Standby DB - NOLOGGING Objects
  • If there are any NOLOGGING objects on the Primary DB, an event will occur. The severity of these events can be specified with the DBC-specified InSync* Parameters.
  • DRP 30. Configuration Save
  • To rebuild an instance after a server crash (or ???) it would be useful to have a copy of the INIT.ORA, a datafile map and a tablespace map.
  • A copy of this information is automatically saved in /opt/dbamon/dat/config_save/ for each instance.
  • DRP 31. Standby DB - Unrecoverable Changes
  • If there have been any unrecoverable changes on the Primary DB SINCE the last standby rebuild, an event will occur. The severity of these events can be specified with the DBC-specified InSync* Parameters. The comparison is made between the DBAMON.STANDBY_REBUILD table and the date(s) of the most recent unrecoverable change. The standby rebuild tools automatically insert a row into DBAMON.STANDBY_REBUILD upon a successful standby rebuild.
  • DRP 32. Forced Logging
  • In 9i+, you can set Forced Logging at the database level. This eliminates the problems of standby and backup unrecoverable changes. This event will occur if forced logging is OFF.
  • Performance 33. Is OTRACE On?
  • Oracle OTRACE can cause performance problems. If the $ORACLE_HOME/otrace/admin*.dat file(s) are present, this event will occur.
  • Performance 34. Is SQL_TRACE On?
  • Instance-wide Oracle SQL_TRACE can cause performance problems. If the SQL_TRACE init.ora parameter is on, this event will occur.
  • Performance 35. Rollback Segment Gets:Waits Ratio
  • If the ratio of rollback segment waits to gets is > 1%, more rollback segments are probably needed. Note that in 9i+ with SMU this is managed automatically, so this event will probably not occur.
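As a worked example of the 1% test, the sketch below reads the ratio as waits divided by gets, using totals such as the GETS and WAITS columns of v$rollstat. That reading and the function names are assumptions made for illustration.

```python
def rollback_wait_pct(gets, waits):
    """Waits as a percentage of gets; 0.0 when there have been no gets."""
    if gets == 0:
        return 0.0
    return waits * 100.0 / gets


def rollback_event(gets, waits, threshold_pct=1.0):
    """True when the wait percentage is above the threshold."""
    return rollback_wait_pct(gets, waits) > threshold_pct
```

For example, 20 waits against 1000 gets is 2%, which is above the threshold, while 5 waits against 1000 gets is 0.5% and is not.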
  • Performance 36. Users With Default Tablespace of SYSTEM
  • Performance problems can result from storing non-SYSTEM objects in the SYSTEM tablespace. If any users are found with a default tablespace of SYSTEM, an event will occur.
  • Performance 37. Users With Temporary Tablespace of SYSTEM
  • Performance problems can result from storing non-SYSTEM objects in the SYSTEM tablespace. If any users are found with a temporary tablespace of SYSTEM, an event will occur.
  • Performance 38. TEMP Tablespace is Type=Permanent
  • MAJOR performance problems can result from your TEMP (temporary) tablespace being a permanent tablespace. The tablespace type of the temporary tablespace for all users is examined.
  • Performance 39. Has the Data Dictionary Been Analyzed
  • MAJOR performance problems (high recursive CPU) can result from analyzing the SYS and SYSTEM objects. An event will occur if any of these objects have been analyzed.
  • Performance 40. DB Buffer Cache Hit Ratio
  • If the DB Cache Hit Ratio is <= 50%, an event will occur.
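The source does not say how DBAmon computes this ratio. A commonly used formula, shown here as an assumption, is 1 - physical reads / (db block gets + consistent gets), with the three counters taken from v$sysstat. The function names are hypothetical.

```python
def buffer_cache_hit_ratio_pct(physical_reads, db_block_gets, consistent_gets):
    """Classic buffer cache hit ratio, as a percentage."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return 100.0  # no reads yet, so nothing has missed the cache
    return 100.0 * (1.0 - physical_reads / logical_reads)


def hit_ratio_event(ratio_pct, threshold_pct=50.0):
    """True when the ratio is at or below the threshold stated above."""
    return ratio_pct <= threshold_pct
```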
  • Performance 41. Incorrect Default / Temporary Tablespace
  • In 10g+, you can specify database-wide default and temporary tablespaces. If either of these is set to SYSTEM, an event will occur.
  • Performance 42. SMU
  • In 9i+, SMU (System Managed UNDO) should be on. If it is not, an event will occur.
  • Performance 43. DB Buffer Cache 1 Granule
  • It is possible to create a DB cache (especially in 9i+) with a small number of BYTES. Oracle will round up to the nearest granule size. If that value is only 1 granule, then that must be what happened. An event will occur.
  • Performance 44. MTS In Use
  • MTS (in a non-RAC environment) can be very bad for performance. If it is found to be on, an event will occur.
  • Performance 45. Library Cache Hit Ratio
  • If the Library Cache Hit Ratio is < 90%, an event will occur.
  • Performance 46. Dictionary Cache Hit Ratio
  • If the Dictionary Cache Hit Ratio is < 90%, an event will occur.
  • Performance 47. Server Memory Utilization
  • If server memory is >= 99% used, a Warning event will occur.
  • If server memory is >= 95% used, a Performance event will occur.
  • Security 48. Dangerous INIT.ORA Parameters
  • If any dangerous INIT.ORA parameters are set (for example, O7_DICTIONARY_ACCESSIBILITY set to TRUE) an event will occur.
  • Security 49. DB Users With Userid=Password
  • If any user has a password equal to the userid, an event will occur. This is a serious security breach.
  • Security 50. UX Users in DBA Group
  • If any users other than oracle have been placed into the dba group, a security event will occur.
  • Security 51. Non-System Users Using DBA Role
  • If any non-system DB users have been granted the DBA role, a security event will occur.
  • Security 52. Oracle File Permission
  • If certain Oracle config files have world-readable permission, a security event will occur.
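A world-readable check can be sketched with the standard stat module. Which config files DBAmon inspects is not stated in the source, so the list of paths is left to the caller, and the function name is hypothetical.

```python
import os
import stat


def world_readable(paths):
    """Return the paths whose permission bits allow read access to 'other'."""
    return [p for p in paths if os.stat(p).st_mode & stat.S_IROTH]
```

A file with mode 644 would be reported, while one with mode 600 or 640 would not.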
  • Management 53. DBMS Software Oversight
  • If DBMS Software Oversight has been configured (via the DBAmon Console), the Oracle version is compared to the "Minimum Good Version" for this version family. If it is lower, an event occurs.
  • Management 54. DBAMON.TIMESTAMP Rows
  • The DBAMON.TIMESTAMP table is used to record the timestamp of a database. If the purge process of the ora_timestamp tool is not working for some reason, then this table can become quite large. If the row count for DBAMON.TIMESTAMP exceeds 100,000 then a critical event will occur.
  • Management 55. Registry Component Mismatch
  • If a component is found in DBA_REGISTRY whose version does not match that of the DBMS, a Warning event will occur.
  • Management 56. Autostart Software
  • If a server autostart configuration (HP-UX: /sbin/init.d) does not exist for Oracle or if it does exist but does not invoke oraadmine, then an event will occur.
  • Management 57. Tools Timestamp
  • If the /usr/local/dba/tools/TIMESTAMP.txt file has not been updated within 28 hours, a critical event will occur.
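The staleness test can be sketched by comparing the file's modification time with the current time. Using the modification time as the "last updated" marker is an assumption, and the function name is hypothetical.

```python
import os
import time


def is_stale(path, max_age_hours=28, now=None):
    """True when path was last modified more than max_age_hours ago."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > max_age_hours * 3600
```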
  • Management 58. Half-Duplex LAN
  • The lanadmin command is run to test all known LAN cards. If any are found to be in Half-Duplex mode, a critical event will occur. The resulting ticket is automatically assigned to the UX Team.
  • Management 59. DBMS Patch Management
  • DBAmon keeps track of which patches have been applied to Oracle. Detailed reports (via WWW interface) can then be viewed to see if all instances of a particular Risk Level are compliant. See Patch Reporting.
  • Management 60. Backup Schedule (Customized Logic)
  • If a backup schedule (both ARC and LVL*) does not exist, a critical event will occur.
  • Management 61. oddjob Schedule (Customized Logic)
  • If ora_oddjob is not scheduled in cron, a critical event will occur.
  • Management 62. DB_FILES Usage
  • The number of rows in V$DATAFILE is compared against the DB_FILES init.ora setting. If the row count is >= 95% of the DB_FILES value, a critical event will occur.
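The DB_FILES usage test is a simple percentage comparison. The sketch below uses integer arithmetic to avoid rounding at the boundary; the function name is hypothetical, and both inputs are assumed to have been queried already.

```python
def db_files_usage_critical(datafile_count, db_files_limit, threshold_pct=95):
    """True when datafile_count is at least threshold_pct percent of the limit."""
    return datafile_count * 100 >= threshold_pct * db_files_limit
```

With DB_FILES set to 200, the event would fire at 190 datafiles and stay quiet at 189.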
  • Management 63. cron Daemon
  • If the cron OS daemon is not running, a critical event will occur.
  • Management 64. DB Corruption
  • If rows are found in V$DATABASE_BLOCK_CORRUPTION, a critical event will occur.
  • Management 65. ASM Diskgroups
  • For +ASM instances, diskgroups are monitored for fullness.
  • Management 66. Flash Recovery Area Full
  • For +ASM instances where the archivelog dest is the FRA, the FRA diskgroup is monitored for fullness.
  • Management 67. COMPATIBLE Parameter
  • If the COMPATIBLE pfile parameter is set at least one version lower than the software version, an event will occur.
  • Management 68. DST2007 Patch Status
  • DBAmon will automatically determine if the DST2007 patches (for either TZ-Columns or JVM) are needed and installed. If they are needed and not installed, an event will occur.


    DBAmon.com
    This Document: http://dbamon.com/misc/monitors_oracleux.shtml