Availability
| 1. Oracle Instance Running
(AutoFix)
|
Database OPEN
Background processes: pmon, smon, lgwr, dbw* running
Able to run SQL via svrmgrl/sqlplus
(AutoFix: Oracle Crash) When DBAmon finds the pmon task dead, it will
attempt to start up Oracle if ALL of the following conditions are true:
The DBC Must_Be_Up: parameter is not set to N
The last 2 lines of the Oracle Alert Log are of the format:
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 7485
The oerr output for the above error (474 in this case) contains the string
Warm start instance.
00474, 00000, "SMON process terminated with error"
// *Cause: The system cleanup process died
// *Action: Warm start instance
If all of these conditions are true, then DBAmon will attempt to start up the instance
using svrmgrl/sqlplus. Note that even if DBAmon does successfully restart Oracle, a Critical Event will
still occur. The reason is that you always want to know when Oracle is
crashing, even if you do not have to restart it yourself. You will then know to diagnose the
problem to prevent it from recurring.
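The three restart conditions above can be sketched as a single predicate. This is a minimal illustration, not DBAmon's actual code: the function name `should_auto_restart`, its arguments, and the regular expressions are assumptions, and the caller is assumed to supply the Must_Be_Up: value, the last two alert log lines, and the text printed by oerr for the reported error.

```python
import re

# Hypothetical patterns modelled on the two sample alert log lines above.
PMON_TERMINATING = re.compile(r"^PMON: terminating instance due to error (\d+)$")
PMON_TERMINATED = re.compile(r"^Instance terminated by PMON, pid = \d+$")


def should_auto_restart(must_be_up, last_two_lines, oerr_text):
    """Return True only if all three documented conditions hold."""
    # Condition 1: the Must_Be_Up: parameter is not set to N.
    if must_be_up.strip().upper() == "N":
        return False
    # Condition 2: the alert log ends with the two PMON termination lines.
    if len(last_two_lines) != 2:
        return False
    first = PMON_TERMINATING.match(last_two_lines[0].strip())
    second = PMON_TERMINATED.match(last_two_lines[1].strip())
    if not (first and second):
        return False
    # Condition 3: the oerr text recommends a warm start
    # (compared case-insensitively, which is an assumption).
    return "warm start instance" in oerr_text.lower()
```

Keeping the decision separate from the code that actually starts the instance makes it easy to check each condition in isolation.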
Availability
| 2. Listener Running
(AutoFix)
|
Is Listener running
Successful "status" command
(AutoFix: Listener Down)
Listener automatically started (lsnrctl start).
Availability
| 3. Tablespace Full
(AutoFix)
|
Tablespaces are monitored against DBC-Specified T_TS* Thresholds
TEMP (tempfile) tablespaces are monitored using v$sort_usage
(AutoFix: Tablespace Full or Almost Full)
The DBC T_TS_Command: command is invoked when a tablespace
reaches the Warning severity threshold. This will cause space to be
added to this tablespace.
Availability
| 4. Object Extents
(AutoFix)
|
Objects (Tables and Indices) extent count, versus MAXEXTENTS, is monitored against
DBC-specified T_Extents: Thresholds
(AutoFix: Object at or near Max-Extents)
The affected objects are altered: ALTER {OBJECT} storage ( maxextents unlimited );
Availability
| 5. Archivelog Filesystem Full
(AutoFix)
|
The UX filesystem for each Archive Log destination is examined. If any are found to be at least the
DBC-specified T_Arclog: percent full, then an event occurs.
The number of hours since the last successful Archive Log backup is optionally measured. If the
DBC-specified Backup_Age: number of hours is exceeded, then a Backup Age Event occurs.
The logic to record the timestamp of each backup (ARC, ARCFSCHECK, ARCKEEPn, ARCEMERGENCY)
is incorporated into backup tools.
(AutoFix: Archive Log filesystem full or Too much elapsed time since last successful Archive Log backup)
If specified, the DBC-specified Backup_Command: is invoked (in the
background) to run a backup of the correct type.
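The filesystem check for the Archive Log destinations can be sketched as follows. This is an illustration under stated assumptions, not DBAmon's implementation: the function names are hypothetical, the threshold argument stands in for the T_Arclog: value, and the `usage` parameter exists only so the logic can be exercised without a real filesystem. Note that `shutil.disk_usage` computes used/total, which can differ slightly from the percentage that `df` or `bdf` prints.

```python
import shutil


def percent_full(path, usage=shutil.disk_usage):
    """Percentage of the filesystem holding `path` that is in use."""
    stats = usage(path)
    return 100.0 * stats.used / stats.total


def full_destinations(paths, threshold_pct, usage=shutil.disk_usage):
    """Return the destinations that are at least threshold_pct percent full."""
    return [p for p in paths if percent_full(p, usage) >= threshold_pct]
```

A caller would pass the list of archive destination directories and raise an event (and optionally run the backup command) for every path returned.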
Availability
| 6. Listener Log
(AutoFix)
|
The $ORACLE_HOME/network/log/listener.log can become
very large, even causing the $ORACLE_HOME filesystem to fill.
If the size exceeds 50M, an event occurs.
(AutoFix: DB Listener Log > 50M)
The $ORACLE_HOME/network/log/listener.log is automatically
gzipped.
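The listener log AutoFix above amounts to a size check followed by compression. The sketch below is illustrative only: the function name and the reading of "50M" as 50 MiB are assumptions, and a real tool would also have to account for the listener process holding the log file open.

```python
import gzip
import os
import shutil

LIMIT_BYTES = 50 * 1024 * 1024  # "50M" interpreted as 50 MiB (assumption)


def gzip_if_oversized(log_path, limit_bytes=LIMIT_BYTES):
    """Compress log_path to log_path + '.gz' when it exceeds limit_bytes.

    Returns the path of the compressed file, or None if nothing was done.
    """
    if os.path.getsize(log_path) <= limit_bytes:
        return None
    gz_path = log_path + ".gz"
    with open(log_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(log_path)  # the original is removed only after a full copy
    return gz_path
```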
Backups
| 7. Database Backup Age
(AutoFix)
|
The number of hours since the last successful backup is measured. If the
DBC-specified Backup_Age: number of hours is exceeded, then a Backup Age Event occurs.
The logic to record the timestamp of each backup (RMAN, Full, Export, TBS, BCV)
is incorporated into each of our backup tools.
(AutoFix: Too much elapsed time since last successful DB backup)
If specified, the DBC-specified Backup_Command: is invoked (in the
background) to run a backup of the correct type.
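The Backup Age check reduces to comparing elapsed hours against the configured limit. A minimal sketch, assuming the timestamp of the last successful backup is already available as a `datetime`; the function names are hypothetical, and "exceeded" is read as strictly greater than the Backup_Age: value.

```python
from datetime import datetime


def backup_age_hours(last_success, now):
    """Hours elapsed between the last successful backup and `now`."""
    return (now - last_success).total_seconds() / 3600.0


def backup_age_event(last_success, now, backup_age_limit_hours):
    """True when the backup is older than the configured number of hours."""
    return backup_age_hours(last_success, now) > backup_age_limit_hours
```

For example, a backup that finished at midnight and is checked at 06:00 the next day is 30 hours old, so a limit of 24 triggers the event while a limit of 30 does not.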
Backups
| 8. Hung RMAN OS Processes Consuming CPU
(AutoFix)
|
Any rman OS processes which:
- Have a parent pid of 1
- Are consuming >= 75% of 1 CPU
- Have been running for at least 5 minutes
are considered orphaned. RMAN processes with these attributes are
always orphaned processes which will never die on their own.
(AutoFix: Hung, orphaned RMAN processes)
These processes are automatically killed.
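The three criteria for an orphaned RMAN process can be expressed as a simple filter. This sketch assumes the process table has already been collected (for example from ps output) into records; the `Proc` type and its field names are hypothetical and not part of DBAmon.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    ppid: int
    cpu_pct: float   # percent of one CPU
    elapsed_s: int   # seconds since the process started
    command: str


def orphaned_rman_pids(procs):
    """Return pids of rman processes meeting all three documented criteria."""
    return [
        p.pid
        for p in procs
        if "rman" in p.command
        and p.ppid == 1            # parent pid of 1
        and p.cpu_pct >= 75.0      # consuming >= 75% of 1 CPU
        and p.elapsed_s >= 5 * 60  # running for at least 5 minutes
    ]
```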
Security
| 9. DB "System" Users With Obvious Passwords
(AutoFix)
|
If a "system" user (SYS, SYSTEM, OUTLN, ...) is found with
a default password, an event occurs.
(AutoFix: DB System user found with insecure password)
(HP-UX Only) The password of this user is ALTERed to one of your choosing.
Security
| 10. Listener Password
(AutoFix)
|
The password needs to be set for the listener (/etc/listener.ora).
(AutoFix: Listener Password Not Set)
(HP-UX Only) The password in the listener file is automatically set.
Availability
| 11. Alert Log
|
The alert log is checked for the occurrence of certain strings (this
is user-configurable). If any of these strings are found (and not
EXCLUDED by the exclude strings), then a Critical Event will occur. Instance-specific
EXCLUDE strings can be specified in the file:
/home/oracle/.dbamon_ORACLESID_alert_exclude.txt .
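The include/exclude logic for the alert log can be sketched as follows. The function name and the use of plain substring matching are assumptions; the source does not say whether DBAmon matches substrings or patterns.

```python
def alert_log_hits(lines, include, exclude):
    """Return lines containing an include string and no exclude string."""
    hits = []
    for line in lines:
        wanted = any(s in line for s in include)
        excluded = any(s in line for s in exclude)
        if wanted and not excluded:
            hits.append(line)
    return hits
```

With an include list of `["ORA-"]` and an exclude list of `["ORA-01555"]`, a line reporting ORA-00600 is returned while a line reporting ORA-01555 is suppressed.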
Availability
| 12. Object Next Extent Size
|
If the next extent of an object will not fit in its tablespace,
an event will occur.
(This feature is available if Extent Checking is enabled for an instance.)
Availability
| 13. SGA Full
|
If the SGA is 100% full, a critical event will occur.
If the SGA is >= 99% full, a warning event will occur.
Availability
| 14. ORACLE_HOME Filesystem Full
|
If this occurs, Oracle can hang. If this filesystem is >= 99%
full, an event will occur.
Availability
| 15. RESTRICTED SESSION Enabled
|
If Oracle is in RESTRICTED SESSION, an event will occur.
Availability
| 16. Offline Datafile(s)
|
If any datafiles are not ONLINE (from v$datafile) an event will occur.
Availability
| 17. Archive Destination Status
|
If any Archive Log destination (v$archive_dest) is in
error status, an event will occur.
Availability
| 18. Redo Log Member Status
|
If any Redo Log member has a non-null (v$logfile) status,
an event will occur.
Availability
| 19. Process Table
|
If the current number of DB processes is close to the INIT.ORA
processes parameter value, an event will occur.
Availability
| 20. I/O Slave Count
|
The maximum number of I/O slaves (dbwr or tape) is 40. If
we are close to this count, an event will occur.
Availability
| 21. UX File Descriptors
|
If the current number of UX file descriptors is close to the kernel configured
value, this event will occur.
Availability
| 22. Orphan Datafiles
|
If any unused (not in v$datafile) datafiles are found in the location where
database datafiles should be located, and these files are of the same naming
convention as active datafiles, this event will occur. This could happen if a
tablespace is dropped and the datafiles are not manually removed (pre-9i).
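The orphan datafile check is essentially a set difference between the files on disk and the files listed in v$datafile, restricted to files that follow the datafile naming convention. In this sketch the function name and the default `*.dbf` pattern are assumptions; the actual naming convention is site-specific.

```python
import fnmatch


def orphan_datafiles(files_on_disk, active_datafiles, pattern="*.dbf"):
    """Files that look like datafiles but are not known to the database."""
    active = set(active_datafiles)
    return sorted(
        f
        for f in files_on_disk
        if f not in active and fnmatch.fnmatchcase(f, pattern)
    )
```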
Availability
| 23. Non-Duplexed Controlfile
|
If you only have one controlfile, an event will occur. This is
dangerous.
Availability
| 24. Tablespaces With No Datafiles
|
If any tablespaces are found which have no datafiles, an
event will occur. Even in the case of a TEMP tablespace for a standby
it is a good practice to create at least 1 tempfile.
Availability
| 25. UX "maxuprc" Process Limit
|
If the current OS process count for the UX userid running the DB
is nearing the maxuprc HP-UX Kernel value, an
event will occur.
Backups
| 26. Hung RMAN OS Processes
|
If there are any rman OS processes that have been running for
at least 24 hours, an event will occur.
Backups
| 27. Backups - Unrecoverable Changes
|
If there are any unrecoverable changes since the most recent RMAN
LVL0 backup, an event will occur.
DRP
| 28. Standby DB - Primary Delta
|
The update delta (in minutes) between this Standby DB and its Primary DB is measured.
The DBC-specified InSync* Parameters specify the threshold.
DRP
| 29. Standby DB - NOLOGGING Objects
|
If there are any NOLOGGING objects on the Primary DB, an event will occur.
The severity of these events can be specified with the DBC-specified InSync* Parameters.
DRP
| 30. Configuration Save
|
To rebuild an instance after a server crash (or ???) it would
be useful to have a copy of the INIT.ORA, a datafile map and a
tablespace map.
A copy of this information is automatically saved in
/opt/dbamon/dat/config_save/ for each instance.
DRP
| 31. Standby DB - Unrecoverable Changes
|
If there have been any unrecoverable changes on the Primary DB SINCE the last
standby rebuild, an event will occur. The severity of these events can be specified with the
DBC-specified InSync* Parameters. The comparison is made between the DBAMON.STANDBY_REBUILD
table and the date(s) of the most recent unrecoverable change. The standby rebuild
tools automatically insert a row into DBAMON.STANDBY_REBUILD upon a successful
standby rebuild.
DRP
| 32. Forced Logging
|
In 9i+, you can set Forced Logging at the database level.
This eliminates the problems of standby and backup unrecoverable
changes. This event will occur if forced logging is OFF.
Performance
| 33. Is OTRACE On?
|
Oracle OTRACE can cause performance problems. If the
$ORACLE_HOME/otrace/admin*.dat
file(s) are present, then this event will occur.
Performance
| 34. Is SQL_TRACE On?
|
Instance-wide Oracle SQL_TRACE can cause performance problems.
If the SQL_TRACE init.ora parameter is on, this event
will occur.
Performance
| 35. Rollback Segment Gets:Waits Ratio
|
If the ratio of rollback segment waits to gets is > 1%, more
rollback segments are probably needed. Note that in 9i+ with SMU
this is managed automatically, so this event will probably not
occur.
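The rollback segment ratio can be computed as below. This is a sketch under assumptions: the inputs are taken to be the GETS and WAITS totals summed over v$rollstat, the function names are hypothetical, and the 1% default mirrors the threshold stated above.

```python
def rollback_wait_ratio_pct(gets, waits):
    """Waits as a percentage of gets; 0.0 when there have been no gets."""
    if gets == 0:
        return 0.0
    return 100.0 * waits / gets


def rollback_event(gets, waits, threshold_pct=1.0):
    """True when the waits-to-gets ratio is above the threshold."""
    return rollback_wait_ratio_pct(gets, waits) > threshold_pct
```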
Performance
| 36. Users With Default Tablespace of SYSTEM
|
Performance problems can result from storing non-SYSTEM objects
in the SYSTEM tablespace. If any users are found with a default
tablespace of SYSTEM, an event will occur.
Performance
| 37. Users With Temporary Tablespace of SYSTEM
|
Performance problems can result from storing non-SYSTEM objects
in the SYSTEM tablespace. If any users are found with a temporary
tablespace of SYSTEM, an event will occur.
Performance
| 38. TEMP Tablespace is Type=Permanent
|
MAJOR performance problems can result from your TEMP (temporary)
tablespace being a permanent tablespace. The tablespace type of
the temporary tablespace for all users is examined.
Performance
| 39. Has the Data Dictionary Been Analyzed
|
MAJOR performance problems (high recursive CPU) can result from
analyzing the SYS and SYSTEM objects. An event will occur if any
of these objects have been analyzed.
Performance
| 40. DB Buffer Cache Hit Ratio
|
If the DB Cache Hit Ratio is <= 50%, an event will occur.
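The source does not state how the hit ratio is derived. A commonly used calculation, shown here as an assumption rather than DBAmon's confirmed method, takes the "db block gets", "consistent gets" and "physical reads" statistics from v$sysstat. The function names are hypothetical.

```python
def buffer_cache_hit_ratio_pct(db_block_gets, consistent_gets, physical_reads):
    """Percentage of logical reads satisfied without a physical read."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        return 100.0  # no reads yet, so there is nothing to miss
    return 100.0 * (logical_reads - physical_reads) / logical_reads


def buffer_cache_event(db_block_gets, consistent_gets, physical_reads,
                       threshold_pct=50.0):
    """True when the hit ratio is at or below the documented 50% threshold."""
    ratio = buffer_cache_hit_ratio_pct(db_block_gets, consistent_gets,
                                       physical_reads)
    return ratio <= threshold_pct
```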
Performance
| 41. Incorrect Default / Temporary Tablespace
|
In 10g+, you can specify the database-level default and temporary
tablespaces. If either of these is set to SYSTEM, then an
event will occur.
Performance
| 42. SMU
|
In 9i+, SMU (System Managed UNDO) should be on. If it is
not, an event will occur.
Performance
| 43. DB Buffer Cache 1 Granule
|
It is possible to create a DB cache (especially in 9i+) with a small
number of BYTES. Oracle will round up to the nearest granule size. If that
value is only 1 granule, then that must be what happened. An event will
occur.
Performance
| 44. MTS In Use
|
MTS (in a non-RAC environment) can be very bad for performance.
If it is found to be on, an event will occur.
Performance
| 45. Library Cache Hit Ratio
|
If the Library Cache Hit Ratio is < 90%, an event will occur.
Performance
| 46. Dictionary Cache Hit Ratio
|
If the Dictionary Cache Hit Ratio is < 90%, an event will occur.
Performance
| 47. Server Memory Utilization
|
If server memory is >= 99% used, a Warning event will occur.
If server memory is >= 95% used, a Performance event will occur.
Security
| 48. Dangerous INIT.ORA Parameters
|
If any dangerous INIT.ORA parameters are set
(for example, O7_DICTIONARY_ACCESSIBILITY set to TRUE)
an event will occur.
Security
| 49. DB Users With Userid=Password
|
If any user has a password equal to the userid, an event will occur. This
is a serious security breach.
Security
| 50. UX Users in DBA Group
|
If any users other than oracle have been placed into the dba group,
a security event will occur.
Security
| 51. Non-System Users Using DBA Role
|
If any non-system DB users have been granted the DBA role,
a security event will occur.
Security
| 52. Oracle File Permission
|
If certain Oracle config files have world-readable permission,
a security event will occur.
Management
| 53. DBMS Software Oversight
|
If DBMS Software Oversight has been configured
(via the DBAmon Console), then the Oracle version is compared
to the "Minimum Good Version" for this version family. If it is lower,
then an event occurs.
Management
| 54. DBAMON.TIMESTAMP Rows
|
The DBAMON.TIMESTAMP table is used to record the timestamp
of a database. If the purge process of the ora_timestamp tool is
not working for some reason, then this table can become quite
large. If the row count for DBAMON.TIMESTAMP exceeds 100,000 then
a critical event will occur.