--------------------------------------------------------------------------

 #####   #####     ##    #    #   ####   #    #
 #    #  #    #   #  #   ##  ##  #    #  ##   #
 #    #  #####   #    #  # ## #  #    #  # #  #
 #    #  #    #  ######  #    #  #    #  #  # #
 #    #  #    #  #    #  #    #  #    #  #   ##
 #####   #####   #    #  #    #   ####   #    #

 Current Version: 3.00 (10/24/97)

 See URL http://web.cs.itc/solutions/sap/dba/dbamon.html for more info.

--------------------------------------------------------------------------


--------------------------------------------------------------------------
SUPPORT
--------------------------------------------------------------------------

  This software is provided and supported on a 'when I have time/If I 
 feel like it' basis.


--------------------------------------------------------------------------
DESCRIPTION
--------------------------------------------------------------------------

  dbamon is a CTS Colorado Springs developed UX application that "wakes
 up" periodically and executes various Informix and shell commands on 
 all of the systems being monitored (via remsh). It then parses the 
 output from the commands and issues:

   Warning Messages to:

     - Administrator's log file (specified in dbamonrc)

     - OpC (optionally)

     - HTML file (which OpC message can reference)

   Critical Messages to:

     - Email

     - Postnote

     - OpC (referencing name of above HTML file)

     - Alphanumeric Pagers (via EMail pager gateway)

 The program runs on my workstation as a cron job and issues remsh
 commands to our various Informix systems in the CTS and DRP areas (both
 SAP and non-SAP).
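
  As a rough illustration of the routing listed above, the following
 shell sketch maps a message severity to its destinations. The function
 name "destinations" and the short channel labels are hypothetical; they
 are not taken from dbamon.awk.

```shell
# Hypothetical helper: print the destinations for one dbamon message.
#   $1 = severity (warning | critical)
#   $2 = yes if OpC is configured for warnings (default: no)
destinations() {
    severity=$1
    opc=${2:-no}
    case $severity in
        warning)
            out="logfile html"
            # OpC is optional for warning messages.
            if [ "$opc" = yes ]; then
                out="$out opc"
            fi
            ;;
        critical)
            # Pager delivery goes through the EMail pager gateway.
            out="email postnote opc pager"
            ;;
        *)
            echo "unknown severity: $severity" >&2
            return 1
            ;;
    esac
    echo "$out"
}
```

  For example, "destinations warning yes" prints "logfile html opc".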


  The metrics measured are:

   (see http://web.cs.itc/solutions/sap/dba/dbamon_what_monitored.html)

  Also, each iteration of dbamon creates a dbamon.rpt file (see dbamonrc
 comments) which contains a report of all chunks and dbspaces formatted
 in a readable fashion.


  As more metrics become available and programmatically measurable, they
 will be added. 

  Possible future enhancements:

   -  Checkpoint Duration (V6+ Only)

   -  onstat -g rea (thread ready queue checking)


--------------------------------------------------------------------------
SOFTWARE REVISIONS
--------------------------------------------------------------------------

  The current software revision level as of 10/24/97 is 3.00.

  History - 

    1.18 - bb 10/12/94 - Added message history log file (see dbamonrc)

    1.19 - bb 11/02/94 - Added code to save each instance's tbstat -d and
           tbstat -c output every time that dbamon is started.

    1.20 - bb 11/04/94 - Added code to check:
           1. Read and write hit ratios
           2. ov* fields of tbstat -p for non-zero values.

    1.21 - bb 02/14/95 - Added code to check for down chunks.

    1.22 - bb 02/17/95 - Added code to send postnotes instead of mail
           messages.

    1.23 - bb 02/23/95 - Added code to process different warning and 
           critical disk full thresholds. This will require that all
           existing parm files be changed from nn format to nn/nn format
           (see dbamonrc file for details).
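
           A minimal shell sketch of how an nn/nn value could be split
           and applied follows. It assumes the order is warning/critical
           and that a threshold is reached when the percentage is at or
           above it; the dbamonrc comments are the authority on both
           points. The function names are hypothetical.

```shell
# Split a threshold pair such as 80/90 into "warn crit".
# Assumption: the first number is the warning threshold.
parse_thresholds() {
    case $1 in
        */*) ;;
        *) echo "expected nn/nn, got: $1" >&2; return 1 ;;
    esac
    echo "${1%%/*} ${1#*/}"
}

# Classify a dbspace from its percentage full and an nn/nn pair.
disk_status() {
    pct_full=$1
    pair=$(parse_thresholds "$2") || return 1
    warn=${pair% *}
    crit=${pair#* }
    if [ "$pct_full" -ge "$crit" ]; then
        echo critical
    elif [ "$pct_full" -ge "$warn" ]; then
        echo warning
    else
        echo ok
    fi
}
```

           For example, "disk_status 85 80/90" prints "warning", while
           an old-style value such as "disk_status 85 80" is rejected.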

    1.24 - bb 04/03/95 - Added code to check for full log files. If more 
           than 50% of existing logical log files are unavailable, a critical
           message will be triggered.

    1.30 - bb 04/07/95 - Major revision - Added code to send OpC messages
           in addition to postnotes and email. If the recipient name in the
           rc file is OpC, an OpC message will be sent. Also, warning messages
           can now be written in HTML format to the path of your choice. This
           path can be written as part of the OpC message.

    2.00 - bb 04/10/95 - Another Major revision 

            - Developed code that completely reworks the way that the HOSTS 
              section of the rc file is coded. Rather than one line
              per system (as before), it is now coded in keyword style (see new
              sample dbamonrc). Supplied new program dbamon_rc_convert to 
              convert old style dbamonrc files.

            - Added HTML section to the rc file to specify HTML path names.

            - The OpC message parameters are now configurable on a by-host
              basis.

            - The Read/Write hit ratio warning thresholds are now
              configurable on a per system basis.

    2.01 - bb 04/12/95 - Various cleanup stuff. Added Images directory
           as part of the HTML section and clpostXm path as part of the
           SYSTEMS section (see dbamonrc for details).

    2.02 - bb 04/17/95 - Changed format of SYSTEM section to be keyword
           rather than positional (see dbamonrc for details). Also changed
           format of Disk Usage Report to html. You will have to rework
           the SYSTEM section of any pre-2.02 dbamonrc files.

    2.03 - bb 04/18/95 - Added code so that the HTML message log and HTML 
           disk usage report are written to a temporary file and at the 
           end of each iteration, copied to the permanent file name. This
           way, there are not incomplete reports in the published HTML 
           pathname.
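
           The idea can be sketched in shell as follows. This is not the
           dbamon code: the function name is hypothetical, and where
           dbamon copies the finished file, the sketch renames a
           temporary file in the same directory so that readers see
           either the old report or the complete new one.

```shell
# publish_report FINAL_PATH COMMAND [ARGS...]
# Runs COMMAND, captures its output in a temporary file beside
# FINAL_PATH, and replaces FINAL_PATH only if COMMAND succeeded.
publish_report() {
    final=$1
    shift
    tmp="$final.tmp.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$final"
    else
        # Leave the previously published report untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

           A call such as "publish_report /some/path/events.html
           make_event_log" (both names made up for illustration) would
           leave the old events.html in place if make_event_log failed.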

    2.04 - bb 05/02/95 - Added report of disk I/O balance by UX device.  
           This report is appended to the Disk Usage Report. Misc
           cleanup.

    2.05 - bb 05/17/95 - Added HTML System Summary report. This report
           lists all monitored systems with a colored ball indicating
           the status of that system (red=critical ...). This document
           also contains pointers to the other two HTML documents: the
           Events Log and the Disk Usage Report. In my continuing effort
           to make the rc file more consistent, I got rid of the HTML
           section, changed its format to "keyword" (instead of positional)
           and moved the data to the SYSTEM section. See dbamonrc for 
           details. 

    2.06 - bb 05/23/95 - Cleanup - The Disk Usage Report had a 
           discrepancy between the chunk and dbspace reports. I fixed
           it. Also formalized the distinction between the 3 types 
           of events. Further documentation in WWW page. Added total 
           disk space line to System Summary Report. etc...

    2.07 - bb 06/05/95 - Added parameterization to the alphanumeric
           pager code. There are now 2 new optional parameters in
           dbamonrc, Pager_Subject and Pager_EMail that control
           how email messages are sent to the pager gateway software
           (give me a call if you want more info about Pager Gateway
           services). See new dbamonrc for more details about these
           parms.

    2.08 - bb 06/28/95 - 
           1. Added functionality to measure the time since
           the last backup and if it exceeds your specified value, send
           a warning message (new dbamonrc parm Backup_Age:). See the 
           comments in dbamonrc for more information. This new feature
           requires 2 new supplied programs; dbamon_arc_ontape (perl) and
           dbamon_arc_onarch. Check the comments in these programs for
           more info (they're wicked cool). These work correctly in 
           Informix 5.00.uc6, 6.00.ue1 and 6.00.usap1. You'll have to 
           try them if your version is different. The dbamon_arc_ontape
           reads reserved pages, so if their format changes in your
           release, I'd have to detect that and change the code.
           2. Changed the log full percentage from 50% to 60%. This
           is because most of our systems have LTXHWM set to 50% and
           we don't want to get beeped every time there is a long 
           transaction.
           3. For trimming the log files (dbamon_mgr), I supplied a
           new "quickie" tail program which (unlike the UX one) does
           correctly process a file when > 500 lines are tailed. This
           new program is called by dbamon_mgr and shouldn't require
           any special attention.

    2.09 - bb 07/06/95 - 
           Reworked all files to facilitate install into /opt/dbamon
           making it a "real" software product. 
           
    2.10 - bb 07/18/95 - 
           Support for Informix 7.10. Minor bug fixes. Informix version
           now prints on log file.
           
    2.11 - bb 08/14/95 - 
           DBAmon now creates (once a day) a dbspace size history file. 
           It's in /opt/dbamon/dat/history/dbspace/*. This file contains
           one line for each dbspace of each system monitored. This file
           can then be used as input to SAS (my package of choice) to create
           data growth reports/graphs.

    2.12 - bb 08/16/95 - 
           Table extents are now monitored. There is a new SYSTEMS section
           parm called "Max_Extents" which specifies the maximum tolerable
           number of extents per table.
   
    2.13 - bb 08/30/95 - 
           DBAmon now has error messages. The HTML event log now has a URL
           pointer to the correct message ID. In future releases I will
           further identify all messages. Look at my event log for sample
           output.

    2.14, 2.15 - bb 10/31/95 - 
           Various HTML aesthetic changes. Event log is now created for 
           each system, in addition to globally. HTML System Summary now
           has URLs for the individual files.

    2.16 - bb 12/95
           Major enhancements to the System Summary WWW page, Missed DB
           backup messages are now critical (they were warning). 

    2.17 - bb 03/01/96
           A "-P" now appears next to the instance name for systems with
           paging enabled (usually production instances). Also, OV* (onstat 
           -p) conditions are now critical (were warning).

    2.18 - bb 03/06/96
           Creation of "inhibition" logic. It often becomes necessary to
           turn off monitoring of an Informix instance for scheduled down
           time. During those times, we need to turn off messages and paging.
           Logic was added to dbamon to check for the presence and read the
           contents of an "inhibit" file that can be set up for any Informix
           instance. This file contains dates and times for which DBAmon 
           monitoring is NOT to take place. See the "Inhibition" section of
           the DBAmon Home Page for more info.

    2.19 - bb 03/25/96
           Added logic to exclude "bogus" Informix messages that appear on the
           Informix message log. The example is a -27001 error that appears 
           during an incorrect user connection attempt. This should not trigger
           a critical event. There is a new file in the /adm/ directory called
           dbamon.msg_critical_bogus which lists strings that appear in bogus
           critical error messages. These messages no longer appear as critical
           events.

    2.20 - bb 04/26/96
           Performance improvements (each iteration has been taking 1-1.5
           hours!). DB object counting and extent checking will now be done
           only every 10th iteration. Added auto-refresh code to the system
           summary HTML doc.

    2.21 - bb 09/23/96
           Cosmetic - The HP-UX version is now displayed on the System
           Summary page. Also, the amount of output in the log should be
           reduced; I stopped displaying diagnostic info for archive age
           checking. 

    2.22 - bb 10/01/96
           The UX box type is now displayed on the System Summary page. 

    2.23 - bb 10/16/96
           New dbamonrc parms "Title1/Title2" were added. The values are
           displayed on the System Summary page. 

    2.24 -> 2.26 bb 06/03/97
           dbamonrc cleanup: InformixVersion and LocalOrRemote now have
           defaults 7/R. Added new parms Title1_URL, Title2_URL and Host_URL
           to specify the URL to be associated with those fields of the
           System Summary.
   
    2.27   New dbamonrc parameter:
           T_DBSpace_Free: (MB) This parm serves as an override to the
           T_Disk_Full dbspace critical thresholds. A dbspace will be
           critical if it is > T_Disk_Full % full -AND- the amount of
           freespace is less than T_DBSpace_Free.
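
           The combined test can be written as a small shell predicate.
           This is a sketch with a hypothetical function name; it takes
           whole numbers only and expects the critical half of the
           T_Disk_Full pair as its third argument.

```shell
# dbspace_is_critical PCT_FULL FREE_MB T_DISK_FULL_CRIT T_DBSPACE_FREE
# Succeeds only when BOTH conditions hold:
#   - the dbspace is more than T_DISK_FULL_CRIT percent full, and
#   - its free space is below T_DBSPACE_FREE megabytes.
dbspace_is_critical() {
    pct_full=$1
    free_mb=$2
    t_disk_full=$3
    t_dbspace_free=$4
    [ "$pct_full" -gt "$t_disk_full" ] && [ "$free_mb" -lt "$t_dbspace_free" ]
}
```

           With a 90% critical threshold and T_DBSpace_Free of 500, a
           dbspace that is 95% full with 2000 MB free is not critical,
           while one that is 95% full with 100 MB free is.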
    
    2.28 -> 2.29 Added scoreboard to System Summary to show the number of
           events of each type. Other aesthetic changes to WWW page.

    2.30   Logical log checking works now for Informix V7. Also moved
           scoreboard to the top of the WWW System Summary page.

    3.00   Starting with this version, changes are documented only at:  
           http://web.cs.itc/solutions/sap/dba/dbamon.html
 
--------------------------------------------------------------------------
SYSTEM SETUP
--------------------------------------------------------------------------

 All systems that dbamon will remsh to must have a userid matching the
one on the system executing dbamon, and a correct .rhosts file.

 For example, if the system that will run dbamon is called foo and you
have coded the dbamonrc file to monitor an Informix instance on a system
called dbsys, you must have a working account on dbsys with the same id
as you are using on foo. Also, there must be a line in the $HOME/.rhosts
file on dbsys that looks like: 

foo.subnet.abc.com youruserid

This is the standard requirement for remsh.
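
 To check an .rhosts file before the first run, something like the
following shell sketch can help. The function name rhosts_allows is
hypothetical, and the check only looks for an exact "host userid" line;
it does not cover every form that remsh accepts.

```shell
# rhosts_allows FILE HOST USERID
# Succeeds if FILE contains a line whose first field is HOST and whose
# second field is USERID. Fails if the line or the file is missing.
rhosts_allows() {
    awk -v h="$2" -v u="$3" '
        $1 == h && $2 == u { found = 1 }
        END { exit !found }
    ' "$1"
}
```

 On dbsys this could be called as:

   rhosts_allows $HOME/.rhosts foo.subnet.abc.com youruserid

 A simple end-to-end check is to run a harmless command from foo, such
as "remsh dbsys date", and confirm that it returns without prompting.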


--------------------------------------------------------------------------
USAGE
--------------------------------------------------------------------------

 From the shell: dbamon [test] 

   - With test operand, runs monitor in test mode - that is, no EMail 
       messages are sent and the config file used is $HOME/dbamonrc_test
       instead of the usual $HOME/dbamonrc. Instance config files are 
       read from /opt/dbamon/adm/instances_test.     
   - With no operands, runs monitor in normal mode.
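
 The file selection described above amounts to the following shell
 sketch. The function name select_config is hypothetical and the real
 dbamon script may be organized differently; the paths are the ones
 given in this README.

```shell
# select_config [test]
# Prints the main config file and the instance config directory that
# correspond to the given operand.
select_config() {
    case ${1:-} in
        test)
            echo "$HOME/dbamonrc_test /opt/dbamon/adm/instances_test"
            ;;
        "")
            echo "$HOME/dbamonrc /opt/dbamon/adm/instances"
            ;;
        *)
            echo "usage: dbamon [test]" >&2
            return 1
            ;;
    esac
}
```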


--------------------------------------------------------------------------
INSTALLATION (with the use of dbamon_install script)
--------------------------------------------------------------------------

  Refer to http://web.cs.itc/solutions/sap/dba/dbamon.html

--------------------------------------------------------------------------
DIRECTORIES
--------------------------------------------------------------------------

 /opt/dbamon/adm            -  Main Config files
 /opt/dbamon/adm/instances  -  Instance Config files
 /opt/dbamon/bin            -  Executables, SQL
 /opt/dbamon/dat            -  Work files
 /opt/dbamon/doc            -  Documentation
 /opt/dbamon/tmp            -  Temp files

--------------------------------------------------------------------------
FILES
--------------------------------------------------------------------------

dbamonrc
---------

 A sample parameter file. This must be in the $HOME directory of the
 userid that executes dbamon. The necessary parameters are described
 in comments in dbamonrc.

dbamon 
------
 
 This is the script that executes the main awk program, dbamon.awk.

dbamon.README
-------------

 What you're reading now

dbamon.awk
----------

 The actual dbamon code. Under normal circumstances, shouldn't 
 need to be changed. This file MUST live in the same directory
 as the file: dbamon.

dbamon.msg_critical
-------------------

 A file containing excerpts from critical Informix log messages. If a 
 match is found on the Informix log, a critical message is generated.
 This file MUST live in the same directory as the file: dbamon.

dbamon.msg_critical_bogus
-------------------------

 A file containing strings found in bogus Informix error messages. For
 example, there is a message which appears in the message log:

  11:11:32  listener-thread: err = -27001: Read error occurred during connection attempt.

 This would otherwise generate a DBAmon critical event. If the string 
 -27001 is placed in this file, that error will NOT create a DBAmon
 event.
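
 The effect is similar to the following shell sketch, which drops any
 log line containing one of the listed strings before the critical
 check runs. The function name filter_bogus is hypothetical, and
 dbamon.awk may implement the exclusion differently.

```shell
# filter_bogus BOGUS_FILE
# Copies stdin to stdout, omitting lines that contain any string listed
# (one per line) in BOGUS_FILE. Strings are matched literally.
# Note: an empty line in BOGUS_FILE would match every message.
filter_bogus() {
    if [ -s "$1" ]; then
        # grep exits 1 when every line was filtered out; that is not
        # an error here.
        grep -F -v -f "$1" || [ "$?" -eq 1 ]
    else
        cat
    fi
}
```

 With -27001 in the file, the listener-thread message shown above is
 removed and all other lines pass through unchanged.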

dbamon.msg_warn
---------------

 A file containing excerpts from warning-level Informix log messages. If a 
 match is found on the Informix log, a warning message is generated.
 This file MUST live in the same directory as the file: dbamon.

dbamon_arc_onarch (New in 2.08)
-------------------------------

 This program is employed to create SQL to check the age of the most
 recent onarchive backup.

dbamon_arc_ontape (New in 2.08)
-------------------------------

 This program executes dd (remotely) to dump the 11th and 12th pages
 of the root dbspace. It then determines the age of the most recent 
 ontape backup and passes the result back to dbamon.awk.
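
 The dd step might look like the shell sketch below. It is not the
 supplied program: the function name dump_pages is hypothetical, pages
 are assumed to be counted from 1, and the page size must be supplied
 by the caller because it depends on the Informix port.

```shell
# dump_pages DEVICE PAGE_SIZE FIRST_PAGE COUNT
# Writes COUNT pages of PAGE_SIZE bytes to stdout, starting at page
# FIRST_PAGE (counted from 1) of DEVICE.
dump_pages() {
    dd if="$1" bs="$2" skip=$(( $3 - 1 )) count="$4" 2>/dev/null
}
```

 For example, assuming a 2048-byte page size and a variable ROOT_CHUNK
 holding the path of the first root dbspace chunk (both assumptions),
 the two pages would be read with:

   dump_pages "$ROOT_CHUNK" 2048 11 2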

dbamon_mgr
----------

 This is the script that is run by the cron entry. It:

  -  Checks the log files to see if they exceed the max # of
     lines. If they do, dbamon is stopped and the log file(s)
     are trimmed to the specified number of lines (see comments).

  -  Checks to see if dbamon is active. If it is not, it is 
     started and a mail notification is sent to the dbamon
     administrator.
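
 The trimming step can be sketched in shell as follows. The function
 name trim_log is hypothetical, and the sketch relies on a standard
 "tail -n" without the 500-line problem that led to the supplied tail
 program; it also leaves out stopping and restarting dbamon.

```shell
# trim_log FILE MAX_LINES
# If FILE has more than MAX_LINES lines, keep only the last MAX_LINES.
trim_log() {
    [ -f "$1" ] || return 1
    # Arithmetic expansion strips any padding that wc adds.
    lines=$(( $(wc -l < "$1") ))
    if [ "$lines" -gt "$2" ]; then
        tail -n "$2" "$1" > "$1.trim.$$" && mv "$1.trim.$$" "$1"
    fi
}
```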

dbamon_package
--------------

 Don't worry about this one; it is just the script that creates the tar
 file.

dbamon_stop
--------------

 A script to stop the cron'ed dbamon. Used by dbamon_mgr.

dbamon_rc_convert
-----------------

 A program to convert a version 1.xx parameter file to the 
 new version 2 format. It reads ~/dbamonrc_old and writes
 ~/dbamonrc_new.


(History Files)
---------------

 Live under /opt/dbamon/dat/history/

(Work Files)
------------

 Created by each iteration of dbamon in /tmp and $HOME.

(Report File)
-------------

 Recreated by each iteration of dbamon in pathname specified in dbamonrc.

(HTML System Summary Report)
----------------------------

 Recreated by each iteration of dbamon in pathname specified in dbamonrc.

(HTML Event Log)
----------------

 Recreated by each iteration of dbamon in pathname specified in dbamonrc.

(HTML Disk Usage Report)
------------------------

 Recreated by each iteration of dbamon in pathname specified in dbamonrc.

(Log File)
----------

 Any startup, shutdown, warning or critical message is appended to this
 file (the name is specified in dbamonrc). 

(config.? Files)
----------------

 Every time that dbamon is started, it will issue [tb|on]stat -cd and save 
 the output in the directory named in dbamonrc. This is useful for
 system recovery. 


--------------------------------------------------------------------------