Wednesday, January 14, 2015

Automate Startup/Shutdown of OSWatcher with Oracle HAS (Grid Infrastructure Standalone)

Grid Infrastructure Standalone 11.2.0.3
Operating system: AIX 6.1

Download Oracle OSWatcher (oswbb732.tar, current version 7.3.2) from My Oracle Support. Install OSWatcher:
$ mkdir /u01/app/grid/product/7.3.2
$ cd /u01/app/grid/product/7.3.2
$ tar xvf /tmp/oswbb732.tar
  ... ...
$ ls -l oswbb/*OSWbb*
-rwxr-xr-x    1 grid  oinstall         2385 Oct 07 2013  oswbb/startOSWbb.sh
-rwxr-xr-x    1 grid  oinstall          558 Apr 17 2014  oswbb/stopOSWbb.sh

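Create an Oracle HAS (Grid Infrastructure Standalone) action script file, /u01/app/grid/local/bin/startOSWbb.sh. It has the same name as OSWatcher's own startOSWbb.sh, but it is a separate wrapper script in a different directory: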
#!/bin/sh
#
# ACTION_SCRIPT for Oracle HAS to manage OSWatcher
#
OSWBB_HOME=/u01/app/grid/product/7.3.2/oswbb
OSWBB_LOG=/u01/app/grid/local/logs/oswbb/alert_OSWbb.log
OSWBB_ARCHIVE_DEST=/u01/app/grid/local/logs/oswbb/archive
SNAPSHOT_INTERVAL=30   # in seconds
RETENTION_POLICY=720   # in hours
COMPRESS_COMMAND=NONE  # compress utility command such as zip, gzip, etc
#
case "$1" in
  'start')
     cd $OSWBB_HOME
     echo " " >> $OSWBB_LOG
     TMP=`ps -ef | grep OSWatcher  | grep -v grep |wc -l`
     if [ $TMP -eq 1 ]; then  # Only one process (OSWatcher or OSWatcherFM) running is not enough
        ./stopOSWbb.sh        # Stop orphan process
     fi
     echo "******************************************************" >> $OSWBB_LOG
     echo "...Starting OSWbb at `date` " >> $OSWBB_LOG
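     # Note: after '&', $? only reports that the background job was launched (always 0),
     # so a failed OSWatcher start is caught later by the 'check' action, not here.
     ./startOSWbb.sh $SNAPSHOT_INTERVAL $RETENTION_POLICY $COMPRESS_COMMAND $OSWBB_ARCHIVE_DEST >> $OSWBB_LOG &
     if [ $? -eq 0 ]; then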
        sleep 60    # Wait for all processes to be started
        exit 0
     else
        exit 1
     fi
     ;;
  'stop')
     cd $OSWBB_HOME
     echo "...Stopping OSWbb  at `date` " >> $OSWBB_LOG
     ./stopOSWbb.sh
     exit $?
     ;;
  'check')
    TMP=`ps -ef | grep OSWatcher  | grep -v grep |wc -l`
    if [ $TMP -eq 2 ]; then  # Both OSWatcher and OSWatcherFM (2 processes) running
       exit 0
    else
       exit 1
    fi
    ;;
  'clean')
     cd $OSWBB_HOME
     echo "...Cleaning/Stopping OSWbb  at `date` " >> $OSWBB_LOG
     ./stopOSWbb.sh
     exit $?
     ;;
esac

Notes:
1. startOSWbb.sh must be started in the background by appending '&' to the end of the command line.
2. The output of startOSWbb.sh must be redirected to a file.
Add a resource to HAS:
$ crsctl add resource ora.OSWbb -type local_resource -attr "ACTION_SCRIPT=/u01/app/grid/local/bin/startOSWbb.sh,CHECK_INTERVAL=60,RESTART_ATTEMPTS=2"

Start resource ora.OSWbb:
$ crsctl start resource ora.OSWbb
CRS-2672: Attempting to start 'ora.OSWbb' on 'host02'
CRS-2674: Start of 'ora.OSWbb' on 'host02' failed
CRS-4000: Command Start failed, or completed with errors.

Check OHASD agent log for errors:
$ tail /u01/app/grid/product/11.2.0/gi11204/log/host02/agent/ohasd/scriptagent_grid/scriptagent_grid.log
2015-01-14 10:00:49.240: [    AGFW][2057]{0:0:14962} Agent received the message: RESOURCE_START[ora.OSWbb host02 1] ID 4098:444107
2015-01-14 10:00:49.240: [    AGFW][2057]{0:0:14962} Preparing START command for: ora.OSWbb host02 1
2015-01-14 10:00:49.240: [    AGFW][2057]{0:0:14962} ora.OSWbb host02 1 state changed from: UNKNOWN to: STARTING
2015-01-14 10:00:49.242: [    AGFW][1543]{0:0:14962} Entering script entry point...
2015-01-14 10:00:49.242: [ora.OSWbb][1543]{0:0:14962} [start] Executing action script: /u01/app/grid/local/bin/startOSWbb.sh[start]
2015-01-14 10:00:49.304: [    AGFW][1543]{0:0:14962} Command: start for resource: ora.OSWbb host02 1 completed with status: SUCCESS
2015-01-14 10:00:49.305: [CLSFRAME][1] TM [MultiThread] is changing desired thread # to 3. Current # is 2
2015-01-14 10:00:49.306: [    AGFW][1543]{0:0:14962} Entering script entry point...
2015-01-14 10:00:49.306: [ora.OSWbb][1543]{0:0:14962} [check] Executing action script: /u01/app/grid/local/bin/startOSWbb.sh[check]
2015-01-14 10:00:49.307: [    AGFW][2057]{0:0:14962} Agent sending reply for: RESOURCE_START[ora.OSWbb host02 1] ID 4098:444107
2015-01-14 10:00:49.365: [    AGFW][2057]{0:0:14962} ora.OSWbb host02 1 state changed from: STARTING to: OFFLINE
2015-01-14 10:00:49.365: [    AGFW][2057]{0:0:14962} Agent sending last reply for: RESOURCE_START[ora.OSWbb host02 1] ID 4098:444107
2015-01-14 10:00:49.365: [    AGFW][2057]{0:0:14962} Agent has no resources to be monitored, Shutting down ..
2015-01-14 10:00:49.365: [    AGFW][2057]{0:0:14962} Agent sending message to PE: AGENT_SHUTDOWN_REQUEST[Proxy] ID 20486:25
2015-01-14 10:00:49.370: [    AGFW][2057]{0:0:14962} Agent is shutting down.
2015-01-14 10:00:49.370: [ USRTHRD][2057]{0:0:14962} Script agent is exiting..
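
The most telling lines in this log are the state transitions: the start action completes with SUCCESS, the check action runs, and the resource then goes from STARTING to OFFLINE. A small filter can pull those transitions out of a long agent log. This is only a sketch that assumes the log line format shown above; the function name state_changes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the resource state transitions found in
# agent log lines read from stdin, one per line, as "OLD -> NEW".
state_changes() {
  sed -n 's/.* state changed from: \([A-Z_]*\) to: \([A-Z_]*\).*/\1 -> \2/p'
}
```

It would be fed the agent log on stdin, for example `state_changes < scriptagent_grid.log`.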

The agent log contains no helpful message, so check the OSWatcher log:
$ cat  /u01/app/grid/local/logs/oswbb/alert_OSWbb.log

******************************************************
The archive directory you specified for parameter 4 in startOSWbb.sh:/u01/app/grid/local/logs/oswbb/archive does not exist. Please create this directory and rerun ./startOSWbb.sh

******************************************************
...Starting OSWbb at Wed Jan 14 10:00:49 EST 2015
The archive directory you specified for parameter 4 in startOSWbb.sh:/u01/app/grid/local/logs/oswbb/archive does not exist. Please create this directory and rerun ./startOSWbb.sh

The OSWatcher archive directory does not exist. Create it manually:
$ mkdir /u01/app/grid/local/logs/oswbb/archive
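
To keep this failure from recurring (for example after the log area is cleaned up), the action script could create the directory itself before launching OSWatcher. A minimal sketch, assuming the grid user is allowed to create the path; the function name ensure_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: make sure a directory exists and is writable before
# OSWatcher is started. Returns non-zero if it cannot be created or written.
ensure_dir() {
  dir=$1
  [ -d "$dir" ] || mkdir -p "$dir" || return 1
  [ -w "$dir" ]
}
```

In the start branch this might be called as `ensure_dir "$OSWBB_ARCHIVE_DEST" || exit 1` before startOSWbb.sh is run.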

Restart resource ora.OSWbb:
$ crsctl start resource ora.OSWbb
CRS-2672: Attempting to start 'ora.OSWbb' on 'host02'
CRS-2676: Start of 'ora.OSWbb' on 'host02' succeeded
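
The resource state can also be confirmed from the HAS side with `crsctl status resource ora.OSWbb`. The sketch below extracts just the state from that output. It assumes the usual NAME=/TYPE=/TARGET=/STATE= line layout of that command, which is not shown in this post, and the function name resource_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the STATE value from `crsctl status resource`
# output read from stdin. Assumes one KEY=VALUE pair per line, including a
# line that starts with "STATE=".
resource_state() {
  sed -n 's/^STATE=//p'
}
```

A call would look like `crsctl status resource ora.OSWbb | resource_state`.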

Resource ora.OSWbb started successfully. Make sure OSWatcher is running:
$ ps -ef | grep OSWatcher
 grid  7405596  5046352   0 10:13:16  pts/6  0:00 grep OSWatcher
 grid  7078198 11469280   0 10:11:23      -  0:00 /bin/sh ./OSWatcherFM.sh 720 /u01/app/grid/local/logs/oswbb/archive
 grid 11469280        1   0 10:10:54      -  0:00 /bin/sh ./OSWatcher.sh 30 720 NONE /u01/app/grid/local/logs/oswbb/archive

Note: The keyword 'OSWatcher' is used by the OSWatcher stop script to determine whether the utility is running. Do NOT use the string 'OSWatcher' anywhere else (directory name, file name, command name, etc.).
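
The action script above counts processes with `grep OSWatcher` as well, so any unrelated command line containing that string would inflate the count. A quick sanity check on the paths and names you choose could look like the following sketch; the function name name_is_safe is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given string does not contain the
# reserved keyword 'OSWatcher'.
name_is_safe() {
  case "$1" in
    *OSWatcher*) return 1 ;;
    *)           return 0 ;;
  esac
}
```

For example, `name_is_safe "$OSWBB_HOME" || echo "rename this path"` would flag an install directory such as /u01/app/OSWatcher/oswbb.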

Test the stop function of resource ora.OSWbb:
$ crsctl stop resource ora.OSWbb
CRS-2673: Attempting to stop 'ora.OSWbb' on 'host02'
CRS-2677: Stop of 'ora.OSWbb' on 'host02' succeeded
$ ps -ef | grep OSWatcher
 oragrid 12058682  5046352   0 15:15:50  pts/6  0:00 grep OSWatcher

Test the restart function of resource ora.OSWbb (check and start functions):
$ ps -ef | grep OSWatcher
 oragrid 11010394  5046352   0 15:18:33  pts/6  0:00 grep OSWatcher
$ crsctl start resource ora.OSWbb
CRS-2672: Attempting to start 'ora.OSWbb' on 'host02'
CRS-2676: Start of 'ora.OSWbb' on 'host02' succeeded
$  ps -ef | grep OSWatcher
 oragrid  7864326        1   0 15:18:42      -  0:00 /bin/sh ./OSWatcher.sh 30 720 NONE /u01/app/grid/local/logs/oswbb/archive
 oragrid  8388952  7864326   0 15:19:10      -  0:00 /bin/sh ./OSWatcherFM.sh 720 /u01/app/grid/local/logs/oswbb/archive
 oragrid 11665602  5046352   0 15:19:53  pts/6  0:00 grep OSWatcher

OSWatcher is running. Kill one of the two running processes:
$ kill 8388952
$
$ ps -ef | grep OSWatcher
 oragrid  7405662  5046352   0 15:20:26  pts/6  0:00 grep OSWatcher
 oragrid  7864326        1   0 15:18:42      -  0:00 /bin/sh ./OSWatcher.sh 30 720 NONE /u01/app/grid/local/logs/oswbb/archive

Process OSWatcherFM.sh has been killed. Now wait one minute and see what happens:
$ ps -ef | grep OSWatcher
 oragrid 14614732  6816014   0 15:21:11      -  0:00 /bin/sh ./OSWatcherFM.sh 720 /u01/app/grid/local/logs/oswbb/archive
 oragrid  6816014        1   0 15:20:42      -  0:00 /bin/sh ./OSWatcher.sh 30 720 NONE /u01/app/grid/local/logs/oswbb/archive
 oragrid 10355082  5046352   0 15:22:23  pts/6  0:00 grep OSWatcher
$

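Wow! Both processes have been restarted. Note the new PIDs: the check action counted only one process and returned failure, so HAS ran the start action, which stopped the orphaned OSWatcher.sh and launched both processes again.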
