修改Nagioscheck_oracle脚本来监控Oracle的临时表空间
 
1.前言
NagiosNagios Plugins中有很多程序或脚本提供给我们,用于监控相应的服务、资源等等。在Nagios Plugins中有一个用于实现对Oracle数据库进行监控的脚本,叫做check_oracle,位于Nagios安装路径下的libexec目录中。
check_oracle脚本可以监控Oracle数据库的cachetnstablespace等信息,但是通过“--tablespace”选项监控表空间时,我们发现这个脚本不能监控临时表空间。仔细查看该脚本,发现其中的确没有对临时表空间进行处理,现对该脚本做修改,使之能够监控Oracle的临时表空间。本文以Oracle10g作为实验数据库。
 
2.分析源码和问题的解决方法
查看check_oracle源码的tablespace部分,发现其对于表空间信息的获取是通过sql语句完成的。sql语句如下:
select NVL(b.free,0.0),a.total,100 - trunc(NVL(b.free,0.0)/a.total * 1000) / 10 prc
  from
    (select tablespace_name,sum(bytes)/1024/1024 total
      from dba_data_files group by tablespace_name) A
    LEFT OUTER JOIN
    (select tablespace_name,sum(bytes)/1024/1024 free
      from dba_free_space group by tablespace_name) B
    ON a.tablespace_name=b.tablespace_name
  WHERE a.tablespace_name='${5}';
其中${5}是表空间的名字。
由于临时文件的的信息不在dba_data_files表中,所以通过上述脚本显然不能获得临时表空间的任何信息。
那么如何获得临时表空间的空间使用情况呢,具体sql语句如下:
select NVL(b.free,0.0),a.total,100 - trunc(NVL(b.free,0.0)/a.total * 1000) / 10 prc
  from
    (select tablespace_name,sum(bytes)/1024/1024 total
      from dba_temp_files group by tablespace_name) A
    LEFT OUTER JOIN
    (select tablespace_name,sum(bytes_cached)/1024/1024 free
      from v\\$temp_extent_pool group by tablespace_name) B
    ON a.tablespace_name=b.tablespace_name
  WHERE a.tablespace_name='TEMP';
这条sql语句可以获得临时表空间“TEMP”的空间使用情况。
解决的方法找到了,下面我们只需将脚本做小小修改就可以达到要求了。
3.check_oracle脚本的修改
check_oracle脚本中的“case”语句中增加一个“--tablespaceTEMP”分支,用于完成监控Oracle临时表空间的功能。修改后的脚本如下(修改、添加的部分已标出):
#! /bin/ksh
#
# [email]latigid010@yahoo.com[/email]
# 01/06/2000
#
#  This Nagios plugin was created to check Oracle status
#
 
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1749 $' | sed -e 's/[^0-9.]//g'`
 
. $PROGPATH/utils.sh
 
 
print_usage() {
  echo "Usage:"
  echo "  $PROGNAME --tns <Oracle Sid or Hostname/IP address>"
  echo "  $PROGNAME --db <ORACLE_SID>"
  echo "  $PROGNAME --login <ORACLE_SID>"
  echo "  $PROGNAME --cache <ORACLE_SID> <USER> <PASS> <CRITICAL> <WARNING>"
  echo "  $PROGNAME --tablespace <ORACLE_SID> <USER> <PASS> <TABLESPACE> <CRITICAL> <WARNING>"
  echo "  $PROGNAME --tablespaceTEMP <ORACLE_SID> <USER> <PASS> <TABLESPACE> <CRITICAL> <WARNING>"
  echo "  $PROGNAME --oranames <Hostname>"
  echo "  $PROGNAME --help"
  echo "  $PROGNAME --version"
}
 
print_help() {
  print_revision $PROGNAME $REVISION
  echo ""
  print_usage
  echo ""
  echo "Check Oracle status"
  echo ""
  echo "--tns SID/IP Address"
  echo "   Check remote TNS server"
  echo "--db SID"
  echo "   Check local database (search /bin/ps for PMON process) and check"
  echo "   filesystem for sgadefORACLE_SID.dbf"
  echo "--login SID"
  echo "   Attempt a dummy login and alert if not ORA-01017: invalid username/password"
  echo "--cache"
  echo "   Check local database for library and buffer cache hit ratios"
  echo "       --->  Requires Oracle user/password and SID specified."
  echo "          --->  Requires select on v_\$sysstat and v_\$librarycache"
  echo "--tablespace"
  echo "   Check local database for tablespace capacity in ORACLE_SID"
  echo "       --->  Requires Oracle user/password specified."
  echo "          --->  Requires select on dba_data_files and dba_free_space"
  echo "--tablespaceTEMP"
  echo "   Check local temporary database for tablespace capacity in ORACLE_SID"
  echo "       --->  Requires Oracle user/password specified."
  echo "                --->  Requires select on dba_temp_files and v_\$temp_extent_pool"
  echo "--oranames Hostname"
  echo "   Check remote Oracle Names server"
  echo "--help"
  echo "   Print this help screen"
  echo "--version"
  echo "   Print version and license information"
  echo ""
  echo "If the plugin doesn't work, check that the ORACLE_HOME environment"
  echo "variable is set, that ORACLE_HOME/bin is in your PATH, and the"
  echo "tnsnames.ora file is locatable and is properly configured."
  echo ""
  echo "When checking local database status your ORACLE_SID is case sensitive."
  echo ""
  echo "If you want to use a default Oracle home, add in your oratab file:"
  echo "*:/opt/app/oracle/product/7.3.4:N"
  echo ""
  support
}
 
case "$1" in
1)
    cmd='--tns'
    ;;
2)
    cmd='--db'
    ;;
*)
    cmd="$1"
    ;;
esac
 
# Information options
case "$cmd" in
--help)
    print_help
    exit $STATE_OK
    ;;
-h)
    print_help
    exit $STATE_OK
    ;;
--version)
    print_revision $PROGNAME $REVISION
    exit $STATE_OK
    ;;
-V)
    print_revision $PROGNAME $REVISION
    exit $STATE_OK
    ;;
esac
 
# Hunt down a reasonable ORACLE_HOME
if [ -z "$ORACLE_HOME" ] ; then
  # Adjust to taste
  for oratab in /var/opt/oracle/oratab /etc/oratab
  do
  [ ! -f $oratab ] && continue
  ORACLE_HOME=`IFS=:
    while read SID ORACLE_HOME junk;
    do
      if [ "$SID" = "$2" -o "$SID" = "*" ] ; then
        echo $ORACLE_HOME;
        exit;
      fi;
    done < $oratab`
  [ -n "$ORACLE_HOME" ] && break
  done
fi
# Last resort
[ -z "$ORACLE_HOME" -a -d $PROGPATH/oracle ] && ORACLE_HOME=$PROGPATH/oracle
 
if [ "$cmd" != "--db" ]; then
  if [ -z "$ORACLE_HOME" -o ! -d "$ORACLE_HOME" ] ; then
    echo "Cannot determine ORACLE_HOME for sid $2"
    exit $STATE_UNKNOWN
  fi
fi
PATH=$PATH:$ORACLE_HOME/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib
export ORACLE_HOME PATH LD_LIBRARY_PATH
 
case "$cmd" in
--tns)
    tnschk=` tnsping $2`
    tnschk2=` echo  $tnschk | grep -c OK`
    if [ ${tnschk2} -eq 1 ] ; then
  tnschk3=` echo $tnschk | sed -e 's/.*(//' -e 's/).*//'`
  echo "OK - reply time ${tnschk3} from $2"
  exit $STATE_OK
    else
  echo "No TNS Listener on $2"
  exit $STATE_CRITICAL
    fi
    ;;
--oranames)
    namesctl status $2 | awk '
    /Server has been running for:/ {
  msg = "OK: Up"
  for (i = 6; i <= NF; i++) {
      msg = msg " " $i
  }
  status = '$STATE_OK'
    }
    /error/ {
  msg = "CRITICAL: " $0
  status = '$STATE_CRITICAL'
    }
    END {
  print msg
  exit status
    }'
    ;;
--db)
    pmonchk=`ps -ef | grep -v grep | grep -c "ora_pmon_${2}$"`
    if [ ${pmonchk} -ge 1 ] ; then
  echo "${2} OK - ${pmonchk} PMON process(es) running"
  exit $STATE_OK
    #if [ -f $ORACLE_HOME/dbs/sga*${2}* ] ; then
  #if [ ${pmonchk} -eq 1 ] ; then
    #utime=`ls -la $ORACLE_HOME/dbs/sga*$2* | cut -c 43-55`
      #echo "${2} OK - running since ${utime}"
      #exit $STATE_OK
  #fi
    else
  echo "${2} Database is DOWN"
  exit $STATE_CRITICAL
    fi
    ;;
--login)
    loginchk=`sqlplus dummy/user@$2 < /dev/null`
    loginchk2=` echo  $loginchk | grep -c ORA-01017`
    if [ ${loginchk2} -eq 1 ] ; then
  echo "OK - dummy login connected"
  exit $STATE_OK
    else
  loginchk3=` echo "$loginchk" | grep "ORA-" | head -1`
  echo "CRITICAL - $loginchk3"
  exit $STATE_CRITICAL
    fi
    ;;
--cache)
    if [ ${5} -gt ${6} ] ; then
  echo "UNKNOWN - Warning level is less then Crit"
  exit $STATE_UNKNOWN
    fi
    result=`sqlplus -s ${3}/${4}@${2} << EOF
set pagesize 0
set numf '9999999.99'
select (1-(pr.value/(dbg.value+cg.value)))*100
from v\\$sysstat pr, v\\$sysstat dbg, v\\$sysstat cg
where pr.name='physical reads'
and dbg.name='db block gets'
and cg.name='consistent gets';
EOF`
 
    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi
 
    buf_hr=`echo "$result" | awk '/^[0-9\. \t]+$/ {print int($1)}'`
    buf_hrx=`echo "$result" | awk '/^[0-9\. \t]+$/ {print $1}'`
    result=`sqlplus -s ${3}/${4}@${2} << EOF
set pagesize 0
set numf '9999999.99'
select sum(lc.pins)/(sum(lc.pins)+sum(lc.reloads))*100
from v\\$librarycache lc;
EOF`
 
    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi
 
    lib_hr=`echo "$result" | awk '/^[0-9\. \t]+$/ {print int($1)}'`
    lib_hrx=`echo "$result" | awk '/^[0-9\. \t]+$/ {print $1}'`
 
    if [ $buf_hr -le ${5} -o $lib_hr -le ${5} ] ; then
    echo "${2} CRITICAL - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${6};${5};0;100 buffer=$buf_hrx%;${6};${5};0;100"
  exit $STATE_CRITICAL
    fi
    if [ $buf_hr -le ${6} -o $lib_hr -le ${6} ] ; then
    echo "${2} WARNING  - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${6};${5};0;100 buffer=$buf_hrx%;${6};${5};0;100"
  exit $STATE_WARNING
    fi
    echo "${2} OK - Cache Hit Rates: $lib_hrx% Lib -- $buf_hrx% Buff|lib=$lib_hrx%;${6};${5};0;100 buffer=$buf_hrx%;${6};${5};0;100"
 
    exit $STATE_OK
    ;;
--tablespace)
    if [ ${6} -lt ${7} ] ; then
  echo "UNKNOWN - Warning level is more then Crit"
  exit $STATE_UNKNOWN
    fi
    result=`sqlplus -s ${3}/${4}@${2} << EOF
set pagesize 0
set numf '9999999.99'
select NVL(b.free,0.0),a.total,100 - trunc(NVL(b.free,0.0)/a.total * 1000) / 10 prc
  from
    (select tablespace_name,sum(bytes)/1024/1024 total
      from dba_data_files group by tablespace_name) A
    LEFT OUTER JOIN
    (select tablespace_name,sum(bytes)/1024/1024 free
      from dba_free_space group by tablespace_name) B
    ON a.tablespace_name=b.tablespace_name
  WHERE a.tablespace_name='${5}';
EOF`
    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi
 
    ts_free=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($1)}'`
    ts_total=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($2)}'`
    ts_pct=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($3)}'`
    ts_pctx=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print $3}'`
    if [ "$ts_free" -eq 0 -a "$ts_total" -eq 0 -a "$ts_pct" -eq 0 ] ; then
        echo "No data returned by Oracle - tablespace $5 not found?"
        exit $STATE_UNKNOWN
    fi
    if [ "$ts_pct" -ge ${6} ] ; then
    echo "${2} : ${5} CRITICAL - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${5}=$ts_pctx%;${7};${6};0;100"
  exit $STATE_CRITICAL
    fi
    if [ "$ts_pct" -ge ${7} ] ; then
    echo "${2} : ${5} WARNING  - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${5}=$ts_pctx%;${7};${6};0;100"
  exit $STATE_WARNING
    fi
    echo "${2} : ${5} OK - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${5}=$ts_pctx%;${7};${6};0;100"
    exit $STATE_OK
    ;;
--tablespaceTEMP)
# edit by sky,TEMP tablespace
    if [ ${6} -lt ${7} ] ; then
  echo "UNKNOWN - Warning level is more then Crit"
  exit $STATE_UNKNOWN
    fi
    result=`sqlplus -s ${3}/${4}@${2} << EOF
set pagesize 0
set numf '9999999.99'
select NVL(b.free,0.0),a.total,100 - trunc(NVL(b.free,0.0)/a.total * 1000) / 10 prc
  from
    (select tablespace_name,sum(bytes)/1024/1024 total
      from dba_temp_files group by tablespace_name) A
    LEFT OUTER JOIN
    (select tablespace_name,sum(bytes_cached)/1024/1024 free
      from v\\$temp_extent_pool group by tablespace_name) B
    ON a.tablespace_name=b.tablespace_name
  WHERE a.tablespace_name='${5}';
EOF`
    if [ -n "`echo $result | grep ORA-`" ] ; then
      error=` echo "$result" | grep "ORA-" | head -1`
      echo "CRITICAL - $error"
      exit $STATE_CRITICAL
    fi
 
    ts_free=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($1)}'`
    ts_total=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($2)}'`
    ts_pct=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print int($3)}'`
    ts_pctx=`echo "$result" | awk '/^[ 0-9\.\t ]+$/ {print $3}'`
    if [ "$ts_free" -eq 0 -a "$ts_total" -eq 0 -a "$ts_pct" -eq 0 ] ; then
        echo "No data returned by Oracle - tablespace $5 not found?"
        exit $STATE_UNKNOWN
    fi
    if [ "$ts_pct" -ge ${6} ] ; then
    echo "${2} : ${5} CRITICAL - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${5}=$ts_pctx%;${7};${6};0;100"
  exit $STATE_CRITICAL
    fi
    if [ "$ts_pct" -ge ${7} ] ; then
    echo "${2} : ${5} WARNING  - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${5}=$ts_pctx%;${7};${6};0;100"
  exit $STATE_WARNING
    fi
    echo "${2} : ${5} OK - $ts_pctx% used [ $ts_free / $ts_total MB available ]|${5}=$ts_pctx%;${7};${6};0;100"
    exit $STATE_OK
    ;;
*)
    print_usage
    exit $STATE_UNKNOWN
esac
 
4.测试、结语
脚本修改完了,下面测试一下。
$ ./check_oracle --tablespaceTEMP skydb sky sky TEMP 90 80
skydb : TEMP OK - 15.00% used [ 17 / 20 MB available ]|TEMP=15.00%;80;90;0;100
如果你没有成功,那可能是没有给用户分配dba_temp_filesv_$temp_extent_pool对象的select权限。
通过—help选项可以详细了解check_oracle的使用方法。
$ ./check_oracle --help
check_oracle v1749 (nagios-plugins 1.4.11)
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
 
Usage:
  check_oracle --tns <Oracle Sid or Hostname/IP address>
  check_oracle --db <ORACLE_SID>
  check_oracle --login <ORACLE_SID>
  check_oracle --cache <ORACLE_SID> <USER> <PASS> <CRITICAL> <WARNING>
  check_oracle --tablespace <ORACLE_SID> <USER> <PASS> <TABLESPACE> <CRITICAL> <WARNING>
  check_oracle --tablespaceTEMP <ORACLE_SID> <USER> <PASS> <TABLESPACE> <CRITICAL> <WARNING>
  check_oracle --oranames <Hostname>
  check_oracle --help
  check_oracle --version
 
Check Oracle status
 
--tns SID/IP Address
   Check remote TNS server
--db SID
   Check local database (search /bin/ps for PMON process) and check
   filesystem for sgadefORACLE_SID.dbf
--login SID
   Attempt a dummy login and alert if not ORA-01017: invalid username/password
--cache
   Check local database for library and buffer cache hit ratios
       --->  Requires Oracle user/password and SID specified.
                --->  Requires select on v_$sysstat and v_$librarycache
--tablespace
   Check local database for tablespace capacity in ORACLE_SID
       --->  Requires Oracle user/password specified.
                --->  Requires select on dba_data_files and dba_free_space
--tablespaceTEMP
   Check local temporary database for tablespace capacity in ORACLE_SID
       --->  Requires Oracle user/password specified.
                --->  Requires select on dba_temp_files and v$temp_extent_pool
--oranames Hostname
   Check remote Oracle Names server
--help
   Print this help screen
--version
   Print version and license information
 
If the plugin doesn't work, check that the ORACLE_HOME environment
variable is set, that ORACLE_HOME/bin is in your PATH, and the
tnsnames.ora file is locatable and is properly configured.
 
When checking local database status your ORACLE_SID is case sensitive.
 
If you want to use a default Oracle home, add in your oratab file:
*:/opt/app/oracle/product/7.3.4:N
 
Send email to [email]nagios-users@lists.sourceforge.net[/email] if you have questions
regarding use of this software. To submit patches or suggest improvements,
send email to [email]nagiosplug-devel@lists.sourceforge.net[/email].
Please include version information with all correspondence (when possible,
use output from the --version option of the plugin itself).