IBM Support

Using the trapit script to watch for events

Troubleshooting


Problem

During the investigation of a problem, you need to watch for events like error log messages and take immediate action as they occur.

Resolving The Problem

The trapit script provides an easy way to perform actions based on events such as error messages written to log files, or data being written to new diagnostic files such as IBM MQ FDC or WebSphere Application Server ffdc files. If you need to monitor SystemOut.log files from WebSphere Application Server, use the TrapIt.ear tool, which is optimized for that case.


 
 

Using trapit

To use trapit, first download the script to your system and make it executable, for example by running: chmod a+x trapit

   

Syntax

trapit -e Error... -f File... [-i Interval] [-t Trigger]...

 

Required Parameters

-e Error

The error message or other pattern you want trapit to find. You can ask trapit to look for a simple string, or you can use an extended regular expression (ERE) instead. Use quotation marks around the error string when it contains spaces or special characters.

You can repeat this parameter to specify as many error strings or regular expressions as you need.
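The description above does not say how trapit tells a simple string from an ERE. If it hands every -e value to an ERE matcher such as grep -E (an assumption, not documented here), characters such as . ( ) [ ] * + ? and | in a literal message act as operators: the final period in the JMSCMQ0002 message in Example 3, for instance, matches any character. This minimal sketch defines a hypothetical helper, ere_escape, that backslash-escapes those characters so that a message is matched literally:

```shell
# ere_escape STRING
# Hypothetical helper (not part of trapit): prints STRING with each POSIX ERE
# metacharacter (such as . * ( ) [ ] and |) preceded by a backslash, so that
# an ERE matcher such as grep -E treats the whole string literally.
ere_escape() {
    # Inside the bracket expression, ] must come first; \ and . are literal.
    # In the replacement, \\ is a literal backslash and & is the matched char.
    printf '%s\n' "$1" | sed 's/[][\.^$*+?(){}|]/\\&/g'
}

# Example: the parentheses and periods become literal characters.
ere_escape 'DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)'
# prints: DISPLAY QSTATUS\(SYSTEM\.CHANNEL\.INITQ\)
```

A string without metacharacters, such as AMQ7466, is returned unchanged, so applying the helper to every literal message is safe.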

 

-f File

The file names for trapit to watch. You can give trapit a simple file name, or you can provide a wildcard pattern. Use quotation marks around the file name when it contains special characters such as wildcards, so that your shell passes the pattern to trapit instead of expanding it.

You can repeat this parameter to specify as many file names or wildcards as you need.
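Quoting matters because an unquoted wildcard is expanded by your shell only once, when you start trapit, so a file created afterwards (such as a new FDC file) would not be on the list. The Usage Notes say trapit picks up new files, which suggests that it expands the pattern itself on each scan; that is an inference, not documented behavior. This minimal sketch uses a hypothetical helper, list_matches, to show a pattern kept as text and expanded at call time:

```shell
# list_matches PATTERN
# Hypothetical helper: expands the wildcard PATTERN at the moment it is called
# and prints each existing file, one per line.
list_matches() {
    # $1 is deliberately unquoted so that the shell performs pathname
    # expansion now. It is also split on spaces, so this sketch assumes
    # paths without spaces.
    for f in $1; do
        # If nothing matches, the shell leaves the pattern as-is; skip it.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Demonstration in a scratch directory: a file created after the first
# expansion is found by the second one, because the pattern was kept as text.
dir=$(mktemp -d)
touch "$dir/first.FDC"
list_matches "$dir/*.FDC"    # lists one file
touch "$dir/second.FDC"
list_matches "$dir/*.FDC"    # lists two files
```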

 

Optional Parameters

-i Interval

How long trapit waits between scans of the files, in seconds (default: 5). A shorter interval means trapit might find errors more quickly, but it uses more system resources. A longer interval might be better for large files that are updated frequently.
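The trade-off can be seen in a simple polling loop. This is a hypothetical sketch, not trapit's implementation: each pass rereads every file, so a shorter interval finds a match sooner but repeats that work more often. The MAX_SCANS limit exists only so the sketch can give up; none of the parameters described on this page sets such a limit.

```shell
# wait_for_match ERE INTERVAL MAX_SCANS FILE...
# Hypothetical polling loop (not trapit's code): rereads every FILE on each
# pass and returns 0 as soon as a line matches ERE, or 1 after MAX_SCANS
# passes without a match. INTERVAL is the pause, in seconds, between passes.
wait_for_match() {
    ere=$1 interval=$2 max=$3
    shift 3
    scan=0
    while :; do
        # -q stops at the first match; -s hides errors for missing files.
        if grep -q -s -E -e "$ere" "$@"; then
            return 0
        fi
        scan=$((scan + 1))
        [ "$scan" -ge "$max" ] && return 1
        sleep "$interval"
    done
}
```

Called as wait_for_match 'AMQ746[69]' 5 720 /var/mqm/errors/AMQERR01.LOG, the sketch would check every 5 seconds for roughly an hour.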

 

-t Trigger

A command or script for trapit to run when it finds the error message or pattern. By default, trapit ends successfully when it finds the error, but you can ask trapit to run commands or scripts instead. Be sure to use quotation marks around each trigger command and its arguments.

You can repeat this parameter to specify as many trigger commands as you like. However, it is probably easier to put all your commands into a simple script and tell trapit to run the script.
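A trigger script keeps the trapit command line short. The sketch below is hypothetical: the function name collect_diag, the output directory, and the commands (uname -a and df -k, chosen only because they exist on most UNIX and Linux systems) are illustrative, and you would substitute the commands your investigation needs.

```shell
# collect_diag OUTDIR
# Hypothetical trigger-script body: runs a few commands and writes each one's
# output to its own timestamped file under OUTDIR, so repeated triggers do not
# overwrite earlier data. Prints the names of the files it wrote.
collect_diag() {
    outdir=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1
    # "|| :" keeps going if one command fails, so later data is still collected.
    uname -a > "$outdir/uname.$stamp.txt" 2>&1 || :
    df -k > "$outdir/df.$stamp.txt" 2>&1 || :
    # Add the commands your investigation needs here, for example netstat -an.
    printf '%s\n' "$outdir/uname.$stamp.txt" "$outdir/df.$stamp.txt"
}
```

To use it as a trigger, you could save the function in a script such as /tmp/on_error.sh (a hypothetical path), add a final line collect_diag /tmp/diag, make the script executable, and pass -t "/tmp/on_error.sh" to trapit.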

 
 

Usage Notes

The trapit script can look for any text in any file, provided you have authority to read the files. Although trapit was written by the IBM MQ team, anyone can use it to watch for errors in:

 
  • Application logs
  • Operating system logs
  • Product logs, such as the IBM MQ error log files (AMQERRxx.LOG)
  • Files created to record information about a specific occurrence of an error, such as IBM MQ FDC files and WebSphere Application Server ffdc files
 

If the error message or pattern you are looking for is already in one of the files when you start trapit, the script triggers immediately. To avoid this, delete or archive the files that contain the error before you start trapit, so that it watches only for new occurrences of the error.
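Before you start trapit, you can check which files already contain the pattern and move them aside. This hypothetical sketch, archive_matching, uses grep -l -E to find matching files and moves them into an archive directory. It suits one-off diagnostic files such as FDC files; for a log file that a running product is still writing, check that product's documentation before moving or deleting it.

```shell
# archive_matching ERE ARCHIVE_DIR FILE...
# Hypothetical helper: moves every FILE that already contains a line matching
# ERE into ARCHIVE_DIR, so that a later trapit run only sees new occurrences.
archive_matching() {
    ere=$1 archive=$2
    shift 2
    mkdir -p "$archive" || return 1
    # grep -l prints the name of each file with at least one match;
    # -s hides errors for files that do not exist.
    grep -l -s -E -e "$ere" "$@" | while IFS= read -r f; do
        mv "$f" "$archive/" && printf 'archived %s\n' "$f"
    done
}
```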

 

The trapit script is efficient with files such as IBM MQ FDC and WebSphere Application Server ffdc files, where each problem is usually recorded in a new file rather than appended to a single log. For example, if you ask trapit to watch "/var/mqm/errors/*.FDC" for a particular message, trapit starts by scanning every one of your FDC files (which could number in the thousands). Thereafter, trapit scans only new and updated FDC files.
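One way to get this behavior (scan everything once, then only new or changed files) is to compare each file's modification time with a marker file. This is a minimal sketch under that assumption; the page does not say how trapit itself tracks changes, and the function name scan_changed and the marker file are hypothetical.

```shell
# scan_changed DIR NAME_PATTERN ERE MARKER
# Hypothetical sketch: prints the files under DIR (including subdirectories)
# whose names match NAME_PATTERN, that were modified after the MARKER file,
# and that contain a line matching ERE. It then advances MARKER, so the next
# call looks only at files that are new or updated since this one.
scan_changed() {
    dir=$1 name=$2 ere=$3 marker=$4
    # First call: an old marker makes every existing file count as changed.
    [ -e "$marker" ] || touch -t 198001010000 "$marker"
    # Take the new timestamp before scanning, so a file that changes during
    # the scan is looked at again next time (on file systems with coarse
    # timestamps, a change in the same second could still be missed).
    touch "$marker.next"
    # grep exits 1 for a batch of files without a match; that is not an error.
    find "$dir" -type f -name "$name" -newer "$marker" \
        -exec grep -l -E -e "$ere" {} + || :
    mv "$marker.next" "$marker"
}
```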

 

While the trapit script can be used to turn tracing on or off when a problem occurs, IBM MQ can turn tracing on and off automatically when FDC entries with matching Probe Id values are generated. Using this feature of strmqtrc is faster and more efficient than the trapit script:

 

sh> strmqtrc -m PROD.QMGR -c FDC=XC308010,XC307040

 
 

Examples

 

The following examples demonstrate how to use the trapit script. In some cases, there are multiple ways to express the error string with extended regular expressions (EREs), so several examples are provided.

 

Example 1

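Ask trapit to check every 10 seconds for messages AMQ7466 and AMQ7469 in the error logs for the IBM MQ queue manager MY.QMGR, whose directory name is MY!QMGR. Each of the following commands does this. The file pattern is enclosed in single quotation marks because interactive shells such as bash perform history expansion on an exclamation mark inside double quotation marks:

sh> trapit -e AMQ7466 -e AMQ7469 -f '/var/mqm/qmgrs/MY!QMGR/errors/*.LOG' -i 10

sh> trapit -e "AMQ7466|AMQ7469" -f '/var/mqm/qmgrs/MY!QMGR/errors/*.LOG' -i 10

sh> trapit -e "AMQ746[69]" -f '/var/mqm/qmgrs/MY!QMGR/errors/*.LOG' -i 10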
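Before you start trapit, you can confirm that alternative ways of writing a pattern select the same lines by running them through grep -E against sample text. This assumes trapit applies ERE semantics as described under -e; the sample lines are hypothetical placeholders, and only their message numbers matter.

```shell
# Hypothetical sample lines; only the message numbers matter for this check.
sample_lines() {
    printf '%s\n' 'AMQ7466 sample line' 'AMQ7467 sample line' 'AMQ7469 sample line'
}

# The three ways of writing the patterns in Example 1, applied with grep -E.
form1=$(sample_lines | grep -E -e AMQ7466 -e AMQ7469)   # repeated -e options
form2=$(sample_lines | grep -E -e 'AMQ7466|AMQ7469')    # ERE alternation
form3=$(sample_lines | grep -E -e 'AMQ746[69]')         # ERE bracket expression

# Each form selects the AMQ7466 and AMQ7469 lines and skips AMQ7467.
printf '%s\n' "$form1"
[ "$form1" = "$form2" ] && [ "$form2" = "$form3" ] && echo "all three forms agree"
```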

 

Example 2

To run the stackit script against a queue manager when an IBM MQ FDC file showing Probe Id ZX159002 or error code xecL_W_LONG_LOCK_WAIT is generated:

sh> trapit -e ZX159002 -e xecL_W_LONG_LOCK_WAIT -f "/var/mqm/errors/*.FDC" -t "stackit -o All -m PROD.QMGR > /tmp/stackit.log"

sh> trapit -e "ZX159002|xecL_W_LONG_LOCK_WAIT" -f "/var/mqm/errors/*.FDC" -t "stackit -o All -m PROD.QMGR > /tmp/stackit.log"

 

Example 3

To gather full IBM MQ diagnostic information from the system with the runmqras command when your application records a message in its own log files:

sh> trapit -e "com.example.MyAppUnexpectedException" -e "JMSCMQ0002: The method 'MQCTL' failed." -f "/var/MyApp/MyApp-*.log" -t "runmqras -section defs 1>/dev/null 2>&1"

 

Example 4

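Trigger commands and their arguments must be enclosed in quotation marks. If the command you want to trigger also uses quotation marks, you have two choices: use backslashes to escape the quotation marks inside the command, or use double quotation marks to enclose the trigger and single quotation marks inside the command. When the trigger is enclosed in double quotation marks, your shell expands any shell variables in it before trapit starts, even inside inner single quotation marks; for example, ${QNAME} in the second command must be set in your shell. Enclosing the trigger in single quotation marks passes variable references to trapit unexpanded: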

sh> trapit -e "AMQ7305" -f "/var/mqm/qmgrs/QMA/errors/AMQERR01.LOG" -t "echo 'DISPLAY QSTATUS(SYSTEM.CHANNEL.INITQ)' | runmqsc QMA > /tmp/mqsc.txt"

sh> trapit -e "AMQ7305" -f "/var/mqm/qmgrs/QMB/errors/AMQERR01.LOG" -t "echo \"DISPLAY QSTATUS(${QNAME})\" | runmqsc QMB > /tmp/mqsc.txt"
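To see exactly what string trapit receives as a trigger, you can pass the same quoted argument to a command that prints it. In this sketch, show_arg is a hypothetical helper and QNAME is set to an example value. The output shows where your shell expands ${QNAME} before trapit would start; what happens to an unexpanded reference afterwards depends on how trapit runs the trigger, which this page does not describe.

```shell
# show_arg prints its first argument exactly as the called command receives it.
show_arg() { printf '%s\n' "$1"; }

QNAME=SYSTEM.CHANNEL.INITQ   # example value for this demonstration

# Outer double quotes, inner single quotes: ${QNAME} is still expanded now,
# because the single quotes are ordinary characters inside double quotes.
a=$(show_arg "echo 'DISPLAY QSTATUS(${QNAME})' | runmqsc QMA")
# Outer double quotes, escaped inner double quotes: also expanded now.
b=$(show_arg "echo \"DISPLAY QSTATUS(${QNAME})\" | runmqsc QMB")
# Outer single quotes: passed on literally, with ${QNAME} unexpanded.
c=$(show_arg 'echo "DISPLAY QSTATUS(${QNAME})" | runmqsc QMB')

printf '%s\n' "$a" "$b" "$c"
```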

 

Example 5

To run an operating system command and a custom script (whether provided by IBM or developed in-house) when a particular symptom appears in a WebSphere Application Server ffdc file:

sh> trapit -e "DSRA8100E: Unable to get a PooledConnection from the DataSource" -e "ERRORCODE=-1042, SQLSTATE=58004" -f "/usr/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/*.txt" -t "netstat -an > /tmp/diag_netstat.txt" -t "/tmp/diag_script.sh -s server1 -p 297364 > /tmp/diag_script.log"

 

Example 6

As an alternative, you could call trapit from within a diagnostic script and use it to pause until an error occurs. Because the trapit script exits with status 0 when it finds a match, you could run a script as in this example:

sh> /tmp/diag_script.sh > /tmp/diag_script.log

 

diag_script.sh

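#!/bin/sh
# Wait until IBM MQ reports a low memory warning (SIGDANGER), then collect
# process, IPC, and network data. Assumes trapit is in a directory on the PATH.

printf "Diagnostic script started at %s\n" "`date`"
printf "Waiting for the system to run low on memory...\n"

# trapit blocks here until the pattern appears in the error logs, then exits
# with status 0. Any other status means it failed, so do not collect data.
if ! trapit -e SIGDANGER -f "/var/mqm/errors/*.LOG"; then
  printf "Trapit failed: Exiting without collecting data\n"
  exit 1
fi

printf "IBM MQ received a low memory warning (SIGDANGER) at %s\n" "`date`"

printf " * Process listing:\n"
ps -eo pid,ppid,nlwp,s,vsz,pmem,pcpu,start,time,user,egroup,args

printf " * System V IPC listing:\n"
ipcs -a

printf " * Network connections:\n"
netstat -an

printf "Finished gathering data: Exiting successfully\n"
exit 0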
 

DISCLAIMER: All source code and/or binaries attached to this document are referred to here as "the Program". IBM is not providing program services of any kind for the Program. IBM is providing the Program on an "AS IS" basis without warranty of any kind. IBM WILL NOT BE LIABLE FOR ANY ACTUAL, DIRECT, SPECIAL, INCIDENTAL, OR INDIRECT DAMAGES OR FOR ANY ECONOMIC CONSEQUENTIAL DAMAGES (INCLUDING LOST PROFITS OR SAVINGS), EVEN IF IBM, OR ITS RESELLER, HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Product:
IBM MQ, WebSphere Application Server

Platform:
AIX, HP-UX, Linux, Solaris

Version:
All Versions

Document Information

Modified date:
02 February 2023

UID

swg21590151