Collecting Logs, Stats, and Dumps using the collect-debug-artifacts Script

This section uses the term 'node' frequently. When the system consists only of server and locator members, a node denotes a server or a locator member. In a complete TIBCO ComputeDB distributed system, a node can be a server, locator, or lead member.

The collect-debug-artifacts script enables you to collect debug information such as logs and stats files. It also has an option to take stack dumps of the running system. Its main purpose is to simplify collecting this information. The script collects the artifacts node by node and outputs a tar file that contains the information for each member. All the options and capabilities of the script are described below.

Prerequisites for running the script:

The script assumes that the following conditions are met before it is invoked. It does not validate them, so make sure they are fulfilled:

  1. This script is expected to be run by a user who has read and write permissions on the output directories of all the TIBCO ComputeDB nodes.

  2. The user must have one-way passwordless SSH set up from the machine where the script runs to all the other machines where TIBCO ComputeDB nodes are running.
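The SSH prerequisite can usually be met with standard OpenSSH tools. The following is a minimal sketch, assuming ssh-keygen and ssh-copy-id are installed; the function name setup_passwordless_ssh and any host names passed to it are hypothetical and not part of TIBCO ComputeDB.

```shell
# Hypothetical helper: sets up one-way passwordless SSH from this machine
# to each host passed as an argument. Requires OpenSSH's ssh-keygen and
# ssh-copy-id; the function name and host names are illustrative only.
setup_passwordless_ssh() {
  local key="$HOME/.ssh/id_ed25519"

  # Generate a key pair with an empty passphrase only if none exists yet.
  if [ ! -f "$key" ]; then
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" || return 1
    ssh-keygen -t ed25519 -N "" -f "$key" || return 1
  fi

  for host in "$@"; do
    # Install the public key in the remote user's authorized_keys
    # (prompts for that host's password once).
    ssh-copy-id -i "$key.pub" "$host" || return 1
    # BatchMode=yes makes ssh fail instead of prompting, so success here
    # confirms that no password is needed any more.
    ssh -o BatchMode=yes "$host" true || return 1
  done
}
```

Run it as the user who will run the script, listing every machine that hosts a node, for example `setup_passwordless_ssh host1 host2 host3`.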

Below is the usage of the script:

      <linux-shell> ./sbin/collect-debug-artifacts.sh -h

      Usage: collect-debug-artifacts
             [ -c conffile|--conf=conffile|--config=conffile ]
             [ -o resultdir|--out=resultdir|--outdir=resultdir ]
             [ -h|--help ]
             [ -a|--all ]
             [ -d|--dump ]
             [ -v|--verbose ]
             [ -s starttimestamp|--start=starttimestamp ]
             [ -e endtimestamp|--end=endtimestamp ]
             [ -x debugtarfile|--extract=debugtarfile ]

             Timestamp format: YYYY-MM-DD HH:MM[:SS]

Options:

All the options of the script are optional. By default, the script collects the current logs, that is, all the log files written since the last restart plus the last log file before that. It also copies all the stat files into the output directory. To change this behavior and collect the debug information you need, use the following options. Note that stack dumps are not collected by default; use the -d, --dump option to collect them.

-h, --help Prints a brief summary of the command-line options.

-c, --conf Specifies the configuration file used by the script. The configuration file has three configuration elements:

  1. MEMBERS_FILE: A text file that contains the member information. Each line has the host machine name followed by the full path to the run directory of the member. This file is generated automatically when sbin/start-all-scripts.sh is used.

  2. NO_OF_STACK_DUMPS: The number of stack dumps the script attempts per member/node of the running system.

  3. INTERVAL_BETWEEN_DUMPS: The time, in seconds, that the script waits between successive stack dumps.

-o, --out, --outdir The directory in which the output tar file is created.

-a, --all Collects all the log and stat files from each member's output directory, instead of only the current ones.

-d, --dump Collects stack dumps. Stack dumps are not collected by default or with the -a, --all option, so you must specify this option explicitly to collect them.

-v, --verbose Turns on verbose mode.

-s, --start, -e, --end Collect log files for a specified time interval. The interval is given by a start time (-s, --start) and an end time (-e, --end), and both parameters must be specified. The timestamp format is 'YYYY-MM-DD HH:MM[:SS]'.

-x, --extract=debugtarfile Extracts the contents of the specified debug tar file.
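For the -c, --conf option, a configuration file might look like the following hypothetical sketch. It assumes shell-style KEY=value assignments, matching the EGREP_INVERT_MATCH_PATTERNS example in this section; the file name, paths, host names, and values are illustrative only.

```shell
# Hypothetical conf/debug.conf (illustrative values only).
# Assumes shell-style KEY=value lines, like EGREP_INVERT_MATCH_PATTERNS.

# File listing one member per line:
#   <host> <full path to the member's run directory>
# for example (hypothetical hosts and paths):
#   host1 /opt/computedb/work/host1-locator-1
#   host2 /opt/computedb/work/host2-server-1
MEMBERS_FILE=/opt/computedb/conf/members.txt

# Number of stack dumps attempted per member/node.
NO_OF_STACK_DUMPS=3

# Seconds to wait between successive stack dumps.
INTERVAL_BETWEEN_DUMPS=5
```

With such a file, `./sbin/collect-debug-artifacts.sh -c conf/debug.conf -d -o /tmp/debug-artifacts` would, per the option descriptions above, attempt three stack dumps per member, five seconds apart, and write the tar file to /tmp/debug-artifacts.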

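Because the -s and -e values must follow the 'YYYY-MM-DD HH:MM[:SS]' format, it can help to check them before starting a collection. The helper is_debug_timestamp below is hypothetical and not part of the script. It is a POSIX sh sketch that checks only the shape of the value (digits and separators), so it would accept an impossible date such as 2023-13-45.

```shell
# Hypothetical helper: succeeds if $1 has the shape YYYY-MM-DD HH:MM
# or YYYY-MM-DD HH:MM:SS. Shape only; calendar validity is not checked.
is_debug_timestamp() {
  case $1 in
    # Without seconds, e.g. "2023-05-01 10:00"
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' '[0-9][0-9]:[0-9][0-9]) return 0 ;;
    # With seconds, e.g. "2023-05-01 10:00:59"
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' '[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
```

When you pass the timestamps to the script, quote each one, because it contains a space, for example `./sbin/collect-debug-artifacts.sh --start="2023-05-01 10:00" --end="2023-05-01 12:30:00"` (the dates are illustrative).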

Along with these options, you can filter unwanted lines out of the collected logs. This is useful for offline analysis, because production logs can be very large and are not easily sharable. To do this, rename the conf/debug.conf.template file to conf/debug.conf, and then specify a list of patterns in the EGREP_INVERT_MATCH_PATTERNS parameter of conf/debug.conf. The script ignores the lines that match any of these patterns and excludes them from collection. For example: EGREP_INVERT_MATCH_PATTERNS='tid|gemfire'
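The parameter name suggests that the patterns are applied as an inverted extended-regex match, like `grep -E -v`. Assuming that, the following sketch shows which lines a value such as 'tid|gemfire' would drop. The filter_log function and the sample log lines are hypothetical; this is not the script's own code.

```shell
# Assumption: exclusion works like 'grep -E -v' with this pattern.
EGREP_INVERT_MATCH_PATTERNS='tid|gemfire'

# Hypothetical filter: prints only the lines that do NOT match
# any alternative of the extended regular expression.
filter_log() {
  grep -E -v -- "$EGREP_INVERT_MATCH_PATTERNS"
}

# Made-up sample log lines.
printf '%s\n' \
  'INFO server started' \
  'DEBUG tid=0x12 lock acquired' \
  'INFO gemfire cache created' \
  'WARN low memory' | filter_log
# prints:
# INFO server started
# WARN low memory
```

Each alternative matches anywhere in a line, so 'tid' would also drop a line containing a word such as "untidy". Keep the patterns specific enough that useful lines are not discarded.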