IMPORTANT!!! This project is no longer maintained and our focus has been shifted to a much better dynamic tracing platform named OpenResty XRay. Existing users of the tools here are recommended to switch too.
Prerequisites
You need systemtap 2.1+ and perl 5.6.1+ on your Linux system. To build the latest systemtap from source, please refer to this document: http://openresty.org/#BuildSystemtap
Also, you should ensure that the (DWARF) debuginfo for your Nginx (and its other dependencies) is enabled, or installed separately
if you did not compile your Nginx from source.
Finally, you need to install the kernel debug symbols and kernel headers as well. Usually you just need the kernel-devel and kernel-debuginfo packages (matching your current kernel package) from your Linux distribution.
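On Yum-based systems, for example, something like the following usually suffices (package names can vary across distributions):
# illustrative only; adjust package names for your distribution
$ sudo yum install kernel-devel-$(uname -r)
$ sudo debuginfo-install kernel-$(uname -r)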
For old Linux systems
If you are on a Linux kernel older than 3.5, then you may have to apply the utrace patch (if not already applied) to your kernel to get
user-space tracing support for your systemtap installation. But if you are using a Linux distribution in the Red Hat family (like RHEL, CentOS, and Fedora), then your old kernel should already have the utrace patch applied.
Mainstream Linux kernels 3.5+ already support the uprobes API for user-space tracing.
Running systemtap-based tools requires special user permissions. To avoid running
these tools with the root account,
you can add your own (non-root) account name to the stapusr and stapdev user groups.
But if the user account running the Nginx process is different from your current
user account, then you will still need to run these tools
with root access via "sudo" or other means.
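For instance, a minimal sketch of granting these permissions (group names may differ slightly across distributions and SystemTap versions):
# illustrative only; log in again afterwards for the new groups to take effect
$ sudo usermod -aG stapusr,stapdev "$USER"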
As with any other dynamic tracing tool based on SystemTap, you must ensure that your system's default C compiler is exactly the same version as
the C compiler originally used to build your current Linux kernel. Because SystemTap builds a Linux kernel module and the Linux kernel has no stable internal ABI,
different C compiler versions may produce incompatible ABIs, which can cause memory corruption in kernel space.
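One quick way to compare the two compiler versions (a sketch assuming gcc is your default compiler; /proc/version records the compiler that built the running kernel):
$ gcc --version | head -n1
$ cat /proc/version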
This tool lists detailed information about all the active requests that
are currently being processed by the specified Nginx worker or master process. When the master process pid is specified, all its worker processes will be monitored.
Here is an example:
# assuming the nginx worker pid is 32027
$ ./ngx-active-reqs -p 32027
Tracing 32027 (/opt/nginx/sbin/nginx)...
req "GET /t?", time 0.300 sec, conn reqs 18, fd 8
req "GET /t?", time 0.276 sec, conn reqs 18, fd 7
req "GET /t?", time 0.300 sec, conn reqs 18, fd 9
req "GET /t?", time 0.300 sec, conn reqs 18, fd 10
req "GET /t?", time 0.300 sec, conn reqs 18, fd 11
req "GET /t?", time 0.300 sec, conn reqs 18, fd 12
req "GET /t?", time 0.300 sec, conn reqs 18, fd 13
req "GET /t?", time 0.300 sec, conn reqs 18, fd 14
req "GET /t?", time 0.276 sec, conn reqs 18, fd 15
req "GET /t?", time 0.276 sec, conn reqs 18, fd 16
found 10 active requests.
212 microseconds elapsed in the probe handler.
The time field is the elapsed time (in seconds) since the current request started.
The conn reqs field gives the number of requests that have been processed so far on the current (keep-alive) downstream connection.
The fd field is the file descriptor ID for the current downstream connection.
The -m option will tell this tool to analyze the request memory pools for each active request:
$ ./ngx-active-reqs -p 12141 -m
Tracing 12141 (/opt/nginx/sbin/nginx)...
req "GET /t?", time 0.100 sec, conn reqs 11, fd 8
pool chunk size: 4096
small blocks (< 4017): 3104 bytes used, 912 bytes unused
large blocks (>= 4017): 0 blocks, 0 bytes (used)
total used: 3104 bytes
req "GET /t?", time 0.100 sec, conn reqs 11, fd 7
pool chunk size: 4096
small blocks (< 4017): 3104 bytes used, 912 bytes unused
large blocks (>= 4017): 0 blocks, 0 bytes (used)
total used: 3104 bytes
req "GET /t?", time 0.100 sec, conn reqs 11, fd 9
pool chunk size: 4096
small blocks (< 4017): 3104 bytes used, 912 bytes unused
large blocks (>= 4017): 0 blocks, 0 bytes (used)
total used: 3104 bytes
total memory used for all 3 active requests: 9312 bytes
274 microseconds elapsed in the probe handler.
For Nginx servers that are not busy enough, it is handy to specify the Nginx master process pid as the -p option value.
Another useful option is -k, which will keep probing when there are no active requests found in the current event cycle.
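For instance, a hypothetical invocation combining these options (the pgrep pattern is only an illustration):
# monitor all workers of the master, analyze request pools, and keep probing
$ ./ngx-active-reqs -p `pgrep -o -f 'nginx: master'` -m -k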
This tool analyzes all the shared memory zones in the specified running nginx process.
# you should ensure the worker is still handling requests
# otherwise the timer_resolution directive must be set in your nginx.conf
# assuming the nginx worker pid is 15218
$ cd /path/to/nginx-systemtap-toolkit/
# list the zones
$ ./ngx-shm -p 15218
Tracing 15218 (/opt/nginx/sbin/nginx)...
shm zone "one"
owner: ngx_http_limit_req
total size: 5120 KB
shm zone "two"
owner: ngx_http_file_cache
total size: 7168 KB
shm zone "three"
owner: ngx_http_limit_conn
total size: 3072 KB
shm zone "dogs"
owner: ngx_http_lua_shdict
total size: 100 KB
Use the -n <zone> option to see more details about each zone.
34 microseconds elapsed in the probe.
# show the zone details
$ ./ngx-shm -p 15218 -n dogs
Tracing 15218 (/opt/nginx/sbin/nginx)...
shm zone "dogs"
owner: ngx_http_lua_shdict
total size: 100 KB
free pages: 88 KB (22 pages, 1 blocks)
22 microseconds elapsed in the probe.
This tool computes the real-time memory usage of the nginx global "cycle pool"
in the specified nginx (worker) process.
The "cycle pool" is mainly for configuration related data block allocation and other long-lived
data blocks with a lifetime as long as the nginx server configuration (like the compiled PCRE data stored in the regex cache for the ngx_lua module).
# you should ensure the worker is handling requests
# or the timer_resolution directive is set in your nginx.conf
# assuming the nginx worker pid is 15004
$ ./ngx-cycle-pool -p 15004
Tracing 15004 (/usr/local/nginx/sbin/nginx)...
pool chunk size: 16384
small blocks (< 4096): 96416 bytes used, 1408 bytes unused
large blocks (>= 4096): 6 blocks, 26352 bytes (used)
total used: 122768 bytes
12 microseconds elapsed in the probe handler.
The memory block size for the "large blocks" is approximated based on
the internal implementation of glibc's malloc on Linux. If you have replaced malloc with another allocator,
then this tool is very likely to quit with memory access errors
or to give meaningless numbers for the "large blocks" total size
(but even in such bad cases, SystemTap should not affect the nginx process being analyzed at all).
This tool tracks the creation and destruction of Nginx memory pools and reports the backtraces of the top 10 leaked pools.
The backtraces are printed in the raw form of hexadecimal addresses.
You can use the ngx-backtrace tool to translate them into source
file names, source line numbers, and function names.
# assuming the nginx worker pid is 5043
$ ./ngx-leaked-pools -p 5043
Tracing 5043 (/opt/nginx/sbin/nginx)...
Hit Ctrl-C to end.
^C
28 pools leaked at backtrace 0x4121aa 0x43c851 0x4300a0 0x42746a 0x42f927 0x4110d8 0x3d35021735 0x40fe29
17 pools leaked at backtrace 0x4121aa 0x44d7bd 0x44e425 0x44fcc1 0x47996d 0x43908a 0x4342c3 0x4343bd 0x43dfcc 0x44c20e 0x4300a0 0x42746a 0x42f927 0x4110d8 0x3d35021735 0x40fe29
16 pools leaked at backtrace 0x4121aa 0x44d7bd 0x44e425 0x44fcc1 0x47996d 0x43908a 0x4342c3 0x4343bd 0x43dfcc 0x43f09e 0x43f6e6 0x43fcd5 0x43c9fb 0x4300a0 0x42746a 0x42f927 0x4110d8 0x3d35021735 0x40fe29
Run the command "./ngx-backtrace -p 5043 <backtrace>" to get details.
For total 200 pools allocated.
$ ./ngx-backtrace -p 5043 0x4121aa 0x44d7bd 0x44e425 0x44fcc1 0x47996d 0x43908a 0x4342c3 0x4343bd
ngx_create_pool
src/core/ngx_palloc.c:44
ngx_http_upstream_connect
src/http/ngx_http_upstream.c:1164
ngx_http_upstream_init_request
src/http/ngx_http_upstream.c:645
ngx_http_upstream_init
src/http/ngx_http_upstream.c:447
ngx_http_redis2_handler
src/ngx_http_redis2_handler.c:108
ngx_http_core_content_phase
src/http/ngx_http_core_module.c:1407
ngx_http_core_run_phases
src/http/ngx_http_core_module.c:890
ngx_http_handler
src/http/ngx_http_core_module.c:872
This script requires an Nginx instance that has the latest dtrace patch applied. See the nginx-dtrace project for more details.
The OpenResty bundle 1.2.3.3+ includes the right dtrace patch by default; you just need to build it with the --with-dtrace-probes configure option.
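A minimal sketch of such an OpenResty build (the installation prefix is just an example):
$ ./configure --prefix=/opt/openresty --with-dtrace-probes
$ make -j4
$ sudo make install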
This script tracks compiled PCRE regex executions (i.e., the pcre_exec calls)
in the specified Nginx worker process,
and checks whether the compiled regexes being executed are JIT'd or not.
# assuming the Nginx worker process handling the traffic is 31360.
$ ./ngx-pcrejit -p 31360
Tracing 31360 (/opt/nginx/sbin/nginx)...
Hit Ctrl-C to end.
^C
ngx_http_lua_ngx_re_match: 1000 of 2000 are PCRE JIT'd.
ngx_http_regex_exec: 0 of 1000 are PCRE JIT'd.
Below is another more complete example. Consider the following nginx.conf snippet:
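# hypothetical snippet for illustration only: any regex location plus an
# ngx.re.match call compiled with the "jo" (JIT + cache) options would
# produce counters like the ones shown below
server {
    listen 8080;

    location ~ ^/t$ {
        content_by_lua '
            local m = ngx.re.match("hello, world", [[world]], "jo")
            ngx.say(m and m[0] or "not matched")
        ';
    }
}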
Running curl localhost:8080/t twice while this tool is tracing the (only) nginx worker yields
the following output:
$ ./ngx-pcrejit -p `pgrep -f 'nginx: worker'`
Tracing 97156 (/home/agentzh/git/lua-nginx-module/work/nginx/sbin/nginx)...
Hit Ctrl-C to end.
^C
ngx_http_regex_exec: 2 of 2 are PCRE JITted.
ngx_http_lua_ngx_re_match_helper: 2 of 2 are PCRE JITted.
This is exactly what we would expect.
When statically linking PCRE with your Nginx, it is important to enable
debug symbols in your PCRE compilation.
That is, you should build your Nginx and PCRE like this:
./configure --with-pcre=/path/to/my/pcre-8.39 \
--with-pcre-jit \
--with-pcre-opt=-g \
--prefix=/opt/nginx
make -j8
make install
For dynamically linked PCRE, you still need
to install the debug symbols for your PCRE library (or the debuginfo RPM package on Yum-based systems).
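On Yum-based systems, something along these lines usually does the trick (the exact package name may vary):
$ sudo debuginfo-install pcre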
This tool has been renamed to sample-bt because this tool is not specific to Nginx
in any way and it makes no sense to keep the ngx- prefix in its name.
This script can be used to sample backtraces in either user space or kernel space
or both for any user process that you specify (yes, not just Nginx!).
It outputs the aggregated backtraces (by count).
For example, to sample a running Nginx worker process (whose pid is 8736) in user space
only for a total of 5 seconds:
$ ./sample-bt -p 8736 -t 5 -u > a.bt
WARNING: Tracing 8736 (/opt/nginx/sbin/nginx) in user-space only...
WARNING: Missing unwind data for module, rerun with 'stap -d stap_df60590ce8827444bfebaf5ea938b5a_11577'
WARNING: Time's up. Quitting now...(it may take a while)
WARNING: Number of errors: 0, skipped probes: 24
The resulting output file a.bt can then be used to generate a Flame Graph by using Brendan Gregg's FlameGraph tools:
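# assuming stackcollapse-stap.pl and flamegraph.pl are in your PATH;
# the intermediate file name a.cbt is arbitrary
$ stackcollapse-stap.pl a.bt > a.cbt
$ flamegraph.pl a.cbt > a.svg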
where both the stackcollapse-stap.pl and flamegraph.pl are from the FlameGraph toolkit.
If everything goes right, you can now use your web browser to open the a.svg file.
A sample flame graph for user-space-only sampling can be seen here (please open the link with a modern web browser that supports SVG rendering):
You can also sample in both the user space and kernel space by specifying the -k and -u options at the same time, as in
$ ./sample-bt -p 8736 -t 5 -uk > a.bt
WARNING: Tracing 8736 (/opt/nginx/sbin/nginx) in both user-space and kernel-space...
WARNING: Missing unwind data for module, rerun with 'stap -d stap_90327f3a19b0e42dffdef38d53a5860_11799'
WARNING: Time's up. Quitting now...(it may take a while)
WARNING: Number of errors: 0, skipped probes: 38
WARNING: There were 73 transport failures.
A sample flame graph for kernel-and-user-space sampling can be seen here:
In fact, this script is general enough to sample user processes other than Nginx.
The overhead imposed on the target process is usually small. For example, the throughput (req/sec) limit of an nginx worker process serving the simplest "hello world" requests drops by only about 11% while this tool is running, as measured by ab -k -c2 -n100000 on Linux kernel 3.6.10 with systemtap 2.5. The impact on full-fledged production processes is usually even smaller; for instance, only a 6% drop in the throughput limit was observed in a production-level Lua CDN application.
WARNING This tool can only work with interpreted Lua code and has various limitations. For
LuaJIT 2.1, it is recommended to use the new ngx-lj-lua-stacks
tool, which can sample both interpreted and compiled Lua code.
Similar to the sample-bt script, but samples the Lua language level backtraces.
Specify the --lua51 option when you're using the standard Lua 5.1 interpreter in your Nginx build, or --luajit20 if LuaJIT 2.0 is used instead.
You need to enable or install the debug symbols for your Lua library, in addition to your Nginx executable.
Also, you should not omit frame pointers while building your Lua library.
If LuaJIT 2.0 is used, you need to build your LuaJIT 2.0 library like this:
make CCDEBUG=-g
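For the standard Lua 5.1 interpreter, a rough sketch would be something like the following (MYCFLAGS is the hook the stock Lua Makefile provides for extra compiler flags):
$ make linux MYCFLAGS="-g -fno-omit-frame-pointer"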
The Lua backtraces generated by this script use the Lua source file name and the source line number at which each Lua function is defined. To get more meaningful backtraces, you can run the fix-lua-bt script on the output of this script.
Here is an example for an Nginx build embedding the standard Lua 5.1 interpreter:
# sample at 1K Hz for 5 seconds, assuming the Nginx worker
# or master process pid is 9766.
$ ./ngx-sample-lua-bt -p 9766 --lua51 -t 5 > tmp.bt
WARNING: Tracing 9766 (/opt/nginx/sbin/nginx) for standard Lua 5.1...
WARNING: Time's up. Quitting now...(it may take a while)
$ ./fix-lua-bt tmp.bt > a.bt
Or if LuaJIT 2.0 is used:
# sample at 1K Hz for 5 seconds, assuming the Nginx worker
# or master process pid is 9768.
$ ./ngx-sample-lua-bt -p 9768 --luajit20 -t 5 > tmp.bt
WARNING: Tracing 9768 (/opt/nginx/sbin/nginx) for LuaJIT 2.0...
WARNING: Time's up. Quitting now...(it may take a while)
$ ./fix-lua-bt tmp.bt > a.bt
The resulting output file a.bt can then be used to generate a Flame Graph by using Brendan Gregg's FlameGraph tools:
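# assuming stackcollapse-stap.pl and flamegraph.pl are in your PATH;
# the intermediate file name a.cbt is arbitrary
$ stackcollapse-stap.pl a.bt > a.cbt
$ flamegraph.pl a.cbt > a.svg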
where both the stackcollapse-stap.pl and flamegraph.pl are from the FlameGraph toolkit.
If everything goes right, you can now use your web browser to open the a.svg file.
A sample flame graph for user-space-only sampling can be seen here (please open the link with a modern web browser that supports SVG rendering):
If the pid of the Nginx master process is specified as the -p option value,
then this tool will automatically probe all its worker processes at the same time.
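For example (the pgrep pattern here is just an illustration):
$ ./ngx-sample-lua-bt -p `pgrep -o -f 'nginx: master'` --luajit20 -t 5 > tmp.bt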
This tool has been renamed to sample-bt-off-cpu because this tool is not specific to Nginx
in any way and it makes no sense to keep the ngx- prefix in its name.
By default, this tool samples user-space backtraces, and one (logical) sample of backtraces in the output corresponds to 1 microsecond of off-CPU time.
Here is an example to demonstrate this tool's usage:
# assuming the nginx worker process to be analyzed is 10901.
$ ./sample-bt-off-cpu -p 10901 -t 5 > a.bt
WARNING: Tracing 10901 (/opt/nginx/sbin/nginx)...
WARNING: _stp_read_address failed to access memory location
WARNING: Time's up. Quitting now...(it may take a while)
WARNING: Number of errors: 0, skipped probes: 23
where the -t 5 option makes the tool sample for 5 seconds.
The resulting a.bt file can be used to render Flame Graphs just as with sample-bt and its friends. This type of flame graph can be called an "off-CPU Flame Graph", while the classic flame graphs are essentially "on-CPU Flame Graphs".
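A possible rendering pipeline (a sketch assuming the FlameGraph scripts are in your PATH; the --countname=us flag merely labels the counts as microseconds):
$ stackcollapse-stap.pl a.bt > a.cbt
$ flamegraph.pl --countname=us a.cbt > a.svg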
Below is such a "off-CPU flamegraph" for a loaded Nginx worker process accessing MySQL with the lua-resty-mysql library:
By default, off-CPU time intervals shorter than 4 us (microseconds) are discarded. You can control this threshold via the --min option, as in
$ ./sample-bt-off-cpu -p 12345 --min 10 -t 10
where we ignore off-CPU time intervals shorter than 10 us and sample the user process with pid 12345 for a total of 10 seconds.
The -l option controls the upper limit on the number of distinct backtraces in the output. By default, the hottest 1024 distinct backtraces are dumped.
The --distr option can be specified to print out a base-2 logarithmic histogram for all the off-CPU time intervals (larger than the threshold specified by the --min option). For example,
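# illustrative run with --distr added; the pid and duration are just examples
$ ./sample-bt-off-cpu -p 10901 --distr -t 5 > a.bt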
In one such run, most of the samples (259 in total) fell into the off-CPU time interval range [4us, 8us), and the largest off-CPU time interval was 1739us, i.e., 1.739ms.
You can specify the -k option to sample kernel-space backtraces instead of user-space ones.