linux - Dynamically determining where a rogue AVX-512 instruction is executing

Question

Welcome To Ask or Share your Answers For Others

linux - Dynamically determining where a rogue AVX-512 instruction is executing

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

linux - Dynamically determining where a rogue AVX-512 instruction is executing

I have a process running on an Intel machine that supports AVX-512, but this process doesn't directly use any AVX-512 instructions (asm or intrinsics) and is compiled with -mno-avx512f so that the compiler doesn't insert any AVX-512 instructions.

Yet, it is running indefinitely at the reduced AVX turbo frequency. No doubt there is an AVX-512 instruction sneaking in somewhere, via a library, (very unlikely) system call or something like that.

Rather than try to "binary search" down where the AVX-512 instruction is coming from, is there some way I can find it immediately, e.g., trapping on such an instruction?

OS is Ubuntu 16.04.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:27:32+0000

As suggested in comments, you may search all ELF files of your system and disassemble them in order to check if they use AVX-512 instructions:

$ objdump -d /lib64/ld-linux-x86-64.so.2 | grep %zmm0
14922:       62 f1 fd 48 7f 44 24    vmovdqa64 %zmm0,0xc0(%rsp)
14a2d:       62 f1 fd 48 6f 44 24    vmovdqa64 0xc0(%rsp),%zmm0
14c2c:       62 f1 fd 48 7f 81 50    vmovdqa64 %zmm0,0x50(%rcx)
14ca0:       62 f1 fd 48 6f 84 24    vmovdqa64 0x50(%rsp),%zmm0

(BTW, libc and ld.so do include AVX-512 instructions, they are not the ones you are looking for?)

However, you may find binary that you do not even execute and miss code dynamically uncompressed, etc...

If you have a doubt on process (because perf report CORE_POWER.LVL*_TURBO_LICENSE events), I suggest to generate a core-dump if this process and disassemble it (notice first line allows to also dump code):

$ echo 0xFF > /proc/<PID>/coredump_filter 
$ gdb --pid=<PID>
[...]
(gdb) gcore
Saved corefile core.19602
(gdb) quit
Detaching from program: ..., process ...
$ objdump -d core.19602 | grep %zmm0
7f73db8187cb:       62 f1 7c 48 10 06       vmovups (%rsi),%zmm0
7f73db818802:       62 f1 7c 48 11 07       vmovups %zmm0,(%rdi)
7f73db81883f:       62 f1 7c 48 10 06       vmovups (%rsi),%zmm0
[...]

Next, you can easily write a small python script to add a breakpoint (or a tracepoint) on every AVX-512 instructions. Something like

(gdb) python
>import os
>with os.popen('objdump -d core.19602 | grep %zmm0 | cut -f1 -d:') as pipe:
>    for line in pipe:
>         gdb.Breakpoint("*" + line)

Sure it will create multiple hundreds (or thousands) of breakpoints. However, overhead of a breakpoint is small enough for gdb to support that (I think <1kB for each breakpoint).

One another way would be to run your code in a a virtual machine. Especially, I suggest libvex. libvex is used to dynamically instrument code (memory leak, memory profiling, etc..). libvex interpret machine code, translate it to an intermediate representation and re-encode machine code for CPU execution. The most famous project using libvex is valgrind (to be fair, libvex is back-end of valgrind).

Therefore, you can run your application with libvex without any instrumentation with:

$ valgrind --tool=none YOUR_APP

Now you have to write a tool around libvex in order to detect AVX-512 usage. However, libVEX does NOT (yet) support AVX-512. So, as soon as it have to execute AVX-512 instruction, it will fail with an Illegal instruction.

$ valgrind --tool=none YOUR_APP
[...]   
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x48 0x28 0x84 0x24 0x8 0x1 0x0
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==20061== valgrind: Unrecognised instruction at address 0x10913e.
==20061==    at 0x10913E: main (in ...)
==20061== Your program just tried to execute an instruction that Valgrind
==20061== did not recognise.  There are two possible reasons for this.
==20061== 1. Your program has a bug and erroneously jumped to a non-code
==20061==    location.  If you are running Memcheck and you just saw a
==20061==    warning about a bad jump, it's probably your program's fault.
==20061== 2. The instruction is legitimate but Valgrind doesn't handle it,
==20061==    i.e. it's Valgrind's fault.  If you think this is the case or
==20061==    you are not sure, please let us know and we'll try to fix it.
==20061== Either way, Valgrind will now raise a SIGILL signal which will
==20061== probably kill your program.
==20061== 
==20061== Process terminating with default action of signal 4 (SIGILL)
==20061==  Illegal opcode at address 0x10913E
==20061==    at 0x10913E: main (in ...)
==20061==

Note: this answer has been tested with:

#include <immintrin.h>
int main(int argc, char *argv[]) {
    __m512d a, b, c;
    _mm512_fnmadd_pd(a, b, c);
}

Categories

linux - Dynamically determining where a rogue AVX-512 instruction is executing

linux - Dynamically determining where a rogue AVX-512 instruction is executing

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags