Within kernel probes (kprobes), the eBPF virtual machine has read-only access to the syscall parameters and return value.
However the eBPF program will have a return code of it's own. It is possible to apply a seccomp profile that traps BPF (NOT eBPF; thanks @qeole) return codes and interrupt the system call during execution.
The allowed runtime modifications are:
SECCOMP_RET_KILL
: Immediate kill with SIGSYS
SECCOMP_RET_TRAP
: Send a catchable SIGSYS
, giving a chance to emulate the syscall
SECCOMP_RET_ERRNO
: Force errno
value
SECCOMP_RET_TRACE
: Yield decision to ptracer or set errno
to -ENOSYS
SECCOMP_RET_ALLOW
: Allow
https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt
The SECCOMP_RET_TRACE
method enables modifying the system call performed, arguments, or return value. This is architecture dependent and modification of mandatory external references may cause an ENOSYS error.
It does so by passing execution up to a waiting userspace ptrace, which has the ability to modify the traced process memory, registers, and file descriptors.
The tracer needs to call ptrace and then waitpid. An example:
ptrace(PTRACE_SETOPTIONS, tracee_pid, 0, PTRACE_O_TRACESECCOMP);
waitpid(tracee_pid, &status, 0);
http://man7.org/linux/man-pages/man2/ptrace.2.html
When waitpid
returns, depending on the contents of status
, one can retrieve the seccomp return value using the PTRACE_GETEVENTMSG
ptrace operation. This will retrieve the seccomp SECCOMP_RET_DATA
value, which is a 16-bit field set by the BPF program. Example:
ptrace(PTRACE_GETEVENTMSG, tracee_pid, 0, &data);
Syscall arguments can be modified in memory before continuing operation. You can perform a single syscall entry or exit with the PTRACE_SYSCALL
step. Syscall return values can be modified in userspace before resuming execution; the underlying program won't be able to see that the syscall return values have been modified.
An example implementation:
Filter and Modify System Calls with seccomp and ptrace