I like examples, so I wrote a bit of self-modifying code in c...
#include <stdio.h>
#include <sys/mman.h> // linux
int main(void) {
unsigned char *c = mmap(NULL, 7, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|
MAP_ANONYMOUS, -1, 0); // get executable memory
c[0] = 0b11000111; // mov (x86_64), immediate mode, full-sized (32 bits)
c[1] = 0b11000000; // to register rax (000) which holds the return value
// according to linux x86_64 calling convention
c[6] = 0b11000011; // return
for (c[2] = 0; c[2] < 30; c[2]++) { // incr immediate data after every run
// rest of immediate data (c[3:6]) are already set to 0 by MAP_ANONYMOUS
printf("%d ", ((int (*)(void)) c)()); // cast c to func ptr, call ptr
}
putchar('
');
return 0;
}
...which works, apparently:
>>> gcc -Wall -Wextra -std=c11 -D_GNU_SOURCE -o test test.c; ./test
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
But honestly, I didn't expect it to work at all. I expected the instruction containing c[2] = 0
to be cached upon the first call to c
, after which all consecutive calls to c
would ignore the repeated changes made to c
(unless I somehow explicitedly invalidated the cache). Luckily, my cpu appears to be smarter than that.
I guess the cpu compares RAM (assuming c
even resides in RAM) with the instruction cache whenever the instruction pointer makes a large-ish jump (as with the call to the mmapped memory above), and invalidates the cache when it doesn't match (all of it?), but I'm hoping to get more precise information on that. In particular, I'd like to know if this behavior can be considered predictable (barring any differences of hardware and os), and relied on?
(I probably should refer to the Intel manual, but that thing is thousands of pages long and I tend to get lost in it...)
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…