(as you can see in line 4 of the assembly that the value stored in local variable x is not the address of the plt entry)
Huh? The value isn't visible in the disassembly, only the location it's loaded from. (In practice it's not loading a pointer to the PLT entry, but line 4 of the assembly doesn't tell you that [1].) Use objdump -dR
to see dynamic relocations.
That's a load from memory using a RIP-relative addressing mode. In this case it's loading a pointer to the real printf
address in libc. That pointer is stored in the Global Offset Table (GOT).
To make this work, the printf
symbol gets "early binding" instead of lazy dynamic linking, avoiding PLT overhead for later uses of that function pointer.
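For concreteness, here's a minimal C sketch of the kind of source that produces such a load. I'm assuming the code in the question looks roughly like this; the names printf_fn, get_printf_pointer and print_through_pointer are made up for illustration. Compiled as PIE, GCC would typically get the pointer with a mov load from printf@GOTPCREL(%rip), not with a RIP-relative LEA of a PLT entry.

```c
#include <stdio.h>

/* Hypothetical typedef: a pointer to a printf-like variadic function. */
typedef int (*printf_fn)(const char *, ...);

/* Taking the address of printf. In a PIE build this is where the
 * compiler would normally emit the load from the GOT. */
printf_fn get_printf_pointer(void) {
    return printf;
}

/* Call through the saved pointer instead of calling printf by name.
 * Returns whatever printf returns (the number of characters written). */
int print_through_pointer(const char *msg) {
    printf_fn x = get_printf_pointer();
    return x("%s\n", msg);
}
```

Compiling this with something like gcc -O3 -fpie -S and reading the asm for get_printf_pointer is an easy way to check what your own toolchain does.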
Footnote 1: Although maybe you were basing that reasoning on the fact that it's a load instead of a RIP-relative LEA. That pretty much does tell you it's not the PLT entry; part of the point of the PLT is to have an address that's a link-time constant for call rel32
, which also enables LEA with a RIP+rel32 addressing mode. The compiler would have used that if it wanted the PLT address in a register.
BTW, the PLT stub itself also uses the GOT entry for its memory-indirect jump; for symbols that are only used as function call targets, the GOT entry holds a pointer back to the PLT stub, to the push
/ jmp
instructions that invoke the lazy dynamic linker to resolve that PLT entry. i.e. to update the GOT entry.
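If it helps, that lazy-binding dance can be modelled in plain C, with a function pointer standing in for the GOT entry. This is only a toy model, not how the dynamic linker is implemented: got_slot, plt_stub, lazy_resolver and real_target are hypothetical names, and the real resolver finds the target by symbol lookup instead of having it hard-coded.

```c
typedef int (*func_t)(int);

/* Stands in for the real function living in a shared library. */
static int real_target(int x) { return x + 1; }

static int resolver_calls = 0;
static int lazy_resolver(int x);

/* Model of the GOT entry: it initially points back at the resolver
 * path, like the push / jmp part of a PLT stub. */
static func_t got_slot = lazy_resolver;

/* Model of the lazy dynamic linker: patch the GOT entry, then forward
 * this first call to the real target. */
static int lazy_resolver(int x) {
    resolver_calls++;
    got_slot = real_target;   /* update the GOT entry */
    return got_slot(x);
}

/* Model of the PLT stub: an indirect jump through the GOT entry. */
int plt_stub(int x) { return got_slot(x); }

/* How many times the slow resolver path has run. */
int resolver_call_count(void) { return resolver_calls; }
```

Only the first call through plt_stub pays for the resolver; later calls go straight through the updated slot, but each one still pays for the indirection.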
Don't all the calls to functions undefined in the executable go first through the plt for better performance
No, the PLT costs runtime performance by adding an extra level of indirection to every call. gcc -fno-plt
uses early binding instead of waiting for the first call, so it can inline the indirect call
through the GOT right into each call site.
The PLT exists to avoid runtime fixups of call rel32
offsets during dynamic linking. And on 64-bit systems, to allow reaching addresses that are more than 2GB away. And also to support symbol interposition. See https://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/ (written before -fno-plt
existed; it's basically like one of the ideas he was suggesting).
The PLT's lazy binding can improve startup performance vs. early binding, but on modern systems where cache hits are very important, doing all the symbol-scanning stuff at once during startup is nice.
and for pic code?
Your code is PIC, or actually PIE (position-independent executable), which most distros configure GCC to do by default.
I expected x
to point to the address of the PLT entry of printf
If you use -fno-pie
, then the address of the PLT entry is a link-time constant, and at compile time the compiler doesn't know whether you're going to link libc statically or dynamically. So it uses mov $printf, %eax
to get the function's address into a register, and at link time that can only be converted to mov $printf@plt, %eax
.
See it on Godbolt. (The Godbolt default is -fno-pie
, unlike on most current Linux distros.)
# gcc9.2 -O3 -fpie for your first block
movq printf@GOTPCREL(%rip), %rbp
leaq .LC0(%rip), %rdi
xorl %eax, %eax
movq %rbp, %rsi # saved for later in rbp
call printf@PLT
vs.
# gcc9.2 -O3 -fno-pie
movl $printf, %esi # linker converts this symbol reference to printf@plt
movl $.LC0, %edi
xorl %eax, %eax
call printf # will convert at link-time to printf@plt
# next use also just uses mov-immediate to rematerialize, instead of saving a load result in a register.
So a PIE executable actually has better efficiency for repeated-use of function pointers to functions in standard libraries: the pointer is the final address, not just the PLT entry.
-fno-plt -fno-pie
works more like PIE mode for taking function pointers. Except it can still use $foo
32-bit immediates for the addresses of symbols in the same file, instead of a RIP-relative LEA.
# gcc9.2 -O3 -fno-plt -fno-pie
movq printf@GOTPCREL(%rip), %rbp # saved for later in RBP
movl $.LC0, %edi
xorl %eax, %eax
movq %rbp, %rsi
call *printf@GOTPCREL(%rip)
# pointers to static functions can use mov $foo, %esi
It seems you need int foo(const char*,...) __attribute__((visibility("hidden")));
to tell the compiler it definitely doesn't need to go through the GOT for this symbol, with -fpie
or -fno-plt
.
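A small sketch of that, assuming GCC or clang on an ELF target. foo is a hypothetical function; I've given it a trivial body only so the example links, whereas in a real program the definition would normally live in another source file of the same executable or library. fmt_fn and get_foo are also made-up names.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hidden visibility: a promise that foo is defined in this same
 * executable or shared object, so it can't come from another library. */
__attribute__((visibility("hidden")))
int foo(const char *fmt, ...);

typedef int (*fmt_fn)(const char *, ...);

/* With the hidden declaration in scope, the compiler is free to
 * materialize foo's address directly instead of loading it from the GOT. */
fmt_fn get_foo(void) {
    return foo;
}

/* Placeholder definition (an assumption, just so this links):
 * returns the length the formatted output would have. */
int foo(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return n;
}
```

To see the effect, compare the asm for get_foo with and without the attribute when foo is only declared in that file, not defined.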
Leaving it until link-time for the linker to convert symbol
to symbol@plt
if necessary allows the compiler to always use efficient 32-bit absolute immediates or RIP-relative addressing and only end up with PLT indirection for functions that turn out to be in a shared library. But then you end up with pointers to PLT entries, instead of pointers to the final address.
If you were using Intel syntax, it would be mov rbp, QWORD PTR printf@GOTPCREL[rip]
in GCC's output for this, if you look at asm instead of disassembly.
Looking at compiler output gives you significantly more information than just numeric offsets from RIP in plain objdump
output. -r
to show relocation symbols helps some, but compiler output is generally better. (Except you don't see that printf
gets rewritten to printf@plt
at link time.)