The compiler is trying to maintain 16-byte alignment on the stack. This also applies to 32-bit code these days (not just 64-bit). The idea is that at the point just before a CALL instruction executes, the stack must be aligned to a 16-byte boundary.
Because you compiled with no optimizations, there are some extraneous instructions.
0x0804835a <main+3>: sub esp,0x18 ; Allocate local stack space
0x0804835d <main+6>: and esp,0xfffffff0 ; Ensure `main` has a 16 byte aligned stack
0x08048360 <main+9>: mov eax,0x0 ; Extraneous, not needed
0x08048365 <main+14>: sub esp,eax ; Extraneous, not needed
ESP is 16-byte aligned after the and instruction above; the two extraneous instructions subtract zero and leave it unchanged. The parameters for the call are then stored starting at the top of the stack at ESP:
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
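The and esp,0xfffffff0 instruction at main+6 rounds ESP down to a multiple of 16 by clearing its low four bits. A minimal C sketch of that masking follows; the helper name align_down is hypothetical and is not part of your program.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of alignment.
   alignment must be a power of two. For 16 the mask is 0xfffffff0,
   the same constant used by `and esp,0xfffffff0`. */
static uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return addr & ~(alignment - 1u);
}
```

With a made-up stack pointer value such as 0xbffff7e8, align_down(0xbffff7e8, 16) gives 0xbffff7e0, and an already aligned value comes back unchanged. Because the stack grows downward, rounding down can only reserve extra space, never reclaim space in use.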
The CALL then pushes a 4-byte return address on the stack. Execution then reaches these instructions at the start of test_function:
0x08048344 <test_function+0>: push ebp ; 4 bytes pushed on stack
0x08048345 <test_function+1>: mov ebp,esp ; Setup stackframe
This pushes another 4 bytes on the stack. Together with the 4 bytes from the return address, the stack is now misaligned by 8 bytes. To reach 16-byte alignment again, an additional 8 bytes must be wasted on the stack. That is why this statement allocates 8 bytes more than the locals need:
0x08048347 <test_function+3>: sub esp,0x28
- 0x08 bytes already on the stack because of the return address (4 bytes) and EBP (4 bytes)
- 0x08 bytes of padding needed to bring the stack back to 16-byte alignment
- 0x20 bytes needed for local variable allocation = 32 bytes.
32 is evenly divisible by 16, so alignment is maintained
The second and third numbers above added together give the value 0x28 computed by the compiler and used in sub esp,0x28.
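The arithmetic above can be written out as a small C sketch. This only models the reasoning in this answer and is not GCC's actual frame-layout algorithm; the helper names padding_for and frame_sub are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: bytes of padding needed so that
   (already_used + padding) is a multiple of alignment. */
static unsigned padding_for(unsigned already_used, unsigned alignment)
{
    assert(alignment != 0);
    return (alignment - already_used % alignment) % alignment;
}

/* Amount subtracted from ESP in the prologue: padding plus locals.
   Assumes locals_size is itself a multiple of alignment. */
static unsigned frame_sub(unsigned already_used, unsigned locals_size,
                          unsigned alignment)
{
    return padding_for(already_used, alignment) + locals_size;
}
```

Here already_used is 8 (return address plus saved EBP), so padding_for(8, 16) is 8 and frame_sub(8, 0x20, 16) is 0x28, matching the sub esp,0x28 in the disassembly.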
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
So why [ebp-12] in this instruction? The first 8 bytes, [ebp-8] through [ebp-1], are the alignment bytes used to get the stack 16-byte aligned. The local data then appears on the stack after that. In this case [ebp-12] through [ebp-9] are the 4 bytes for the 32-bit integer flag.
Then we have this for updating buffer[0] with the character 'A':
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
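The offsets seen so far can be tabulated to show how the 0x28 bytes below EBP are divided up. The sketch below just records the ranges read off the disassembly; the struct and function names are hypothetical, and the region labels reflect the interpretation given in this answer.

```c
#include <assert.h>

/* A byte range [ebp+lo, ebp+hi] inside test_function's frame. */
struct region {
    const char *what;
    int lo;   /* lowest offset from EBP, inclusive  */
    int hi;   /* highest offset from EBP, inclusive */
};

/* Offsets taken from the disassembly in this answer. */
static const struct region frame[] = {
    { "alignment padding",     -8,  -1 },
    { "flag (32-bit int)",    -12,  -9 },
    { "space holding buffer", -40, -13 },
};

/* Number of bytes covered by an inclusive range. */
static int region_size(struct region r)
{
    assert(r.lo <= r.hi);
    return r.hi - r.lo + 1;
}
```

The three sizes come out as 8, 4 and 28 bytes, which add up to the 40 (0x28) bytes subtracted from ESP.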
The oddity then is why a 10-byte array of characters would occupy [ebp-40] (beginning of array) through [ebp-13], which is 28 bytes. The best guess I can make is that the compiler felt it could treat the 10-byte character array as a 128-bit (16-byte) vector. This would force the compiler to align the buffer on a 16-byte boundary and pad the array out to 16 bytes (128 bits). From the perspective of the compiler, your code seems to be acting much like it was defined as:
#include <xmmintrin.h>
void test_function(int a, int b, int c, int d){
    int flag;
    union {
        char buffer[10];
        __m128 m128buffer; /* 16-byte variable that needs to be 16-byte aligned */
    } bufu;
    flag = 31337;
    bufu.buffer[0] = 'A';
}
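The size and alignment effect of such a union can be checked at compile time. The sketch below is a stand-in rather than the exact union above: to avoid depending on SSE headers, a 16-byte member declared with _Alignas(16) plays the role of __m128. The type name bufu_model is hypothetical, and the sketch assumes a C11 compiler.

```c
#include <assert.h>

/* Stand-in for the union above: the 16-byte aligned member plays
   the role of __m128 so the sketch does not need <xmmintrin.h>. */
union bufu_model {
    char buffer[10];
    _Alignas(16) unsigned char vec[16];
};

/* The 10-byte array is padded out to 16 bytes and the whole
   union inherits the 16-byte alignment of its strictest member. */
static_assert(sizeof(union bufu_model) == 16, "padded out to 16 bytes");
static_assert(_Alignof(union bufu_model) == 16, "16-byte aligned");
```

If either assumption were wrong, the file would fail to compile. This shows how a 10-byte array could end up aligned and padded as described, but it does not prove that this is what your compiler did internally.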
The output on Godbolt for GCC 4.9.0 generating 32-bit code with SSE2 enabled appears as follows:
test_function:
push ebp #
mov ebp, esp #,
sub esp, 40 #,same as: sub esp,0x28
mov DWORD PTR [ebp-12], 31337 # flag,
mov BYTE PTR [ebp-40], 65 # bufu.buffer,
leave
ret
This looks very similar to your disassembly in GDB.
If you compiled with optimizations (such as -O1, -O2, -O3), the optimizer could have simplified test_function because it is a leaf function in your example. A leaf function is one that doesn't call another function. Certain shortcuts could have been applied by the compiler.
As for why the character array seems to be aligned to a 16-byte boundary and padded to be 16 bytes, that probably can't be answered with certainty until we know what GCC compiler you are using (gcc --version will tell you). It would also be useful to know your OS and OS version. Even better would be to add the output from this command to your question: gcc -Q -v -g my_program.c