Resurrected from a1drin.net
consider the following piece of code:
int sum(int x, int y){
int result = x+y;
return result;
}
int main(){
return sum(1,2);
}
as c programmers, we expect the main
routine to call sum
. however, the
compiler has a lot of details to sort out before generating the machine
instructions to realize this: how should the arguments be passed, how will
the result be returned, etc. to address these implementation details, the
compiler adheres to certain conventions. this note attempts to uncover the
conventions used by the compiler i have at hand (visual studio c/c++
compiler). for a similar discussion with gcc on x86-linux - see this page.
we start by building the simple program shown above and loading the
executable in a debugger for stepwise execution (the /INCREMENTAL:NO
linker option disables the use of incremental link tables which complicate
function calls and are irrelevant to our current discussion, anyway.)
c:\>cl /nologo /c /Zi test.c
c:\>link /nologo /INCREMENTAL:NO /DEBUG test.obj
c:\>cdb test.exe
0:000> uf main
00401020 55 push ebp
00401021 8bec mov ebp,esp
00401023 6a02 push 2
00401025 6a01 push 1
00401027 e8d4ffffff call test!sum (00401000)
0040102c 83c408 add esp,8
0040102f 5d pop ebp
00401030 c3 ret
0:000> uf sum
test!sum:
00401000 55 push ebp
00401001 8bec mov ebp,esp
00401003 51 push ecx
00401004 8b4508 mov eax,dword ptr [ebp+8]
00401007 03450c add eax,dword ptr [ebp+0Ch]
0040100a 8945fc mov dword ptr [ebp-4],eax
0040100d 8b45fc mov eax,dword ptr [ebp-4]
00401010 8be5 mov esp,ebp
00401012 5d pop ebp
00401013 c3 ret
the unassembled output shows the machine instructions generated for the two
routines we’re looking into. the actual instructions depends on the
compiler and the target machine (which, in my case are cl 14.00.50727.42
and 80x86.) note that both routines start (with - push ebp
& mov
ebp,esp
) and end similarly (with - pop ebp
& ret
). these similar
prologue and epilogue are independent of the function’s business logic
and manage its stack frame. on x86, the stack frame is made up of the
memory between the addresses in the ebp
(stack bottom pointer) and the
esp
(stack top pointer). as we’ll see later, all memory locations
relevant to the function’s execution are referenced in relation to this
frame.
so then, without further ado, let’s start the program and break at the
point where main gets started with its calling business, i.e. at address
00401023
.
the caller prepares for the call by pushing the arguments it wishes to pass
onto the stack. the two instructions following our break point in main
do just that:
0:000> bp 00401023
0:000> g
Breakpoint 0 hit
test!main+0x3:
00401023 6a02 push 2
0:000> r esp,ebp
esp=0012ff70 ebp=0012ff70
0:000> p
test!main+0x5:
00401025 6a01 push 1
0:000> p
test!main+0x7:
00401027 e8d4ffffff call test!sum (00401000)
0:000> r esp,ebp
esp=0012ff68 ebp=0012ff70
0:000> dds esp ebp
0012ff68 00000001
0012ff6c 00000002
0012ff70 0012ffc0
notice that the arguments are pushed in the reverse order of
declaration. on x86 the push
instruction decrements the esp
by an
offset (equal to the bytes needed to hold the value we’re pushing) and
stores the value starting at the new address in the esp
. since we’ve
pushed two integers the esp
is reduced by 8 and the stack frame (i.e.
the memory between the addresses in ebp and esp
) now contains the two
arguments we’ve just pushed (in addition to whatever it had earlier.)
the next instruction is the actual call to the sum routine. on x86 the
call
instruction saves the instruction pointer on the stack and jumps to
the label passed as the argument, which, in our case is test!sum
.
0:000> u eip l2
test!main+0x7:
00401027 e8d4ffffff call test!sum (00401000)
0040102c 83c408 add esp,8
0:000> t
test!sum:
00401000 55 push ebp
0:000> r esp,ebp
esp=0012ff64 ebp=0012ff70
0:000> dds esp ebp
0012ff64 0040102c test!main+0xc
0012ff68 00000001
0012ff6c 00000002
0012ff70 0012ffc0
the debugger output confirms what we knew about call
- the esp is reduced
by 4 and the top of the stack contains the address of the instruction just
after the call
in main. note that the eip
now contains the instruction
of the first instruction in sum
- which means we’ve entered our callee.
the caller’s stack-frame at this point of time looks as:
+----------|----------+
esp| 0012ff64 | 0040102c | return location in main
+----------|----------+
| 0012ff68 | 00000001 | argument ####1
+----------|----------+
| 0012ff6c | 00000002 | argument ####2
+----------|----------+
ebp| 0012ff70 | 0012ffc0 | main's caller's ebp
+----------|----------+
address content
one of the calling conventions the compiler follows ensures that value of
the ebp
register is preserved during a call. in other words, after the
callee finishes, the ebp register is restored to the value it held before
the callee execution started. this is achieved by the prologue and the
epilogue we identified earlier. the convention actually extends to the
values in esi
, edi
and ebx
registers too. however, since our sum
routine does not use any of those registers there is no need to preserve
their values.
let’s start tracing the callee execution:
0:000> u eip l1
test!sum:
00401000 55 push ebp
0:000> p
test!sum+0x1:
00401001 8bec mov ebp,esp
0:000> p
test!sum+0x3:
00401003 51 push ecx
0:000> dds esp ebp
0012ff60 0012ff70
0:000> p
test!sum+0x4:
00401004 8b4508 mov eax,dword ptr [ebp+8] ss:0023:0012ff68=00000001
0:000> dds esp ebp
0012ff5c 00000001
0012ff60 0012ff70
the first instruction pushes the existing ebp
value to the stack and the
next moves the esp
value into the ebp
register. effectively the two
setup a fresh stack frame for the sum routine, very much like the main’s
stack frame when we started tracing the program. the next instruction is a
bit tricky - we didn’t keep anything in ecx
, so why push ecx
? why is
the compiler pushing an irrelevant value on the stack?
actually what the compiler wants to achieve is to allocate space for the
local variable result
. typically, it allocates space by subtracting the
combined size of all the locals from the esp, but here it sees scope for an
optimization. all that this routine needs is 4 bytes of local storage for
the integer variable and esp can be appropriately decremented by pushing a
dummy register. sub 4 esp
would need more space and time to achieve the
same effect.
so now the stack frame looks as:
+----------|----------+
[ebp-4] esp| 0012ff5c | ???????? | local ####1 = result (uninitialized)
+----------|----------+
ebp| 0012ff60 | 0012ff70 | main's ebp
+----------|----------+
| 0012ff64 | 0040102c | return address in main
+----------|----------+
[epb+8] | 0012ff68 | 00000001 | argument ####1 = x
+----------|----------+
[ebp+12] | 0012ff6c | 00000002 | argument ####2 = y
+----------|----------+
[old ebp] | 0012ff70 | 0012ffc0 | main's caller's ebp
+----------|----------+
address contents
the next few instructions implement the business logic of the
routine. they’re expected to add the two parameters and store the result in
the local variable. to refer to these variables and parameters, the
compiler uses another calling convention. the convention ensures that local
variables are stored at negative offsets from the ebp while the parameters
are at positive ones. for example: the address of the first and the second
parameters is ebp+8
and ebp+12
respectively. similarly, the local
variable result
is situated at ebp-4
. the next three instructions use
these addresses to compute the sum of the two parameters and move it to the
local variable result.
test!sum+0x4:
00401004 8b4508 mov eax,dword ptr [ebp+8] ss:0023:0012ff68=00000001
0:000>p
test!sum+0x7:
00401007 03450c add eax,dword ptr [ebp+0Ch] ss:0023:0012ff6c=00000002
0:000>p
test!sum+0xa:
0040100a 8945fc mov dword ptr [ebp-4],eax ss:0023:0012ff5c=00000001
the next instruction is responsible for ensuring the result value is returned back to main.
0:000> p
test!sum+0xd:
0040100d 8b45fc mov eax,dword ptr [ebp-4] ss:0023:0012ff5c=00000003
the compiler convention assumes that the caller will look for the result in the eax register. consequently, return values that are upto 32 bits wide are returned directly in the eax register (as is the case, in our code). if the computed result is larger than 32 bits, it is returned by reference with eax containing a pointer to the result.
with the computation done, its time for the callee to return, but before that happens, we need to deallocate the locals and restore the the ebp to its original value. the epilog achieves this quite succintly in just two instructions. the first sets esp to ebp thus reclaiming the space used by the locals and the second resets ebp to the value on the top of the stack (which is the caller’s ebp)
test!sum+0x10:
00401010 8be5 mov esp,ebp
0:000> p
test!sum+0x12:
00401012 5d pop ebp
0:000> dds esp ebp
0012ff60 0012ff70
0:000> p
test!sum+0x13:
00401013 c3 ret
0:000> dds esp ebp
0012ff64 0040102c test!main+0xc
0012ff68 00000001
0012ff6c 00000002
0012ff70 0012ffc0
notice that at the end of the two instructions the stack has been restored
to as it was just before we started the execution of sum. the final
instruction in sum ret
pops the return address pushed by call
and jumps
to the address, thus returning the control back to the caller.
following ret
the control returns back to the main where there’s not much
left to do. it only has to return the result obtained from sum
and since
the result is already in eax there’s nothing to be done. the stack,
however, still contains the call arguments on it, which need to be
deallocated.
test!main+0xc:
0040102c 83c408 add esp,8
0:000> dds esp ebp
0012ff68 00000001
0012ff6c 00000002
0012ff70 0012ffc0
0:000> p
test!main+0xf:
0040102f 5d pop ebp
0:000> dds esp ebp
0012ff70 0012ffc0
the argument memory is reclaimed by just adjusting the esp. once that’s done returns to the state it was when we started tracing main. the remaining epilog is the same as in sum. here, the control returns to the runtime routine which had invoked main.
for stack cleanup the microsoft compiler provides the programmer with a bunch of options. for example, the programmer can choose to make the callee responsibile for cleaning up the arguments (instead of the caller, as is the case with our code). the idea is to reduce code size by localizing the clean-up code at one place, instead of scattering and repeating it in all places where the call is made. for a frequently used routine in large program this might save some code space.
there are some interesting writeups on this matter - see this and this. however i see little use of these proprietary options in code that i usually write. they’re good things to be aware of though, especially in case we run into some stack corruption problems due to mismatches in these conventions.
here are the steps for review:
ebp
.esp
.ebp
.ebp
.ebp
value.the last step is what distinguishes the __cdecl
(default) calling
convention from the __stdcall
convention. in the latter the clean-up code
is in the callee.