what's in a call?

Security C++

Resurrected from a1drin.net

consider the following piece of code:

int sum(int x, int y){
  int result = x+y;
  return result;
}
int main(){
  return sum(1,2);
}

as c programmers, we expect the main routine to call sum. however, the compiler has a lot of details to sort out before generating the machine instructions to realize this: how should the arguments be passed, how will the result be returned, etc. to address these implementation details, the compiler adheres to certain conventions. this note attempts to uncover the conventions used by the compiler i have at hand (visual studio c/c++ compiler). for a similar discussion with gcc on x86-linux - see this page.

getting started

we start by building the simple program shown above and loading the executable in a debugger for stepwise execution (the /INCREMENTAL:NO linker option disables the use of incremental link tables which complicate function calls and are irrelevant to our current discussion, anyway.)

c:\>cl /nologo /c /Zi test.c
c:\>link /nologo /INCREMENTAL:NO /DEBUG test.obj
c:\>cdb test.exe
0:000> uf main
00401020 55              push    ebp
00401021 8bec            mov     ebp,esp
00401023 6a02            push    2
00401025 6a01            push    1
00401027 e8d4ffffff      call    test!sum (00401000)
0040102c 83c408          add     esp,8
0040102f 5d              pop     ebp
00401030 c3              ret
0:000> uf sum
test!sum:
00401000 55              push    ebp
00401001 8bec            mov     ebp,esp
00401003 51              push    ecx
00401004 8b4508          mov     eax,dword ptr [ebp+8]
00401007 03450c          add     eax,dword ptr [ebp+0Ch]
0040100a 8945fc          mov     dword ptr [ebp-4],eax
0040100d 8b45fc          mov     eax,dword ptr [ebp-4]
00401010 8be5            mov     esp,ebp
00401012 5d              pop     ebp
00401013 c3              ret

the unassembled output shows the machine instructions generated for the two routines we’re looking into. the actual instructions depends on the compiler and the target machine (which, in my case are cl 14.00.50727.42 and 80x86.) note that both routines start (with - push ebp & mov ebp,esp) and end similarly (with - pop ebp & ret). these similar prologue and epilogue are independent of the function’s business logic and manage its stack frame. on x86, the stack frame is made up of the memory between the addresses in the ebp (stack bottom pointer) and the esp (stack top pointer). as we’ll see later, all memory locations relevant to the function’s execution are referenced in relation to this frame.

so then, without further ado, let’s start the program and break at the point where main gets started with its calling business, i.e. at address 00401023.

the caller

the caller prepares for the call by pushing the arguments it wishes to pass onto the stack. the two instructions following our break point in main do just that:

0:000> bp 00401023
0:000> g
Breakpoint 0 hit
test!main+0x3:
00401023 6a02            push    2
0:000> r esp,ebp
esp=0012ff70 ebp=0012ff70
0:000> p
test!main+0x5:
00401025 6a01            push    1
0:000> p
test!main+0x7:
00401027 e8d4ffffff      call    test!sum (00401000)
0:000> r esp,ebp
esp=0012ff68 ebp=0012ff70
0:000> dds esp ebp
0012ff68  00000001
0012ff6c  00000002
0012ff70  0012ffc0

notice that the arguments are pushed in the reverse order of declaration. on x86 the push instruction decrements the esp by an offset (equal to the bytes needed to hold the value we’re pushing) and stores the value starting at the new address in the esp. since we’ve pushed two integers the esp is reduced by 8 and the stack frame (i.e. the memory between the addresses in ebp and esp) now contains the two arguments we’ve just pushed (in addition to whatever it had earlier.)

the call

the next instruction is the actual call to the sum routine. on x86 the call instruction saves the instruction pointer on the stack and jumps to the label passed as the argument, which, in our case is test!sum.

0:000> u eip l2
test!main+0x7:
00401027 e8d4ffffff      call    test!sum (00401000)
0040102c 83c408          add     esp,8
0:000> t
test!sum:
00401000 55              push    ebp
0:000> r esp,ebp
esp=0012ff64 ebp=0012ff70
0:000> dds esp ebp
0012ff64  0040102c test!main+0xc
0012ff68  00000001
0012ff6c  00000002
0012ff70  0012ffc0

the debugger output confirms what we knew about call - the esp is reduced by 4 and the top of the stack contains the address of the instruction just after the call in main. note that the eip now contains the instruction of the first instruction in sum - which means we’ve entered our callee.

the caller’s stack-frame at this point of time looks as:

   +----------|----------+
esp| 0012ff64 | 0040102c | return location in main
   +----------|----------+
   | 0012ff68 | 00000001 | argument ####1
   +----------|----------+
   | 0012ff6c | 00000002 | argument ####2
   +----------|----------+
ebp| 0012ff70 | 0012ffc0 | main's caller's ebp
   +----------|----------+
     address    content

the callee

one of the calling conventions the compiler follows ensures that value of the ebp register is preserved during a call. in other words, after the callee finishes, the ebp register is restored to the value it held before the callee execution started. this is achieved by the prologue and the epilogue we identified earlier. the convention actually extends to the values in esi, edi and ebx registers too. however, since our sum routine does not use any of those registers there is no need to preserve their values.

let’s start tracing the callee execution:

0:000> u eip l1
test!sum:
00401000 55              push    ebp
0:000> p
test!sum+0x1:
00401001 8bec            mov     ebp,esp
0:000> p
test!sum+0x3:
00401003 51              push    ecx
0:000> dds esp ebp
0012ff60  0012ff70
0:000> p
test!sum+0x4:
00401004 8b4508          mov     eax,dword ptr [ebp+8] ss:0023:0012ff68=00000001
0:000> dds esp ebp
0012ff5c  00000001
0012ff60  0012ff70

the first instruction pushes the existing ebp value to the stack and the next moves the esp value into the ebp register. effectively the two setup a fresh stack frame for the sum routine, very much like the main’s stack frame when we started tracing the program. the next instruction is a bit tricky - we didn’t keep anything in ecx, so why push ecx? why is the compiler pushing an irrelevant value on the stack?

actually what the compiler wants to achieve is to allocate space for the local variable result. typically, it allocates space by subtracting the combined size of all the locals from the esp, but here it sees scope for an optimization. all that this routine needs is 4 bytes of local storage for the integer variable and esp can be appropriately decremented by pushing a dummy register. sub 4 esp would need more space and time to achieve the same effect.

so now the stack frame looks as:

                 +----------|----------+                                                      
[ebp-4]       esp| 0012ff5c | ???????? | local ####1 = result (uninitialized)
                 +----------|----------+                                                      
              ebp| 0012ff60 | 0012ff70 | main's ebp                                           
                 +----------|----------+                                                      
                 | 0012ff64 | 0040102c | return address in main                               
                 +----------|----------+                                                      
[epb+8]          | 0012ff68 | 00000001 | argument ####1 = x                                      
                 +----------|----------+                                                      
[ebp+12]         | 0012ff6c | 00000002 | argument ####2 = y                                      
                 +----------|----------+                                                      
[old ebp]        | 0012ff70 | 0012ffc0 | main's caller's ebp                                  
                 +----------|----------+                                                      
                   address    contents
the computation

the next few instructions implement the business logic of the routine. they’re expected to add the two parameters and store the result in the local variable. to refer to these variables and parameters, the compiler uses another calling convention. the convention ensures that local variables are stored at negative offsets from the ebp while the parameters are at positive ones. for example: the address of the first and the second parameters is ebp+8 and ebp+12 respectively. similarly, the local variable result is situated at ebp-4. the next three instructions use these addresses to compute the sum of the two parameters and move it to the local variable result.

test!sum+0x4:
00401004 8b4508          mov     eax,dword ptr [ebp+8] ss:0023:0012ff68=00000001
0:000>p
test!sum+0x7:
00401007 03450c          add     eax,dword ptr [ebp+0Ch] ss:0023:0012ff6c=00000002
0:000>p
test!sum+0xa:
0040100a 8945fc          mov     dword ptr [ebp-4],eax ss:0023:0012ff5c=00000001

the next instruction is responsible for ensuring the result value is returned back to main.

0:000> p
test!sum+0xd:
0040100d 8b45fc          mov     eax,dword ptr [ebp-4] ss:0023:0012ff5c=00000003

the compiler convention assumes that the caller will look for the result in the eax register. consequently, return values that are upto 32 bits wide are returned directly in the eax register (as is the case, in our code). if the computed result is larger than 32 bits, it is returned by reference with eax containing a pointer to the result.

the return

with the computation done, its time for the callee to return, but before that happens, we need to deallocate the locals and restore the the ebp to its original value. the epilog achieves this quite succintly in just two instructions. the first sets esp to ebp thus reclaiming the space used by the locals and the second resets ebp to the value on the top of the stack (which is the caller’s ebp)

test!sum+0x10:
00401010 8be5            mov     esp,ebp
0:000> p
test!sum+0x12:
00401012 5d              pop     ebp
0:000> dds esp  ebp
0012ff60  0012ff70
0:000> p
test!sum+0x13:
00401013 c3              ret
0:000> dds esp  ebp
0012ff64  0040102c test!main+0xc
0012ff68  00000001
0012ff6c  00000002
0012ff70  0012ffc0

notice that at the end of the two instructions the stack has been restored to as it was just before we started the execution of sum. the final instruction in sum ret pops the return address pushed by call and jumps to the address, thus returning the control back to the caller.

the cleanup

following ret the control returns back to the main where there’s not much left to do. it only has to return the result obtained from sum and since the result is already in eax there’s nothing to be done. the stack, however, still contains the call arguments on it, which need to be deallocated.

test!main+0xc:
0040102c 83c408          add     esp,8
0:000> dds esp ebp
0012ff68  00000001
0012ff6c  00000002
0012ff70  0012ffc0
0:000> p
test!main+0xf:
0040102f 5d              pop     ebp
0:000> dds esp ebp
0012ff70  0012ffc0

the argument memory is reclaimed by just adjusting the esp. once that’s done returns to the state it was when we started tracing main. the remaining epilog is the same as in sum. here, the control returns to the runtime routine which had invoked main.

microsoft specific calling conventions

for stack cleanup the microsoft compiler provides the programmer with a bunch of options. for example, the programmer can choose to make the callee responsibile for cleaning up the arguments (instead of the caller, as is the case with our code). the idea is to reduce code size by localizing the clean-up code at one place, instead of scattering and repeating it in all places where the call is made. for a frequently used routine in large program this might save some code space.

there are some interesting writeups on this matter - see this and this. however i see little use of these proprietary options in code that i usually write. they’re good things to be aware of though, especially in case we run into some stack corruption problems due to mismatches in these conventions.

summary

here are the steps for review:

  • the caller pushes the function arguments in the reverse order of declaration.
  • the caller makes the actual call.
  • the callee saves and updates the ebp.
  • the callee allocates the local variables by adjusting the esp.
  • the callee saves the CPU registers it wants to modify.
  • the callee refers to the parameters giving positive offsets from its ebp.
  • the callee refers to its local variables by giving negative offsets from its ebp.
  • the callee stores the return value of the computation (or a pointer to it) in the eax.
  • the callee deallocates the space used by local variables, by popping them off the stack.
  • the callee restores the caller’s values of the registers it had saved.
  • the callee restores the caller’s ebp value.
  • the callee returns.
  • the caller cleans up the space used by the function arguments.

the last step is what distinguishes the __cdecl (default) calling convention from the __stdcall convention. in the latter the clean-up code is in the callee.