The task is to link a C++ module with an assembly language module. To accomplish this, I began with a simple C++ application that calls an external function:
void sample();
int main(){
sample();
return 0;
}
I set the compiler options to produce an assembly listing file with source (the /FAs option in Visual C++). Here is the result, reduced to a few critical lines:
.386P
.model FLAT
PUBLIC _main
EXTRN ?sample@@YAXXZ:NEAR ; sample
_TEXT SEGMENT
_main PROC NEAR ; COMDAT
; 4 : sample();
call ?sample@@YAXXZ ; sample
The .386 and .model directives will be needed when we actually write the function in assembly language. The curious part is the symbol used for the external function sample: ?sample@@YAXXZ. This is one of the oddities of the C++ compiler, called name decoration (or name mangling): the compiler encodes the function's parameter and return types into the symbol, so that each overload of a function gets a distinct name the linker can tell apart. We could give our assembly language implementation the decorated name, but that ties the assembly code to one compiler's decoration scheme, so it is not the best option.
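To see why decoration is needed, consider two overloads of one name. This minimal sketch (the function name twice is hypothetical) compiles only because the compiler emits a different decorated symbol for each overload; the exact decorated strings are compiler-specific and are not shown here.

```cpp
#include <cassert>

// Two overloads sharing the source-level name "twice".
// Each gets its own decorated linker symbol, encoding the parameter type.
int twice(int x) { return 2 * x; }
double twice(double x) { return 2.0 * x; }

// Overload resolution picks the version by argument type at compile time.
void check_twice() {
    assert(twice(21) == 42);    // calls twice(int)
    assert(twice(1.5) == 3.0);  // calls twice(double)
}
```

If both functions were declared inside extern "C", the program would not compile: C linkage gives a function only its plain name, so two such functions named twice would collide.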
A search of the MSDN Library for information on linking mixed-language modules shows how to eliminate this annoying decoration. Enclosing the prototype declaration in an extern "C" block gives the function C linkage and the usual C name "decoration", which for the default cdecl convention is simply an underscore prepended to the function name:
extern "C" {void sample();}
Recompiling yields the following function call:
call _sample
This name will be simpler to match in our separate module. As expected, the link now fails because the function is missing:
main.obj : error LNK2001: unresolved external symbol _sample
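While the assembly module does not exist yet, a C++ definition with the same C linkage can stand in for it so the rest of the program links. This is a hypothetical stub (the counter sample_calls is illustrative only), and it must be removed once sample.obj joins the project, or the linker will report _sample as defined twice.

```cpp
#include <cassert>

// Hypothetical counter so a check can observe that the stub ran.
int sample_calls = 0;

// C linkage: with the default cdecl convention on 32-bit x86 the linker sees
// this definition as _sample, the same symbol main.obj references.
extern "C" void sample() {
    ++sample_calls;
}

// Calls the stub the way main does and confirms it was reached.
void check_stub() {
    int before = sample_calls;
    sample();
    assert(sample_calls == before + 1);
}
```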
Next, we create an assembly language module that defines the procedure _sample. The following is entered in sample.asm and then assembled to produce the object file sample.obj:
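.386                ; allow 80386 instructions
.model flat         ; 32-bit flat memory model
.code
public _sample      ; export the C-style name the linker is looking for
_sample proc
ret                 ; cdecl: the caller removes any arguments
_sample endp
end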
The object file should next be added to the C++ project, so the linker knows to search it for the required public function. A rebuild of the project yields the following:
.\Sample.obj : warning LNK4033: converting object format from OMF to COFF
main.obj : error LNK2001: unresolved external symbol _sample
The warning may be ignored. The linker used in Visual Studio expects object files in the Common Object File Format (COFF), while the assembler, run without further options, creates object files in the 32-bit Object Module Format (OMF). The linker converts them automatically. To produce COFF directly and avoid the conversion, assemble with the /coff option of ml.exe.
The second problem, the linker still being unable to find _sample, is more troublesome. Loading sample.obj into Visual Studio (it is displayed in hex) makes the cause apparent: the function name is stored in the object file as _SAMPLE, because the assembler mapped the identifier to uppercase. The linker, like C++, is case-sensitive, so _sample and _SAMPLE do not match. There are at least two solutions: make the assembler preserve the case of public and external symbols (the /Mx option; with ml.exe the equivalent is /Cx), or change the case of the function name in the C++ program to match the one in the object file. I chose to reassemble with the /Mx option, and after that step the link went smoothly.
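The mismatch can be modeled in a few lines. This sketch assumes, as the hex dump showed, that the assembler maps names to uppercase by default, and that the linker matches symbol names as exact, case-sensitive strings; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Models the assembler's default behavior observed above: identifiers are
// mapped to uppercase before being written to the object file.
std::string assembler_default_case(const std::string& name) {
    std::string out = name;
    for (char& ch : out) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return out;
}

// The linker resolves a reference only if the names match exactly.
bool linker_resolves(const std::string& reference, const std::string& definition) {
    return reference == definition;
}

void check_case_mapping() {
    const std::string wanted = "_sample";  // referenced by main.obj
    assert(!linker_resolves(wanted, assembler_default_case(wanted)));  // _SAMPLE
    assert(linker_resolves(wanted, wanted));  // case preserved: resolved
}
```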
The next step is to investigate how arguments are passed and what each function is responsible for. The MSDN Library provides helpful information on the available options under Calling Conventions. C++ functions can use a variety of calling mechanisms, which the programmer selects with a keyword (__cdecl, __stdcall, __fastcall). Arguments are pushed on the stack in right-to-left order (__fastcall passes the first two suitably sized arguments in the ECX and EDX registers and uses the stack for the rest). With cdecl, the calling function must remove the arguments after the call returns; with stdcall, the called function removes them. Each convention also has a slightly different name decoration scheme for extern "C" functions. Cdecl and stdcall both prepend an underscore. Stdcall also appends an @ and a decimal integer giving the total number of bytes in the parameter list (_sample@16). Fastcall prepends an @ sign instead of an underscore and appends the same @ and byte count (@sample@16).
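These decoration rules can be captured in a small helper. This is an illustrative sketch for extern "C" functions on 32-bit x86 only (decorate and CallConv are hypothetical names); functions with C++ linkage use the much longer ?name@@... scheme instead.

```cpp
#include <cassert>
#include <string>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// Builds the decorated name of an extern "C" function following the rules
// above. paramBytes is the size of the parameter list in bytes.
std::string decorate(const std::string& name, CallConv conv, int paramBytes) {
    switch (conv) {
    case CallConv::Cdecl:
        return "_" + name;                                      // _name
    case CallConv::Stdcall:
        return "_" + name + "@" + std::to_string(paramBytes);  // _name@bytes
    case CallConv::Fastcall:
        return "@" + name + "@" + std::to_string(paramBytes);  // @name@bytes
    }
    return name;  // not reached for valid enum values
}

void check_decorate() {
    assert(decorate("sample", CallConv::Cdecl, 16) == "_sample");
    assert(decorate("sample", CallConv::Stdcall, 16) == "_sample@16");
    assert(decorate("sample", CallConv::Fastcall, 16) == "@sample@16");
}
```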
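Also of interest is how arguments are actually stored on the stack. Arguments smaller than a double word (32 bits) are widened to fill one, so int, char, and pointer arguments are spaced 4 bytes apart (8-byte types such as double occupy two double words). Passing by reference is always accomplished by passing the address of the argument, and an array is likewise passed as the address of its first element. Addresses are 4-byte values.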
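The size of a parameter list follows directly from this widening rule. A minimal sketch, assuming 32-bit x86 where addresses are 4 bytes (stack_slot_bytes and param_list_bytes are hypothetical helpers):

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Rounds one argument's size up to a whole number of 4-byte double words,
// which is how much stack space it occupies once pushed.
std::size_t stack_slot_bytes(std::size_t argSize) {
    return (argSize + 3) / 4 * 4;
}

// Total bytes of the parameter list: the number stdcall appends after '@'
// and the amount a cdecl caller removes with add esp, n.
std::size_t param_list_bytes(std::initializer_list<std::size_t> argSizes) {
    std::size_t total = 0;
    for (std::size_t s : argSizes) total += stack_slot_bytes(s);
    return total;
}

void check_param_bytes() {
    // sample(int, int &, char, int []): int, address, char, address
    assert(param_list_bytes({4, 4, 1, 4}) == 16);
}
```

For the four-argument sample used in the next example, this gives 16 bytes.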
Changing the earlier example to incorporate arguments:
extern "C" {void sample(int, int &, char, int []);}
int main(){
int a = 1, b = 2, c[10] = {0}; // initialized so sample never receives indeterminate values
char w = 'w';
sample(a, b, w, c);
return 0;
}
This results in the following assembly listing:
_a$ = -4
_b$ = -8
_c$ = -48
_w$ = -52
lea eax, DWORD PTR _c$[ebp]
push eax
mov cl, BYTE PTR _w$[ebp]
push ecx
lea edx, DWORD PTR _b$[ebp]
push edx
mov eax, DWORD PTR _a$[ebp]
push eax
call _sample
add esp, 16 ; 00000010H
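Notice how the locations of the automatic variables are defined symbolically; these are offsets on the runtime stack, relative to EBP in main's stack frame. Look at how each argument is pushed as a double word, starting with the last one: the array c and the reference b are passed as addresses (lea), a is pushed by value, and the char w is loaded into CL and pushed as all of ECX, so only its low byte is meaningful. Also note that the arguments are removed after the call by the calling program (add esp, 16 discards the four 4-byte arguments), not by the procedure.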
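A toy model of the caller's side of this call makes the ordering concrete. It is a simulation, not machine code: the ToyStack type and the addresses 0x1000 and 0x2000 are hypothetical, and the return address pushed by call is left out because ret removes it again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A toy 32-bit stack: memory grows downward and esp indexes the most
// recently pushed double word. Illustrative only.
struct ToyStack {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64, 0);
    std::size_t esp = 64;  // one past the top: the stack starts empty

    void push(std::uint32_t value) { mem[--esp] = value; }
    void add_esp(std::size_t bytes) { esp += bytes / 4; }  // discard arguments
    std::uint32_t at(std::size_t byteOffset) const { return mem[esp + byteOffset / 4]; }
};

void check_cdecl_call() {
    ToyStack s;
    const std::size_t espBefore = s.esp;
    // Push right to left, as in the listing: c's address, w, b's address, a.
    s.push(0x1000);             // address of c (hypothetical)
    s.push(0xDEADBE00u | 'w');  // w in CL; the upper bytes of ECX are junk
    s.push(0x2000);             // address of b (hypothetical)
    s.push(1);                  // value of a
    // Just before the call, the first argument (a) is at the lowest address.
    assert(s.at(0) == 1);
    assert(static_cast<char>(s.at(8) & 0xFF) == 'w');  // only the low byte counts
    s.add_esp(16);              // cdecl: the caller cleans up
    assert(s.esp == espBefore);
}
```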
If the calling convention is changed to stdcall, look at the effect:
Source: extern "C" {void _stdcall sample(int, int &, char, int []);}
Assembly listing:
lea eax, DWORD PTR _c$[ebp]
push eax
mov cl, BYTE PTR _w$[ebp]
push ecx
lea edx, DWORD PTR _b$[ebp]
push edx
mov eax, DWORD PTR _a$[ebp]
push eax
call _sample@16
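There is no stack operation after the call, because the called function now has the responsibility of removing the arguments from the stack. Note also that the function name has changed to _sample@16. We would have to make the corresponding changes in our module: rename the procedure (and its public declaration) to _sample@16 and return with ret 16 instead of ret. Be sure you understand which calling convention is in effect, because leaving the stack in the wrong state (arguments removed twice, or not at all) corrupts the stack pointer and will usually crash the program.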
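The bookkeeping behind that warning fits in one function. This hypothetical esp_drift helper only counts bytes: the caller's pushes lower ESP, and exactly one side (ret n in the callee, or add esp, n in the caller) must raise it again.

```cpp
#include <cassert>

// Net change in ESP, in bytes, across a call that passes argBytes of arguments.
// The return address pushed by call is popped by ret, so it nets to zero.
int esp_drift(int argBytes, bool calleeUsesRetN, bool callerAddsEsp) {
    int drift = -argBytes;                  // arguments pushed by the caller
    if (calleeUsesRetN) drift += argBytes;  // stdcall: ret argBytes
    if (callerAddsEsp) drift += argBytes;   // cdecl: add esp, argBytes
    return drift;                           // 0 means the stack is balanced
}

void check_conventions() {
    assert(esp_drift(16, false, true) == 0);    // cdecl done right
    assert(esp_drift(16, true, false) == 0);    // stdcall done right
    assert(esp_drift(16, true, true) == 16);    // both clean up: ESP too high
    assert(esp_drift(16, false, false) == -16); // nobody cleans up: 16 bytes leak
}
```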
To allow the assembly language module to call one of the C++ functions, we do essentially the same thing in reverse. In the assembly module we declare the name to be external. In C++, functions have external linkage by default, so nothing special needs to be done other than to define the function, declared extern "C" so that its name is not decorated.
In the assembly module:
extern _backAtYou:proc
; inside _sample, call backAtYou(x, y) using cdecl conventions:
; push y (a char *), push x (an int), call _backAtYou, then add esp, 8
In the C++ module:
extern "C" {void _cdecl backAtYou(int, char *);}
void _cdecl backAtYou(int x, char * y){
char temp;
temp = (char)x;
*y = temp;
}
Here is the result of the compile (unrelated items removed). A careful study of the function will help in understanding how the passed arguments are accessed and used. Notice that the formal parameters are assigned names representing offsets from EBP: the arguments lie above the saved EBP and the return address (_x$ = 8, _y$ = 12), while locals lie below it. Look carefully at how local storage for temp is allocated (sub esp, 4 reserves a full double word for a 1-byte char) and how it is accessed. Notice that ret 0 is the exit protocol: this function was declared cdecl, so it does not remove the arguments.
PUBLIC _backAtYou
_TEXT SEGMENT
_x$ = 8
_y$ = 12
_temp$ = -4
_backAtYou PROC NEAR ; COMDAT
; 13 : void _cdecl backAtYou(int x, char * y){
push ebp
mov ebp, esp
sub esp, 4
; 14 : char temp;
; 15 : temp = (char)x;
mov al, BYTE PTR _x$[ebp]
mov BYTE PTR _temp$[ebp], al
; 16 : *y = temp;
mov ecx, DWORD PTR _y$[ebp]
mov dl, BYTE PTR _temp$[ebp]
mov BYTE PTR [ecx], dl
; 17 : }
mov esp, ebp
pop ebp
ret 0
_backAtYou ENDP
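The positive offsets in this listing follow a fixed pattern that can be computed. A sketch, assuming the 32-bit EBP-based frame shown above (param_offsets is a hypothetical helper):

```cpp
#include <cassert>
#include <vector>

// In a standard EBP frame (push ebp / mov ebp, esp):
//   [ebp+0] holds the caller's saved EBP and [ebp+4] the return address,
//   so the first argument starts at ebp+8 and each later one follows
//   its predecessor's 4-byte-rounded slot.
std::vector<int> param_offsets(const std::vector<int>& argSizes) {
    std::vector<int> offsets;
    int next = 8;
    for (int size : argSizes) {
        offsets.push_back(next);
        next += (size + 3) / 4 * 4;
    }
    return offsets;
}

void check_backAtYou_frame() {
    // backAtYou(int x, char * y): matches _x$ = 8 and _y$ = 12 in the listing.
    std::vector<int> offs = param_offsets({4, 4});
    assert(offs.size() == 2 && offs[0] == 8 && offs[1] == 12);
}
```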
The program links successfully, and if the stack is manipulated properly, it can be executed. Using the debugger, one can follow the steps from main into sample, on to backAtYou, and then back again to main. You may need to open the Disassembly window (under the View menu) to step into the code of sample.
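Before stepping through the mixed-language program, backAtYou can also be exercised from C++ alone to confirm what the assembly caller should observe. In this sketch the definition is repeated so it compiles on its own, and the Microsoft-specific _cdecl keyword is dropped because cdecl is the default convention anyway.

```cpp
#include <cassert>

// Same behavior as the definition above.
extern "C" void backAtYou(int x, char* y) {
    char temp;
    temp = (char)x;  // keep only the low byte of x
    *y = temp;       // store it through the caller's pointer
}

void check_backAtYou() {
    char result = '\0';
    backAtYou(65, &result);  // 65 is 'A' in ASCII
    assert(result == 'A');
}
```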
Assignment:
Write a C++ program that accepts strings from the user and then calls an assembly language routine that causes each string to be displayed in all uppercase and then in all lowercase. The assembly language function must allocate space on the stack for a copy of the string (be sure to leave room for the NUL terminator). The copy is then converted to all uppercase characters, passed to a C++ function named display, converted to all lowercase characters, and passed to display again. display simply prints the string followed by a newline.
Use the stdcall calling convention for your assembly routine and cdecl for display. The required prototypes are as follows:
extern "C" {void _stdcall UpDn(int length, char * thestring);}
extern "C" {void _cdecl display(char * thestring);}
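A C++ model of the expected behavior can serve as a reference to compare your program's output against. This is a hypothetical sketch (UpDnModel, upper_copy, and lower_copy are illustrative names, not part of the required interface) that uses std::string instead of a stack buffer. By the decoration rules above, the stdcall routine itself must be published as _UpDn@8 (two 4-byte parameters) and return with ret 8.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Uppercase copy of the first `length` characters of s.
std::string upper_copy(const char* s, int length) {
    std::string out(s, static_cast<std::size_t>(length));
    for (char& ch : out) {
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    return out;
}

// Lowercase copy of the first `length` characters of s.
std::string lower_copy(const char* s, int length) {
    std::string out(s, static_cast<std::size_t>(length));
    for (char& ch : out) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return out;
}

// display as the assignment describes it: print the string, then a newline.
// (_cdecl is omitted; cdecl is the default convention for this function.)
extern "C" void display(char* thestring) {
    std::printf("%s\n", thestring);
}

// Reference model of UpDn: the caller's string is left unchanged, and
// display receives the uppercase copy, then the lowercase copy.
void UpDnModel(int length, char* thestring) {
    assert(length >= 0);
    std::string up = upper_copy(thestring, length);
    display(&up[0]);  // std::string keeps a NUL after its characters
    std::string down = lower_copy(thestring, length);
    display(&down[0]);
}
```

For a buffer holding "MiXeD", UpDnModel(5, buf) prints MIXED and then mixed, which is the output the assembly version should reproduce.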