Well, you have to understand ABIs and name mangling, honestly. Basically, linking is built around C-style symbol names (assemblers follow the same convention, even though C itself gets compiled into assembly): a symbol is just a flat name made of letters, digits, and a few characters like underscores. That leaves no room for overloads, namespaces, or classes, so C++ compilers do something called "mangling," which is a ruleset for encoding that extra information into a plain symbol name. In theory, if you know how to mangle properly, you could write ugly-looking C code that works well with C++ code. Fortunately for you, I already tackled this.
My C++ code:
    #include <stdio.h>
    #include <stdlib.h>
    extern "C" int seed;
    extern "C" unsigned short prng();
    extern unsigned short mprng();
    int main(int argc, char** argv){
        int read;
        seed=0x882829;
        for(int i = 10; i>0; i--){
            read = mprng();
            printf("%i = 0x%04x\n", read, read);
        }
        return 0;
    }
The extern means the symbol is defined somewhere else: in a .o file (compiled, but not yet linked; the linker combines compiled files into an executable) or in another source file that gets compiled separately (like a .S [assembly] or .cpp file). Extern is implicit for function declarations (prototypes), which is why calling a function that was declared but never defined gives you an "undefined reference" error at link time.
The "C" part tells the compiler to use C linkage for that name: the symbol is the plain, unmangled name (so no overloading, classes, etc.), as opposed to C++ linkage, which uses mangling to make all that possible.
Now, if you run "g++ main.cpp -c" you'll get a main.o file (assuming you named it main.cpp). Then run "nm main.o" and you'll see something like
             U atoi
    00000000 T main
             U printf
             U seed
             U _Z5mprngv
What happened here is that I didn't actually use prng, so it doesn't show up, but I did use seed (no mangling, since I put "C" after extern) and I used _Z5mprngv, which is the mangled version of mprng. The "_Z" is the hint that it's a mangled C++ name, "5" is the length of the function name ("mprng" is 5 characters), and the "v" means void, as in an empty parameter list; the return type isn't part of the mangled name for an ordinary function like this. The "U" means undefined: the linker has to find that symbol somewhere else. But, don't take my word for it, try doing that much and experimenting for yourself.
My assembly code:
    #Note that the cpp file was compiled on a 32bit system, so on a 64bit system you might need extra flags (-m32 with gcc, for example) to build it as 32bit.
    .intel_syntax noprefix  #Intel (MASM-style) syntax, instead of ugly GNU assembler syntax
    .code32                 #This is 32bit code, not 64bit or 16bit.
    .global seed            #global means that we want to export that label (things with the syntax of "abcdef:")
    .global prng
    .global _Z5mprngv
    .section .text          #executable code below
    #Our random number generator, which isn't that good, but it is cheap.
    prng:
        mov eax, seed       #eax = seed; (loads the value of seed into the eax register)
        shr eax, 3          #eax >>= 3;
        xor eax, seed       #eax ^= seed;
        rol eax, 5          #eax = (eax << 5) | (eax >> 27); (rotate left, eax treated as unsigned 32bit)
        mov seed, eax       #seed = eax;
        and eax, 0xFFFF     #eax &= 0xFFFF;
        ret                 #return eax;
    _Z5mprngv:              #Same thing, only with the mangled name.
        mov eax, seed
        shr eax, 3
        xor eax, seed
        rol eax, 5
        mov seed, eax
        and eax, 0xFFFF
        ret
    .section .data          #readable and writeable, but not executable stuff below
    seed: .long 0xdeaddead  #.long is 4 bytes (an int), as opposed to .short (2 bytes).
Now, you'll notice that a good part of this assembly code is actually about formatting, rather than actual executable code. The labels are there to keep track of addresses, and the stuff beginning with periods are all directives to the assembler and linker, not instructions for the final binary. Also, this wasn't objects and classes, just code I had lying around for the purpose of discussing this topic. You could think of overloaded functions (functions that share the same name, but take different parameters) as functions of a "global class," if it makes it easier to understand. Experiment with the C++ code above the assembly to see how it looks. The big lesson here is that the assembly doesn't care about objects or how they're stored; the linker only matches symbol names, and mangling is how the C++ details get packed into those names. Now, for fun, look at this:
    #include <stdio.h>
    #include <stdlib.h>
    extern "C" int seed;
    extern "C" unsigned short prng();
    extern "C" unsigned short _Z5mprngv();
    int main(int argc, char** argv){
        int read;
        seed=0x882829;
        for(int i = 10; i>0; i--){
            read = _Z5mprngv();
            printf("%i = 0x%04x\n", read, read);
        }
        return 0;
    }
Rather than assembly, if you were to write the function in a .c file and compile it with gcc, you could totally do the same thing without learning assembly. Try making a class and some functions, and see how the names come out. Remember, normally you wouldn't export the objects themselves, but, internally, the compiler does this same mangling when it spits out the assembly.
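But that's GCC. The actual mangling scheme can be different for other compilers (MSVC's mangled names start with "?" and look nothing like the ones above), but odds are it's going to follow the same idea: pack the C++ details into a name the linker can treat like any C symbol.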