Microassembler in C++

Hello!

I have a school project due in two months and I have to make a microassembler in C++. I've looked for some tutorials but haven't found any. Could anyone please point me to some tutorials and/or books for this?

Any type of information is useful!

Thank you!
You have to start by defining what your input will be and what your output will be. Start with a very small sample program, say something that assigns a value to a register.

There's no time to make a general implementation for different instruction sets, so just go for one and stick to it.

Forget the macros to begin with.
Do you have any examples of code for this? Could you recommend any books or similar resources where I can learn how to use C++ to work with registers? Thanks!
You can have your C/C++ compiler write an assembler file during compilation. It'll have a lot of noise though.

I can't recommend a book as I don't know what platform you're using and the books I used are long out of print.
By platform, do you mean the computer I'm going to write the program on, or the device I'm writing the program for?

I'll give more details, maybe you'll understand better.

I'm running an x64 Windows 7 OS and my project consists of making a microassembler for the EMMA-2 processor (link here: http://www.icsa.inf.ed.ac.uk/cgi-bin/hase/emma-m.pl?arch2-t.html,arch2-f.html,menu2.html )

I hope I answered your question about my platform, as I'm not sure I gave the right answer :-)

Anything that could help me understand how to program low-level with C++ would be really helpful, so if anything comes to mind, please leave a reply.

Thanks!
Does a predefined syntax for EMMA-2 assembly exist, or will you be inventing the syntax as well?
I have never ever done CC. Problems of this kind (related to compiler development) are the second most scary thing on my list (after database engines). But here are my 5 cents.

There are three things I think you absolutely must settle before you start working (as kbw and PanGalactic have mostly suggested already):

1. what is the source syntax
-- is it your own design or what
-- what features you absolutely need to support:
*** macros (as you've been told that's a no-no, and I agree)
*** symbols (these usually designate address locations of code or data; do you absolutely need them here?)

The bare minimum is to have an uninterrupted flow of instruction mnemonics, but I doubt that this is possible in your case, because the entry point to the microcode is part of the op-code of the instruction entering the decoder. So you need some way to anchor the microcode to a specific location (if I understand correctly), or to produce a memory map after assembly. Not to mention that your code space is split in half in relation to the complexity of the operation. And it seems there are address operands too.

2. what is the target platform - you settled this (are the docs there sufficient?)

3. what is the target format
I mean, who is going to upload or download this microcode to the EMMA-2 processor? Code doesn't run by itself, so there must be a loader that understands it and can transfer it to wherever it has to go. The formats used on desktop systems are ELF and COFF, but I have no idea what something like this uses. Is it possible that the output format is just going to be simulated (invented, not real)?

Here's how I would approach the problem (and I am not in CC, so keep that in mind). I would begin with an uninterrupted flow of mnemonics and operands, where all operands are literals. I would initially support only 2 or 3 instructions, not even necessarily those from the guide. No real output at first, only tests. No syntax checks, say for the number of operands. Each instruction is on a separate line. I read a line and try to convert it into numerical data. In particular, the job is to split the mnemonic from the operands, save the operands in some variables, convert the mnemonic into a number, and save that in another variable.
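
A minimal sketch of that first step, for illustration only. The mnemonics (LOAD, ADD, STORE), their opcode numbers, and the names ParsedLine, kOpcodes and parseLine are all invented here; none of them come from the EMMA-2 documentation. It assumes one instruction per line, whitespace-separated tokens, and integer literal operands.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One parsed source line: a numeric opcode plus its literal operands.
struct ParsedLine {
    std::uint8_t opcode;
    std::vector<int> operands;
};

// Hypothetical mnemonics and opcode numbers, invented for this sketch.
// They are NOT the real EMMA-2 instruction set.
const std::map<std::string, std::uint8_t> kOpcodes = {
    {"LOAD", 0x01}, {"ADD", 0x02}, {"STORE", 0x03}};

// Splits "MNEMONIC operand operand ..." on whitespace and converts it to
// numbers. Returns false (leaving `out` untouched) for a blank line, an
// unknown mnemonic, or an operand that is not an integer literal.
bool parseLine(const std::string& line, ParsedLine& out) {
    std::istringstream in(line);
    std::string mnemonic;
    if (!(in >> mnemonic)) return false;  // blank line

    auto it = kOpcodes.find(mnemonic);
    if (it == kOpcodes.end()) return false;  // unknown mnemonic

    ParsedLine result{it->second, {}};
    std::string token;
    while (in >> token) {
        std::size_t used = 0;
        int value = 0;
        try {
            // Base 0 accepts decimal (10), hex (0x1F) and octal (017).
            value = std::stoi(token, &used, 0);
        } catch (const std::exception&) {
            return false;  // not a number, or out of range for int
        }
        if (used != token.size()) return false;  // trailing junk, e.g. "12abc"
        result.operands.push_back(value);
    }
    out = result;
    return true;
}
```

Calling parseLine("ADD 1 0x10", p) fills p with opcode 0x02 and the operands 1 and 16. There is deliberately no check on how many operands were given, matching the "no syntax checks at first" idea.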

Then you have to decide how to do syntax checks. To begin with: are there too few or too many operands? But you need some way to handle this that scales to more instructions and different types of operands. I imagine some data structure with the syntax description for each instruction.
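
One possible shape for such a data structure, as a rough sketch: a table keyed by mnemonic that records how many operands each instruction expects. The entries (NOP, LOAD, ADD), their opcodes and operand counts, and the names InstructionSpec, kInstructionTable and checkOperandCount are hypothetical and would have to be replaced with whatever the EMMA-2 documentation actually specifies.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Syntax description of a single instruction.
struct InstructionSpec {
    unsigned opcode;           // numeric encoding of the mnemonic
    std::size_t operandCount;  // exact number of operands expected
};

// Hypothetical table: mnemonics, opcodes and counts are invented here.
const std::map<std::string, InstructionSpec> kInstructionTable = {
    {"NOP", {0x00, 0}}, {"LOAD", {0x01, 2}}, {"ADD", {0x02, 3}}};

enum class CheckResult { Ok, UnknownMnemonic, TooFewOperands, TooManyOperands };

// Looks the mnemonic up in the table and compares the number of operands
// found on the source line against the expected count.
CheckResult checkOperandCount(const std::string& mnemonic, std::size_t given) {
    auto it = kInstructionTable.find(mnemonic);
    if (it == kInstructionTable.end()) return CheckResult::UnknownMnemonic;
    if (given < it->second.operandCount) return CheckResult::TooFewOperands;
    if (given > it->second.operandCount) return CheckResult::TooManyOperands;
    return CheckResult::Ok;
}
```

Adding an instruction then means adding one row to the table rather than writing new checking code, and InstructionSpec can later grow extra fields such as the allowed operand types.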

The design problem of the project is how to generalize the minimal solution in the presence of all the instruction variants and additional features:
- different numbers of operands: binary, ternary
- different types of operands: parsing register operands, address operands (especially if you support symbols for those)
- parsing and handling directives: those are the non-executable specifications in the source, that tell your assembler something that affects the compilation, like announcing variables and labels
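
To illustrate the second point (different operand types), here is a small sketch that sorts an operand token into register, literal or symbol. The syntax it assumes is made up for the example: registers are written as "R" followed by digits, literals are plain decimal digits, and symbols start with a letter or underscore. The real EMMA-2 assembly syntax may look quite different.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class OperandKind { Register, Literal, Symbol, Invalid };

// Classifies one operand token under an ASSUMED (hypothetical) syntax:
//   "R" + digits            -> Register   (e.g. R0, R12)
//   digits only             -> Literal    (e.g. 42)
//   letter/underscore start -> Symbol     (e.g. loop_start)
OperandKind classifyOperand(const std::string& text) {
    if (text.empty()) return OperandKind::Invalid;

    // True if s[from..] is non-empty and consists only of decimal digits.
    auto allDigits = [](const std::string& s, std::size_t from) -> bool {
        if (from >= s.size()) return false;
        for (std::size_t i = from; i < s.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
        }
        return true;
    };

    if (allDigits(text, 0)) return OperandKind::Literal;
    if (text[0] == 'R' && allDigits(text, 1)) return OperandKind::Register;

    // Anything else must look like an identifier to count as a symbol.
    unsigned char first = static_cast<unsigned char>(text[0]);
    if (!std::isalpha(first) && first != '_') return OperandKind::Invalid;
    for (char c : text) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (!std::isalnum(uc) && uc != '_') return OperandKind::Invalid;
    }
    return OperandKind::Symbol;
}
```

With these rules "R3" is a register, "42" a literal, "loop_start" a symbol, and "3abc" is rejected. Symbols would still need a table mapping them to addresses, which this sketch does not cover.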

It should be your very last concern IMO to spit out the code, because this is largely independent of the other stuff, and for an assembler it should be sufficiently straightforward.
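
As a sketch of how small that last step can be once the instructions are already encoded as numbers: the function below writes each word as four hexadecimal digits, one per line. Both the 16-bit word size and this text format are assumptions made up for the example; the real output has to match whatever the EMMA-2 emulator or loader expects. The name emitHex is likewise hypothetical.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Renders already-encoded words as text, four uppercase hex digits per line.
// The 16-bit word size and the one-word-per-line layout are invented for
// this sketch and are not a real loader format.
std::string emitHex(const std::vector<std::uint16_t>& words) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::uint16_t word : words) {
        out << std::setw(4) << word << '\n';  // setw applies to one item only
    }
    return out.str();
}
```

Because the function only sees a vector of numbers, it can be swapped for a different output format later without touching the parsing or checking code.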

Regards
Thank you everyone for your help so far.
I have managed to ask my teacher a few questions about my project and how it should end up looking. These are a few details that I can give now (more to come next Monday):

- I will simulate the EMMA-2 processor on my computer with an emulator (which I'm looking for now)
- I will use EMMA's instruction set (so I don't have to do any mnemonics)

If I understood correctly after talking for 1-2 minutes with my teacher during break, I have to make a C++ program which reads EMMA-2 instructions, compiles them, makes the "upload file" and tests it on the emulator.

So my issue is strictly related to creating the program that will "interface" my computer with EMMA-2's emulator.

I hope my replies are relevant. I'm not a native English speaker, and since I'm not yet completely familiar with my project, I'm a bit puzzled after reading simeonz's post (but thank you again for your interest and intention to help me!!). I'll read your post a few more times after I've had a longer discussion with my teacher, and I'll post a reply with the steps I think I will follow in this project.

Thank you all, again!