COMPILER RELATED BITS AND PIECES
================================

Author: Rowan Crowe
        rowan@sensation.net.au
        3:635/728.1@fidonet

This archive contains source code (MoonRock) and executables which may be
of interest to those who are contemplating writing their own compiler, or
are already in the process of doing so.

CALC demonstrates the basic principles of left to right evaluation.
CODEGEN takes this a step further by also showing the actual generation of
80x86 assembly code after an optimisation phase. Finally, SLC is a
complete, self-contained compiler which uses similar evaluation techniques
but makes no attempt at optimisation, either during or after code
generation.

To recompile the source you'll need the MoonRock compiler, which is
available at http://www.rowan.sensation.net.au/moonrock.html. All source
files can be compiled with "MRC filename", or with "MRC filename !DEBUG"
if you want some extra debugging and informational output.


CALC.MOO & CALC.COM
===================

This is a simple calculator which was included in the last couple of
releases of the MoonRock compiler. It's intended to be a demonstration of
the techniques and a framework for additions, rather than a sophisticated
tool. Numbers and the +, -, *, / symbols are accepted in the expression.

Here's an example:

    C:\SLC\ZIP>calc
    enter expression: 1 + (2 * 3) - 5
    answer is: 2

It also handles negative numbers:

    C:\SLC\ZIP>calc
    enter expression: -1--2
    answer is: 1


CODEGEN.MOO & CODEGEN.COM
=========================

This is similar to CALC, except that it outputs 80x86 assembly language,
showing how a processor would calculate the answer in real time. Variables
are also permitted in the expression, as we're outputting evaluation code
rather than an absolute answer.

CODEGEN is more complex internally, as it uses an intermediate pseudo code
stage. The pseudo code is actually pretty much identical to its equivalent
80x86 mnemonic, but it could probably be expanded to a more abstract form
which is not so processor dependent.

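As a rough illustration of the pseudo code idea, here is a minimal sketch in Python (not MoonRock). The tuple layout and the names generate, optimise and render are assumptions made up for this example, not taken from CODEGEN's source, and the single rewrite rule is simply one that reproduces this document's a+b+c sample output; the real optimiser may work quite differently. The sketch only handles identifiers joined by "+".

```python
def generate(expr):
    """Emit naive pseudo code for identifiers joined by '+', left to right.

    Each instruction is a tuple (mnemonic, operand, ...) rather than a
    line of text. This layout is an assumption for illustration only.
    """
    operands = [name.strip() for name in expr.split("+")]
    code = [("mov", "ax", operands[0])]      # first operand into ax
    for name in operands[1:]:
        code += [
            ("push", "ax"),                  # save the running total
            ("mov", "ax", name),             # load the next operand
            ("pop", "bx"),                   # recover the running total
            ("add", "ax", "bx"),             # ax = next operand + total
        ]
    return code


def optimise(code):
    """Fold 'push ax / mov ax,X / pop bx / add ax,bx' into 'add ax,X'.

    Hypothetical rule: it relies on addition being commutative and on
    bx not being needed afterwards.
    """
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 4]
        if (len(window) == 4
                and window[0] == ("push", "ax")
                and len(window[1]) == 3
                and window[1][:2] == ("mov", "ax")
                and window[2] == ("pop", "bx")
                and window[3] == ("add", "ax", "bx")):
            out.append(("add", "ax", window[1][2]))
            i += 4                           # skip the four folded lines
        else:
            out.append(code[i])
            i += 1
    return out


def render(code):
    """Turn pseudo code tuples into assembler text, one line each."""
    return "\n".join(f"{op} {','.join(args)}" for op, *args in code)


if __name__ == "__main__":
    pseudo = generate("a+b+c")
    print(render(pseudo))                    # unoptimised form
    print()
    print(render(optimise(pseudo)))          # after the rewrite rule
```

Because each instruction is a small record, a rewrite spanning several lines becomes a plain comparison against a four-element window, with no need to re-parse assembler text.
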
CODEGEN also features optional optimisation of the generated pseudo code,
which can remove several redundant lines from the output by modifying
and/or removing portions of the pseudo code. If the output were stored in
text form, performing optimisation over several lines of assembler code
would be more difficult. See the source for a better idea of how the
pseudo code is generated, stored, optimised, and output.

Here's an example:

    C:\SLC\ZIP>codegen
    enter expression: a+b+c
    mov ax,a
    push ax
    mov ax,b
    pop bx
    add ax,bx
    push ax
    mov ax,c
    pop bx
    add ax,bx

To see optimisation, use the -o switch on the command line:

    C:\SLC\ZIP>codegen -o
    enter expression: a+b+c
    mov ax,a
    add ax,b
    add ax,c


SLC.MOO & SLC.COM
=================

This is a reasonably complete but very simple compiler. SLC stands for
"Stupid Little Compiler", a working name that stuck. As an experiment I
decided to abandon the still somewhat kludgy parsing of the new MoonRock
compiler (which hasn't been released) and instead use a stack method for
evaluation. It is again based on CALC, and generates code on the fly
rather than attempting to optimise in intermediate form like CODEGEN does.
This complete lack of optimisation is deliberate, to ensure the compiler
is as simple as possible, yet still functional. The code it produces is
VERY inefficient to look at and would probably make any competent assembly
programmer burst into tears, but it works.

The idea with this small and simple compiler is to write a set of library
routines (this time in native SLC, not in 80x86 ASM like MoonRock's
library), along with some necessary low level "glue" code in 80x86 format.
Then, rewrite the compiler in native SLC. At that stage we have a complete
compiler and set of library routines which are portable - only the low
level glue code needs to be rewritten for a new processor or operating
system, plus some modifications to the code generator.

Of course, life is not as simple as that, but porting this compiler +
library to another processor or OS will be a lot simpler than, say, trying
to port MoonRock, which is written in QuickBASIC with an ASM library -
definitely unportable material!

At this stage the compiler generates something close to an output that can
be assembled directly, but you may still need to edit the filename.asm
file before it can be successfully assembled. Note that I'm using TASM as
an assembler and did whatever I needed to in order to get it to work. I
don't know how well it will work with MASM. Remember - this is just a
simple bootstrap to get the second version working!

SLC looks a little like C with some BASICish looking keywords thrown in.
The basic (pun unintended) elements are:

    [global] char|int [*]variablename
    [global] char|int|void functionname (parameters) { }
    variablename = expression
    if (expression) { }
    repeat { } until (expression)
    while (expression) { }
    asm
    return (expression)

"expression" is a set of numbers and/or variables with operators, such as
<, >, ==, &, |, !, +, -, *, /.

That's about it. Everything else is a function that you write yourself (or
one that is included in the standard library). See the sample files for an
idea of how it all goes together. This compiler has a reasonable amount of
error checking but it will still allow some strange constructs without
complaining.

To use it, simply specify the filename on the command line:

    C:\SLC>slc hello
    Stupid Little Compiler v0.01 [DOS]; Copyright (C) 1998 by Rowan Crowe
    processed 26 line(s) successfully, outfile is 'hello.asm'


CONCLUSION
==========

I hope these source files will be of some use to you. I can be contacted
at the addresses at the top of this document if you'd like to discuss
anything. Hopefully I'll be doing a bit of work on SLC here and there, but
no promises as I'm quite busy these days with (paid) work. ;-)