Inside Kenaz Compiler
Here is the technical part, for those of you who are interested in such things. The Kenaz compilation process is divided into five main parts :
- Parsing
- Metadata constraints verification
- Compilation in KzIL1 (Kenaz Intermediate Language 1)
- Optimization (optional)
- Compilation in KzIL2
The results are then stored to disk for later use.
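To make the sequence concrete, here is a minimal C++ sketch of how such a driver could be wired together; all type and function names are hypothetical, not taken from the actual Kenaz sources :
// Hypothetical sketch of the pipeline above; names are illustrative only.
#include <string>

struct Ast {};            // abstract syntax tree built by the parser
struct KzIL1Program {};   // first intermediate representation
struct KzIL2Program {};   // second, x86-like intermediate representation

Ast          parse(const std::string& source)             { (void)source; return {}; }
void         checkMetadataConstraints(const Ast&)          {}
KzIL1Program lowerToKzIL1(const Ast&)                      { return {}; }
void         optimize(KzIL1Program&)                       {}
KzIL2Program lowerToKzIL2(const KzIL1Program&)             { return {}; }
void         writeToDisk(const KzIL2Program&, const std::string&) {}

void compile(const std::string& source, const std::string& outPath, bool withOptimizations)
{
    Ast ast = parse(source);
    checkMetadataConstraints(ast);          // metadata constraint verification
    KzIL1Program il1 = lowerToKzIL1(ast);   // compilation into KzIL1
    if (withOptimizations)
        optimize(il1);                      // the optional optimization pass
    KzIL2Program il2 = lowerToKzIL2(il1);   // compilation into KzIL2
    writeToDisk(il2, outPath);              // results stored to disk for later use
}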
The parser was created using the GNU tools Flex and Bison. Consequently, the grammar is LALR(1), without any fancy features. Modifying the cast operator to a "D-like" cast operator instead of the standard () cleared things up quite a lot. Although the parser is now fully functional, it still lacks error productions, meaning that any syntax error is
likely to immediately terminate the compilation process. During parsing, an AST (Abstract Syntax Tree) is built. Used in conjunction with a symbol table,
it is able to perform the real first compilation step, namely converting the user source code into KzIL1.
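As an illustration, an AST node and a symbol table entry could look roughly like this (a C++ sketch with made-up names, not the real Kenaz data structures) :
// Hypothetical shapes for an AST node and a symbol table entry.
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct AstNode {
    std::string kind;                                // e.g. "assign", "mul", "ident", "int-lit"
    std::string text;                                // identifier name or literal text
    std::vector<std::unique_ptr<AstNode>> children;  // operands / sub-expressions
};

struct Symbol {
    std::string    name;
    std::string    type;   // e.g. "int" or "float", used later to pick add-i vs add-f
    unsigned short slot;   // the KzIL1 local variable id (an unsigned short, see below)
};

using SymbolTable = std::unordered_map<std::string, Symbol>;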
Metadata constraint verification
Metadata constraints are stored as a tree, so checking whether the constraints are satisfied basically amounts to a tree traversal, checking the user's values along the way.
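A minimal sketch of such a traversal, assuming a hypothetical constraint node layout, might look like this :
// Hypothetical constraint node: a leaf checks one user value, an inner node
// combines its children. Checking is a plain recursive traversal.
#include <string>
#include <vector>

struct Constraint {
    enum Kind { And, Or, Leaf } kind;
    std::vector<Constraint> children;    // used for And / Or nodes
    bool (*check)(const std::string&);   // used for Leaf nodes: predicate on the user's value
};

bool satisfied(const Constraint& c, const std::string& userValue)
{
    switch (c.kind) {
    case Constraint::Leaf: return c.check(userValue);
    case Constraint::And:
        for (const Constraint& child : c.children)
            if (!satisfied(child, userValue)) return false;
        return true;
    case Constraint::Or:
        for (const Constraint& child : c.children)
            if (satisfied(child, userValue)) return true;
        return false;
    }
    return false;
}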
KzIL1 is the first intermediate representation of the user's input code. It is a standard medium-level intermediate language. Instructions are mostly
represented as a quadruplet (operand x instruction x operand x operand), although some instructions take an arbitrary number of operands
(like a call to a function), or just one (unary instructions). Local variables are identified using an unsigned short.
Example :
Input Kenaz source code :
c = 27 * 4;
c = c * a * 15;
KzIL1 representation :
t2 <= mul-i (27, 4)
t4 <= mul-i (t2, t1)
t2 <= mul-i (t4, 15)
KzIL1 is strongly typed: there is one instruction per operand type (like add-i for integers and add-f for floats). Using such a medium-level
representation allows us to implement various optimizations.
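For illustration, a KzIL1 instruction could be encoded roughly like this (a C++ sketch with hypothetical names, not the actual Kenaz encoding) :
// A typed opcode, a destination local, and a variable-length operand list,
// which covers the common quadruplet case as well as calls (many operands)
// and unary instructions (one operand).
#include <vector>

enum class Op { AddI, AddF, MulI, MulF, GetField, Call /* , ... */ };

struct Operand {
    enum Kind { Local, IntConst, FloatConst } kind;
    unsigned short local;       // local variables are identified by an unsigned short
    int            intValue;
    float          floatValue;
};

struct KzIL1Instr {
    Op                   op;
    unsigned short       dest;       // destination local, e.g. t2
    std::vector<Operand> operands;   // usually two, more for calls, one for unary instructions
};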
The compilation into KzIL1 is done very naively, resulting in a lot of useless redundant code, or code that could actually be evaluated at compile time
instead of at run time. This is why an optimization pass is applied. The code is first turned into a Control Flow Graph and a Static Single Assignment (SSA) graph, and
both representations allow us to try to optimize the code. The current compiler applies the following optimizations :
- Constant folding
- Local common subexpression elimination
- Global value numbering
- Dead code elimination
- Scalar replacement of aggregates
I plan to add the following optimizations as soon as possible :
- Sparse conditional constant propagation
- Global common subexpression elimination
- Loop invariant code motion
I got the algorithms for all these optimizations from the book "Advanced Compiler Design & Implementation".
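To give a flavour of the simplest of these passes, here is a minimal constant-folding sketch, reusing the hypothetical KzIL1Instr encoding sketched earlier (a real pass would handle every typed opcode and rewrite the instruction in place) :
// If both operands of an integer multiply are constants, the product can be
// computed at compile time and the multiply replaced by that constant.
#include <optional>

std::optional<int> foldMulI(const KzIL1Instr& instr)
{
    if (instr.op != Op::MulI || instr.operands.size() != 2)
        return std::nullopt;
    const Operand& a = instr.operands[0];
    const Operand& b = instr.operands[1];
    if (a.kind == Operand::IntConst && b.kind == Operand::IntConst)
        return a.intValue * b.intValue;   // e.g. mul-i(27, 4) folds to 108
    return std::nullopt;                  // not foldable at compile time
}
Applied to the example above, t2 <= mul-i (27, 4) would become the constant 108, and the dead code elimination pass would then remove any values that are no longer used.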
Not surprisingly, KzIL2 is the second Kenaz intermediate representation. Its instruction set matches the Intel x86's; the only difference is that
it features an unlimited number of symbolic registers. Compilation from KzIL1 to KzIL2 is done in a naive way, resulting in poorly optimized code
that uses and wastes lots of registers. The Intel IA-32 architecture doesn't offer plenty of registers (actually, 7 usable registers), but on the other hand
nearly all instructions can take one of their operands from memory. Using a memory operand instead of a register operand is a good way to reduce register
pressure, and should be used as often as possible. With modern cache implementations, the performance hit resulting from using a memory operand
instead of a register operand should be insignificant (the local frame is likely to reside in cache), while leaving us with more free registers available
for the allocator. When I started to write this part of Kenaz, I didn't have internet access anymore, so I had to come up with something of my own, and this is
what I'm going to describe here.
For each KzIL1 instruction, we look at both input operands and compute their liveness (that is, whether they are live past this point or not;
this can be determined using data-flow analysis). If one of the operands is dead past this point, we use it as the destination register. If both operands are still alive
at this point, we create a new symbolic (virtual) register, copy one of the operands into it, and use that register as the destination. This is a really
naive approach, resulting in a lot of symbolic registers being created. Two operations are then locally applied to reduce the number of symbolic
registers used: register copy propagation and address copy propagation. Register copy propagation is similar to local copy propagation. Address propagation
tracks values copied into registers from memory, and replaces later uses of such a register as a source operand with its memory source.
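Here is a minimal sketch of this destination-selection rule, with hypothetical types; the liveness set is assumed to come from a separate data-flow analysis :
#include <unordered_set>

using SymReg = unsigned int;

struct KzIL2Builder {
    SymReg nextSymReg = 0;
    SymReg freshReg() { return nextSymReg++; }
    void   emitCopy(SymReg dst, SymReg src) { (void)dst; (void)src; /* emit "mov dst, src" */ }
};

// "liveOut" holds the symbolic registers still live after the current instruction.
SymReg pickDestination(SymReg left, SymReg right,
                       const std::unordered_set<SymReg>& liveOut,
                       KzIL2Builder& builder)
{
    // An operand that dies at this instruction can be reused as the destination.
    if (liveOut.count(left) == 0)  return left;
    if (liveOut.count(right) == 0) return right;
    // Both operands stay live: copy one into a fresh symbolic register and use
    // that copy as the destination. This is the naive part that copy propagation
    // and dead code elimination clean up afterwards.
    SymReg dst = builder.freshReg();
    builder.emitCopy(dst, left);
    return dst;
}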
Example : Let's say we have the following Kenaz source code instruction :
c = b + this.a.a;
The KzIL1 form of this instruction is :
t0 <= getField(this, "a");
t1 <= getField(t0, "a");
c <= b + t1;
And, after naïve translation to KzIL2, with "this" residing in ECX and b previously put in s7 (sN represents a symbolic register) :
mov s0, ecx;
mov s1, [s0 + offset(a)]
mov s2, s1
mov s3, [s2 + offset(a)]
add s7, s3
c will now reside in s7. Taking this code, we apply register copy propagation on it:
mov s0, ecx;
mov s1, [ecx + offset(a)]
mov s2, s1
mov s3, [s1 + offset(a)]
add s7, s3
Next, address propagation :
mov s0, ecx;
mov s1, [ecx + offset(a)]
mov s2, [ecx + offset(a)]
mov s3, [s1 + offset(a)]
add s7, [s1 + offset(a)]
And, finally, a dead-code elimination pass to remove useless code :
mov s1, [ecx + offset(a)]
add s7, [s1 + offset(a)]
which is quite a pleasant result. Both copy propagation and address propagation
are done locally. I don't know if it is worth performing them globally, as they
mostly help to reduce the number of symbolic registers used within a single
statement, not across basic block boundaries. Care must be taken to avoid over-aggressiveness when a function call is encountered, because the called function may overwrite eax/ecx/edx. So, when encountering a 'call', all information about the contents of eax, ecx and edx must be discarded.
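A sketch of the bookkeeping behind local address propagation, including the invalidation on 'call'; the table layout and names are hypothetical :
// Each register is mapped to the memory operand it was last loaded from, so a
// later use of the register as a source operand can be replaced by that operand.
#include <map>
#include <string>

struct AddressTable {
    std::map<std::string, std::string> loadedFrom;   // register -> memory operand text

    void recordLoad(const std::string& reg, const std::string& memOperand) {
        loadedFrom[reg] = memOperand;                // e.g. "s1" -> "[ecx + offset(a)]"
    }
    void invalidate(const std::string& reg) {        // the register was overwritten
        loadedFrom.erase(reg);
    }
    void onCall() {                                  // a call may clobber eax/ecx/edx,
        invalidate("eax");                           // so forget everything known about them
        invalidate("ecx");
        invalidate("edx");
    }
};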
The last needed pass is machine register allocation. This is done using a graph-coloring scheme, with
symbolic registers as the allocatable objects. As the allocator does not feature
register coalescing yet, it suffers from the multiple copies introduced by the
conversion back from SSA form. I plan to include this feature when I
have enough time (well, I should first rewrite the interference graph code, as
it is Kenaz's ugliest piece of code yet).
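For illustration only, here is a tiny greedy colouring sketch over an interference graph; it ignores spilling and coalescing and is not the actual Kenaz allocator :
// Symbolic registers are the nodes, an edge means two registers are live at
// the same time, and the colours are machine registers.
#include <cstddef>
#include <set>
#include <vector>

const int kMachineRegisters = 7;   // usable IA-32 registers, as noted above

// interference[i] lists the neighbours of symbolic register i; the result gives
// one machine register index per symbolic register, or -1 when no colour is
// free (where a real allocator would spill).
std::vector<int> colour(const std::vector<std::vector<int>>& interference)
{
    std::vector<int> colourOf(interference.size(), -1);
    for (std::size_t node = 0; node < interference.size(); ++node) {
        std::set<int> used;
        for (int neighbour : interference[node])
            if (colourOf[neighbour] != -1)
                used.insert(colourOf[neighbour]);
        for (int c = 0; c < kMachineRegisters; ++c)
            if (used.count(c) == 0) { colourOf[node] = c; break; }
    }
    return colourOf;
}
Register coalescing would sit on top of this, merging copy-related nodes that do not interfere, which is what would remove the copies left over from leaving SSA form.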
When the KzIL2 code is generated, with machine registers substituted for
symbolic registers, a last operation is performed. Its purpose
is to remove useless code that may have been generated by previous passes (like mov eax,
eax) or to replace a set of instructions with another set
performing the same task faster (for example, replacing mov eax, 0 with xor
eax, eax). This pass is configurable via a config file describing the
patterns to match and the substitutions to apply. For
example, our config file may be :
search
{
mov reg.a, 0;
}
replace
{
xor reg.a, reg.a
}
search
{
mov reg.a, reg.a;
}
replace
{
}
Patterns to match may of course include more than one instruction or more than one register.
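A sketch of how such rules could be applied, with hypothetical names; placeholder binding for reg.a-style operands is reduced to exact text matching to keep it short :
#include <cstddef>
#include <string>
#include <vector>

// One rule as read from the config file: a sequence of instructions to search
// for and the instructions that replace them (possibly none).
struct PeepholeRule {
    std::vector<std::string> search;    // e.g. { "mov reg.a, 0" }
    std::vector<std::string> replace;   // e.g. { "xor reg.a, reg.a" }, may be empty
};

bool matchWindow(const std::vector<std::string>& code, std::size_t at,
                 const std::vector<std::string>& pattern)
{
    for (std::size_t k = 0; k < pattern.size(); ++k)
        if (code[at + k] != pattern[k]) return false;
    return true;
}

// Sliding-window scan: wherever a window matches a rule's search pattern,
// it is replaced by the rule's replacement.
void applyRules(std::vector<std::string>& code, const std::vector<PeepholeRule>& rules)
{
    for (const PeepholeRule& rule : rules)
        for (std::size_t i = 0; i + rule.search.size() <= code.size(); ++i)
            if (matchWindow(code, i, rule.search)) {
                code.erase(code.begin() + i, code.begin() + i + rule.search.size());
                code.insert(code.begin() + i, rule.replace.begin(), rule.replace.end());
            }
}
The two entries in the config file above would map directly onto two such PeepholeRule values: one rewriting mov reg.a, 0 into xor reg.a, reg.a, and one deleting mov reg.a, reg.a.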
When this is done, we can proceed to convert the KzIL2 instructions to Intel x86 opcodes and serialize them to disk. Job done.