The document concludes with examples of translations of a couple of simple Obr programs.
The code that supports the tree structure and encoding described in
this document can be found in RISCTree.scala and
Encoder.scala in the Obr compiler. The mapping from Obr
trees to RISC trees can be found in Transformation.scala.
All of the information that the translation task of the compiler provides about the target program is embodied in the target program tree. If a particular item of information cannot be accessed via this tree, then it cannot be obtained at all. Information is encoded in the "shape" of the tree and in values stored at the leaves.
This section defines the set of possible target program trees by defining all of the concepts and constructs of the target language.
A datum is a construct yielding an explicit value that can be
stored or used as an operand for other operations. The encoder
uses the attribute reg to associate a local machine
register with each Datum to provide storage for the Datum's
value; hence the transformation phase does not have to perform
register allocation.
An item is a construct that does not yield a value.
A RISCProg node represents a complete RISC program. This RISCProg node is the root of the target program tree, and never appears in any other position.
The following productions summarise the constructs of the RISC by giving the structure of the subtree for each construct.
RISCProg: RISCNode: Item+
Beq: Item: Datum Label
Bne: Item: Datum Label
Jmp: Item: Label
LabelDef: Item: Label
Read: Item: Address
Ret: Item
StW: Item: Address Datum
Write: Item: Datum
AddW: Datum: Datum Datum
Cond: Datum: Datum Datum Datum
CmpeqW: Datum: Datum Datum
CmpneW: Datum: Datum Datum
CmpgtW: Datum: Datum Datum
CmpltW: Datum: Datum Datum
DivW: Datum: Datum Datum
IntDatum: Datum: Int
LdW: Datum: Address
MulW: Datum: Datum Datum
NegW: Datum: Datum
Not: Datum: Datum
RemW: Datum: Datum Datum
SubW: Datum: Datum Datum
SequenceDatum: Datum: Item+ Datum
Local: Address: Int
Indexed: Address: Local Datum
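To make the shape of these productions concrete, here is a minimal Python sketch of a few of them as node classes. It is illustrative only: the real definitions live in RISCTree.scala, and the field names used here (offset, value, address, left, right, items) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical Python mirror of a subset of the productions.
# The real tree is defined in RISCTree.scala and may differ in detail.

@dataclass
class Local:            # Local: Address: Int
    offset: int

@dataclass
class IntDatum:         # IntDatum: Datum: Int
    value: int

@dataclass
class LdW:              # LdW: Datum: Address
    address: Local

@dataclass
class AddW:             # AddW: Datum: Datum Datum
    left: "Datum"
    right: "Datum"

Datum = Union[IntDatum, LdW, AddW]

@dataclass
class StW:              # StW: Item: Address Datum
    address: Local
    value: Datum

@dataclass
class Write:            # Write: Item: Datum
    value: Datum

Item = Union[StW, Write]

@dataclass
class RISCProg:         # RISCProg: RISCNode: Item+
    items: List[Item]

# "c := c + 1" with c stored at byte offset 4, followed by writing c.
prog = RISCProg([
    StW(Local(4), AddW(LdW(Local(4)), IntDatum(1))),
    Write(LdW(Local(4))),
])
print(prog.items[0])
```

All information about the program is carried either by the nesting of the nodes or by the integers at the leaves, which is the sense in which information is encoded in the shape of the tree.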
The "W" in some of the node names means that those operations operate on word-sized values (four bytes on the RISC) which in this compiler are used to implement both integer and Boolean values.
The following subsections describe the constructs of the table. Some of those constructs represent specific RISC instructions and others represent collections of instructions that involve related decisions about operand access.
RISC: RISC: Item+
The encoding of a RISC construct is very simple: we encode each of the items in its body, concatenate the resulting RISC code sequences, and then add prologue (initialisation) and epilogue (termination) code. Currently the prologue simply initialises register $27 to point to the memory segment that will be used to store the values of global variables and temporaries. The epilogue begins with a standard label so that the Ret construct (see below) can transfer control to it, and then terminates the program by executing a ret $0 instruction.
Beq: Item: Datum Label
Bne: Item: Datum Label
Jmp: Item: Label
A branch (Beq or Bne) is encoded as the encoding of its Datum component followed by a test and branch to the Label component. A Beq does a branch on equal to zero and a Bne does a branch on not equal to zero.
A Jmp does an unconditional branch to its Label component.
LabelDef
LabelDef: Item: Label
A LabelDef construct represents a definition of a label and is encoded by emitting that definition in the appropriate assembler syntax.
Read and Write
Read: Item: Address
Write: Item: Datum
These constructs are encoded using the corresponding terminal IO RISC instructions rd, wrd and wrl. In the case of Read the value read is stored in the location given by the Address component. In the case of Write the value written is that given by the Datum component which is encoded first.
Ret
Ret: Item
A Ret construct is encoded by an unconditional jump to a label at the end of the code comprising the program (i.e., to the beginning of the epilogue). This encoding ensures that a return from any part of the program will complete necessary processing before exiting the program.
StW
StW: Item: Address Datum
A StW construct is encoded by encoding the Datum component followed by an instruction to store the value of the Datum into the given address.
AddW, DivW, MulW, NegW, Not, RemW, SubW
AddW: Datum: Datum Datum
DivW: Datum: Datum Datum
MulW: Datum: Datum Datum
NegW: Datum: Datum
RemW: Datum: Datum Datum
SubW: Datum: Datum Datum
Most of the arithmetic operations are encoded by encoding their Datum component(s) followed by a single instruction that performs the appropriate operation. The NegW operation is implemented by subtracting the given operand from 0 (the value of register $0).
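The following Python sketch illustrates this encoding for a few arithmetic constructs. It is not the code in Encoder.scala. The tuple representation is hypothetical, and the register convention (result in the node's own register, right operand in the next one) is an assumption inferred from the assembly listings at the end of this document. Only mnemonics that appear in those listings are used, so DivW and RemW are left out.

```python
# Mnemonics taken from the example listings; DivW and RemW are omitted
# because their mnemonics do not appear there.
BINARY_OPS = {"AddW": "add", "SubW": "sub", "MulW": "mul"}

def encode(datum, reg=1):
    """Return the instruction list that leaves the datum's value in $reg.

    A datum is a hypothetical tuple such as ("IntDatum", 1),
    ("LdW", 4) for a load from Local(4), ("NegW", d) or ("AddW", l, r).
    """
    kind = datum[0]
    if kind == "IntDatum":
        return [f"movi ${reg}, $0, {datum[1]}"]
    if kind == "LdW":
        return [f"ldw ${reg}, $27, {datum[1]}"]
    if kind == "NegW":
        # Negation subtracts the operand from register $0, which holds 0.
        return encode(datum[1], reg) + [f"sub ${reg}, $0, ${reg}"]
    op = BINARY_OPS[kind]
    # Assumed convention: left operand in $reg, right operand in $reg + 1.
    return (encode(datum[1], reg)
            + encode(datum[2], reg + 1)
            + [f"{op} ${reg}, ${reg}, ${reg + 1}"])

for line in encode(("AddW", ("LdW", 4), ("IntDatum", 1))):
    print(line)
```

For AddW(LdW(Local(4)),IntDatum(1)) this yields the same three instructions that precede the store in the factorial listing.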
IntDatum: Datum: Int
An IntDatum construct is encoded as a move of the integer value into the location required by the Datum.
Cond and Not
Cond: Datum: Datum Datum Datum
Not: Datum: Datum
A Cond construct is encoded by encoding its first Datum component, followed by a sequence of instructions that evaluate the second Datum if the first Datum is non-zero, or evaluate the third Datum if the first Datum is zero. In either case the result value will be left in the location required by the Cond Datum itself. The Not construct is encoded as if it were converted to a corresponding Cond tree under the following translation:
Not (d) -> Cond (d, IntDatum(0), IntDatum(1))
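As a small illustration of this rewriting, the Python sketch below converts Not nodes to Cond nodes and evaluates the result. The class and field names, and the desugar_not and evaluate helpers, are hypothetical. The encoder itself only behaves as if this conversion had happened and need not build such a tree.

```python
from dataclasses import dataclass

@dataclass
class IntDatum:
    value: int

@dataclass
class Cond:
    test: object        # first Datum
    if_nonzero: object  # second Datum, chosen when test is non-zero
    if_zero: object     # third Datum, chosen when test is zero

@dataclass
class Not:
    operand: object

def desugar_not(node):
    """Rewrite every Not (d) as Cond (d, IntDatum(0), IntDatum(1))."""
    if isinstance(node, Not):
        return Cond(desugar_not(node.operand), IntDatum(0), IntDatum(1))
    if isinstance(node, Cond):
        return Cond(desugar_not(node.test),
                    desugar_not(node.if_nonzero),
                    desugar_not(node.if_zero))
    return node

def evaluate(node):
    """Evaluate a tree of IntDatum and Cond nodes to an integer."""
    if isinstance(node, IntDatum):
        return node.value
    if isinstance(node, Cond):
        chosen = node.if_nonzero if evaluate(node.test) != 0 else node.if_zero
        return evaluate(chosen)
    raise ValueError(f"cannot evaluate {node!r}")

print(desugar_not(Not(IntDatum(1))))
```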
CmpeqW, CmpneW, CmpgtW, CmpltW
CmpeqW: Datum: Datum Datum
CmpneW: Datum: Datum Datum
CmpgtW: Datum: Datum Datum
CmpltW: Datum: Datum Datum
The comparison constructs CmpeqW, CmpgtW, CmpltW, and CmpneW are encoded as the encoding of their operands, followed by a comparison instruction, followed by moves and conditional branches as appropriate to establish the result value of 0 or 1 in the register associated with the given comparison Datum.
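The instruction pattern can be sketched in Python as follows. The encode_compare function is hypothetical and covers only the instructions that follow the operand code. The pattern itself (compare, assume 1, branch over the move that sets 0) is copied from the assembly listings at the end of this document, and the assumption that the operands sit in consecutive registers is inferred from those listings.

```python
# Branch mnemonic used by each comparison construct, as seen in the listings.
BRANCH = {"CmpeqW": "beq", "CmpneW": "bne", "CmpgtW": "bgt", "CmpltW": "blt"}

def encode_compare(construct, reg, label):
    """Instructions that turn a comparison into 0 or 1 in $reg.

    Assumes the operands have already been encoded into $reg and
    $reg + 1, and that label is a fresh label name.
    """
    return [
        f"cmp ${reg}, ${reg + 1}",        # compare the two operands
        f"movi ${reg}, $0, 1",            # assume the comparison holds
        f"{BRANCH[construct]} {label}",   # if it does, keep the 1
        f"movi ${reg}, $0, 0",            # otherwise the result is 0
        f"{label}:",
    ]

for line in encode_compare("CmpgtW", 1, "label7"):
    print(line)
```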
LdW
LdW: Datum: Address
A LdW construct is encoded as a load of a word value from the location specified by its Address component into the register associated with this LdW.
Local, Indexed
Local: Address: Int
Indexed: Address: Local Datum
A Local address represents a word-sized storage location in the main block of memory that is accessible to an Obr program. Its Int child specifies the offset in bytes from the start of the memory block at which the word is located.
An Indexed address is an address that is computed as a byte offset from a local address. The offset is given by a computation expressed as a Datum.
When an address is used in another construct (i.e., an LdW or an StW) it is first encoded, then used as an operand in the load or store. Local addresses do not produce any code when they are encoded. Indexed addresses encode their Datum component.
SequenceDatum
A SequenceDatum construct of the form
SequenceDatum Item-1 ... Item-n Datum
is implemented as follows:
Code for Item-1
...
Code for Item-n
Code for Datum
Here Item-i is the ith element of the component Item list.
The results of the mapping process from source to target are reflected in the properties and structure of the target tree. This section describes how Obr source data and actions are mapped to target constructs.
Obr programs can manipulate only integer and Boolean basic values plus structured values that are arrays and records. Both parameters and variables can be declared. Therefore, a definition of the data mapping task must specify how values of these types are implemented on the RISC, and how storage is allocated for parameters and variables.
Because there is no possibility of recursion in Obr, it is possible to implement data storage for parameters and variables statically. The Obr "parameters" really aren't parameters at all: they are top-level variables that must be initialised by reading them from the standard input before executing the body of the Obr program. Thus their storage is implemented just like that of variables.
Obr constants do not need any storage since the compiler knows their value and can construct an IntDatum node that can be used directly.
An Obr integer is implemented by a RISC word (32 bits). For convenience, Boolean values are also represented by RISC words. True is represented by 1, and false is represented by 0.
Storage for all of the variables declared in an Obr program is allocated in a single area of RISC memory. During execution, register $27 contains the address of the beginning of the memory area. Thus, any variable's location can be specified by the sum of a non-negative integer and the contents of register $27. Since each variable occupies four bytes of memory, the offsets from the content of register $27 are all multiples of 4: The topmost variable is in location $27, the next variable is in location $27 + 4, and so on.
Arrays and records are allocated as contiguous memory as if the array elements or fields were declared as individual integer variables. (Recall that array elements and fields must be integers.) Therefore an array of N elements or a record with N fields is allocated as N contiguous words of memory.
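The allocation rule can be sketched in a few lines of Python. The allocate function and its input format (a list of name and size-in-words pairs, in declaration order) are hypothetical. Only the layout rule itself, one four-byte word per integer with arrays and records occupying N contiguous words, comes from the text above.

```python
WORD_BYTES = 4  # each integer or Boolean occupies one RISC word

def allocate(declarations):
    """Assign each declared name a byte offset from register $27.

    declarations is a list of (name, size_in_words) pairs in declaration
    order; a simple variable has size 1, an array or record has size N.
    Returns the offset table and the total number of bytes used.
    """
    offsets = {}
    next_offset = 0
    for name, words in declarations:
        offsets[name] = next_offset
        next_offset += words * WORD_BYTES
    return offsets, next_offset

# The factorial program: parameter v, then variables c and fact.
offsets, total = allocate([("v", 1), ("c", 1), ("fact", 1)])
print(offsets, total)
```

For the factorial program this gives v, c and fact the offsets that appear as Local(0), Local(4) and Local(8) in its target tree.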
IntVar and ArrayVar that represent declarations). This section describes how the other constructs are translated into RISC target tree constructs.
AssignStmt constructs are translated into a StW construct whose left child is the address of the variable, array element or field being assigned, and whose right child is the translation of the expression on the right-hand side of the assignment.
A BoolExp is translated into an IntDatum where 0 is used for FALSE and 1 for TRUE.
AndExp and OrExp translate into uses of the Cond target construct in order to achieve short-circuit evaluation. They are translated as follows:
AndExp (e1, e2) -> Cond (t1, t2, 0) OrExp (e1, e2) -> Cond (t1, 1, t2)
In both of these translations t1 and t2 are the translations of e1 and e2, respectively.
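To see why these translations give short-circuit evaluation, the Python sketch below models Cond as a function over thunks, so that only the selected branch is ever evaluated. The cond, and_exp, or_exp and traced helpers are hypothetical and stand in for the generated code rather than for any part of the compiler.

```python
def cond(test, if_nonzero, if_zero):
    """Model of Cond: evaluate test, then exactly one of the branches."""
    return if_nonzero() if test() != 0 else if_zero()

def and_exp(t1, t2):
    # AndExp (e1, e2) -> Cond (t1, t2, 0)
    return cond(t1, t2, lambda: 0)

def or_exp(t1, t2):
    # OrExp (e1, e2) -> Cond (t1, 1, t2)
    return cond(t1, lambda: 1, t2)

log = []

def traced(name, value):
    """Build a thunk that records its evaluation in log."""
    def thunk():
        log.append(name)
        return value
    return thunk

result = and_exp(traced("e1", 0), traced("e2", 1))
print(result, log)   # e2 is never evaluated because e1 is false
```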
NotExp translates into a boolean complement operation using the Not target construct.
The comparison operators EqualExp, NotEqualExp, GreaterExp, and LessExp are translated to the CmpeqW, CmpneW, CmpgtW and CmpltW constructs, respectively.
ExitStmt
An ExitStmt is implemented by a jump to the terminating label of the closest containing LoopStmt. See also the description of the LoopStmt construct below.
FieldExp
A FieldExp translates to a LdW from the address of the given record field.
ForStmt
A ForStmt construct is implemented as follows:
ForStmt (id, e1, e2, s) ->
StW (idmem, t1),
StW (mem, t2),
Bne (CmpgtW (LdW (idmem), LdW (mem)), L2),
Jmp (L1),
LabelDef (L3),
StW (idmem, AddW (LdW (idmem), IntDatum (1))),
LabelDef (L1),
i
Bne (CmpltW (LdW (idmem), LdW (mem)), L3),
LabelDef (L2)
Here, i is the list of Item nodes that is the translation of s, t1 is the translation of e1, and t2 is the translation of e2. idmem is the storage location being used for the variable id, and mem is a new integer memory location not used elsewhere.
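Note that this scheme avoids a problem that would otherwise arise when the upper bound expression e2 evaluates to the largest possible integer: id is incremented only after the CmpltW test has shown that it is still less than the bound, so the increment can never overflow.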
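The control flow of the ForStmt scheme can be traced with a short Python simulation. The run_for function is a hypothetical model, not compiler code. It follows the same order of tests and the increment as the translation above, and uses an assertion to check that the loop variable never exceeds the largest 32-bit integer.

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit RISC word

def run_for(start, end, body):
    """Simulate the ForStmt translation; returns the final value of id."""
    idv = start                 # StW (idmem, t1)
    mem = end                   # StW (mem, t2)
    if idv > mem:               # Bne (CmpgtW (...), L2): skip the loop
        return idv
    while True:
        body(idv)               # LabelDef (L1), then the body items i
        if not (idv < mem):     # Bne (CmpltW (...), L3) not taken
            return idv          # fall through to LabelDef (L2)
        idv += 1                # LabelDef (L3): increment id
        assert idv <= INT_MAX   # the increment can never overflow

visited = []
final = run_for(INT_MAX - 2, INT_MAX, visited.append)
print(visited, final)
```

A scheme that incremented first and tested afterwards would try to compute INT_MAX + 1 on the last iteration of this example.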
IdnExp
An IdnExp is translated into either an IntDatum containing the integer value of the identifier (if it denotes a constant), or a LdW from the location in which the variable is stored.
IfStmt
An IfStmt construct is implemented as follows:
IfStmt (e, s1, s2) ->
Beq (t, L1)
i1
Jmp (L2)
LabelDef (L1)
i2
LabelDef (L2)
Here, i1 and i2 are the lists of Item nodes that are the translations of s1 and s2, respectively, and t is the translation of e.
IndexExp
An IndexExp translates to a LdW from the address of the given array element. In general, the index is not constant so it must be calculated as part of the address computation.
An IntExp is translated into an IntDatum whose value is the Int component of the IntExp.
The arithmetic target constructs are used to implement the arithmetic operators (MinusExp, NegExp, ModExp, PlusExp, SlashExp and StarExp) in the obvious way. For example, PlusExp is represented by AddW, ModExp by RemW, and NegExp by NegW.
IntParam
Parameter declarations are always represented by IntParam constructs and are translated into a Read construct whose child is the address of the storage allocated to the parameter.
LoopStmt
A LoopStmt construct is implemented as follows:
LoopStmt (s) ->
LabelDef (L1)
i
Jmp (L1)
LabelDef (L2)
Here, i is the list of Item nodes that is the translation of s. L2 is a label that can be used as the destination of jumps implementing ExitStmt constructs within the loop.
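One way to connect each ExitStmt to the right L2 is to carry a stack of exit labels during translation. The Python sketch below shows the idea. The translate function, the tuple representation of statements and the label naming are all hypothetical, and Transformation.scala may track the enclosing loop differently.

```python
import itertools

def new_labels():
    """Hypothetical fresh-label source producing L1, L2, L3, ..."""
    return (f"L{n}" for n in itertools.count(1))

def translate(stmt, labels, exit_labels=()):
    """Translate a tiny statement language into a list of item strings.

    stmt is ("loop", [stmts]), ("exit",) or ("other", text).
    exit_labels holds the L2 label of every enclosing loop, innermost last.
    """
    kind = stmt[0]
    if kind == "loop":
        l1, l2 = next(labels), next(labels)
        body = []
        for s in stmt[1]:
            body += translate(s, labels, exit_labels + (l2,))
        return [f"LabelDef ({l1})", *body, f"Jmp ({l1})", f"LabelDef ({l2})"]
    if kind == "exit":
        # Jump to the terminating label of the closest containing loop.
        return [f"Jmp ({exit_labels[-1]})"]
    return [stmt[1]]

items = translate(("loop", [("other", "i"), ("exit",)]), new_labels())
print(items)
```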
ObrInt
The ObrInt construct is translated into a RISC construct whose children are the Item nodes comprising the translation of its Declaration and Statement components. The RISC node is also given an Int component to record the maximum size of storage used by the program.
ReturnStmt
The ReturnStmt construct is implemented by a Write construct whose child is the translation of the component Expression to be returned, followed by a Ret construct.
WhileStmt
A WhileStmt construct is implemented as follows:
WhileStmt (e, s) ->
Jmp (L1)
LabelDef (L2)
i
LabelDef (L1)
Bne (t, L2)
Here, i is the list of Item nodes that is the translation of s, and t is the translation of e.
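The WhileStmt scheme, together with the IfStmt scheme described earlier, can be sketched as Python functions that build lists of item strings. The Labels class and both translate functions are hypothetical. Starting the label counter at 2 and allocating a loop's labels before translating its body are assumptions chosen so that the output matches the numbering in the GCD target tree shown later in this document.

```python
import itertools

class Labels:
    """Hypothetical source of fresh labels."""
    def __init__(self, first=1):
        self._counter = itertools.count(first)

    def fresh(self):
        return f"Label({next(self._counter)})"

def translate_if(t, i1, i2, labels):
    """IfStmt (e, s1, s2), with t, i1, i2 already translated."""
    l1, l2 = labels.fresh(), labels.fresh()
    return [f"Beq({t},{l1})", *i1, f"Jmp({l2})",
            f"LabelDef({l1})", *i2, f"LabelDef({l2})"]

def translate_while(t, body, labels):
    """WhileStmt (e, s); body is a function that translates s."""
    l1, l2 = labels.fresh(), labels.fresh()
    items = body(labels)  # translated after this loop's labels are taken
    return [f"Jmp({l1})", f"LabelDef({l2})", *items,
            f"LabelDef({l1})", f"Bne({t},{l2})"]

# The loop of the GCD program, with x at Local(0) and y at Local(4).
x, y = "LdW(Local(0))", "LdW(Local(4))"

def gcd_body(labels):
    return translate_if(f"CmpgtW({x},{y})",
                        [f"StW(Local(0),SubW({x},{y}))"],
                        [f"StW(Local(4),SubW({y},{x}))"],
                        labels)

gcd_loop = translate_while(f"CmpneW({x},{y})", gcd_body, Labels(first=2))
print("\n".join(gcd_loop))
```

Placing the test after the body means that each iteration executes a single conditional branch, at the cost of one extra Jmp on entry to the loop.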
The default behaviour of the Obr compiler is to execute all syntactic, semantic and code generation phases, reporting any errors that occur but doing nothing else. To alter this behaviour we provide three command line flags:
-t spill the target tree constructed by the translation phase to the standard output.
-a spill the output of the encoding phase to the standard output as RISC assembly language code.
-e assemble and execute the generated RISC code in the Obr compiler's built-in RISC machine emulator.
This section shows the complete RISC target trees and assembly code that would be produced for the factorial and GCD Obr programs.
Consider the Obr version of Euclid's algorithm for calculating the greatest common divisor of two numbers.
PROGRAM GCD (x : INTEGER; y : INTEGER) : INTEGER;
BEGIN
    WHILE x # y DO
        IF x > y
            THEN x := x - y;
            ELSE y := y - x;
        END
    END
    RETURN x;
END GCD.
From this code, the Obr compiler generates the following target tree:
RISCProg(
List(
StW(Local(0),Read()),
StW(Local(4),Read()),
Jmp(Label(2)),
LabelDef(Label(3)),
Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),Label(4)),
StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4)))),
Jmp(Label(5)),
LabelDef(Label(4)),
StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0)))),
LabelDef(Label(5)),
LabelDef(Label(2)),
Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),Label(3)),
Write(LdW(Local(0))),
Ret()))
From this target tree, the encoder produces the following RISC assembly code. Note that the encoder includes the target constructs as comments (starting with exclamation marks) to make the correspondence clearer.
! Prologue
movi $27, $0, 0
! StW(Local(0),Read())
rd $1
stw $1, $27, 0
! StW(Local(4),Read())
rd $1
stw $1, $27, 4
! Jmp(Label(2))
br label2
! LabelDef(Label(3))
label3:
! Beq(CmpgtW(LdW(Local(0)),LdW(Local(4))),Label(4))
ldw $1, $27, 0
ldw $2, $27, 4
cmp $1, $2
movi $1, $0, 1
bgt label7
movi $1, $0, 0
label7:
cmpi $1, 0
beq label4
! StW(Local(0),SubW(LdW(Local(0)),LdW(Local(4))))
ldw $1, $27, 0
ldw $2, $27, 4
sub $1, $1, $2
stw $1, $27, 0
! Jmp(Label(5))
br label5
! LabelDef(Label(4))
label4:
! StW(Local(4),SubW(LdW(Local(4)),LdW(Local(0))))
ldw $1, $27, 4
ldw $2, $27, 0
sub $1, $1, $2
stw $1, $27, 4
! LabelDef(Label(5))
label5:
! LabelDef(Label(2))
label2:
! Bne(CmpneW(LdW(Local(0)),LdW(Local(4))),Label(3))
ldw $1, $27, 0
ldw $2, $27, 4
cmp $1, $2
movi $1, $0, 1
bne label8
movi $1, $0, 0
label8:
cmpi $1, 0
bne label3
! Write(LdW(Local(0)))
ldw $1, $27, 0
wrd $1
wrl
! Ret()
br label6
! Epilogue
label6:
ret $0
Here is the same information for the Obr factorial program.
PROGRAM Factorial (v : INTEGER) : INTEGER;
    CONST
        limit = 7;
    VAR
        c : INTEGER;
        fact : INTEGER;
BEGIN
    IF (v < 0) OR (v > limit) THEN
        RETURN -1;
    ELSE
        c := 0;
        fact := 1;
        WHILE c < v DO
            c := c + 1;
            fact := fact * c;
        END
        RETURN fact;
    END
END Factorial.
From this code, the Obr compiler generates the following target tree:
RISCProg(
List(
StW(Local(0),Read()),
Beq(
Cond(
CmpltW(LdW(Local(0)),IntDatum(0)),
IntDatum(1),
CmpgtW(LdW(Local(0)),IntDatum(7))),
Label(2)),
Write(NegW(IntDatum(1))),
Ret(),
Jmp(Label(3)),
LabelDef(Label(2)),
StW(Local(4),IntDatum(0)),
StW(Local(8),IntDatum(1)),
Jmp(Label(4)),
LabelDef(Label(5)),
StW(Local(4),AddW(LdW(Local(4)),IntDatum(1))),
StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4)))),
LabelDef(Label(4)),
Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),Label(5)),
Write(LdW(Local(8))),
Ret(),
LabelDef(Label(3))))
From this target tree, the encoder produces the following RISC assembly code:
! Prologue
movi $27, $0, 0
! StW(Local(0),Read())
rd $1
stw $1, $27, 0
! Beq(Cond(CmpltW(LdW(Local(0)),IntDatum(0)),IntDatum(1),CmpgtW(LdW(Local(0)),IntDatum(7))),Label(2))
ldw $1, $27, 0
movi $2, $0, 0
cmp $1, $2
movi $1, $0, 1
blt label9
movi $1, $0, 0
label9:
cmpi $1, 0
beq label7
movi $1, $0, 1
mov $1, $0, $1
br label8
label7:
ldw $1, $27, 0
movi $2, $0, 7
cmp $1, $2
movi $1, $0, 1
bgt label10
movi $1, $0, 0
label10:
mov $1, $0, $1
label8:
cmpi $1, 0
beq label2
! Write(NegW(IntDatum(1)))
movi $1, $0, 1
sub $1, $0, $1
wrd $1
wrl
! Ret()
br label6
! Jmp(Label(3))
br label3
! LabelDef(Label(2))
label2:
! StW(Local(4),IntDatum(0))
movi $1, $0, 0
stw $1, $27, 4
! StW(Local(8),IntDatum(1))
movi $1, $0, 1
stw $1, $27, 8
! Jmp(Label(4))
br label4
! LabelDef(Label(5))
label5:
! StW(Local(4),AddW(LdW(Local(4)),IntDatum(1)))
ldw $1, $27, 4
movi $2, $0, 1
add $1, $1, $2
stw $1, $27, 4
! StW(Local(8),MulW(LdW(Local(8)),LdW(Local(4))))
ldw $1, $27, 8
ldw $2, $27, 4
mul $1, $1, $2
stw $1, $27, 8
! LabelDef(Label(4))
label4:
! Bne(CmpltW(LdW(Local(4)),LdW(Local(0))),Label(5))
ldw $1, $27, 4
ldw $2, $27, 0
cmp $1, $2
movi $1, $0, 1
blt label11
movi $1, $0, 0
label11:
cmpi $1, 0
bne label5
! Write(LdW(Local(8)))
ldw $1, $27, 8
wrd $1
wrl
! Ret()
br label6
! LabelDef(Label(3))
label3:
! Epilogue
label6:
ret $0