This assignment concerned the design, development, testing, and documentation of a syntax analyser for a simple MiniJava version of the Java programming language. We had to implement a lexical analyser, parser, and tree builder for the language, building on the skeleton code. Also, we had to design test cases to demonstrate that our implementation is working correctly.
This report presents the design and implementation of my syntax analyser and describes the design of the tests I have used to check the correctness of the implementation. Parts One and Two are discussed separately. The full code for this sample solution is on the COMP332 iLearn site.
I first describe the design and implementation of the parsing code, without consideration of tree building. The second sub-section adds tree building to the parsing code.
My parser design follows the method discussed in lectures. Each context-free grammar production translates into a Scala parser value whose definition mimics the production. Operators in the grammar translate into their equivalents from the Scala parsing library.
I use lazy values to define my parsers so that I don't have to worry about the order in which I define them. An alternative would be to use non-lazy vals and be careful about the order, but this harder to write.
The translation from the grammar to the parsers is mostly straight-forward. For example, the productions
Arguments : Type Identifier ("," Type Identifier)*
| /* empty */.
Statement : "{" Statement* "}"
| "if" "(" Expression ")" Statement "else" Statement
| "while" "(" Expression ")" Statement
| "System.out.println" "(" Expression ")" ";"
| Identifier "=" Expression ";"
| Identifier "[" Expression "]" "=" Expression ";".
become
lazy val arguments =
repsep (argument, ",")
lazy val argument =
tipe ~ identifier
lazy val statement =
"{" ~> (statement*) <~ "}" |
"if" ~> ("(" ~> expression <~ ")") ~ statement ~ ("else" ~> statement) |
"while" ~> ("(" ~> expression <~ ")") ~ statement |
"System.out.println" ~> ("(" ~> expression <~ ")") <~ ";" |
identifier ~ ("=" ~> expression) <~ ";" |
identifier ~ ("[" ~> expression <~ "]") ~ ("=" ~> expression) <~ ";"
The repsep method used in the arguments
definition builds a parser where input parsed by the first parser is
separated by instances of input parsed by the second parser, allowing
no repetitions at all. Thus, this is equivalent to the two alternatives
for this symbol given in the grammar.
Appropriate bracketing and the <~ and ~>
operators are used in the productions to make sure that we throw away
the values of the literal strings and just retain the values of the
components that need to end up in the tree. A more complex case of
this occurs in the parsers for ClassDeclaration which
we solve by bracketing each pair of components.
lazy val classDeclaration =
("class" ~> identifier) ~ (("extends" ~> identifier)?) ~
("{" ~> (varDeclaration*)) ~ ((methodDeclaration*) <~ "}")
Similar translations implement the other productions.
The expression parsers follow the approach described in lectures
to describe the precedence and associativity of operators. We
were given the factor parser. I added
expression0 for multiplication,
expression1 for addition and subtraction,
expression2 for less-than and
expression for Boolean AND, in decreasing order
of precedence.
The parsers for the terminal symbols of the grammar (which effectively implement the lexical analysis) follow their natural language descriptions, since the assignment did not give productions for them. No conversion is needed since the string value returned by these parsers is what we want to return to the expression parser.
lazy val integer : PackratParser[String] =
regex ("[0-9]+".r)
lazy val identifier : PackratParser[String] =
regex ("[a-zA-Z][a-zA-Z0-9_]*".r)
The skeleton provided the tree structure that we were to produce. Adding tree building to my parsing code followed the method discussed in lectures. Each parser was augmented with a tree construction action to build the appropriate tree fragment from the components recognised by its component parsers.
In almost all cases, the tree building action simply constructs
a node of the appropriate case class using the ^^
operator, since the components correspond directly to the case
class constructor arguments.
For example, the parsers from the previous section become the following when types and tree building actions are added.
lazy val arguments : PackratParser[List[Argument]] =
repsep (argument, ",")
lazy val argument : PackratParser[Argument] =
tipe ~ identifier ^^ {
case t ~ n =>
Argument (t, n)
}
lazy val statement : PackratParser[Statement] =
"{" ~> (statement*) <~ "}" ^^ {
case ss =>
Block (ss)
} |
"if" ~> ("(" ~> expression <~ ")") ~ statement ~ ("else" ~> statement) ^^ {
case e ~ s1 ~ s2 =>
If (e, s1, s2)
} |
"while" ~> ("(" ~> expression <~ ")") ~ statement ^^ {
case e ~ s =>
While (e, s)
} |
"System.out.println" ~> ("(" ~> expression <~ ")") <~ ";" ^^ {
case e =>
Println (e)
} |
identifier ~ ("=" ~> expression) <~ ";" ^^ {
case n ~ e =>
VarAssign (n, e)
} |
identifier ~ ("[" ~> expression <~ "]") ~ ("=" ~> expression) <~ ";" ^^ {
case n ~ i ~ e =>
ArrAssign (n, i, e)
}
Note regarding marking: I am aware that proper testing for this assignment was quite a bit of work. We did not expect the level of testing described below for a good mark, but it is worthwhile to bear in mind that without this kind of testing, any confidence you have in the correct operation of your program is probably mis-placed.
I used all of the MiniJava example programs from the MiniJava web site as general tests. My parser correctly handles all of these files, which gives me some confidence that things are ok, particularly combniations of many constructs and handling of whitespace and comments. These tests are somewhat unsatisfactory, however, since they cannot be run and checked automatically by sbt. Hence, we want to be as complete as practical in the sbt tests.
I designed positive tests (correct input) based on the possible alternatives in the grammar, to make sure that I had all of the bases covered for correct programs. I tried to make each test just test one thing. For example, there is no need to test all sorts of expressions as a while statement condition, since there will be tests of expressions elsewhere.
Consider the terminal symbols, starting with identifiers. The skeleton already had a test for a normal identifier containing multiple letters. Missing were a minimal test (an identifier with just one letter) and one that tested inclusion of digits and underscores. There was already a negative test for parsing an integer as an identifier. I added two more to make check that sequences that start with a digit or underscore, but otherwise are ok as identifiers, are not allowed. Similar tests were written for integer terminals, although there are less possibilites to test since integers have a simpler structure.
For non-terminal constructs, I used the following general procedure for designing test cases. Constructs that involve no alternatives (including repetition) just need one test case. Where alternatives are present, I made sure to have at least one test case for each alternative. Repetitions are a special case of alternation. For each repetition, I designed tests to deal with the lower boundary conditions (empty or single case). Upper boundary conditions of repetitions were infinite, so they cannot be tested, but I made sure to have tests of multiple repetitions for each situation. For possibly empty repetitions I also have a test case for a single element, since a common error to make is to forget the repetition completely.
As an example, consider the following production.
ClassDeclaration : "class" Identifier ("extends" Identifier)? {"
VarDeclaration*
MethodDeclaration*
"}".
We need two tests of the class header, one that has an
extends clause and one that doesn't. To deal
with the repetitions, we need to account for nothing in the
body, one of each repetition or more than one of each repetition.
Considering all of the combinations leads to quite a few
tests, but they are routine.
Expression testing requires a little more work, since the precedence and associativity rules for binary operators are not encoded in the grammar that we were given. Hence, I included some extra tests to make sure that the operators were all associative in the correct way and that expressions with multiple operators correctly took into account the relative precedence of those operators. Also, I checked that parentheses correctly override the associativity and precedence rules. For the less than operator I have a negative test to make sure that it is not associative.
I didn't test all of the combinations of binary operators
for precedence ordering, but I tested each step in the precedence
hierarchy. By transitivity, the other cases should be covered
as well. For example,
I test that * has higher precedence than +
and that + has higher precedence than <.
From these two tests we can be reasonable sure that *
has higher precedence than <.
Testing negative conditions (erroneous input) is harder since there are a lot more possibilities. I limited my testing to the cases that were likely possibilities for errors. For example, where a repetition called for one or more of something, I tested to make sure that the wrong number is not allowed. For example, a main class has to have exactly one statment, so I have negative tests for zero and more than one. Similarly, a method body has to have a return statement so I have a negative test to check that an empty body is not allowed. In general, the aim of negative tests is to spot places where it is likely that an error has been made.
Overall, as usual we can't be sure that problems are still not present since we can't test every possible input combination. However, the combinations I have tried work, plus the fact that the more complex actual MiniJava progams are handled correctly, means that I have a high level of confidence that this program is working properly.
The first task for Part Two was to research Java interfaces and to come up with a version that makes sense in the context of MiniJava.
The main idea of Java interfaces is to provide a way to declare the signatures of methods without their bodies. Java classes can declare that they implement one or more interfaces, in which case they need to provide implementations for the the methods in those interfaces.
The signature of a method comprises its name, its argument
names and types, and its return type. A class declares that
it implements an interface via an implements
clause in its declaration. For example, the following is a
declaration of an interface Colourable and a
use of it by a Shape class, assuming the
existence of a Colour class or interface.
interface Colourable {
void setColour (Colour colour);
Colour getColour ();
}
class Shape implements Colourable {
void setColour (Colour colour) {
... implementation ...
}
Colour getColour () {
... implementation ...
}
}
Java interfaces have more aspects than this (e.g., a class can implement more than one interface), but I decided on this level of support since it fits with the rest of MiniJava.
I then considered how to add interfaces to the syntax of MiniJava. By analogy to the existing syntax of classes and methods, I added these productions to the grammar.
InterfaceDeclaration :
"interface" Identifier "{"
SignatureDeclaration*
"}".
SignatureDeclaration :
"public" Type Identifier "(" Arguments ")" ";".
To allow interface declarations to appear in a program, I modified the program production as follows.
Program : MainClass ClassOrInterfaceDeclaration*.
ClassOrInterfaceDeclaration :
ClassDeclaration
| InterfaceDeclaration.
Finally, I modified class declarations to allow an optional implements clause after the optional extends clause.
ClassDeclaration :
"class" Identifier ("extends" Identifier)? ("implements" Identifier)? {"
VarDeclaration*
MethodDeclaration*
"}".
To accomodate interfaces in the abstract syntax tree, I added a new
tree node type for interfaces. It and Class share
a new common superclass ClassOrInterface and a program is now
a list of classes or interfaces. Also, Class got a new
field to store its optional implements information, analogous to the
existing field that stores the optional superclass. A new node type
for signatures was added that is the same as for methods with the body
components and return expression removed.
case class Program (main : MainClass,
classesOrInterfaces : List[ClassOrInterface]) extends SourceNode
sealed abstract class ClassOrInterface extends SourceNode
case class Class (name : String, superclass : Option[String],
implements : Option[String],
vars : List[Var], methods : List[Method]) extends ClassOrInterface
case class Interface (name : String, signatures : List[Signature]) extends ClassOrInterface
case class Signature (tipe : Type, name : String, args : List[Argument]) extends SourceNode
Modifying the parser and tree construction to accommodate these new constructs or changes to old ones was straight-forward and proceeded as described in Part One. No new techniques were needed.
Testing was undertaken as for the main part of the assignment. A set of tests similar to those for classes were added for interfaces, and tests for signature declarations were added similar to those already present for method declarations. Also, tests of the optional implements clause were added. A test of mixed classes and interfaces in a program was added.