Motivation #
To determine if a student had successfully completed ANY of the numerous prerequisite combinations. The prerequisites for courses at Ontario Tech are structured like this;
To take CSCI 3055U
, the expression (CSCI 1060U or CSCI 2030U) and CSCI 2110U
must evaluate to true.
An easy way to do this was to check if any of the possible combinations were true, this lends itself to the idea of Disjunctive Normal Form.
eg: CSCI 3055U
’s prerequisites can be met one of two ways:
CSCI 1060U
andCSCI 2110U
have been takenCSCI 2030U
andCSCI 2110U
have been taken
of course CSCI 1060U, CSCI 2030U and CSCI2110U could all be taken but that is covered by the aforementioned cases.
Disjunctive Normal Form #
DNF is a disjunction of conjunctions (ORs of ANDs), if any of the conjunctions evaluate to true, then the original statement is also true.
the following are examples of DNF
(a and b and c) or (d and f)
(a and b) or (c)
(a and b)
(a)
Our earlier example (a or b) and c
turns into (a and c) or (a and b)
, matching our needs for determining if a prerequisite is met.
Conversion #
To automate this conversion I utilized ANTLR.
First we create the lexer to convert the incoming string into tokens:
grammar DNFConvertor;
// for simplicity courses will be a single letter
COURSE: 'A'..'Z' | 'a'..'z';
AND: ('AND' | 'and' | 'And');
OR: ('OR' | 'or' | 'Or');
LPAREN: '(';
RPAREN: ')';
WHITESPACE: [ \t\n\r]+ -> skip;
when we perform lexical analysis on our sample input (a or b) and c
we get:
Token Type: LPAREN, Text: '(', Line: 1, Position: 0
Token Type: COURSE, Text: 'a', Line: 1, Position: 1
Token Type: OR , Text: 'or', Line: 1, Position: 3
Token Type: COURSE, Text: 'b', Line: 1, Position: 6
Token Type: RPAREN, Text: ')', Line: 1, Position: 7
Token Type: AND , Text: 'and', Line: 1, Position: 9
Token Type: COURSE, Text: 'c', Line: 1, Position: 13
Building the Abstract Syntax Tree #
We want the AST to have nodes containing operations (or, and) to receive two expressions in DNF, combine them, and then output one DNF expression. With that in mind we come up with up the following grammar rules.
expression
: LPAREN expression RPAREN
| expression AND expression
| expression OR expression
| COURSE
;
lexing and parsing (a or b) and c
produces the following AST.
Adding Semantic Actions #
We now want to execute some chunks of code when the parser comes across certain rules or productions, we can do this by ‘binding’ code to grammar rules, this binding
is also known as semantic actions in ANTLR. Internally we will be representing DNF as a list[list[str]]
where the str
is a course and the inner list
is one of the conjunctions.
(a or b) and c
→ ANTLR Magic? → [["a", "c"], ["b", "c"]]
Production 1: LPAREN expression RPAREN
- right away we can return whatever the value of the expression is as nothing is being modified here
Production 2: expression_1 AND expression_2
- if any two of the conjunctions contained in the first and second expressions are true, then this will evaluate to true as well.
- this means we want to create all possible combinations of the conjunctions
[[*conjunction_1, *conjunction_2] for conjunction_1 in expression_1 for conjunciton2 in expression_2]
Production 3: expression_1 OR expression_2
- when we encounter an OR we can simply merge the expressions
expression_1 + expression_2
Production 4: COURSE
- we can simply return the variable, but as a lists of lists, as the variable alone is considered a conjunction
Adding these semantic actions to our grammar we end up with:
grammar DNFConvertor;
expression returns [list dnf]
: LPAREN e=expression RPAREN {$dnf = $e.dnf}
| e1=expression AND e2=expression {$dnf = [[*c1, *c2] for c1 in $e1.dnf for c2 in $e2.dnf]}
| e1=expression OR e2=expression {$dnf = $e1.dnf + $e2.dnf}
| c=COURSE {$dnf = [[$c.text]]}
;
// for simplicity courses will be a single letter
COURSE: 'A'..'Z' | 'a'..'z';
AND: ('AND' | 'and' | 'And');
OR: ('OR' | 'or' | 'Or');
LPAREN: '(';
RPAREN: ')';
WHITESPACE: [ \t\n\r]+ -> skip;
We can then preform the conversion with the following code snippet:
def expr_to_dnf(expression: str) -> list[list[str]]:
stream = InputStream(expression)
lexer = DNFConvertorLexer(stream)
stream = CommonTokenStream(lexer)
parser = DNFConvertorParser(stream)
dnf: list[list[str]] = parser.expression().dnf
if parser.getNumberOfSyntaxErrors() > 0:
raise Exception(f"Syntax errors parsing {expression}")
return dnf
# output: [['a', 'c'], ['b', 'c']]
print(expr_to_dnf("(a or b) and c)")
# output: [['a', 'b'], ['c', 'd']]
print(expr_to_dnf("(a and b) or (c and d)"))
# output: [['e'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd']]
print(expr_to_dnf("e or ((a or b) and (c or d))"))