Parsing Boolean Expressions into DNF with ANTLR

Parsing Boolean Expressions into DNF with ANTLR

Motivation #

To determine if a student had successfully completed ANY of the numerous prerequisite combinations. The prerequisites for courses at Ontario Tech are structured like this;

To take CSCI 3055U, the expression (CSCI 1060U or CSCI 2030U) and CSCI 2110U must evaluate to true.

An easy way to do this was to check if any of the possible combinations were true, this lends itself to the idea of Disjunctive Normal Form.

eg: CSCI 3055U’s prerequisites can be met one of two ways:

  • CSCI 1060U and CSCI 2110U have been taken
  • CSCI 2030U and CSCI 2110U have been taken

of course CSCI 1060U, CSCI 2030U and CSCI2110U could all be taken but that is covered by the aforementioned cases.

Disjunctive Normal Form #

DNF is a disjunction of conjunctions (ORs of ANDs), if any of the conjunctions evaluate to true, then the original statement is also true.

the following are examples of DNF

  • (a and b and c) or (d and f)
  • (a and b) or (c)
  • (a and b)
  • (a)

Our earlier example (a or b) and c turns into (a and c) or (a and b), matching our needs for determining if a prerequisite is met.

Conversion #

To automate this conversion I utilized ANTLR.

First we create the lexer to convert the incoming string into tokens:

grammar DNFConvertor;

// for simplicity courses will be a single letter
COURSE: 'A'..'Z' | 'a'..'z';
AND: ('AND' | 'and' | 'And');
OR: ('OR' | 'or' | 'Or');
LPAREN: '(';
RPAREN: ')';
WHITESPACE: [ \t\n\r]+ -> skip;

when we perform lexical analysis on our sample input (a or b) and c we get:

Token Type: LPAREN, Text: '(',   Line: 1, Position: 0
Token Type: COURSE, Text: 'a',   Line: 1, Position: 1
Token Type: OR    , Text: 'or',  Line: 1, Position: 3
Token Type: COURSE, Text: 'b',   Line: 1, Position: 6
Token Type: RPAREN, Text: ')',   Line: 1, Position: 7
Token Type: AND   , Text: 'and', Line: 1, Position: 9
Token Type: COURSE, Text: 'c',   Line: 1, Position: 13

Building the Abstract Syntax Tree #

We want the AST to have nodes containing operations (or, and) to receive two expressions in DNF, combine them, and then output one DNF expression. With that in mind we come up with up the following grammar rules.

expression
    : LPAREN expression RPAREN
    | expression AND expression
    | expression OR expression
    | COURSE
    ;

lexing and parsing (a or b) and c produces the following AST.

alt text

Adding Semantic Actions #

We now want to execute some chunks of code when the parser comes across certain rules or productions, we can do this by ‘binding’ code to grammar rules, this binding is also known as semantic actions in ANTLR. Internally we will be representing DNF as a list[list[str]] where the str is a course and the inner list is one of the conjunctions.

(a or b) and c → ANTLR Magic? → [["a", "c"], ["b", "c"]]

Production 1: LPAREN expression RPAREN

  • right away we can return whatever the value of the expression is as nothing is being modified here

Production 2: expression_1 AND expression_2

  • if any two of the conjunctions contained in the first and second expressions are true, then this will evaluate to true as well.
  • this means we want to create all possible combinations of the conjunctions
  • [[*conjunction_1, *conjunction_2] for conjunction_1 in expression_1 for conjunciton2 in expression_2]

Production 3: expression_1 OR expression_2

  • when we encounter an OR we can simply merge the expressions
  • expression_1 + expression_2

Production 4: COURSE

  • we can simply return the variable, but as a lists of lists, as the variable alone is considered a conjunction

Adding these semantic actions to our grammar we end up with:

grammar DNFConvertor;

expression returns [list dnf]
: LPAREN e=expression RPAREN {$dnf = $e.dnf}
| e1=expression AND e2=expression {$dnf = [[*c1, *c2] for c1 in $e1.dnf for c2 in $e2.dnf]}
| e1=expression OR e2=expression {$dnf = $e1.dnf + $e2.dnf}
| c=COURSE {$dnf = [[$c.text]]}
;

// for simplicity courses will be a single letter
COURSE: 'A'..'Z' | 'a'..'z';
AND: ('AND' | 'and' | 'And');
OR: ('OR' | 'or' | 'Or');
LPAREN: '(';
RPAREN: ')';
WHITESPACE: [ \t\n\r]+ -> skip;

We can then preform the conversion with the following code snippet:

def expr_to_dnf(expression: str) -> list[list[str]]:
    stream = InputStream(expression)
    lexer = DNFConvertorLexer(stream)
    stream = CommonTokenStream(lexer)
    parser = DNFConvertorParser(stream)
    dnf: list[list[str]] = parser.expression().dnf
    if parser.getNumberOfSyntaxErrors() > 0:
        raise Exception(f"Syntax errors parsing {expression}")
    return dnf

# output: [['a', 'c'], ['b', 'c']]
print(expr_to_dnf("(a or b) and c)")
# output: [['a', 'b'], ['c', 'd']]
print(expr_to_dnf("(a and b) or (c and d)")) 
# output: [['e'], ['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd']]
print(expr_to_dnf("e or ((a or b) and (c or d))"))