Write calculator using lexx and yacc
Objective: The goal of this homework is to acquaint yourself with two very powerful tools for the generation of compilers: lex and yacc. When a program is presented to a compiler, the compiler has to divide the program into meaningful units, called tokens, and then discover the relationship among these units. The algorithm that divides the program into units is called a lexical analyzer, scanner, or lexer.
Once the tokens are identified, the compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is the function of a parser. The parser requires a list of rules that define the relationships among the tokens. This list of rules is called a grammar.
Lex is a program that takes a set of descriptions of possible tokens and produces a C routine that implements a scanner. Yacc takes a concise description of a grammar and produces a C routine that implements a parser for the grammar.
Lex and yacc are tools which can be used to solve a variety of interesting problems, even outside the realm of compilers. We are sure that once you understand how they function, you will find that the use of lex and yacc will save you many programming hours.
Equipment and software:
For this assignment, you can use any Linux machine that has lex and yacc (or flex and
bison) installed.
You may use a Windows 10 machine provided that you install “windows subsystem for
linux”, which is basically an Ubuntu Linux system.
You may use a Mac OS X machine provided that you install Xcode.
Tasks:
All the files needed for this assignment can be found in the attachments. An example lex specification cal.l. You can build a standalone scanner starting from this file to see how the scanner works. Follow these instructions to build the scanner:
Invoke this command to convert a lex specification file into a C source file. “$ lexcal.l”
Now you must compile this C source file into an executable program using the following command: “$ gcc -o scanner lex.yy.c”
Now you can run the executable scanner and play with it by typing in text and checking if it is recognized as a token. $ ./scanner
We recommend that you consult the cal.l file as you play with the scanner to understand how the tokens are specified. Make sure you try operators, words, parenthesis, etc. The scanner will output the type of symbol that it recognizes for the input that you type. You can use CTRL-D to terminate the scanner.
Task 1: For this task you are asked to develop an infix calculator. You should develop two files:
infix.l and infix.y. The file infix.l is going to be the lex file written to match the patterns that are required to generate the tokens needed to implement a simple calculator. The file infix.y described the rules of the grammar that implement the calculator. You should customize the attached Makefile to generate the calculator. Make sure you run the make clean command to remove older versions of the files then type make:
$ make clean
$ make
The infix calculator should work like following. You invoke the calculator by running the program ”./infix”. If you type ”3 + 2” the calculator answers with ”= 5”; if you type ”x = 4” followed by “3 * x” the calculator responds with ”= 12”. Here is a demonstration:
$ ./infix
3+2
=5
x=4
=4
3*x
=12
You should be able to do the following operations:
Addition, subtraction, multiplication, and division.
Unary bitwise operator NOT. Use the following symbol: ‘!’. The operator should
precede the operand.
The power operator (e.g 2 to the power of 4). Use the following symbol: **.
Support parentheses “(“ and “)”, and can warn unmatched parentheses.
Support variables whose name must start with lower case alphabet characters “a-z”.
Note: The parser should NOT recognize symbols consisting of two characters with spaces in between them as valid (e.g. “* *”).
We suggest you look at the cal.l and cal.y files to try to understand how the calculator was specified.
Task 2: Modify your infix calculator to print out the prefix form of infix expressions. In particular, all parentheses should have been removed in the prefix form. Also spaces should be inserted when needed to properly delimit expression terms. For examples:
“2+2” => “+ 2 2”
“x=3” => “= x 3”
“!x” => “!x”
Solution
task1
In task 1, a lexer (infix.l) and parser (infix.y) are coded to calculate
infix expressions, which may contain those operator + – * / (**) ! =.
The = operator also creates variables and their values.
To support variables, in the infix.l, tokens of type VAR are generated
for a string starting with a lower case letter and containing only letters,
digits or _.
To store the value of variables, in the infix.y, a linked list of variable
nodes are created. In this data structure, each node contains the name of
a var and its value. Meanwhile, the function get_var is defined to find
the value of a variable by traversing the linked list. The function put_var
puts a new variable into the list or updates the current value of the variable.
The yacc rules can access variable values via these functions.
infix.l
%{
#include /* for atoi call */
#include /* for strcpy call */
#define NODEBUG /* for debuging: print tokens and their line numbers */
#define NUMBER 258 /* copy this from cal.tab.c */
#define VAR 259
#define EXP 260
typedef union { /* copy this from cal.tab.c */
int d;
char var[32];
} YYSTYPE;
YYSTYPE yylval; /* for passing value to parser */
extern intlineNum; /* line number from cal.tab.c */
%}
%%
[ \t]+ {}
[\n] { lineNum++; return ‘\n’; }
“(” {
#ifdef DEBUG
printf(“token ‘(‘ at line %d\n”, lineNum);
#endif
return ‘(‘;
}
“)” {
#ifdef DEBUG
printf(“token ‘)’ at line %d\n”, lineNum);
#endif
return ‘)’;
}
“+” {
#ifdef DEBUG
printf(“token ‘+’ at line %d\n”, lineNum);
#endif
return ‘+’;
}
“-” {
#ifdef DEBUG
printf(“token ‘-‘ at line %d\n”, lineNum);
#endif
return ‘-‘;
}
“*” {
#ifdef DEBUG
printf(“token ‘*’ at line %d\n”, lineNum);
#endif
return ‘*’;
}
“/” {
#ifdef DEBUG
printf(“token ‘/’ at line %d\n”, lineNum);
#endif
return ‘/’;
}
[0-9]+ {
#ifdef DEBUG
printf(“token %s at line %d\n”, yytext, lineNum);
#endif
yylval.d = atoi(yytext);
return NUMBER;
}
“!” {
#ifdef DEBUG
printf(“token ‘!’ at line %d\n”, lineNum);
#endif
return ‘!’;
}
“**” {
#ifdef DEBUG
printf(“token ‘**’ at line %d\n”, lineNum);
#endif
return EXP;
}
[a-z][a-zA-Z0-9_]* {
#ifdef DEBUG
printf(“token ‘%s’ at line %d\n”, yytext, lineNum);
#endif
strcpy(yylval.var, yytext);
return VAR;
}
“=” {
#ifdef DEBUG
printf(“token ‘=’ at line %d\n”, lineNum);
#endif
return ‘=’;
}
%%
intyywrap() { /* need this to avoid link problem */
return 1;
}
infix.y
%{
#include
#include
#include
#include
intlineNum = 1;
void yyerror(char *ps, …) { /* need this to avoid link problem */
printf(“%s\n”, ps);
}
/* save the variable to value mapping */
struct node {
struct node *next;
char name[32];
intval;
};
struct node *vars = NULL;
struct node *find_var(const char name[]) {
struct node *it = vars;
while (it != NULL &&strcmp(it->name, name) != 0) {
it = it->next;
}
return it;
}
intget_var(const char name[]) {
/* if not found, return 0. */
struct node *it = find_var(name);
if (it == NULL) {
printf(“undefined variable %s, use default value 0\n”, name);
return 0;
}
return it->val;
}
void put_var(const char name[], intval) {
struct node *it = find_var(name);
if (it == NULL) {
it = (struct node *)malloc(sizeof(struct node));
strcpy(it->name, name);
it->next = vars;
vars = it;
}
it->val = val;
}
%}
%union {
int d;
char var[32];
}
// need to choose token type from union above
%token NUMBER
%token VAR
%token ‘(‘ ‘)’
%left ‘+’
%left ‘-‘
%left ‘*’
%left ‘/’
%left ‘!’
%left EXP
%type calexp factor term base
%start multiple
%%
multiple : stmt multiple |
;
stmt : assign ‘\n’ | cal ‘\n’
;
assign : VAR ‘=’cal
{ put_var($1, $3); }
;
cal : exp
{ printf(“=%d\n”, $1);
$$ = $1;
}
;
exp : exp ‘+’ factor
{ $$ = $1 + $3; }
| exp ‘-‘ factor
{ $$ = $1 – $3; }
| factor
{ $$ = $1; }
;
factor : factor ‘*’ term
{ $$ = $1 * $3; }
| factor ‘/’ term
{ $$ = $1 / $3; }
| term
{ $$ = $1; }
;
term : ‘!’ term
{ $$ = ~$2; }
| term EXP base
{ int r = 1;
for (int i = 0; i < $3; i++) {
r = r * $1;
}
$$ = r;
}
| base
{ $$ = $1; }
;
base : NUMBER
{ $$ = $1; }
| VAR
{ $$ = get_var($1); }
| ‘(‘ exp ‘)’
{ $$ = $2; }
| ‘(‘ exp
{ printf(“unmatched parenthesis\n”); $$ = $2; }
;
%%
int main() {
yyparse();
return 0;
}
task2
In task 2, the lexer and parser are coded to translate infix expressions
to prefix expressions. For example, 3 + 2 will result in + 3 2.
To implement this, most yacc rules are coded to return a string. In this
way, we can use string functions (for example, sprintf, strcpy) to manipulate
the resulting strings of rules together to construct the prefix version of
an infix expression.
infix.l
%{
#include /* for atoi call */
#include /* for strcpy call */
#define NODEBUG /* for debuging: print tokens and their line numbers */
#define NUMBER 258 /* copy this from cal.tab.c */
#define VAR 259
#define EXP 260
typedef union { /* copy this from cal.tab.c */
char var[32];
} YYSTYPE;
YYSTYPE yylval; /* for passing value to parser */
extern intlineNum; /* line number from cal.tab.c */
%}
%%
[ \t]+ {}
[\n] { lineNum++; return ‘\n’; }
“(” {
#ifdef DEBUG
printf(“token ‘(‘ at line %d\n”, lineNum);
#endif
return ‘(‘;
}
“)” {
#ifdef DEBUG
printf(“token ‘)’ at line %d\n”, lineNum);
#endif
return ‘)’;
}
“+” {
#ifdef DEBUG
printf(“token ‘+’ at line %d\n”, lineNum);
#endif
return ‘+’;
}
“-” {
#ifdef DEBUG
printf(“token ‘-‘ at line %d\n”, lineNum);
#endif
return ‘-‘;
}
“*” {
#ifdef DEBUG
printf(“token ‘*’ at line %d\n”, lineNum);
#endif
return ‘*’;
}
“/” {
#ifdef DEBUG
printf(“token ‘/’ at line %d\n”, lineNum);
#endif
return ‘/’;
}
[0-9]+ {
#ifdef DEBUG
printf(“token %s at line %d\n”, yytext, lineNum);
#endif
strcpy(yylval.var, yytext);
return NUMBER;
}
“!” {
#ifdef DEBUG
printf(“token ‘!’ at line %d\n”, lineNum);
#endif
return ‘!’;
}
“**” {
#ifdef DEBUG
printf(“token ‘**’ at line %d\n”, lineNum);
#endif
return EXP;
}
[a-z][a-zA-Z0-9_]* {
#ifdef DEBUG
printf(“token ‘%s’ at line %d\n”, yytext, lineNum);
#endif
strcpy(yylval.var, yytext);
return VAR;
}
“=” {
#ifdef DEBUG
printf(“token ‘=’ at line %d\n”, lineNum);
#endif
return ‘=’;
}
%%
intyywrap() { /* need this to avoid link problem */
return 1;
}
infix.y
%{
#include
#include
#include
#include
intlineNum = 1;
void yyerror(char *ps, …) { /* need this to avoid link problem */
printf(“%s\n”, ps);
}
%}
%union {
char buffer[32];
}
// need to choose token type from union above
%token NUMBER
%token VAR
%token ‘(‘ ‘)’
%left ‘+’
%left ‘-‘
%left ‘*’
%left ‘/’
%left ‘!’
%left EXP
%type assign calexp factor term base
%start multiple
%%
multiple : stmt multiple |
;
stmt : assign ‘\n’ { printf(“%s\n”, $1); }
| cal ‘\n’ { printf(“%s\n”, $1); }
;
assign : VAR ‘=’cal
{ sprintf($$, “= %s %s”, $1, $3); }
;
cal : exp { strcpy($$, $1); }
;
exp : exp ‘+’ factor
{ sprintf($$, “+ %s %s”, $1, $3); }
| exp ‘-‘ factor
{ sprintf($$, “- %s %s”, $1, $3); }
| factor
{ strcpy($$, $1); }
;
factor : factor ‘*’ term
{ sprintf($$, “* %s %s”, $1, $3); }
| factor ‘/’ term
{ sprintf($$, “/ %s %s”, $1, $3); }
| term
{ strcpy($$, $1); }
;
term : ‘!’ term
{ sprintf($$, “! %s”, $2); }
| term EXP base
{ sprintf($$, “** %s %s”, $1, $3); }
| base
{ strcpy($$, $1); }
;
base : NUMBER
{ strcpy($$, $1); }
| VAR
{ strcpy($$, $1); }
| ‘(‘ exp ‘)’
{ strcpy($$, $2);}
| ‘(‘ exp
{ printf(“unmatched parenthesis\n”); strcpy($$, $2); }
;
%%
int main() {
yyparse();
return 0;
}