C++代写：EECS665CompilerConstruction

发布日期: 2020-08-28

使用 [ Flex ](https://www.geeksforgeeks.org/flex-fast-lexical-analyzer-
generator/ “Flex”) 实现一个名为dmc的编译器。
![Flex](https://media.geeksforgeeks.org/wp-
content/uploads/how_flex_is_used.jpeg)

Overview

In this project, you will begin construction of dmc, the compiler for our new
language, named (sigh) “Drewno Mars”. We will continue to build on the
codebase started in this project throughout the semester until we have a
fully-functioning compiler. For this first assignment, you will implement the
scanner module using the scanner-generator Flex.
The scanner of dmc will transform character sequences from an input file into
a token stream. The ultimate intention is that the input file will contain
what the user expects to be valid Drewno Mars source code (though it may be
invalid / contain errors). Of course, the scanner only cares if the input
contains valid tokens so don’t expect that input will correspond to a valid
program or try to identify errors beyond those listed below. In later
projects, the token stream will be consumed internally as part of the
compiler’s frontend, but for now the code should dump the token stream in text
format, and report any errors to stderr.
The main “action” of the project will be building a lexical specification (a
.l file) that recognizes tokens and reports errors according to the format
listed below.

Requirements

This section contains information that affects how your project will be
graded. It should be considered mandatory reading.

YOUR SUBMISSION

You should submit a single tarred-and-zipped directory containing source code
and a makefile such that the following invocations build and execute your
code:
make
./dmc <infile.dm> -t <tokens.txt> 2> <errors.txt>
Where

is an input file to the compiler (described in detail below).
is the file to which the token stream will be written in the required output format (described in detail below).
is the file to which the required error messages will be written in the required error message format (described in detail below).
You should include a file named TEAM.txt in your tarball with each person who
worked on the project included in the following format. (lastname1,firstname1)
(lastname2,firstname2) You may work with a partner for the project. If you do,
include both names. Otherwise, include only your name in TEAM.txt.

INPUT FORMAT

You should expect the input to be a file containing ASCII characters. The
lexical details of Drewno Mars are presented in the lexical details section of
the language specification. That document should serve as a complete guide for
all of the tokens you need to recognize.
Whitespace separates tokens and changes the character counter, but should
otherwise be ignored (except inside a string literal).
If you are not sure which token kind matches which token, ask!

OUTPUT FORMAT

The token stream should be output with 1 token per line in the form
[,]
Where is the name of the token for all tokens except for:

string literals (STRINGLITERAL), which will be of the form STRINGLIT: where is the string lexeme matched (including quotation marks). Newline and tab specifiers should be printed literally.
integer literals (INTLITERAL), which will be of the form INTLIT: where is the integer lexeme matched.
Identifiers (ID), which will be of the form ID: where is the lexeme matched.

ERRORS AND WARNINGS

The scanner should handle the following errors ( and should be replaced by
the position of the first line and character of the error, respectively. <2
and <2 should be replaced by the position of the last line and character of
the error, respectively.)
Illegal characters: Issue the error message: FATAL [<l1>,<c1>-<l2>,<c2>]: Illegal character <ch> (where is the illegal character) and resume lexing
after the character.
Unterminated string literals: A string literal is considered to be
unterminated if there is a newline or end-of-file before the closing quote.
Issue the error message: FATAL [<l1>,<c1>-<l2>,<c2>]: Unterminated string literal detected and continue past the unterminated string literal. Start
looking for the next token after the newline.
Bad escape sequences in string literals: This case occurs if a string inclues
a bad escape sequence; i.e., a backslash followed by a q (since \q isn’t part
of Drewno Mars). Issue the error message: FATAL [<l1>,<c1>-<l2>,<c2>]: String literal with bad escape seqeunce detected and ignore the string
literal and start looking for the next token after the closing quote.
Bad escape sequences in unterminated string literals: If the string literal
has a bad escape sequence and is unterminated, issue the error message FATAL [<l1>,<c1>-<l2>,<c2>]: Unterminated string literal with bad escaped sequence detected Start looking for the next token after the newline. Note that a
string literal that has a newline immediately after a backslash should be
treated as having a bad escaped character and being unterminated. For example,
given:
“very bad string
abc
the scanner should report an unterminated string literal with a bad escaped
character on line 1, and an identifier on line 2.

Integer literals overflow

If an integer literal is larger than the 32-bit INT_MAX of C (i.e. 2147483647)
issue the error message FATAL [<l1>,<c1>-<l2>,<c2>]: Integer literal overflow.
Note that because negative numbers are lexed as two tokens, underflow is not
reported.

Line/Column numbers

For unterminated string literals, bad string literals, and bad integer
literals, the line and column numbers used in the error message should
correspond to the position of the first character in the string/integer
literal.

Advice

This section contains tips and guidance. It may be helpful, but it’s optional
reading.

GETTING STARTED

It would be super cool and totally possible to complete this project from
scratch, using only your own code. However, you may choose to use the “starter
code”. This code contains the basic structure of a reasonable solution for the
project. The starter code is provided here: p1.tgz
Hopefully, you can see what this code does by looking at it (it would be good
practice to poke around), but there is also some documentation about these
files written on the Project 1 doc page.

FLEX

Setup: If you want to set up Flex on your own machine, it’s available in
Ubuntu via apt install flex. If you are working on an EECS Linux machine, it
will already be installed.
Docs: If you need more guidance with Flex for generating lexers, check out the
on-line Flex reference manual, and/or the class Flex notes for information
about writing a Flex specification.
Gotchas: Flex-generated scanners have some behaviors that can appear
surprising to beginners. A couple of things to remember:

Scanners start greedily consuming characters from the end of the last match until no longer token can possibly be produced. If there are multiple tokens that could all be of that same longest length, the rule that appears earliest in the Flex file is used. Thus, you should make sure your reserved keywords appear in your Flex file before your identifiers to ensure keywords pre-empt identifiers.
Token “kinds” (i.e., values to be returned by the scanner) are defined in the file frontend.hh, and you can see them in action in scanner.cpp, For example, the kind for the token to be returned when an integer literal is recognized is INTLITERAL and the token to be returned when the reserved word int is recognized is INT.
Note that code telling Flex to return the special EOF token on end-of-file has already been included in the file Drewno Mars.l – you don’t have to include a specification for that token.

What the Scanner Should Do

The main job of the scanner is to identify and return the next token. The
value to be returned includes:

The token “kind” (e.g., INTLITERAL). Token kinds are defined in the file frontend.hh.
The line number in the input file on which the token starts.
The number of the character on that line at which the token starts.
For identifiers, integer literals, and string literals: the actual value (a String, an int, or a String, respectively). For a string literal, the value should include the double quotes that surround the string, as well as any backslashes used inside the string as part of an “escaped” character.
Your scanner will return this information by creating a new Token object in
the action associated with each regular expression that defines a token (the
Token type is defined externally; you don’t need to look at that definition).
A Token includes a field of type int for the token kind, and a field of store
auxilary information about the token (its line, column, and value - if it
needs to store a value). The given .l file already includes an example of how
to call the Token (subclass) constructors. The given tokens.cpp file has
example code that accesses the fields of a Token.
In your compiler, the Token will actually be subclassed, with a value()
function to get the actual value; that type is defined in lexer.l. Every
TokenVal includes a linenum field, and a charnum field. Subtypes of Token with
more fields will be used for the values associated with identifier, integer
literal, and string literal tokens. These subtypes, IntLitToken, IDToken, and
StrLitToken are also defined in Drewno Mars.l.
You should keep track of the line and character number, but it’s not really
that hard. Line counting is already done for you in the sample file, and
column counting can be done by adjusting the charNum variable according to the
amount of characters consumed. The code in Drewno Mars.l does this for some of
the patterns that it defines, and you should be able to figure out how to do
the same thing for the new patterns that you add.
The Flex scanner also provides a macro yytext that returns the actual text
that matches a regular expression. You will find it useful to use this method
in the actions you write in your Flex specification.
Note that, for the integer literal token, you will need to convert a string
(the value scanned) to an int (the value to be returned). You should use code
like the following:
double d = std::stod(yytext); // convert String to double
// INSERT CODE HERE TO CHECK FOR BAD VALUE – SEE ERRORS AND WARNINGS BELOW
int k = atoi(yytext) // convert to int
—|—

Testing

A good practice is to write test cases for your project, as discussed in the
Flex Lab. You should run your test suite by invoking the command
make test
which runs the test rule in the Makefile (which should, in turn, invoke each
element of the test suite).
To test that your scanner correctly handles an unterminated string literal
with end-of-file before the closing quote, you will need to create a file that
does NOT end in a newline, which many editors do automatically.
On a Linux machine, you can tell that there is no final newline by typing: cat
eof.txt or od -a eof.txt od will print the newline explicitly. For cat, you
should see your command-line prompt at the end of the last line of the output
instead of at the beginning of the following line for cat, and the file will
not display a newline.
In order to create a file with no final newline, you can issue the command
echo -n “Hello world” > no_newline.txt
Which will create a file “no_newline.txt” containing the text “Hello world”
with no newline afterwards. You may also want to create a file with a quote,
which can be done using
echo -n “"Hello world” > no_newline.txt