The goal of this assignment is to implement a regular expression evaluator library. The evaluator library will provide an interface for creating regular expression objects and pattern matching against input strings. The result of the matching operation is simply to return true if the match succeeds and false otherwise. You are welcome to provide any implementation/representation for your regular expressions as long as you conform to the library interface that we provide.
Read this entire document. If, after a careful reading, something seems ambiguous or unclear to you, then communicate to the course staff immediately. Start this assignment as soon as possible. Do not wait until 5pm the night before the assignment is due to tell us you don’t understand something, as our ability to help you will be minimal.
Reminder: Copying partial or whole solutions, obtained from other students or elsewhere, is academic dishonesty. Do not share your code with your classmates, and do not use your classmates’ code. If you are confused about what constitutes academic dishonesty you should re-read the course syllabus and policies. We assume you have read the course information in detail and by submitting this assignment you have provided your virtual signature in agreement with these policies.
You are responsible for submitting project assignments that compile and are configured correctly. If your project submission does not follow these policies exactly you may receive a grade of zero for this assignment.
Policies:
For many assignments, it will be useful for you to write additional Scala files. Any Scala file you write that is used by your solution MUST be in the provided src/main directory you submit to Moodle.
The course staff are here to help you figure out errors (not solve them for you), but we won’t do so for you after you submit your solution. When you submit your solution, be sure to remove all compilation errors from your project. Any compilation errors in your project will cause the auto-grader to fail your assignment, and you will receive a zero for your submission. No Exceptions!
In the src/test/scala
directory, we provide ScalaTest test suites that will help you keep on track while completing the assignment. We recommend you run the tests often and use them to help create a checklist of things to do next. But, you should be aware that we deliberately do not provide you the full test suite we use when grading.
We recommend that you think about possible cases and add new test cases to these files as part of your programming discipline. Simple tests to add will consider questions such as:
Do your methods handle edge cases such as integer arguments that may be positive, negative, or zero? Many methods only accept arguments that are in a particular range.
Does your code handle unusual cases, such as empty or maximally-sized data structures?
More complex tests will be assignment-specific. To build good test cases, think about ways to exercise functions and methods. Work out the correct result for a call of a method with a given set of parameters by hand, then add it as a test case. Note that we will not be looking at your test cases (unless otherwise specified by the assignment documentation), they are just for your use and will be removed by the auto-grader during the evaluation process.
If you modify the test cases we provided or you added your own it is important to know that they will not be used. The auto-grader will use its own copy of the public and private tests. If you modify any source files in the src/test/scala
directory your changes will not be reflected in the grading of your submission.
Before submitting, make sure that your program compiles with and passes all of the original tests. If you have errors in these files, it means the structure of the files found in the src
directory have been altered in a way that will cause your submission to lose some (or all) points.
The project should normally contain the following root items:
src/main/scala
: This is the source folder where all code you are submitting must go. You can change anything you want in this folder (unless otherwise specified in the problem description and in the code we provide), you can add new files, etc.
src/instructor/scala
: This folder contains support code that we encourage you to use (and must be used to pass certain tests). You must not change or add anything in this folder. The auto-grader will replace your submitted src/instructor
directory with the original during the evaluation process. If you make changes or add/remove anything it can lead to problems in your submission and will result in a 0 for the assignment.
src/test/scala
: The test folder where all of the public unit tests are available.
tools/grading-assistant.jar
: This is the grading assistant that you can run to give you an estimate of your score as well as package your project to be submitted to Moodle. More on this later. Do not remove.
activator
: This is a Unix helper script that can be used to run the activator interactive build tool. You can use this on Linux and Mac OSX. Do not remove.
activator.bat
: This is a Windows helper script that can be used to run the activator interactive build tool. You can use this on Windows. Do not remove.
activator-launch-1.3.2.jar
: This is a Scala/Java library that runs the activator tool and is used by the previously mentioned scripts. Do not remove.
build.sbt
: This is the build definition file. It provides build information to activator to build your project. Do not remove.
.gitignore
: This is a special file used by git source control to ignore certain files. You should not touch this file and please do not remove. This file may not be present if the assignment does not call for git usage.
As mentioned previously, you are provided a set of unit tests that will test various aspects of your implementation. You should get in the habit of running the tests frequently to see how you are doing and to understand where you might be going wrong. The ScalaTest testing framework is built-in to the activator tool and you can easily run the tests by issuing the following command from the command line (Mac/Linux):
./activator test
For Windows users you would issue the following command from the command window:
activator.bat test
This will compile your code and run the public ScalaTest unit tests. After you compile and run the tests you will notice that a target directory has been created. The target directory contains the generated class files from your source code as well as information and results of the tests. Activator uses this directory so it must not be removed. After you run the tests you can get a grade report from the Jeeves tool by issuing this command from the command line (Mac/Linux):
scala -cp tools/grading-assistant.jar autograder.jeeves.Jeeves
For Windows users you would issue the following command from the command window:
scala -cp tools\grading-assistant.jar autograder.jeeves.Jeeves
Note, issuing the above command from the activator console will not work! This will print a report to the console showing you the tests you passed, tests you failed, and a score based on the public tests. Although this gives you a good estimate as to what your final score might look like, it does not include points from our private tests. You may run Jeeves as often as you like, however, you must run the tests before your run Jeeves to give you an updated result.
Another very useful approach to test and play with the code you write is the activator/Scala console. You can run the console with this command (Mac/Linux):
./activator console
For Windows users you would issue the following command from the command window:
activator.bat console
This will load up the Scala REPL (read-eval-print-loop). You can type code directly into the console and have it executed. If you want to cut and paste a larger segment of code (e.g., function declaration) you simply type :paste in the console, then paste in your code, then type control-D.
In addition, you can run activator
without any commands and you will be presented with the following prompt:
./activator
>
Or on Windows:
activator.bat
>
From the activator prompt you can type in any of the following commands:
compile
: this will compile your code~compile
: this will watch your files for changes and compile them when a change is detected.test
: this will compile your code, the test code, and run the testsconsole
: runs the scala
console.You are welcome to use any editor or IDE you choose. We recommend using a basic text editor such as Atom, SublimeText, Notepad++, emacs, or vim.
If you use a text editor you should use activator in a separate terminal window to compile, run, and test your code.
If you have successfully installed and imported your project into IntelliJ you are welcome to continue using it.
You can generate the scaladoc documentation from the provided project source code running the following command from sbt
:
$ sbt
> sbt doc
This will generate HTML documentation into the directory target/scala-X.YZ/api/index.html
. You can then open that index.html
file in any browser and read the documentation easily. You can also read the documentation from the source code.
Your project is distributed with an sbt
formatted project structure. The files that are most important include:
src/main/scala/cs220/regex/RegEx.scala
: This file defines the interface for the classes you need to implement representing regular expressions as well as the interface your library will export to the client that uses it. In particular, you will notice the RegEx
and RegExLanguage
trait defined in this file.
src/main/scala/cs220/regex/Factory.scala
: This file is used to export your implementation of the RegExLanguage
trait described below. It offers us nothing more than an entry point into your library without dictating the names of your files.
The RegEx
trait defines 6 methods, 4 are abstract and 2 we provide an implementation for. Here is a summary of these methods:
delta: RegEx: The delta
method implements the δ function defined in the [notes]. That is, it returns the “empty” regular expression ε if the regular expression matches “empty” or ∅ if the regular expression matches “null”. Recall that the empty regular expression ε matches the empty string and ∅ matches nothing.
isEmpty: Boolean: This method returns true if it matches empty or false otherwise. That is, if we have a regular expression ab
then invoking the isEmpty
method on ab
would be false.
derivative(c: Char): RegEx: This method implements the rule for regular expression derivatives in the [notes]. You will need to implement each rule for each of the corresponding regular expressions that you need to create (different classes). The derivative
method yields a new regular expression (RegEx
object) with respect to the character argument c
. Note, the derivative rules are exactly that - they describe rules that generate a new regular expression based off of the current regular expression and some character c
.
normalize: RegEx: This method implements the “evaluation” of a regular expression after taking its derivative. This method implements the implicit nature of the theoretical aspect of regular expression derivatives. You can see how we use it in the matches
method described below. That is, after you apply the derivative we must normalize its form with respect to the application of delta
and the derivative
. Here are the rules you need to consider during normalization (we use N to indicate a call to normalize
):
You can see another form of these rules in the documentation for this method.
matches(s: String): Boolean: This method is simple - it calls the method toList
on the String s
and calls matches
. It is the only publicly available method from the RegEx
trait. You do not need to modify this method.
matches(cc: List[Char]): Boolean: This method implements the matching rules which you can find in the [notes]. In particular, it pattern matches on the list of characters. If we are at the end of the list (Nil
) then we check to see if this regular expression is empty by invoking its isEmpty
method. Otherwise, we have a character c
followed by the remaining characters cc
in the list. As rule R2 suggests we then apply the derivative with respect to character c
, normalize the resulting regular expression with the normalize
method, then recursively call matches
on the remaining characters in the list. Thus, we keep matching character by character, taking the derivative of the regular expression, until we reach the end of the list of characters - where we then check if this regular expression matches empty (R1).
The RegExLanguage
trait defines abstract methods for each of the corresponding regular expression types that can be created. Here is a summary of these methods:
empty: RegEx: This method returns a regular expression object representing the “empty” regular expression.
char(c: Char): RegEx: This method returns a regular expression object representing the character regular expression (the regular expression that matches a single character c
).
seq(re1: RegEx, re2: RegEx): RegEx: This method returns a regular expression that represents a sequence of regular expressions re1
and re2
. Note, to represent a regular expression of a sequence of characters such as /foo/
we could do the following: seq(char('f'), seq(char('o), seq(char('o), empty)))
.
alt(re1: RegEx, re2: RegEx): RegEx: This method returns a regular expression that represents an alternative regular expression. That is, a regular expression such as /a|b/
(matches either “a” or “b”), which we would create with alt(char('a), char('b))
.
str(s: String): RegEx: This method creates a regular expression for matching the string s
. As we mentioned above this can (and should) be implemented by a sequence of characters.
closure(re: RegEx): RegEx: This method implements the closure regular expression. For example, /a*/
matches 0 or more a
’s, which we would represent as closure(char('a'))
.
The Factory
object contains a single method re
which returns an implementation of the RegExLanguage
trait. You will need to implement this method as described below.
Your first task is to implement the regular expression classes. That is, for each of the regular expressions that can be created by the methods in the RegExLanguage
trait (except for str
) you must create a class that extends the RegEx
trait. In addition, you will need to create a class to represent the null regular expression ∅ because you will need to use this as part of the implementation of delta
and normalize
.
The best way to approach any project is to work incrementally. So, first create the regular expression classes and implement each of the abstract methods using ???
. You can have sbt
compiling by your side at each step of the way to make sure you are writing correct Scala.
After you have implemented each of the regular expression classes you must override the toString
method to return a string representation of the regular expression. Thus, alt(closure(char('a')),char('b'))
would be “(a*|b)“.
Because we are building a library you must make sure that each of your regular expression classes are private to the regex
package. That is, we should not have access to those classes outside of that package.
Do not implement the isEmpty
, derivative
, delta
, and normalize
methods at this point.
The next task is to extend the RegExLanguage
trait with an implementation of each of the abstract methods. These methods should use your implementation of the RegEx
trait from Part 1 to create proper regular expression objects representing each of the regular expressions described in Part 1.
You should then modify the ???
implementation of the Factory.re
method (in the Factory.scala
file) to return an instance of you implementation. In doing so, you have provided public access to those functions are can start creating your regular expressions in the repl:
> console
scala> import cs220.regex._
scala> val re = Factory.re
scala> re.char('f')
scala> re.str('foo')
Note: it might be best to implement each method one-by-one and test them in the repl to make sure they are working properly. You may want to add some debugging methods to your Factory that will print out the structure of your regular expressions to make sure you have created them properly. Do not change anything to public in the provided code as we will be checking to make sure that you adhere to the access rights of your library.
At this point you should be able to construct regular expressions through the regular expression library API. Now, we need to implement matching. To do this you must implement each of the corresponding abstract methods in the RegEx
trait in your regular expression classes. You should implement each of the methods in the order described below.
isEmpty: Boolean: This method is simple. You should simply return a boolean for each regular expression that indicates if it is “empty” or “null”.
delta: RegEx: Use the rules defined in the [notes] for the δ function to implement this method for each of your regular expression classes (including null). Most of the implementations are extremely simple. The implementation for sequences and alternative require some additional work to determine if they are “empty” or “null”.
derivative(c: Char): RegEx: This is the hardest function to implement. However, if you make sure you understand the rules covered in the [notes] you will be able to construct the proper regular expressions given the rules. Remember, all you are doing in this method is creating a new regular expression that represents the derivative of the regular expression you are invoking the derivative
method on. Again, the most complicated of these are the sequence and alternative regular expressions. Thus, you should start with the simply ones first. Perhaps add some debugging methods in your Factory
object to help with debugging purposes (as this is the only way to gain access to the internals). Again, do not modify the access modifiers on any of the provided code.
normalize: RegEx: If you follow the rules outlined above for normalization you should not have too much trouble with this method. This method is simplifying the regular expression based on the result of taking the regular expression’s derivative. Again, start with the simple rules first, add some debugging support, and do not modify the public API that is provided.
Notes: You should test your code step-by-step to ensure that it is working properly. Do not write all of this at once and then test - it will be too confusing and harder to debug. You should implement the simple regular expressions first and then run the unit tests to see if it works on the simple cases. You can also test your implementation from the console. For example, implement the “empty” regular expression first and test to see if matching works:
> console
scala> import cs220.regex._
scala> val re = Factory.re
scala> re.empty matches ""
res0: Boolean = true
scala> re.empty matches "x"
res1: Boolean = false
Then try to see if the character regular expression works:
scala> re.char('c') matches "c"
res2: Boolean = true
scala> re.char('c') matches "x"
res3: Boolean = false
scala> re.char('c') matches "cc"
res4: Boolean = false
Continue in this fashion implementing and testing your code along the way. In the end, you should run the unit tests and verify that all of the provided tests pass.
Your submission will be scored on the number of tests that are successfully passed. In addition, the course staff will be reviewing each submission for proper scaladoc documentation, comments, and style. All categories will contribute to your final score for this assignment.
When you have completed the changes to your code, you must run the following command to package up your project for submission (Mac/Linux):
scala -cp tools/grading-assistant.jar submission.tools.PrepareSubmission
On Windows:
scala -cp tools\grading-assistant.jar submission.tools.PrepareSubmission
This will package up your submission into a zip file called submission-DATE.zip
, where DATE
is the date and time you packaged your project for submission. After you do this, log into Moodle and submit the generated zip file.
After you submit the file to Moodle you should ensure that Moodle has received your submission by going to the activity page for the assignment and verifying that it has indeed been uploaded to Moodle. To further ensure that you have uploaded the correct zip file you should download your submission from Moodle and verify that the contents are what you expect.
We do not allow re-submissions after the assignment due date even if you failed to submit the proper file or forgot. There are no exceptions so be very sure that you have submitted properly.
You should test that you can run the PrepareSubmission
tool early so you do not have trouble packaging your assignment up for submission minutes before the deadline.