This assignment exercises your understanding of Scala and design patterns. In addition, it will teach you a new algorithm that is entertaining in what it produces. We are often faced with writing programs that must consume text, transform that text, and generate textual output. The problem we are asking you to solve in this project assignment is exactly this type of program, albeit a bit unusual. In particular, you will complete a program that generates random English text, that reads well, from books in text format.
In the wonderful book The Practice of Programming (POP), by Brian Kernighan and Rob Pike, they describe an algorithm known as the Markov chain algorithm. Before you begin this assignment please read Chapter 3 of that book to gain an understanding of the algorithm. The excerpt from the book describes two implementations in C and in Java. Clearly, we will be implementing our version in Scala. In addition, we will be using the composition design pattern and recursion and immutable data structures rather than the mutable implementation described in Chapter 3 of POP.
Read this entire document. If, after a careful reading, something seems ambiguous or unclear to you, then communicate to the course staff immediately. Start this assignment as soon as possible. Do not wait until 5pm the night before the assignment is due to tell us you don’t understand something, as our ability to help you will be minimal.
Reminder: Copying partial or whole solutions, obtained from other students or elsewhere, is academic dishonesty. Do not share your code with your classmates, and do not use your classmates’ code. If you are confused about what constitutes academic dishonesty you should re-read the course syllabus and policies. We assume you have read the course information in detail and by submitting this assignment you have provided your virtual signature in agreement with these policies.
You are responsible for submitting project assignments that compile and are configured correctly. If your project submission does not follow these policies exactly you may receive a grade of zero for this assignment.
Policies:
For many assignments, it will be useful for you to write additional Scala files. Any Scala file you write that is used by your solution MUST be in the provided src/main directory you submit to Moodle.
The course staff are here to help you figure out errors (not solve them for you), but we won’t do so for you after you submit your solution. When you submit your solution, be sure to remove all compilation errors from your project. Any compilation errors in your project will cause the auto-grader to fail your assignment, and you will receive a zero for your submission. No Exceptions!
In the src/test/scala
directory, we provide ScalaTest test suites that will help you keep on track while completing the assignment. We recommend you run the tests often and use them to help create a checklist of things to do next. But, you should be aware that we deliberately do not provide you the full test suite we use when grading.
We recommend that you think about possible cases and add new test cases to these files as part of your programming discipline. Simple tests to add will consider questions such as:
Do your methods handle edge cases such as integer arguments that may be positive, negative, or zero? Many methods only accept arguments that are in a particular range.
Does your code handle unusual cases, such as empty or maximally-sized data structures?
More complex tests will be assignment-specific. To build good test cases, think about ways to exercise functions and methods. Work out the correct result for a call of a method with a given set of parameters by hand, then add it as a test case. Note that we will not be looking at your test cases (unless otherwise specified by the assignment documentation), they are just for your use and will be removed by the auto-grader during the evaluation process.
If you modify the test cases we provided or you added your own it is important to know that they will not be used. The auto-grader will use its own copy of the public and private tests. If you modify any source files in the src/test/scala
directory your changes will not be reflected in the grading of your submission.
Before submitting, make sure that your program compiles with and passes all of the original tests. If you have errors in these files, it means the structure of the files found in the src
directory have been altered in a way that will cause your submission to lose some (or all) points.
The project should normally contain the following root items:
src/main/scala
: This is the source folder where all code you are submitting must go. You can change anything you want in this folder (unless otherwise specified in the problem description and in the code we provide), you can add new files, etc.
src/instructor/scala
: This folder contains support code that we encourage you to use (and must be used to pass certain tests). You must not change or add anything in this folder. The auto-grader will replace your submitted src/instructor
directory with the original during the evaluation process. If you make changes or add/remove anything it can lead to problems in your submission and will result in a 0 for the assignment.
src/test/scala
: The test folder where all of the public unit tests are available.
tools/grading-assistant.jar
: This is the grading assistant that you can run to give you an estimate of your score as well as package your project to be submitted to Moodle. More on this later. Do not remove.
activator
: This is a Unix helper script that can be used to run the activator interactive build tool. You can use this on Linux and Mac OSX. Do not remove.
activator.bat
: This is a Windows helper script that can be used to run the activator interactive build tool. You can use this on Windows. Do not remove.
activator-launch-1.3.2.jar
: This is a Scala/Java library that runs the activator tool and is used by the previously mentioned scripts. Do not remove.
build.sbt
: This is the build definition file. It provides build information to activator to build your project. Do not remove.
.gitignore
: This is a special file used by git source control to ignore certain files. You should not touch this file and please do not remove. This file may not be present if the assignment does not call for git usage.
As mentioned previously, you are provided a set of unit tests that will test various aspects of your implementation. You should get in the habit of running the tests frequently to see how you are doing and to understand where you might be going wrong. The ScalaTest testing framework is built-in to the activator tool and you can easily run the tests by issuing the following command from the command line (Mac/Linux):
./activator test
For Windows users you would issue the following command from the command window:
activator.bat test
This will compile your code and run the public ScalaTest unit tests. After you compile and run the tests you will notice that a target directory has been created. The target directory contains the generated class files from your source code as well as information and results of the tests. Activator uses this directory so it must not be removed. After you run the tests you can get a grade report from the Jeeves tool by issuing this command from the command line (Mac/Linux):
scala -cp tools/grading-assistant.jar autograder.jeeves.Jeeves
For Windows users you would issue the following command from the command window:
scala -cp tools\grading-assistant.jar autograder.jeeves.Jeeves
Note, issuing the above command from the activator console will not work! This will print a report to the console showing you the tests you passed, tests you failed, and a score based on the public tests. Although this gives you a good estimate as to what your final score might look like, it does not include points from our private tests. You may run Jeeves as often as you like, however, you must run the tests before your run Jeeves to give you an updated result.
Another very useful approach to test and play with the code you write is the activator/Scala console. You can run the console with this command (Mac/Linux):
./activator console
For Windows users you would issue the following command from the command window:
activator.bat console
This will load up the Scala REPL (read-eval-print-loop). You can type code directly into the console and have it executed. If you want to cut and paste a larger segment of code (e.g., function declaration) you simply type :paste in the console, then paste in your code, then type control-D.
In addition, you can run activator
without any commands and you will be presented with the following prompt:
./activator
>
Or on Windows:
activator.bat
>
From the activator prompt you can type in any of the following commands:
compile
: this will compile your code~compile
: this will watch your files for changes and compile them when a change is detected.test
: this will compile your code, the test code, and run the testsconsole
: runs the scala
console.You are welcome to use any editor or IDE you choose. We recommend using a basic text editor such as Atom, SublimeText, Notepad++, emacs, or vim.
If you use a text editor you should use activator in a separate terminal window to compile, run, and test your code.
If you have successfully installed and imported your project into IntelliJ you are welcome to continue using it.
After you download and unzip the starter kit you should take some time reviewing the provided code and comments. In particular, there are two important files. Here is the first:
this file contains several definitions that provide a starting structure for the implementation of the markov chain algorithm. Here is a summary of what is included:
class Prefix
: a class representing two-word prefixes.
object Prefix
: an object with a utility function generating an empty prefix. This is important for implementing the start of the Markov Chain algorithm.
class Suffix
: a class representing suffixes. Yes, it is simply but it is often easier to work with an abstraction that makes sense rather than String
.
trait ChainAlgorithm
: a base trait that represents a piece of functionality used in the markov chain algorithm. It provides only a single definition for a type alias called Chain
. The purpose of this is two-fold: (1) is shortens the long definition of the Map type; and (2) it enforces the use of an immutable map in classes that extend this trait. A type alias is simply giving a name to another type and being able to use it in its place. As the definition of the Map
type is so long, the type alias helps shorten it up leading to code that is more clear. You can use the Chain
type in place of the Map
in the code you implement.
trait Builder
: this trait represents the part of the markov algorithm used to build up the mapping from prefixes to suffixes (the chain). It contains a single abstract method called build
that takes an array of file names and produces a Chain
.
trait Generator
: this trait represents the part of the markov algorithm that generates the random text from the Chain
. It contains a single abstract method called generate
that takes a Chain
built from a Builder
, the number of words to generate, and a Randomizer
(see below). The result is a string containing the randomly generated paragraph.
trait Randomizer
: this trait contains a single method called random
that randomly generates an integer between 0 (inclusive) and n
(exclusive). It is used as part of the generation part of the markov algorithm. We extract it out into its own trait so we can use a deterministic random number generator to test your implementation.
class Markov
: this is an abstract class that uses composition in its definition. That is, it defines the builder and generator as abstract - expecting a subclass to implement and assign instances of a Builder
and Generator
. It then defines an apply
method that invokes the builder and generator appropriately.
The second important file is:
This file is the one you are to modify. We have provided implementation for several parts of the problem to get you going. Your job is to complete the implementation of MyBuilder
and MyGenerator
. We have also included lots of documentation that explains what you need to do for each of the methods labeled as TODO.
Unlike the implementations described in the associated POP text, this implementation of the Markov algorithm is recursive and it uses only immutable collection types (i.e., Map
and Vector
). First, you should read the documentation in the code for all the functions in the MyBuilder
class. You are to complete the implementation of several smaller utility functions and for the two buildX
methods. Although, there is not much code to write - the implementation is subtle. You should reference the POP text to understand how they implement the building phase of Markov. Be careful, though, as their implementation mutates state. Our implementation does not mutate any state - not even a Prefix
.
You are not allowed to use any loops, mutable var
variables, or mutable collection types in your implementation of these functions.
You should start by implementing the smaller helper functions. In particular, the suggested order is:
splitWords
getFirstWord
getRestOfWords
addSuffix
addToChain
Once the tests pass for these simple functions, you should then proceed with the rest of the functions using the implementation of the above:
buildWithLines
buildWithWords
Note, these two functions are tail recursive. You can read up on what this means in section 8.9 on page 159 of Programming in Scala. Although the explanation of tail recursion does not change your implementation it does help you understand why our approach to solve this problem recursively can be just as efficient as using loops and mutable data structures.
The implementation of MyGenerator
is also recursive. We give you the implementation of the first generate
method. Your job is to implement several smaller utility functions and the core of this algorithm. You should reference the POP text to understand how it is implemented in C and Java. You will need to bridge your understanding of that implementation into one that is recursive.
You are not allowed to use any loops, mutable var
variables, or mutable collection types in your implementation of these functions.
You should start by implementing the smaller helper functions. In particular, the suggested order is:
getSuffixes
getSuffixesLength
getRandomSuffix
isSuffixNewline
Once the tests pass for these simple functions, you should then proceed with the rest of the functions using the implementation of the above:
generateAll
When you have completed the changes to your code, you must run the following command to package up your project for submission (Mac/Linux):
scala -cp tools/grading-assistant.jar submission.tools.PrepareSubmission
On Windows:
scala -cp tools\grading-assistant.jar submission.tools.PrepareSubmission
This will package up your submission into a zip file called submission-DATE.zip
, where DATE
is the date and time you packaged your project for submission. After you do this, log into Moodle and submit the generated zip file.
After you submit the file to Moodle you should ensure that Moodle has received your submission by going to the activity page for the assignment and verifying that it has indeed been uploaded to Moodle. To further ensure that you have uploaded the correct zip file you should download your submission from Moodle and verify that the contents are what you expect.
We do not allow re-submissions after the assignment due date even if you failed to submit the proper file or forgot. There are no exceptions so be very sure that you have submitted properly.
You should test that you can run the PrepareSubmission
tool early so you do not have trouble packaging your assignment up for submission minutes before the deadline.