实现一个 Ripple Effect 的游戏。
![Ripple Effect](https://i0.wp.com/workology.com/wp-
content/uploads/2013/08/ripple-effect.jpg?ssl=1)
Problem Description
Ripple Effect or Hakyuu is a logic puzzle somewhat similar to Sudoku. The
puzzle consists of a rectangular grid divided into regions called rooms or
cages. I’ll go with rooms. Some of the cells are already filled with numbers.
Most cells are empty.
Just as with Sudoku, the goal is to fill the empty cells with numbers so that
a number of rules are satisfied. The rules of Ripple Effect are as follows:
- Room Rule: Each room of size s must contain every number between 1 and s exactly once.
- Cross Rule: If a cell c contains some value v, then another cell c in the same row or column can also contain value v only if the distance between c and c is more than v. For example, if c is at position (r, c) and stores the value 3, then none of the cells at positions (r - 3, c), (r - 2, c), (r - 1, c), (r + 1, c) (r + 2, c) (r + 3, c) (r, c - 3) (r, c - 2) (r, c - 1) (r, c + 1) (r, c + 2) (r, c + 3) can store the value 3. The cell at position (r + 4, c), for example, can store the value 3 because its distance to c is 3.
For the puzzle in Figure 1, the only solution that satisfies these conditions
is the following.
Your Project
Implement, in Haskell of course, a command line tool ripple-effect that takes
two arguments. The first one is the name of a file from which to load a Ripple
Effect puzzle. The second is the name of a file to which to write the
solution. This program should do the following:
- Read the input file and print an error message if it is not readable.
- Try to solve the puzzle. If there is no solution, it should print This puzzle has no solution.
- If there is a solution, it should write the solution to the output file.
- Optionally, you may choose to print the solution to the screen in human-readable format, as the puzzle viewer does, described under Test Data and Tools below.
Input and Output Formats
Input Format
For a puzzle with r rows and c columns, the input file consists of 2r + 1
lines.
- The first r lines describe the rooms. Each of these lines contains c numbers separated by spaces. In other words, these lines form an r c grid corresponding to the cells of the puzzle. The number corresponding to each cell identifies the room that the cell belongs to. Two cells with the same number belong to the same room. Two cells with different numbers belong to different rooms.
- The (r + 1)st line is always empty, to separate the lines describing the rooms from the lines describing the initial cell values.
- The last r lines describe the cell values. Each line contains c entries separated by spaces. So, once again, these lines form an r c grid corresponding to the cells of the puzzle. If a cell is initially empty, then its corresponding value is a dash (-). If the cell is non-empty, then its corresponding value is the number the cell contains.
As an example, the format for the example puzzle in Figure 1 of this document
is
1 1 2 2 3 4 4 5
1 6 6 7 3 3 8 8
1 1 1 9 3 3 8 8
10 1 11 9 9 3 8 12
13 13 13 13 14 14 14 15
16 16 16 13 17 18 14 15
19 16 16 18 18 18 15 15
20 20 20 18 18 15 15 21
Notes
- In this example, the rooms are numbered by the order of their top-left cells. You should not rely on this being always the case (and I don’t see why it would help).
- Most puzzles provided as test inputs (see Test Data and Tools below) have only rooms of size less than 10. Thus, all numbers in the puzzle can be represented as one-digit numbers. Neither your parser for puzzle files nor your solver should rely on this because there are some input files with rooms of size 10 or greater, and with some prefilled values in these rooms equal to 10 or greater.
Output Format
For a puzzle with r rows and c columns, the output file should consist of r
lines storing c numbers each. The numbers on each line should be separated by
spaces. Thus, these lines form an r c grid of numbers corresponding to the
cells of the puzzle. The value corresponding to each puzzle cell should be the
value stored in this puzzle cell in the solution. For example, the solution in
Figure 3 is represented by the following output file content:
1 4 1 2 3 1 2 1
3 1 2 1 4 2 1 3
5 2 7 3 6 1 4 5
1 6 1 2 1 5 2 1
2 3 5 1 2 4 3 2
3 5 1 4 1 2 1 3
1 2 4 1 3 6 5 4
2 1 3 5 4 1 6 1
Requirements
Your program must satisfy the following requirements:
- It must deal with incorrect command-line arguments gracefully. That is, if too few or too many command line arguments are provided, your program should print a usage message, and must not crash.
USAGE: ripple-effect - If the files given on the command line are not readable or not writable, your program is allowed to crash. Dealing with not being able to read or write a file requires either a lot of tedious code or the use of true exception handling to catch IOErrors. I do not expect you to do either in your code (unless you attempt the bonus challenge Exception Handling below).
- The goal of this programming assignment is to have you write a somewhat non-trivial program in Haskell. My sample solution clocks in at just above 300 lines of code including empty lines but not counting comments. You are not (yet) expected to write the most efficient Haskell code possible.
However, your program should still be able to solve most of the provided
puzzles (see Test Data below) in a minute or less. If your code exceeds this
time bound significantly, then it is likely that you are doing something
wrong.
Important note
There are a number of puzzles among the ones I have provided as test data
which a fairly nave solver takes a long time to solve. For your reference, I
include running times of my two solver implementations for all puzzles with
the test data. timings-naive.txt contains the times it took my nave solver to
solve each puzzle. timings-fast.txt are the times it took my fast solver to
solve each puzzle. The fast solver was able to solve any of these puzzles in
0.01s. The one exception is puzzle450.txt. This puzzle proved significantly
more difficult than all the other puzzles in the input for some reason. Even
my initial attempt at a fast solver took 12 minutes to solve puzzle 450. An
improved implementation using a more careful choice of data representation and
using parallel evaluation strategies reduced the running time to around 10s
using 8 cores on my MacBook Pro. You are not expected to match these running
times nor to solve puzzle 450 in under 1 minute. You are welcome to try
though. I promise you’ll learn a lot about Haskell programming if you do. The
bonus challenges Running Time Part 1: A Better Algorithm and Running Time Part
2: Parallelism below ask you to try to make your code fast, but they’re
optional.
Your code must satisfy the following requirements:
- Obviously, the main function and all input/output code must use the IO monad (possibly decorated with some monad transformers if you choose to do this).
- The “business logic” of your program (parsing the input file once it has been loaded into memory and solving the puzzle) must be implemented as pure functions. This means that you are not allowed to use the IO or ST monad in your implementation of these functions. You are allowed to use the list, State, Reader, Writer and Maybe monads, including their transformer versions, because they do not add true side effects and only offer more convenience in describing pure computations.
- You must define appropriate data types to represent the different entities in your program. If you were to do this in Java, you would probably have a class to represent a puzzle, a class to represent cells in a puzzle, and maybe a few others. You should do the same here. My code has a Puzzle type to represent a puzzle and a Cell type to represent a cell. I use simple (Int, Int) pairs to represent the size of a puzzle and the position of a cell in a puzzle. Defining new types for these would have required a lot of packing and unpacking of row and column indices. However, to improve readability, I defined type aliases and used those to indicate when an (Int, Int) pair represents a cell position and when it represents the size of a puzzle.
- Your code should be broken into functions that each do just one thing, at least mostly. My code has functions to take a size and two lists of room indices and cell values and builds a puzzle from it, a function to find all the cells that belong to the same room as a given cell, (obviously) a function that searches for a solution to a given puzzle, etc, etc, etc.
Bonus Challenges
The requirements above represent what I would consider (a) a functioning
program and (b) effective use of functional programming to implement such a
program. They still allow you to write pretty horrendous and inelegant code.
Try to do better the markers will thank you for it, because they will have a
much easier time trying to understand your code.
There are a number of opportunities to turn your code into more idiomatic
Haskell and improve its performance. If you feel comfortable programming in
Haskell and like to challenge yourself, here are a few optional requirements.
If your code meets these requirements, it earns you bonus marks.
Exception Handling (10% Bonus)
There are numerous places in the program where things can go wrong:
- The user may provide the wrong command line arguments.
- The input file provided by the user may not exist.
- The output file may not be writable.
- The input file may exist but it contains what is most decidedly not a Ripple Effect puzzle in the expected format.
You can deal with all of these exceptional situations by implementing
functions that indicate success or failure in their return values (e.g., using
Maybe), and then you deal with errors in the same way a good C programmer
would: by inspecting the return values of functions and accordingly continuing
or aborting the computation. It is much more elegant to use an exception
handling monad to gracefully abort the computation when an error is
encountered.
This challenge asks you to implement dealing with errors during the execution
of your program using an appropriate exception handling monad. For full marks,
you should also catch exceptions thrown by the I/O functions to read and write
files (when a file does not exist, isn’t readable or isn’t writable). Note
that these I/O functions do not magically throw exceptions in whatever
exception monad you use. They throw actual exceptions. The difference is that
when using a monad to handle exceptions, we only implement the correct
operator to ensure the computation is aborted when an error is encountered. In
the end, this is still pure code without side-effects. The exceptions thrown
by I/O functions are built into the Haskell run-time system in exactly the
same way as exceptions in Java or C++ are. If you don’t catch them, your
program crashes. Haskell offers numerous ways to deal with such true
exceptions. The most convenient method seems to be the try function from the
Control.Exception module. Search it up.
Back in your exception handling monad, you do not actually have to catch
exceptions anywhere, as all of these errors are fatal: If you can’t read the
input file, how should the computation continue, for example? You are expected
to wrap your entire program into a function call that checks whether an
exception was raised anywhere or not. If yes, then you should print an
appropriate error message, ideally corresponding to the error your program
encountered. If no exception was raised, then no action is required. In either
case, your program should exit gracefully and not crash.
Since most of the possible errors happen while doing I/O, you cannot use the
Maybe or Either monad. Instead, you need to equip the IO monad with exception
handling. A good tool to use is the ExceptT monad transformer, and the Except
monad. Search them up on Hoogle. Except behaves exactly like the monad
instance for Either that we discussed in class, only its name is more
descriptive. ExceptT is the transformer version of Except.
In my own implementation, I defined a “program monad” Prog as
type Prog = ExceptT String IO
that is, it equips the IO monad with exceptions of type String. All the non-
pure code of my implementation uses the Prog monad. I implemented a function
runProg :: Prog () -> IO ()
that takes an action in the Prog monad and runs it. If this ends with an
exception, the error message representing it is printed. Otherwise, runProg
simply exits without taking any action. My main function is
main :: IO ()
main = runProg $ do
– Implementation using theProg
monad
Search = Non-Determinism
Programmers often use the term “non-determinism” when they mean “solution
search.” That’s exactly what the list monad does. Using the list monad to
model a computation means that we deterministically explore the space of all
possible computation paths. Your puzzle solver will have to “guess” for at
least some cells what value to assign to them, based on the possible values
for these cells not eliminated by the Room and Cross Rules. For all but one of
these choices, you will later encounter conflicts; those were the wrong
choices. One of the choices leads to a correct solution of the puzzle.
Given that your program implements a search for a solution, it is natural and
elegant to implement this search using the list monad. If you do so, correctly
and in more than just one place, then you earn 20% bonus marks.
Running Time Part 1: A Better Algorithm
This is the biggest challenge, worth 30% bonus marks.
Doing the Easy Cells First
If you simply inspect the empty cells in order and guess for each which value
it should have, your solver will be able to deal with small and easy puzzles
but even easy but bigger ones or small but hard ones will become a challenge.
There is just too much guessing involved. You can see from the timings in
timings_naive.txt that my nave solver took several hours for some of the
puzzles. I didn’t even attempt to run puzzle 450 to completion because my
solver did not produce a result for this input even after running for 24
hours.
A common strategy to minimize the amount of guessing in solution search is to
always branch on the variables with the fewest possible values. Here, this
means that we always pick a cell with the minimum number of possible values to
branch on. This means in particular that cells that have only one possible
value left based on the choices made so far are chosen first. We end up fixing
their values without any guessing, and those choices may fix the values of
other cells. This propagation of choices without any guessing may continue for
a while before we have to make our next actual guess. Clearly, this helps a
lot to make our program faster.
To implement this strategy, you will need a few ingredients:
- The basic type you use to represent a puzzle, even in a nave implementation, should have puzzle cells that store values of type Maybe Int, to represent the presence or absence of a value. Your better implementation should use a Puzzle type that is polymorphic in the type it uses to represent cell values. During the search, you use a puzzle representation where each cell stores a list of values, namely the list of all values for this cell that haven’t been ruled out by the Room or Cross Rule yet.
- You can then maintain a priority queue of the empty cells, where the priority of each empty cell is the number of possible values it has left. You may hunt around online to see whether there exist good priority implementations on Hackage that you can use. I simply cooked my own using a Map as the underlying data structure. With a priority queue in hand, each recursive call picks the empty cell with minimum priority and either simply fills in its value if there’s only one possible value, or branches on the possible values that are left.
With this change in code, I managed to solve all puzzles in under 1s (almost
all of them in a few dozen milliseconds), except that pesky puzzle 450. That
one took 12 minutes using the improvement just described.
Not an Array But a Binary Tree
Given that a puzzle is a 2-d grid of cells, it is natural to store it as an
array. However, array updates are expensive as each such update requires
copying the entire array, and it turns out that the solution search performs a
lot of fine-grained array updates. For small puzzles, the copying isn’t a
death sentence. For large puzzles, such as puzzle 450, it is. It turns out
that using a Map (binary search tree) that maps each grid location to the cell
at this location instead of an array as the representation of the puzzle
reduces the amount of copying dramatically.
This change plus some additional changes in the search strategy that I don’t
care to detail here brought the running time of my code down to around 40s on
puzzle 450.
Running Time - Part 2: Parallelism
Here’s the final thing you can try if you’re truly adventurous: Haskell allows
us to parallelize our code rather easily. It can be as simple as writing to
construct a pair (f, g) whose components are computed in parallel.
Specifically, this says that g should be evaluated in a separate thread while
f is evaluated in the current thread.
Changing half a dozen lines of code in my solver to add this type of
parallelism brought the running time on puzzle 450 down to 10s on 8 cores.
Test Data and Tools
Test Data
Along with this description of the project, I have posted a ZIP file
containing 470 Ripple Effect puzzles in the format described above. You can
use them to test your code. The markers will also use these inputs to evaluate
the correctness of your program.
The puzzles are numbered by increasing size. As I mentioned earlier, some of
the puzzles are (much) harder than others, at least for a nave solver. Whether
a puzzle is hard or not doesn’t depend only on its size. Thus, I’ve also
included two lists of timings:
- timings-naive.txt is the time it took my nave solver to solve each puzzle except puzzle 450.
- timings-fast.txt is the time it took my fast solver to solve each puzzle.
You want to start testing your code on the puzzles that have fast solution
times in timings-naive.txt. As you make your code more efficient (if you
attempt the performance-oriented bonus challenges), then you can try harder
and harder puzzles and see whether you can come close to (or do better than ;)
) the running times listed in timings-fast.txt.
Puzzle Printer and Solution Checker
I installed a program ripple-effect-check on timberlea that hopefully helps
you with your efforts. I cannot post it as a ZIP file on Brightspace, as this
would mean that I’d have to share the source code with you, which contains
parts of my solution to this project. So, if you want to use this program, you
do need to log into timberlea and run it there.
You can either run it as /users/nzeh/bin/ripple-effect-check, or you can add
/users/nzeh/bin to your Shell’s search path on timberlea, and then you can run
it simply as ripple-effect-check, as I did in the examples below.
When given a single command-line argument, ripple-effect-check expects this
argument to be the name of a Ripple Effect input file. The program prints the
puzzle in human-readable form:
If you call ripple-effect-check with two files as argument, the first a puzzle
file and the second a claimed solution, then the output is the puzzle with its
solution filled in. If the solution satisfies the Room and Cross Rules, that
is, if it is a correct solution, then you will also see a line
Solution correct
For example,
$ ripple-effect-check inputs/puzzle1.txt outputs/puzzle1.txt
If there are violations of the Room or Cross Rule or any cell that already had
a value in the input has a different value in the output, all of these
violations are listed below the puzzle. For example,
$ ripple-effect-check inputs/puzzle1.txt outputs/puzzle1_incorrect.txt
A Better Algorithm
It is hard to quantify whether you did this correctly. Sure, I could give you
marks based on whether you approached the problem exactly as I did, but that
would be silly and would stifle creativity. So the requirements are pretty
general here. To make sure you don’t get clever and try to meet these
requirements by using the IO or ST monad (and the mutable arrays they offer)
or use Haskell’s foreign function interface to escape into C code, let me
stipulate that your solution search must remain 100% side-effect-free and must
be implemented in Haskell.
I would expect that the first 15% reflect the use of a better search strategy,
while the second 15% reflect switching to a better data structure, as outlined
in the description of this challenge. But you never know, maybe you manage to
meet these requirements in a different way, as long as it’s pure Haskell.
Use of Parallelism
You get marks for this challenge if you use the par function from
Control.Parallel or the Eval or Par monad from Control.Parallel.Strategies or
Control.Monad.Par to speed up the search, and if this actually speeds up the
search. Incorrect use of parallelism may actually slow your code down. My code
uses neither par, Eval nor Par. It uses a monad from a third-party library
specifically designed to parallelize solution search. This monad does nothing
that you couldn’t do yourself using par. If you end up Googling and find and
use this or another implementation of a parallel search monad, your code is
considered to meet the requirements of this challenge, but again only if you
actually achieve a reasonable speed-up.
Important
The markers need a way to evaluate how much the use of parallelism speeds up
your code. Thus, to get marks for this challenge, you need to submit two
versions of your code. The first one does not use parallism. The second one
does. The only difference between these two versions must be the
parallelization of the search strategy. Both versions must use the same data
structures and overall search strategy (e.g., choice which cell to fill in
each step).
Submission
Assignments must be submitted electronically via Brightspace. You must submit
your project as a single ZIP or TAR file. Submissions as multiple individual
files and submissions in other archive formates are not acceptable.