Write a program to analyze poetry, counting syllables and looking for rhymes.
![Poetry](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Liji2_no_bg.png/175px-Liji2_no_bg.png)
Introduction
In this assignment, you will write a program to analyze poetry, counting
syllables and looking for rhymes.
This handout explains the problem you are to solve and the tasks you need to
complete for the assignment. Please read it carefully.
Goals of this Assignment
- Write function bodies using dictionaries and file reading.
- Write code to mutate lists and dictionaries.
- Use top-down design to break a problem down into subtasks and implement helper functions to complete those tasks.
- Write tests to check whether a function is correct.
Files in the download
Please download the Assignment 3 files and extract the zip archive.
- Starter code: poetry_reader.py and poetry_functions.py
  These are the only files you need to modify and submit. These two files
  contain the headers for the functions you will need to write for this
  assignment, and a few completed function docstrings. Many of these functions
  will be called by the main program (poetry.py). You can, and should, write
  some helper functions in these files. Your lives will be easier if you do.
- Helper module: poetry_constants.py
  Read this! This file contains several definitions of new types that we use in
  the function type annotations.
- Main Program: poetry.py
  Run this first. The file contains a program that calls the functions in the
  starter code files. You can run it now, although it won’t work properly until
  you complete the functions in the starter files. Still, you’ll be able to use
  this to check your progress.
- Data: poetry/*.txt
  In the poetry directory are several files containing poems that you can use to
  test your code.
- Data: dictionary.txt
  This file contains a huge list of English words and their pronunciations.
- Data: poetry_forms.txt
  This file contains information describing various poetic forms.
- Checker: a3_checker.py
  We have provided a checker program that you should use to check your code. See
  below for more information about a3_checker.py.
Poetry Forms
Poetry differs from prose because it has a fixed structure. Different forms of
poetry, such as sonnets and haiku, have rules about which words must rhyme and
the number of syllables in each line.
In this assignment, you will write a program to read a poem from a file,
figure out the pronunciation, count the number of syllables in each line, and
determine which lines rhyme.
Some poetry forms specify the number and order of stressed and unstressed
syllables within a line. We will not consider syllabic stress in this
assignment.
Some poetry forms specify that particular words must alliterate, or start with
the same sound. We will not consider alliteration in this assignment.
Definitions
All links go to https://dictionary.com/.
- poem ( https://www.dictionary.com/browse/poem ) a composition in verse, especially one that is characterized by a highly developed artistic form and by the use of heightened language and rhythm to express an intensely imaginative interpretation of the subject
- rhyme ( https://www.dictionary.com/browse/rhyme ) a word agreeing with another in terminal sound: Find is a rhyme for mind and womankind
- consonant ( https://www.dictionary.com/browse/consonant ) (in English articulation) a speech sound produced by occluding with or without releasing (p, b; t, d; k, g), diverting (m, n, ng), or obstructing (f, v; s, z, etc.) the flow of air from the lungs (opposed to vowel)
- vowel ( https://www.dictionary.com/browse/vowel ) (in English articulation) a speech sound produced without occluding, diverting, or obstructing the flow of air from the lungs (opposed to consonant)
- syllable ( https://www.dictionary.com/browse/syllable ) an uninterrupted segment of speech consisting of a vowel sound, a diphthong, or a syllabic consonant, with or without preceding or following consonant sounds
There are many vowel sounds. For example, freight, fraught, fruit, and fright
all have different vowel sounds. There are far more vowel sounds than there are
letters used to describe them: a, e, i, o, u, and sometimes y.
Poetry Form Example: Limerick
Here is a stupendous work of limerick art. The lines have been numbered. The
last syllables of the lines must rhyme according to a particular scheme, and
there are two sets of rhyming words: rhyme, time, and slime form one set, and
instead and head form the other.
1. I wish I had thought of a rhyme
2. Before I ran all out of time!
3. I’ll sit here instead,
4. A cloud on my head
5. That rains ’til I’m covered with slime.
Limericks are five lines long. Lines 1, 2, and 5 have eight syllables and the
last syllables on these lines rhyme with each other. Lines 3 and 4 have five
syllables and the last syllables rhyme with each other. (There are additional
rules about the location and number of stressed vs. unstressed syllables, but
we’ll ignore those rules for this assignment; we will be counting syllables,
but not paying attention to whether they are stressed or unstressed.)
The CMU Pronouncing Dictionary
We’ll need a way to examine words and break them into syllables and
consonants. We’re going to use the Carnegie Mellon University Pronouncing
Dictionary ( http://www.speech.cs.cmu.edu/cgi-bin/cmudict ), a dictionary that
stores pronunciations instead of definitions. It uses a plain-text notation for
various sounds; the quickest way to get used to the notation is to go look at
some examples. You don’t need to memorize the notation, but it helps to see it.
Head to the CMU Pronouncing Dictionary
( http://www.speech.cs.cmu.edu/cgi-bin/cmudict ) now and look up a couple of
words, and see if you can interpret the results. Do contractions work? What
about possessives like “Rita’s”?
Now click the “Show Lexical Stress” checkbox and see how that changes the
results.
Here is the output for David (with “Show Lexical Stress” turned on):
D EY1 V IH0 D. There are five phonemes in the word David and each phoneme describes a
sound. The sounds are either vowel sounds or consonant sounds. We will refer
to phonemes that describe vowel sounds as vowel phonemes, and similarly for
consonants.
The phoneme notation was defined in a project called Arpabet
( http://en.wikipedia.org/wiki/Arpabet ) that was created by the Advanced
Research Projects Agency (ARPA)
( http://en.wikipedia.org/wiki/Advanced_Research_Projects_Agency ) back in
the 1970s.
We have downloaded a text file containing the CMU Pronouncing Dictionary: all
the words and their pronunciations. All vowel phonemes end in a 0, 1, or 2,
with the digit indicating a level of syllabic stress. Consonant phonemes do
not end in a digit. The number of syllables in a word is the same as the
number of vowel sounds in the word, so you can determine the number of
syllables in a word by counting the number of phonemes that end in a digit.
As an example, in the word “secondary” (S EH1 K AH0 N D EH2 R IY0), there
are 4 vowel phonemes, and therefore 4 syllables. The vowel phonemes are EH1,
AH0, EH2, and IY0.
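The counting rule above can be sketched in a few lines. This is only an illustration: the function name count_syllables is hypothetical and is not one of the required functions in the starter code.

```python
from typing import List


def count_syllables(word_phonemes: List[str]) -> int:
    """Return the number of vowel phonemes in word_phonemes.

    A vowel phoneme is one whose last character is a digit (0, 1, or 2).
    """
    return sum(1 for phoneme in word_phonemes if phoneme[-1].isdigit())


# The pronunciation of "secondary" quoted in the handout.
secondary = 'S EH1 K AH0 N D EH2 R IY0'.split()
print(count_syllables(secondary))  # 4
```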
In case you’re curious, 0 means unstressed, 1 means primary stress, and 2
means secondary stress. Try saying “secondary” out loud to hear for yourself
which syllables have stress and which do not. (In this assignment, your
program will not need to distinguish between the levels of syllabic stress.)
The assignment zip file includes dictionary.txt, which contains our version of
the Pronouncing Dictionary. You must use this file, not any files from the CMU
website, because our version differs slightly from the CMU version. We have
removed alternate pronunciations for words, and we have removed words that do
not start and end with alphanumeric characters (like #HASH-MARK, #POUND-SIGN,
and #SHARP-SIGN). Open dictionary.txt to see the format; notice that
any line beginning with ;;; is a comment.
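As a rough sketch of how such a file might be read, the code below skips comment lines and splits every other line into a word and its phonemes. It assumes that each entry is a single line holding the word followed by whitespace-separated phonemes; check dictionary.txt to confirm the layout. The function name parse_pronunciations is hypothetical, and the sample text is made up for illustration.

```python
from io import StringIO
from typing import Dict, List, TextIO


def parse_pronunciations(dict_file: TextIO) -> Dict[str, List[str]]:
    """Return a mapping from each word in dict_file to its list of phonemes.

    Assumed layout: one entry per line, word first, then its phonemes,
    all separated by whitespace. Lines starting with ;;; are comments.
    """
    word_to_phonemes = {}
    for line in dict_file:
        line = line.strip()
        if not line or line.startswith(';;;'):
            continue  # skip blank lines and comments
        word, *phonemes = line.split()
        word_to_phonemes[word] = phonemes
    return word_to_phonemes


# A made-up two-entry sample in the assumed layout.
sample = StringIO(';;; a comment line\n'
                  'DAVID D EY1 V IH0 D\n'
                  'SECONDARY S EH1 K AH0 N D EH2 R IY0\n')
pronunciations = parse_pronunciations(sample)
print(pronunciations['DAVID'])  # ['D', 'EY1', 'V', 'IH0', 'D']
```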
The words in dictionary.txt are all uppercase and do not contain surrounding
punctuation. When your program looks up a word, use the uppercase form, with
no leading or trailing punctuation. Function clean_up in the starter code file
poetry_functions.py will be helpful here.
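To illustrate the kind of normalization described above, here is a minimal sketch. It is not the implementation of clean_up (read the starter code for that); normalize_word is a hypothetical name, and the sketch assumes that only ASCII punctuation needs to be stripped.

```python
import string


def normalize_word(word: str) -> str:
    """Return word in uppercase with leading and trailing ASCII punctuation
    and whitespace removed. Punctuation inside the word is kept.
    """
    return word.strip(string.punctuation + string.whitespace).upper()


print(normalize_word('slime.'))  # SLIME
print(normalize_word("I'll,"))   # I'LL
```

Note that string.punctuation does not include curly quotes such as ’, so text copied from a web page may need extra handling.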
Describing Poetry Forms
Here is our limerick poetry form:
Limerick
8 A
8 A
5 B
5 B
8 A
On each line, the first piece of information is a number that indicates the
number of syllables required on that line of the poem. The second piece of
information on each line is a letter that indicates the rhyme scheme. Here,
lines 1, 2, and 5 must rhyme with each other because they’re all marked with
the same letter ( A ), and lines 3 and 4 must rhyme with each other because
they’re both marked with the same letter ( B ). (Note that the choice to use
the letters A and B was arbitrary. Other letters could have been used to
describe this rhyme scheme.)
Two lines of a poem rhyme with each other when the last syllables of the last
words on the two lines rhyme. Two syllables rhyme when their vowels are
the same and they end in the same sequence of consonant phonemes, like gosh and
wash.
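One way to turn this definition into code is sketched below. The helper names are hypothetical, the pronunciations are written out by hand for illustration (look words up in dictionary.txt in your own code), and the sketch assumes that two vowel phonemes count as "the same" only when they match exactly, stress digit included. The handout does not state this, so treat it as an assumption.

```python
from typing import List


def last_syllable(word_phonemes: List[str]) -> List[str]:
    """Return the last vowel phoneme of word_phonemes together with any
    consonant phonemes that follow it, or [] if there is no vowel phoneme.
    """
    for i in range(len(word_phonemes) - 1, -1, -1):
        if word_phonemes[i][-1].isdigit():  # vowel phonemes end in a digit
            return word_phonemes[i:]
    return []


def words_rhyme(first: List[str], second: List[str]) -> bool:
    """Return True if the last syllables of the two words rhyme."""
    first_tail = last_syllable(first)
    return first_tail != [] and first_tail == last_syllable(second)


# Hand-written pronunciations, for illustration only.
gosh = ['G', 'AA1', 'SH']
wash = ['W', 'AA1', 'SH']
head = ['HH', 'EH1', 'D']
print(words_rhyme(gosh, wash))  # True
print(words_rhyme(gosh, head))  # False
```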
Some poetry forms don’t require lines that rhyme. For example, the haiku form
has 5 syllables in the first line, 7 in the second line, and 5 in the third
line, but there are no rhyme requirements. Here is an example:
Dan’s hands are quiet.
Soft peace surrounds him gently:
No thought moves the air.
And another one:
Jen sits quietly,
Thinking of assignment three.
All ideas bad.
We’ll indicate the lack of a rhyme requirement by using the symbol * . Here is
our poetry form description for the haiku poetry form:
Haiku
5 *
7 *
5 *
Some poetry forms have rhyme requirements but don’t have a specified number of
syllables per line. Quintain (English) is one such example; these are 5-line
poems with an ABABB rhyme scheme, but with no syllable requirements. Here is
our poetry form description for the Quintain (English) poetry form (notice
that 0 is used to indicate that there is no requirement on the number of
syllables in the line):
Quintain (English)
0 A
0 B
0 A
0 B
0 B
Here’s an example of a Quintain (English) from Percy Bysshe Shelley’s Ode To A
Skylark:
Teach us, Sprite or Bird,
What sweet thoughts are thine:
I have never heard
Praise of love or wine
That panted forth a flood of rapture so divine.
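The syllable requirements in a form description, including the convention that 0 means "no requirement", could be checked along the following lines. This is a sketch under assumptions: the function name is hypothetical, and it takes the per-line syllable counts as already computed.

```python
from typing import List


def lines_with_wrong_syllable_count(actual: List[int],
                                    required: List[int]) -> List[int]:
    """Return the 1-based numbers of the lines whose syllable count does not
    match the required count. A required count of 0 means no requirement.
    """
    bad_lines = []
    for line_number, (count, needed) in enumerate(zip(actual, required),
                                                  start=1):
        if needed != 0 and count != needed:
            bad_lines.append(line_number)
    return bad_lines


haiku_counts = [5, 7, 5]
print(lines_with_wrong_syllable_count([5, 7, 5], haiku_counts))  # []
print(lines_with_wrong_syllable_count([5, 6, 5], haiku_counts))  # [2]
# A form with no syllable requirements accepts any counts.
print(lines_with_wrong_syllable_count([5, 5, 5, 5, 12], [0] * 5))  # []
```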
Your program will read a poetry form description file containing a list of
poetry form names and their poetry form descriptions. For each poetry form in
the file:
- the first line gives the name of the poetry form
- subsequent lines contain the number of syllables and rhyme scheme for each line of poetry
- each poetry form is separated from the next by a blank line
The poetry form names given in a poetry form description file are all
different.
We have provided poetry_forms.txt as an example poetry form description file.
We will test your code with other poetry form descriptions as well.
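The file layout described above can be read with a single pass over the lines. The sketch below is one possible approach, not the required implementation: parse_poetry_forms is a hypothetical name, and the sketch assumes that every description line holds exactly one number and one rhyme symbol separated by whitespace.

```python
from io import StringIO
from typing import Dict, List, TextIO, Tuple


def parse_poetry_forms(forms_file: TextIO) -> Dict[str, Tuple[List[int], List[str]]]:
    """Return a mapping from each form name in forms_file to a tuple of
    (syllable counts, rhyme scheme symbols), one entry per line of the form.
    """
    forms = {}
    current_name = None
    for line in forms_file:
        line = line.strip()
        if not line:
            current_name = None           # a blank line ends the current form
        elif current_name is None:
            current_name = line           # first line of a form is its name
            forms[current_name] = ([], [])
        else:
            count, symbol = line.split()  # e.g. '8 A' or '5 *'
            forms[current_name][0].append(int(count))
            forms[current_name][1].append(symbol)
    return forms


sample_forms = StringIO('Limerick\n8 A\n8 A\n5 B\n5 B\n8 A\n'
                        '\n'
                        'Haiku\n5 *\n7 *\n5 *\n')
forms = parse_poetry_forms(sample_forms)
print(forms['Haiku'])  # ([5, 7, 5], ['*', '*', '*'])
```

Because the name is taken as the whole first line, names containing spaces, such as Quintain (English), are handled without special cases.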
Stanza-based poetry
Many poetry forms don’t have a fixed number of lines. Instead, they specify
what a stanza looks like, and then the poetry is made up of as many stanzas as
the poet likes.
As an example drawn from Narodnaya Volya literature, here are the first two
stanzas of a poem called The Beauteous Terrorist. The author, Henry Parkes,
was inspired by Sophia Perovskaia, a prominent member of the Narodnaya Volya,
to write the poem. Each stanza follows a simple ABAB rhyme scheme.
SOFT as the morning’s pearly light,
Where yet may rise the thunder cloud,
Her gentle face was ever bright
With noble thought and purpose proud.
Dreamt ye that those divine blue eyes,
That beauty free from pride or blame,
Were fashioned but to terrorize
O’er Despot’s power of sword and flame?
We will not consider stanza-based poems in this assignment.
Data Representation
We use the following Python definitions to create new types relevant to the
problem domain. Read the comments in starter code file poetry_constants.py for
detailed descriptions with examples.
Type name | Definition |
---|---|
POETRY_FORM | Tuple[List[int], List[str]] |
POETRY_FORMS | Dict[str, POETRY_FORM] |
CLEAN_POEM | List[List[str]] |
WORD_PHONEMES | List[str] |
LINE_PRONUNCIATION | List[WORD_PHONEMES] |
POEM_PRONUNCIATION | List[LINE_PRONUNCIATION] |
PRONOUNCING_DICTIONARY | Dict[str, WORD_PHONEMES] |
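For orientation, the table above could be written as type aliases with the typing module, as sketched here. The definitions in poetry_constants.py are authoritative and your code should import them from there. The example values are made up, and the ordering of the POETRY_FORM tuple (syllable counts first, then rhyme symbols) is an assumption based on the form description layout.

```python
from typing import Dict, List, Tuple

# Sketch of the aliases listed in the table; see poetry_constants.py
# for the real definitions and their documentation.
POETRY_FORM = Tuple[List[int], List[str]]
POETRY_FORMS = Dict[str, POETRY_FORM]
CLEAN_POEM = List[List[str]]
WORD_PHONEMES = List[str]
LINE_PRONUNCIATION = List[WORD_PHONEMES]
POEM_PRONUNCIATION = List[LINE_PRONUNCIATION]
PRONOUNCING_DICTIONARY = Dict[str, WORD_PHONEMES]

# Made-up example values (assumed order: syllable counts, then rhyme symbols).
limerick: POETRY_FORM = ([8, 8, 5, 5, 8], ['A', 'A', 'B', 'B', 'A'])
example_forms: POETRY_FORMS = {'Limerick': limerick}
david: WORD_PHONEMES = ['D', 'EY1', 'V', 'IH0', 'D']
example_dictionary: PRONOUNCING_DICTIONARY = {'DAVID': david}
```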
A note on StringIO
So far in this course, we have been using TextIO to read and write files.
StringIO works a lot like TextIO, but input comes from a string rather than
from a file. It has all the methods that we have used with TextIO,
including read() and readlines(). For a comprehensive list, feel free to
call help on StringIO in Python!
For example:
>>> from io import StringIO
>>> test_string = "1\n2\n3"
>>> print(test_string)
1
2
3
>>> string_io = StringIO(test_string)
>>> for line in string_io.readlines():
...     print(line.strip())
1
2
3
So why are we using this?
In situations where an IO object is expected, rather than creating a new file,
writing text, and closing it, we can directly pass in a string! There are some
other differences that are beyond the scope of this course. If you are
interested, you can read the official Python documentation:
https://docs.python.org/3.7/library/io.html#io.StringIO
Required Functions
This section contains a table with detailed descriptions of the functions that
you must complete in the two starter code files. You’ll need to add a second
example to the docstrings for each function in the starter code.
For all poetry samples used in this assignment, you should assume that all
words in the poems will appear as keys in the pronouncing dictionary. We will
test with other pronouncing dictionaries, but we will always follow this rule.
You should follow the approach we’ve been using on large problems recently and
write additional helper functions to break these high-level tasks down. Each
helper function must have a clear purpose. Each helper function must have a
complete docstring produced by following the Function Design Recipe. You
should test your helper functions to make sure they work!
A3 Checker
We are providing a checker module (a3_checker.py) that tests two things:
- whether your code follows the Python Style Guidelines, and
- whether your functions are named correctly, have the correct number of parameters, and return the correct types.
To run the checker, open a3_checker.py and run it. Be sure to scroll up to the top and read
all messages.
If the checker passes for both style and types:
- Your code follows the style guidelines.
- Your function names, number of parameters, and return types match the assignment specification.
This does not mean that your code works correctly in all situations. We will run a different set of tests on your code once you hand it in, so be sure to thoroughly test your code yourself before submitting.
If the checker fails, carefully read the message provided:
- It may have failed because your code did not follow the style guidelines. Review the error description(s) and fix the code style. Please see the PyTA documentation for more information about errors.
- It may have failed because:
  - you are missing one or more functions,
  - one or more of your functions is misnamed,
  - one or more of your functions has the incorrect number or type of parameters, or
  - one or more of your function return types does not match the assignment specification.
Read the error message to identify the problematic function, review the
function specification in the handout, and fix your code. Make sure the
checker passes before submitting.
Running the checker program on MarkUs
In addition to running the checker program on your own computer, run the
checker on MarkUs as well. You will be able to run the checker program on
MarkUs once every 12 hours (note: we may have to revert to every 24 hours if
MarkUs has any issues handling every 12 hours). This can help to identify
issues such as uploading the incorrect file.
First, submit your work on MarkUs. Next, click on the “Automated Testing” tab
and then click on “Run Tests”. Wait for a minute or so, then refresh the
webpage. Once the tests have finished running, you’ll see results for the
Style Checker and Type Checker components of the checker program (see both the
Automated Testing tab and results files under the Submissions tab). Note that
these are not actually marks – just the checker results. If there are errors,
edit your code, run the checker program again on your own machine to check
that the problems are resolved, resubmit your assignment on MarkUs, and (if
time permits) after the waiting period has elapsed, rerun the checker on
MarkUs.
Testing your Code
It is strongly recommended that you test each function as soon as you write
it. As usual, follow the Function Design Recipe (we’ve provided the function
name and types for you) to implement your code. Once you’ve implemented a
function, run it against the examples in your docstrings and the unit tests
you’ve defined.
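As an illustration of this workflow, the sketch below tests a small hypothetical function (count_vowel_phonemes is not one of the required functions) in two ways: by running its docstring example with doctest, and by running a few unit tests. It assumes the unittest module; use whichever testing tool your course materials specify.

```python
import doctest
import unittest
from typing import List


def count_vowel_phonemes(word_phonemes: List[str]) -> int:
    """Return the number of phonemes in word_phonemes that end in a digit.

    >>> count_vowel_phonemes(['D', 'EY1', 'V', 'IH0', 'D'])
    2
    """
    return len([p for p in word_phonemes if p[-1].isdigit()])


class TestCountVowelPhonemes(unittest.TestCase):
    """Unit tests covering an empty list, no vowels, and several vowels."""

    def test_empty_list(self) -> None:
        self.assertEqual(count_vowel_phonemes([]), 0)

    def test_no_vowel_phonemes(self) -> None:
        self.assertEqual(count_vowel_phonemes(['S', 'T']), 0)

    def test_several_vowel_phonemes(self) -> None:
        phonemes = 'S EH1 K AH0 N D EH2 R IY0'.split()
        self.assertEqual(count_vowel_phonemes(phonemes), 4)


# Run the docstring example; nothing is printed when it passes.
doctest.run_docstring_examples(count_vowel_phonemes, globals())

# Run the unit tests and report whether they all passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCountVowelPhonemes)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```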
How to tackle this assignment
Principles
- To avoid getting overwhelmed, deal with one function at a time. Start with functions that don’t call any other functions; this will allow you to test them right away. The steps listed below give you a reasonable order in which to write the functions.
- For each function that you write, start by adding at least one example call to the docstring before you write the function.
- Keep in mind throughout that any function you have might be a useful helper for another function. Part of your marks will be for taking advantage of opportunities to call an existing function.
- As you write each function, begin by designing it in English, using only a few sentences. If your design is longer than that, shorten it by describing the steps at a higher level that leaves out some of the details. When you translate your design into Python, look for steps that are described at such a high level that they don’t translate directly into Python. Design a helper function for each of these high-level steps, and put a call to the helpers into your code. Don’t forget to write a great docstring for each helper!