Implement a Movie Review Sentiment Analysis system. Starter code is provided,
so implement each method step by step according to its stated purpose.
Goal
Sentiment analysis is a Big Data problem that seeks to determine the general
attitude of a writer given some text they have written. For instance, we would
like to have a program that could look at the text “The film was a breath of
fresh air” and realize that it was a positive statement while “It made me want
to poke out my eye balls” is negative.
One algorithm that we can use for this is to assign a numeric value to any
given word based on how positive or negative that word is and then score the
statement based on the values of the words. But, how do we come up with our
word scores in the first place?
That’s the problem that we’ll solve in this assignment. You will write a
program that reads a file containing movie reviews from the Rotten Tomatoes
(http://www.rottentomatoes.com/) website
that have both a numeric score as well as text. You’ll use this to learn which
words are positive and which are negative. Then you’ll implement methods that
can compute the average value of any sequence of words using this data.
Starting Materials
Download the scenario for this assignment, which contains the images you need:
program5.zip
The scenario you download does not contain any classes, and will appear
completely bare. The files it contains are all support files (image files and
a data file). You’ll have to create all the classes yourself.
The downloadable project for this assignment includes a data file containing
the reviews you will use as the basis for this assignment. Note that each
review starts with a number 0 through 4 with the following meaning:
- 0: negative
- 1: somewhat negative
- 2: neutral
- 3: somewhat positive
- 4: positive
Classes You Create
For this assignment, you will create three classes. Combined, they’ll look
something roughly like this.
WordSentiment
This class represents basic statistics about a single word that appears in
reviews. The idea is that you will be loading reviews that consist of numeric
ratings together with a series of words (the body of the review). For each
word, you’ll want to remember the review scores associated with it, along with
how frequently it occurs. Internally, the data associated with a word can be
stored as two integers: a count of the number of times the word has been seen
across all reviews, together with a sum of all the review scores for all the
reviews this word appears in. This class should provide the following methods:
- WordSentiment(): This default constructor simply initializes the internal data (all counts/accumulators should start at zero).
- int getCount(): Returns the number of times the word has occurred across all reviews.
- int getSumOfReviewScores(): Returns the sum of all the review scores for all the reviews this word appears in.
- double getSentimentScore(): Returns the average sentiment score for this word, which is equal to the sum of review scores divided by the count of occurrences of the word. If no occurrences have yet been recorded, return a neutral sentiment value of 2.0.
- recordOccurrence(int score): This method takes a review score as a parameter and “records” one occurrence of the word associated with the given review score. You should call this method once each time the word is seen in a given review.
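The method list above can be sketched in plain Java as follows. This is a minimal illustration rather than the reference solution: the field names count and sumOfReviewScores are assumptions, and the class is left package-private only so the snippet compiles on its own.

```java
// Minimal sketch of WordSentiment; field names are illustrative assumptions.
class WordSentiment {
    private int count;              // times the word has been seen across all reviews
    private int sumOfReviewScores;  // sum of the scores of the reviews it was seen in

    public WordSentiment() {
        count = 0;
        sumOfReviewScores = 0;
    }

    public int getCount() {
        return count;
    }

    public int getSumOfReviewScores() {
        return sumOfReviewScores;
    }

    public double getSentimentScore() {
        if (count == 0) {
            return 2.0;  // no data yet: neutral
        }
        // Cast before dividing so the average is not truncated to an int.
        return (double) sumOfReviewScores / count;
    }

    public void recordOccurrence(int score) {
        count++;
        sumOfReviewScores += score;
    }
}
```

For example, recording one occurrence in a review scored 4 and one in a review scored 1 gives a count of 2, a sum of 5, and a sentiment score of 2.5.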
SmileyFace
This class is a subclass of Actor that displays a smiley (or frowny) face
representing a sentiment score between 0 and 4. Remember that actors can change
their images to affect what they look like on screen using the method
setImage(). The method setImage() takes a string as a parameter, where the
string is the name of an image file to use as the actor’s image. The starter
project for this assignment includes 5 images (with names “0.png” through
“4.png”) that represent the 5 different review scores. You can change an
actor’s image with a call like: this.setImage("0.png");
The SmileyFace class should provide the following methods:
- SmileyFace(): This default constructor simply initializes the internal data of the class, initializing the face’s “score” to 2.0 (neutral).
- double getSentimentScore(): Returns the current review score that controls this actor’s look.
- void setSentimentScore(double score): Changes the score that controls this actor’s look, including changing the image of this actor to reflect the score. The image chosen for the actor should be based on the rounded integer value of the sentiment score (for example, a score of 1.9 should result in a neutral image, not a negative image).
- boolean isPositive(): Returns true if the score this actor represents has a positive sentiment (a score that is greater than or equal to 2.5).
- boolean isNegative(): Returns true if the score this actor represents has a negative sentiment (a score that is less than 1.5).
- boolean isNeutral(): Returns true if the score this actor represents has a neutral sentiment, that is, it is neither positive nor negative (a score of at least 1.5 and less than 2.5).
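The rounding and threshold rules above can be checked with a small plain-Java stand-in. This is only a sketch under stated assumptions: the class name SmileyFaceLogic and the helper getImageName() are hypothetical, the score is assumed to stay within 0 to 4, and the real SmileyFace would extend Actor and call this.setImage(...) where the sketch stores the file name.

```java
// Plain-Java stand-in for the SmileyFace logic (hypothetical class name).
// The real class extends Actor; here the image name is stored instead of displayed.
class SmileyFaceLogic {
    private double score = 2.0;          // neutral by default
    private String imageName = "2.png";  // image matching the default score

    public double getSentimentScore() {
        return score;
    }

    public void setSentimentScore(double score) {
        this.score = score;
        // Math.round rounds halves up, so 1.9 -> 2 and 2.5 -> 3.
        // In the Actor subclass this line would be: this.setImage(...);
        this.imageName = Math.round(score) + ".png";
    }

    // Hypothetical helper, present only so the sketch can be inspected.
    public String getImageName() {
        return imageName;
    }

    public boolean isPositive() {
        return score >= 2.5;
    }

    public boolean isNegative() {
        return score < 1.5;
    }

    public boolean isNeutral() {
        return !isPositive() && !isNegative();
    }
}
```

With these limits, the neutral range lines up with the scores that round to 2, so a face showing "2.png" is always reported as neutral.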
SentimentAnalyzer
The SentimentAnalyzer is a subclass of World. You will need to devise your own
strategy for the order in which you implement the methods in this class. The
world class represents the logic for loading reviews to compute word
statistics and for computing the sentiment score for any provided string of
text. The primary data stored in this class should be a map from strings
(words) to WordSentiment objects that represent the accumulated data about the
corresponding word.
This class must provide the following methods:
- SentimentAnalyzer(): The default constructor should initialize the world using 72x72 pixel grid cells arranged in 5 rows by 8 columns. Your map will initially be empty. The constructor should place the smiley face and text shape on the screen, vertically centered.
- SmileyFace getFace(): This getter returns the smiley face actor belonging to this object.
- TextShape getText(): This getter returns the text shape visible on this object.
- loadReviews(Scanner input): This method takes a scanner as a parameter and loads all the reviews from the input source connected to the scanner. Remember that the reviews are arranged one per line, and each line begins with an integer representing the review’s score. You should repeatedly process all of the words occurring on the remainder of the line using the same review score. Note: any words that do not begin with a letter should be ignored (hint: recall that the Character.isLetter() method can help you).
- loadReviews(): This method is an overloaded version of loadReviews() that loads the reviews from the default data file (the one named “movieReviews.txt”). Be careful not to duplicate code.
- double sentimentOfWord(String word): This method uses the reviews you have loaded (if any) to compute the sentiment score for the given word. If no sentiment data is available for the given word (i.e., it was never seen in any reviews loaded so far), then return a neutral score: 2.0.
- double sentimentOf(String text): This method examines all the words in the given text and computes the average sentiment score over all of the words. This average represents the sentiment score for the entire text. Just as with loadReviews(), any word that starts with a non-letter should be ignored for the purposes of computing the sentiment score.
- show(String text): This method computes the average sentiment score for the given text, and uses this score to set the smiley face, while also placing the human-readable version of this score as the text in the displayed text shape.
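The loading and scoring methods above could look roughly like the following plain-Java sketch, which leaves out the World, SmileyFace, and TextShape parts. Several details are assumptions rather than requirements from the handout: the class name ReviewStats is hypothetical, each map value is an int pair (count, sum) standing in for a WordSentiment object, words are matched case-sensitively, lines that do not start with an integer are skipped, unseen words count as 2.0 in the average, and text with no usable words scores 2.0.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical stand-in for the data-handling half of SentimentAnalyzer.
class ReviewStats {
    // word -> {count, sum of review scores}; the assignment stores WordSentiment here.
    private final Map<String, int[]> words = new HashMap<>();

    public void loadReviews(Scanner input) {
        while (input.hasNextLine()) {
            Scanner line = new Scanner(input.nextLine());
            if (!line.hasNextInt()) {
                continue;  // blank or malformed line (assumption: skip it)
            }
            int score = line.nextInt();
            while (line.hasNext()) {
                String word = line.next();
                if (Character.isLetter(word.charAt(0))) {
                    int[] tally = words.computeIfAbsent(word, k -> new int[2]);
                    tally[0]++;         // one more occurrence
                    tally[1] += score;  // add this review's score
                }
            }
        }
    }

    public double sentimentOfWord(String word) {
        int[] tally = words.get(word);
        return tally == null ? 2.0 : (double) tally[1] / tally[0];
    }

    public double sentimentOf(String text) {
        double total = 0.0;
        int counted = 0;
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            String word = scanner.next();
            if (Character.isLetter(word.charAt(0))) {
                total += sentimentOfWord(word);
                counted++;
            }
        }
        // Assumption: text with no usable words is treated as neutral.
        return counted == 0 ? 2.0 : total / counted;
    }
}
```

Because loadReviews(Scanner) does all the work, the no-argument overload only needs to open a Scanner on the default data file and delegate to it, which avoids duplicating the parsing loop.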