Java assignment help: CSCI 1300 Working with real data


Introduction

This assignment involves processing a real data set and displaying the data according to the given requirements.
In this assignment you will have the opportunity to apply what you’ve learned
this semester about programming to an actual problem and actual data. For this
assignment we will use social media data collected during the 2014 Carlton
Complex Wildfire in Eastern Washington State. This data set was part of my
dissertation research on the integration of public social media communication
into emergency response.
The development of information and communication technologies (ICTs) has
changed how members of the public communicate and share information with each
other during crisis and disaster events. Researchers in the field of crisis
informatics look at social media communications for insight into how these
technologies are reshaping the information space surrounding a disaster and
providing new ways for the public to participate in both sharing of information
and response. My research focuses on the challenges faced by emergency
responders as they work to leverage these channels as part of their emergency
communications plan and also the solutions being developed to support the
monitoring of an often complex and unwieldy information space as events
unfold.
I work with an innovative group of emergency responders who are part of the
social media in emergency management community (SMEM) that have pioneered a
new form of digital volunteerism within the emergency response community
called a Virtual Operational Support Team (VOST). Members of VOST teams have a
mix of social media communication skills and training in public information
work and emergency response protocols. During a disaster, a VOST extends the
resources of the emergency response team on the ground by coordinating public
social media communications and gathering relevant situational awareness
information for the incident management team.
This dataset was taken from the 2014 Carlton Complex Wildfire. The fire
started on July 14th, when a lightning storm moved through the Methow Valley
in Eastern Washington State. On July 17th, adverse weather conditions caused
the fire to grow explosively overnight from approximately forty-nine thousand
acres to over a hundred and fifty thousand acres. This rate of fire growth is
somewhat unprecedented and the fires burned through the towns of Pateros and
Twisp resulting in large-scale evacuations and the destruction of over 300
homes. The fire also destroyed critical infrastructure resulting in widespread
power and cellular outages in many places for over a week. The data set for
this fire starts on July 17th, when Portland NIMO, a federal Type I team, was
assigned to the fire and the NIMO VOST was activated, and runs until July 27th,
when the team stood down. The fire ultimately grew to 256 thousand acres, making
it the largest wildfire in Washington State history at the time (it was eclipsed
by the 2015 Okanogan Complex in the same area a year later).
As a researcher on CU’s Project EPIC, my role on the VOST was to provide
analytical support to the public information team on the ground using data
collected through the Twitter API. I developed Python scripts that expanded
the links to embedded content and massaged the data in useful ways for
analysis in Tableau, a data visualization tool. At the end of each day, I
worked on a comprehensive summary that was forwarded to the public information
team as a reference for the morning briefing the following day.
Twitter is a particularly interesting platform for analysis during a disaster
because the Twitter stream can show you what is relevant in the moment across
a wide variety of sources. The ability to retweet information reinforces its
currency and acts as a recommendation to others in a Twitterer’s network or to
those following the conversation. In addition, the ability to embed links and media
provides visibility to what is being shared across multiple social media
platforms simultaneously.

Data Set Description

The full dataset for this fire contains over 24 thousand tweets and related
information. I have created multiple data extract files from this dataset so
that you can work with information on a more manageable scale.
As part of the analysis, we coded the most commonly occurring sources of
information (Twitter accounts and URL domains) using the following values:

Source Category : Description
  • Business/org : Business or organization (e.g. Sierra Club).
  • Business/org - local : Local business or organization (e.g. Methow Chamber of Commerce).
  • EM/Fire Tweeter : Any account used primarily to share information about fire/disaster but not an official account (e.g. personal account for fire personnel, member of SMEM community).
  • Fundraising : Account used primarily for fundraising purposes (e.g. GoFundMe). This is important for the public information team because they are on the lookout for potential fraud.
  • Individual : Account belonging to an individual who is either not from the geographic region or whose location we can’t determine.
  • Individual - local : Individual local to the fire.
  • Individual - pnw : Individual from the Pacific Northwest region but no evidence that they are local.
  • Individual - WA : Resident of Washington State but no evidence that they are local to the fire.
  • Media : Official account for a media source or personal account for media personnel.
  • News Tweeter : Accounts that mass tweet links to trending news topics. It is often a fine line between these types of accounts and spam. These have been excluded from the extracts for the project.
  • Official - Civic : Official agency not related to emergency response organizations. Typically official city government agencies and personal accounts for civic figures (e.g. the Mayor).
  • Official EM/Fire : Official accounts used to share public information surrounding the disaster.
  • Official Other : Official organizations that don’t fall into response or civic organizations.
  • Social Media : Social media sources (e.g. YouTube, Instagram, etc.).
  • Spam/Other : Sources resulting either from noise related to search terms or that are hijacking trending hashtags.
  • VOST : Personal accounts for individual VOST team members.
  • Unknown : A tweet that has not yet been coded.

Tweet extracts

Each row in the tweet extracts is an individual tweet and contains the
following columns:

Column Name : Description
  • Row : Row identifier in data set.
  • Text : The content of the tweet (max length 120 characters).
  • Original Tweet : Link to the original tweet.
  • Local Date : The date translated to local time zone (format month/day).
  • Local Hour : Hour translated to local time zone.
  • Local Minute : Minute translated to local time zone.
  • Is Retweet : Boolean value, true if tweet is a retweet, false otherwise.
  • Retweet Count : Number of times tweet was retweeted at end of data collection.
  • Screen Name : User screen name on Twitter.
  • User Class : Indicates Twitter account type (see table above).
  • User ID : Unique ID for user on Twitter.
  • User Link : Link to user account on Twitter.
  • Coordinates : Latitude/longitude values for geocoded tweets.
  • URL : Fully expanded link for embedded content. If you see a hyperlink in a tweet then this is the link to it.
  • URL Domain : The URL domain.
  • URL Domain Class : Source classification for domain (e.g. media, official em/fire).
  • Media Screen Name : Twitter source for embedded content.
  • Media URL : Link to embedded content. If you see the photo/video in the tweet then this is the link to it.
  • Media URL User Class : User class for source.
Individual tweet extracts include:
  • allTweets.csv : all tweets in the collection
  • geocodedTweets.csv : all tweets that were geocoded in the collection
  • individualLocalTweets.csv : sources most likely to contain individual and local info
  • offlEMandFireTweets.csv : tweets coming from official sources and EM/Fire Tweeter accounts
  • noRetweets.csv : all original tweets, no retweets
NOTE: Spam/Other/News Tweeter sources are filtered out of these extracts.
Other data files:
  • twitterers.csv : All Twitter accounts (user class, user ID, User Link & Records)
  • domains.csv : All domains (URL domain, URL domain class & Records)
  • URLs.csv : All expanded URLs (URL, URL domain, URL domain class, & Records)
  • socialMediaURLs.csv : All social media links (URL, URL domain & Records)
  • mentions.csv : All mentions (Mention, Mention User Class, & Records)
  • offlMentions.csv : Mentions of official accounts by User Class

What Your Program Needs to Do

In this project, your program needs to extract interesting information from
the data and display it for the user. Some ideas for interesting information
include:

  • Create a bounding box and compute what percentage of geocoded tweets fall within this area (you can create multiple bounding boxes, e.g. 50 miles, 100 miles, etc.). A bounding box is a rectangular area defined by a north and south latitude and an east and west longitude. The coordinates for the center of the fire are (48.211 latitude, -120.103 longitude). I will provide you with the code to calculate the east, west, north, and south boundaries for a bounding box. Unless, of course, you want to take this on yourself and then we will applaud your efforts!
  • Look at links to social media to see what platforms were the most popular for sharing information (e.g. YouTube, Facebook, Instagram, etc.). What were the most popular posts?
  • What sorts of media do people tend to embed in their tweets and what are the most popular sources of information?
  • Who tweeted the most (top ten vs. top per user class) and what class of account are they?
  • What sources were mentioned and retweeted the most during the fire?
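As a rough illustration of the bounding box idea, here is a minimal sketch. It is not the boundary code the instructor will provide: it assumes the common approximation that one degree of latitude is about 69 miles and that a degree of longitude shrinks by the cosine of the latitude. The names BoundingBox, Point, makeBoundingBox, contains, and percentInside are made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper types; the names are illustrative, not from the assignment.
struct BoundingBox {
    double north, south, east, west;
};

struct Point {
    double latitude, longitude;
};

// Assumption: one degree of latitude is roughly 69 miles, and one degree of
// longitude is roughly 69 * cos(latitude) miles. Adequate far from the poles.
BoundingBox makeBoundingBox(double centerLat, double centerLon, double miles) {
    const double kPi = 3.14159265358979323846;
    const double kMilesPerDegree = 69.0;
    double latDelta = miles / kMilesPerDegree;
    double lonDelta = miles / (kMilesPerDegree * std::cos(centerLat * kPi / 180.0));
    return BoundingBox{centerLat + latDelta, centerLat - latDelta,
                       centerLon + lonDelta, centerLon - lonDelta};
}

// True when the point lies on or inside the box edges.
bool contains(const BoundingBox& box, const Point& p) {
    return p.latitude <= box.north && p.latitude >= box.south &&
           p.longitude <= box.east && p.longitude >= box.west;
}

// Returns the percentage (0 to 100) of points inside the box; 0 for an empty list.
double percentInside(const BoundingBox& box, const std::vector<Point>& points) {
    if (points.empty()) return 0.0;
    std::size_t inside = 0;
    for (const Point& p : points) {
        if (contains(box, p)) ++inside;
    }
    return 100.0 * static_cast<double>(inside) / static_cast<double>(points.size());
}
```

A call such as makeBoundingBox(48.211, -120.103, 50.0) would give a box that reaches about 50 miles from the fire's center in each direction; the points would come from the Coordinates column of geocodedTweets.csv.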

Start with a description of what your program does

There is no COG for this assignment; the TAs will be grading everyone’s
project by hand. Your TA needs to know what your program does when they run
it. The first thing your program needs to do is print a welcome message to the
user that concisely explains the program’s functionality. For example, your
program might print something like:
Welcome. This program calculates the percentage of geocoded tweets that
fall within a specified distance from the 2014 Carlton Complex Wildfire.

Get user input from at least one menu

There should be a menu in your introduction that asks for input from the user.
You are welcome to add additional menus if you need additional input from the
user. For example, after displaying the welcome message you could display a
menu to ask the user to specify the distance:
Enter the distance:
1) Within 50 miles
2) Within 100 miles
3) Within 200 miles
4) Within 500 miles
5) Within 1000 miles

Present results and ask for another query

Using the input from the user, display the results in a neatly formatted
message such as:
The percentage of geocoded tweets that fell within 50 miles: 52%
After displaying the message, your program needs to ask the user if they would
like to perform any more calculations. If the user says Yes, you should
display the first menu again. If the user says No, you should display an exit
message and exit the program. The details of the exit message are described
below.

A final message

After the user selects No, and you exit your loop, you need to print another
message to the user. In this message, briefly explain the easiest, hardest,
and most and least enjoyable portions of this project. Then, exit the program.

Implementation Details/ Technical Requirements

Store data from the files in an object

Your program needs to have at least one class. A technical requirement of this
project is that you create a class to support the functionality of the
program. The class(es) you create will depend on the problem and data you are
working with. For instance, if you are working with individual tweets you may
need a Tweet class. If you are working with geocoded tweets you may also want a
Geocode class that stores the latitude /longitude data.
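A minimal sketch of what such classes might look like follows. The member names and the set of fields are assumptions chosen for illustration; your own classes should hold whichever columns your program actually uses. All data members are private and reached through public get/set methods.

```cpp
#include <string>

// Illustrative classes; the member and method names are assumptions.
class Geocode {
public:
    Geocode() : latitude(0.0), longitude(0.0) {}
    Geocode(double lat, double lon) : latitude(lat), longitude(lon) {}

    double getLatitude() const { return latitude; }
    void setLatitude(double lat) { latitude = lat; }
    double getLongitude() const { return longitude; }
    void setLongitude(double lon) { longitude = lon; }

private:
    double latitude;
    double longitude;
};

class Tweet {
public:
    Tweet() : retweetCount(0) {}

    std::string getText() const { return text; }
    void setText(const std::string& value) { text = value; }
    std::string getUserClass() const { return userClass; }
    void setUserClass(const std::string& value) { userClass = value; }
    int getRetweetCount() const { return retweetCount; }
    void setRetweetCount(int value) { retweetCount = value; }
    Geocode getLocation() const { return location; }
    void setLocation(const Geocode& value) { location = value; }

private:
    std::string text;
    std::string userClass;
    int retweetCount;
    Geocode location;  // stays at (0, 0) for tweets that were not geocoded
};
```

Storing a Geocode inside Tweet is one design option; keeping a separate list of Geocode objects would work just as well for the bounding box idea.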
The first thing your program needs to do, even before displaying the welcome
message, is read the data from the data files. Data should be read in from the
files and stored in the appropriate variable in your class to support what
your program does. You should structure your program to read in all data only
one time.

Other requirements:

  1. All variables in your class need to be private and accessed through public methods. For example, if one of the class variables is latitude then you will need getLatitude() and setLatitude() methods.
  2. You need at least three objects. For example if you create a class Tweet, then you need at least three instances of Tweet in your program.
  3. You are welcome to generate new data files to support your program’s functionality. For example, if you are working with the URL domains extract, you may want to limit your analysis to domains that occur at least 25 times. The data in these sub-extracts is sorted by count, so you can import the .csv file into Excel and delete the rows that fall below 25. You can also write a program or talk to us about the specific slice of data you are interested in.
  4. If you store data in an array, you can create an array that is larger than you need and leave some of it unused. Look at the arrays in the AppleFarmer class for an example of what you might do for this assignment. You will need to keep track of how much of the array is used. The technique for doing this is the same as using the CurrentDay variable in AppleFarmer.
  5. The easiest way to read the .csv files is to use getline() for each line in the file and then use stringstream to parse the line. There are examples of how to do both of these things in the notes provided on Moodle.
  6. When you submit your program, include all data files you used in your project directory.
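To make requirements 4 and 5 concrete, here is a hedged sketch that reads a file shaped like domains.csv into an oversized array and returns how many slots are in use. It assumes three comma-separated fields per line (domain, class, record count) with no commas inside a field; the real extracts may differ, and tweet text in particular can contain commas, which this simple split does not handle. Domain, parseDomainLine, loadDomains, and MAX_DOMAINS are names invented for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Illustrative class for one row of a domains extract.
class Domain {
public:
    Domain() : records(0) {}
    std::string getName() const { return name; }
    void setName(const std::string& value) { name = value; }
    std::string getDomainClass() const { return domainClass; }
    void setDomainClass(const std::string& value) { domainClass = value; }
    int getRecords() const { return records; }
    void setRecords(int value) { records = value; }

private:
    std::string name;
    std::string domainClass;
    int records;
};

// Parses a line shaped like "example.com,Media,42" (made-up values).
// Returns false when a field is missing or the count is not a number, which
// also skips a header row whose last field is text.
bool parseDomainLine(const std::string& line, Domain& out) {
    std::stringstream ss(line);
    std::string name, domainClass, recordsText;
    if (!std::getline(ss, name, ',')) return false;
    if (!std::getline(ss, domainClass, ',')) return false;
    if (!std::getline(ss, recordsText)) return false;
    std::stringstream number(recordsText);
    int records = 0;
    if (!(number >> records)) return false;
    out.setName(name);
    out.setDomainClass(domainClass);
    out.setRecords(records);
    return true;
}

const int MAX_DOMAINS = 1000;  // assumed upper bound; larger than needed

// Fills the array from the stream and returns how many slots are used.
int loadDomains(std::istream& in, Domain domains[], int capacity) {
    std::string line;
    int used = 0;
    while (used < capacity && std::getline(in, line)) {
        if (parseDomainLine(line, domains[used])) ++used;
    }
    return used;
}
```

In a program you might open the file once with std::ifstream (from <fstream>), pass it to loadDomains along with an array of MAX_DOMAINS elements, and keep the returned count for every later loop over the array.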

Author: SafePoker
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit SafePoker as the source when reposting.