Introduction
A basic Python assignment: complete the corresponding functions. It consists of two parts, Text Processing and Visualization.
Overview
In this assignment you will be developing a rudimentary system for visualising
traffic incidents in the Canberra region. You will do this by parsing tweets
from the ACTPol_Traffic twitter account to extract the location (street and
suburb) for incidents reported in each tweet. You will then plot these
locations on a map of Canberra such as the one shown below. All data and
starter code is provided.
Getting and Submitting the Assignment
The assignment is entirely managed from within the School’s GitLab server. You
will fork and checkout the assignment to work on it and then commit your
changes and push them back to the GitLab server to submit your solutions. You
should push back often. The latest version of your code available at the due
date will be considered as your assignment submission.
Forking and Cloning the Data and Code
Before working on the assignment you must first fork a copy on the GitLab
server. Once you have forked the project, you will need to clone the project
repository using the PyCharm IDE as you did in the first assignment.
Important: Make sure your fork of the assignment is private before you start
committing work to it, otherwise everyone using GitLab will be able to see
your work. This should happen automatically but it is always good practice to
check. You can set a Project to private via the GitLab interface:
- Go to the version of the assignment that you’ve forked from your Profile page.
- Click on the Gear drop-down box in the top right and choose “Edit Project”.
- Under “Visibility Level”, select “Private”.
- Click on the “Save changes” button.
Description of the Data
This section describes how we obtained the data for the assignment. It is
provided for your information. You do not need to implement the steps outlined
in this section. All data is provided in the data directory.
The ACTPol_Traffic file was obtained by dumping tweets posted from the
ACTPol_Traffic Twitter account. We did this using the ‘twitter-archiver’
application, which is available in the Python ‘twitter’ package, as follows:
sudo easy_install twitter
twitter-archiver -o ACTPol_Traffic
We repeat, this is for your information only. You do not need to execute the
steps above.
The file has one tweet per line. Each line starts with the following
space-separated fields, which are followed by the tweet text (running to the
end of the line):
- Unique tweet identifier
- Date
- Time
- Timezone (AEST)
- Twitter account in angled brackets
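As an illustration of the line layout described above, here is a minimal sketch of splitting one line into its fields. The `Tweet` type, the `parse_tweet_line` name and the sample line (including its date and time formats) are invented for this example and are not part of the starter code or the data file.

```python
from typing import NamedTuple


class Tweet(NamedTuple):
    tweet_id: str
    date: str
    time: str
    timezone: str
    account: str
    text: str


def parse_tweet_line(line: str) -> Tweet:
    """Split one line of the tweet dump into five header fields plus the text."""
    # maxsplit=5 keeps the tweet text, which itself contains spaces, in one piece.
    tweet_id, date, time, timezone, account, text = line.rstrip("\n").split(" ", 5)
    # The account field is wrapped in angled brackets, e.g. <ACTPol_Traffic>.
    return Tweet(tweet_id, date, time, timezone, account.strip("<>"), text)


# Invented sample line following the described layout (assumption, not real data).
sample = "123456789 2014-03-01 08:15:00 AEST <ACTPol_Traffic> Crash on Northbourne Ave, Dickson."
tweet = parse_tweet_line(sample)
```

A line with fewer than six space-separated parts raises `ValueError` in this sketch, so blank or truncated lines would need to be skipped before calling it.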
There are two versions of the map of Canberra, both obtained from Open Street
Map (http://www.openstreetmap.org/). Associated with each map is a text file
indicating the latitude and longitude range covered by the map. The format of
the lat/lon files is:
lat_max lat_min
lon_min lon_max
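The two-line format above could be read along these lines. This is only a sketch: the function name `parse_bounds` and the numeric values are assumptions made for the example, not the contents of the provided files.

```python
from typing import Tuple


def parse_bounds(text: str) -> Tuple[float, float, float, float]:
    """Return (lat_max, lat_min, lon_min, lon_max) from the two-line format."""
    lat_line, lon_line = text.strip().splitlines()[:2]
    # The first line holds the latitudes (max first), the second the longitudes.
    lat_max, lat_min = (float(value) for value in lat_line.split())
    lon_min, lon_max = (float(value) for value in lon_line.split())
    return lat_max, lat_min, lon_min, lon_max


# Invented values in the general vicinity of Canberra, for illustration only.
bounds = parse_bounds("-35.20 -35.40\n149.00 149.25\n")
```

Note that the latitude line lists the maximum first while the longitude line lists the minimum first, so the two lines cannot be unpacked in the same order.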
Canberra street locations were obtained via Google’s geocode API. The file
format includes the query address, returned address, latitude, longitude,
location type (from Google), and number of results found. The fields are
delimited by a vertical bar.
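A minimal sketch of reading such a bar-delimited file follows. The field names in `FIELDS`, the function name `parse_locations` and the sample record are invented for illustration; the sketch also assumes the file has no header row and that every record carries numeric latitude and longitude values, which may not hold for the real data.

```python
import csv
import io

# Hypothetical names for the six fields, in the order the text lists them.
FIELDS = ["query", "returned", "lat", "lon", "location_type", "num_results"]


def parse_locations(lines):
    """Yield one dict per record, converting the numeric fields."""
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        record = dict(zip(FIELDS, (field.strip() for field in row)))
        record["lat"] = float(record["lat"])
        record["lon"] = float(record["lon"])
        record["num_results"] = int(record["num_results"])
        yield record


# Invented sample record (assumption, not taken from the provided file).
sample = io.StringIO(
    "NORTHBOURNE AVENUE, DICKSON|Northbourne Ave, Dickson ACT 2602, Australia"
    "|-35.2507|149.1339|GEOMETRIC_CENTER|1\n"
)
locations = list(parse_locations(sample))
```

Any iterable of lines works as input, so an open file object can be passed in place of the `io.StringIO` sample.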
Tasks
You are provided with:
- a map of Canberra (and longitude/latitude bounds on the map)
- a file containing street and suburb names and their locations
- a file of tweets obtained from the ACTPol_Traffic twitter account
- and starter Python code and test cases
You should complete the tasks listed below in the Python 3 programming
language. You are free to use additional Python libraries to complete the task
as long as they are bundled with the Anaconda 3 distribution. Each task is
associated with a detailed GitLab issue which you must close when completing
the task. A breakdown of marks is given next to each task.
Part A. Text Processing (40%)
- parse and normalise tweets (15%)
- match tweets against some street names (10%)
- compute histogram of suburb mentions (15%)
* Note: the histogram could be slightly inaccurate since a suburb name can be the same as a street name
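One possible approach to the normalisation and histogram tasks is sketched below. The normalisation rules (lower-casing, replacing punctuation with spaces), the function names and the sample tweets are all assumptions for illustration; the GitLab issues define what is actually required.

```python
import re
from collections import Counter


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def suburb_histogram(tweets, suburbs):
    """Count how many tweets mention each suburb as a whole word or phrase."""
    counts = Counter()
    for tweet in tweets:
        # Padding with spaces makes the substring test match whole words only.
        padded = f" {normalise(tweet)} "
        for suburb in suburbs:
            if f" {normalise(suburb)} " in padded:
                counts[suburb] += 1
    return counts


# Invented tweets (assumption, not taken from the data file).
tweets = [
    "Crash on Northbourne Ave, Dickson. Expect delays.",
    "Traffic lights out at Belconnen Way, Belconnen.",
    "Road works in DICKSON today.",
]
histogram = suburb_histogram(tweets, ["Dickson", "Belconnen", "Kingston"])
```

The second sample tweet shows the inaccuracy mentioned in the note: "Belconnen Way" is a street, yet it would count as a mention of the suburb Belconnen even if the suburb itself were not named.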
Part B. Visualization (55%)
- parse street locations from file (10%)
- read and display the map (5%)
- compute lat/long map coordinates (and test) (15%)
- animate tweet events on map (15%)
- show a heatmap of events such as the one shown below (bonus) (10%)
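For the coordinate task, a simple linear interpolation between the map bounds is one way to turn a latitude/longitude pair into pixel coordinates. The sketch below assumes the map image covers the stated bounds exactly and that a linear mapping is accurate enough over an area this small. The function name, the bounds and the image size are invented for the example.

```python
def latlon_to_pixel(lat, lon, bounds, width, height):
    """Map (lat, lon) to (x, y) pixel coordinates with the origin at the top-left."""
    lat_max, lat_min, lon_min, lon_max = bounds
    # Longitude grows left to right, the same direction as x.
    x = (lon - lon_min) / (lon_max - lon_min) * width
    # Latitude grows upwards but image rows grow downwards, so measure from lat_max.
    y = (lat_max - lat) / (lat_max - lat_min) * height
    return x, y


# Invented bounds and image size (assumptions, not the provided map files).
bounds = (-35.20, -35.40, 149.00, 149.25)
x, y = latlon_to_pixel(-35.30, 149.125, bounds, width=1000, height=800)
```

Checking that the corners of the bounds map to the corners of the image, and the midpoint to the centre, is a quick way to test an implementation like this.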
Miscellaneous (5%)
- demonstrated use of GitLab (5%)
* inclusion of NOTES.md file
* commit history and issue tracking