Use the NumPy and Pandas libraries with a data set of your choice to implement the features the assignment requires.
![Pandas](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/300px-Pandas_logo.svg.png)
Your Challenge
Your challenge in this assignment is to develop an interactive data-driven
web-based Python application that shows your mastery of many coding concepts
as you interact with real world data. You will use Pandas and NumPy modules
for managing and interacting with data, MatPlotLib or Pandas charts for
plotting, and the Streamlit.io package for creating interactive web
applications using Python.
Interact with Real-World Data
Choose one of these data sets:
- Cambridge, MA AirBnB Data (695 rows, data from insideairbnb.com)
- USA Earthquake Data (20,000 rows, from the US Geological Survey https://usgs.gov website and visualizations)
- McDonalds Locations in the USA (14,171 rows, from GavinR’s GitHub)
- Boston Uber and Lyft Rideshare Data from 2018, downloaded from Kaggle (693,702 rows)
To ensure students create a variety of projects, you will sign up for the data
set you wish to use in class on Thursday. If you miss class, or if the signups
are not approximately equally distributed, I will assign a data set for you to
use.
If you decide to use the McDonalds data, please run the clean_mcdonalds.py
file that accompanies the data to add leading zeros to zip codes in the
Northeast.
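The contents of clean_mcdonalds.py are not shown here. As a rough illustration of the kind of repair it performs, the sketch below pads zip codes that lost their leading zero when they were read as integers. The column name `zip` and the sample rows are assumptions made up for this example, not the real data set's schema.

```python
import pandas as pd

# Hypothetical sample: Northeast zip codes lose their leading zero
# when a CSV column is parsed as integers (02139 becomes 2139).
df = pd.DataFrame({
    "city": ["Cambridge", "Newark", "Chicago"],
    "zip": [2139, 7102, 60601],  # "zip" is an assumed column name
})

# Convert to text and left-pad with zeros to five characters.
df["zip"] = df["zip"].astype(str).str.zfill(5)

print(df)
```

When reading a CSV file yourself, passing `dtype={"zip": str}` to `pd.read_csv` keeps such a column as text from the start, so the zeros are never dropped.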
Demonstrate Your Python Coding Skills
Your Python code should demonstrate your coding skills by implementing
several of the concepts we studied throughout the course that are appropriate
for your project, such as:
- Coding Fundamentals: data types, if statements, loops, formatting, etc.
- Data Structures: Interact with Lists, Tuples, Dictionaries (keys, values, items)
- Functions: passing parameters, returning values
- Files: Reading data from a CSV File
- Statistics or Pandas module functions for calculating mean, median, etc.
- MatPlotLib or Pandas for creating different types of charts
- StreamLit.io for making interactive applications, displaying charts and maps
- Numpy functions for interacting with arrays (such as np.arange)
- Pandas DataFrames for interacting and manipulating large data sets using filtering, sorting, pivot tables, etc.
You are not required to use all of these. For example, if your application
does not need to use a tuple, do not worry about trying to find a way to
include one.
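As one possible illustration of how a few of these concepts fit together, the sketch below reads CSV data into a DataFrame, computes a mean and median, and uses `np.arange` to build bin edges for a frequency count. The CSV text, the column names, and the bin width are all invented for the example.

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV text standing in for one of the real data files.
csv_text = """name,price
Room A,80
Room B,120
Room C,95
Room D,210
"""

# pd.read_csv accepts a file path or any file-like object.
listings = pd.read_csv(io.StringIO(csv_text))

mean_price = listings["price"].mean()
median_price = listings["price"].median()

# np.arange builds evenly spaced bin edges: 50, 100, 150, 200, 250.
bin_edges = np.arange(50, 251, 50)
counts, _ = np.histogram(listings["price"], bins=bin_edges)

print(f"Mean price: ${mean_price:.2f}, median price: ${median_price:.2f}")
for low, count in zip(bin_edges[:-1], counts):
    print(f"Bin starting at ${low}: {count} listing(s)")
```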
Assignment Details
Part 1. Design
The purpose of this part is to get you thinking about what you might do before
you start coding. Identify two different queries or questions you can ask
about your data set and ways to interact with and present the data based on
your understanding of Pandas DataFrames, MatPlotLib, and the Streamlit.io
packages.
Describe how your queries will be interactive by incorporating Streamlit’s
user interface elements to obtain user input. Describe how you will visually
present this data using charts, graphs, Streamlit tables or maps. For example,
if analyzing housing data, you might use a dropdown list to specify a list of
neighborhoods and a slider to specify a price range. You then might display
all rooms for rent in that neighborhood within that price range using a table,
chart, or map. (That’s an easy one. At least one of your queries needs to be
more complex than this!)
Be sure your page is “user friendly” and as “polished” as possible.
Provide ample user instructions; label values that are part of the user
interaction, make sure your charts have titles, legends or explanations that
would be helpful to the user.
Create a Word document describing your plans. Submit it on Blackboard only. I
will respond within 24 hours on Blackboard approving your proposed questions
or making suggestions if they appear to be too complicated or too easy. Due
dates for proposal.
You may change your queries or visualizations after you start coding if you
need to change your plans. If you do this, please notify me during the coding
week.
Feel free to add to your project as you explore Pandas and Streamlit
capabilities and find cool ways to implement new features. Part of your grade
will be a “complexity/originality” score. If you use a module or do something
cool that we may not have discussed in class, that will give you a higher
score.
Part 2. Code
Create your Python application with a Streamlit UI and the various
visualizations. Create at least two different charts or graphs of different
types (with custom legends, axis labels, tick marks, colors, or other
features), or a map showing latitude and longitude. Be sure to include appropriate
context or labels in your user interface to cue the reader about which values
to specify, and the purpose of each chart or graph. You may wish to add a few
sentences explaining each chart. Place all UI controls in the left sidebar,
and your visualizations in the main content area. Make your application as
professional looking as you can.
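As a minimal sketch of two charts of different types with titles, axis labels, and a legend, the functions below each build and return a Matplotlib figure. The neighborhood names, the numbers, and the function names are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen; no display window is needed
import matplotlib.pyplot as plt

# Made-up summary values for illustration only.
areas = ["Riverside", "Cambridgeport", "Mid-Cambridge"]
avg_prices = [110, 135, 160]
listing_counts = [42, 30, 55]


def make_bar_chart(labels, values):
    """Build a bar chart with a title, axis labels, and a legend."""
    fig, ax = plt.subplots()
    ax.bar(labels, values, color="teal", label="Average nightly price")
    ax.set_title("Average price by neighborhood")
    ax.set_xlabel("Neighborhood")
    ax.set_ylabel("Price ($)")
    ax.legend()
    return fig


def make_pie_chart(labels, values):
    """Build a pie chart showing each neighborhood's share of listings."""
    fig, ax = plt.subplots()
    ax.pie(values, labels=labels, autopct="%.1f%%")
    ax.set_title("Share of listings by neighborhood")
    return fig


bar_fig = make_bar_chart(areas, avg_prices)
pie_fig = make_pie_chart(areas, listing_counts)
```

In a Streamlit app, each returned figure can be displayed in the main content area with `st.pyplot(bar_fig)`.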
Coding Checklist
As you write your code, be sure to demonstrate your mastery of these
capabilities in your project:
- At least one function that has two parameters and returns a value
- At least one function that does not return a value
- Interacting with dictionaries, lists, and tuples
- Using a Python module to calculate a statistical function such as average, median, mode, etc.
- User Interface and dashboard with Streamlit.io
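Several of these checklist items can appear together in a few lines. The sketch below, which uses only the standard library, shows a function with two parameters that returns a tuple, a function that returns nothing, and a dictionary of lists summarized with the `statistics` module. The data and the function names `summarize` and `print_report` are hypothetical.

```python
import statistics

# A dictionary mapping each neighborhood to a list of nightly prices.
# All names and numbers are invented for the example.
prices_by_area = {
    "Riverside": [90, 110, 130],
    "Cambridgeport": [120, 150],
}


def summarize(prices, decimals):
    """Return (mean, median) of prices as a tuple, rounded to `decimals`."""
    mean = round(statistics.mean(prices), decimals)
    median = round(statistics.median(prices), decimals)
    return mean, median


def print_report(summary):
    """Print one line per neighborhood; returns nothing."""
    for area, (mean, median) in summary.items():
        print(f"{area:<15} mean ${mean:>7.2f}   median ${median:>7.2f}")


summary = {area: summarize(prices, 2) for area, prices in prices_by_area.items()}
print_report(summary)
```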
Your code should demonstrate your mastery of at least three Pandas
capabilities as appropriate for your queries and data. These include:
- Sorting data in ascending or descending order, multi-column sorting
- Filtering data by one or more conditions
- Analyzing data with pivot tables
- Managing rows or columns
- Add/drop/select/create new/group columns, frequency count, other features as you wish
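The Pandas capabilities listed above could look something like the following on a small table. The ride data and column names here are made up for illustration and are not taken from the actual rideshare file.

```python
import pandas as pd

# Made-up ride data; the column names are assumptions for illustration.
rides = pd.DataFrame({
    "service": ["Uber", "Lyft", "Uber", "Lyft", "Uber"],
    "source": ["Back Bay", "Back Bay", "Fenway", "Fenway", "Fenway"],
    "price": [12.5, 11.0, 9.0, 10.5, 30.0],
    "distance": [2.0, 2.0, 1.5, 1.5, 5.0],
})

# Create a new column from existing ones.
rides["price_per_mile"] = rides["price"] / rides["distance"]

# Filter by two conditions (each wrapped in parentheses, joined with &).
short_uber = rides[(rides["service"] == "Uber") & (rides["distance"] < 3)]

# Multi-column sort: service A-Z, then price from highest to lowest.
ranked = rides.sort_values(["service", "price"], ascending=[True, False])

# Pivot table: average price for each source/service pair.
pivot = rides.pivot_table(index="source", columns="service",
                          values="price", aggfunc="mean")

# Frequency count of rides per service.
counts = rides["service"].value_counts()

print(pivot)
print(counts)
```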
Usual rules about writing “good” code apply:
- Make your code as modular and easy to follow as possible
- Include a docstring, comments, and meaningful variable names.
- If you did something “cool” in your code that you are incredibly proud of, please write a comment calling attention to what you did.
- If you referred to any online articles or other information beyond class examples, please be sure to list them as references in your code.
- Make sure the program runs and the output is correct.
Documentation String
Use this documentation string at the top of your code file:
"""
CS230: Section XXX
Name: Your Name
Data: Which data set you used
Description:
This program … (a few sentences about your program and the queries and charts)
I pledge that I have completed the programming assignment independently.
I have not copied the code from a student or any source.
I have not given my code to any student.
URL: Link to your web application online (see extra credit)
"""
Part 3: Present
All presentations will be done in class over two class periods. Please let me
know by Wednesday December 9 if you plan to present earlier in the week
(Monday for HB1 and Tuesday for HB3) or Thursday (both classes). If the
signups are not approximately equally distributed, I will assign a day for you
to present.
Part 4. Publish Your Application Online (Extra Credit)
Post your application to the web by following these Streamlit Sharing
instructions. This is a newly released feature. It may take a few days before
your request is filled, so sign up for the invite now! As an alternative, you
can deploy it to a server on Heroku by following these instructions or similar
tutorials you find online by searching for “streamlit deploy heroku”. The
extra credit will be five points added to your Assignment score.