Write Financial technology programs with Pandas and other third-party
libraries to perform stock analysis.
![Financial technology](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d7/Philippine-stock-market-board.jpg/250px-Philippine-stock-market-board.jpg)
Requirement
This assignment builds on Lectures 7 to 9 and on Tutorials 6 and 7. You might
want to consider using some of the Python code discussed in those lectures and
tutorials to answer some of the questions below.
Important: Do not change the type (markdown vs. code)
of any cell, nor copy/paste/duplicate any cell! If the cell type is markdown,
you are supposed to write text, not code, and vice versa. Provide your answer
to each question in the allocated cell. Do not create additional cells.
Answers provided in any other cell will not be marked. Do not rename the
assignment files. All files should be left as is in the assignment directory.
Task
You are given two datasets:
- A file called Assignment4-data.csv, which contains financial news (headlines) and daily returns for Apple (AAPL). Relying on this dataset, your role as a FinTech student is to explore the relationship between financial news and stock returns.
- A file called AAPL_returns.csv, which contains the daily returns for Apple (AAPL).
Helpful commands
You may find the following commands helpful to complete some of the questions.
- How to create a new column using data from an existing column? Recall that, in Tutorial 7, we worked with a variable called FSscore. Suppose we wanted to divide all the values of this variable by 100 and store the outcome in a new column. This can be done in one step. The code df['FSscore_scaled'] = df['FSscore']/100 creates a new column with the name FSscore_scaled and stores the modified values.
- How to separate a string variable into a list of strings? The method split() splits a string into a list based on a specified separator. The default separator is any white space. However, one can specify the applied separator as an argument. For example, the code "a,b,c".split(",") splits the string "a,b,c" into the list ['a', 'b', 'c'].
- You can use string functions such as split() on a Pandas dataframe column by using the str attribute. For example, df['alphabets'].str.split(",") returns a series (consider a series as a dataframe with one column) that contains a list obtained by running the split function on each entry in the column named alphabets.
- How to chain multiple string operations in Pandas? Note that a string function on a Pandas column returns a series. One can then use another string function on this series to chain multiple operations. For example, the cell below first converts the string to upper case and then calls the split function.
- How to combine two or more data frames? For this purpose, one can use the concat function from Pandas. To combine the dataframes side by side so that rows are matched on their indices, use the axis=1 argument. Please see https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html for examples.
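The commands listed above can be tried on a small toy example. The sketch below is illustrative only: the dataframes `left` and `right`, their index labels, and the `label` column are made up for the demonstration and are not part of the assignment data.

```python
import pandas as pd

# Toy data; the names and values are illustrative assumptions.
left = pd.DataFrame({"FSscore": [50, 75, 100]}, index=["a", "b", "c"])
right = pd.DataFrame({"label": ["x", "y", "z"]}, index=["a", "b", "c"])

# New column derived from an existing one in a single step.
left["FSscore_scaled"] = left["FSscore"] / 100

# Plain Python split with an explicit separator.
parts = "a,b,c".split(",")

# axis=1 places the frames side by side and aligns rows on the index.
combined = pd.concat([left, right], axis=1)
print(combined)
```

With axis=1, rows whose index labels appear in only one of the frames would be kept and filled with NaN in the other frame's columns.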
Please run the following cell to import the required libraries and to see the
string operations example.
In [1]:
## Execute this cell
####################### Package Setup ##########################
# Disable FutureWarning for better aesthetics.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# essential libraries for this assignment
from finml import *
import numpy as np
import pandas as pd
%matplotlib inline
# for logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
# suppress warnings for deprecated methods from TensorFlow
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
################################################################
# Example of string operations
import pandas as pd
example_data = {'alphabets':['a,b,c', 'd,e,f', 'a,z,x', 'a,s,p']}
example_df = pd.DataFrame(example_data)
# Chain two string operations
example_df['alphabets'].str.upper().str.split(",")
Out[1]:
0 [A, B, C]
1 [D, E, F]
2 [A, Z, X]
3 [A, S, P]
Name: alphabets, dtype: object
Data exploration and transformation
The dataset has the following three columns:
- date: This column contains the date of the observation.
- headlines: This column contains the concatenation of headlines for that date. The headlines are separated by the <end> string. For example, if there are three headlines h1, h2, and h3 on a given day, the headline cell for that day will be the string h1<end>h2<end>h3.
- returns: This column contains the daily returns.
In your assessment, please address the following questions.
Question 1
Load the dataset into a Pandas dataframe and write Python code that plots the
time series of the daily Apple returns (returns on the y-axis and dates on the
x-axis). Make sure your plot’s axes are appropriately labelled.
Note: Please use df as the variable name for the dataframe and the parse_dates
argument to correctly parse the date column.
Answer 1
In [41]:
"""Write your code in this cell"""
import pandas as pd
# Assumes the first column of the CSV holds the dates; it becomes a parsed DatetimeIndex.
df = pd.read_csv('AAPL_returns.csv', index_col=0, parse_dates=True)
# Plot the returns against the date index and label both axes.
ax = df.plot()
ax.set_xlabel("Date")
ax.set_ylabel("Daily AAPL returns")
ax
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7cb1aed6a0>
Question 2
Write Python code that plots the time series of daily headline frequencies
(the number of headlines per day on the y-axis and the corresponding date on
the x-axis). Make sure your plot’s axes are appropriately labelled.
Answer 2
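In [*]:
"""Write your code in this cell"""
import matplotlib.pyplot as plt
df = pd.read_csv('Assignment4-data.csv', encoding="ISO-8859-1", parse_dates=['date'])
# Headlines for one day are joined by "<end>", so splitting on it gives the daily count.
headline_counts = df['headlines'].str.split('<end>').str.len()
plt.plot(df['date'], headline_counts)
plt.xlabel("Date")
plt.ylabel("Number of headlines")
plt.show()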
Question 3
We will use neural networks to explore the relationship between the content of
financial news and the direction of stock returns, i.e., their classification
into positive or negative returns.
- Create a new column called returns_direction in the dataframe that classifies daily returns based on their direction: it assigns a given return a value of 1 if the return is positive (i.e., greater than 0), and a value of 0 otherwise. You may find the Numpy function where() useful for this question.
- Count the number of days on which the stock had positive and non-positive returns, respectively.
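As a minimal sketch of the classification described above, the snippet below applies np.where() to a handful of made-up returns. The dataframe `toy` and its values are assumptions for illustration; in the assignment the same idea would be applied to the returns column of df.

```python
import numpy as np
import pandas as pd

# Made-up returns, standing in for the `returns` column of the real data.
toy = pd.DataFrame({"returns": [0.012, -0.004, 0.0, 0.031, -0.020]})

# 1 for strictly positive returns, 0 otherwise (a zero return counts as non-positive).
toy["returns_direction"] = np.where(toy["returns"] > 0, 1, 0)

# Number of days in each class.
counts = toy["returns_direction"].value_counts()
n_positive = int(counts.get(1, 0))
n_non_positive = int(counts.get(0, 0))
print(n_positive, n_non_positive)
```

Using a strict inequality matters here: a return of exactly zero falls into the non-positive class, as the question specifies.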
Answer 3
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Question 4
For this question, please restrict your computations to the first 100 headline
dates. You can select them by using the head function of Pandas. Calculate
the tf-idf metric for the following word and headline(s) pairs:
- Word "apple" in headlines with date 2008-01-07. Store this value in a variable called aaple_tfidf.
- Word "samsung" in headlines with date 2008-01-17. Store this value in a variable called samsung_tfidf.
- Word "market" in headlines with date 2008-03-06. Store this value in a variable called market_tfidf.
Please write Python code that calculates these metrics from the df dataframe.
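The sketch below shows one common way to compute tf-idf by hand. It assumes a specific definition: term frequency is the word count divided by the number of tokens in the document, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The lectures or the finml package may use a different variant (for example a smoothed idf or another log base), so check the course definition before relying on it. The function `tf_idf` and the toy corpus are hypothetical.

```python
import math

def tf_idf(word, doc_tokens, corpus_tokens):
    """tf-idf of `word` in one tokenised document, relative to a corpus.

    Assumed definition: tf = count / document length, idf = ln(N / document frequency).
    Documents are assumed to be non-empty token lists.
    """
    tf = doc_tokens.count(word) / len(doc_tokens)
    n_docs_with_word = sum(1 for tokens in corpus_tokens if word in tokens)
    if n_docs_with_word == 0:
        return 0.0
    idf = math.log(len(corpus_tokens) / n_docs_with_word)
    return tf * idf

# Toy corpus: each entry stands for one day's headlines, lower-cased and split on whitespace.
corpus = [
    "apple unveils new laptop".split(),
    "samsung and apple settle dispute".split(),
    "market closes higher".split(),
]
apple_score = tf_idf("apple", corpus[0], corpus)
market_score = tf_idf("market", corpus[2], corpus)
print(apple_score, market_score)
```

In the assignment, each "document" would be the headlines string of one date among the first 100 rows, tokenised in whatever way the course prescribes.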
Answer 4
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Question 5
Build and train a one-layer neural network with two units (neurons) to explain
return directions based on financial news. Report and interpret the following
three performance measures: “Precision”, “Recall”, and “Accuracy”. According
to your opinion, which performance measure(s) is (are) most important in the
context of linking news headlines to stock returns and why?
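To make the three performance measures concrete, the sketch below computes them with sklearn on made-up labels. The vectors `y_true` and `y_pred` are assumptions for illustration; in the assignment they would be the actual return directions and the network's predictions on the test set.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 1 = positive return, 0 = non-positive return.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

# Precision: share of predicted positives that were truly positive (3 of 5).
precision = precision_score(y_true, y_pred)
# Recall: share of true positives that the model found (3 of 4).
recall = recall_score(y_true, y_pred)
# Accuracy: share of all days classified correctly (5 of 8).
accuracy = accuracy_score(y_true, y_pred)
print(precision, recall, accuracy)
```

Here the three measures differ because the model over-predicts positive days: that lowers precision and accuracy while leaving recall comparatively high.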
Answer 5 - Code
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Answer 5 - Text
YOUR ANSWER HERE
Question 6
Explore different neural network models by changing the number of layers and
units.
You can use up to three layers and five units.
Complete the table below by adding your results for the test data set. You
should duplicate the table format in your own markdown cell and replace the
"-" placeholders with the corresponding values. Discuss your findings for both
the test and train data sets.
Answer 6 - Code
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Answer 6 - Text
YOUR ANSWER HERE
Question 7
Explore the effects of different splits between the training and testing data on
the performance of a given neural network model.
Complete the table below by adding your results. You should duplicate the
table format in your own markdown cell and replace the "-" placeholders with
the corresponding values. Discuss your findings.
Complete the table below by adding your results for the test data set. You
should use the same markdown format and simply replace the "-" placeholders
with the corresponding values. Discuss your findings for the different test and
train data sets.
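One way to vary the split is sklearn's train_test_split, sketched below on made-up arrays. This is an assumption about tooling: the course may instead use a splitting helper from finml. The choice of shuffle=False is also an assumption, made here because the observations are ordered in time; the arrays `X` and `y` are placeholders for the real features and return directions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels for ten observations.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 1])

split_sizes = {}
for test_size in (0.2, 0.4, 0.5):
    # shuffle=False keeps the original order, so the test set is the most recent block.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    split_sizes[test_size] = (len(X_train), len(X_test))

print(split_sizes)
```

Inside the loop, the same model would be refit on each training block and scored on the matching test block to fill in the table.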
Answer 7 - Code
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Answer 7 - Text
YOUR ANSWER HERE
Question 8
Run a logistic regression with the same independent and dependent variables as
used for the above neural network models. You have access to the sklearn
package, which should help you answer this question. To work with the
sklearn package, you may find the following links helpful.
- Building a logit model: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
- Evaluating a logit model:
  * Recall: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html
  * Precision: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html
  * Accuracy: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
Compare and contrast your findings with the above findings based on neural
network models.
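A minimal sketch of a logit model on text features follows. It assumes a bag-of-words representation built with sklearn's CountVectorizer, which may differ from the feature matrix used for the neural networks in the course. The headlines in `texts` and the directions in `labels` are invented, and the model is scored on its own training data only to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Invented headlines and return directions (1 = positive, 0 = non-positive).
texts = [
    "apple beats earnings forecast",
    "apple shares fall on weak demand",
    "record iphone sales lift apple",
    "supplier warning hits apple stock",
    "analysts raise apple price target",
    "apple faces lawsuit over patents",
]
labels = [1, 0, 1, 0, 1, 0]

# Bag-of-words counts as independent variables (an assumed feature choice).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = LogisticRegression(max_iter=1000)
model.fit(X, labels)
pred = model.predict(X)

# In-sample scores; a real comparison would use held-out test data.
print(precision_score(labels, pred), recall_score(labels, pred), accuracy_score(labels, pred))
```

For a fair comparison with the neural networks, the logit model should be trained and evaluated on the same train/test split and the same features.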
Answer 8 - Code
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Answer 8 - Text
YOUR ANSWER HERE
Question 9
Everything you did so far was explaining stock returns with contemporaneous
financial news that was released on the same date. To explore how well a
neural network can predict the direction of future returns based on our text
data, you should do the following.
- Please read the AAPL_returns.csv into a dataframe by using the parse_dates argument and create a new column returns_pred by shifting the returns by one trading day. For this purpose, you may find the shift function from Pandas helpful.
- Combine the df dataframe that contains headlines with this new dataframe such that for a given headline date, the value in returns_pred contains the return on the subsequent trading day.
- Train a neural network that uses financial news to learn the returns_pred variable. You are allowed to use any of the above neural network parameterisations and train/test data splits.
- Explain your findings with regard to the given data and your chosen parameters.
Interpret your results in the context of the Efficient Market Hypothesis (EMH).
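The shifting and combining steps can be sketched on toy data as follows. The snippet assumes the rows are sorted by ascending date, in which case shift(-1) pulls the next trading day's return onto the current row; with descending dates the sign of the shift would flip. The frames `returns` and `headlines` and their values are made up for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04"])

# Made-up daily returns indexed by date (ascending order assumed).
returns = pd.DataFrame({"returns": [0.01, -0.02, 0.03]}, index=dates)
# shift(-1) moves each value up one row, so a row holds the next trading day's return.
returns["returns_pred"] = returns["returns"].shift(-1)

# Made-up headlines on the same dates.
headlines = pd.DataFrame({"headlines": ["h1<end>h2", "h3", "h4<end>h5"]}, index=dates)

# Align on the date index; the last date has no next-day return and is dropped.
combined = pd.concat([headlines, returns["returns_pred"]], axis=1).dropna()
print(combined)
```

After this step, the headlines of a given date sit next to the return of the following trading day, which is the target the network should learn.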
Answer 9 - Code
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Answer 9 - Text
YOUR ANSWER HERE