Classification using the KNN algorithm
![KNN](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/KnnClassification.svg/220px-KnnClassification.svg.png)
Instructions
- Do not import other libraries. You are only allowed to use the Math and Numpy packages, which are already imported in the file. DO NOT use scipy functions.
- Please use Python 3.5 or 3.6 (for full support of typing annotations). You have to make the functions' return values match the required types.
- In this programming assignment you will implement k-Nearest Neighbours. We have provided the bootstrap code and you are expected to complete the classes and functions.
- Download all files of PA1 from Vocareum and save in the same folder.
- Only modifications in the files {knn.py, utils.py} will be accepted and graded. test.py can be used for testing purposes on your local system for your convenience; it will not be graded on Vocareum. Submit {knn.py, utils.py} on Vocareum once you are finished. Please delete unnecessary files before you submit your work on Vocareum.
- DO NOT CHANGE THE OUTPUT FORMAT. DO NOT MODIFY THE CODE UNLESS WE INSTRUCT YOU TO DO SO. A homework solution that mismatches the provided setup, such as format, name initializations, etc., will not be graded. It is your responsibility to make sure that your code runs well on Vocareum.
Notes on distances and F-1 score
In this task, we will use the following distance functions (we drop the vector
symbol for simplicity):
- Canberra Distance
- Minkowski Distance
- Euclidean distance
- Inner product distance
- Gaussian kernel distance
- Cosine Similarity
An inner product is a generalization of the dot product. In a vector space, it
is a way to multiply vectors together, with the result of this multiplication
being a scalar.
Cosine Distance = 1 - Cosine Similarity
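For reference, here is a minimal sketch of how these distances might be computed with numpy. The exact formulas are fixed by the bootstrap code, so treat the specifics below as assumptions: the Minkowski order (p = 3 here), the sign conventions for the inner-product and Gaussian-kernel distances, and the zero-vector handling are all illustrative choices, not the official definitions.

```python
import numpy as np

class Distances:
    @staticmethod
    def canberra_distance(point1, point2):
        # sum_i |x_i - y_i| / (|x_i| + |y_i|); terms with a zero
        # denominator are treated as 0 (a common convention)
        x, y = np.asarray(point1, dtype=float), np.asarray(point2, dtype=float)
        denom = np.abs(x) + np.abs(y)
        mask = denom != 0
        return float(np.sum(np.abs(x - y)[mask] / denom[mask]))

    @staticmethod
    def minkowski_distance(point1, point2, p=3):
        # (sum_i |x_i - y_i|^p)^(1/p); p = 3 is a placeholder assumption
        x, y = np.asarray(point1, dtype=float), np.asarray(point2, dtype=float)
        return float(np.sum(np.abs(x - y) ** p) ** (1 / p))

    @staticmethod
    def euclidean_distance(point1, point2):
        # sqrt(sum_i (x_i - y_i)^2)
        x, y = np.asarray(point1, dtype=float), np.asarray(point2, dtype=float)
        return float(np.sqrt(np.sum((x - y) ** 2)))

    @staticmethod
    def inner_product_distance(point1, point2):
        # plain dot product <x, y>, per the note above
        return float(np.dot(point1, point2))

    @staticmethod
    def gaussian_kernel_distance(point1, point2):
        # -exp(-||x - y||^2 / 2); the sign and bandwidth are assumptions
        x, y = np.asarray(point1, dtype=float), np.asarray(point2, dtype=float)
        return float(-np.exp(-0.5 * np.sum((x - y) ** 2)))

    @staticmethod
    def cosine_distance(point1, point2):
        # 1 - <x, y> / (||x|| ||y||), per the note above
        x, y = np.asarray(point1, dtype=float), np.asarray(point2, dtype=float)
        nx, ny = np.linalg.norm(x), np.linalg.norm(y)
        if nx == 0 or ny == 0:
            return 1.0  # zero-vector convention; an assumption
        return float(1 - np.dot(x, y) / (nx * ny))
```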
The F1-score is an important metric for binary classification, because plain
accuracy can be misleading, for instance when false positives matter (a good
example is in the MLAPP book, Section 2.2.3.1 "Example: medical diagnosis",
page 29). We have provided a basic definition. For more, you can read
Section 5.7.2.3 of the MLAPP book.
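As a reminder, F1 is the harmonic mean of precision and recall, which for binary labels reduces to F1 = 2TP / (2TP + FP + FN). A minimal sketch, assuming labels are in {0, 1} (follow the exact signature given in the bootstrap code):

```python
import numpy as np

def f1_score(real_labels, predicted_labels):
    # F1 = 2 * TP / (2 * TP + FP + FN) for binary labels in {0, 1}
    y_true = np.asarray(real_labels)
    y_pred = np.asarray(predicted_labels)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if 2 * tp + fp + fn == 0:
        return 0.0  # no positives anywhere; a convention, an assumption
    return float(2 * tp / (2 * tp + fp + fn))
```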
Part 1.1 F-1 score and Distances
Implement the following items in utils.py
- function f1_score
- class Distances
- function canberra_distance
- function minkowski_distance
- function euclidean_distance
- function inner_product_distance
- function gaussian_kernel_distance
- function cosine_distance
Simply follow the notes above to finish all these functions. You are not
allowed to call any packages that are not already imported. Please note that
all these methods are graded individually, so you can take advantage of the
grading script to get partial marks for these methods instead of submitting
the complete code in one shot.
Part 1.2 KNN Class
The KNN class and its functions are to be implemented in knn.py; the required
names and signatures are given in the provided bootstrap code. A sketch of the
usual structure follows.
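The sketch below shows one common way to structure a KNN classifier. The method names (train, get_k_neighbors, predict), the binary-label majority vote, and the tie-breaking rule are all illustrative assumptions; defer to the bootstrap code for the actual interface.

```python
import numpy as np

class KNN:
    def __init__(self, k, distance_function):
        self.k = k
        self.distance_function = distance_function

    def train(self, features, labels):
        # KNN is a lazy learner: "training" just stores the data
        self.features = features
        self.labels = labels

    def get_k_neighbors(self, point):
        # labels of the k training points closest to `point`
        distances = [self.distance_function(point, x) for x in self.features]
        nearest = np.argsort(distances)[: self.k]
        return [self.labels[i] for i in nearest]

    def predict(self, features):
        # majority vote among the k nearest neighbors of each test point,
        # assuming binary labels in {0, 1}; ties go to 1 (an assumption)
        predictions = []
        for point in features:
            neighbors = self.get_k_neighbors(point)
            predictions.append(int(2 * sum(neighbors) >= len(neighbors)))
        return predictions
```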
Part 1.3 Hyperparameter Tuning
In this section, you need to implement the tuning_without_scaling function of
the HyperparameterTuner class in utils.py. You should try each of the distance
functions you implemented in Part 1.1 and find the best k. Use k from 1 to 30
in increments of 2, and use the f1-score to compare different models. A sketch
of the search loop follows.
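A minimal sketch of the grid search, building on the KNN and f1_score sketches above. It assumes the distance functions arrive as a name-to-function dict and that a higher f1 simply wins; the real tie-breaking rules (e.g., preferring a smaller k or a particular distance function) are specified in the bootstrap code.

```python
class HyperparameterTuner:
    def tuning_without_scaling(self, distance_funcs, x_train, y_train, x_val, y_val):
        best_f1 = -1.0
        for name, func in distance_funcs.items():
            for k in range(1, 30, 2):  # k = 1, 3, ..., 29
                model = KNN(k, func)
                model.train(x_train, y_train)
                f1 = f1_score(y_val, model.predict(x_val))
                if f1 > best_f1:  # real tie-breaking rules may differ
                    best_f1 = f1
                    self.best_k = k
                    self.best_distance_function = name
                    self.best_model = model
        return self.best_model
```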
Part 2 Data transformation
We are going to add one more step (data transformation) in the data processing
part and see how it works. Sometimes, normalization plays an important role to
make a machine learning model work. This link might be helpful
https://en.wikipedia.org/wiki/Feature_scaling
Here, we take two different data transformation approaches.
Normalizing the feature vector
This one is simple but sometimes works well. Given a feature vector x, the
normalized feature vector is x / ||x||, i.e., x divided by its L2 norm.
If a vector is an all-zero vector, we let the normalized vector also be an
all-zero vector.
Min-max scaling the feature matrix
The above normalization is data independent, that is to say, the output of the
normalization function doesn't depend on the rest of the training data. However,
sometimes it is helpful to do data dependent normalization. One thing to note
is that, when doing data dependent normalization, we can only use the training
data, as the test data is assumed to be unknown during training (at least for
most classification tasks).
The min-max scaling works as follows: each feature (column) is rescaled as
(x - min) / (max - min), where min and max are computed per feature on the
training data. After min-max scaling, all values of the training data's
feature vectors lie in the range [0, 1] (consistent with the example below).
Note that this doesn't mean the values of the validation/test data's features
are all in that range, because the validation/test data may have a different
distribution from the training data.
Implement the functions in the classes NormalizationScaler and MinMaxScaler in
utils.py (a sketch of both follows the list):
- normalize: normalize the feature vector of each sample. For example, if the
input is features = [[3, 4], [1, -1], [0, 0]], the output should be
[[0.6, 0.8], [0.707107, -0.707107], [0, 0]].
- min_max_scale: min-max scale each feature (column) of the feature matrix.
For example, if the input is features = [[2, -1], [-1, 5], [0, 0]], the output
should be [[1, 0], [0, 1], [0.333333, 0.166667]].
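A sketch of the two scalers, consistent with the examples above. The callable interface and the "learn min/max on the first call, reuse it on later calls" behavior of MinMaxScaler are assumptions here; they match the data-dependence discussion above, but the bootstrap code fixes the actual contract.

```python
import numpy as np

class NormalizationScaler:
    def __call__(self, features):
        # scale each sample to unit L2 norm; all-zero vectors stay all-zero
        scaled = []
        for x in features:
            norm = np.sqrt(np.sum(np.square(x)))
            if norm > 0:
                scaled.append((np.asarray(x, dtype=float) / norm).tolist())
            else:
                scaled.append(list(x))
        return scaled

class MinMaxScaler:
    def __init__(self):
        self.min = None
        self.max = None

    def __call__(self, features):
        x = np.asarray(features, dtype=float)
        if self.min is None:
            # first call: learn per-feature min/max from the training data
            self.min, self.max = x.min(axis=0), x.max(axis=0)
        # constant features (max == min) map to 0 to avoid division by zero
        span = np.where(self.max > self.min, self.max - self.min, 1.0)
        return ((x - self.min) / span).tolist()
```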
Hyperparameter tuning with scaling
This part is similar to Part 1.3, except that before passing your training and
validation data to the KNN model to tune k and the distance function, you need
to transform both the training and validation data using these two scalers.
Again, we will use the f1-score to compare different models. Here we have
three hyperparameters, i.e., k, distance_function, and scaler, as sketched
below.
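A sketch of how the scalers could be folded into the search, extending the tuning_without_scaling sketch above. It assumes the scalers arrive as a name-to-class dict; note that each scaler is fit on the training data first and only then applied to the validation data, matching the "training data only" rule from Part 2.

```python
def tuning_with_scaling(self, distance_funcs, scaling_classes,
                        x_train, y_train, x_val, y_val):
    best_f1 = -1.0
    for scaler_name, scaler_class in scaling_classes.items():
        scaler = scaler_class()
        x_train_s = scaler(x_train)  # MinMaxScaler learns min/max here
        x_val_s = scaler(x_val)      # ...and reuses them here
        for name, func in distance_funcs.items():
            for k in range(1, 30, 2):
                model = KNN(k, func)
                model.train(x_train_s, y_train)
                f1 = f1_score(y_val, model.predict(x_val_s))
                if f1 > best_f1:  # real tie-breaking rules may differ
                    best_f1 = f1
                    self.best_k = k
                    self.best_distance_function = name
                    self.best_scaler = scaler_name
                    self.best_model = model
    return self.best_model
```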
Use of test.py file
Please make use of the test.py file to debug your code and make sure it runs
properly. After you have completed all the classes and functions mentioned
above, test.py will run smoothly and produce output in the expected format
(your actual output values might vary).
Grading Guideline for KNN
- F-1 score and Distance functions
- MinMaxScaler and NormalizationScaler
- Finding best parameters before scaling
- Finding best parameters after scaling
- Doing classification of the data