Python Assignment: CS4006 Support Vector Machine Classifiers


Implement an SVM classifier.

Requirement

In this assignment, we use the scikit-learn package to train an SVM
classifier. To do so, we need to tune 2 hyperparameters: the cost C and
precision γ (gamma). We are going to use K-fold cross-validation to determine
the best combination of values for this pair.

Question 0

Have a look at the first three cells. In the third one, take note of how the SVC
object is instantiated and trained, how labels are predicted, and finally how
the fitting error is computed. In this assignment, the prediction error after
a given training is simply defined as the number of misclassified labels.

Question 1

Using an SVM classifier with an RBF kernel, use 10-fold cross-validation to
find the best cost and precision parameters. The range of test values for each
parameter is provided.

a. First compute the cross-validation error matrix: for each parameter combination, instantiate an SVM classifier; for each split provided by the KFold object, re-train this classifier and compute the prediction error; the cross-validation error is the average of these errors over all splits.
b. Use the error matrix to select the best parameter combination.
c. Visualize the error matrix using imshow and the 'hot' colormap.

Question 2

Plot the decision boundaries of this classifier by appropriately modifying
the code from the previous assignments. Display the support vectors on the
same figure.

Question 3

Evaluate and print the generalization error of this classifier, computed on
the test set.

Code

Imports

import matplotlib.pyplot as plt  
import numpy as np  
from sklearn.model_selection import KFold  
from sklearn.svm import SVC  
%matplotlib inline  


Load and display the training data

features = np.load("features.npy")  
labels = np.load("labels.npy")  
print("features size:", features.shape)  
print("labels size:", labels.shape)  
# Extract features for both classes  
pos = labels == 1 # 1D array of booleans, with pos[i] = True if labels[i] == 1  
features_pos = features[pos] # filter the array with the boolean array  
neg = labels != 1  
features_neg = features[neg]  
# Display data  
fig, ax = plt.subplots()  
ax.scatter(features_pos[:, 0], features_pos[:, 1], c="red", label="Positive class")  
ax.scatter(features_neg[:, 0], features_neg[:, 1], c="blue", label="Negative class")  
ax.set_title("Training data")  
ax.set_xlabel("Feature 1")  
ax.set_ylabel("Feature 2")  
ax.legend()  
plt.show()  

features size: (500, 2)
labels size: (500,)

Training the SVM classifier with arbitrary hyperparameters

cost = 1  
gamma = 1  
# Train the SVM classifier.  
svm = SVC(C=cost, kernel='rbf', gamma=gamma)  
svm.fit(features, labels)  
# Predict labels.  
# Note that here we use the same set for training and testing,  
# which is not the case in the remainder of the assignment.  
predicted_labels = svm.predict(features)  
# Compute the error.  
# Note: since in Python, True and False are equivalent to 1 and 0, we can  
# directly sum over the boolean array returned by the comparison operator.  
error = sum(labels != predicted_labels)  
print("Prediction error:", error)  

Prediction error: 98

Training with K-fold cross-validation

Define test values for the cost and precision parameters

def logsample(start, end, num):  
  return np.logspace(np.log10(start), np.log10(end), num, base=10.0)  
num_gammas = 20  
num_costs = 20  
gamma_range = logsample(1e-1, 1e3, num_gammas)  
cost_range = logsample(1e-1, 1e3, num_costs)  


Compute the cross-validation error for each parameter combination

The KFold class from scikit-learn is a "cross-validation" object, initialized
with a number of folds. It partitions the input data into K folds; each split
then uses one fold as the validation set and the remaining folds as the
training set. The documentation
(http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html)
provides an example of use.

K = 10 # number of folds for cross validation
kf = KFold(n_splits=K)
cv_error = np.zeros((num_gammas, num_costs)) # error matrix
# TODO (Question 1)
# /TODO (Question 1)
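
One way to fill in the TODO above is sketched below: it loops over all (gamma, cost) pairs and over the splits produced by kf. The indexing convention (rows of cv_error over gamma_range, columns over cost_range) and the helper name fold_errors are assumptions of this sketch, not part of the provided skeleton.

# Possible implementation sketch (Question 1a)
for i, gamma in enumerate(gamma_range):
    for j, cost in enumerate(cost_range):
        # One classifier per parameter combination
        svm = SVC(C=cost, kernel='rbf', gamma=gamma)
        fold_errors = []
        for train_idx, val_idx in kf.split(features):
            # Re-train on the training part of the split
            svm.fit(features[train_idx], labels[train_idx])
            # Prediction error = number of misclassified validation labels
            predicted = svm.predict(features[val_idx])
            fold_errors.append(np.sum(labels[val_idx] != predicted))
        # Cross-validation error = average error over all splits
        cv_error[i, j] = np.mean(fold_errors)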

Train the classifier with the best parameter combination

# Find gamma and cost giving the smallest error  
# TODO (Question 1)  
# /TODO (Question 1)  
# Train the SVM classifier using these parameters  
svm = SVC(C=cost, kernel='rbf', gamma=gamma)  
svm.fit(features, labels)  
support_vectors = svm.support_vectors_  

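A minimal sketch for the selection TODO (Question 1b), assuming cv_error is indexed as (gamma, cost) as in the sketch above; it simply locates the smallest entry of the error matrix and reuses the cost and gamma variable names expected by the cell.

# Possible sketch: locate the smallest cross-validation error
best_i, best_j = np.unravel_index(np.argmin(cv_error), cv_error.shape)
gamma = gamma_range[best_i]
cost = cost_range[best_j]
print("Best gamma:", gamma, "- best cost:", cost)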

Display cross-validation results and decision function

# Sample points on a grid  
num_points = 100  
x_rng = np.linspace(0, 1, num_points)  
y_rng = np.linspace(0, 1, num_points)  
grid_x, grid_y = np.meshgrid(x_rng, y_rng)  
# Evaluate decision function for each point  
xy_list = np.column_stack((grid_x.flat, grid_y.flat))  
values = svm.decision_function(xy_list)  
values = values.reshape((num_points, num_points))  
# Display  
fig = plt.figure(figsize=plt.figaspect(0.25))  
ax = fig.add_subplot(1, 3, 1)  
ax.set_title("Cross-validation error")  
ax.set_xlabel("Log10 of the cost parameter")  
ax.set_ylabel("Log10 of the precision parameter")  
# TODO (Question 1)  
# /TODO (Question 1)  
ax = fig.add_subplot(1, 3, 2)  
ax.set_title("Decision function")  
ax.set_xlabel("Feature 1")  
ax.set_ylabel("Feature 2")  
ax.imshow(values, extent=[0, 1, 0, 1], origin='lower')  
ax = fig.add_subplot(1, 3, 3)  
ax.set_title("Support vectors and isolevels of the decision function")  
ax.set_xlabel("Feature 1")  
ax.set_ylabel("Feature 2")  
# TODO (Question 2)  
# /TODO (Question 2)  
plt.show()  

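The two TODO blocks in the display cell above could be filled in along the following lines. This is only a sketch: in each part, ax is the subplot created just before the corresponding TODO, and the isolevels chosen for the decision function (-1, 0, 1, i.e. the margins and the boundary) are a choice made here, not prescribed by the assignment.

# Question 1c: visualize the error matrix with imshow and the 'hot' colormap.
# The extent maps the axes to log10 of the cost (x) and precision (y) ranges.
extent = [np.log10(cost_range[0]), np.log10(cost_range[-1]),
          np.log10(gamma_range[0]), np.log10(gamma_range[-1])]
ax.imshow(cv_error, extent=extent, origin='lower', cmap='hot', aspect='auto')

# Question 2: isolevels of the decision function and the support vectors.
ax.contour(grid_x, grid_y, values, levels=[-1, 0, 1], colors='black')
ax.scatter(features_pos[:, 0], features_pos[:, 1], c="red", label="Positive class")
ax.scatter(features_neg[:, 0], features_neg[:, 1], c="blue", label="Negative class")
ax.scatter(support_vectors[:, 0], support_vectors[:, 1], s=80,
           facecolors='none', edgecolors='green', label="Support vectors")
ax.legend()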

Generalization error

Load the test data

# Load the test data  
test_features = np.load("test_features.npy")  
test_labels = np.load("test_labels.npy")  
print(test_features.shape)  
print(test_labels.shape)  

(500, 2)
(500,)

# TODO (Question 3)  
# /TODO (Question 3)  

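The Question 3 TODO amounts to predicting the labels of the test set with the trained classifier and counting the misclassified ones; a possible sketch is shown below (the variable names test_predicted and generalization_error are illustrative).

# Possible sketch: prediction error on the held-out test set
test_predicted = svm.predict(test_features)
generalization_error = np.sum(test_labels != test_predicted)
print("Generalization error:", generalization_error,
      "out of", test_labels.size, "test samples")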

