Support Vector Machine classifiers
Use the LibSVM package to train an SVM classifier: a simple example of its use
is provided in demo_libsvm.m. The first three bullet items outline the steps
followed in SVM.m.
- Use a Gaussian kernel, and use 10-fold cross-validation to choose the cost and precision parameters (referred to as the -c (cost) and -g (gamma) parameters in the LibSVM documentation). Visualize the cross-validation error (a 2D function) using imagesc; a minimal grid-search sketch is given after this list.
- Once you have picked the best combination of cost and precision, evaluate the classifier’s generalization error on the test set and report the number of misclassified points.
- Plot the decision boundaries by appropriately modifying the code from the previous exercises. Superimpose the support vectors on your plot as well, using the code outlined in SVM.m.
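Below is a minimal sketch of such a grid search, assuming the LibSVM MATLAB interface (svmtrain) is on the path and that Xtrain/ytrain hold the training features and labels; the variable names and grid ranges are illustrative only:

    log2c = -5:2:15;                  % candidate values for log2 of the cost (-c)
    log2g = -15:2:5;                  % candidate values for log2 of gamma (-g)
    cvErr = zeros(numel(log2c), numel(log2g));
    for i = 1:numel(log2c)
        for j = 1:numel(log2g)
            % -t 2: Gaussian (RBF) kernel, -v 10: 10-fold cross-validation.
            opts = sprintf('-t 2 -c %g -g %g -v 10 -q', 2^log2c(i), 2^log2g(j));
            acc = svmtrain(ytrain, Xtrain, opts);   % with -v, returns CV accuracy
            cvErr(i, j) = 100 - acc;                % cross-validation error in percent
        end
    end
    imagesc(log2g, log2c, cvErr); colorbar;         % visualize the 2D CV error
    xlabel('log_2 \gamma'); ylabel('log_2 C');
    [~, idx] = min(cvErr(:));                       % pick the best combination
    [iBest, jBest] = ind2sub(size(cvErr), idx);
    bestC = 2^log2c(iBest); bestG = 2^log2g(jBest);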
If your code is working correctly, you should obtain results of the following
form for the cross-validation error and the decision boundaries:
Adaboost
In Adaboost.m your task is to train an Adaboost classifier with synthetic
data. For reference, you are provided with the posterior P(y = 1|x),
x ∈ [0, 1] × [0, 1], regularly sampled on the [0, 1] × [0, 1] domain, so that
you will see how the output of the Adaboost classifier better approximates the
posterior at each round.
- Implement the Adaboost algorithm described in class. This involves iterating over the following steps (a minimal sketch of the whole loop is given after this list):
- Find the best decision stump at each round.
- Evaluate the weak learner’s weighted error, and estimate the weak-learner weight α_t.
- Update the distribution over the training samples.
- Plot the exponential loss E as a function of the boosting round. Make sure that E is monotonically decreasing with time (otherwise your code is wrong). Verify that E = Σ_i exp(−y_i f(x_i)) provides an upper bound for the number of training errors.
- Show the response of your strong learner side-by-side with the posterior. Make sure that they look increasingly similar as the number of boosting rounds grows.
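Below is a minimal sketch of the boosting loop, assuming X (N x D features), y (N x 1 labels in {-1, +1}), and a helper best_stump that returns the stump minimizing the weighted error together with its predictions; the helper name and the variable names are illustrative, not those of the provided code:

    T = 400;                          % number of boosting rounds
    N = size(X, 1);
    w = ones(N, 1) / N;               % initial distribution over the training samples
    f = zeros(N, 1);                  % strong-learner scores on the training set
    E = zeros(T, 1);                  % exponential loss per round
    for t = 1:T
        [stump, h] = best_stump(X, y, w);    % h: N x 1 stump outputs in {-1, +1}
        err   = sum(w .* (h ~= y));          % weighted error of the weak learner
        alpha = 0.5 * log((1 - err) / err);  % weak-learner weight alpha_t
        f = f + alpha * h;                   % accumulate strong-learner scores
        w = w .* exp(-alpha * y .* h);       % reweight the training samples
        w = w / sum(w);                      % renormalize to a distribution
        E(t) = sum(exp(-y .* f));            % exponential loss, bounds the #errors
    end
    plot(1:T, E); xlabel('boosting round'); ylabel('exponential loss E');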
If your code is working correctly, you should obtain results of the following form:
[Figure: strong learner at rounds 10, 50, 100, and 400]
Top row: strong learner scores, displayed within [-1, 1], for different numbers
of boosting rounds.
Bottom row: left: loss function of Adaboost for increasing training rounds;
middle: class posterior (ground truth); right: Adaboost-based approximation to
the posterior, at round 400.
Some clarifications
- The form of a decision stump is h(x; d, θ, s) = s (2 [x_d > θ] − 1), where d is the dimension along which the decision is taken, [·] is 1 if its argument is true and 0 otherwise, θ is the threshold applied along dimension d, and s ∈ {−1, 1} is the polarity of the decision stump. A one-line evaluation sketch is given after these clarifications.
- When searching for the best decision stump you need in principle to consider all possible combinations of d and θ. As discussed in class, for a given dimension d the only values of θ that you actually need to work with are the values taken on by the training set features along that dimension.
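A minimal sketch of evaluating such a stump, assuming X is an N x D feature matrix and d, theta, s are the stump parameters (illustrative names, not necessarily those used in the provided code):

    % Stump outputs in {-1, +1}: +s where the d-th feature exceeds theta, -s otherwise.
    h = s * (2 * (X(:, d) > theta) - 1);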
SVM vs. Adaboost BONUS
Compare the performance of the classifier you obtained from the previous part
of the exercise with that of Adaboost. Use the same training and test sets for
both classifiers. As above, make sure you have reproducible results by
appropriately setting the seed in rand/randn.
Report their average misclassification errors on the training set and on the
test set. What do you observe?
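A minimal sketch of the seeding and error reporting, assuming ypredTrain/ypredTest hold each classifier’s predicted labels and ytrain/ytest the ground-truth labels in {-1, +1} (illustrative names only):

    seed = 0;                                  % fix the seed before generating the data
    rand('seed', seed); randn('seed', seed);   % legacy seeding, as referred to above
    trainErr = mean(ypredTrain ~= ytrain);     % average misclassification error
    testErr  = mean(ypredTest  ~= ytest);
    fprintf('train error: %.3f, test error: %.3f\n', trainErr, testErr);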
Fast weak learner selection BONUS
When picking the best weak learner at each round, a substantial speedup can be
gained by a clever implementation of the threshold selection. Some hints are
provided below; based on them, you can try to implement this speedup quickly.
Consider that we are searching for the best weak learner that operates on the
k-th feature dimension, f^(k). Our task is to choose the best threshold θ for
a decision stump of the form h(x) = s (2 [f^(k) > θ] − 1).
The naive implementation is to consider all N possible thresholds, where N is
the training sample size, and to evaluate the associated weighted error for
each of them.
The complexity of this computation is O(N^2), since for each of the N
threshold values we need to perform N summations and multiplications.
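For concreteness, a naive search over dimension k might look as follows, where w holds the sample weights and the variable names are illustrative:

    fs = X(:, k);                            % candidate thresholds = feature values
    bestErr = inf;
    for i = 1:numel(fs)                      % N candidate thresholds
        for s = [-1, 1]                      % both polarities
            h = s * (2 * (fs > fs(i)) - 1);  % stump outputs in {-1, +1}
            err = sum(w .* (h ~= y));        % N operations per candidate
            if err < bestErr
                bestErr = err; theta = fs(i); polarity = s;
            end
        end
    end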
Instead of this, a faster algorithm will first sort the N training samples
according to their value of f^(k); namely, consider a permutation of the
training set such that the feature values f[i] are in increasing order, with
w[i] and y[i] denoting the weight and label of the i-th sample in this sorted
order, respectively. Using these, we then have the following recursion for C(i).
The first and second equations follow from the definition in Eq. 2 and the
fact that the f[i] have been sorted. The third equation can be verified by
substitution. Sorting has O(N log N) complexity, while the recursion can be
implemented in O(N) time, so the overall complexity goes down from O(N^2) to
O(N log N). In the code provided to you, this idea has been used to accelerate
the weak learner selection.
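Below is a minimal sketch of this idea using sorting and cumulative sums; it is a plausible reconstruction under the assumptions above, not a copy of the provided code, and the variable names are illustrative:

    [fs, order] = sort(X(:, k));             % sort samples by their k-th feature
    ys = y(order); ws = w(order);
    % Weighted mass of positives / negatives among the i smallest samples.
    posBelow = cumsum(ws .* (ys == +1));
    negBelow = cumsum(ws .* (ys == -1));
    posTotal = posBelow(end); negTotal = negBelow(end);
    % With the threshold just above fs(i), polarity +1 misclassifies the positives
    % below it and the negatives above it; polarity -1 gives the complementary error.
    errPos = posBelow + (negTotal - negBelow);
    errNeg = negBelow + (posTotal - posBelow);
    [bestErr, iBest] = min(min(errPos, errNeg));
    theta = fs(iBest);                       % best threshold along dimension k
    s = 1; if errNeg(iBest) < errPos(iBest), s = -1; end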
Inspect the code provided to you and reply to the following questions in no
more than 100 words (the shorter, the better):
- What is the code in lines 22-26 doing?
- What is the code in lines 28-38 doing?
Note: if you do not understand the code by looking at it, this is perfectly
normal. Before the solution was “leaked”, the assignment asked you to write
all of this code. Now that the solution has been leaked, we are asking you to
spend time understanding what is going on in the provided solution. Running
the code and observing the variables is an option to help you with doing this.
But, since this is a bonus exercise, you are expected to “go the extra mile”
to get the bonus.