Machine Learning Assignment Help: COMPSCI226 Gradient Boosting Algorithm


Machine learning must account for bias present in the data. Use the gradient
boosting algorithm to predict claims for Allstate.
![Gradient boosting](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fe/Kernel_Machine.svg/220px-Kernel_Machine.svg.png)

Requirement

First Option

Machine learning models are known to absorb bias captured in the data. For
example, many natural language processing models associate doctors with men
and nurses with women. This bias implicitly creates discrimination, even when
the algorithm excludes the variable from modeling, because of the exposure
distribution. In this task, you are asked to,

Second Option

Gradient boosting with both tree and linear base learners: xgboost and
lightgbm are the most popular boosting libraries among data scientists.
However, most applications presume the use of trees as base learners. Other
base learners, listed below, are rarely utilized.
xgboost includes a linear predictor as a base learner option (set
booster="gblinear" in the parameters). However, the existing library does not
allow both tree and linear predictors to estimate parameters in the same model
run.
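
As a minimal sketch of the point above, the snippet below builds xgboost parameter dictionaries for the two base learners. The helper names `booster_params` and `train_xgb` are hypothetical, and the values for `max_depth` and `lambda` are arbitrary assumptions, not recommendations. The `booster` setting applies to the whole run, which is why a single model cannot mix tree and linear learners.

```python
def booster_params(kind):
    """Return xgboost parameters for one base learner type (hypothetical helper)."""
    if kind not in ("gbtree", "gblinear"):
        raise ValueError("kind must be 'gbtree' or 'gblinear'")
    # 'booster' is fixed for the whole training run.
    params = {"booster": kind, "objective": "reg:squarederror"}
    if kind == "gbtree":
        params["max_depth"] = 4   # assumed value, tree learners only
    else:
        params["lambda"] = 1.0    # assumed L2 penalty on the linear weights
    return params


def train_xgb(kind, X, y, num_rounds=100):
    """Train with the chosen base learner; needs xgboost installed."""
    import xgboost as xgb  # imported lazily so the helper above works without it
    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(booster_params(kind), dtrain, num_boost_round=num_rounds)
```

Calling `train_xgb("gblinear", X, y)` and `train_xgb("gbtree", X, y)` yields two separate models, one per base learner type.
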
In this assignment, you need to modify the source code of the lightgbm package
(https://github.com/microsoft/LightGBM):

  • A) Include a booster similar to gblinear from xgboost. The module should allow users to train a linear booster and predict on an admissible dataset. You can safely assume the dataset is fully numeric and treat missing values as zeros. (Tip: a new linear module should reside in the LightGBM/src/boosting/ folder.)
  • B) Enable the library to call a different booster at each iteration. For example, in a 500-iteration run, the model might select tree (gbdt) in the first iteration and linear in the 2nd and 3rd. In each iteration, one of the 2 base learners is assigned by a mechanism governed by a probability parameter provided by the user: the algorithm should first generate a random number so that the base learners are assigned with the appropriate probability. You can safely assume gbdt and linear are the only members.
  • C) The resulting booster (called gbdt_and_linear) should have the same class functions and objects as gbdt, i.e., training, prediction, calculating metrics, etc.
  • D) Write Python/R code to train the algorithm for the Allstate claim prediction competition and make one submission. The data can be found at https://www.kaggle.com/c/ClaimPredictionChallenge/data
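
The per-iteration assignment in parts A and B can be prototyped outside LightGBM before touching the C++ source. The following is a pure-Python sketch under several assumptions, not the LightGBM implementation: squared-error loss, a single-split stump standing in for the gbdt learner, one pass of ridge-penalized coordinate descent standing in for the linear learner, and hypothetical names (`train_mixed`, `p_tree`, `fit_stump`, `fit_linear`).

```python
import random


def fill_missing(rows):
    """Treat missing values (None or NaN) as zeros, as the assignment allows."""
    return [[0.0 if (v is None or v != v) else float(v) for v in row] for row in rows]


def fit_stump(X, residuals):
    """Tree stand-in: best single-split regression stump on the residuals."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        # Skip the largest value so the right side is never empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def fit_linear(X, residuals, l2=1.0):
    """Linear stand-in: bias plus one coordinate-descent pass with an L2 penalty."""
    bias = sum(residuals) / len(residuals)
    err = [r - bias for r in residuals]  # what the linear fit has not explained yet
    w = [0.0] * len(X[0])
    for j in range(len(w)):
        num = sum(row[j] * e for row, e in zip(X, err))
        den = sum(row[j] ** 2 for row in X) + l2
        w[j] = num / den
        err = [e - w[j] * row[j] for row, e in zip(X, err)]
    return lambda row: bias + sum(wj * xj for wj, xj in zip(w, row))


def train_mixed(X, y, n_iter=50, p_tree=0.5, lr=0.1, seed=0):
    """Boost with a randomly assigned base learner in every iteration."""
    rng = random.Random(seed)
    X = fill_missing(X)
    base = sum(y) / len(y)
    preds = [base] * len(y)
    learners, kinds = [], []
    for _ in range(n_iter):
        # For squared error, the residuals are the negative gradient.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        # Draw a random number first, then assign gbdt with probability p_tree.
        kind = "gbdt" if rng.random() < p_tree else "linear"
        f = fit_stump(X, residuals) if kind == "gbdt" else fit_linear(X, residuals)
        learners.append(f)
        kinds.append(kind)
        preds = [p + lr * f(row) for p, row in zip(preds, X)]

    def predict(rows):
        rows = fill_missing(rows)
        return [base + lr * sum(f(r) for f in learners) for r in rows]

    return predict, kinds
```

`train_mixed` returns a prediction function together with the sequence of learner types that were drawn, so the realized mix can be checked against `p_tree`. Setting `p_tree=1.0` or `p_tree=0.0` reduces the run to a pure tree or pure linear booster.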

Author: SafePoker
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit SafePoker when reposting!