R Assignment Help: TBA2104 Predictive Analytics

Published: 2020-05-04

Data analysis assignment help: answer the questions using R.

Requirement

This question is based on the diabetes dataset (diabetes.arff). This dataset
consists of 768 observations and 9 attributes. Brief descriptions of the
attributes are as follows:

preg: Number of times the patient has been pregnant
plas: Plasma glucose concentration
pres: Diastolic blood pressure (mm Hg)
skin: Triceps skin fold thickness (mm)
insu: 2-hour serum insulin (mu U/ml)
mass: Body mass index (weight in kg / (height in m)^2)
pedi: Diabetes pedigree function
age: Age (years)
class: Class variable (either tested_negative or tested_positive)

a) Provide the R code for loading the data into a variable named Diabetes.
b) Provide the R code for generating the CSV equivalent of the diabetes dataset (diabetes.csv).
c) Compare and contrast the ARFF format and the CSV format.
d) Provide the R code for generating a logistic regression model (model) using class as the response and the other attributes as predictors.
e) Using the logistic regression results of the model, write down the log-odds equation of the model. Round all coefficient estimates to 4 decimal places.
f) We learned that logistic regression uses a logistic function to model Pr(Y = REFERENCE_CLASS | data), i.e. the probability that class = REFERENCE_CLASS given a data point. It turns out that R uses the first level of a factor-type attribute as the reference class.
g) Provide the R code for verifying the probability value of f) using the predict() function in R.
h) Suppose you want to change the reference class in R to tested_positive. You could use the relevel() function. Read the help pages and provide the R command to change the reference class to tested_positive so that predict() will be based on tested_positive.
i) If you were to generate a new model (model2) using tested_positive as the reference class, how would the regression model of model2 differ from model?
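
The steps in a), b), d), g), h) and i) can be sketched roughly as follows. This is a minimal illustration, not a model answer: the data frame toy is simulated stand-in data with a single predictor (an assumption, since diabetes.arff is not reproduced here), so its coefficients say nothing about the real dataset. foreign::read.arff is one possible ARFF reader (RWeka::read.arff is another), and the file-based part only runs if diabetes.arff is present in the working directory. Note also that glm() with family = binomial documents the first factor level as "failure", so the fitted probability refers to the other level; this is worth checking against the wording in f).

```r
# a), b): load the ARFF file and write a CSV copy, only if the file exists.
if (file.exists("diabetes.arff")) {
  Diabetes <- foreign::read.arff("diabetes.arff")
  write.csv(Diabetes, "diabetes.csv", row.names = FALSE)
}

# Simulated stand-in data (hypothetical, NOT the real diabetes dataset).
set.seed(1)
n <- 200
plas <- rnorm(n, mean = 120, sd = 30)
p_pos <- plogis(-6 + 0.05 * plas)  # higher plas -> more likely positive
class <- factor(ifelse(runif(n) < p_pos, "tested_positive", "tested_negative"))
toy <- data.frame(plas = plas, class = class)

# Factor levels are sorted alphabetically, so tested_negative comes first.
levels(toy$class)

# d): logistic regression; with the real data this would be
#     glm(class ~ ., data = Diabetes, family = binomial).
model <- glm(class ~ plas, data = toy, family = binomial)

# g): predict() on the response scale versus a manual inverse-logit
#     of the linear predictor for the first row.
p1 <- predict(model, newdata = toy[1, ], type = "response")
manual <- plogis(sum(coef(model) * c(1, toy$plas[1])))

# h): make tested_positive the first (reference) level, then refit.
toy2 <- toy
toy2$class <- relevel(toy2$class, ref = "tested_positive")
model2 <- glm(class ~ plas, data = toy2, family = binomial)
p2 <- predict(model2, newdata = toy2[1, ], type = "response")

# i): compare the two fits side by side.
print(rbind(model = coef(model), model2 = coef(model2)))
print(c(p1 = unname(p1), p2 = unname(p2), manual = manual))
```

On the simulated data, the coefficients of model2 are the negatives of those of model, and the two predicted probabilities for the same row sum to 1, because each model describes the probability of the level that is not its reference.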