Use Azure ML Studio to create a predictive model for the provided dataset.
Instructions
For this assignment, you are to create a predictive model in Azure ML Studio
for the attached dataset and turn in a report as specified in the following
pages. Use whichever data preparation, modeling, and model assessment
techniques covered in this portion of the class you believe will produce the
best model.
You will perform Exploratory Data Analysis, Model Development and Training,
and Model Deployment activities, and prepare a report in PowerPoint form.
See the sample report that is part of this assignment for a template and
example.
When you are finished, save this file as a PDF and upload it to Gradescope.
As a reminder, the work that you submit must be done individually. Unlike the
homework assignments, working together is not permitted and the graders will
be looking for identical solutions.
For this assignment, you will use Azure ML Studio Designer to build a
classification model to predict the likelihood of a patient developing Coronary
Heart Disease (CHD) in the coming ten years. The dataset you will be using has
been distributed with this exam and consists of the variables on the following
page.
Data Dictionary
Variable | Description |
---|---|
Age | age of the participant at the time of examination |
Male | gender of the participant (male = 1, female = 0) |
Education | educational level of the patient (1 = less than high school, 2 = completed high school or equivalent, 3 = some college, 4 = completed college or higher) |
Income | income of the patient |
Current Smoker | whether the participant is currently a smoker (yes or no) |
Cigarettes per Day | the average number of cigarettes smoked per day by current smokers |
BP Meds | whether the participant is taking blood pressure medication (yes or no) |
Prevalent Stroke | whether the participant has a history of stroke (yes or no) |
Prevalent Hyp | whether the participant has a history of hypertension (yes or no) |
Diabetes | whether the participant has diabetes (yes or no) |
Total Chol | total cholesterol level in milligrams per deciliter |
Sys BP | systolic blood pressure in millimeters of mercury |
Dia BP | diastolic blood pressure in millimeters of mercury |
BMI | body mass index in kilograms per square meter |
Heart Rate | resting heart rate in beats per minute |
Glucose | blood glucose level in milligrams per deciliter |
A1c | Hemoglobin A1c (%) |
Ten Year CHD | whether the participant developed coronary heart disease (CHD) within 10 years of the examination (yes or no) |
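
Several variables in the dictionary are recorded as yes or no, so a quick check outside the Designer can help when planning the data cleansing and bivariate analysis. The following is only a minimal sketch using the Python standard library. It assumes the file's column headers match the variable names in the dictionary, which may not hold for the distributed file, and the rows shown are made up for illustration rather than taken from the dataset. The helper names `encode_yes_no` and `rate_by_group` are hypothetical.

```python
import csv
import io


def encode_yes_no(value):
    """Map 'yes'/'no' (any case) to 1/0; blank or unexpected values become None."""
    text = (value or "").strip().lower()
    if text == "yes":
        return 1
    if text == "no":
        return 0
    return None


def rate_by_group(rows, group_col, target_col="Ten Year CHD"):
    """Return {encoded group value: share of rows where the target is 'yes'}.

    Rows with a missing group or target value are skipped.
    """
    counts = {}
    for row in rows:
        group = encode_yes_no(row.get(group_col))
        target = encode_yes_no(row.get(target_col))
        if group is None or target is None:
            continue
        total, positives = counts.get(group, (0, 0))
        counts[group] = (total + 1, positives + target)
    return {g: p / t for g, (t, p) in counts.items()}


# Made-up rows purely for illustration; NOT taken from the assignment dataset.
# Column names are assumed to match the data dictionary.
SAMPLE_CSV = """Current Smoker,Ten Year CHD
yes,yes
yes,no
no,no
no,no
,yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = rate_by_group(rows, "Current Smoker")
print(rates)
```

The last sample row has a blank `Current Smoker` value and is left out of the rates, which is one possible way to treat missing values; imputing them is an equally valid cleansing decision to document in the report.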
Note On Model Deployment
- When your model is complete, create a real-time endpoint for it and copy the REST Endpoint URL and the authentication key into a Google Drive spreadsheet that will be published.
- The TAs will run scripts at a later time to independently evaluate your model's performance.
- Once the evaluation is complete, a message will be posted on Piazza, and you should then delete your endpoint.
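
Before sharing the URL and key, it can be worth confirming that the endpoint responds to a scripted call. The sketch below shows one way such a call might look using only the Python standard library. The URL and key are placeholders, the field names and values in `example_record` are illustrative, and the payload shape (`Inputs` / `input1` / `GlobalParameters`) is an assumption; the endpoint's Consume tab in Azure ML Studio shows the exact schema and input name for your deployment. This is not the script the TAs will use.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the values shown for your own endpoint.
ENDPOINT_URL = "https://example.invalid/score"
API_KEY = "<your-authentication-key>"


def build_request(url, key, records):
    """Build (but do not send) a POST request for a real-time endpoint.

    Assumption: the endpoint accepts a JSON body of this shape. Check the
    Consume tab of your endpoint for the actual input name and schema.
    """
    payload = {"Inputs": {"input1": records}, "GlobalParameters": {}}
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + key,  # key-based authentication
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def score(url, key, records, timeout=30):
    """Send the records to the endpoint and return the parsed JSON response."""
    req = build_request(url, key, records)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative record only; field names are assumed to match the data dictionary.
example_record = {"Age": 50, "Male": 1, "Current Smoker": "no"}
request = build_request(ENDPOINT_URL, API_KEY, [example_record])

# To actually call a deployed endpoint (requires a real URL and key):
# result = score(ENDPOINT_URL, API_KEY, [example_record])
# print(result)
```

Building the request separately from sending it makes it possible to inspect the headers and body without a network call, which is useful while the endpoint is still being deployed.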
Final Report Structure
Please follow the provided template/example and structure your final report
into the following three sections:
- Exploratory Data Analysis
- Model Development
- Model Deployment
Final Report Outline/Grading Rubric
Report contents
- Attribute summary
- Data cleansing - summary of decisions made
- Data cleansing pipeline (portion of your overall pipeline)
- Univariate analysis
- Bivariate analysis (each variable vs the response variable)
- Feature selection/engineering decisions
- Model pipeline screenshot
- Model evaluation results screenshot
- Inference pipeline screenshot
- REST Endpoint URL and authentication key (in PPT and in Google Drive spreadsheet)
- Screenshot of scored test dataset
Model performance - Based on TAs calling your endpoint with test dat