Summary
In this assignment you will write a program that builds a decision tree from
a set of training data provided in a file. You will use information gain to
choose which attribute to split on at each node, so that the program builds
the best tree for the data. You can work with a partner on this assignment. I
will post my usual Moodle thread asking for partners.
The Assignment
Your program should do the following:
- Prompt the user to enter a file name.
- Read the information from the file (see below).
- Build the best decision tree that results from the training data.
- Display or print the decision tree.
The printing of the tree should be readable. See the end of this document for
an example of what your output might look like.
If you cannot implement information gain properly, a program that produces
any correct tree will still earn most of the points.
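
As a rough illustration of the information gain calculation, here is a minimal
Python sketch. It assumes each training instance is a list of 0/1 values with
the classification in the last position; the function names entropy and
information_gain are hypothetical and not part of the assignment.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for value in (0, 1):
        p = labels.count(value) / total
        if p > 0:
            result -= p * math.log2(p)
    return result


def information_gain(rows, attr_index):
    """Entropy reduction from splitting rows on the attribute at attr_index.

    Assumption: each row is a list of 0/1 values whose last element is the
    classification.
    """
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in (0, 1):
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder
```

At each node, the attribute with the largest gain is the one to split on. An
attribute that separates the classifications perfectly has a gain equal to the
entropy of the node, and one that tells you nothing has a gain of 0.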
Assumptions
You may make the following assumptions for the decision trees produced by your
program:
- The classification will be one of two values - TRUE and FALSE, for example.
- The value for each attribute will be one of two values - TRUE and FALSE for example.
- These values will not be named, and will be represented by 0 and 1 (see below).
- No piece of training data will have missing values.
- There may be contradictory training data (identical attribute values with different classifications) - in that case, assign the classification held by the majority.
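
One possible way to handle the contradictory case is a simple majority vote
over the classifications that reach a leaf. The sketch below assumes rows are
lists of 0/1 values with the classification last; majority_label is a
hypothetical helper name, and breaking ties in favor of 1 is an assumption,
since the handout does not say what to do on a tie.

```python
def majority_label(rows):
    """Return the most common classification (last column) among rows.

    Assumption: ties go to 1; the assignment does not specify a tie rule.
    """
    ones = sum(row[-1] for row in rows)
    zeros = len(rows) - ones
    return 1 if ones >= zeros else 0
```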
The File Structure
For this assignment, you will be reading in a file containing the training
data. The file will consist of some number n of attribute names, one per
line, with the classification name as the last of these. Then, there will be
a blank line. Then, there will be some number m of training instances, each
on its own line and consisting of n values that are 0 or 1, separated by
commas. These represent the values of the attributes (and the classification)
for that piece of training data. For example, a file might look like this:
Brown
Wrinkled
Smelly
Spongy
POISON

1,0,1,0,1
0,0,1,1,0
0,1,0,0,1
0,0,0,1,0
1,1,0,1,1
1,0,1,1,0
1,1,1,0,1
0,0,0,0,0
This file describes a set of training data with five names: the four
attributes Brown, Wrinkled, Smelly, and Spongy, followed by POISON, which is
the classification.
Then there are eight pieces of training data. The first is 1, 0, 1, 0, 1: the
first value, for Brown, is true (1); the second, Wrinkled, is false (0); the
third, Smelly, is true (1); the fourth, Spongy, is false (0). The
classification (POISON) is true (1). The rest of the training data works in a
similar way.
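
The format described above could be read with something like the following
Python sketch. The function names parse_training_text and load_training_file
are hypothetical, and the check that every instance has exactly n values is an
added assumption rather than a stated requirement.

```python
def parse_training_text(text):
    """Split file contents into (attribute names, rows of 0/1 integers).

    Names are read one per line until the first blank line; every later
    non-blank line is treated as a comma-separated training instance.
    """
    names = []
    rows = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # The first blank line after the names starts the data section.
            if names:
                in_data = True
            continue
        if in_data:
            values = [int(v) for v in line.split(",")]
            if len(values) != len(names):
                raise ValueError(f"expected {len(names)} values: {line!r}")
            rows.append(values)
        else:
            names.append(line)
    return names, rows


def load_training_file(path):
    """Read the file at path and parse it with parse_training_text."""
    with open(path) as handle:
        return parse_training_text(handle.read())
```

In the program itself, the path would come from the user, for example
load_training_file(input("File name: ")).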
The best tree for this data would look like this if we printed it out:
Wrinkled
yes:
  POISON = TRUE
no:
  Brown
  yes:
    Spongy
    yes:
      POISON = FALSE
    no:
      POISON = TRUE
  no:
    POISON = FALSE
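
To show how a tree like this one might be built and printed, here is a
self-contained Python sketch that applies information gain to the example
data. It is one possible approach, not a required design. All function names,
the tuple representation of a node, the two-space indentation, and the choice
to break ties in favor of the earliest attribute (and in favor of TRUE for an
evenly split leaf) are assumptions.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    total = len(labels)
    result = 0.0
    for value in (0, 1):
        p = labels.count(value) / total if total else 0.0
        if p > 0:
            result -= p * math.log2(p)
    return result


def gain(rows, i):
    """Information gain from splitting rows on attribute index i."""
    remainder = 0.0
    for value in (0, 1):
        subset = [r[-1] for r in rows if r[i] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def build_tree(rows, attrs, default=0):
    """Return a leaf (0 or 1) or a tuple (attr_index, yes_subtree, no_subtree)."""
    if not rows:
        return default  # no data on this branch: fall back to parent majority
    labels = [r[-1] for r in rows]
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node, or contradictory data with nothing left
    best = max(attrs, key=lambda i: gain(rows, i))  # ties: earliest attribute
    rest = [i for i in attrs if i != best]
    yes_rows = [r for r in rows if r[best] == 1]
    no_rows = [r for r in rows if r[best] == 0]
    return (best,
            build_tree(yes_rows, rest, majority),
            build_tree(no_rows, rest, majority))


def format_tree(tree, names, depth=0):
    """Return the tree as a list of text lines, indented by depth."""
    pad = "  " * depth
    if not isinstance(tree, tuple):
        return [f"{pad}{names[-1]} = {'TRUE' if tree else 'FALSE'}"]
    attr, yes_branch, no_branch = tree
    lines = [pad + names[attr], pad + "yes:"]
    lines += format_tree(yes_branch, names, depth + 1)
    lines.append(pad + "no:")
    lines += format_tree(no_branch, names, depth + 1)
    return lines


names = ["Brown", "Wrinkled", "Smelly", "Spongy", "POISON"]
rows = [
    [1, 0, 1, 0, 1], [0, 0, 1, 1, 0], [0, 1, 0, 0, 1], [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1], [1, 0, 1, 1, 0], [1, 1, 1, 0, 1], [0, 0, 0, 0, 0],
]
tree = build_tree(rows, list(range(len(names) - 1)))
print("\n".join(format_tree(tree, names)))
```

On the example data, Brown and Spongy have equal gain once Wrinkled is known
to be false, so a different tie-breaking rule could put Spongy above Brown and
still be a correct tree.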