Intro to machine learning
ANSWER
The ID3 (Iterative Dichotomiser 3) algorithm is a classic decision tree learning algorithm used for classification tasks. Here are the key steps to implement the ID3 algorithm:
- Data Preparation:
  - Ensure your dataset is in a suitable format with binary class labels and attributes.
  - You may need to preprocess the data to handle missing values, if any.
- Node Creation:
  - Create the root node of an initially empty decision tree; the algorithm grows the tree recursively from this node.
- Stopping Criteria:
  - Define stopping criteria for when to halt tree construction. Common stopping criteria include:
    - If all examples in the current node belong to the same class, mark the node as a leaf node and assign it that class label.
    - If there are no attributes left to split on, mark the node as a leaf node and assign it the majority class label of the examples in the current node.
  - You can also limit the depth of the tree to prevent overfitting.
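As a concrete sketch, the stopping rules above can be checked in one helper. The function name `should_stop` and the optional `max_depth` parameter are illustrative choices, not part of the assignment:

```python
from collections import Counter

def should_stop(labels, attributes, depth, max_depth=None):
    """Check the ID3 stopping criteria for one node.

    `labels` is the list of class labels (0/1) reaching this node and
    `attributes` is the set of attribute indices still available.
    Returns the leaf label to assign, or None to keep splitting.
    """
    if len(set(labels)) == 1:           # all examples share one class
        return labels[0]
    if not attributes or (max_depth is not None and depth >= max_depth):
        # no attributes left (or depth limit hit): use the majority class
        return Counter(labels).most_common(1)[0][0]
    return None                         # keep building the tree
```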
- Attribute Selection:
  - Implement a function to select the best attribute to split on at each node. ID3 uses Information Gain (IG) to measure attribute usefulness; the attribute with the highest IG is chosen.
  - Calculate the Information Gain of each attribute as:

    ```
    IG(Attribute) = H(D) - H(D | Attribute)
    ```

    where H(D) is the entropy of the current node's class distribution, and H(D | Attribute) is the weighted average entropy of the child nodes after splitting on the attribute.
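A minimal sketch of the entropy and information-gain computations, assuming each example is a list of 0/1 attribute values (function and parameter names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """H(D): entropy of a list of binary class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """IG(attr): H(D) minus the weighted entropy of the children
    produced by splitting on the binary attribute at index `attr`."""
    n = len(labels)
    child_entropy = 0.0
    for value in (0, 1):
        subset = [y for x, y in zip(examples, labels) if x[attr] == value]
        if subset:
            child_entropy += (len(subset) / n) * entropy(subset)
    return entropy(labels) - child_entropy
```

A perfectly separating attribute yields an IG equal to the parent's entropy, while an uninformative one yields an IG of zero.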
- Splitting:
  - Split the current node into two child nodes based on the selected attribute, one child for each binary value (0 or 1) of the attribute.
  - Recursively apply the ID3 algorithm to each child node.
- Tree Construction:
  - Continue the tree construction process recursively for each child node until one of the stopping criteria is met.
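Putting the selection, splitting, and stopping steps together, one possible recursive sketch looks as follows. The plain-dict node representation (leaves are bare class labels, internal nodes are `{'attr': i, 0: subtree, 1: subtree}`) is an illustrative assumption, not a required format:

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, attributes):
    """Recursively build an ID3 tree for binary attributes and labels."""
    if len(set(labels)) == 1:                 # pure node: make a leaf
        return labels[0]
    if not attributes:                        # nothing left to split on
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        # information gain of splitting on attribute index `a`
        total = _entropy(labels)
        for v in (0, 1):
            sub = [y for x, y in zip(examples, labels) if x[a] == v]
            if sub:
                total -= (len(sub) / len(labels)) * _entropy(sub)
        return total

    best = max(attributes, key=gain)          # highest-IG attribute
    node = {'attr': best}
    for v in (0, 1):
        ex = [x for x in examples if x[best] == v]
        ys = [y for x, y in zip(examples, labels) if x[best] == v]
        # empty branch: fall back to the parent's majority class
        node[v] = id3(ex, ys, attributes - {best}) if ys \
            else Counter(labels).most_common(1)[0][0]
    return node
```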
- Pruning (Optional):
  - After tree construction, you can prune the tree to reduce overfitting by removing branches that do not significantly contribute to classification accuracy.
- Tree Output:
  - Your implementation should return the fully constructed decision tree.
- Prediction:
  - Implement a function to make predictions using the decision tree. Traverse the tree based on attribute values to classify new instances.
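A traversal sketch, assuming an illustrative tree representation in which leaves are bare class labels and internal nodes are dicts of the form `{'attr': i, 0: subtree, 1: subtree}`; adapt it to whatever structure your implementation returns:

```python
def predict(tree, example):
    """Classify one example by descending from the root to a leaf."""
    while isinstance(tree, dict):
        # follow the branch matching this example's attribute value
        tree = tree[example[tree['attr']]]
    return tree          # a leaf: the predicted class label
```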
- Evaluation:
  - Evaluate the performance of your decision tree using appropriate metrics, such as accuracy, precision, recall, and F1-score, on a separate test dataset.
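These metrics are straightforward to compute by hand for binary labels; a hand-rolled sketch, treating 1 as the positive class (sklearn.metrics provides equivalents such as accuracy_score and f1_score):

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # guard against zero denominators on degenerate predictions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {'accuracy': accuracy, 'precision': precision,
            'recall': recall, 'f1': f1}
```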
QUESTION
Description
In this assignment, you will implement the ID3 algorithm for learning decision trees. You may assume that the class label and all attributes are binary (only 2 values).
Please follow the instructions in the notebook:
The following notebook uses ID3 from the sklearn library. You can use it to compare your output:
You may look at open-source reference implementations, but please do not copy code from open-source projects.
The ID3 algorithm is similar to what we discussed in class: Start with an empty tree and build it recursively. Use information gain to select the attribute to split on. (Do not divide by split information.)
The full algorithm is described in this classic paper (with over 25,000 citations):