Decision Tree is one of the most widely used algorithms in Machine Learning, providing a solid baseline for subsequent approaches.
It is one of the most popular classification algorithms and one of the easiest to understand and interpret. It belongs to the family of supervised learning algorithms. It is efficient at processing large amounts of data in data mining applications that require classifying categorical data based on its attributes.
The primary purpose of using a Decision Tree is to create a model that can predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data). It uses a tree-like graph to show predictions arising from a series of splits based on features.
One way to think of a decision tree is as a series of nodes, or a directed graph, that starts with a single root node and extends to many leaf nodes representing the categories that the tree can classify. Every internal node in the tree specifies a test of some instance attribute. Each branch that comes down from a node corresponds to one of the attribute’s possible values. Each leaf node assigns a classification.
Another way of representing a decision tree is as a flow chart, where the flow starts at the root node and ends with a decision made at the leaves. A decision tree can also be represented as a set of if-then rules. Decision tree algorithms such as ID3 and C4.5 are popular inductive inference algorithms and have been applied successfully to many learning tasks.
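To make the node, branch, and rule views concrete, here is a minimal sketch in Python. The toy weather tree, the nested-dictionary layout, and the function names `classify` and `to_rules` are assumptions made for illustration; they are not taken from ID3, C4.5, or any library.

```python
# Hypothetical toy tree: internal nodes are dicts, leaves are class labels.
# Each internal node tests one attribute; each branch is one attribute value.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "wind",
            "branches": {"strong": "no", "weak": "yes"},
        },
    },
}


def classify(node, instance):
    """Follow the branch matching the instance's attribute value until a leaf."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


def to_rules(node, conditions=()):
    """Return one if-then rule per root-to-leaf path."""
    if not isinstance(node, dict):
        tests = " and ".join(f"{a} = {v}" for a, v in conditions) or "true"
        return [f"if {tests} then {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules += to_rules(child, conditions + ((node["attribute"], value),))
    return rules


print(classify(tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"}))
for rule in to_rules(tree):
    print(rule)
```

Each root-to-leaf path becomes exactly one rule, so the flow-chart view and the rule view carry the same information.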
Standard terms in Decision Tree
- Root Node: Root node is at the beginning of a tree, representing the entire population to be analyzed. From the root node, the population is divided into subgroups based on various features.
- Splitting: It is a process whereby a node is divided into two or more subnodes.
- Decision Node: When a sub-node splits into additional sub-nodes, it is called a decision node.
- Leaf Node or Terminal Node: It is a node that does not split.
- Pruning: Pruning is the removal of the sub-nodes of a parent node. A tree grows through splitting and shrinks through pruning.
- Branch or Sub-Tree: A sub-section of a decision tree is called a branch or a sub-tree, while a portion of a graph is called a sub-graph.
- Parent Node and Child Node: Any node falling under a different node is a child node or sub-node, and any node preceding those child nodes is called a parent node.
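The definitions above describe what a split is but not how one is chosen. ID3 selects the attribute with the highest information gain, that is, the largest drop in entropy from the parent node to its child nodes. The following is a minimal sketch of that criterion only, not a full ID3 implementation; the toy data and the function names are assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the parent node minus the weighted entropy of its child nodes."""
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[attribute], []).append(label)
    weighted = sum(len(c) / len(labels) * entropy(c) for c in children.values())
    return entropy(labels) - weighted


# Hypothetical toy data: four instances, two categorical attributes.
rows = [
    {"wind": "weak", "humidity": "high"},
    {"wind": "strong", "humidity": "high"},
    {"wind": "weak", "humidity": "normal"},
    {"wind": "strong", "humidity": "normal"},
]
labels = ["no", "no", "yes", "yes"]

# The attribute with the highest gain becomes the test at this node.
best = max(["wind", "humidity"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy data, splitting on `humidity` produces two pure child nodes, while splitting on `wind` leaves both children as mixed as the parent, so `humidity` is the attribute chosen.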
Advantages of Decision Tree
Decision trees are popular for several reasons. First of all, they are simple to understand, interpret, and visualize, and they handle both numerical and categorical data effectively. They can determine the worst, best, and expected values for several scenarios.
Decision trees require little data preparation or normalization, and they perform well even if the true model that generated the data violates their assumptions. A decision tree can be built with little domain knowledge or parameter setting, and its representation of acquired knowledge in tree form is intuitive and easy for humans to assimilate.
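One reason normalization matters so little is that a threshold test on a numerical feature depends only on the order of the values, so rescaling a feature changes the threshold but not which examples go left or right. The sketch below illustrates this with a single split chosen by Gini impurity. The toy data and the helper names `gini` and `best_split` are assumptions, and Gini impurity is just one common splitting criterion.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return the set of row indices sent left by the lowest-impurity threshold."""
    best_score, best_left = float("inf"), None
    for threshold in sorted(set(values))[:-1]:
        left = [i for i, v in enumerate(values) if v <= threshold]
        right = [i for i, v in enumerate(values) if v > threshold]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(values)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


# Hypothetical toy feature (say, income in thousands) and class labels.
income = [20, 35, 50, 65, 80]
labels = ["no", "no", "no", "yes", "yes"]

# Rescaling the feature changes the threshold value but not the partition.
scaled = [(v - 50) / 1000 for v in income]
same_partition = best_split(income, labels) == best_split(scaled, labels)
print(same_partition)
```

The same holds for any strictly increasing transformation of a feature, which is why steps such as standardization are usually unnecessary for a single tree.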
Other advantages are as follows:
- Explanatory power: The output of decision trees is easy to explain and interpret. It can be understood even by people without analytical, mathematical, or statistical knowledge.
- Exploratory data analysis: Decision trees allow analysts to quickly identify significant variables and essential relationships between two or more variables, thus helping to surface the signal that many input variables contain.
- Minimum cleaning of data: As decision trees are resilient to outliers and missing values, they require less cleaning of data than other algorithms.
- All data types: Decision trees can make classifications based on both numerical and categorical variables.
- Non-parametric: A decision tree is a non-parametric algorithm. Neural networks, by contrast, transform input data into tensors and process them through tensor multiplication with a large number of coefficients, known as parameters.
Disadvantages of Decision Tree
- Overfitting: A common flaw in decision trees is overfitting. Two ways to regularize a decision tree are to set constraints on model parameters (for example, a maximum depth) and to make the model simpler through pruning.
- Predicting continuous variables: Although decision trees can ingest continuous numerical input, they may not be a practical way to predict such values. A tree can only output a finite set of leaf values, so its predictions are effectively divided into discrete categories, leading to a loss of information when applying the model to continuous values.
- Heavy feature engineering: The flip side of the explanatory power of a decision tree is that it calls for heavy feature engineering. This makes decision trees sub-optimal when dealing with unstructured data or data with latent factors. In this respect, neural networks are superior.
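As a rough illustration of the pruning idea mentioned under overfitting, the sketch below collapses every subtree below a chosen depth into a single leaf that keeps the merged class counts. This depth cap is a deliberately simple stand-in, not the pruning procedure of any particular algorithm or library; the tree layout and the names `leaf_counts`, `prune_to_depth`, and `predict` are assumptions.

```python
from collections import Counter


def leaf_counts(node):
    """Sum the training-class counts stored at the leaves under a node."""
    if "counts" in node:
        return Counter(node["counts"])
    total = Counter()
    for child in node["branches"].values():
        total += leaf_counts(child)
    return total


def prune_to_depth(node, max_depth):
    """Replace every subtree below max_depth with a single merged leaf."""
    if "counts" in node:
        return node
    if max_depth == 0:
        return {"counts": dict(leaf_counts(node))}
    return {
        "attribute": node["attribute"],
        "branches": {
            value: prune_to_depth(child, max_depth - 1)
            for value, child in node["branches"].items()
        },
    }


def predict(node, instance):
    """Walk to a leaf and return its majority class."""
    while "counts" not in node:
        node = node["branches"][instance[node["attribute"]]]
    return max(node["counts"], key=node["counts"].get)


# Hypothetical tree; each leaf stores how many training examples
# of each class reached it.
tree = {
    "attribute": "outlook",
    "branches": {
        "overcast": {"counts": {"yes": 4}},
        "sunny": {
            "attribute": "humidity",
            "branches": {
                "high": {"counts": {"no": 3}},
                "normal": {"counts": {"yes": 1}},
            },
        },
    },
}

pruned = prune_to_depth(tree, max_depth=1)
example = {"outlook": "sunny", "humidity": "normal"}
print(predict(tree, example), predict(pruned, example))
```

The full tree keeps a leaf supported by a single training example, while the pruned tree answers with the majority class of the merged node. That trade of fine detail for a simpler model is what pruning is meant to achieve.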
When to consider Decision Tree
- Instances are represented by attribute-value pairs. Each instance is described by a fixed set of attributes, and each attribute takes a small number of disjoint possible values.
- The target function has discrete output values. A decision tree is suitable for a Boolean classification, but it easily extends to learning functions with more than two possible output values.
- Disjunctive descriptions may be required. Decision trees naturally represent disjunctive expressions.
- The training data may contain errors. Decision tree learning methods are robust to errors in classifications of the training examples and in the attribute values that describe these examples.
- The training data may include missing values for the attributes. Decision tree methods can be used even when some training examples have unknown values.
- A decision tree is well suited to problems such as classifying medical patients by their illness, equipment malfunctions by their cause, and loan applicants by their likelihood of defaulting on payments.
Let’s sum up. Decision trees offer a versatile way to derive predictors and decision rules in a variety of commonly encountered data settings. However, the performance of decision trees on external datasets can sometimes be inadequate. Aggregating decision trees is a simple way to improve performance, and in some instances, aggregated tree predictors can exhibit state-of-the-art performance.
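Aggregation can take several forms. One common form, used for example in bagging and random forests, is to let several trees vote and return the majority class. The sketch below shows only the voting step and assumes the trees are already trained; the three one-split trees, their thresholds, and the applicant record are hypothetical.

```python
from collections import Counter


def majority_vote(trees, instance):
    """Return the class predicted by the most trees."""
    votes = Counter(tree(instance) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical one-split trees (stumps), written as plain functions for brevity.
def tree_a(x):
    return "default" if x["income"] < 30 else "repay"


def tree_b(x):
    return "default" if x["debt"] > 20 else "repay"


def tree_c(x):
    return "default" if x["age"] < 25 else "repay"


applicant = {"income": 25, "debt": 10, "age": 40}
prediction = majority_vote([tree_a, tree_b, tree_c], applicant)
print(prediction)
```

A single tree that splits only on income would flag this applicant as a default risk, but the other two trees outvote it. Averaging out the errors of individual trees in this way is the intuition behind aggregated tree predictors.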