SciKit Learn Decision Tree

Decision Trees (DT) from SciKit Learn are a non-parametric supervised learning method used for classification and regression. The goal of a decision tree is to create a model that predicts the value of the target variable by learning simple decision rules inferred from the features of the data. The deeper the tree, the more complex its decision rules and the more closely the model fits the training data.
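As a quick illustration of the basic API, the following minimal sketch fits a classifier on a toy dataset of two samples and predicts the class of a new sample (the values here are purely illustrative):

from sklearn.tree import DecisionTreeClassifier

# Two training samples with a single feature each, and their class labels
X = [[0], [1]]
y = [0, 1]

# Fit the tree and predict the class of an unseen sample
clf = DecisionTreeClassifier()
clf.fit(X, y)
print(clf.predict([[0.8]]))  # prints [1]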

Building the Decision Tree Classifier in SciKit Learn

Let us see in detail how a decision tree classifier is built. The process is as follows.

Loading the data: First, the required dataset needs to be loaded, for example with pandas' read_csv function.
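A minimal sketch of this step is shown below; the file name wine.csv is a placeholder for whatever dataset is actually used:

import pandas as pd

# Load the dataset from a CSV file; 'wine.csv' is a placeholder file name
wine = pd.read_csv('wine.csv')
print(wine.head())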

Selecting the features: In this step, the columns are divided into two types, dependent and independent variables (the target variable and the feature variables).
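Continuing the sketch, the columns are separated into feature variables and the target variable; the column names below are assumed placeholders, not taken from a real file:

# Independent variables (features) and the dependent variable (target)
feature_cols = ['alcohol', 'malic_acid', 'ash', 'magnesium']
X = wine[feature_cols]   # feature variables
y = wine['cultivar']     # target variable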

Splitting the data: To properly assess the performance of the model, the dataset is divided into training and test sets using the train_test_split() function. Three arguments need to be passed to it: the features, the target, and the test set size.
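With the features and the target in hand, the split itself looks like this (a 70/30 split is an arbitrary but common choice):

from sklearn.model_selection import train_test_split

# Pass the features, the target, and the test set size;
# 30% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)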

Evaluating the model: Next, the classifier is fitted on the training set and used to predict the target (here, the type of cultivar) for the test set. The accuracy of the model is computed by comparing the actual and predicted test set values, and it can often be improved by tuning the parameters of the Decision Tree algorithm.
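A sketch of fitting the classifier and measuring its accuracy, assuming the training and test sets from the previous step:

from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# Fit the classifier on the training set and predict the test set
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Compare the actual and predicted test set values
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))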

Visualizing the decision tree: Here the export_graphviz function of SciKit Learn is used to display the tree within a Jupyter notebook. For plotting the tree, pydotplus and graphviz need to be installed as shown below.

pip install pydotplus

pip install graphviz

The export_graphviz function converts the decision tree classifier into a dot file, and pydotplus translates that dot file into a PNG or other form that can be displayed in Jupyter. In the chart of a decision tree, every internal node carries a decision rule that splits the data. The Gini measure of the decision tree quantifies a node's impurity; a node is considered pure when all of its records belong to the same class, and such nodes are called leaf nodes.
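A sketch of the visualization step is shown below; it assumes the fitted classifier and feature names from the earlier steps, and the class names are assumed labels for three cultivars:

from io import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus

# Convert the fitted classifier into dot format ...
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data, filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols,
                class_names=['cultivar 1', 'cultivar 2', 'cultivar 3'])

# ... and translate the dot data into a PNG that Jupyter can display
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('decision_tree.png')
Image(graph.create_png())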

The resulting tree is unpruned. An unpruned tree is hard to explain and not easy to read.
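One common way to obtain a smaller, more readable tree is to pre-prune it, for example by limiting its maximum depth. The following sketch reuses the training and test sets (and imports) from above; the criterion and depth are illustrative choices, not prescribed values:

# Pre-prune the tree by capping its depth (and, optionally, switching the
# attribute selection measure to entropy)
clf = DecisionTreeClassifier(criterion='entropy', max_depth=3)
clf = clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print('Accuracy:', metrics.accuracy_score(y_test, y_pred))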

How Does the Decision Tree Algorithm Work?

The fundamental idea behind the decision tree algorithm is described below; a simplified sketch of the procedure follows the list.

  1. The best attribute is selected using an ASM (Attribute Selection Measure) to divide the records.
  2. That attribute becomes a decision node, and the dataset is split into smaller subsets.
  3. The tree is built by repeating this process recursively for every child node until one of the conditions below is met:
  • There are no attributes left
  • All the tuples belong to the same class
  • No more instances are left
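The following is a deliberately simplified sketch of that idea for categorical attributes, using Gini impurity as the attribute selection measure. It illustrates the recursive procedure only; it is not SciKit Learn's actual implementation:

from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Attribute selection measure: pick the attribute whose split
    # yields the lowest weighted Gini impurity
    def weighted_gini(attr):
        score = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            score += len(subset) / len(labels) * gini(subset)
        return score
    return min(attributes, key=weighted_gini)

def build_tree(rows, labels, attributes):
    if not rows:                       # no more instances are left
        return None
    if len(set(labels)) == 1:          # all tuples belong to the same class
        return labels[0]
    if not attributes:                 # there are no attributes left
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)    # step 1: best attribute
    node = {attr: {}}                                  # step 2: decision node
    for value in set(row[attr] for row in rows):       # step 3: recurse per subset
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        node[attr][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != attr])
    return node

# Tiny invented example: the tree splits on the attribute that best
# separates the labels
rows = [{'outlook': 'sunny', 'windy': False},
        {'outlook': 'sunny', 'windy': True},
        {'outlook': 'rainy', 'windy': False}]
labels = ['yes', 'no', 'yes']
print(build_tree(rows, labels, ['outlook', 'windy']))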

Advantages of the SciKit Learn Decision Tree

The SciKit Learn Decision Tree comes with several advantages. A few of them are as follows:

  • Since trees can be visualized, they are easy to interpret and understand
  • The cost of using the tree to predict data is logarithmic in the number of data points used to train it
  • Trees can handle multi-output problems
  • It is possible to validate a model with statistical tests, which makes it possible to account for the reliability of the model
  • Decision trees need very little data preparation. Other methods require data normalization, the creation of dummy variables, and the removal of blank values. It should be noted, however, that the module does not support missing values.
  • Decision trees can handle both categorical and numerical data. Other techniques are usually specialized in analyzing datasets that contain only one type of variable.
  • Decision trees use a white box model. If a given situation is observable in the model, the explanation for the condition is easily expressed with Boolean logic. In a black box model, by contrast, the results are more complicated to interpret.
  • Decision trees perform well even if their basic assumptions are somewhat violated by the true model from which the data were generated.

Disadvantages of the SciKit Learn Decision Tree

Let us look at some of the disadvantages of the SciKit Learn Decision Tree in the section below.

  • Decision trees can produce over-complicated trees that do not generalize the data well. This problem is called overfitting. Mechanisms such as pruning, setting the minimum number of samples required at a leaf node, and setting the maximum depth of the tree are essential to avoid this problem.
  • Practical decision tree learning algorithms rely on heuristics, such as greedy algorithms in which locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree. This can be mitigated by training multiple trees in an ensemble, where the features and samples are randomly sampled with replacement (see the sketch after this list).
  • Decision tree learners create biased trees if some classes dominate the others. The dataset should therefore be balanced before fitting the decision tree.
  • Decision trees can be unstable: small variations in the data can result in a completely different tree being generated. This problem can be mitigated by using decision trees within an ensemble.
  • Concepts that are hard to learn, such as XOR, parity, or multiplexer problems, are not easily expressed by decision trees.
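As mentioned in the list above, training many randomized trees as an ensemble is a practical remedy for both the greedy-search and the instability issues. A minimal sketch, assuming the training and test sets from the earlier steps, could look like this:

from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# An ensemble of decision trees: each tree is trained on a bootstrap
# sample of the rows (sampling with replacement) and considers a random
# subset of the features at each split
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
print('Ensemble accuracy:', metrics.accuracy_score(y_test, forest.predict(X_test)))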

Thus decision trees easily capture non-linear patterns and are simple to visualize and interpret. Because of their non-parametric nature, they require no assumptions about the distribution of the data. Several different parameters can be used to regularize a tree. The real power of the decision tree unfolds when it is grown on more than one attribute.