Machine Learning (ML) Decision Trees Practice Questions


In a Decision Tree, what is the specific term for a node that does not split further and represents the final predicted class or value?


The Leaf Node (or Terminal Node) is the end point of a decision path.

  • It represents the final outcome or class label in a classification tree.
  • Unlike root or internal nodes, leaf nodes do not have any branches leading away from them.

Once a data point reaches a leaf, the prediction process for that point is complete.

Quick Recap of Machine Learning (ML) Decision Trees Concepts

If you are not clear on the concepts of Decision Trees, you can quickly review them here before practicing the exercises. This recap highlights the essential points and logic to help you solve problems confidently.

What Is a Decision Tree in Machine Learning

A Decision Tree is a supervised machine learning model that makes predictions by repeatedly splitting data into smaller and more homogeneous groups. It works like a flowchart of questions, where each question narrows down the possible answers until a final decision is reached.

The idea behind a decision tree is very intuitive. It mimics the way humans make decisions: by asking a sequence of questions and choosing paths based on the answers.

  • Each question tests a feature (for example, “Is age greater than 30?”)
  • Each branch represents an outcome of that test
  • The final node gives a prediction

Decision trees can be used for both:

  • Classification — predicting a category (e.g., spam vs not spam)
  • Regression — predicting a numeric value (e.g., house price)

What makes decision trees especially powerful is their interpretability. Unlike many black-box models, you can visually trace how a prediction was made by following the path of splits from the root to a leaf.

Because of this transparency, decision trees are widely used in business, medicine, and finance where understanding the reasoning behind a prediction is just as important as the prediction itself.
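To make this concrete, here is a minimal sketch of training and inspecting a tree with scikit-learn. The library choice, the tiny dataset, and the feature names are illustrative assumptions, not part of the exercises:

```python
# Minimal sketch: fit a decision tree on a tiny made-up dataset,
# then print its learned rules to see the interpretability in action.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: [age, income] -> buys product (1) or not (0)
X = [[22, 25000], [35, 60000], [45, 80000], [28, 30000], [52, 40000]]
y = [0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# export_text prints the splits from root to leaves as readable rules
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[30, 55000]]))  # trace the path for a new data point
```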

How a Decision Tree Makes Predictions

A decision tree makes predictions by guiding an input through a series of feature-based tests. Each test moves the data point closer to a final decision by narrowing down the possible outcomes.

A decision tree is made up of three main components:

  • Root node – the first split, made on the feature that best separates the data
  • Internal nodes – decision points based on feature conditions
  • Leaf nodes – final output (class label or numerical value)

When a new data point is passed into the tree, it starts at the root and follows a path based on its feature values.

For example, imagine a tree that predicts whether someone will buy a product:

  • Is age > 30?
  • If yes → check income
  • If no → check browsing history
  • Continue until a leaf node is reached

Each path from the root to a leaf forms a decision rule. These rules can be written in simple if–else form, making the model easy to interpret and explain.

In classification trees, the leaf node outputs the most common class among the training samples that reached that node. In regression trees, the leaf outputs the average target value of those samples.
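As a sketch of how such root-to-leaf rules read in code, here is a hand-written if–else version of the hypothetical buy-prediction tree above. The thresholds and the visits feature (standing in for "browsing history") are invented for illustration; a real tree learns them from data:

```python
def predict_buy(age: float, income: float, visits: int) -> str:
    """Hand-written decision rules mirroring the hypothetical tree above.

    Thresholds are invented for illustration; a trained tree would
    choose them by minimizing impurity at each split.
    """
    if age > 30:                 # root node: test the age feature
        if income > 50000:       # internal node on the "yes" branch
            return "buy"         # leaf node: final class label
        return "no buy"
    if visits > 5:               # internal node on the "no" branch
        return "buy"
    return "no buy"

print(predict_buy(age=42, income=60000, visits=2))  # -> "buy"
```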

Splitting Data in a Decision Tree

The core operation in a decision tree is the split. A split divides the data into smaller groups based on the value of a selected feature. The goal of every split is to make the resulting groups as pure as possible.

A split is defined by two things:

  • A feature (for example, age, income, or temperature)
  • A condition (for example, age ≤ 30 or income > 50,000)

All data points that satisfy the condition go to one branch, and the rest go to the other branch. This process is repeated recursively to form a tree structure.

For numerical features, splits usually compare values against a threshold. For categorical features, splits separate the data based on category membership.

Feature Type  | Example Split
Numerical     | Age ≤ 35 vs Age > 35
Categorical   | Color = Red vs Color ≠ Red

The tree algorithm evaluates many possible splits and chooses the one that produces the best separation of the target variable. This decision is guided by measures of impurity, which quantify how mixed the classes are in a node.
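A minimal sketch of a single threshold split, assuming NumPy is available; the ages, labels, and threshold are made up for illustration:

```python
import numpy as np

# Toy 1-D feature (age) with class labels; values are made up.
age = np.array([22, 25, 31, 35, 44, 52])
label = np.array([0, 0, 1, 1, 1, 0])

threshold = 30  # candidate split: age <= 30 vs age > 30
left_mask = age <= threshold

print("left branch labels: ", label[left_mask])   # [0 0]  -> pure node
print("right branch labels:", label[~left_mask])  # [1 1 1 0] -> still mixed
```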

Impurity and Information Gain

To decide where to split the data, a decision tree needs a way to measure how mixed the data is at a node. This is captured by a quantity called impurity.

A node is considered:

  • Pure if it contains data points from only one class
  • Impure if it contains a mix of different classes

Two commonly used impurity measures are Entropy and Gini Index.

Measure     | Formula          | Interpretation
Entropy     | − Σ pᵢ log₂(pᵢ)  | Higher value means more disorder
Gini Index  | 1 − Σ pᵢ²        | Higher value means more impurity

Here, pᵢ represents the proportion of class i in the node.

When a split is applied, impurity usually decreases. The improvement caused by a split is called Information Gain.

Information Gain is defined as:

Information Gain = Impurity(before split) − Weighted Impurity(after split)

The tree chooses the split that produces the largest information gain, meaning it makes the child nodes more pure than the parent.

This mechanism allows the tree to automatically select the most informative features at each stage.
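The formulas above translate directly into a few lines of code. The following sketch (assuming NumPy) implements entropy, the Gini index, and information gain, and checks that a perfect split of a maximally mixed two-class node yields a gain of 1.0:

```python
import numpy as np

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini index = 1 - sum(p_i^2) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right, impurity=entropy):
    """Impurity(parent) minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

parent = [0, 0, 1, 1]          # maximally mixed node: entropy = 1.0
left, right = [0, 0], [1, 1]   # perfect split: both children are pure
print(information_gain(parent, left, right))  # -> 1.0
```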

Tree Depth and Model Complexity

The depth of a decision tree refers to the number of levels from the root node down to the deepest leaf. Tree depth plays a major role in determining how complex and flexible the model is.

A shallow tree has only a few splits, which means it makes broad, simple decisions. A deep tree has many splits, allowing it to capture very detailed patterns in the data.

Tree Type     | Characteristics                 | Effect
Shallow Tree  | Few levels, simple structure    | May underfit the data
Deep Tree     | Many levels, complex structure  | May overfit the data

As depth increases, the tree becomes better at fitting the training data. However, it also becomes more sensitive to noise, which can reduce performance on new data.

This reflects the classic bias–variance tradeoff: shallow trees have high bias but low variance, while deep trees have low bias but high variance.

Controlling tree depth is one of the most important ways to manage the balance between learning meaningful patterns and avoiding memorization.
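A quick way to see this tradeoff is to compare training and test accuracy at different depths. The sketch below uses scikit-learn's make_classification as a stand-in dataset, an illustrative assumption rather than data from these exercises:

```python
# Sketch of the depth/complexity tradeoff: compare train vs. test
# accuracy for a shallow and a deep tree on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (2, 20):  # shallow vs. deep
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth:2d}  train={tree.score(X_train, y_train):.2f}"
          f"  test={tree.score(X_test, y_test):.2f}")

# Typically the deep tree scores near 1.00 on the training split but no
# better (often worse) on the test split -- the overfitting pattern above.
```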

Overfitting and Pruning

Decision trees are very powerful learners, but this power comes with a risk: overfitting. A tree can keep splitting until it perfectly memorizes the training data, including noise and random fluctuations.

An overfitted tree performs extremely well on training data but poorly on unseen data. This happens because the tree has learned very specific rules that do not generalize.

To control this, decision trees use a technique called pruning.

  • Pre-pruning stops the tree from growing too deep
  • Post-pruning removes branches after the tree is fully grown

Common pruning controls include:

  • Maximum tree depth
  • Minimum samples required to split a node
  • Minimum samples required in a leaf
  • Minimum impurity decrease

By pruning, we remove branches that do not significantly improve prediction accuracy, making the model simpler, more robust, and better at generalizing to new data.
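For reference, the pruning controls listed above correspond directly to scikit-learn hyperparameters. In the sketch below the parameter names are the real scikit-learn ones, but the specific values are arbitrary illustrations:

```python
from sklearn.tree import DecisionTreeClassifier

pruned_tree = DecisionTreeClassifier(
    max_depth=4,                 # maximum tree depth (pre-pruning)
    min_samples_split=20,        # minimum samples required to split a node
    min_samples_leaf=10,         # minimum samples required in a leaf
    min_impurity_decrease=0.01,  # minimum impurity decrease per split
    ccp_alpha=0.005,             # cost-complexity (post-)pruning strength
    random_state=0,
)

# Fit on toy data (values are arbitrary) and inspect the resulting depth.
X = [[0, 0], [1, 1], [0, 1], [1, 0]] * 10
y = [0, 1, 1, 0] * 10
pruned_tree.fit(X, y)
print(pruned_tree.get_depth())  # depth after the constraints are applied
```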

Strengths and Weaknesses of Decision Trees

Decision trees are one of the most popular machine learning models because they are both powerful and easy to understand. However, like any model, they come with advantages and limitations.

Strengths                                       | Weaknesses
Easy to interpret and visualize                 | Can easily overfit the data
Works with both numerical and categorical data  | Sensitive to small changes in the data
No need for feature scaling                     | Greedy splitting may not find the optimal tree
Handles nonlinear relationships                 | Lower accuracy than ensemble methods

One major benefit of decision trees is their transparency. Every prediction can be traced back through a series of logical rules, making the model highly explainable.

On the downside, a single decision tree is often not robust enough for complex datasets, which is why ensemble methods such as Random Forests and Gradient Boosting are widely used.

Where Decision Trees Are Used

Decision trees are widely used in many industries because they produce clear, rule-based decisions that humans can easily understand and trust.

They are especially valuable when both accuracy and explainability are required.

Domain           | How Decision Trees Are Applied
Finance          | Credit scoring, loan approval, fraud detection
Healthcare       | Disease diagnosis, treatment recommendation
Marketing        | Customer segmentation, churn prediction
Manufacturing    | Quality control, fault detection
Human Resources  | Employee performance and hiring decisions

Because decision trees can be translated into if–else rules, they are often integrated into decision-support systems where business users need to understand and justify model outputs.

They also serve as the building blocks for more advanced models like Random Forests and Gradient Boosted Trees.

Summary of Decision Trees

Decision trees are supervised learning models that make predictions by splitting data based on feature values. They can be used for both classification and regression tasks and are highly interpretable because each prediction follows a clear path of decisions from root to leaf.

  • Tree nodes represent tests on features, and leaves represent outcomes.
  • Splits are chosen to reduce impurity, measured by metrics like Gini Index or Entropy.
  • Tree depth controls model complexity and affects bias–variance tradeoff.
  • Pruning is used to prevent overfitting and improve generalization.
  • Decision trees are easy to interpret, handle both categorical and numerical data, and do not require feature scaling.
  • Limitations include sensitivity to noise, instability, and lower accuracy compared to ensembles.
  • Decision trees are applied in finance, healthcare, marketing, manufacturing, and HR.
  • They form the foundation for ensemble methods like Random Forests and Gradient Boosting.

Key Takeaways

  • Decision trees are intuitive and interpretable models.
  • Splitting criteria like Gini and Entropy help in choosing the best features for splits.
  • Control depth and pruning to balance bias and variance.
  • Single trees can overfit, but they are excellent for rule-based decision systems.
  • Understanding decision trees is essential before moving to ensemble methods.


About This Exercise: Decision Trees

Decision Trees are one of the most widely used machine learning algorithms for classification and regression. They work by splitting data into branches based on feature values, making decisions in a tree-like structure that is easy to understand and interpret.

In these Solviyo Decision Tree exercises and MCQs, you will practice how decision trees select features, create splits, and make predictions from structured data.

What You Will Learn

This topic focuses on the core ideas that make decision trees powerful and popular in data science and artificial intelligence.

  • How decision trees split data using features
  • The role of entropy, information gain, and Gini index
  • How tree depth and branches affect predictions
  • How overfitting occurs in decision trees

Core Decision Tree Concepts Covered

The MCQs in this section are designed around the most important decision tree learning concepts.

  • Feature selection and splitting criteria
  • Entropy, Gini impurity, and information gain
  • Tree depth, nodes, and leaf predictions
  • Handling categorical and numerical features

Why Decision Trees Matter in Machine Learning

Decision Trees are used in many real-world applications such as medical diagnosis, credit scoring, customer segmentation, and fraud detection. Their transparent structure makes them easy to explain and trust, which is important in business and AI systems.

By practicing these Decision Tree MCQs, learners develop a strong understanding of how machines make rule-based decisions from data.

Who Should Practice These Exercises

This topic is suitable for anyone learning or working with machine learning models.

  • Students studying data science and artificial intelligence
  • Beginners learning classification algorithms
  • Professionals preparing for ML interviews
  • Anyone working with predictive data models

How These MCQs Improve Your Skills

Solviyo’s Decision Tree exercises test both theoretical understanding and practical reasoning. You will learn how trees choose splits, reduce impurity, and make accurate predictions.

These exercises help you build a strong foundation for advanced models like Random Forests and Gradient Boosting while strengthening your overall machine learning knowledge.