Machine Learning (ML) Decision Trees Practice Questions


In a Decision Tree, what is the specific term for a node that does not split further and represents the final predicted class or value?


The Leaf Node (or Terminal Node) is the end point of a decision path.

  • It represents the final outcome or class label in a classification tree.
  • Unlike root or internal nodes, leaf nodes do not have any branches leading away from them.

Once a data point reaches a leaf, the prediction process for that point is complete.

Quick Recap of Machine Learning (ML) Decision Trees Concepts

If you are not clear on the concepts of Decision Trees, you can quickly review them here before practicing the exercises. This recap highlights the essential points and logic to help you solve problems confidently.

What Is a Decision Tree in Machine Learning

A Decision Tree is a supervised machine learning model that makes predictions by repeatedly splitting data into smaller and more homogeneous groups. It works like a flowchart of questions, where each question narrows down the possible answers until a final decision is reached.

The idea behind a decision tree is very intuitive. It mimics the way humans make decisions: by asking a sequence of questions and choosing paths based on the answers.

  • Each question tests a feature (for example, “Is age greater than 30?”)
  • Each branch represents an outcome of that test
  • The final node gives a prediction

Decision trees can be used for both:

  • Classification — predicting a category (e.g., spam vs not spam)
  • Regression — predicting a numeric value (e.g., house price)

What makes decision trees especially powerful is their interpretability. Unlike many black-box models, you can visually trace how a prediction was made by following the path of splits from the root to a leaf.

Because of this transparency, decision trees are widely used in business, medicine, and finance where understanding the reasoning behind a prediction is just as important as the prediction itself.
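To make this concrete, here is a minimal sketch of training and inspecting a tree with scikit-learn. The library choice, the tiny dataset, and the feature names are illustrative assumptions, not part of the exercises:

```python
# Minimal sketch: fit a decision tree on a tiny made-up dataset,
# then print its learned rules to see the interpretability in action.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: [age, income] -> buys product (1) or not (0)
X = [[22, 25000], [35, 60000], [45, 80000], [28, 30000], [52, 40000]]
y = [0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# export_text prints the splits from root to leaves as readable rules
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[30, 55000]]))  # trace the path for a new data point
```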

How a Decision Tree Makes Predictions

A decision tree makes predictions by guiding an input through a series of feature-based tests. Each test moves the data point closer to a final decision by narrowing down the possible outcomes.

A decision tree is made up of three main components:

  • Root node – the first split, made on the feature that best separates the data
  • Internal nodes – decision points based on feature conditions
  • Leaf nodes – final output (class label or numerical value)

When a new data point is passed into the tree, it starts at the root and follows a path based on its feature values.

For example, imagine a tree that predicts whether someone will buy a product:

  • Is age > 30?
  • If yes → check income
  • If no → check browsing history
  • Continue until a leaf node is reached

Each path from the root to a leaf forms a decision rule. These rules can be written in simple if–else form, making the model easy to interpret and explain.

In classification trees, the leaf node outputs the most common class among the training samples that reached that node. In regression trees, the leaf outputs the average target value of those samples.
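As a sketch of how such root-to-leaf rules read in code, here is a hand-written if–else version of the hypothetical buy-prediction tree above. The thresholds and the visits feature (standing in for "browsing history") are invented for illustration; a real tree learns them from data:

```python
def predict_buy(age: float, income: float, visits: int) -> str:
    """Hand-written decision rules mirroring the hypothetical tree above.

    Thresholds are invented for illustration; a trained tree would
    choose them by minimizing impurity at each split.
    """
    if age > 30:                 # root node: test the age feature
        if income > 50000:       # internal node on the "yes" branch
            return "buy"         # leaf node: final class label
        return "no buy"
    if visits > 5:               # internal node on the "no" branch
        return "buy"
    return "no buy"

print(predict_buy(age=42, income=60000, visits=2))  # -> "buy"
```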

Splitting Data in a Decision Tree

The core operation in a decision tree is the split. A split divides the data into smaller groups based on the value of a selected feature. The goal of every split is to make the resulting groups as pure as possible.

A split is defined by two things:

  • A feature (for example, age, income, or temperature)
  • A condition (for example, age ≤ 30 or income > 50,000)

All data points that satisfy the condition go to one branch, and the rest go to the other branch. This process is repeated recursively to form a tree structure.

For numerical features, splits usually compare values against a threshold. For categorical features, splits separate the data based on category membership.

Feature Type  | Example Split
Numerical     | Age ≤ 35 vs Age > 35
Categorical   | Color = Red vs Color ≠ Red

The tree algorithm evaluates many possible splits and chooses the one that produces the best separation of the target variable. This decision is guided by measures of impurity, which quantify how mixed the classes are in a node.
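A minimal sketch of a single threshold split, assuming NumPy is available; the ages, labels, and threshold are made up for illustration:

```python
import numpy as np

# Toy 1-D feature (age) with class labels; values are made up.
age = np.array([22, 25, 31, 35, 44, 52])
label = np.array([0, 0, 1, 1, 1, 0])

threshold = 30  # candidate split: age <= 30 vs age > 30
left_mask = age <= threshold

print("left branch labels: ", label[left_mask])   # [0 0]  -> pure node
print("right branch labels:", label[~left_mask])  # [1 1 1 0] -> still mixed
```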

Impurity and Information Gain

To decide where to split the data, a decision tree needs a way to measure how mixed the data is at a node. This is captured by a quantity called impurity.

A node is considered:

  • Pure if it contains data points from only one class
  • Impure if it contains a mix of different classes

Two commonly used impurity measures are Entropy and Gini Index.

Measure     | Formula          | Interpretation
Entropy     | − Σ pᵢ log₂(pᵢ)  | Higher value means more disorder
Gini Index  | 1 − Σ pᵢ²        | Higher value means more impurity

Here, pᵢ represents the proportion of class i in the node.

When a split is applied, impurity usually decreases. The improvement caused by a split is called Information Gain.

Information Gain is defined as:

Information Gain = Impurity(before split) − Weighted Impurity(after split)

The tree chooses the split that produces the largest information gain, meaning it makes the child nodes more pure than the parent.

This mechanism allows the tree to automatically select the most informative features at each stage.
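The formulas above translate directly into a few lines of code. The following sketch (assuming NumPy) implements entropy, the Gini index, and information gain, and checks that a perfect split of a maximally mixed two-class node yields a gain of 1.0:

```python
import numpy as np

def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini index = 1 - sum(p_i^2) over class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, left, right, impurity=entropy):
    """Impurity(parent) minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

parent = [0, 0, 1, 1]          # maximally mixed node: entropy = 1.0
left, right = [0, 0], [1, 1]   # perfect split: both children are pure
print(information_gain(parent, left, right))  # -> 1.0
```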

Tree Depth and Model Complexity

The depth of a decision tree refers to the number of levels from the root node down to the deepest leaf. Tree depth plays a major role in determining how complex and flexible the model is.

A shallow tree has only a few splits, which means it makes broad, simple decisions. A deep tree has many splits, allowing it to capture very detailed patterns in the data.

Tree Type     | Characteristics                 | Effect
Shallow Tree  | Few levels, simple structure    | May underfit the data
Deep Tree     | Many levels, complex structure  | May overfit the data

As depth increases, the tree becomes better at fitting the training data. However, it also becomes more sensitive to noise, which can reduce performance on new data.

This reflects the classic bias–variance tradeoff: shallow trees have high bias but low variance, while deep trees have low bias but high variance.

Controlling tree depth is one of the most important ways to manage the balance between learning meaningful patterns and avoiding memorization.
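A quick way to see this tradeoff is to compare training and test accuracy at different depths. The sketch below uses scikit-learn's make_classification as a stand-in dataset, an illustrative assumption rather than data from these exercises:

```python
# Sketch of the depth/complexity tradeoff: compare train vs. test
# accuracy for a shallow and a deep tree on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (2, 20):  # shallow vs. deep
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth:2d}  train={tree.score(X_train, y_train):.2f}"
          f"  test={tree.score(X_test, y_test):.2f}")

# Typically the deep tree scores near 1.00 on the training split but no
# better (often worse) on the test split -- the overfitting pattern above.
```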

Overfitting and Pruning

Decision trees are very powerful learners, but this power comes with a risk: overfitting. A tree can keep splitting until it perfectly memorizes the training data, including noise and random fluctuations.

An overfitted tree performs extremely well on training data but poorly on unseen data. This happens because the tree has learned very specific rules that do not generalize.

To control this, decision trees use a technique called pruning.

  • Pre-pruning stops the tree from growing too deep
  • Post-pruning removes branches after the tree is fully grown

Common pruning controls include:

  • Maximum tree depth
  • Minimum samples required to split a node
  • Minimum samples required in a leaf
  • Minimum impurity decrease

By pruning, we remove branches that do not significantly improve prediction accuracy, making the model simpler, more robust, and better at generalizing to new data.
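For reference, the pruning controls listed above correspond directly to scikit-learn hyperparameters. In the sketch below the parameter names are the real scikit-learn ones, but the specific values are arbitrary illustrations:

```python
from sklearn.tree import DecisionTreeClassifier

pruned_tree = DecisionTreeClassifier(
    max_depth=4,                 # maximum tree depth (pre-pruning)
    min_samples_split=20,        # minimum samples required to split a node
    min_samples_leaf=10,         # minimum samples required in a leaf
    min_impurity_decrease=0.01,  # minimum impurity decrease per split
    ccp_alpha=0.005,             # cost-complexity (post-)pruning strength
    random_state=0,
)

# Fit on toy data (values are arbitrary) and inspect the resulting depth.
X = [[0, 0], [1, 1], [0, 1], [1, 0]] * 10
y = [0, 1, 1, 0] * 10
pruned_tree.fit(X, y)
print(pruned_tree.get_depth())  # depth after the constraints are applied
```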

Strengths and Weaknesses of Decision Trees

Decision trees are one of the most popular machine learning models because they are both powerful and easy to understand. However, like any model, they come with advantages and limitations.

Strengths                                       | Weaknesses
Easy to interpret and visualize                 | Can easily overfit the data
Works with both numerical and categorical data  | Sensitive to small changes in the data
No need for feature scaling                     | Greedy splitting may not find the optimal tree
Handles nonlinear relationships                 | Lower accuracy than ensemble methods

One major benefit of decision trees is their transparency. Every prediction can be traced back through a series of logical rules, making the model highly explainable.

On the downside, a single decision tree is often not robust enough for complex datasets, which is why ensemble methods such as Random Forests and Gradient Boosting are widely used.

Where Decision Trees Are Used

Decision trees are widely used in many industries because they produce clear, rule-based decisions that humans can easily understand and trust.

They are especially valuable when both accuracy and explainability are required.

Domain           | How Decision Trees Are Applied
Finance          | Credit scoring, loan approval, fraud detection
Healthcare       | Disease diagnosis, treatment recommendation
Marketing        | Customer segmentation, churn prediction
Manufacturing    | Quality control, fault detection
Human Resources  | Employee performance and hiring decisions

Because decision trees can be translated into if–else rules, they are often integrated into decision-support systems where business users need to understand and justify model outputs.

They also serve as the building blocks for more advanced models like Random Forests and Gradient Boosted Trees.

Summary of Decision Trees

Decision trees are supervised learning models that make predictions by splitting data based on feature values. They can be used for both classification and regression tasks and are highly interpretable because each prediction follows a clear path of decisions from root to leaf.

  • Tree nodes represent tests on features, and leaves represent outcomes.
  • Splits are chosen to reduce impurity, measured by metrics like Gini Index or Entropy.
  • Tree depth controls model complexity and affects bias–variance tradeoff.
  • Pruning is used to prevent overfitting and improve generalization.
  • Decision trees are easy to interpret, handle both categorical and numerical data, and do not require feature scaling.
  • Limitations include sensitivity to noise, instability, and lower accuracy compared to ensembles.
  • Decision trees are applied in finance, healthcare, marketing, manufacturing, and HR.
  • They form the foundation for ensemble methods like Random Forests and Gradient Boosting.

Key Takeaways

  • Decision trees are intuitive and interpretable models.
  • Splitting criteria like Gini and Entropy help in choosing the best features for splits.
  • Control depth and pruning to balance bias and variance.
  • Single trees can overfit, but they are excellent for rule-based decision systems.
  • Understanding decision trees is essential before moving to ensemble methods.


About This Exercise: Decision Trees

Decision Trees are one of the most widely used machine learning algorithms for classification and regression. They work by splitting data into branches based on feature values, making decisions in a tree-like structure that is easy to understand and interpret.

In these Solviyo Decision Tree exercises and MCQs, you will practice how decision trees select features, create splits, and make predictions from structured data.

What You Will Learn

This topic focuses on the core ideas that make decision trees powerful and popular in data science and artificial intelligence.

  • How decision trees split data using features
  • The role of entropy, information gain, and Gini index
  • How tree depth and branches affect predictions
  • How overfitting occurs in decision trees

Core Decision Tree Concepts Covered

The MCQs in this section are designed around the most important decision tree learning concepts.

  • Feature selection and splitting criteria
  • Entropy, Gini impurity, and information gain
  • Tree depth, nodes, and leaf predictions
  • Handling categorical and numerical features

Why Decision Trees Matter in Machine Learning

Decision Trees are used in many real-world applications such as medical diagnosis, credit scoring, customer segmentation, and fraud detection. Their transparent structure makes them easy to explain and trust, which is important in business and AI systems.

By practicing these Decision Tree MCQs, learners develop a strong understanding of how machines make rule-based decisions from data.

Who Should Practice These Exercises

This topic is suitable for anyone learning or working with machine learning models.

  • Students studying data science and artificial intelligence
  • Beginners learning classification algorithms
  • Professionals preparing for ML interviews
  • Anyone working with predictive data models

How These MCQs Improve Your Skills

Solviyo’s Decision Tree exercises test both theoretical understanding and practical reasoning. You will learn how trees choose splits, reduce impurity, and make accurate predictions.

These exercises help you build a strong foundation for advanced models like Random Forests and Gradient Boosting while strengthening your overall machine learning knowledge.