Q. In a Decision Tree, what does the Gini impurity measure?
A. The accuracy of the model.
B. The likelihood of misclassifying a randomly chosen element.
C. The depth of the tree.
D. The number of features used.
Solution: Gini impurity measures the likelihood of misclassifying a randomly chosen element if it were labeled according to the class distribution at a node; lower impurity therefore indicates a better split.
Correct Answer: B — The likelihood of misclassifying a randomly chosen element.
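To make this concrete, here is a minimal sketch of the Gini calculation; it assumes class labels in a NumPy array, and the function name is illustrative rather than from any library.

import numpy as np

def gini_impurity(labels):
    # Probability of misclassifying a random element if it were labeled
    # according to the class distribution: 1 - sum(p_i^2).
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

print(gini_impurity(np.array([1, 1, 1, 1])))  # 0.0 (pure node)
print(gini_impurity(np.array([0, 0, 1, 1])))  # 0.5 (maximally mixed, two classes)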
Q. In Random Forests, how are the individual trees trained?
A. On the entire dataset without any modifications.
B. Using a bootstrapped sample of the dataset.
C. On a subset of features only.
D. Using the same random seed for all trees.
Solution: Each tree in a Random Forest is trained on a bootstrapped sample of the dataset, drawn with replacement, which creates diversity among the trees.
Correct Answer: B — Using a bootstrapped sample of the dataset.
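As a sketch of what a bootstrapped sample looks like (toy data, NumPy only):

import numpy as np

rng = np.random.default_rng(seed=0)
X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features
y = np.array([0, 1] * 5)

# Sample n row indices *with replacement*: some rows repeat,
# others are left out entirely (the "out-of-bag" rows).
indices = rng.integers(0, len(X), size=len(X))
X_boot, y_boot = X[indices], y[indices]
print(sorted(indices))  # duplicates show sampling with replacement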
Q. In Random Forests, what does 'bagging' refer to?
A. Using all available features for each tree.
B. Randomly selecting subsets of data to train each tree.
C. Combining predictions from multiple models.
D. Pruning trees to improve performance.
Solution: Bagging (bootstrap aggregating) trains each tree on a randomly drawn subset of the data and aggregates their predictions, which reduces variance and improves model robustness.
Correct Answer: B — Randomly selecting subsets of data to train each tree.
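A minimal scikit-learn sketch of bagging, assuming the default decision-tree base estimator:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Each of the 10 base trees is fit on a bootstrapped subset of X,
# and their predictions are aggregated by voting.
bag = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
bag.fit(X, y)
print(bag.score(X, y))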
Q. In the context of Decision Trees, what does 'feature importance' refer to?
A. The number of times a feature is used in the tree.
B. The contribution of a feature to the model's predictions.
C. The correlation of a feature with the target variable.
D. The depth of a feature in the tree.
Solution: Feature importance refers to the contribution of a feature to the model's predictions, indicating how much it helps in making accurate decisions.
Correct Answer: B — The contribution of a feature to the model's predictions.
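In scikit-learn this is exposed as feature_importances_; a minimal sketch on the Iris data:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Importances sum to 1; larger values mean the feature contributed
# more to the tree's splits (impurity reduction, in scikit-learn's case).
for name, imp in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")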
Q. What is a potential drawback of using a very deep Decision Tree?
A. It may not capture complex patterns.
B. It can lead to overfitting.
C. It requires more computational resources.
D. It is less interpretable.
Solution: A very deep Decision Tree can lead to overfitting, where the model learns noise in the training data rather than generalizable patterns.
Correct Answer: B — It can lead to overfitting.
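A minimal sketch of the effect, using noisy synthetic data; the unrestricted tree typically fits the training set almost perfectly but scores worse on held-out data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (3, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # Train vs. test accuracy: a large gap signals overfitting.
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_te, y_te), 3))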
Q. What is the effect of increasing the number of trees in a Random Forest?
A. It always increases the training time.
B. It can improve model accuracy but may lead to diminishing returns.
C. It decreases the model's interpretability.
D. It reduces the model's variance but increases bias.
Solution: Increasing the number of trees can improve model accuracy due to better averaging, but there are diminishing returns after a certain point.
Correct Answer: B — It can improve model accuracy but may lead to diminishing returns.
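A sketch of the diminishing-returns pattern; exact numbers depend on the data, but cross-validated accuracy typically plateaus while fit time keeps growing:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
for n in (1, 10, 100, 500):
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    # Accuracy usually jumps from 1 to 10 trees, then levels off.
    print(n, cross_val_score(rf, X, y, cv=5).mean())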
Q. What is the primary advantage of using Random Forests over a single Decision Tree?
A. Random Forests are easier to interpret.
B. Random Forests reduce overfitting by averaging multiple trees.
C. Random Forests require less computational power.
D. Random Forests can only handle categorical data.
Solution: Random Forests reduce overfitting by averaging the predictions of multiple decision trees, leading to better generalization on unseen data.
Correct Answer: B — Random Forests reduce overfitting by averaging multiple trees.
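A minimal comparison on noisy synthetic data; the forest usually generalizes better under cross-validation:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("forest:", cross_val_score(forest, X, y, cv=5).mean())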
Q. What is the primary purpose of using ensemble methods like Random Forests?
A. To simplify the model.
B. To improve prediction accuracy by combining multiple models.
C. To reduce the training time.
D. To increase interpretability.
Solution: Ensemble methods like Random Forests improve prediction accuracy by combining the outputs of multiple models, thus leveraging their strengths.
Correct Answer: B — To improve prediction accuracy by combining multiple models.
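The core combining step in a few lines, using hypothetical binary predictions from three models:

import numpy as np

# Hypothetical predictions from three models for five samples.
preds = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
])

# Majority vote: a sample is classified 1 if more than half the models say 1.
majority = (preds.sum(axis=0) > preds.shape[0] / 2).astype(int)
print(majority)  # [0 1 1 0 1]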
Q. What is the purpose of the 'min_samples_split' parameter in a Decision Tree?
A. To control the minimum number of samples required to split an internal node.
B. To set the maximum depth of the tree.
C. To determine the minimum number of samples in a leaf node.
D. To specify the maximum number of features to consider.
Solution: 'min_samples_split' controls the minimum number of samples required to split an internal node, helping to prevent overfitting.
Correct Answer: A — To control the minimum number of samples required to split an internal node.
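A minimal sketch: raising min_samples_split yields a smaller tree, since nodes with fewer samples become leaves instead of splitting further:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
loose = DecisionTreeClassifier(min_samples_split=2, random_state=0).fit(X, y)
strict = DecisionTreeClassifier(min_samples_split=40, random_state=0).fit(X, y)

# The stricter threshold produces fewer nodes (less capacity to overfit).
print(loose.tree_.node_count, strict.tree_.node_count)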
Q. What is the role of the 'max_features' parameter in a Random Forest model?
A. It determines the maximum number of trees in the forest.
B. It specifies the maximum number of features to consider when looking for the best split.
C. It sets the maximum depth of each tree.
D. It controls the minimum number of samples required to split an internal node.
Solution: 'max_features' specifies the maximum number of features to consider when looking for the best split, which helps to introduce randomness and reduce correlation among trees.
Correct Answer: B — It specifies the maximum number of features to consider when looking for the best split.
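A minimal sketch; "sqrt" is a common setting that considers only about sqrt(n_features) candidates per split:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Each split in each tree examines a random subset of ~sqrt(20) ≈ 4 features,
# which decorrelates the trees in the forest.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print(rf.score(X, y))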
Q. Which evaluation metric is most appropriate for assessing the performance of a Decision Tree on an imbalanced dataset?
A. Accuracy
B. F1 Score
C. Mean Squared Error
D. R-squared
Solution: The F1 Score is more appropriate for imbalanced datasets because it balances precision and recall, unlike accuracy, which can be inflated by simply predicting the majority class.
Correct Answer: B — F1 Score
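A minimal illustration of why accuracy misleads here: a degenerate model that always predicts the majority class gets 95% accuracy but an F1 score of 0:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 95 + [1] * 5)  # 95% majority class
y_pred = np.zeros(100, dtype=int)      # always predict the majority

print("accuracy:", accuracy_score(y_true, y_pred))       # 0.95
print("f1:", f1_score(y_true, y_pred, zero_division=0))  # 0.0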
Q. Which of the following is a common method for preventing overfitting in Decision Trees?
A. Increasing the maximum depth of the tree.
B. Pruning the tree after it has been fully grown.
C. Using more features.
D. Decreasing the number of samples.
Solution: Pruning the tree after it has been fully grown helps to remove branches that have little importance, thus preventing overfitting.
Correct Answer: B — Pruning the tree after it has been fully grown.
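In scikit-learn, one form of post-hoc pruning is cost-complexity pruning via ccp_alpha; a minimal sketch:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, flip_y=0.2, random_state=0)
full = DecisionTreeClassifier(random_state=0).fit(X, y)
# Larger ccp_alpha prunes branches whose impurity reduction
# does not justify their added complexity.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)
print(full.tree_.node_count, pruned.tree_.node_count)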
Q. Which of the following statements about Decision Trees is true?
A. They can only be used for classification tasks.
B. They are sensitive to small changes in the data.
C. They require feature scaling.
D. They cannot handle missing values.
Solution: Decision Trees are sensitive to small changes in the data, which can lead to different splits and thus different models.
Correct Answer: B — They are sensitive to small changes in the data.
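A quick sketch of this instability: dropping a handful of rows can change the root split the tree chooses (whether it actually does depends on the data):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X[5:], y[5:])  # drop 5 rows

# Compare the feature index and threshold used at the root node of each tree.
print(tree_a.tree_.feature[0], tree_a.tree_.threshold[0])
print(tree_b.tree_.feature[0], tree_b.tree_.threshold[0])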