Decision Trees and Random Forests - Advanced Concepts

Q. How does Random Forest handle missing values in the dataset?
  • A. It ignores missing values completely
  • B. It uses mean imputation for missing values
  • C. It can use surrogate splits to handle missing values
  • D. It requires complete data without any missing values
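For the missing-values question above: classic CART trees can fall back on surrogate splits, but scikit-learn's RandomForestClassifier does not expose them, so a common workaround is to impute inside a pipeline. The sketch below is one minimal, non-authoritative example, using scikit-learn's bundled breast-cancer data as a stand-in and artificially introducing missing entries.

```python
# A minimal sketch of handling missing values before a Random Forest.
# Surrogate splits are a CART-style feature (e.g. R's rpart); this pipeline
# instead imputes means, fit on the training split only to avoid leakage.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)

# Artificially punch holes in the data to simulate missing values.
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```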
Q. In a Decision Tree, what does the term 'Gini impurity' refer to?
  • A. A measure of the tree's depth
  • B. A metric for evaluating model performance
  • C. A criterion for splitting nodes
  • D. A method for pruning trees
Q. In Decision Trees, what does the Gini impurity measure?
  • A. The accuracy of the model
  • B. The purity of a node
  • C. The depth of the tree
  • D. The number of features used
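For the two Gini-impurity questions above, a small worked sketch helps: for a node whose samples have class proportions p_k, the Gini impurity is 1 - sum(p_k^2), so 0 means a perfectly pure node and the tree picks splits that reduce it.

```python
# A small sketch of Gini impurity for a node, assuming labels come as a
# 1-D array: Gini = 1 - sum_k p_k^2, where p_k is the class-k fraction.
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))   # 0.0   -> pure node
print(gini_impurity([0, 0, 1, 1]))   # 0.5   -> maximally impure for 2 classes
print(gini_impurity([0, 0, 0, 1]))   # 0.375
```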
Q. In Random Forests, what does the term 'out-of-bag error' refer to?
  • A. Error on the training set
  • B. Error on unseen data
  • C. Error calculated from the samples not used in training a tree
  • D. Error from the final ensemble model
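For the out-of-bag question above, here is a minimal sketch, assuming scikit-learn's oob_score=True option and its bundled breast-cancer data: each tree is scored on the bootstrap rows it never saw, and the aggregated result approximates test accuracy without a separate validation set.

```python
# A minimal sketch of the out-of-bag (OOB) estimate in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                bootstrap=True, random_state=0)
forest.fit(X, y)

print("OOB score:", forest.oob_score_)      # accuracy on out-of-bag samples
print("OOB error:", 1 - forest.oob_score_)
```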
Q. In the context of Decision Trees, what does 'pruning' refer to?
  • A. Adding more branches to the tree
  • B. Removing branches to reduce complexity
  • C. Increasing the depth of the tree
  • D. Changing the splitting criteria
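For the pruning question above, one concrete illustration (a sketch, not the only pruning method) is scikit-learn's cost-complexity pruning: larger ccp_alpha values remove more branches, trading training fit for a simpler tree.

```python
# A sketch of post-pruning via cost-complexity pruning (ccp_alpha):
# bigger alpha -> fewer leaves -> lower complexity.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.0, 0.01, 0.05):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    tree.fit(X_train, y_train)
    print(f"alpha={alpha:<5} leaves={tree.get_n_leaves():<4} "
          f"test acc={tree.score(X_test, y_test):.3f}")
```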
Q. What does the term 'feature importance' refer to in the context of Random Forests?
  • A. The number of features used in the model
  • B. The contribution of each feature to the model's predictions
  • C. The correlation between features
  • D. The total number of trees in the forest
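For the feature-importance question above, a brief sketch using scikit-learn's impurity-based importances: feature_importances_ sums each feature's contribution to impurity reduction across all trees and normalizes the result to 1.

```python
# A minimal sketch of impurity-based feature importances in a Random Forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:          # five most influential features
    print(f"{name:<25} {score:.3f}")
```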
Q. What is a common method for feature importance evaluation in Random Forests?
  • A. Permutation importance
  • B. Gradient boosting
  • C. K-fold cross-validation
  • D. Principal component analysis
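For the permutation-importance question above, a sketch assuming sklearn.inspection is available: each feature is shuffled on held-out data, and the resulting drop in score measures how much the fitted model relied on that feature.

```python
# A sketch of permutation importance on a held-out split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

result = permutation_importance(forest, X_test, y_test,
                                n_repeats=10, random_state=0)
print(result.importances_mean[:5])   # mean score drop per shuffled feature
```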
Q. What is a common use case for Random Forests in real-world applications?
  • A. Image recognition
  • B. Natural language processing
  • C. Credit scoring
  • D. Time series forecasting
Q. What is a primary advantage of using Random Forests over a single Decision Tree?
  • A. Lower computational cost
  • B. Higher accuracy due to ensemble learning
  • C. Easier to interpret
  • D. Requires less data
Q. What is the main disadvantage of using a Decision Tree?
  • A. High bias
  • B. High variance
  • C. Requires a lot of data
  • D. Difficult to interpret
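The two questions above pair naturally: a single unpruned tree is a low-bias, high-variance learner, and averaging many bootstrapped trees is what usually buys the forest its extra accuracy. The comparison below is a rough sketch; exact numbers depend on the dataset and seed.

```python
# A rough sketch comparing one deep tree with a bagged forest of trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```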
Q. What is the main purpose of using cross-validation when training a Decision Tree?
  • A. To increase the size of the training set
  • B. To tune hyperparameters
  • C. To assess the model's generalization ability
  • D. To visualize the tree structure
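For the cross-validation question above, a sketch assuming GridSearchCV: cross-validation both scores each hyperparameter setting on held-out folds and gives an honest estimate of how the chosen tree will generalize.

```python
# A sketch of cross-validated hyperparameter tuning for a Decision Tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 10, None],
                "min_samples_split": [2, 10, 50]},
    cv=5,                              # 5-fold cross-validation
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("cross-validated accuracy:", grid.best_score_)
```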
Q. What is the purpose of the 'bootstrap' sampling method in Random Forests?
  • A. To create a balanced dataset
  • B. To ensure all features are used
  • C. To generate multiple subsets of the training data
  • D. To improve model interpretability
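For the bootstrap question above, a minimal NumPy sketch of what happens under the hood: each tree is trained on rows drawn with replacement, so some rows repeat and roughly a third are left out as that tree's out-of-bag rows.

```python
# A minimal sketch of bootstrap sampling: draw n row indices with replacement.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10
indices = rng.integers(0, n_samples, size=n_samples)   # sample with replacement

print("bootstrap indices:", indices)
print("out-of-bag rows  :", sorted(set(range(n_samples)) - set(indices)))
```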
Q. What is the purpose of the 'n_estimators' parameter in a Random Forest model?
  • A. To define the maximum depth of each tree
  • B. To specify the number of trees in the forest
  • C. To set the minimum samples required to split a node
  • D. To determine the number of features to consider at each split
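For the n_estimators question above, a short sketch: the parameter only sets how many trees are grown and averaged, so more trees give a more stable estimate at a higher training cost, with diminishing returns.

```python
# A sketch of how n_estimators (number of trees) affects a Random Forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for n in (10, 100, 500):
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    print(n, cross_val_score(forest, X, y, cv=5).mean())
```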
Q. What is the role of 'bootstrap sampling' in Random Forests?
  • A. To select features for each tree
  • B. To create multiple subsets of the training data
  • C. To evaluate model performance
  • D. To increase the depth of trees
Q. What is the role of 'max_features' in Random Forests?
  • A. To limit the number of trees in the forest
  • B. To control the maximum depth of each tree
  • C. To specify the maximum number of features to consider when looking for the best split
  • D. To determine the minimum number of samples required to split an internal node
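For the max_features question above, a sketch of the idea: capping the number of features considered at each split decorrelates the trees, which is a large part of why the ensemble beats bagging plain trees. "sqrt" is a common choice for classification.

```python
# A sketch of max_features: how many features each split may consider.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for mf in ("sqrt", "log2", None):      # None = all features at every split
    forest = RandomForestClassifier(n_estimators=200, max_features=mf,
                                    random_state=0)
    print(mf, cross_val_score(forest, X, y, cv=5).mean())
```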
Q. What is the role of the 'max_depth' parameter in a Decision Tree?
  • A. It determines the maximum number of features to consider
  • B. It limits the number of samples at each leaf
  • C. It restricts the maximum depth of the tree
  • D. It controls the minimum number of samples required to split an internal node
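For the max_depth question above, a sketch of the trade-off on a single tree: a shallow tree underfits, while an unrestricted tree fits the training split almost perfectly and generalizes worse.

```python
# A sketch of max_depth: train vs. test accuracy at different depth limits.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (2, 5, None):             # None grows the tree until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```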
Q. Which algorithm is typically faster to train on large datasets?
  • A. Decision Trees
  • B. Random Forests
  • C. Both are equally fast
  • D. Neither, both are slow
Q. Which evaluation metric is most appropriate for assessing the performance of a Decision Tree on a binary classification problem?
  • A. Mean Squared Error
  • B. Accuracy
  • C. Silhouette Score
  • D. R-squared
Q. Which of the following metrics is commonly used to evaluate the performance of a Decision Tree?
  • A. Mean Squared Error
  • B. Accuracy
  • C. Silhouette Score
  • D. F1 Score
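For the two evaluation-metric questions above, a sketch assuming sklearn.metrics: accuracy and F1 score fit a classification tree, whereas mean squared error and R-squared belong to regression and the silhouette score to clustering.

```python
# A sketch of accuracy and F1 for a Decision Tree on a binary problem.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = tree.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 score:", f1_score(y_test, pred))
```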
Q. Which of the following techniques can be used to handle missing values in Decision Trees?
  • A. Imputation
  • B. Ignoring missing values
  • C. Using a separate category for missing values
  • D. All of the above
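For the last question above, imputation was sketched earlier; the "separate category" option can be illustrated as below, using a hypothetical pandas column: missing entries become their own level before encoding, so the tree can split on missingness itself.

```python
# A sketch of treating missing values as their own category for a tree.
import pandas as pd

df = pd.DataFrame({"employment": ["salaried", None, "self-employed", None]})
df["employment"] = df["employment"].fillna("missing")    # explicit category
encoded = pd.get_dummies(df, columns=["employment"])     # one-hot for the tree
print(encoded)
```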