Decision Trees and Random Forests - Advanced Concepts

Q. How does Random Forest handle missing values in the dataset?
  • A. It ignores missing values completely
  • B. It uses mean imputation for missing values
  • C. It can use surrogate splits to handle missing values
  • D. It requires complete data without any missing values
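For the missing-values question above: classic CART trees can fall back on surrogate splits, but scikit-learn's RandomForestClassifier does not expose them, so a common workaround is to impute inside a pipeline. The sketch below is one minimal, non-authoritative example, using scikit-learn's bundled breast-cancer data as a stand-in and artificially introducing missing entries.

```python
# A minimal sketch of handling missing values before a Random Forest.
# Surrogate splits are a CART-style feature (e.g. R's rpart); this pipeline
# instead imputes means, fit on the training split only to avoid leakage.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)

# Artificially punch holes in the data to simulate missing values.
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    SimpleImputer(strategy="mean"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```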
Q. In a Decision Tree, what does the term 'Gini impurity' refer to?
  • A. A measure of the tree's depth
  • B. A metric for evaluating model performance
  • C. A criterion for splitting nodes
  • D. A method for pruning trees
Q. In Decision Trees, what does the Gini impurity measure?
  • A. The accuracy of the model
  • B. The purity of a node
  • C. The depth of the tree
  • D. The number of features used
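For the two Gini-impurity questions above, a small worked sketch helps: for a node whose samples have class proportions p_k, the Gini impurity is 1 - sum(p_k^2), so 0 means a perfectly pure node and the tree picks splits that reduce it.

```python
# A small sketch of Gini impurity for a node, assuming labels come as a
# 1-D array: Gini = 1 - sum_k p_k^2, where p_k is the class-k fraction.
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))   # 0.0   -> pure node
print(gini_impurity([0, 0, 1, 1]))   # 0.5   -> maximally impure for 2 classes
print(gini_impurity([0, 0, 0, 1]))   # 0.375
```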
Q. In Random Forests, what does the term 'out-of-bag error' refer to?
  • A. Error on the training set
  • B. Error on unseen data
  • C. Error calculated from the samples not used in training a tree
  • D. Error from the final ensemble model
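For the out-of-bag question above, here is a minimal sketch, assuming scikit-learn's oob_score=True option and its bundled breast-cancer data: each tree is scored on the bootstrap rows it never saw, and the aggregated result approximates test accuracy without a separate validation set.

```python
# A minimal sketch of the out-of-bag (OOB) estimate in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                bootstrap=True, random_state=0)
forest.fit(X, y)

print("OOB score:", forest.oob_score_)      # accuracy on out-of-bag samples
print("OOB error:", 1 - forest.oob_score_)
```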
Q. In the context of Decision Trees, what does 'pruning' refer to?
  • A. Adding more branches to the tree
  • B. Removing branches to reduce complexity
  • C. Increasing the depth of the tree
  • D. Changing the splitting criteria
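For the pruning question above, one concrete illustration (a sketch, not the only pruning method) is scikit-learn's cost-complexity pruning: larger ccp_alpha values remove more branches, trading training fit for a simpler tree.

```python
# A sketch of post-pruning via cost-complexity pruning (ccp_alpha):
# bigger alpha -> fewer leaves -> lower complexity.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.0, 0.01, 0.05):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    tree.fit(X_train, y_train)
    print(f"alpha={alpha:<5} leaves={tree.get_n_leaves():<4} "
          f"test acc={tree.score(X_test, y_test):.3f}")
```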
Q. What does the term 'feature importance' refer to in the context of Random Forests?
  • A. The number of features used in the model
  • B. The contribution of each feature to the model's predictions
  • C. The correlation between features
  • D. The total number of trees in the forest
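For the feature-importance question above, a brief sketch using scikit-learn's impurity-based importances: feature_importances_ sums each feature's contribution to impurity reduction across all trees and normalizes the result to 1.

```python
# A minimal sketch of impurity-based feature importances in a Random Forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:          # five most influential features
    print(f"{name:<25} {score:.3f}")
```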
Q. What is a common method for feature importance evaluation in Random Forests?
  • A. Permutation importance
  • B. Gradient boosting
  • C. K-fold cross-validation
  • D. Principal component analysis
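For the permutation-importance question above, a sketch assuming sklearn.inspection is available: each feature is shuffled on held-out data, and the resulting drop in score measures how much the fitted model relied on that feature.

```python
# A sketch of permutation importance on a held-out split.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

result = permutation_importance(forest, X_test, y_test,
                                n_repeats=10, random_state=0)
print(result.importances_mean[:5])   # mean score drop per shuffled feature
```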
Q. What is a common use case for Random Forests in real-world applications?
  • A. Image recognition
  • B. Natural language processing
  • C. Credit scoring
  • D. Time series forecasting
Q. What is a primary advantage of using Random Forests over a single Decision Tree?
  • A. Lower computational cost
  • B. Higher accuracy due to ensemble learning
  • C. Easier to interpret
  • D. Requires less data
Q. What is the main disadvantage of using a Decision Tree?
  • A. High bias
  • B. High variance
  • C. Requires a lot of data
  • D. Difficult to interpret
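The two questions above pair naturally: a single unpruned tree is a low-bias, high-variance learner, and averaging many bootstrapped trees is what usually buys the forest its extra accuracy. The comparison below is a rough sketch; exact numbers depend on the dataset and seed.

```python
# A rough sketch comparing one deep tree with a bagged forest of trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print("single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```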
Q. What is the main purpose of using cross-validation when training a Decision Tree?
  • A. To increase the size of the training set
  • B. To tune hyperparameters
  • C. To assess the model's generalization ability
  • D. To visualize the tree structure
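For the cross-validation question above, a sketch assuming GridSearchCV: cross-validation both scores each hyperparameter setting on held-out folds and gives an honest estimate of how the chosen tree will generalize.

```python
# A sketch of cross-validated hyperparameter tuning for a Decision Tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 10, None],
                "min_samples_split": [2, 10, 50]},
    cv=5,                              # 5-fold cross-validation
)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("cross-validated accuracy:", grid.best_score_)
```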
Q. What is the purpose of the 'bootstrap' sampling method in Random Forests?
  • A. To create a balanced dataset
  • B. To ensure all features are used
  • C. To generate multiple subsets of the training data
  • D. To improve model interpretability
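For the bootstrap question above, a minimal NumPy sketch of what happens under the hood: each tree is trained on rows drawn with replacement, so some rows repeat and roughly a third are left out as that tree's out-of-bag rows.

```python
# A minimal sketch of bootstrap sampling: draw n row indices with replacement.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10
indices = rng.integers(0, n_samples, size=n_samples)   # sample with replacement

print("bootstrap indices:", indices)
print("out-of-bag rows  :", sorted(set(range(n_samples)) - set(indices)))
```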
Q. What is the purpose of the 'n_estimators' parameter in a Random Forest model?
  • A. To define the maximum depth of each tree
  • B. To specify the number of trees in the forest
  • C. To set the minimum samples required to split a node
  • D. To determine the number of features to consider at each split
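For the n_estimators question above, a short sketch: the parameter only sets how many trees are grown and averaged, so more trees give a more stable estimate at a higher training cost, with diminishing returns.

```python
# A sketch of how n_estimators (number of trees) affects a Random Forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for n in (10, 100, 500):
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    print(n, cross_val_score(forest, X, y, cv=5).mean())
```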
Q. What is the role of 'bootstrap sampling' in Random Forests?
  • A. To select features for each tree
  • B. To create multiple subsets of the training data
  • C. To evaluate model performance
  • D. To increase the depth of trees
Q. What is the role of 'max_features' in Random Forests?
  • A. To limit the number of trees in the forest
  • B. To control the maximum depth of each tree
  • C. To specify the maximum number of features to consider when looking for the best split
  • D. To determine the minimum number of samples required to split an internal node
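For the max_features question above, a sketch of the idea: capping the number of features considered at each split decorrelates the trees, which is a large part of why the ensemble beats bagging plain trees. "sqrt" is a common choice for classification.

```python
# A sketch of max_features: how many features each split may consider.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for mf in ("sqrt", "log2", None):      # None = all features at every split
    forest = RandomForestClassifier(n_estimators=200, max_features=mf,
                                    random_state=0)
    print(mf, cross_val_score(forest, X, y, cv=5).mean())
```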
Q. What is the role of the 'max_depth' parameter in a Decision Tree?
  • A. It determines the maximum number of features to consider
  • B. It limits the number of samples at each leaf
  • C. It restricts the maximum depth of the tree
  • D. It controls the minimum number of samples required to split an internal node
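For the max_depth question above, a sketch of the trade-off on a single tree: a shallow tree underfits, while an unrestricted tree fits the training split almost perfectly and generalizes worse.

```python
# A sketch of max_depth: train vs. test accuracy at different depth limits.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (2, 5, None):             # None grows the tree until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```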
Q. Which algorithm is typically faster to train on large datasets?
  • A. Decision Trees
  • B. Random Forests
  • C. Both are equally fast
  • D. Neither, both are slow
Q. Which evaluation metric is most appropriate for assessing the performance of a Decision Tree on a binary classification problem?
  • A. Mean Squared Error
  • B. Accuracy
  • C. Silhouette Score
  • D. R-squared
Q. Which of the following metrics is commonly used to evaluate the performance of a Decision Tree?
  • A. Mean Squared Error
  • B. Accuracy
  • C. Silhouette Score
  • D. F1 Score
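For the two evaluation-metric questions above, a sketch assuming sklearn.metrics: accuracy and F1 score fit a classification tree, whereas mean squared error and R-squared belong to regression and the silhouette score to clustering.

```python
# A sketch of accuracy and F1 for a Decision Tree on a binary problem.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pred = tree.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 score:", f1_score(y_test, pred))
```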
Q. Which of the following techniques can be used to handle missing values in Decision Trees?
  • A. Imputation
  • B. Ignoring missing values
  • C. Using a separate category for missing values
  • D. All of the above
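For the last question above, imputation was sketched earlier; the "separate category" option can be illustrated as below, using a hypothetical pandas column: missing entries become their own level before encoding, so the tree can split on missingness itself.

```python
# A sketch of treating missing values as their own category for a tree.
import pandas as pd

df = pd.DataFrame({"employment": ["salaried", None, "self-employed", None]})
df["employment"] = df["employment"].fillna("missing")    # explicit category
encoded = pd.get_dummies(df, columns=["employment"])     # one-hot for the tree
print(encoded)
```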