Q. In the context of model selection, what does cross-validation help to prevent?
A. Overfitting
B. Underfitting
C. Data leakage
D. Bias
Solution: Cross-validation helps to prevent overfitting by evaluating the model on held-out folds, so model selection is based on performance on unseen data rather than on the training set alone.
Correct Answer: A — Overfitting
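A minimal sketch of 5-fold cross-validation with scikit-learn; the iris dataset and logistic regression classifier are illustrative choices, not part of the question:

```python
# Minimal sketch: 5-fold cross-validation with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each fold is held out once; the scores estimate performance on unseen data.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```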
Q. What is the effect of using polynomial features in a linear regression model?
A. It reduces the model complexity
B. It can capture non-linear relationships
C. It increases the risk of underfitting
D. It eliminates multicollinearity
Solution: Polynomial features allow the model to capture non-linear relationships between the features and the target variable.
Correct Answer: B — It can capture non-linear relationships
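A minimal sketch of fitting a non-linear (quadratic) target with PolynomialFeatures plus LinearRegression; the synthetic data and degree are illustrative choices:

```python
# Minimal sketch: PolynomialFeatures lets linear regression fit a non-linear curve.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.2, size=200)  # quadratic target

# Degree-2 features (1, x, x^2) let a linear model capture the curvature.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```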
Q. What is the main advantage of using ensemble methods like Random Forest over a single decision tree?
A. They are faster to train
B. They reduce variance and improve prediction accuracy
C. They are easier to interpret
D. They require less data
Solution: Ensemble methods like Random Forest reduce variance by averaging multiple decision trees, leading to improved prediction accuracy.
Correct Answer: B — They reduce variance and improve prediction accuracy
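A minimal sketch comparing a single decision tree to a Random Forest via cross-validation; the breast-cancer dataset and 200 trees are illustrative choices:

```python
# Minimal sketch: single decision tree vs. Random Forest, scored by cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# Averaging many trees trained on bootstrap samples reduces variance.
print("Single tree  :", cross_val_score(tree, X, y, cv=5).mean())
print("Random Forest:", cross_val_score(forest, X, y, cv=5).mean())
```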
Q. What is the purpose of using regularization techniques in model selection?
A. To increase the model's complexity
B. To reduce the training time
C. To prevent overfitting by penalizing large coefficients
D. To improve the interpretability of the model
Solution: Regularization techniques prevent overfitting by adding a penalty for large coefficients in the model.
Correct Answer: C — To prevent overfitting by penalizing large coefficients
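A minimal sketch of L2 regularization with Ridge, showing how the penalty shrinks coefficients relative to ordinary least squares; the synthetic data and alpha=1.0 are illustrative choices:

```python
# Minimal sketch: Ridge regression penalizes large coefficients (L2 regularization).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))      # 20 features, only 2 of which carry signal
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha controls the penalty strength

# The penalty shrinks coefficients toward zero, reducing overfitting.
print("OLS   |coef| sum:", np.abs(ols.coef_).sum())
print("Ridge |coef| sum:", np.abs(ridge.coef_).sum())
```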
Q. Which feature transformation technique is used to normalize the range of features?
A. One-Hot Encoding
B. Min-Max Scaling
C. Label Encoding
D. Feature Extraction
Solution: Min-Max Scaling normalizes the range of features to a specified range, typically [0, 1].
Correct Answer: B — Min-Max Scaling
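A minimal sketch of Min-Max Scaling with scikit-learn's MinMaxScaler; the toy matrix is illustrative:

```python
# Minimal sketch: MinMaxScaler maps each feature into [0, 1].
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaler = MinMaxScaler()            # default feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)
print(X_scaled)                    # each column now spans 0..1
```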
Q. Which of the following is a common method for handling missing data in a dataset?
A. Removing all rows with missing values
B. Replacing missing values with the mean or median
C. Ignoring the missing values during training
D. All of the above
Solution: Replacing missing values with the mean or median is a common method, though other methods can also be used depending on the context.
Correct Answer: B — Replacing missing values with the mean or median
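A minimal sketch of mean imputation with pandas; the toy DataFrame and column names are illustrative:

```python
# Minimal sketch: replace missing values with the column mean using pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "income": [50000, 60000, np.nan, 80000]})

# Fill NaNs in each column with that column's mean.
df_filled = df.fillna(df.mean())
print(df_filled)
```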
Q. Which of the following is a common method for handling missing data?
A. Removing all rows with missing values
B. Imputing missing values with the mean or median
C. Ignoring missing values during training
D. Using a more complex model
Solution: Imputing missing values with the mean or median is a common method to handle missing data.
Correct Answer: B — Imputing missing values with the mean or median
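A minimal sketch of median imputation with scikit-learn's SimpleImputer, as an alternative to the pandas approach above; the toy array is illustrative:

```python
# Minimal sketch: median imputation with scikit-learn's SimpleImputer.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 6.0]])

imputer = SimpleImputer(strategy="median")   # "mean" is also a common choice
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```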
Q. Which of the following is a disadvantage of using decision trees for model selection?
A. They are easy to interpret
B. They can easily overfit the training data
C. They handle both numerical and categorical data
D. They require less data preprocessing
Solution: Decision trees can easily overfit the training data, especially if they are not pruned or if the tree is too deep.
Correct Answer: B — They can easily overfit the training data
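A minimal sketch contrasting an unpruned tree with a depth-limited one; the gap between training and test accuracy illustrates overfitting. The dataset and max_depth=3 are illustrative choices:

```python
# Minimal sketch: an unpruned tree overfits; limiting depth trades training fit for generalization.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)           # grows until leaves are pure
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("Deep tree   train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("Pruned tree train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```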
Q. Which of the following is a disadvantage of using too many features in a model?
A. Increased interpretability
B. Higher computational cost
C. Better model performance
D. Reduced risk of overfitting
Solution: Using too many features can lead to higher computational costs and may increase the risk of overfitting.
Correct Answer: B — Higher computational cost
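A minimal sketch of trimming a wide synthetic dataset down to its most informative features with SelectKBest, which cuts both dimensionality and the cost of training downstream models; all parameters here are illustrative:

```python
# Minimal sketch: SelectKBest keeps only the k most informative features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 100 features, only 10 of which carry signal (synthetic data).
X, y = make_classification(n_samples=500, n_features=100, n_informative=10, random_state=0)

selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)   # (500, 100) -> (500, 10)
```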
Q. Which of the following techniques is NOT typically used in feature selection?
A. Recursive Feature Elimination
B. Principal Component Analysis
C. Random Forest Importance
D. K-Means Clustering
Solution: K-Means Clustering is an unsupervised learning algorithm used for clustering, not for feature selection.
Correct Answer: D — K-Means Clustering
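A minimal sketch of Recursive Feature Elimination driven by Random Forest importances on synthetic data; the estimator and parameters are illustrative choices:

```python
# Minimal sketch: Recursive Feature Elimination using Random Forest importances.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# RFE repeatedly drops the least important features according to the estimator.
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=5)
rfe.fit(X, y)
print("Selected feature indices:", [i for i, keep in enumerate(rfe.support_) if keep])
```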