Q. What is the primary goal of supervised learning?
A.To find hidden patterns in data
B.To predict outcomes based on labeled data
C.To cluster similar data points
D.To reduce dimensionality of data
Solution
Supervised learning aims to predict outcomes based on labeled data, where the model learns from input-output pairs.
Correct Answer: B — To predict outcomes based on labeled data
Q. What is the primary goal of the K-means clustering algorithm?
A.Minimize the distance between points in the same cluster
B.Maximize the distance between different clusters
C.Both A and B
D.None of the above
Solution
K-means minimizes the within-cluster sum of squared distances to the centroids. Because the total sum of squares of the data is fixed, this is equivalent to maximizing the between-cluster sum of squares, so both A and B describe its goal.
Correct Answer: C — Both A and B
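The link between the two options can be checked numerically. The sketch below is only an illustration on made-up points and labels (the helper names are hypothetical): it shows that the total sum of squares splits into a within-cluster part and a between-cluster part, so shrinking one grows the other.

```python
# Sketch: total sum of squares = within-cluster + between-cluster.
# The points and cluster labels below are made-up illustration data.

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels = [0, 0, 0, 1, 1, 1]

overall = mean(points)
clusters = {}
for p, lab in zip(points, labels):
    clusters.setdefault(lab, []).append(p)

total_ss = sum(sq_dist(p, overall) for p in points)
within_ss = sum(sq_dist(p, mean(c)) for c in clusters.values() for p in c)
between_ss = sum(len(c) * sq_dist(mean(c), overall) for c in clusters.values())

# The two printed numbers agree up to floating-point rounding.
print(total_ss, within_ss + between_ss)
```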
Q. What is the primary objective of the K-means clustering algorithm?
A.To minimize the distance between points in the same cluster
B.To maximize the distance between different clusters
C.To create a hierarchical structure of clusters
D.To classify data into predefined categories
Solution
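K-means minimizes the sum of squared distances between each point and the centroid of its cluster, alternating between assigning points to the nearest centroid and recomputing the centroids. Separation between clusters is a consequence of this objective rather than a separate term in it.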
Correct Answer: A — To minimize the distance between points in the same cluster
Q. What is the purpose of a confusion matrix?
A.To visualize the performance of a regression model
B.To summarize the performance of a classification model
C.To optimize hyperparameters
D.To reduce overfitting
Solution
A confusion matrix summarizes the performance of a classification model by showing true vs. predicted classifications.
Correct Answer: B — To summarize the performance of a classification model
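As a rough illustration, a confusion matrix can be built by counting (true, predicted) label pairs. The function name and the labels below are made up for this sketch; rows are true classes and columns are predicted classes.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return a nested list: rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Made-up labels for illustration only.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

matrix = confusion_matrix(y_true, y_pred, labels=["spam", "ham"])
for label, row in zip(["spam", "ham"], matrix):
    print(label, row)
```

The diagonal entries count correct predictions; the off-diagonal entries show which classes are being confused with each other.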
Q. What is the purpose of cross-validation in machine learning?
A.To increase the size of the training dataset
B.To assess how the results of a statistical analysis will generalize to an independent dataset
C.To reduce the complexity of the model
D.To improve the speed of training
Solution
Cross-validation assesses how well a model generalizes to an independent dataset by repeatedly partitioning the data into training and validation sets and averaging the validation results.
Correct Answer: B — To assess how the results of a statistical analysis will generalize to an independent dataset
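One common form is k-fold cross-validation. The sketch below, with a hypothetical helper name, only produces the index splits and assumes the data has already been shuffled; each sample lands in the validation set exactly once.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be fitted on each training split and scored on the matching validation split, and the scores averaged.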
Q. What is the purpose of feature scaling in machine learning?
A.To increase the number of features
B.To improve the performance of the model
C.To reduce the size of the dataset
D.To convert categorical data to numerical
Solution
Feature scaling puts features on comparable ranges so that no single feature dominates the distance calculations, which improves the performance of distance-based models such as K-means.
Correct Answer: B — To improve the performance of the model
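The effect on distances can be seen in a small sketch. The rows below are made up, min-max scaling is just one possible scaling choice, and the helper assumes no column is constant (otherwise it would divide by zero).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range (assumes no constant column)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Made-up rows: (age in years, income in dollars).
rows = [(25, 40_000), (60, 41_000), (26, 90_000)]
scaled = min_max_scale(rows)

# Before scaling, income dominates: row 0 looks far closer to row 1 than to row 2.
print(euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2]))
# After scaling, the age gap and the income gap weigh about the same.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```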
Q. What is the purpose of the elbow method in K-means clustering?
A.To determine the optimal number of clusters
B.To visualize the clusters formed
C.To assess the performance of the algorithm
D.To preprocess the data before clustering
Solution
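The elbow method is used to determine the optimal number of clusters by plotting the within-cluster sum of squares (or, equivalently, the explained variance) against the number of clusters and identifying the 'elbow' point where adding more clusters yields little further improvement.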
Correct Answer: A — To determine the optimal number of clusters
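A minimal sketch of the idea, using a tiny 1-D K-means written only for this illustration (the function names, the evenly spaced initialisation, and the data are all assumptions). On this made-up data the within-cluster sum of squares drops sharply up to k = 3 and barely changes afterwards.

```python
def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D Lloyd's algorithm with a deterministic, evenly spaced initialisation."""
    ordered = sorted(values)
    centroids = [ordered[i * len(ordered) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: (v - centroids[j]) ** 2)
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def inertia(values, centroids):
    """Within-cluster sum of squared distances to the nearest centroid."""
    return sum(min((v - c) ** 2 for c in centroids) for v in values)

# Made-up data with three visible groups around 1, 5 and 9.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
inertias = {k: inertia(values, kmeans_1d(values, k)) for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

In practice the values would be plotted against k and the bend in the curve read off by eye, which makes the method a heuristic rather than an exact rule.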
Q. What type of data is K-means clustering best suited for?
A.Categorical data
B.Numerical data
C.Text data
D.Time series data
Solution
K-means clustering is best suited for numerical data, as it relies on calculating distances between data points.
Correct Answer: B — Numerical data
Q. Which algorithm is commonly used for clustering?
A.Linear Regression
B.K-Means
C.Support Vector Machine
D.Decision Tree
Solution
K-Means is a popular algorithm used for clustering, grouping data points into clusters based on similarity.
Correct Answer: B — K-Means
Q. Which clustering method is more suitable for discovering nested clusters?
A.K-means clustering
B.Hierarchical clustering
C.DBSCAN
D.Gaussian Mixture Models
Solution
Hierarchical clustering is more suitable for discovering nested clusters, as it creates a tree structure that can reveal relationships at various levels of granularity.
Correct Answer: B — Hierarchical clustering
Q. Which clustering method is more suitable for discovering non-globular shapes in data?
A.K-means clustering
B.Hierarchical clustering
C.DBSCAN
D.Gaussian Mixture Models
Solution
DBSCAN is particularly effective for discovering clusters of varying shapes and sizes, making it suitable for non-globular data distributions.
Correct Answer: C — DBSCAN
Q. Which distance metric is commonly used in K-means clustering?
A.Manhattan distance
B.Cosine similarity
C.Euclidean distance
D.Hamming distance
Solution
K-means typically uses Euclidean distance to measure the distance between data points and centroids.
Correct Answer: C — Euclidean distance
Q. Which evaluation metric is commonly used to assess the quality of clustering results?
A.Accuracy
B.Silhouette score
C.F1 score
D.Mean squared error
Solution
The Silhouette score is a popular metric for evaluating clustering quality, measuring how similar an object is to its own cluster compared to other clusters.
Correct Answer: B — Silhouette score
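For each point, the silhouette coefficient compares a, the mean distance to the other points in its own cluster, with b, the smallest mean distance to the points of any other cluster, as (b - a) / max(a, b). The sketch below is a plain illustration on made-up points; it assumes at least two clusters and not all points identical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up points: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # grouping that mixes the pairs
print(good, bad)
```

Scores close to 1 indicate compact, well-separated clusters; negative scores suggest points sit closer to another cluster than to their own.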
Q. Which evaluation metric is commonly used to assess the quality of clustering?
A.Accuracy
B.Silhouette score
C.F1 score
D.Mean squared error
Solution
The Silhouette score is a popular metric for evaluating clustering quality, measuring how similar an object is to its own cluster compared to other clusters.
Correct Answer: B — Silhouette score
Q. Which neural network architecture is primarily used for image recognition tasks?
A.Recurrent Neural Network
B.Convolutional Neural Network
C.Feedforward Neural Network
D.Generative Adversarial Network
Solution
Convolutional Neural Networks (CNNs) apply learned filters across local regions of an image, which makes them well suited to processing and recognizing images.
Correct Answer: B — Convolutional Neural Network
Q. Which of the following clustering methods is best suited for discovering non-linear relationships in data?
A.K-means
B.Hierarchical clustering
C.DBSCAN
D.Gaussian Mixture Models
Solution
DBSCAN groups points by density rather than by distance to a centroid, so it can recover clusters with arbitrary, non-convex shapes and varying sizes, unlike K-means.
Correct Answer: C — DBSCAN
Q. Which of the following clustering methods is sensitive to outliers?
A.K-means
B.Hierarchical clustering
C.DBSCAN
D.Gaussian Mixture Models
Solution
K-means is sensitive to outliers because they can significantly affect the position of the centroid, leading to poor clustering results.
Correct Answer: A — K-means
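Because a centroid is a mean, one extreme point can drag it a long way. A small sketch on made-up points:

```python
def centroid(points):
    """Component-wise mean, which is how K-means updates a centroid."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

# Made-up cluster and a single far-away outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
outlier = (50.0, 50.0)

without_outlier = centroid(cluster)
with_outlier = centroid(cluster + [outlier])
print(without_outlier)
print(with_outlier)
```

With the outlier included, the centroid no longer lies anywhere near the four original points.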
Q. Which of the following is a characteristic of K-means clustering?
A.It can produce overlapping clusters
B.It is deterministic and produces the same result every time
C.It can handle noise and outliers effectively
D.It partitions data into non-overlapping clusters
Solution
K-means clustering partitions data into non-overlapping clusters, assigning each data point to the nearest centroid.
Correct Answer: D — It partitions data into non-overlapping clusters
Q. Which of the following is a characteristic of neural networks?
A.They require structured data only
B.They can learn complex patterns through layers
C.They are only used for classification tasks
D.They do not require any training data
Solution
Neural networks can learn complex patterns through multiple layers of interconnected nodes, making them versatile.
Correct Answer: B — They can learn complex patterns through layers
Q. Which of the following is a common application of reinforcement learning?
A.Image recognition
B.Game playing
C.Data clustering
D.Text classification
Solution
Reinforcement learning is commonly applied in game playing, where an agent learns to make decisions through trial and error.
Correct Answer: B — Game playing
Q. Which of the following is a common evaluation metric for classification models?
A.Mean Squared Error
B.Accuracy
C.Silhouette Score
D.R-squared
Solution
Accuracy is a common evaluation metric for classification models, measuring the proportion of correct predictions.
Correct Answer: B — Accuracy
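Accuracy is simply the number of correct predictions divided by the total. The sketch below uses made-up labels and also shows a well-known caveat: on imbalanced data, always predicting the majority class can still score highly.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels: 3 of 5 predictions are correct.
balanced_score = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(balanced_score)

# Imbalanced case: 95 negatives, 5 positives, and a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
imbalanced_score = accuracy(y_true, y_pred)
print(imbalanced_score)
```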
Q. Which of the following is a disadvantage of K-means clustering?
A.It is sensitive to outliers
B.It requires the number of clusters to be specified in advance
C.It can converge to local minima
D.All of the above
Solution
All of the listed options are disadvantages of K-means clustering: it is sensitive to outliers, it requires the number of clusters to be specified in advance, and it can converge to a local minimum.
Correct Answer: D — All of the above
Q. Which of the following is a disadvantage of the K-means algorithm?
A.It can handle large datasets efficiently
B.It requires the number of clusters to be specified in advance
C.It is sensitive to outliers
D.It can be used for both supervised and unsupervised learning
Solution
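A key disadvantage of K-means is that it requires the user to specify the number of clusters beforehand, which may not always be known. Sensitivity to outliers (option C) is also a real drawback, so B is best read as the most fundamental of the listed limitations.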
Correct Answer: B — It requires the number of clusters to be specified in advance
Q. Which of the following is a limitation of the K-means algorithm?
A.It can handle non-spherical clusters
B.It requires the number of clusters to be specified in advance
C.It is computationally efficient for large datasets
D.It can be used for both supervised and unsupervised learning
Solution
A key limitation of K-means is that it requires the number of clusters to be specified beforehand, which can be challenging in practice.
Correct Answer: B — It requires the number of clusters to be specified in advance
Q. Which of the following is an example of a regression algorithm?
A.K-Means
B.Logistic Regression
C.Random Forest
D.Support Vector Classifier
Solution
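Logistic Regression is the intended answer: it is a regression model in the statistical sense, fitting the log-odds of a binary outcome as a linear function of the inputs. Despite its name, it is used in practice mainly for classification. Note that Random Forest can also be used for regression.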
Correct Answer: B — Logistic Regression
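Logistic regression computes a linear score, maps it to a probability with the sigmoid function, and obtains a class label by thresholding that probability. A minimal sketch with hand-picked, hypothetical weights (a real model would learn them from data):

```python
import math

def sigmoid(z):
    """Map a real-valued score to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Regression part: a linear score mapped to a probability."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

def predict_class(features, weights, bias, threshold=0.5):
    """Classification part: threshold the probability."""
    return int(predict_proba(features, weights, bias) >= threshold)

# Hypothetical parameters chosen for illustration only.
weights, bias = [2.0, -1.0], 0.0
print(predict_proba([1.0, 2.0], weights, bias))  # linear score is 0
print(predict_class([3.0, 1.0], weights, bias))  # linear score is 5
print(predict_class([0.0, 3.0], weights, bias))  # linear score is -3
```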
Q. Which of the following is an example of unsupervised learning?
A.Image classification
B.Sentiment analysis
C.Market basket analysis
D.Spam detection
Solution
Market basket analysis is an example of unsupervised learning, where patterns are discovered without labeled outcomes.
Correct Answer: C — Market basket analysis
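As a rough illustration, the first step of market basket analysis is often to count how frequently items appear together. The transactions below are made up, and the sketch computes only pair support, not full association rules.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "eggs", "milk"},
    {"butter", "eggs"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so each pair has one canonical ordering.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of transactions that contain the pair.
support = {pair: count / len(transactions) for pair, count in pair_counts.items()}
for pair, value in sorted(support.items()):
    print(pair, value)
```

No labels are involved: the frequent pairs emerge from the transactions alone.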
Q. Which of the following is NOT a common distance metric used in clustering?
A.Euclidean distance
B.Manhattan distance
C.Cosine similarity
D.Logistic distance
Solution
Logistic distance is not a standard distance metric used in clustering; common metrics include Euclidean, Manhattan, and Cosine similarity.
Correct Answer: D — Logistic distance
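The three real measures named in the options can be written in a few lines each. A plain sketch (cosine similarity assumes neither vector is all zeros):

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b), manhattan(a, b))
print(cosine_similarity((1.0, 0.0), (0.0, 1.0)))  # perpendicular vectors
print(cosine_similarity((1.0, 2.0), (2.0, 4.0)))  # same direction
```

Cosine similarity measures direction rather than magnitude, so it is a similarity, not a distance; 1 minus the similarity is often used when a distance is needed.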
Q. Which of the following is NOT a method of linkage in hierarchical clustering?
A.Single linkage
B.Complete linkage
C.Average linkage
D.Random linkage
Solution
Random linkage is not a recognized method of linkage in hierarchical clustering; the common methods include single, complete, and average linkage.
Correct Answer: D — Random linkage
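The three linkage methods differ only in how they turn the pairwise distances between two clusters into a single number. A small sketch on made-up clusters:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(cluster_a, cluster_b):
    """All distances between a point of one cluster and a point of the other."""
    return [euclidean(p, q) for p in cluster_a for q in cluster_b]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise(cluster_a, cluster_b))  # closest pair

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise(cluster_a, cluster_b))  # farthest pair

def average_linkage(cluster_a, cluster_b):
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)      # mean over all pairs

# Made-up clusters on a line.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(left, right), complete_linkage(left, right), average_linkage(left, right))
```

At each merge step, agglomerative clustering joins the two clusters with the smallest linkage value.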
Q. Which of the following is NOT a step in the K-means clustering algorithm?
A.Assigning data points to the nearest centroid
B.Updating the centroid positions
C.Calculating the silhouette score
D.Choosing the initial centroids
Solution
Calculating the silhouette score is not a step in the K-means algorithm; it is an evaluation metric used after clustering.
Correct Answer: C — Calculating the silhouette score
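The three actual steps can be sketched as follows. This is a minimal illustration of the standard (Lloyd's) procedure, assuming random data points as initial centroids; the function names, the seed, and the data are made up.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose initial centroids (here: k distinct random data points).
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Step 2: assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels

# Made-up data: two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

A silhouette score would be computed afterwards on the returned labels; it plays no part in the loop itself.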
Q. Which of the following is NOT a type of hierarchical clustering?
A.Single linkage
B.Complete linkage
C.K-means linkage
D.Average linkage
Solution
K-means linkage is not a linkage criterion used in hierarchical clustering; K-means is a separate, partition-based clustering algorithm.