Q. If the distance between two clusters in hierarchical clustering is defined as the maximum distance between points in the clusters, what linkage method is being used?
A.Single linkage
B.Complete linkage
C.Average linkage
D.Centroid linkage
Solution
The method that defines the distance between two clusters as the maximum distance between points in the clusters is called complete linkage.
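As an illustration of the definition above, here is a minimal Python sketch of the complete-linkage distance. The function names and the sample points are assumptions made for this example, and Euclidean distance is assumed as the point-to-point metric. Single linkage is included only for contrast.

```python
from itertools import product
from math import dist


def complete_linkage(cluster_a, cluster_b):
    """Largest pairwise Euclidean distance between the two clusters."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


def single_linkage(cluster_a, cluster_b):
    """Smallest pairwise Euclidean distance, shown only for contrast."""
    return min(dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical clusters lying on the x-axis.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

print(complete_linkage(a, b))  # farthest pair is (0, 0) and (6, 0)
print(single_linkage(a, b))    # closest pair is (1, 0) and (4, 0)
```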
Q. In a K-means clustering algorithm, if you have 5 clusters and 100 data points, how many centroids will be initialized?
A.5
B.100
C.50
D.10
Solution
In K-means clustering, the number of centroids initialized is equal to the number of clusters. Therefore, if there are 5 clusters, 5 centroids will be initialized.
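The counting argument can be checked with a short Python sketch. It assumes the simplest initialization scheme, in which k of the data points are picked at random as starting centroids. The function name, the seeds, and the generated data are hypothetical, and other schemes (such as k-means++) choose the points differently but still produce exactly k centroids.

```python
import random


def init_centroids(points, k, seed=0):
    """Pick k of the data points at random as the starting centroids."""
    rng = random.Random(seed)
    return rng.sample(points, k)


# 100 hypothetical 2-D data points.
rng = random.Random(42)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(100)]

centroids = init_centroids(points, k=5)
print(len(points), len(centroids))  # number of points, number of centroids
```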
Agglomerative clustering starts with individual points and merges them into clusters, while divisive clustering starts with one cluster and splits it into smaller clusters.
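The bottom-up process can be sketched in a few lines of Python. This is a simplified illustration that assumes 1-D data and single linkage. The function name and sample values are made up for the example. It records the partition after every merge, so the output also shows how the clusters at one level nest inside those of the next.

```python
def agglomerative_levels(values):
    """Return the partition after each merge, from singletons to one cluster."""
    clusters = [[v] for v in sorted(values)]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # In sorted 1-D data the closest pair of clusters under single
        # linkage is always adjacent, so only neighbouring gaps are compared.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
        levels.append([list(c) for c in clusters])
    return levels


levels = agglomerative_levels([1, 2, 6, 7, 15])
for level in levels:
    print(level)
```

Reading the printed levels from last to first gives a rough picture of the divisive direction, although a real divisive algorithm chooses its own splits and need not retrace these merges.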
Q. What is the main difference between K-means and hierarchical clustering?
A.K-means is a partitional method, while hierarchical is a divisive method
B.K-means requires the number of clusters to be defined, while hierarchical does not
C.K-means can only be used for numerical data, while hierarchical can handle categorical data
D.K-means is faster than hierarchical clustering for small datasets
Solution
K-means is a partitional clustering method that divides data into a fixed number of clusters chosen in advance, while hierarchical clustering builds a tree of clusters without needing the number of clusters up front. The tree can be cut afterwards at any level to obtain as many clusters as desired.
Correct Answer: B — K-means requires the number of clusters to be defined, while hierarchical does not
Q. What is the primary goal of the K-means clustering algorithm?
A.Minimize the distance between points in the same cluster
B.Maximize the distance between different clusters
C.Both A and B
D.None of the above
Solution
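K-means minimizes the within-cluster sum of squared distances between each point and its cluster centroid. Because the total sum of squares of a data set is fixed, lowering this within-cluster term raises the between-cluster separation by the same amount, so both goals are pursued together.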
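One way to see why the two goals go together is the sum-of-squares decomposition: for a fixed data set, the total sum of squares equals the within-cluster part plus the between-cluster part. The Python sketch below checks this on a small hypothetical 1-D example. The helper names and the data are assumptions made for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def sum_sq(xs, center):
    """Sum of squared deviations of xs from center."""
    return sum((x - center) ** 2 for x in xs)


# Two hypothetical 1-D clusters.
clusters = [[1.0, 2.0, 3.0], [8.0, 9.0, 10.0]]
all_points = [x for c in clusters for x in c]
grand_mean = mean(all_points)

# Within-cluster, between-cluster, and total sums of squares.
wcss = sum(sum_sq(c, mean(c)) for c in clusters)
bcss = sum(len(c) * (mean(c) - grand_mean) ** 2 for c in clusters)
tss = sum_sq(all_points, grand_mean)

print(wcss, bcss, tss)  # wcss + bcss should equal tss
```

Since the total is fixed by the data, any assignment that lowers the within-cluster term must raise the between-cluster term.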
Q. What is the purpose of the elbow method in K-means clustering?
A.To determine the optimal number of clusters
B.To visualize the clusters formed
C.To assess the performance of the algorithm
D.To preprocess the data before clustering
Solution
The elbow method is used to choose the number of clusters by plotting the within-cluster sum of squares (or, equivalently, the explained variance) against the number of clusters and picking the 'elbow' point, beyond which adding clusters gives only marginal improvement.
Correct Answer: A — To determine the optimal number of clusters
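The elbow can be seen without a plot by printing the within-cluster sum of squares (WCSS, the complement of explained variance) for several values of k. The sketch below is a simplified illustration, not a production implementation: it assumes 1-D data, uses a deterministic evenly spaced initialization instead of random starts, and the function name and data are hypothetical.

```python
def kmeans_wcss_1d(points, k, iters=100):
    """Run plain Lloyd's algorithm on 1-D data and return the final WCSS."""
    pts = sorted(points)
    # Deterministic start: k evenly spaced picks from the sorted data.
    centroids = [pts[i * len(pts) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            groups[nearest].append(x)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min((x - c) ** 2 for c in centroids) for x in pts)


# Hypothetical data with three visible groups around 1, 5 and 9.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wcss_by_k = {k: kmeans_wcss_1d(data, k) for k in range(1, 6)}
for k, w in wcss_by_k.items():
    print(k, round(w, 2))
```

On this data the WCSS falls steeply up to k = 3 and barely changes afterwards, which is the bend the elbow method looks for.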
Q. Which clustering method is more suitable for discovering nested clusters?
A.K-means clustering
B.Hierarchical clustering
C.DBSCAN
D.Gaussian Mixture Models
Solution
Hierarchical clustering is more suitable for discovering nested clusters, as it creates a tree structure that can reveal relationships at various levels of granularity.
Q. Which evaluation metric is commonly used to assess the quality of clustering results?
A.Accuracy
B.Silhouette score
C.F1 score
D.Mean squared error
Solution
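The silhouette score is a popular metric for evaluating clustering quality. It measures how similar an object is to its own cluster compared to other clusters and ranges from -1 to 1, with higher values indicating better-separated clusters. Accuracy, F1 score, and mean squared error all need ground-truth labels or targets, which are usually unavailable in clustering.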
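For reference, the silhouette coefficient of a point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. The Python sketch below is a minimal version under stated assumptions: Euclidean distance, at least two clusters, and a score of 0 for singleton clusters. The function name and sample points are made up for the example.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_label = {}
        for j, q in enumerate(points):
            if i != j:
                by_label.setdefault(labels[j], []).append(dist(p, q))
        own = by_label.pop(labels[i], [])
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in by_label.values())  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups nearby points
bad = silhouette_score(points, [0, 1, 0, 1])   # groups distant points
print(round(good, 3), round(bad, 3))
```

The labeling that groups nearby points scores close to 1, while the labeling that pairs distant points scores below 0.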
Q. Which of the following is a disadvantage of K-means clustering?
A.It is sensitive to outliers
B.It requires the number of clusters to be specified in advance
C.It can converge to local minima
D.All of the above
Solution
All of the listed options are disadvantages of K-means clustering: it is sensitive to outliers, it requires the number of clusters to be specified in advance, and it can converge to a local minimum.
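The local-minimum issue in option C can be reproduced on four points. The Python sketch below runs Lloyd's iterations from two hand-picked starting positions. The function name, the points, and the starting centroids are all assumptions chosen to make the effect visible.

```python
from math import dist


def lloyd(points, centroids, iters=100):
    """Run Lloyd's iterations from the given start; return (centroids, WCSS)."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: dist(p, centroids[j]))
            groups[nearest].append(p)
        # New centroid is the per-axis mean; empty clusters keep the old one.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    wcss = sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, wcss


# Corners of a wide rectangle: the natural split is left pair vs right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, good = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])   # start near the sides
_, stuck = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])  # start in the middle
print(good, stuck)
```

Both runs converge immediately, but the second one settles on a top/bottom split with a much larger within-cluster sum of squares. Running the algorithm from several starting points and keeping the best result is a common way to reduce this risk.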