Clustering Methods: K-means and Hierarchical (Competitive Exam Level)

Q. If a dataset has 200 points and you apply K-means clustering with K=4, how many points will be assigned to each cluster on average?
  • A. 50
  • B. 40
  • C. 60
  • D. 30
Q. If the distance between two clusters in hierarchical clustering is defined as the maximum distance between points in the clusters, what linkage method is being used?
  • A. Single linkage
  • B. Complete linkage
  • C. Average linkage
  • D. Centroid linkage
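The linkage definitions asked about above can be checked directly on toy data. A minimal sketch, with two made-up 2-D clusters chosen purely for illustration:

```python
# Single, complete, and average linkage distances between two small
# clusters, computed straight from their definitions on made-up points.
import math
from itertools import product

cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (6.0, 0.0)]

# all pairwise distances between points of the two clusters
pair_dists = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

single = min(pair_dists)    # single linkage: closest pair of points (3.0)
complete = max(pair_dists)  # complete linkage: farthest pair of points (6.0)
average = sum(pair_dists) / len(pair_dists)  # average linkage: mean of all pairs
```

Here `complete` is the maximum inter-cluster point distance, matching option B. Libraries such as SciPy expose these strategies through `scipy.cluster.hierarchy.linkage`.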
Q. In a K-means clustering algorithm, if you have 5 clusters and 100 data points, how many centroids will be initialized?
  • A. 5
  • B. 100
  • C. 50
  • D. 10
Q. In hierarchical clustering, what does 'agglomerative' mean?
  • A. Clusters are formed by splitting larger clusters
  • B. Clusters are formed by merging smaller clusters
  • C. Clusters are formed randomly
  • D. Clusters are formed based on a predefined distance
Q. In hierarchical clustering, what does agglomerative clustering do?
  • A. Starts with all data points as individual clusters and merges them
  • B. Starts with one cluster and splits it into smaller clusters
  • C. Randomly assigns data points to clusters
  • D. Uses a predefined number of clusters
Q. In hierarchical clustering, what does the term 'dendrogram' refer to?
  • A. A type of data point
  • B. A tree-like diagram that shows the arrangement of clusters
  • C. A method of calculating distances
  • D. A clustering algorithm
Q. In hierarchical clustering, what does the term 'linkage' refer to?
  • A. The method of assigning clusters to data points
  • B. The distance metric used to measure similarity
  • C. The strategy for merging clusters
  • D. The number of clusters to form
Q. In hierarchical clustering, what is the difference between agglomerative and divisive methods?
  • A. Agglomerative starts with individual points, divisive starts with one cluster
  • B. Agglomerative merges clusters, divisive splits clusters
  • C. Both A and B
  • D. None of the above
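The bottom-up merging that the agglomerative questions describe can be sketched in a few lines. This is a toy single-linkage version on 1-D points, illustrative only, with hypothetical data:

```python
# Toy agglomerative clustering (single linkage) on 1-D points:
# start with every point as its own cluster, repeatedly merge the
# closest pair of clusters until only k clusters remain.
def agglomerative(points, k):
    clusters = [[p] for p in points]  # each point starts as a singleton cluster
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: minimum distance between members
                d = min(abs(p - q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

print(agglomerative([0.0, 0.2, 5.0, 5.1, 9.0], 2))
# → [[0.0, 0.2], [5.0, 5.1, 9.0]]
```

Running the merges in the opposite direction, starting from one all-inclusive cluster and splitting, would be the divisive approach.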
Q. In K-means clustering, what happens if K is set too high?
  • A. Clusters become too large
  • B. Overfitting occurs
  • C. Underfitting occurs
  • D. No effect
Q. In which scenario would hierarchical clustering be preferred over K-means?
  • A. When the number of clusters is known
  • B. When the dataset is very large
  • C. When a hierarchy of clusters is desired
  • D. When the data is strictly numerical
Q. What is a common application of clustering in real-world scenarios?
  • A. Spam detection in emails
  • B. Predicting stock prices
  • C. Image classification
  • D. Customer segmentation
Q. What is the effect of outliers on K-means clustering?
  • A. They have no effect on the clustering results
  • B. They can significantly distort the cluster centroids
  • C. They improve the clustering accuracy
  • D. They help in determining the number of clusters
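The distortion in option B is easy to see numerically: a K-means centroid is the mean of its cluster, and a mean is highly sensitive to extreme values. A tiny made-up example:

```python
# A single outlier drags a cluster centroid (the mean) far away
# from the bulk of the points. Toy 1-D values for illustration.
cluster = [1.0, 2.0, 3.0]
centroid = sum(cluster) / len(cluster)           # 2.0, central to the data

with_outlier = cluster + [100.0]                 # add one extreme point
shifted = sum(with_outlier) / len(with_outlier)  # 26.5, far from every inlier
```

This is why outlier removal (or a more robust method such as K-medoids, which uses actual data points as centers) is often recommended before running K-means.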
Q. What is the main criterion for determining the optimal number of clusters in K-means?
  • A. Silhouette score
  • B. Elbow method
  • C. Both A and B
  • D. None of the above
Q. What is the main difference between K-means and hierarchical clustering?
  • A. K-means is a partitional method, while hierarchical is a divisive method
  • B. K-means requires the number of clusters to be defined, while hierarchical does not
  • C. K-means can only be used for numerical data, while hierarchical can handle categorical data
  • D. K-means is faster than hierarchical clustering for small datasets
Q. What is the primary goal of the K-means clustering algorithm?
  • A. Minimize the distance between points in the same cluster
  • B. Maximize the distance between different clusters
  • C. Both A and B
  • D. None of the above
Q. What is the purpose of the elbow method in K-means clustering?
  • A. To determine the optimal number of clusters
  • B. To visualize the clusters formed
  • C. To assess the performance of the algorithm
  • D. To preprocess the data before clustering
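The elbow method plots inertia (within-cluster sum of squared distances to each centroid) against K and looks for the point where the curve flattens. A hedged illustration on toy 1-D data; the partitions below are hand-picked for clarity rather than produced by an actual K-means run:

```python
# Inertia drops sharply until K matches the natural number of groups,
# then flattens: that bend is the "elbow". Toy 1-D points, hand-picked
# partitions (a real run would use K-means assignments).
def inertia(clusters):
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)       # cluster mean
        total += sum((p - centroid) ** 2 for p in cluster)
    return total

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]           # two natural groups

wss_k1 = inertia([points])                               # K=1: 125.5
wss_k2 = inertia([[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]) # K=2: 4.0
wss_k3 = inertia([[1.0, 2.0], [3.0], [10.0, 11.0, 12.0]])  # K=3: 2.5
```

The drop from K=1 to K=2 is enormous, while K=2 to K=3 barely helps: the elbow is at K=2, the number of natural groups.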
Q. What type of data is K-means clustering best suited for?
  • A. Categorical data
  • B. Numerical data
  • C. Text data
  • D. Time series data
Q. Which clustering method is more suitable for discovering nested clusters?
  • A. K-means clustering
  • B. Hierarchical clustering
  • C. DBSCAN
  • D. Gaussian Mixture Models
Q. Which clustering method is more suitable for discovering non-globular shapes in data?
  • A. K-means clustering
  • B. Hierarchical clustering
  • C. DBSCAN
  • D. Gaussian Mixture Models
Q. Which evaluation metric is commonly used to assess the quality of clustering results?
  • A. Accuracy
  • B. Silhouette score
  • C. F1 score
  • D. Mean squared error
Q. Which of the following is a characteristic of K-means clustering?
  • A. It can produce overlapping clusters
  • B. It is deterministic and produces the same result every time
  • C. It can handle noise and outliers effectively
  • D. It partitions data into non-overlapping clusters
Q. Which of the following is a disadvantage of K-means clustering?
  • A. It is sensitive to outliers
  • B. It requires the number of clusters to be specified in advance
  • C. It can converge to local minima
  • D. All of the above
Q. Which of the following is a limitation of the K-means algorithm?
  • A. It can handle non-spherical clusters
  • B. It requires the number of clusters to be specified in advance
  • C. It is computationally efficient for large datasets
  • D. It can be used for both supervised and unsupervised learning
Q. Which of the following is NOT a common distance metric used in clustering?
  • A. Euclidean distance
  • B. Manhattan distance
  • C. Cosine similarity
  • D. Logistic distance
Q. Which of the following is NOT a method of linkage in hierarchical clustering?
  • A. Single linkage
  • B. Complete linkage
  • C. Average linkage
  • D. Random linkage
Q. Which of the following is NOT a step in the K-means clustering algorithm?
  • A. Assigning data points to the nearest centroid
  • B. Updating the centroid positions
  • C. Calculating the silhouette score
  • D. Choosing the initial centroids
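The three genuine steps in the question above (choose initial centroids, assign points to the nearest centroid, update centroids, repeat until convergence) can be sketched on toy 1-D data. The initial centroids and points here are arbitrary:

```python
# Minimal K-means (Lloyd's algorithm) on 1-D data: alternate between
# the assignment step and the update step until the centroids stop moving.
def kmeans_1d(points, centroids, max_iter=100):
    labels = []
    for _ in range(max_iter):
        # assignment step: index of the nearest centroid for each point
        labels = [min(range(len(centroids)),
                      key=lambda c: abs(p - centroids[c])) for p in points]
        # update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[c])  # keep empty clusters fixed
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

centroids, labels = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
# centroids → [2.0, 11.0], labels → [0, 0, 0, 1, 1, 1]
```

Note that computing a silhouette score is an evaluation done after clustering, not a step of the algorithm itself, which is why option C is the answer.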
Q. Which of the following methods can be used to evaluate the quality of clusters formed by K-means?
  • A. Silhouette score
  • B. Davies-Bouldin index
  • C. Both A and B
  • D. None of the above
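The silhouette score in option A can be computed straight from its definition: for each point, a is the mean distance to the other members of its own cluster, b is the mean distance to the nearest other cluster, and s = (b - a) / max(a, b). A sketch on toy 1-D clusters (scikit-learn provides `sklearn.metrics.silhouette_score` and `davies_bouldin_score` for real use):

```python
# Mean silhouette score from its definition, for toy 1-D clusters.
# Values near +1 mean points sit well inside their own cluster.
def silhouette(clusters):
    scores = []
    for ci, cluster in enumerate(clusters):
        if len(cluster) < 2:
            continue  # silhouette of a singleton cluster is skipped here
        for idx, p in enumerate(cluster):
            # a: mean distance to the other points of p's own cluster
            a = sum(abs(p - q) for k, q in enumerate(cluster)
                    if k != idx) / (len(cluster) - 1)
            # b: mean distance to the closest other cluster
            b = min(sum(abs(p - q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

score = silhouette([[0.0, 1.0], [10.0, 11.0]])  # well-separated: close to 1
```

For this well-separated toy partition the mean silhouette is about 0.9, i.e. very close to the ideal of +1.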