Clustering Methods: K-means, Hierarchical for Competitive Exams

Clustering Methods: K-means, Hierarchical - Higher Difficulty Problems

Q. In hierarchical clustering, what does 'agglomerative' refer to?

A. A method that starts with all points as individual clusters
B. A method that requires the number of clusters to be predefined
C. A technique that merges clusters based on distance
D. A type of clustering that uses a centroid

Solution

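Agglomerative clustering begins with each data point as its own cluster and repeatedly merges the two closest clusters until a single cluster remains. Merging by distance (option C) is part of the procedure, but the defining property is the starting point: every point begins as its own cluster.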

Correct Answer: A — A method that starts with all points as individual clusters

Q. In hierarchical clustering, what is agglomerative clustering?

A. A bottom-up approach to cluster formation
B. A top-down approach to cluster formation
C. A method that requires prior knowledge of clusters
D. A technique that uses K-means as a base

Solution

Agglomerative clustering is a bottom-up approach where each data point starts as its own cluster and pairs of clusters are merged as one moves up the hierarchy.

Correct Answer: A — A bottom-up approach to cluster formation

Q. In hierarchical clustering, what does a dendrogram show?

A. A visual representation of the clustering process
B. A table of cluster centroids
C. A list of data points in each cluster
D. A summary of the clustering algorithm's performance

Solution

A dendrogram visually represents the arrangement of clusters and the distances at which they are merged.

Correct Answer: A — A visual representation of the clustering process
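
The bottom-up merging described in the questions above can be sketched in a few lines. This is a minimal illustration, assuming 1-D points and single linkage (cluster distance is the distance between the closest members); the names `single_linkage` and `agglomerate` are hypothetical and not taken from any library. The merge distances it records are the heights a dendrogram would draw.

```python
from itertools import combinations

def single_linkage(a, b):
    """Smallest pairwise distance between two clusters of 1-D points."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]  # bottom-up: every point starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = single_linkage(clusters[i], clusters[j])
        merged = sorted(clusters[i] + clusters[j])
        history.append((dist, merged))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

history = agglomerate([1.0, 2.0, 6.0, 7.0, 15.0])
for dist, merged in history:
    print(f"merge at distance {dist}: {merged}")
```

Five points produce four merges; the last merge joins everything into one cluster at the largest distance.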

Q. What is a common application of clustering in marketing?

A. Predicting customer behavior
B. Segmenting customers into distinct groups
C. Optimizing supply chain logistics
D. Forecasting sales trends

Solution

Clustering is often used in marketing to segment customers into distinct groups based on purchasing behavior or demographics.

Correct Answer: B — Segmenting customers into distinct groups

Q. What is a common application of K-means clustering in the real world?

A. Image segmentation
B. Spam detection
C. Sentiment analysis
D. Time series forecasting

Solution

K-means clustering is often used in image segmentation to group similar pixels together.

Correct Answer: A — Image segmentation
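
As a rough illustration of grouping similar pixels, the sketch below runs K-means on grayscale intensities only. It assumes a tiny flattened image with made-up 8-bit values and two hand-picked starting centroids; `kmeans_1d` is a hypothetical helper, and a real segmentation would usually cluster colour (and sometimes position) vectors instead.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on scalar values with given starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# Hypothetical grayscale intensities: dark background and a bright object.
pixels = [12, 15, 10, 200, 210, 205, 14, 198]
labels, centroids = kmeans_1d(pixels, centroids=[0.0, 255.0])
print(labels)     # segment index per pixel
print(centroids)  # mean intensity of each segment
```

Dark pixels end up in segment 0 and bright pixels in segment 1, with each centroid equal to the mean intensity of its segment.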

Q. What is a key advantage of using hierarchical clustering over K-means?

A. It requires less computational power
B. It does not require the number of clusters to be specified in advance
C. It is always more accurate
D. It can handle larger datasets

Solution

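Hierarchical clustering does not require the number of clusters to be chosen before running the algorithm. It builds the full hierarchy once, and the tree can then be cut at any level to obtain as many or as few clusters as the analysis needs.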

Correct Answer: B — It does not require the number of clusters to be specified in advance
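
To make the point concrete: instead of fixing the number of clusters, one can pick a distance threshold and let the cluster count fall out of the data. The sketch below assumes non-empty 1-D data and single linkage, where cutting the hierarchy at a given height is equivalent to splitting the sorted points wherever the gap exceeds that height. The function name `cut_single_linkage_1d` is hypothetical.

```python
def cut_single_linkage_1d(points, threshold):
    """Clusters obtained by cutting a 1-D single-linkage hierarchy at `threshold`."""
    ordered = sorted(points)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > threshold:
            clusters.append([cur])    # gap too large: this merge lies above the cut
        else:
            clusters[-1].append(cur)  # gap small enough: same cluster
    return clusters

data = [1.0, 2.0, 6.0, 7.0, 15.0]
print(cut_single_linkage_1d(data, threshold=2.0))  # three clusters
print(cut_single_linkage_1d(data, threshold=5.0))  # two clusters
```

The same data yields three clusters at a low cut and two at a higher cut, without the number of clusters ever being passed in.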

Q. What is a key characteristic of DBSCAN compared to K-means?

A. It requires the number of clusters to be specified
B. It can find clusters of arbitrary shape
C. It is faster than K-means for all datasets
D. It uses centroids to define clusters

Solution

DBSCAN can identify clusters of arbitrary shape and does not require the number of clusters to be specified in advance.

Correct Answer: B — It can find clusters of arbitrary shape
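
A compact sketch of the density-based idea follows. It is a simplified, unoptimized DBSCAN written for illustration; the parameter names `eps` and `min_pts` follow common usage, the sample points are invented, and a point counts as its own neighbour here. Note that no cluster count is supplied.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# An L-shaped chain, a small compact group, and one isolated point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),
          (10, 10), (10, 11), (11, 10), (20, 0)]
labels = dbscan(points, eps=1.1, min_pts=3)
print(labels)
```

The L-shaped chain is recovered as one cluster even though it is not compact around a centre, and the isolated point is labelled as noise.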

Q. What is the main advantage of hierarchical clustering over K-means?

A. It does not require the number of clusters to be specified in advance
B. It is faster and more efficient
C. It can handle larger datasets
D. It is less sensitive to outliers

Solution

Hierarchical clustering does not require the number of clusters to be predetermined, allowing for more flexibility in analysis.

Correct Answer: A — It does not require the number of clusters to be specified in advance

Q. What is the main advantage of using hierarchical clustering over K-means?

A. It is faster and more efficient
B. It does not require the number of clusters to be specified
C. It can handle large datasets better
D. It is less sensitive to outliers

Solution

Hierarchical clustering does not require the number of clusters to be specified in advance, allowing for more flexibility in cluster formation.

Correct Answer: B — It does not require the number of clusters to be specified

Q. What is the main difference between K-means and K-medoids clustering?

A. K-means uses centroids, while K-medoids uses actual data points
B. K-medoids is faster than K-means
C. K-means can only handle numerical data, while K-medoids can handle categorical data
D. K-medoids requires the number of clusters to be specified, while K-means does not

Solution

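K-means represents each cluster by its centroid (the mean of its points, which need not be an actual data point), while K-medoids represents each cluster by one of its actual data points. Because a medoid cannot be dragged to an arbitrary location the way a mean can, K-medoids is more robust to outliers.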

Correct Answer: A — K-means uses centroids, while K-medoids uses actual data points
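
The difference is easy to see on a single cluster that contains one outlier. This is a small sketch on made-up 1-D values; `centroid` and `medoid` are hypothetical helpers, and the medoid is taken as the member with the smallest total absolute distance to the others.

```python
def centroid(values):
    """Mean of the cluster; usually not one of the data points."""
    return sum(values) / len(values)

def medoid(values):
    """Cluster member with the smallest total distance to all members."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

cluster = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier
print(centroid(cluster))  # 22.0, pulled far toward the outlier
print(medoid(cluster))    # 3.0, stays inside the bulk of the data
```

The same effect explains why K-means results degrade when outliers are present.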

Q. What is the primary objective of the K-means clustering algorithm?

A. To minimize the distance between points in the same cluster
B. To maximize the distance between different clusters
C. To create a hierarchical structure of clusters
D. To classify data into predefined categories

Solution

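K-means aims to minimize the within-cluster sum of squared distances between each point and the centroid of its cluster. It does so by alternating two steps: assigning every point to its nearest centroid, then recomputing each centroid as the mean of its assigned points.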

Correct Answer: A — To minimize the distance between points in the same cluster
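
Written out, the quantity K-means minimizes is usually stated as follows. The notation is an assumption introduced here, not part of the original question: K is the number of clusters, C_k is the set of points assigned to cluster k, and mu_k is that cluster's centroid.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Option B (maximizing the distance between clusters) does not appear in this objective; separation between clusters is a side effect, not something K-means optimizes directly.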

Q. Which distance metric is commonly used in K-means clustering?

A. Manhattan distance
B. Cosine similarity
C. Euclidean distance
D. Hamming distance

Solution

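K-means typically uses Euclidean distance to measure the distance between data points and centroids. The choice is tied to the update step: the mean of a cluster is the point that minimizes the sum of squared Euclidean distances to its members.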

Correct Answer: C — Euclidean distance
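
For reference, here is how the Euclidean assignment step might look in 2-D. It is a short sketch with invented centroids and points; `nearest_centroid` is a hypothetical helper, while `math.dist` is the standard-library Euclidean distance (Python 3.8+).

```python
from math import dist

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))

centroids = [(0.0, 0.0), (10.0, 10.0)]
d = dist((3.0, 4.0), (0.0, 0.0))  # sqrt(3^2 + 4^2)
print(d)
print(nearest_centroid((3.0, 4.0), centroids))  # assigned to centroid 0
print(nearest_centroid((8.0, 9.0), centroids))  # assigned to centroid 1
```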

Q. Which of the following clustering methods is best suited for discovering non-linear relationships in data?

A. K-means
B. Hierarchical clustering
C. DBSCAN
D. Gaussian Mixture Models

Solution

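DBSCAN groups points by density rather than by distance to a centre, so it can recover clusters of arbitrary, non-convex shape (rings, crescents, elongated chains). K-means, by contrast, tends to produce convex, roughly spherical clusters and splits such structures incorrectly.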

Correct Answer: C — DBSCAN

Q. Which of the following clustering methods is sensitive to outliers?

A. K-means
B. Hierarchical clustering
C. DBSCAN
D. Gaussian Mixture Models

Solution

K-means is sensitive to outliers because they can significantly affect the position of the centroid, leading to poor clustering results.

Correct Answer: A — K-means

Q. Which of the following is NOT a type of hierarchical clustering?

A. Single linkage
B. Complete linkage
C. K-means linkage
D. Average linkage

Solution

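Single, complete, and average linkage are criteria for measuring the distance between two clusters in hierarchical clustering. There is no "K-means linkage"; K-means is a separate, partition-based clustering algorithm.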

Correct Answer: C — K-means linkage
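
The three genuine linkage criteria differ only in how they reduce the pairwise distances between two clusters to a single number. A minimal sketch on made-up 1-D clusters, with hypothetical helper names:

```python
def pairwise(a, b):
    """All distances between a member of cluster a and a member of cluster b."""
    return [abs(x - y) for x in a for y in b]

def single(a, b):
    return min(pairwise(a, b))      # closest pair

def complete(a, b):
    return max(pairwise(a, b))      # farthest pair

def average(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)          # mean over all pairs

left, right = [1.0, 2.0], [6.0, 9.0]
print(single(left, right))    # 4.0
print(complete(left, right))  # 8.0
print(average(left, right))   # 6.0
```

Single linkage tends to chain clusters together, complete linkage favours compact clusters, and average linkage sits between the two.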

Q. Which of the following scenarios is best suited for hierarchical clustering?

A. When the number of clusters is known
B. When the data is high-dimensional
C. When a hierarchy of clusters is desired
D. When speed is a priority

Solution

Hierarchical clustering is ideal when a hierarchy of clusters is needed, as it provides a detailed view of the data structure.

Correct Answer: C — When a hierarchy of clusters is desired

Q. Which of the following scenarios is K-means clustering NOT suitable for?

A. When clusters are spherical and evenly sized
B. When the number of clusters is known
C. When clusters have varying densities
D. When outliers are present in the data

Solution

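K-means is not suitable for clusters with varying densities, because it implicitly assumes clusters are spherical and of similar size and spread. Outliers (option D) also degrade K-means results, but varying density violates the assumption the algorithm is built on, which makes C the best answer.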

Correct Answer: C — When clusters have varying densities
