# Data Mining | Week 7

## Data Mining Week 7 Answers

Q.1. Which of the following statements is NOT true about clustering?

a. It is a supervised learning technique

b. It is an unsupervised learning technique

c. It is also known as exploratory data analysis

d. It groups data into homogeneous groups

Q.2. Which of the following clustering techniques starts with the points as individual clusters and, at each step, merges the closest pair of clusters?

a. K-Means clustering

b. DBSCAN

c. Divisive clustering

d. Agglomerative clustering

Q.3. DBSCAN is a ___________ algorithm.

a. Partitional clustering

b. Hierarchical clustering

c. Fuzzy clustering

d. Complete clustering

Q.4. The Euclidean distance matrix between four 2-dimensional points, p1, p2, p3, and p4, is shown below. A possible set of coordinate values for these points is:

a. p1=(0, 0), p2=(0, 1), p3=(1, 0), p4=(1, 1)

b. p1=(0, 0), p2=(1, 0), p3=(1, 1), p4=(0, 1)

c. p1=(1, 0), p2=(0, 0), p3=(1, 1), p4=(0, 1)

d. p1=(0, 0), p2=(1, 1), p3=(1, 0), p4=(0, 1)
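
To check a candidate set of coordinates against a distance matrix, one can compute the pairwise Euclidean distances and compare them entry by entry. The following is a minimal sketch; the helper name `distance_matrix` is an assumption made for illustration, and the coordinates from option (a) are used only as sample input.

```python
import math

def distance_matrix(points):
    """Return the pairwise Euclidean distance matrix for a list of (x, y) points."""
    return [[math.dist(a, b) for b in points] for a in points]

# Coordinates from option (a), used purely as an example input.
candidate = [(0, 0), (0, 1), (1, 0), (1, 1)]
for row in distance_matrix(candidate):
    print(["%.3f" % d for d in row])
```

Repeating this for each option and comparing the printed rows with the matrix in the question shows which option is consistent with it.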

Q.5. What do the leaves of a dendrogram in hierarchical clustering represent?

a. Individual data points

b. Clusters of multiple data points

c. Distances between data points

d. Cluster membership of the data points

Q.6. Distance between two clusters in complete linkage clustering is defined as:

a. Distance between the closest pair of points between the clusters

b. Distance between the furthest pair of points between the clusters

c. Distance between the most centrally located pair of points in the clusters

d. None of the above
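
The difference between linkage criteria can be illustrated with two small helper functions. This is only a sketch: the function names and the sample points are assumptions chosen for illustration, not part of the quiz.

```python
import math

def single_link(c1, c2):
    """Cluster distance as the minimum over all cross-cluster point pairs."""
    return min(math.dist(a, b) for a in c1 for b in c2)

def complete_link(c1, c2):
    """Cluster distance as the maximum over all cross-cluster point pairs."""
    return max(math.dist(a, b) for a in c1 for b in c2)

# Hypothetical example clusters.
left = [(0, 0), (1, 0)]
right = [(3, 0), (4, 3)]
print(single_link(left, right))    # 2.0, from (1, 0) to (3, 0)
print(complete_link(left, right))  # 5.0, from (0, 0) to (4, 3)
```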

Q.7. Consider a set of five 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), and p5=(0, 0.5). Euclidean distance is the distance function. Single linkage clustering is used to cluster the points into two clusters. The clusters are:

a. {p1, p2, p3} {p4, p5}

b. {p1, p4, p5} {p2, p3}

c. {p1, p2, p5} {p3, p4}

d. {p1, p2, p4} {p3, p5}

Q.8. Consider a set of five 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), and p5=(0, 0.5). Euclidean distance is the distance function. Complete linkage clustering is used to cluster the points into two clusters. The clusters are:

a. {p1, p4, p5} {p2, p3}

b. {p1, p2, p3} {p4, p5}

c. {p1, p2, p5} {p3, p4}

d. {p1, p2, p4} {p3, p5}
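
The single linkage and complete linkage questions on these five points can be checked with a small agglomerative clustering routine. This is a naive sketch, not an optimized implementation: the function name `agglomerative` is an assumption, the linkage is passed as Python's built-in `min` (single) or `max` (complete), and ties are broken by taking the first closest pair found.

```python
import math

def agglomerative(points, k, linkage):
    """Merge the closest pair of clusters until only k clusters remain.

    points  -- dict mapping a label to an (x, y) tuple
    linkage -- min for single linkage, max for complete linkage
    """
    clusters = [[name] for name in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Cluster distance under the chosen linkage criterion.
                d = linkage(math.dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

points = {"p1": (0, 0), "p2": (5, 0), "p3": (5, 1), "p4": (0, 1), "p5": (0, 0.5)}
print(agglomerative(points, 2, min))  # single linkage
print(agglomerative(points, 2, max))  # complete linkage
# Both calls print [['p1', 'p4', 'p5'], ['p2', 'p3']]
```

On this data the two linkage criteria give the same partition, because the points near x = 0 and the points near x = 5 are far apart compared with the distances inside each group.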

Q.9. Consider a set of five 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), and p5=(0, 0.5). Euclidean distance is the distance function. The k-means algorithm is used to cluster the points into two clusters. The initial cluster centers are p1 and p5. The clusters after two iterations of k-means are:

a. {p1, p4, p5} {p2, p3}

b. {p1, p2, p3} {p4, p5}

c. {p3, p4, p5} {p1, p2}

d. {p1, p2, p4} {p3, p5}
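
The k-means steps can be traced with a short script. This sketch assumes that one iteration means an assignment step followed by a center update, and that "the clusters after two iterations" refers to the assignments made in the second iteration; the function name `kmeans` is chosen for illustration.

```python
import math

def kmeans(points, centers, iterations):
    """Run a fixed number of Lloyd iterations (assign, then update centers)."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for name, p in points.items():
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(name)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(points[n][d] for n in c) / len(c) for d in range(2)) if c else old
            for c, old in zip(clusters, centers)
        ]
    return clusters

points = {"p1": (0, 0), "p2": (5, 0), "p3": (5, 1), "p4": (0, 1), "p5": (0, 0.5)}
start = [points["p1"], points["p5"]]
print(kmeans(points, start, 1))  # [['p1', 'p2'], ['p3', 'p4', 'p5']]
print(kmeans(points, start, 2))  # [['p2', 'p3'], ['p1', 'p4', 'p5']]
```

The first iteration splits the points unevenly because both initial centers sit near x = 0; the updated centers then pull the assignment into a left group and a right group in the second iteration.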

Q.10. Consider a set of seven 2-dimensional points p1=(0, 0), p2=(5, 0), p3=(5, 1), p4=(0, 1), p5=(0, 0.5), p6=(0, 9), and p7=(5.5, 1). Euclidean distance is the distance function. The DBSCAN algorithm is used to cluster the points with Epsilon = 1 and MinPts = 2. The clusters and outliers obtained are:

a. Clusters: {p1, p3, p4, p5} {p2, p7}; Outlier: p6

b. Clusters: {p1, p2, p3} {p4, p5, p6}; Outlier: p7

c. Clusters: {p1, p4, p5} {p2, p3, p7}; Outlier: p6

d. Clusters: {p1, p4, p5} {p2, p3, p6}; Outlier: p7
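
The DBSCAN result for these points can be checked with a compact implementation. This is a sketch under two stated assumptions: a point's Epsilon-neighborhood includes every point at distance less than or equal to Epsilon, and MinPts counts the point itself. The function name `dbscan` is chosen for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Return (clusters, noise) for a dict mapping label -> (x, y)."""
    def neighbors(name):
        # Assumption: neighborhood is inclusive (<= eps) and contains the point itself.
        return [o for o in points if math.dist(points[name], points[o]) <= eps]

    labels = {}      # label -> cluster index, or None for (provisional) noise
    clusters = []
    for name in points:
        if name in labels:
            continue
        if len(neighbors(name)) < min_pts:
            labels[name] = None  # may still be claimed later as a border point
            continue
        cluster = []
        clusters.append(cluster)
        queue = [name]
        while queue:
            q = queue.pop()
            if labels.get(q) is not None:
                continue  # already assigned to a cluster
            labels[q] = len(clusters) - 1
            cluster.append(q)
            # Only core points expand the cluster further.
            if len(neighbors(q)) >= min_pts:
                queue.extend(n for n in neighbors(q) if labels.get(n) is None)
    noise = sorted(n for n, label in labels.items() if label is None)
    return [sorted(c) for c in clusters], noise

points = {"p1": (0, 0), "p2": (5, 0), "p3": (5, 1), "p4": (0, 1),
          "p5": (0, 0.5), "p6": (0, 9), "p7": (5.5, 1)}
print(dbscan(points, eps=1, min_pts=2))
# ([['p1', 'p4', 'p5'], ['p2', 'p3', 'p7']], ['p6'])
```

Note that p7 is more than 1 unit from p2 but is still reachable through p3, which is why the two end up in the same cluster.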
