Many researchers and analysts face the question of how to draw meaningful insights or structures from observed data, or how to develop taxonomies. This is where cluster analysis comes in.
Cluster analysis can be thought of as a set of algorithms and techniques used to group objects of a similar nature together. In other words, it is an exploratory data analysis technique that sorts observations into clusters such that the association between two observations is maximal if they fall within the same cluster and minimal if they belong to different clusters. We can also think of cluster analysis as a statistical technique for discovering patterns in data without interpreting them: it simply finds structures in the data without explaining why they exist.
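To make the "maximal within, minimal between" idea concrete, here is a minimal sketch in Python. It assumes that association can be measured as the inverse of Euclidean distance, so a good grouping shows a small average distance inside clusters and a large one across clusters. The data points, the labels, and the helper name `mean_pairwise_distances` are all made up for illustration.

```python
from itertools import combinations
from math import dist

# Toy 2-D observations with hypothetical cluster labels (illustrative data only).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

def mean_pairwise_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    # Compare every unordered pair of observations exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)

within_avg, between_avg = mean_pairwise_distances(points, labels)
print(f"within: {within_avg:.2f}, between: {between_avg:.2f}")
```

For this toy grouping the within-cluster average is far smaller than the between-cluster average, which is the pattern a clustering technique looks for. Other measures of association, such as correlation, could be substituted for distance.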
Let us look at a few examples of how we encounter cluster analysis in daily life. Consider the different groups of tourists who visit your city: you’ll find that the members of each group have more or less similar characteristics, such as the language they speak, their height, their color, their clothes, etc. Similarly, in a grocery store you’ll find that items of a similar nature, such as different types of soaps and detergents or vegetables and fruits, are displayed in the same or nearby locations. Another example is that of biologists, who have grouped different species of animals into classes such as mammals, reptiles, and birds.
Broadly, cluster analysis methods fall into three categories: Joining (Tree Clustering), Two-way Joining (Block Clustering), and k-Means Clustering.
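As a rough illustration of the last of these categories, the sketch below implements a bare-bones version of k-means (the standard assign-then-update loop, often called Lloyd's algorithm) in plain Python. It is a simplified sketch, not a production implementation: the function name `k_means`, the toy data, and the choice to seed the centroids with the first k points are all assumptions made for the example.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Minimal k-means sketch: returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points (real code would randomize).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, labels

# Toy data (illustrative only): two visibly separate groups, interleaved.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
centroids, labels = k_means(points, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

Note that k must be chosen in advance, and the result can depend on how the centroids are seeded. Tree clustering, by contrast, builds a hierarchy of merges and does not need the number of clusters up front.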