Introduction to Clustering

Welcome to this introduction to clustering. While most of the machine learning algorithms we have covered belong to the supervised learning family, clustering algorithms are unsupervised: we have no labels on our data, yet we still want an algorithm to give us insight into it.

Here is an example: say we have a bunch of customer data such as age, gender, location and which products each customer bought, but we don't want to predict anything in particular about these people. Instead, we can use an unsupervised clustering algorithm to create customer segments by dividing the data into clusters. We may find a group of very good customers that we should take care of, along with a group of less valuable customers that we can safely pay less attention to. Because of this, clustering algorithms are widely used across industries such as retail, banking, manufacturing and healthcare.

The algorithms can be used on structured data such as tabular data, as well as on unstructured data such as images, text and audio. Clustering is often used to organize vast amounts of data.

In this guide, you will learn about clustering algorithms and how to apply them to a data set.

Most Common Clustering Algorithms

Let’s go through two of the most common clustering algorithms in use today:

  • K-Means
  • Mean Shift

K-Means Clustering

There are several straightforward clustering algorithms, but K-Means is surely one of the simplest.
The goal of the algorithm is to separate the data into K clusters. It iteratively assigns each data point to the cluster whose centroid is nearest in feature space, then recomputes the centroids from the new assignments, repeating until the assignments stabilize. Ultimately, the clustering produces two outputs: the centroid coordinates of each cluster and a cluster label for every data point.

Because of this simplicity, K-Means is one of the fastest clustering algorithms out there. However, K-Means can converge to a poor local minimum depending on how the centroids are initialized, so it is common practice to run the algorithm several times with different random initializations and keep the best result.
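As a minimal sketch, here is how the two outputs (centroids and labels) and the repeated random initializations look with scikit-learn's `KMeans`; the blob data below is synthetic, chosen only to make the clusters obvious (assuming scikit-learn and NumPy are installed):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs of 2-D points
rng = np.random.default_rng(42)
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

# n_init=10 re-runs the algorithm with 10 random initializations
# and keeps the best result, mitigating poor local minima.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

print(kmeans.cluster_centers_)  # centroid coordinates, shape (2, 2)
print(kmeans.labels_[:5])       # a cluster label for each data point
```

On real customer data you would replace `data` with a numeric feature matrix and pick `n_clusters` to match the number of segments you want.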


Mean Shift Clustering

The Mean Shift algorithm finds density centers, so-called modes, in a multidimensional feature space. Unlike K-Means, the number of modes does not have to be specified in advance, which is why the method is often described as parameter-free. It rests on the assumption that, by following a gradient ascent on the estimated density function, each point converges to the mode of the cluster it belongs to.

The big advantage, in contrast to K-Means clustering, is that the number of clusters does not have to be chosen up front, because Mean Shift detects it automatically. The disadvantage is that selecting the search radius (the bandwidth) is not trivial.



Hopefully this brief introduction has made you want to learn more about clustering. A good next step is to try these algorithms on a data set of your own.