Aquileo | DBScan Clustering in R Programming

DBScan (Density-Based Spatial Clustering of Applications with Noise) is a non-linear, unsupervised clustering algorithm that identifies groups (clusters) of densely packed data points without requiring the number of clusters to be specified beforehand. Unlike algorithms like k-means, DBScan is capable of discovering arbitrarily shaped clusters and distinguishing noise or outliers in datasets.

How Does DBSCAN Work?

Choose the parameters eps (neighborhood radius) and MinPts (minimum points to form a dense region).
Select an unvisited point and find all neighboring points within the eps radius.
If the number of neighbors is at least MinPts, classify it as a core point and start a new cluster.
Expand the cluster by including all density-reachable points connected to the core point.
Repeat the process for all unvisited points until every point is assigned to a cluster or marked as noise.

The diagram shows DBSCAN clustering where core points have ≥ 4 neighbors within a 1-unit radius, border points are near core points but not dense enough, and noise points lie outside any dense region.

Implementation of DBScan Clustering in R

We implement the DBScan clustering algorithm in R to identify non-linear clusters and detect noise in an unsupervised learning setting.

1. Installing and Loading Required Packages

We install and load the fpc package which provides the DBScan functionality.

install.packages: used to install external packages.
library: used to load the installed package into the session.

install.packages("fpc")
library(fpc)

2. Loading and Viewing the Dataset

We load and view the built-in Iris dataset to understand its structure.

data: used to load built-in datasets.
str: used to view the structure of the dataset.

data(iris)
str(iris)

Output:

3. Preparing the Data for Clustering

We remove the label column to prepare the dataset for unsupervised clustering.

[-5]: used to exclude the fifth column (Species) from the dataset.

iris_1 <- iris[-5]

4. Fitting the DBScan Model

We fit the DBScan clustering model on the prepared dataset with specified parameters.

set.seed: used to fix random initialization for reproducibility.
dbscan: used to apply the DBScan clustering algorithm.
eps: defines the radius of the neighborhood.
MinPts: defines the minimum number of points in a neighborhood to form a cluster.

set.seed(220)
Dbscan_cl <- dbscan(iris_1, eps = 0.45, MinPts = 5)
Dbscan_cl

Output:

5. Checking Cluster Assignments

We extract the cluster assignments and compare them to the original species for evaluation.

$cluster: used to access the cluster labels.
table: used to compare actual species with cluster assignments.

Dbscan_cl$cluster
table(Dbscan_cl$cluster, iris$Species)

Output:

6. Plotting the Clusters

We visualize the clusters to understand the spatial groupings formed by DBScan.

plot: used to plot the clustered data in 2D space.

plot(Dbscan_cl, iris_1, main = "DBScan")
plot(Dbscan_cl, iris_1, main = "Petal Width vs Sepal Length")

Output:

The output displays a 2D scatter plot of DBSCAN clustering results, where points are colored by cluster labels and noise points are marked separately, helping visualize spatial groupings in the Iris dataset.

DBScan Clustering in R Programming

How Does DBSCAN Work?

Implementation of DBScan Clustering in R

1. Installing and Loading Required Packages

2. Loading and Viewing the Dataset

3. Preparing the Data for Clustering

4. Fitting the DBScan Model

5. Checking Cluster Assignments

6. Plotting the Clusters

Explore