Stiphen Chowdhury,
Na Helian,
Renato Cordeiro de Amorim
School of Computer Science and Electronic Engineering
Peer Reviewed
DBSCAN is arguably the most popular density-based clustering algorithm, and it is capable of recovering non-spherical clusters. One of its main weaknesses is that it treats all features equally. In this paper, we propose a density-based clustering algorithm that calculates feature weights representing the degree of relevance of each feature while taking the density structure of the data into account. First, we improve DBSCAN and introduce a new algorithm called DBSCANR, which reduces the number of parameters of DBSCAN to one. Then, we add a new step to the clustering process of DBSCANR that iteratively updates the feature weights based on the current partition of the data. The feature weights produced by the weighted version of the new clustering algorithm, W-DBSCANR, measure the relevance of variables in a clustering and can be used for feature selection in data mining applications, where large and complex real-world data are often involved. Experimental results on both artificial and real-world data show that the new algorithms outperform various DBSCAN-type algorithms in recovering clusters in data.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a widely used clustering algorithm that groups data points based on density. It is popular because it can find non-spherical clusters and identify outliers, making it versatile for many applications.
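To make the density-based grouping concrete, the following is a minimal sketch of standard DBSCAN in plain Python. It is not the paper's code: the function names, the toy data, and the values `eps=1.5` and `min_pts=3` are assumptions chosen for illustration. The toy data contains an L-shaped (non-spherical) chain, a small square, and one isolated point.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


# Illustrative data (an assumption, not from the paper).
points = [
    (0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3),  # L-shaped chain
    (10, 10), (10, 11), (11, 10), (11, 11),                  # small square
    (20, 0),                                                 # isolated point
]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With these settings the chain is recovered as one cluster even though it is not spherical, the square forms a second cluster, and the isolated point is marked as noise.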
DBSCAN treats all features equally, which can be a problem when some features are more relevant than others for clustering. This can lead to suboptimal results, especially with complex or high-dimensional data.
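A small numeric sketch can show why equal treatment of features matters for any distance-based neighbourhood. The points, the weight values, and the helper `weighted_distance` below are hypothetical; the weighted Euclidean distance is a common choice and is used here as an assumption, not as the distance defined in the paper.

```python
import math


def weighted_distance(x, y, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Feature 0 separates two groups; feature 1 is irrelevant noise (assumed data).
a = (0.0, 0.0)
b = (0.2, 9.0)  # same group as a on feature 0, but far away on the noisy feature
c = (5.0, 0.5)  # different group on feature 0

equal_weights = (1.0, 1.0)     # all features treated equally
relevance_weights = (1.0, 0.01)  # hypothetical weights that down-weight the noisy feature

print(weighted_distance(a, b, equal_weights), weighted_distance(a, c, equal_weights))
print(weighted_distance(a, b, relevance_weights), weighted_distance(a, c, relevance_weights))
```

With equal weights, `a` appears closer to `c` than to `b`, so the noisy feature dominates the neighbourhood. Once the noisy feature is down-weighted, `a` and `b` are close again, as the informative feature suggests.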
DBSCANR is an improved version of DBSCAN that reduces the number of parameters from two to one, simplifying its use. It maintains DBSCAN's ability to find non-spherical clusters while making the algorithm more user-friendly.
W-DBSCANR is a weighted version of DBSCANR that calculates feature weights during clustering. These weights represent the relevance of each feature, allowing the algorithm to focus on the most important features and improve clustering accuracy.
Feature weights are iteratively updated during the clustering process based on the current partition of the data. This ensures that the algorithm adapts to the density structure of the data and assigns higher weights to more relevant features.
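The text above does not give the update formula, so the following is only a sketch of one way weights could be recomputed from a partition. It borrows the dispersion-based rule used in feature-weighted k-means: features with a smaller within-cluster dispersion receive larger weights. The function `update_weights`, the exponent parameter `beta`, the use of cluster means, and the toy data are all assumptions and should not be read as the W-DBSCANR update.

```python
def update_weights(points, labels, beta=2.0, noise=-1):
    """Recompute one weight per feature from the current partition.

    Assumed rule (from feature-weighted k-means, not confirmed for W-DBSCANR):
    w_v = 1 / sum_u (D_v / D_u) ** (1 / (beta - 1)),
    where D_v is the within-cluster dispersion of feature v. Requires beta > 1.
    """
    n_features = len(points[0])

    # Group points by cluster label, ignoring points labelled as noise.
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != noise:
            clusters.setdefault(lab, []).append(p)

    # Sum of squared deviations from the cluster mean, per feature.
    dispersion = [0.0] * n_features
    for members in clusters.values():
        for v in range(n_features):
            centre = sum(p[v] for p in members) / len(members)
            dispersion[v] += sum((p[v] - centre) ** 2 for p in members)

    # Small constant avoids division by zero for a perfectly compact feature.
    dispersion = [d + 1e-9 for d in dispersion]

    exponent = 1.0 / (beta - 1.0)
    return [
        1.0 / sum((dispersion[v] / dispersion[u]) ** exponent
                  for u in range(n_features))
        for v in range(n_features)
    ]


# Illustrative partition: feature 0 is compact within clusters, feature 1 is noisy.
points = [(0.0, 3.0), (0.2, 9.0), (0.1, 1.0), (5.0, 8.0), (5.1, 2.0), (4.9, 6.0)]
labels = [0, 0, 0, 1, 1, 1]
weights = update_weights(points, labels)
print(weights)
```

In an iterative scheme, these weights would feed back into the distance used for the next clustering pass, and the two steps would alternate until the partition stops changing. Under this rule the weights sum to one, and the compact feature receives nearly all of the weight in the example.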
W-DBSCANR:
The algorithm was tested on both artificial and real-world datasets. Results showed that W-DBSCANR outperformed various DBSCAN-type algorithms in accurately recovering clusters, especially in complex and high-dimensional data.
W-DBSCANR is useful in:
W-DBSCANR outperforms traditional DBSCAN and other variants by incorporating feature weights, which improve clustering accuracy. It is particularly effective for datasets with irrelevant or noisy features.
Future research could explore: