Results of the random trials with Problem P on Iris (A), BCW (B), BC-DR3 (C), BNA-DR3 (D), and BCW-Diag-10 (E), expanding on the summary given in the rightmost column of Table 2. Each point in each left panel corresponds to a trial and is color-coded according to the accompanying palette to reflect the value of ARI fnc that the trial leads to by way of clustering with k-means. The point leading to the highest ARI fnc value is marked by the crosshair in the panel. Each right panel shows how ARI fnc is distributed over all the corresponding trials. https://doi.org/10.1371/journal.pone.0286312.g002

Source publication

Shape complexity in cluster analysis

May 2023

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimensio...

Context 1

... only exception is the BCW-Diag-10 data set, although in this case every one of the sets of α k 's can be said to lie, so to speak, in the same ballpark as 1/σ k (or 1/s pool k ). In fact, the plots in Fig 2(E) strongly suggest that scaling the data for BCW-Diag-10 by the outcome of virtually any of the random trials with Problem P would be equally acceptable. This would be so even if a reference partition (and hence ARI fnc values) had not been available, because comparing the obtained partitions with one another would already suffice. ...

Context 2

... course, the latter is based almost entirely on the highly concentrated character of the ARI fnc histogram in Fig 2(E), which to a degree is also true of Fig 2(B) and 2(C), which refer to the BCW and BC-DR3 data sets, respectively. For each of these two data sets, comparing the partitions resulting from the random trials with Problem P with one another, and adopting any of those that by the ARI fnc histogram seem not only to be one and the same but also to recur very frequently during the trials, would lead to equally acceptable scaling decisions. ...
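
The idea of comparing the partitions from the trials with one another and adopting one that recurs very frequently can be illustrated with a minimal sketch. It is not the source publication's code: the function names (`adjusted_rand_index`, `canonical`, `consensus_partition`) and the toy label vectors are hypothetical, and the ARI computed here is the standard pairwise adjusted Rand index between two labelings, not ARI fnc, which requires a reference partition.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Standard adjusted Rand index between two labelings of the same points."""
    n = len(a)
    # Pair counts within contingency-table cells and within each marginal.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def canonical(labels):
    """Relabel clusters by order of first appearance, so that two labelings
    of the same partition become identical tuples."""
    mapping = {}
    return tuple(mapping.setdefault(x, len(mapping)) for x in labels)


def consensus_partition(trial_labels):
    """Return the most frequent partition and the fraction of trials yielding it."""
    counts = Counter(canonical(t) for t in trial_labels)
    partition, freq = counts.most_common(1)[0]
    return partition, freq / len(trial_labels)


# Hypothetical toy data: cluster labels from five trials on six points.
trials = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
]
best, support = consensus_partition(trials)
print(best, support)
print(adjusted_rand_index(trials[0], trials[1]))
```

Canonical relabeling is used so that identical partitions are counted together regardless of how k-means happens to number the clusters. As Context 4 cautions for Iris and BNA-DR3, strong support for a partition does not by itself mean that the partition is a good one.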

Context 4

... these two cases, choosing the set of α k 's to use out of those produced by the random trials with Problem P by simply comparing the obtained partitions and looking for a consensus with strong support would lead to disastrous results. This is clear from the ARI fnc histograms in Fig 2(A) and 2(D), which peak significantly to the left of the best values attained in the trials. Beyond comparing obtained partitions, one must therefore also use one's knowledge of the domain in question and look at what they are doing to the data. ...
