Results of the random trials with Problem P on Iris (A), BCW (B), BC-DR3 (C), BNA-DR3 (D), and BCW-Diag-10 (E), expanding on the summary given on the rightmost column of Table 2. Each point on each left panel corresponds to a trial and is color-coded according to the accompanying palette to reflect the value of ARI fnc it leads to by way of clustering with k-means. The point leading to the highest ARI fnc value is marked by the crosshair in the panel. Each right panel provides a view of how ARI fnc is distributed over all pertaining trials. https://doi.org/10.1371/journal.pone.0286312.g002

Source publication
In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimensio...

Contexts in source publication

Context 1
... only exception is the BCW-Diag-10 data set, although in this case every one of the sets of α_k's can be said to lie, so to speak, in the same ballpark as 1/σ_k (or 1/s_k^pool). In fact, the plots in Fig 2(E) strongly suggest that scaling the data for BCW-Diag-10 by the outcome of virtually any of the random trials with Problem P would be equally acceptable. This would be so even if a reference partition (and hence ARI fnc values) had not been available, because comparing the obtained partitions with one another would already suffice. ...
Context 2
... course, the latter is based almost entirely on the highly concentrated character of the ARI fnc histogram in Fig 2(E), which to a degree is also true of Fig 2(B) and 2(C), which refer to the BCW and BC-DR3 data sets, respectively. For each of these two data sets, comparing the partitions resulting from the random trials with Problem P with one another, and adopting any of those that by the ARI fnc histogram seem not only to be one and the same but also to recur very frequently during the trials, would lead to equally acceptable scaling decisions. ...
Context 3
... these two cases, choosing the set of α_k's to use out of those produced by the random trials with Problem P by simply comparing the obtained partitions and looking for a consensus with strong support would lead to disastrous results. This is clear from the ARI fnc histograms in Fig 2(A) and 2(D), which peak significantly to the left of the best values attained in the trials. Beyond comparing obtained partitions, one must therefore also use one's knowledge of the domain in question and look at what they are doing to the data. ...
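
The idea of comparing the partitions obtained from the trials with one another, and its possible pitfall, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `adjusted_rand_index` and `consensus_support` are hypothetical, the toy labelings are invented rather than taken from the article, and neither Problem P nor k-means is run here. The adjusted Rand index is computed from the standard pair-counting formula using only the Python standard library.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    cells = Counter(zip(labels_a, labels_b))  # contingency-table cells
    rows = Counter(labels_a)                  # cluster sizes in labels_a
    cols = Counter(labels_b)                  # cluster sizes in labels_b
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


def consensus_support(partitions, tol=1e-12):
    """Fraction of trials yielding the same partition, up to relabeling."""
    m = len(partitions)
    return [
        sum(1 for q in partitions if adjusted_rand_index(p, q) >= 1 - tol) / m
        for p in partitions
    ]


# Hypothetical partitions from four trials on six points (invented data).
trial_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hypothetical reference partition, available only for evaluation.
reference = [0, 0, 1, 1, 1, 1]

support = consensus_support(trial_partitions)
ari_vs_reference = [adjusted_rand_index(p, reference) for p in trial_partitions]

for i, (s, a) in enumerate(zip(support, ari_vs_reference)):
    print(f"trial {i}: support={s:.2f}, ARI vs reference={a:.3f}")
```

In this toy setup the partition with the strongest support among the trials is not the one that agrees best with the reference, which mirrors the situation described for Fig 2(A) and 2(D). When the trials almost all agree, as described for Fig 2(E), the two criteria would coincide.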