An implementation of the gap statistic algorithm from Tibshirani, Walther, and Hastie's
"Estimating the number of clusters in a data set via the gap statistic".
This function calls the clusGap function of the cluster package to calculate the data for the plot.
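For orientation, the quantity being plotted can be written out as follows. This is a summary of the definitions in the cited Tibshirani et al. paper, not something specific to this function: W_k is the pooled within-cluster dispersion for k clusters, D_r is the sum of pairwise distances in cluster r, n_r is the size of cluster r, E*_n denotes the expectation under the reference distribution (estimated from the B Monte Carlo samples), and s_k is the simulation standard error, sd_k * sqrt(1 + 1/B).

```latex
\mathrm{Gap}_n(k) = E_n^{*}\{\log W_k\} - \log W_k ,
\qquad
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r ,
\qquad
\hat{k} = \min\bigl\{\, k : \mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1} \,\bigr\}
```

The last expression is the selection rule proposed in the paper, which is what the "1 S.E." wording in the cluster package documentation refers to.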
sjc.kgap(x, max = 10, B = 100, SE.factor = 1, method = "Tibs2001SEmax", plotResults = TRUE)
x | matrix, where rows are observations and columns are individual dimensions, to compute and plot the gap statistic (according to a uniform reference distribution). |
---|---|
max | maximum number of clusters to consider, must be at least two. Default is 10. |
B | integer, number of Monte Carlo ("bootstrap") samples. Default is 100. |
SE.factor | [When |
method | character string indicating how the "optimal" number of clusters, k^, is computed from the gap statistics (and their standard deviations), or more generally how the location k^ of the maximum of f[k] should be determined. Default is "Tibs2001SEmax". |
plotResults | logical, if |
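As a rough illustration of how these arguments map onto the underlying cluster package, the sketch below calls clusGap and maxSE directly. It is not the source of sjc.kgap: the use of k-means as the clustering function, the scaling step, the nstart value, and the reduced K.max and B (chosen only to keep the run short) are all assumptions made for this example.

```r
library(cluster)

set.seed(123)  # the reference samples are random, so fix the seed

# Rows are observations, columns are dimensions (scaling is an assumption here)
x <- scale(as.matrix(iris[, 1:4]))

# K.max plays the role of `max`, B the number of Monte Carlo samples.
# Assumption: k-means as the clustering function; nstart is passed on to kmeans.
gap <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 20,
               verbose = FALSE, nstart = 10)

# One row per k, with columns logW, E.logW, gap and SE.sim
tab <- gap$Tab

# `method` and `SE.factor` select k^ from the gap values and their standard errors
k_hat <- maxSE(f = tab[, "gap"], SE.f = tab[, "SE.sim"],
               method = "Tibs2001SEmax", SE.factor = 1)
print(k_hat)
```

Because the reference data are simulated, the selected k^ can change with the seed and with B; a larger B gives more stable standard errors at the cost of run time.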
An object containing the data frame used for plotting, the ggplot object, and the number of clusters found.
Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J. R. Statist. Soc. B, 63, Part 2, pp. 411-423
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2013) cluster: Cluster Analysis Basics and Extensions. R package version 1.14.4.
# NOT RUN {
# plot gap statistic and determine best number of clusters
# in mtcars dataset
sjc.kgap(mtcars)

# and in iris dataset
sjc.kgap(iris[, 1:4])
# }