RClusterNumberGap()
Calls the function "clusGap" of a loaded model and returns the goodness of the clustering measure.
Synopsis
int RClusterNumberGap( const dyn_dyn_float data, bool scale, const
string& FUN="kmeans", int nstart = 25, int Kmax=10, int B=500, string
method="firstSEmax", int userData = 0);
Parameters | Description |
---|---|
data | Matrix of values. |
scale | TRUE: clustering is performed on scaled values. |
FUN | Cluster function used to compute the gap statistic, which estimates the optimum number of clusters for that function. Default: "kmeans". |
nstart | Number of start attempts (random initializations) for the calculation of the "goodness". Default: 25. |
Kmax | Maximum number of clusters to consider. Default: 10. |
B | Number of Monte Carlo (“bootstrap”) samples used to build the reference distribution. Default: 500. For more information see https://en.wikipedia.org/wiki/Particle_filter |
method | Computation method identifier. The default method "firstSEmax" selects the smallest number of clusters whose gap value lies within one standard error of the first local maximum. For other available methods see: https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/clusGap.html |
userData | User data of the function call. Set this variable to an integer value to detect errors when calling R functions: if an error occurs during the call, the specified integer value is returned. |
Return Value
The function returns a value < 0 in case of an error and a value >= 0 ("gap" statistic) on success.
Description
Calls the function "clusGap" of a loaded model and returns the goodness of clustering measure. This measure is computed by comparing the average within-cluster dispersion of the data with that of a reference distribution for an increasing number of clusters.
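For orientation, the comparison described above is commonly written as follows. This is the standard formulation of the gap statistic (Tibshirani, Walther and Hastie) on which R's clusGap is based. The symbols W_k, W*_kb, s_k and k* are notation introduced here for illustration and do not appear in the function's interface; B and Kmax correspond to the parameters of the same name.

```latex
% W_k     : pooled within-cluster dispersion of the data for k clusters
% W*_{kb} : the same quantity for the b-th reference (Monte Carlo) sample
\[
  \mathrm{Gap}(k) \;=\; \frac{1}{B}\sum_{b=1}^{B} \log W^{*}_{kb} \;-\; \log W_k ,
  \qquad k = 1,\dots,K_{\max}
\]
% Standard error of the simulated reference term
\[
  s_k \;=\; \sqrt{1 + \tfrac{1}{B}}\;\cdot\;
  \operatorname{sd}_{b}\!\bigl(\log W^{*}_{kb}\bigr)
\]
% "firstSEmax" rule, with k* the first local maximum of Gap(k)
\[
  \hat{k} \;=\; \min\bigl\{\, k \le k^{*} \;:\;
  \mathrm{Gap}(k) \,\ge\, \mathrm{Gap}(k^{*}) - s_{k^{*}} \bigr\}
\]
```

A larger B makes the reference term more stable at the cost of more computation, since the cluster function is run for every k on each of the B reference samples.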
Example
The example calls RClusterNumberGap() on a matrix built from six value series and prints the returned value.
#uses "CtrlR"
main()
{
dyn_float df1 = makeDynFloat(31,31,33,32,34,33,32,35,29,34,38,40,37,38,36,36,36,39,38,40,35,32,34,32,34,29,29,28,31,28,30,34,33,28,31,32,33,33,33,35,36,36,40,38,40,37,40,38,40,38);
dyn_float df2 = makeDynFloat(401,381,382,392,406,372,361,405,392,399,350,342,346,354,304,345,320,317,356,323,386,406,405,396,400,401,365,400,391,398,362,368,363,373,389,370,406,386,402,367,379,380,406,389,374,379,399,406,377,407);
dyn_float df3 = makeDynFloat(89,85,90,90,99,88,83,102,81,97,95,98,92,96,78,89,83,89,97,93,97,93,99,91,97,83,76,80,87,80,78,90,86,75,86,85,96,91,95,92,98,98,116,106,107,100,114,111,108,111);
dyn_float df4 = makeDynFloat(63,97,73,73,75,75,80,93,96,77,86,81,85,83,74,68,73,63,86,60,85,93,90,79,79,68,81,66,65,95,96,71,72,81,73,63,84,75,67,77,73,85,100,95,74,71,97,98,67,63);
dyn_float df5 = makeDynFloat(4,10,10,-4,8,8,-3,3,8,9,6,7,0,3,10,8,3,9,10,8,9,-3,6,9,1,3,6,-4,0,-2,-2,7,6,3,-2,6,8,8,0,3,-3,2,3,10,6,1,4,4,2,8);
dyn_float df6 = makeDynFloat(35,32,34,34,38,34,33,36,31,30,37,39,39,34,33,39,30,33,37,35,44,40,41,41,43,44,44,40,42,45,37,30,30,33,33,35,36,39,36,34,34,30,32,30,30,32,31,34,35,38);
string err_desc;
int context;
//Add the Matrix of values
dyn_dyn_float ddf1;
dynAppend(ddf1,df1);
dynAppend(ddf1,df2);
dynAppend(ddf1,df3);
dynAppend(ddf1,df4);
dynAppend(ddf1,df5);
dynAppend(ddf1,df6);
//***************************************************************************************************
/* Function call of RClusterNumberGap -> Calls the function "clusGap" of a loaded model and returns the goodness of the clustering measure.*/
bool scale2 = TRUE; //Clustering is performed on scaled values
int j; //Return value of the function: < 0 on error, >= 0 otherwise
string FUN = "kmeans"; /*Cluster function to be used (gap statistic for estimating the optimum number of clusters for the function kmeans).*/
int nStart; //Number of start tries for the calculation of the goodness
int kMax; //Maximum number of clusters to consider
int B; //Number of Monte Carlo (“bootstrap”) samples (statistical method for sampling)
j = RClusterNumberGap(ddf1, scale2); //Function call
DebugN("Gap statistic:", j); //Value returned by the function
//***************************************************************************************************
}
Assignment
Availability
See also