RRandomForest_Train()
Trains a RandomForest model and returns an error rate value, a confusion matrix and an importance matrix.
Synopsis
int RRandomForest_Train( const string& rModelVar, const dyn_string& headers, const dyn_dyn_float& vals, const dyn_float& labels, bool classification, int ntree, int mtry, float& err_rate, dyn_dyn_float& confusion, dyn_dyn_float& importance, int userData = 0);
Parameters
Parameter | Description |
---|---|
rModelVar | Name of R variable containing an RF model |
headers | Array of header strings |
vals | Matrix of values |
labels | Array of labels, one per observation |
classification | true: classification, false: regression. Classification identifies which of a set of categories (sub-populations) a new observation belongs to, based on a training set of observations (instances) whose category membership is known. |
ntree | Tree count. A random forest is a set of decision trees; ntree specifies the number of trees to grow. Do not set this value too low, so that every input row is predicted at least a few times. |
mtry | Number of variables randomly sampled as candidates at each split. A split is a branching point in a decision tree. |
err_rate | Return parameter of error rate |
confusion | Return parameter of confusion matrix |
importance | Return parameter of importance matrix |
userData | User data of the function call. Set it to an integer value to help detect errors when calling R functions: if an error occurs during the call, the specified integer value is returned. |
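The example on this page passes vals as one dyn_float per header, with every array as long as labels. A minimal sketch of a check for that layout follows. checkTrainingInput is a hypothetical helper and not part of CtrlR, and the column-wise layout is an assumption taken from the example rather than from a formal specification.

```ctrl
// Hypothetical helper, not part of CtrlR.
// Assumes the column-wise layout used in the example on this page:
// one dyn_float per header, each with one entry per label.
bool checkTrainingInput(dyn_string headers, dyn_dyn_float vals, dyn_float labels)
{
  // The number of value arrays must match the number of headers
  if (dynlen(vals) != dynlen(headers))
  {
    DebugN("Array count does not match header count:", dynlen(vals), dynlen(headers));
    return false;
  }

  // Every value array must have as many entries as there are labels
  for (int i = 1; i <= dynlen(vals); i++)
  {
    if (dynlen(vals[i]) != dynlen(labels))
    {
      DebugN("Array " + i + " does not match the label count:", dynlen(vals[i]), dynlen(labels));
      return false;
    }
  }

  return true;
}
```

Calling such a check before RRandomForest_Train lets a script report a layout problem in CTRL instead of relying on an error raised in R.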
Return Value
The function returns 0 if it was successfully executed.
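The following sketch shows one way to combine the return value with userData. It assumes that any non-zero return value indicates a failure and that RGetLastErr can be called with the same arguments as in the example on this page. The data set, the marker value TRAIN_CALL_ID and the variable names are illustrative only.

```ctrl
#uses "CtrlR"

main()
{
  const int TRAIN_CALL_ID = 1001; // arbitrary marker chosen for this sketch

  // Small illustrative data set: two variables with six observations each
  dyn_float c1 = makeDynFloat(31, 31, 33, 32, 34, 35);
  dyn_float c2 = makeDynFloat(401, 381, 382, 392, 406, 410);
  dyn_dyn_float vals;
  dynAppend(vals, c1);
  dynAppend(vals, c2);

  dyn_string headers = makeDynString("Current", "Voltage");
  dyn_float labels = makeDynFloat(0, 0, 1, 1, 2, 2);

  float err_rate;
  dyn_dyn_float confusion;
  dyn_dyn_float importance;

  // Pass the marker as userData so that a failure can be traced to this call
  int rc = RRandomForest_Train("sketchModel", headers, vals, labels, true, 50, 1,
                               err_rate, confusion, importance, TRAIN_CALL_ID);

  if (rc != 0) // assumption: any non-zero value means the call failed
  {
    string errDesc;
    int errUserData;
    RGetLastErr(errDesc, errUserData, true); // same arguments as in the example on this page
    DebugN("Training failed:", rc, errDesc, "userData:", errUserData);
    return;
  }

  DebugN("Error rate:", err_rate);
}
```

If several R calls in one script use different marker values, the user data reported with the error indicates which call failed.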
Description
Trains a RandomForest model and returns an error rate value, a confusion matrix and an importance matrix. For more information on random forest models, see https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Example
The example first creates a model with the RRandomForest_Train function. The function REvalExp then evaluates an R expression that saves the model to the file "D:/Test/myNewModel2.RData". Finally, the saved model is loaded from that file with the RLoadModel function.
#uses "CtrlR"
main()
{
bool classification = TRUE;//Classification
int ntree = 5;
int mtry = 5;
float err_rate;
dyn_dyn_float confusion; //Return parameter of confusion matrix - see chapter Classification Wizard - Quality
dyn_dyn_float importance; //Return parameter of importance matrix - see chapter Classification Wizard - Quality
//Add data
dyn_float df1 = makeDynFloat(31,31,33,32,34,35,36); /* All arrays must have the same size, here 7 items each */
dyn_float df2 = makeDynFloat(401,381,382,392,406,410,408); /* The number of arrays must correspond to the number of headers */
dyn_float df3 = makeDynFloat(89,85,90,90,99,98,97);
dyn_float df4 = makeDynFloat(63,68,71,73,200,300,350);
dyn_float df5 = makeDynFloat(4,10,10,-4,8,7,8);
dyn_float labels = makeDynFloat(0,0,1,1,2,3,3); /* The number of labels must correspond to the number of items in each array */
dyn_string headers = makeDynString ("Current", "Voltage", "Load","T_Increase","T_Ambient");
/* The number of headers must correspond to the number of arrays df1, df2.. */
string H_LINE = "***************************************************************************";
dyn_dyn_float ddf1;
dynAppend(ddf1,df1);
dynAppend(ddf1,df2);
dynAppend(ddf1,df3);
dynAppend(ddf1,df4);
dynAppend(ddf1,df5);
string rModelVar = "myModel2"; //r model variable
string err_desc; //Description variable for error handling
int UserData; /* See the description of userData. This is the return parameter for RGetLastErr - see below */
//Create a model via the function "RRandomForest_Train"
int retV = RRandomForest_Train(rModelVar, headers, ddf1, labels, classification, ntree, mtry, err_rate, confusion, importance);
string out = "D:/Test/myNewModel2.RData"; //The concrete file for the model
REvalExp(0, "save(%var%, file = %var%)", rModelVar, out);
/* Evaluates the expression and saves the R variable rModelVar to the file specified in out */
DebugN("Value of the out parameter:", out);
string ModelName;
int h = RLoadModel(out, ModelName); /* Loads the model from the file. ModelName receives the name of the model */
DebugN("ModelName:",ModelName, "loaded:", h);
DebugN("Error rate:", err_rate, "Confusion matrix:", confusion, "Importance matrix:", importance);
//For information on confusion and importance matrices, see chapter Classification Wizard - Quality
blob at = RGetVarSerialized(rModelVar); //Retrieves the serialized value of the R variable rModelVar.
if (RGetLastErr(err_desc, UserData, true) != 0) //Error handling
{
DebugN("Error occurred: " + err_desc);
return;
}
DebugN("Rmodelvar:", at);
DebugN(H_LINE);
int row_to_predict = 5;
int row_len = 5;
dyn_float values;
for(int i = 1; i <= row_len; i++)
{
int ret = dynAppend(values, ddf1[i][row_to_predict]); /*Add data to "values"-> Prepare values for the prediction function */
if( ret == -1 )
{
DebugN("Error! dynAppend to values failed!");
return;
}
}
int prediction = RPredict("myModel2", headers, values); /* Calls the function "predict" of a loaded model and returns the prediction result. */
//error handling
if (RGetLastErr(err_desc, UserData, true) != 0)
{
DebugTN("Error occurred: " + err_desc);
return;
}
DebugN("prediction=" + prediction);
DebugN("RPredict_test finished!");
DebugN(H_LINE);
}
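The loop in the example collects one observation from column-wise data before calling RPredict. The same step can be sketched as a small reusable function. getRow is a hypothetical helper and not part of CtrlR. It assumes the layout used in the example, with one dyn_float per variable.

```ctrl
// Hypothetical helper, not part of CtrlR.
// Collects the values of one observation (row) from column-wise data,
// i.e. from a dyn_dyn_float that holds one dyn_float per variable.
dyn_float getRow(dyn_dyn_float columns, int row)
{
  dyn_float values;

  for (int i = 1; i <= dynlen(columns); i++)
  {
    // CTRL indices start at 1: columns[i] is the i-th variable
    dynAppend(values, columns[i][row]);
  }

  return values;
}
```

With the data from the example, getRow(ddf1, 5) collects the values 34, 406, 99, 200 and 8, which is the input that the example passes to RPredict.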
Assignment
Availability
See also