This tutorial demonstrates how to perform behavioral classification of female and male mice according to their trait anxiety using the R package anxiotraitR. The experimental pipeline to obtain the behavioral data necessary for classification is described in detail in our publication Kovlyagina et al. [1]. Briefly, we measure individual freezing responses of mice in memory retrieval sessions with a prolonged conditioned stimulus (CS) exposure of 6 min following auditory aversive conditioning (AAC). Subsequently, we employ log-linear regression to estimate the intercept and slope (regression coefficient) of the fitted freezing curve during CS exposure. Together with the average freezing during the last 3.5 min of CS exposure, we use these parameters to classify mice into sustained and phasic responders using our pre-trained machine learning (ML) models implemented in anxiotraitR [2].
To install anxiotraitR, devtools must be available. The following code sequence will check for a devtools installation and then will install anxiotraitR directly from GitHub [2]:
if(!require(devtools)){
install.packages("devtools")
library(devtools)
}
devtools::install_github("johannesmiedema/anxiotraitR")
To classify mice into sustained and phasic responders, a data frame must be provided, where each row represents the freezing values of individual animals during CS presentation. To demonstrate this, we have included an example dataset. As seen below, the data frame contains 36 columns, each corresponding to a freezing value as a percentage in 30 s time bins during MR1 AAC.
\(~\)
library(anxiotraitR)
mydata <- anxiotraitR::freezingDataset
head(mydata)
#> 1 2 3 4 5 6 7 8
#> 1 9.216067 11.08597 4.67480 0.00000 3.47270 3.606267 0.000000 4.006967
#> 2 0.000000 0.00000 0.00000 0.00000 0.00000 8.681867 0.000000 0.000000
#> 3 0.000000 0.00000 0.00000 0.00000 15.76090 0.000000 0.000000 0.000000
#> 4 12.021000 0.00000 10.56103 30.44393 4.00700 3.739867 9.750367 18.298633
#> 5 6.945500 18.03160 0.00000 28.58340 11.35323 12.303733 30.838500 6.811933
#> 6 0.000000 10.95250 0.00000 0.00000 24.97710 19.367267 6.811933 27.713667
#> 9 10 11 12 13 14 15 16
#> 1 0.000000 41.66400 22.58130 0.000000 92.69700 73.86000 69.58800 29.651667
#> 2 0.000000 0.00000 0.00000 3.606333 53.96133 54.76267 58.90333 4.674867
#> 3 6.277633 23.24067 11.88747 15.226633 90.55067 49.77867 65.63067 22.305700
#> 4 4.007000 4.00700 15.49373 28.583267 92.55733 89.04767 66.34133 65.539667
#> 5 3.873467 20.95143 24.72857 55.194333 78.89633 72.13733 84.45333 77.026333
#> 6 10.620133 20.83650 40.60433 0.000000 91.48267 71.14933 79.12467 33.926000
#> 17 18 19 20 21 22 23
#> 1 37.50400 26.93940 33.059766667 0.000000 4.274100 8.948933 5.876900
#> 2 21.37077 0.00000 15.226666667 0.000000 19.500833 26.446333 30.720500
#> 3 31.21390 26.88773 32.332633333 7.470333 7.613333 3.339167 2.781170
#> 4 62.90500 17.63080 52.491666667 54.601333 14.318900 0.000000 6.927467
#> 5 48.84367 21.81650 37.403666667 20.698267 18.833000 0.000000 24.977100
#> 6 33.25827 0.00000 0.004683333 34.952667 20.205900 55.576333 30.975167
#> 24 25 26 27 28 29 30
#> 1 4.674800 3.339133 0.000000 0.000000 0.000000 0.000000 8.815367
#> 2 6.010533 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
#> 3 19.257400 0.000000 7.865233 21.920200 4.007000 0.000000 3.339167
#> 4 7.497733 0.000000 3.606300 9.349667 3.472733 7.079033 0.000000
#> 5 44.940667 40.676000 13.067900 11.107800 0.000000 0.000000 0.000000
#> 6 34.255333 14.363100 12.154633 23.775000 14.825967 33.124700 24.175667
#> 31 32 33 34 35 36
#> 1 0.000000 0.000000 0.000000 4.274100 0.000000 0.000000
#> 2 3.606333 19.367267 0.000000 0.000000 0.000000 5.209133
#> 3 7.479733 0.000000 16.963000 0.000000 4.685167 24.833133
#> 4 0.000000 0.000000 10.418200 3.606300 3.339167 3.472733
#> 5 5.476267 2.264473 3.345367 12.555333 4.674867 42.474333
#> 6 3.339167 25.778500 12.688900 4.407733 21.237200 3.339167
\(~\)
Classification of animals into sustained and phasic responders is performed only using freezing responses measured during CS presentation. In our example, CS occurred in bins 13 to 24. Therefore, the data frame must be subset to contain only columns 13:24 (please note that the data frame may need to be subset differently depending on the CS bins in the respective experiment) :
\(~\)
mydata_subsetted <- mydata[,13:24]
head(mydata_subsetted)
#> 13 14 15 16 17 18 19 20
#> 1 92.69700 73.86000 69.58800 29.651667 37.50400 26.93940 33.059766667 0.000000
#> 2 53.96133 54.76267 58.90333 4.674867 21.37077 0.00000 15.226666667 0.000000
#> 3 90.55067 49.77867 65.63067 22.305700 31.21390 26.88773 32.332633333 7.470333
#> 4 92.55733 89.04767 66.34133 65.539667 62.90500 17.63080 52.491666667 54.601333
#> 5 78.89633 72.13733 84.45333 77.026333 48.84367 21.81650 37.403666667 20.698267
#> 6 91.48267 71.14933 79.12467 33.926000 33.25827 0.00000 0.004683333 34.952667
#> 21 22 23 24
#> 1 4.274100 8.948933 5.876900 4.674800
#> 2 19.500833 26.446333 30.720500 6.010533
#> 3 7.613333 3.339167 2.781170 19.257400
#> 4 14.318900 0.000000 6.927467 7.497733
#> 5 18.833000 0.000000 24.977100 44.940667
#> 6 20.205900 55.576333 30.975167 34.255333
\(~\)
This data frame can now be used for classification. The example dataset was obtained from female animals during MR 1. These parameters need to be specified to select the correct pre-trained model for classification:
\(~\)
myclassification <- anxiotraitR::classify_freezer(data_MR1 = mydata_subsetted, sex = "female", MR = 1)
head(myclassification$classification)
\(~\)
The resulting object contains the inferred phenotype of each animal as a vector in the same order as provided by the rows in the input data frame. Furthermore, the parameters needed for the machine learning model including the intercept and slope of the fitted freezing curve and average freezing during the last 3 min of CS presentation are included in the output of the classification function.
Classification results can be visualized with a bivariate plot of average freezing vs. fitted freezing curve slope (to which we refer as decay rate of freezing) or as actual freezing curves over time bins for sustained and phasic responders. The bivariate plot can be made using ggplot2:
library(ggplot2)
#Convert Classification results including average freezing and decay rate into a dataframe
plotdata <- as.data.frame(myclassification)
bivariate <-ggplot(plotdata, aes(x=freezing, y = -decayrate, col = classification)) +
geom_point() +
labs(x = "Freezing", y ="Decay rate", col="") +
scale_color_manual(values=c("magenta", "green2")) +
stat_ellipse()
bivariate
For classification with male animals or different MR sessions (MR1 or MR2), the parameters of the function must be specified accordingly. For MR2 classification, the data frame needs to be specified under the argument data_MR2:
\(~\)
myclassification_MR2 <- anxiotraitR::classify_freezer(data_MR2 = mydata_subsetted, sex = "female", MR = 2)
\(~\)
Apart from classification with our pre-trained models, it is also possible to retrain the existing models by adding new data. To use the retrain_model function, the user must have ggplot2 installed and loaded, as this function will also produce plots for evaluation of the trained models. Due to the small sizes of the data sets, Monte-Carlo Cross-Validation is used to assess the performance of the models. Per default, the number of iterations is 1000, but this can be changed with the training_iterations parameter within the function call. For demonstration purposes in the tutorial, the example dataset from female animals will be added to the male MR1 training data to retrain the ML model for male animals as the example data was already included in training the original ML models for female mice:
\(~\)
#ggplot2 needs to be loaded
library(ggplot2)
#Retrain male MR1 models by using the data obtained from the example dataset
retrainingresults <- anxiotraitR::retrain_model(new_data = mydata_subsetted, sex = "male", MR = 1)
\(~\)
The resulting object can be used to assess the ML performance metric plots, metric tables and the trained models themself. For example, the following code can be used to investigate the accuracy of the retrained models:
\(~\)
#Obtain Accuracy plot
Accuracy <- retrainingresults$Accuracy
#Show the Accuracy plot
Accuracy
\(~\)
The same analysis can be performed accordingly for Specificity, Sensitivity, F1 and AUROC. The performance metrics can also be viewed as a data frame which shows the mean and standard deviation:
\(~\)
#Obtain model performance metrics as a dataframe
Stats <- retrainingresults$Stats
Stats
After investigating all metrics, the user can save the best-performing model as an RDS object. First, the model needs to be assigned to a new variable. For example, we will save the radial SVM model (RSVM) in this tutorial. The other models can be accessed by specifying the following output arguments: “SVM” for the SVM with a linear kernel, “RF” for random forest, “Logistic” for the logistic regression model, and “LDA” for linear discriminant analysis.
\(~\)
mymodel <- retrainingresults$RSVM
#Save as RDS object: the second parameter specifies the name of the saved object and the path
saveRDS(mymodel, "mymodel.RDS")
\(~\)
After saving the retrained model as an RDS object, this model can now be used for custom classification using the classify_freezer() function. For this, the model name and path of the saved RDS object need to be specified, as well as the data for custom classification, which needs to be specified under the data argument. The sex and MR arguments are not needed in this step: \(~\)
newclassification <- anxiotraitR::classify_freezer(model = "mymodel.RDS", data = mydata_subsetted)
[1] Kovlyagina I, Wierczeiko A, Todorov H, Jacobi E, Tevosian M, et al. (2024) Leveraging interindividual variability in threat conditioning of inbred mice to model trait anxiety. PLOS Biology 22(5): e3002642.
[2] Miedema J, Lutz B, Gerber S, Kovlyagina I & Todorov H (2025) Balancing ethics and statistics: Machine learning facilitates the reduction of sample sizes for highly accurate behavioral classification of inbred mice