Human activity recognition by applying LDA, QDA and kNN: A demo in R

8 min readJun 25, 2021

Human activity recognition (HAR) data utilized in this demo are available in UCI machine learning repository. The experiments were conducted with participation of 30 volunteers within 19–48 years old. The subjects wore a smartphone on their waists while performing certain activities. More details is available in README.txt file included in the downloaded zip file.

Since the provided data had been preprocessed, randomly split into training and test dataset, and normalized within [-1,1], thus we are safe to proceed to modeling (develop predictive models) stage. First, load the necessary libraries and import the training and test dataset from the working directory.

# load train dataset and its corresponding labels
library(plyr)    # mapping
library(MASS)    # lda and qda
library(class)   # kNN
library(ggplot2) # for beautiful plotsdata=read.table(“Train/X_train.txt”,header = FALSE, sep = “ “, dec = “.”)
labels=read.table(“Train/y_train.txt”,header=FALSE)# load test dataset and its corresponding labels
data_test=read.table(“Test/X_test.txt”,header=FALSE, sep= “ “,dec = “.”)
labels_test=read.table(“Test/y_test.txt”,header = FALSE)

The text files where we import our data do not contains predictors’ name. It always a good practice to add header names for each predictor as shown below.

# features’ names can be found in features.txt
featurenames=read.table(“features.txt”,header = FALSE)
# some of the feature names are redundant
featurenames=make.unique(featurenames[,1],sep=”_”)
colnames(data)=featurenames
colnames(data_test)=featurenames
# activity types are numeric
colnames(labels)=”activity_types”
colnames(labels_test)=”activity_types”

Next, replace the labels (response variable) which comes in numerical values to its real behaviors and store it as another column. Then, the training data frame as well as test data frame containing set of predictors and response variables are constructed. There are a total of 12 types of activities (classes) in the data, but only instances that corresponds to activity labels from 1 to 6 are selected for this experiment.

# activity labels
act_labels=read.table(“activity_labels.txt”,header = FALSE)[,2]labels$at=mapvalues(labels$activity_types,from=c(1:12),to = act_labels)
labels_test$at=mapvalues(labels_test$activity_types,from=c(1:12),to=act_labels)# Construct complete dataframe
X_train=cbind(data,labels)
# Filter the data to have activity types from 1 to 6.
X_train=X_train[(X_train$activity_types>=1) & (X_train$activity_types<=6),]
X_test=cbind(data_test,labels_test)
X_test=X_test[(X_test$activity_types>=1) & (X_test$activity_types<=6),]# remove variables that are not needed
rm(data,labels,data_test,labels_test)

Linear discriminant analysis (LDA)

Simply speaking, to develop LDA model, the distribution of predictors, X are modeled as multivariate Gaussian distribution as shown in equation (2) below respectively with respect to its response variables, Y. Then, these density functions, fₖ(x) are plugged into equation (1) to obtain the posterior probability, P(Y=p|X=x).

start_time=Sys.time()
# LDA classifier training
model_lda=lda(at~.-activity_types,data=X_train)
end_time=Sys.time()
(t_lda_train=end_time-start_time)

The training time for LDA model is roughly 20s. As what mentioned above, the training of LDA involves computing πₖ and fₖ(x). If we examine the components in fₖ(x) equation, parameters like mean vector, μₖ and covariance matrix, ∑ have to be calculated from the training data. It is not computationally intensive to calculate πₖ and μₖ. A majority of training time can be attributed to operations involving ∑ which is a 561x561 matrix in this case (e.g. determinant and inverse of matrix).

A warning message popped out on the console stating that variables are collinear. This happens when the determinant of ∑ is close to zero (i.e. two or more variables are almost a linear combination of each other). If the primary concern is on the model performance, then this issue is not a big deal. For more info please refer to this website.

Next, we can proceed to evaluation of model performance. As mentioned in the previous post, it is always a good idea to write a function if we need to perform a certain task constantly. Aside from popular performance metrics like accuracy, precision and sensitivity (recall), Cohen’s Kappa statistics is also computed. Essentially, it is a measure that compares an observed accuracy with an expected accuracy (random chance) and most importantly it is comparable among classifiers. Kappa statistic of 1 means perfect classification.

pred=predict(model_lda,newdata = X_train)$class
nclass=max(X_train$activity_types)# performance metrics for multiclass problem
perform_metric=function(true_label,pred,nclass) {
 conf=table(truth=true_label,prediction=pred) # confusion matrix
 nsample=sum(conf)
 acc=sum(diag(conf))/nsample # accuracy
 Rowsum=rowSums(conf)
 Colsum=colSums(conf)
 # Kappa statistics
 expected_acc=sum(Rowsum*Colsum/nsample)/nsample
 Kappa=(acc-expected_acc)/(1-expected_acc)
 sensitivity=diag(conf)/Rowsum
 precision=diag(conf)/Colsum
 specificity=c()
 for (i in 1:nclass) {
 specificity[i]=sum(conf[-i,-i])/(sum(conf[-i,-i])+colSums(conf)[i]-diag(conf)[i])
 }
 names(specificity)=names(sensitivity)
 list(confusion_matrix=conf,
 accuracy=round(acc,4),Kappa_statistic=round(Kappa,4),
 sensitivity=round(sensitivity,4),specificity=round(specificity,4),
 precision=round(precision,4))
}# training performance
perform_metric(X_train$at,pred,nclass)
# test performance
pred=predict(model_lda,newdata = X_test)$class
perform_metric(X_test$at,pred,nclass)

Performance metrics of LDA model evaluated using training data.

Performance metrics of LDA model evaluated using test data.

Quadratic discriminant analysis (QDA)

Like LDA, QDA employs Bayes’ theorem (equation (1)) to compute the conditional probability of an instance belonging to p class. However, QDA assumes that each class has its own covariance matrix and thus the density function for each class is different as shown in (3).

The rank deficiency in group ‘LAYING’ displayed as error message in R console below literally means that the covariance matrix of features set that belong to ‘LAYING’ class is singular (i.e. matrix inverse does not exist, thus determinant of matrix is zero).

k-nearest neighbor (kNN) classifier

kNN is instance-based, memory-based and lazy learner, whereby no model has to be fitted. Given an previously unseen datum, x₀, k training points that are nearest to x₀ (usually determined by Euclidean distance) are tracked down and x₀ is classified based on majority voting among the k neighbors. Vital parameters of kNN classifier are distance measures and number of nearest neighbors, k. Thus, k-fold cross validation (CV) is applied to determine the optimal k as shown below.

set.seed(100)
kfold=5    # k-fold cv, k=5
nn=30     # maximum number of nearest neighbors
y_train=factor(X_train$at)
nfeatures=length(featurenames)# function for searching the best k for kNN
kfold_knn_acc=function(data,label,kfold,nn,nfeatures) {
  idx_kfold=sample(1:kfold,size=nrow(data),replace=TRUE)
  # Matrix to store the cv accuracy
  acc=matrix(,nrow=5,ncol=(nn-1))
  for (i in 1:kfold) {
    data_train=data[,1:nfeatures][idx_kfold!=i,]
    data_val=data[,1:nfeatures][idx_kfold==i,]
    labels_train=label[idx_kfold!=i]
    labels_val=label[idx_kfold==i]
    for (j in 2:nn) {
      pred_knn=knn(data_train,data_val,labels_train,k=j)
      acc[i,j-1]=mean(pred_knn==labels_val)
    } 
  }
  acc
}
# There are 561 features in total
start_time=Sys.time()
acc=kfold_knn_acc(X_train,y_train,kfold,nn,nfeatures)
end_time=Sys.time()
(t_cv_knn=end_time-start_time)

The code snippets above requires about 45 minutes to an hour to run. Why it take so long? This is simply because of large number of training observations. The time taken to find nearest neighbors for a single datum is proportional to the number of training instance. In other word, processing k-fold holdout set is proportional to product of number of k-fold blocks as well as number of instances in training and test set. Next, visualize the average accuracies from each number of nearest neighbor.

mean_acc=apply(acc,2,mean)
std_acc=apply(acc, 2, sd)
plot_acc_knn=data.frame(nn=c(2:30),mean_acc,std_acc)ggplot(data=plot_acc_knn,aes(x=nn,y=mean_acc,ymin=mean_acc-std_acc,ymax=mean_acc+std_acc)) +
 geom_line() + geom_point() + geom_errorbar() +
 geom_vline(xintercept = which.max(mean_acc)+1,color=”red”,linetype=2) +
 xlab(“number of nearest neighbors”) +
 ylab(“accuracy”)

5-fold cross-validation to find the best number of nearest neighbors.

The optimal number of nearest neighbor, k obtained from cross validation is 5. By setting the hyperparameter k as 5, evaluate the kNN classifier performance.

# training performance
pred_knn=knn(X_train[,1:561],X_train[,1:561],y_train,k=5)
perform_metric(y_train,pred_knn,nclass)# test performance
y_test=factor(X_test$at)
pred_knn=knn(X_train[,1:561],X_test[,1:561],y_train,k=5)
perform_metric(y_test,pred_knn,nclass)

I am not showing the console output here (Please check it in RPubs). Instead, the results are tabulated as below. There are sign of overfitting in kNN classifier, manifested by discrepancy in accuracy and Cohen’s Kappa statistic between training and test phases.

Principal component analysis (PCA)

When faced with correlated features, PCA is a good alternative that is able to derive low dimensional set of features from large set of variables while preserving as much variation of data as possible, with the first principal component capturing most of the variation. The uncorrelated variables created is called principal components and are linear combination of original features.

pca=prcomp(X_train[,1:nfeatures])# variance explained
var_explained=pca$sdev²/sum(pca$sdev²)
plot_var_exp=data.frame(no_prin_comp=c(1:10),var=var_explained[1:10])# Scree plot
ggplot(data=plot_var_exp,aes(x=no_prin_comp,y=var)) +
 geom_line() + geom_point() +
 xlab(“Number of principal components”) +
 ylab(“Variance explained”)+
 ggtitle(“Scree Plot”)

PCA allows us to visualize the data in two-dimensional plot.

pca_plot=data.frame(pca$x[,c(1,2)],class=X_train$at)
ggplot(data=pca_plot,aes(x=PC1,y=PC2)) +
 geom_point(aes(color=factor(class)))

From the grouped scatterplot, it is obvious that there are two clusters: the left one contains ‘LAYING’, ‘SITTING’ and ‘STANDING’ classes while the right one contains ‘WALKING’, ‘WALKING DOWNSTAIRS’ and ‘WALKING_UPSTAIRS’ classes.

Next, select the number of principal components according to the cumulative percentages of variance explained as shown below.

# cumulative percentages of variance explained
cum_var=cumsum(var_explained)
nf=length(cum_var[cum_var<=0.95])
X_pca=data.frame(pca$x[,1:(nf+1)])
X_pca=cbind(X_pca,at=X_train$at)# Prepare test data
X_pca_test=predict(pca,newdata=X_test[,1:nfeatures])[,1:(nf+1)]
xx=as.data.frame(X_pca_test)
X_pca_test=cbind(xx,at=X_test$at)

‘pca$x’ is the score matrix of PCA. It is the 67ᵗʰ principal components that first exceeds the 95% total variance explained threshold, thus a total of 67 principal components are selected to fit LDA, QDA and kNN classifiers. The procedures on how to train and test the classifiers is similar to sections above, except the input arguments for syntaxes are replaced by principal component scores dataframe. The complete codes can be found on Github and RPubs.

Summary

The overall results of this demo is summarized as below.

Performances of models evaluated by test data.

Let’s interpret the results shown above:

It turns out that LDA fitted using original attributes achieves the best performance in terms of accuracy (96.16%) and Kappa statistic (0.9538). This performance result is similar to the research paper published in 2013. However, it should be noted that there are 2947 test observations in contrast to 2996 test samples in this demo, so both of them are not exactly comparable.
The performance of kNN classifiers are worse than LDA and QDA. This most probably attributed to the high dimensional feature space (561 predictor sets and 67 principal component scores). When kNN is performing classification of a query point in high dimensional space, nearest neighbors of the query point can be very far away in the training set, thus causing bias.
What do sensitivity and precision actually tell us? Let’s take the results of LDA model as tabulated above as example. We know that the sensitivity of class ‘SITTING’ is 0.876. This indicates that there are 87.6% of the time LDA will classify the instance correctly given that it truly belongs to class ‘SITTING’. The precision of class ‘STANDING’ is 0.8958. This implies that when LDA model predicts an unknown instance as ‘STANDING’, the probability that the decision is correct is 0.8958. The above scenario will only be true if the distribution of the future data is similar to the test data.
Extremely high precision and sensitivity for class ‘LAYING’ suggest that all classifiers fare well in distinguishing class ‘LAYING’ from others. The low precision for class ‘STANDING’ in general means whenever the classifier identify a future data point as ‘STANDING’, we are less confident of the decision as compared to occasion when the classifier yield an output ‘LAYING’.

Reference

Jorge-L. Reyes-Ortiz, Luca Oneto, Albert Samà, Xavier Parra, Davide Anguita. Transition-Aware Human Activity Recognition Using Smartphones. Neurocomputing. Springer 2015.