Proceeding Paper

High-Level Features for Recognizing Human Actions in Daily Living Environments Using Wearable Sensors †

by
Irvin Hussein López-Nava
* and
Angélica Muñoz-Meléndez
Department of Computer Science, Instituto Nacional de Astrofísica, Óptica y Electrónica, Tonantzintla 72840, Mexico
*
Author to whom correspondence should be addressed.
Presented at the 12th International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2018), Punta Cana, Dominican Republic, 4–7 December 2018.
Proceedings 2018, 2(19), 1238; https://doi.org/10.3390/proceedings2191238
Published: 24 October 2018
(This article belongs to the Proceedings of UCAmI 2018)

Abstract: Action recognition is important for various applications, such as ambient intelligence, smart devices, and healthcare. Automatic recognition of human actions in daily living environments, mainly using wearable sensors, is still an open research problem in the field of pervasive computing. This research focuses on extracting a set of features related to human motion, in particular the motion of the upper and lower limbs, in order to recognize actions in daily living environments using time-series of joint orientations. Ten actions were performed by five test subjects in their homes: cooking, doing housework, eating, grooming, mouth care, ascending stairs, descending stairs, sitting, standing, and walking. The joint angles of the right upper limb and the left lower limb were estimated using information from five wearable inertial sensors placed on the back, right upper arm, right forearm, left thigh and left leg. The set of features was used to build classifiers using three inference algorithms: Naive Bayes, K-Nearest Neighbours, and AdaBoost. The average F-measure of the three classifiers built using the proposed set of features for classifying the ten actions was 0.806 (σ = 0.163).

1. Introduction

Action recognition is important for various applications, such as ambient intelligence, smart devices, and healthcare [1,2]. There is also a growing demand and a sustained interest in developing technology able to tackle real-world application needs in fields such as ambient assisted living [3], security surveillance [4] and rehabilitation [5].
In effect, action recognition aims at providing information about the behavior and intentions of users that enables computing systems to assist users proactively with their tasks [6]. Automatic recognition of human actions in daily living environments, mainly using wearable sensors, is still an open research problem in the field of pervasive computing [4]. However, there are a number of reasons why human action recognition is a very challenging problem. Firstly, the human body is non-rigid and has many degrees of freedom [7]; consequently, it can perform infinite variations of every basic movement. Secondly, the intra- and inter-subject variability in performing human actions is very high, i.e., the same action can be performed in many different ways, even by the same person [8].
Most existing work on action recognition is built upon simplified, structured environments, normally focusing on single-user, single-action recognition. In real-world situations, human actions are often performed in complex ways: a person performs interleaved and concurrent actions and may interact with other people to perform joint actions, such as cooking [4].
Recently, various research studies have analyzed human actions based on wearable sensors [9]. A large number of these studies focus on identifying the most informative features that can be extracted from action data, as well as on searching for the most effective machine learning algorithms for classifying these actions [10]. Wearable sensors attached to human anatomical references, e.g., inertial and magnetic sensors (accelerometers, gyroscopes and magnetometers), vital sign processing devices (heart rate, temperature) and RFID tags, can be used to gather information about the behavioral patterns of a person.
Robustness to occlusion and to lighting variations, as well as portability, are the major advantages of wearable sensors over visual motion-capture systems. Additionally, visual motion-capture systems require very specific settings to operate properly [11]. When compared to approaches based on specialized systems, the wearable sensor-based approach is effective and relatively inexpensive for data acquisition and action recognition for certain types of human actions, mainly movements involving the upper and lower limbs. Actions such as walking, running, sitting down/up, climbing or practising physical exercises are generally characterized by a distinct, often periodic, motion pattern [4].
Another line of research is to find the most appropriate computational model to represent human action data. However, the robustness to model parameters of many existing human action recognition techniques is still quite limited [10]. In addition, the feature sets used for classification do not describe how the actions were performed in terms of human motion.
This research focuses on extracting a set of features related to human motion, in particular the motion of the upper and lower limbs, in order to recognize actions in daily living environments using time-series of joint orientations.
The rest of the paper is organized as follows. Section 2 addresses related work. Section 3 presents the experimental setup and the computational techniques for classifying human actions using a set of proposed features. The classification results and the major points of the comparison are presented in Section 4. Finally, Section 5 and Section 6 close with concluding remarks about the results, challenges and opportunities of this study.

2. Related Work

In recent years, several studies have been carried out on human action recognition in various contexts, mainly using video [12,13] or raw signals from inertial sensors [14,15], which are substantially different from the signals and techniques used in our research.
Regarding research in which action classification was performed using joint orientations estimated from inertial sensor data, in [16] three upper limb actions (eating, drinking and horizontal reaching) were classified using the elbow flexion/extension angle, the elbow position relative to the shoulder and the wrist position relative to the shoulder. In the training stage, features are clustered using the k-means algorithm, and a histogram generated from the clustering serves as a template for each action, used during the recognition stage by template matching. Two sensors were attached to the upper arm and forearm of four healthy subjects (aged from 23 to 40 years) and data were collected in a structured environment. This clustering-based classifier scored an F-measure of 0.774.
In [17], six actions (agility cuts, walking, sprinting, jogging, box jumps and football free kicks) performed in an outdoor training environment were classified using the Discrete Wavelet Transform (DWT) in conjunction with a Random Forest inference algorithm. Flexion/extension of the knees was calculated from wearable inertial sensors attached to the thigh and shank of nine healthy subjects and one injured subject. A classification accuracy of 98.3% was achieved in the cited work, and kicking was the action with the most instances confused with other actions.
Recently, in [18] nine everyday and fitness actions (lying, sitting, standing, walking, Nordic walking, running, cycling, ascending stairs, and descending stairs) were classified based on five time-domain and five frequency-domain features extracted from orientation signals of the torso, shoulder joints and elbow joints, using a decision tree algorithm. Five sensors were placed on the upper body and one was attached to one shoe during indoor and outdoor recording sessions of a single person. The overall performance of the classifier was 87.18%. Difficulties were encountered in classifying the cycling and sitting actions.
In contrast to previous related work, our research focuses on classifying ten actions (cooking, doing housework, eating, grooming, mouth care, ascending stairs, descending stairs, sitting, standing, and walking) performed by five test subjects in their homes. The joint angles of the right upper limb and the left lower limb are estimated using information from five sensors placed on the back, right upper arm, right forearm, left thigh and left leg. A set of features related to human limb motions is extracted from the orientation signals and used to build classifiers with three inference algorithms: Naive Bayes, K-Nearest Neighbours, and AdaBoost.

3. Methods

3.1. Setup

The sensors used in this research were LPMS-B (LP-Research, Tokyo, Japan) miniature wearable inertial measurement units. Each one comprises three different sensors: a 3-axis gyroscope, a 3-axis accelerometer and a 3-axis magnetometer. The communication range is 10 m using a Bluetooth interface, it has a 3.7 V, 800 mAh lithium battery, and it weighs 34 g.
The wearable inertial sensors were placed on five anatomical references of the body of the test subjects, as illustrated in Figure 1a. The first anatomical reference was located in the lower back, 40 cm from the first thoracic vertebra measured in a straight line, aligning the plane formed by the x-axis and y-axis of sensor S1 with the coronal plane of the subject. Sensing device S1 was secured over this anatomical reference using an orthopedic vest. The second anatomical reference was located 10 cm above the right elbow, on the lateral side of the right upper arm, aligning the plane formed by the x-axis and y-axis of S2 with the sagittal plane of the subject. The third anatomical reference was located 10 cm above the right wrist, on the posterior side of the forearm, aligning the plane formed by the x-axis and y-axis of S3 with the coronal plane of the subject. The fourth anatomical reference was located 20 cm above the right knee, on the lateral side of the right thigh, aligning the plane formed by the x-axis and y-axis of S4 with the sagittal plane of the subject. The fifth anatomical reference was located 20 cm above the right malleolus, on the lateral side of the shank, aligning the plane formed by the x-axis and y-axis of S5 with the sagittal plane of the subject. Sensors S2, S3, S4 and S5 were firmly attached to the anatomical references using elastic velcro straps. The configuration and coordinate systems of the sensors are shown in Figure 1a.
Two wearable RGB cameras were worn by the subjects to record video used for segmenting and labeling the performed actions and, afterwards, for validating the recognition. One of these devices was a GoPro Hero 4 camera, C1 (GoPro Inc., San Mateo, CA, USA), which was carried on the front of the vest used for sensor S1. The other device was a Google Glass, C2 (Google Inc., Menlo Park, CA, USA), which was worn like ordinary glasses. The GoPro Hero 4 weighs 83 g and the Google Glass weighs 36 g. The first camera stores the video stream on an external memory card, whereas the Google Glass stores it in a 2 GB internal memory. Both cameras recorded video at 720p and 30 Hz.
Five healthy young adults (mean age 26.2 ± 4.4 years) were asked to wear the wearable sensors. All subjects declared to be right-handed and to have no mobility impairment in their limbs. They were not asked to wear any particular clothing, only the Google Glass, the GoPro camera, and the five sensing devices. Additionally, the subjects signed an informed consent form in which they agreed that their data could be used for this research.

3.2. Environment

The test subjects were asked to perform the trials in their homes. Additionally, the subjects were asked to keep the sensing devices away from fire and not to wet them. The trials were performed in the mornings after the subject had taken a shower, a common action of all subjects at the beginning of the day, and before leaving home.
For each trial, sensors S1, S2, S3, S4 and S5 were placed on the body of each subject, according to the description given in Section 3.1. During each trial the subjects were asked to perform their daily activities at will, i.e., they did not perform any specific action in any established order.
To start and finish a new trial the subjects had to remain in an anatomical position similar to that shown in Figure 1b. Each subject completed 3 trials on 3 different days. Additionally, to synchronize the data obtained by the sensing devices and the two cameras, the test subjects were asked to perform a control movement 3 s after the beginning of each test. This movement consisted of a fast movement of the right upper limb within the field of view of the cameras, and it was used in a later segmentation process. The wearable inertial sensors captured data at 100 Hz.
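As an illustration of how such a control movement can be located in the sensor data, the following is a minimal sketch that detects the first fast movement in the gyroscope magnitude of the forearm sensor; the function name, the threshold, and the use of S3 are assumptions for illustration, not details reported in the paper.

```python
import numpy as np

def find_sync_event(gyro_xyz, threshold_dps=200.0):
    """Return the sample index of the first fast forearm movement (sync event).

    gyro_xyz: (N, 3) array of angular rates in deg/s from the forearm sensor (assumed S3).
    threshold_dps: illustrative magnitude threshold for a 'fast' movement.
    """
    magnitude = np.linalg.norm(gyro_xyz, axis=1)      # rotation speed per sample
    above = np.nonzero(magnitude > threshold_dps)[0]  # samples exceeding the threshold
    if above.size == 0:
        raise ValueError("No synchronization movement found")
    return int(above[0])                              # first sample of the control movement

# Example: convert the detected spike to seconds using the 100 Hz sampling rate.
# t_sync_seconds = find_sync_event(gyro_s3) / 100.0
```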
Figure 1 shows the sensing devices and the cameras worn by a test subject in his daily living environment. Sensing devices S2 and S3 were attached directly to the skin of the test subject using two elastic bands, while sensing devices S1, S4 and S5 were firmly attached to the clothing of the subject using an orthopedic vest and elastic bands, so that the movement of the clothes did not modify the initial position of the sensors.
Figure 2 shows an example of each action performed by a test subject during a trial at his home. During a pilot test it was observed that the field of view of the Google Glass was too narrow and too short to record the grasping actions performed by the subjects over a table, so it was decided to place the GoPro camera at chest height to complement the information captured on video. As an example, in the top snapshots of Figure 2a–c the action performed by the subject cannot be distinguished. Conversely, the actions performed at the head of the subject cannot be properly distinguished using the camera placed on the subject’s chest, see Figure 2d,e.
The duration of the trials in daily living environments of the five test subjects ranged from 17 min to 36 min (mean of 23 min ± 6 min). The actions performed for each trial were segmented and labeled manually according to the recorded video. The number of correctly segmented/labeled actions was 90 (46% of the total of captured actions), which were used as instances for the following analysis. The distribution of instances according to each action/class is presented in Table 1.
The action with the most instances is `walking’, since it is the intermediate action between the other actions. Conversely, the actions with the fewest instances are `ascending stairs’ and `descending stairs’, because the house of one test subject has only one floor, while another subject did not use his stairs in any of the three trials.

3.3. Feature Extraction

A set of joint angles L is estimated based on kinematic models of the upper and lower limbs [19]. Each orientation l depends on the degrees of freedom of the joint, L = {SH_l, EL_l, HP_l, KN_l}, which correspond respectively to the shoulder, elbow, hip, and knee joints. The human body is composed of bones linked by joints forming the skeleton and covered by soft tissue, such as muscles [20]. If the bones are considered as rigid segments, it is possible to assume that the body is divided into regions or anatomical segments, so that the motion among these segments can be described by the same methods used for manipulator kinematics.
The degrees of freedom modeled for the right upper limb are SH_l = {l_f/e, l_a/a, l_i/e} and EL_l = {l_f/e, l_p/s}, where f/e, a/a, i/e and p/s denote flexion/extension, abduction/adduction, internal/external rotation, and pronation/supination, respectively. Similarly, the degrees of freedom modeled for the opposite lower limb are HP_l = {l_f/e, l_a/a, l_i/e} and KN_l = {l_f/e, l_i/e}. The description of human motion can be explained by the movements among the segments and the range of motion of each joint, as illustrated in Figure 3.
Orientation signals are used here as recordings of human movement in general. Such recordings, in the form of time-series, may be captured during movement tests in controlled environments, such as human gait tests for clinical purposes, or during experiments in uncontrolled environments, such as monitoring activities in the homes of test subjects. This research is mainly concerned with the second type of experiment.
This work involves capturing data from the subjects during long periods of time, in which the subjects perform several actions. Therefore, it is necessary to segment the recorded data according to the actions of interest for the study. Each data segment w_i = (t1, t2) is defined by its start time t1 and end time t2 within the time-series. The segmentation step yields a set of segments W = {w_1, ..., w_m}, in which each segment w_i contains an activity y_i.
Segmenting a continuous sensor recording is a difficult task. People perform daily activities fluently, which means that actions can be paused, interleaved or even concurrent [21]. Thereby, in the absence of sharp boundaries, actions can be easily confounded. Moreover, the exact boundaries of an action are often difficult to define. An action of mouth care, for instance, might start when the subject reaches for the toothbrush or when he/she starts brushing the teeth; in the same way, the action might end when the subject finishes rinsing his/her mouth or when the toothbrush is put down. For this reason, a protocol was defined for segmenting and labeling the time-series L. This protocol relies on the video recordings captured during the tests in the daily living environments of the subjects.
Feature extraction reduces the segmented time-series W into features that must be discriminative for the actions at hand. Feature vectors hlf_Xi are extracted from the segments in W, with F as the feature extraction function, as expressed in Formula (1).
hlf_Xi = F(L, w_i)    (1)
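To make the segmentation and Formula (1) concrete, the following is a minimal sketch of the interface, assuming the joint-angle time-series are stored as NumPy arrays; the Segment class and function names are illustrative, not part of the original method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import numpy as np

@dataclass
class Segment:
    t1: int      # start sample of the action within the time-series
    t2: int      # end sample of the action
    label: str   # action label assigned from the video recording

def apply_feature_extraction(
    L: Dict[str, np.ndarray],   # joint-angle time-series, e.g. {"SH_f/e": ..., "KN_f/e": ...}
    W: List[Segment],           # segments w_i = (t1, t2) with their activity labels
    F: Callable[[Dict[str, np.ndarray], Segment], np.ndarray],
) -> np.ndarray:
    """Formula (1): compute hlf_Xi = F(L, w_i) for every segment w_i in W."""
    return np.vstack([F(L, w) for w in W])
```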
The features corresponding to the same action should be clustered in the feature space, while the features corresponding to different actions should be separated. At the same time, the selected type of features needs to be robust across different people as well as to the intraclass variability of an action.
The selection of features requires previous knowledge about the type of actions to recognize; e.g., to discriminate the walking action from the resting action, it may be sufficient to select the energy of the acceleration signals as a feature, whereas the very same feature would not be enough to discriminate the walking action from the ascending stairs action. In the field of activity recognition, several features have been used to discriminate different numbers of actions [22,23,24].
Even though signal-based (time-domain and frequency-domain) features have been widely used in the activity recognition field, these features lack a description of the type of movement that people perform. One of the advantages of using joint angles is that the time-series can be characterized in terms of movements. This representation, based on anatomical terms of movement, can be used not only for classification purposes but also for describing how people perform actions. From now on, features extracted from joint angles are referred to as high-level features or HLF.
The anatomical terms of movement vary according to the degrees of freedom modeled for each joint, as summarized below (a minimal encoding of this mapping is sketched after the list):
  • Shoulder joint: flexion/extension, abduction/adduction, and internal/external rotation.
  • Elbow joint: flexion/extension, pronation/supination.
  • Hip joint: flexion/extension, abduction/adduction, and internal/external rotation.
  • Knee joint: flexion/extension, and internal/external rotation.
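The sketch below simply encodes this per-joint mapping as a plain dictionary; the names and abbreviations are illustrative.

```python
# Movement terms modeled per joint (Section 3.3); keys and labels are illustrative.
MOVEMENT_TERMS = {
    "shoulder": ["flexion/extension", "abduction/adduction", "internal/external rotation"],
    "elbow":    ["flexion/extension", "pronation/supination"],
    "hip":      ["flexion/extension", "abduction/adduction", "internal/external rotation"],
    "knee":     ["flexion/extension", "internal/external rotation"],
}

# Ten modeled degrees of freedom in total: 3 + 2 + 3 + 2.
assert sum(len(terms) for terms in MOVEMENT_TERMS.values()) == 10
```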
The first stage for extracting high-level features hlf_Xi from the orientation time-series (L, w_i) is searching for tendencies in the signals that are related to the anatomical terms of movement. Three tendencies have been defined and used as templates to find matches along the time-series, as shown in Figure 4. The first one is ascending (Figure 4a) during a time given by temp_asc, and it is used for searching for movements of flexion, abduction, internal rotation and pronation. The second one is descending (Figure 4b) during a time given by temp_desc, and it is used for searching for movements of extension, adduction, external rotation and supination. The third one (Figure 4c), during a time given by temp_neu, is related to resting lapses or to readings with a very small arc of movement.
The three templates depend on the values used to build the template signals temp_asc, temp_desc, and temp_neu. Two approaches are explored for obtaining such values: static and dynamic. The static approach consists of using an a priori value based on prior knowledge of the performed actions, whereas the dynamic approach searches for tendencies as a function of the magnitude of the movements performed during the tests.
The complete description of the high-level feature extraction is detailed in Algorithm 1.
The dynamic time warping algorithm, DTW, is used to match the templates to the time-series [25]. This algorithm is able to compare samples of different lengths in order to find the optimal match between them and each template. Finally, the HLF extracted from each signal of (L, w_i) are concatenated to form the hlf_Xi vector.
Algorithm 1: Summary of the high-level feature extraction
Inputs:
     (L, w_i): segmented time-series comprising {SH_l, EL_l, HP_l, KN_l}.
     window: window size for searching high-level features.
     mode: type of high-level feature search, {static, dynamic}.
     magnitude: magnitude value for the templates, {15, 30, 45} for static mode and {full, half} for dynamic mode.
Output:
     hlf_Xi: high-level feature vector.
Notation:
     SH, EL, HP, and KN refer to the human joints: shoulder, elbow, hip and knee. fle, ext, abd, add, i_rot, e_rot, pro, sup are the terms of movement: flexion, extension, abduction, adduction, internal rotation, external rotation, pronation and supination, respectively. dist is the distance calculated by the DTW algorithm. num is the set of high-level features of the joints.
[The pseudocode body of Algorithm 1 is provided as a figure in the original publication.]
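The following is a minimal sketch of the template-matching idea behind the high-level feature extraction, assuming joint-angle signals in degrees; the template construction, the non-overlapping window handling, and the direct DTW implementation are illustrative simplifications of Algorithm 1, not its exact procedure.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def make_templates(length, magnitude_deg):
    """Ascending, descending, and neutral templates for a given arc of movement (degrees)."""
    return {
        "asc":  np.linspace(0.0, magnitude_deg, length),   # flexion/abduction/internal rotation/pronation
        "desc": np.linspace(magnitude_deg, 0.0, length),   # extension/adduction/external rotation/supination
        "neu":  np.zeros(length),                          # resting lapses / very small arcs
    }

def hlf_counts(signal, window, magnitude_deg):
    """Count which tendency best matches each window of one joint-angle signal."""
    templates = make_templates(window, magnitude_deg)
    counts = {"asc": 0, "desc": 0, "neu": 0}
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        chunk = chunk - chunk[0]   # compare shapes, not absolute angles
        best = min(templates, key=lambda name: dtw_distance(chunk, templates[name]))
        counts[best] += 1
    return counts
```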

3.4. Action Classification

The classification is divided into two stages: training and testing. In supervised learning, classification is the process of identifying to which class from a set of classes a new instance belongs (testing), on the basis of a training set of data containing instances whose class is known (training).
Training is performed using training data T = {(X_i, y_i)}, i = 1, ..., n, with n pairs of feature vectors X_i and corresponding ground-truth labels y_i. A model is built from the patterns found in the training data T using a supervised inference method I before being used in the testing stage. If a parametric algorithm is used to build the model, its parameters λ must be learned to minimize the classification error on T. In contrast, nonparametric algorithms take the labeled training data as parameter, λ = T, without further training.
Testing is performed using a trained model with parameters λ, mapping each new feature vector χ_i to a set of class labels Y = {y_1, ..., y_c} with corresponding scores P_i = {p_i1, ..., p_ic}, as defined in Formula (2).
p_i(y | χ_i, λ) = I(χ_i, λ), for each y ∈ Y    (2)
with the inference method I. Then, the calculated scores P_i are used to obtain the maximum score and to select the corresponding class label y_i as the classification output, as expressed by Formula (3).
y_i = argmax_{y ∈ Y, p ∈ P_i} p_i(y | χ_i, λ)    (3)
Three inference algorithms were selected because they are appropriate for dealing with problems involving unbalanced data [26,27]. The Naive Bayes (NB) classifier is a classification method founded on Bayes’ theorem and based on estimated conditional probabilities [28]. The input attributes are assumed to be independent of each other given the class. This assumption is called conditional independence. The NB method involves a learning step in which the probability of each class and the probabilities of the attributes given the class are estimated, based on their frequencies over the training data. The set of these estimates corresponds to the learned model. Then, new instances are classified by maximizing the function of the learned model [29].
K-Nearest Neighbours (KNN) is a non-parametric method that does not need any modeling or explicit training phase before the classification process [30]. To classify a new instance, the KNN algorithm calculates the distances between the new instance and all training points. A new instance is assigned to the most common class according to a majority vote of its k nearest neighbours. The KNN algorithm is sensitive to the local structure of the data. The selection of the parameter k, the number of considered neighbours, is a very important issue that can affect the decision made by the KNN classifier.
Boosting produces an accurate prediction rule by combining rough and moderately inaccurate rules [31]. The purpose of boosting is to sequentially apply a weak classification algorithm to repeatedly modified versions of the data, and to combine the predictions from all of them through a weighted majority vote to produce the final prediction [32]. AdaBoost (AB) is a type of adaptive boosting algorithm that incrementally trains a base classifier by suitably increasing the pattern weights to favour the misclassified data [33]. Initially all of the weights are set equally, so that the first step trains the classifier on the data in the usual manner. For each successive iteration, the instance weights are individually modified and the classification algorithm is reapplied. Those instances that were misclassified at the previous step have their weights increased, whereas the weights are decreased for those that were classified correctly. Thus, as the iterations proceed, each successive classifier is forced to concentrate on those training instances that are missed by the previous ones in the sequence.
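As a sketch of how the three inference algorithms and the prediction rule of Formulas (2) and (3) could be set up, assuming scikit-learn and a feature matrix X with labels y; the hyperparameters shown are illustrative, since the paper does not report them.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier

# X: (n_instances, n_features) high-level feature matrix; y: action labels.
classifiers = {
    "NB":  GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),   # k is illustrative; the paper does not report it
    "AB":  AdaBoostClassifier(n_estimators=50),   # weak learners are decision stumps by default
}

# Training stage (model building):
# for name, clf in classifiers.items():
#     clf.fit(X_train, y_train)

def predict_action(model, x_new):
    """Formulas (2)-(3): score every class and return the label with the maximum score."""
    scores = model.predict_proba(x_new.reshape(1, -1))[0]   # P_i = {p_i1, ..., p_ic}
    return model.classes_[np.argmax(scores)]                # argmax over Y
```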

4. Results

With the aim of recognizing the actions listed in Table 1, the high-level features were extracted from the orientation signals (L, w_i) according to the method detailed in Section 3.3. Then, classifiers were built using the cross product of three feature sets, namely (1) features based on the static approach, (2) features based on the dynamic approach, and (3) features based on both static and dynamic approaches, and the three inference algorithms described in Section 3.4. For evaluating the performance of each classifier, three standard metrics were calculated: sensitivity, specificity and F-measure.
Sensitivity is the True Positive Rate (TPR), also called Recall, and measures the proportion of positive instances that are correctly identified, as defined in Formula (4). Specificity is the True Negative Rate (TNR) and measures the proportion of negative instances that are correctly identified, see Formula (5). The F-measure, or F1, is the harmonic mean of precision and recall, as indicated in Formula (6).
TPR = True positives / (True positives + False negatives)    (4)
TNR = True negatives / (True negatives + False positives)    (5)
F1 = 2 · Precision · Recall / (Precision + Recall)    (6)
where True positives are the instances correctly identified, False negatives are the instances incorrectly rejected, True negatives are the instances correctly rejected, False positives are the instances incorrectly identified, and Precision is the positive predictive value, as indicated in Formula (7).
Precision = True positives / (True positives + False positives)    (7)
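A small sketch of how the metrics of Formulas (4)–(7) can be computed per class from a confusion matrix, using a one-vs-rest convention; this is an illustration, not the authors' evaluation code.

```python
import numpy as np

def per_class_metrics(conf):
    """Per-class TPR, TNR, and F1 (Formulas (4)-(7)) from a confusion matrix.

    conf[i, j] = number of instances of class i classified as class j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fn = conf.sum(axis=1) - tp        # actual class i, predicted as something else
    fp = conf.sum(axis=0) - tp        # predicted class i, actually something else
    tn = conf.sum() - tp - fn - fp
    tpr = tp / (tp + fn)              # sensitivity / recall
    tnr = tn / (tn + fp)              # specificity
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return tpr, tnr, f1
```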
k-fold cross-validation with k = 5 was used for partitioning the datasets. Thereby, the feature datasets were randomly partitioned into k equally sized subsamples. Of the k subsamples, one subsample was used for testing the classifier, and the remaining k−1 subsamples were used for training it. The cross-validation process was then repeated k times, with each of the k subsamples used exactly once as the testing subsample. The k results from the folds were averaged to produce a single estimate.
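A possible 5-fold cross-validation loop is sketched below, assuming scikit-learn and NumPy arrays X, y; the stratified splitting and the weighted F-measure averaging are assumptions, since the paper only states that the data were randomly partitioned and the fold results averaged.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

def cross_validate(clf, X, y, k=5):
    """Average weighted F-measure over k folds, each fold used once for testing."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])         # train on k-1 subsamples
        y_pred = clf.predict(X[test_idx])           # test on the held-out subsample
        scores.append(f1_score(y[test_idx], y_pred, average="weighted"))
    return float(np.mean(scores))                   # single averaged estimate
```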
Table 2, Table 3 and Table 4 show the classification results of the classifiers built using the static approach, the dynamic approach, and both static and dynamic approaches, respectively. In general, the KNN classifiers scored the best results and the AB classifiers scored the worst results.
From Table 2, the actions with the highest rate of instances correctly classified were Standing and Eating, while Ascending stairs was the action with the worst rate of true positives. For the classification using dynamic HLF, summarized in Table 3, the actions with the best rate of instances correctly classified were Eating and Cooking, while Sitting was the action with the worst rate of true positives. Combining the static and dynamic HLF, three actions scored an average TPR greater than 0.9: Standing, Walking and Cooking, while Ascending stairs was again the action with the worst rate of true positives. These values are consistent with the TNR and F-measure metrics too.
To analyze in detail the actions classified by the combined approach summarized in Table 4, the confusion matrices for each classifier were obtained and are shown in Table 5, Table 6 and Table 7. In general, the instances misclassified in the three confusion matrices are confused within the group of actions involving mainly movements of the upper limbs (Cooking, Eating, Doing housework, Grooming and Mouth care) or within the group of actions involving mainly movements of the lower limbs (Ascending and Descending stairs, Sitting, Standing and Walking), with the exception of three instances of the Doing housework and Eating actions, which were misclassified as Walking by the classifier built using the AdaBoost algorithm.
In particular, the AdaBoost classifier correctly classified all instances of Walking and the largest number of instances of the actions that involve mostly the lower limbs, confusing only three instances between pairs of similar actions: Ascending stairs–Descending stairs and Sitting–Standing. However, six of the eight instances of the Mouth care action were incorrectly classified.
The classifiers built using Naive Bayes and K-Nearest Neighbours correctly classified most of the instances of the Cooking, Doing housework and Eating actions. However, they incorrectly classified half of the instances of Ascending and Descending stairs as Walking. Additionally, the KNN classifier confused most instances of Sitting with Standing.

5. Discussion

The set of high-level features proposed in the present study allows the classification of the actions of interest with a sensitivity close to 0.800. The most important aspect to highlight is that, although some instances were misclassified by the three classifiers, most of these instances were confused with similar actions, which reflects the consistency between the features and the joint signals that were used.
In order to compare the proposed set of features, two other types of signal-based features were extracted from the orientation signals of the performed experiment. This set of features is subdivided into time-domain and frequency-domain features. The time-domain features are: arithmetic mean, standard deviation, range of motion, zero-crossing rate, and root mean square. The frequency-domain features are extracted from the power spectral density of the signal in the frequency bands 0–2 Hz, 2–4 Hz, and 4–6 Hz [34]. All features were extracted for the segmented time-series W, as expressed by Formula (1).
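For reference, the following is a minimal sketch of the two signal-based feature sets used in the comparison, assuming a sampling rate of 100 Hz and SciPy's Welch estimate of the power spectral density; the exact spectral estimator and window settings used by the authors are not reported, so these choices are illustrative.

```python
import numpy as np
from scipy.signal import welch

def time_domain_features(x):
    """Mean, standard deviation, range of motion, zero-crossing rate, and RMS of one signal."""
    zcr = np.mean(np.abs(np.diff(np.sign(x - np.mean(x)))) > 0)  # crossings of the centred signal
    return np.array([
        np.mean(x),
        np.std(x),
        np.max(x) - np.min(x),       # range of motion
        zcr,
        np.sqrt(np.mean(x ** 2)),    # root mean square
    ])

def frequency_domain_features(x, fs=100.0):
    """Power in the 0-2, 2-4 and 4-6 Hz bands of the power spectral density."""
    f, pxx = welch(x, fs=fs, nperseg=min(256, len(x)))
    bands = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
    return np.array([
        np.trapz(pxx[(f >= lo) & (f < hi)], f[(f >= lo) & (f < hi)])
        for lo, hi in bands
    ])
```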
The classification results comparing the three types of features are summarized in Table 8. From this table, the results scored by the classifiers built using the three types of features are close, although the results using frequency-domain features are the worst ones. Conversely, the best result was scored by the classifier built using time-domain features and the K-Nearest Neighbours algorithm.
As can be noticed, there is no significant difference between the values of the calculated classification metrics. Even though the classification average is higher using the time-domain features, in general, the dispersion of the scores using the high-level features is smaller than the dispersion of the scores using the time-domain features.
Finally, with respect to the related work, the F-measure of 0.806 obtained in this study using data captured in daily living environments is greater than the F-measure of 0.774 reported in [16] using data captured in a structured environment, which is, to the best of our knowledge, the result in the literature closest to ours. Regarding the number of test subjects, in [18] the authors obtained 87.18% of correct classification in a study in which only one person participated. Regarding the types of actions, in [17] 98.3% of correct classification was reported in a study involving six actions with high variability among them; the studied actions were training exercises, in contrast to our study in which the actions were daily living actions. Finally, in contrast to the related work in which the data used for classification were based on time-domain or frequency-domain features, the proposed set of features also describes how the movements of the limbs were performed by the test subjects.

6. Conclusions

In the present research, a set of so-called high-level features for analysing human motion data captured by wearable inertial sensors is proposed. HLF extract motion tendencies from windows of variable size and can be easily calculated. Our set of high-level features was used for the recognition of a set of actions performed by five test subjects in their daily environments, and its discriminant capability under different conditions was analysed and contrasted. The average F-measure of the three classifiers built using the proposed set of features for classifying the ten actions was 0.806 (σ = 0.163).
This study enabled us to expand the knowledge about wearable technologies operating under real conditions during realistic periods of time in daily living environments. Wearable technologies can also complement the monitoring of human actions in smart environments or domestic settings, in which information is collected from multiple environmental sensors as well as video camera recordings [35,36]. In the near future, we plan to analyze user acceptance issues in more detail, namely wearability and comfort limitations.
In order to carry out an in-depth analysis of the feasibility of using high-level features for the classification of human actions, other databases must be used for evaluating the set of features proposed in this research, including databases obtained from video systems, so that the performance of classifiers using this feature set can be evaluated.
Similarly, new high-level metrics can be added to those currently considered, e.g., the duration of each movement term or the precedence between them. Also, a full set combining the three types of features presented in this study, namely high-level features, frequency-domain features and time-domain features, will be evaluated in future classification studies.
The studied actions might also be extended with new actions that were observed during the present study, actions lasting longer than those considered in this research, as well as outdoor actions. In the same way, future work will address the real-time detection of the set of actions performed in daily living environments, including `resting’ and `unrecognized’ actions for those body movements that are distinct from the actions of interest.

Acknowledgments

The first author is supported by the Mexican National Council for Science and Technology, CONACYT, under the grant number 271539/224405.

References

  1. Patel, S.; Hughes, R.; Hester, T.; Stein, J.; Akay, M.; Dy, J.G.; Bonato, P. A novel approach to monitor rehabilitation outcomes in stroke survivors using wearable technology. Proc. IEEE 2010, 98, 450–461. [Google Scholar] [CrossRef]
  2. Chen, L.; Hoey, J.; Nugent, C.; Cook, D.; Yu, Z. Sensor-Based Activity Recognition. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 790–808. [Google Scholar] [CrossRef]
  3. Cicirelli, F.; Fortino, G.; Giordano, A.; Guerrieri, A.; Spezzano, G.; Vinci, A. On the Design of Smart Homes: A Framework for Activity Recognition in Home Environment. J. Med. Syst. 2016, 40, 200. [Google Scholar] [CrossRef]
  4. Chen, L.; Khalil, I. Activity recognition: Approaches, practices and trends. In Activity Recognition in Pervasive Intelligent Environments; Springer: Berlin, Germany, 2011; pp. 1–31. [Google Scholar]
  5. Patel, S.; Park, H.; Bonato, P.; Chan, L.; Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. Neuroeng. Rehabil. 2012, 9, 21. [Google Scholar] [CrossRef] [PubMed]
  6. Bulling, A.; Blanke, U.; Tan, D.; Rekimoto, J.; Abowd, G. Introduction to the Special Issue on Activity Recognition for Interaction. ACM Trans. Interact. Intell. Syst. 2015, 4, 16e:1–16e:3. [Google Scholar] [CrossRef]
  7. Sempena, S.; Maulidevi, N.; Aryan, P. Human action recognition using Dynamic Time Warping. In Proceedings of the International Conference on Electrical Engineering and Informatics, Bandung, Indonesia, 17–19 July 2011; pp. 1–5. [Google Scholar]
  8. López-Nava, I.H.; Arnrich, B.; Muñoz-Meléndez, A.; Güneysu, A. Variability Analysis of Therapeutic Movements using Wearable Inertial Sensors. J. Med. Syst. 2017, 41, 7. [Google Scholar] [CrossRef]
  9. López-Nava, I.H.; Muñoz-Meléndez, A. Wearable Inertial Sensors for Human Motion Analysis: A Review. IEEE Sens. J. 2016, 16, 7821–7834. [Google Scholar] [CrossRef]
  10. Zhang, S.; Xiao, K.; Zhang, Q.; Zhang, H.; Liu, Y. Improved extended Kalman fusion method for upper limb motion estimation with inertial sensors. In Proceedings of the 4th International Conference on Intelligent Control and Information Processing, Beijing, China, 9–11 June 2013; pp. 587–593. [Google Scholar]
  11. Altun, K.; Barshan, B.; Tunçel, O. Comparative study on classifying human activities with miniature inertial and magnetic sensors. Pattern Recognit. 2010, 43, 3605–3620. [Google Scholar] [CrossRef]
  12. Jalal, A.; Kim, Y.H.; Kim, Y.J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [Google Scholar] [CrossRef]
  13. Lillo, I.; Niebles, J.C.; Soto, A. Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos. Image Vis. Comput. 2017, 59, 63–75. [Google Scholar] [CrossRef]
  14. Noor, M.H.M.; Salcic, Z.; Kevin, I.; Wang, K. Adaptive sliding window segmentation for physical activity recognition using a single tri-axial accelerometer. Pervasive Mob. Comput. 2017, 38, 41–59. [Google Scholar] [CrossRef]
  15. Wannenburg, J.; Malekian, R. Physical activity recognition from smartphone accelerometer data for user context awareness sensing. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 3142–3149. [Google Scholar] [CrossRef]
  16. Wang, X.; Suvorova, S.; Vaithianathan, T.; Leckie, C. Using trajectory features for upper limb action recognition. In Proceedings of the IEEE 9th International Conference on Intelligent Sensors, Sensor Networks and Information Processing, Singapore, 21–24 April 2014; pp. 1–6. [Google Scholar]
  17. Ahmadi, A.; Mitchell, E.; Destelle, F.; Gowing, M.; OConnor, N.E.; Richter, C.; Moran, K. Automatic Activity Classification and Movement Assessment During a Sports Training Session Using Wearable Inertial Sensors. In Proceedings of the 11th International Conference on Wearable and Implantable Body Sensor Networks, Zurich, Switzerland, 16–19 June 2014; pp. 98–103. [Google Scholar]
  18. Reiss, A.; Hendeby, G.; Bleser, G.; Stricker, D. Activity Recognition Using Biomechanical Model Based Pose Estimation. In 5th European Conference on Smart Sensing and Context; Springer: Berlin, Germany, 2010; pp. 42–55. [Google Scholar]
  19. López-Nava, I.H. Complex Action Recognition from Human Motion Tracking Using Wearable Sensors. PhD Thesis, Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico, 2018. [Google Scholar]
  20. Marieb, E.N.; Hoehn, K. Human Anatomy & Physiology; Pearson Education: London, UK, 2007. [Google Scholar]
  21. Gu, T.; Wu, Z.; Tao, X.; Pung, H.K.; Lu, J. An emerging patterns based approach to sequential, interleaved and concurrent activity recognition. In Proceedings of the IEEE International Conference on Pervasive Computing and Communications, Galveston, TX, USA, 9–13 March 2009; pp. 1–9. [Google Scholar]
  22. Attal, F.; Mohammed, S.; Dedabrishvili, M.; Chamroukhi, F.; Oukhellou, L.; Amirat, Y. Physical human activity recognition using wearable sensors. Sensors 2015, 15, 31314–31338. [Google Scholar] [CrossRef] [PubMed]
  23. Lara, O.D.; Labrador, M.A. A survey on human activity recognition using wearable sensors. IEEE Commun. Surv. Tutor. 2013, 15, 1192–1209. [Google Scholar] [CrossRef]
  24. Banaee, H.; Ahmed, M.U.; Loutfi, A. Data mining for wearable sensors in health monitoring systems: A review of recent trends and challenges. Sensors 2013, 13, 17472–17500. [Google Scholar] [CrossRef]
  25. Müller, M. Dynamic time warping. In Information Retrieval for Music and Motion; Springer: Berlin/Heidelberg, Germany, 2007; pp. 69–84. [Google Scholar]
  26. Sun, Y.; Wong, A.K.; Kamel, M.S. Classification of imbalanced data: A review. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 687–719. [Google Scholar] [CrossRef]
  27. Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. Part B 2009, 39, 539–550. [Google Scholar]
  28. Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163. [Google Scholar] [CrossRef]
  29. Mitchell, T.M. Machine Learning, 1st ed.; McGraw-Hill, Inc.: New York, NY, USA, 1997. [Google Scholar]
  30. Keller, J.M.; Gray, M.R.; Givens, J.A. A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 1985, 4, 580–585. [Google Scholar] [CrossRef]
  31. Freund, Y.; Schapire, R.E. A desicion-theoretic generalization of on-line learning and an application to boosting. In European Conference on Computational Learning Theory; Springer: Berlin, Germany, 1995; pp. 23–37. [Google Scholar]
  32. Friedman, J.; Hastie, T.; Tibshirani, R. The Elements of Statistical Learning; Springer: Berlin, Germany, 2001; Volume 1. [Google Scholar]
  33. Preece, S.J.; Goulermas, J.Y.; Kenney, L.P.; Howard, D.; Meijer, K.; Crompton, R. Activity identification using body-mounted sensors—A review of classification techniques. Physiol. Meas. 2009, 30, R1–R33. [Google Scholar] [CrossRef]
  34. López-Nava, I.H.; Muñoz-Meléndez, A. Complex human action recognition on daily living environments using wearable inertial sensors. In Proceedings of the 10th EAI International Conference on Pervasive Computing Technologies for Healthcare, Cancun, Mexico, 16–19 May 2016; pp. 138–145. [Google Scholar]
  35. Zhu, N.; Diethe, T.; Camplani, M.; Tao, L.; Burrows, A.; Twomey, N.; Kaleshi, D.; Mirmehdi, M.; Flach, P.; Craddock, I. Bridging e-Health and the Internet of Things: The SPHERE Project. IEEE Intell. Syst. 2015, 30, 39–46. [Google Scholar] [CrossRef]
  36. Tunca, C.; Alemdar, H.; Ertan, H.; Incel, O.D.; Ersoy, C. Multimodal Wireless Sensor Network-Based Ambient Assisted Living in Real Homes with Multiple Residents. Sensors 2014, 14, 9692–9719. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Configuration of the wearable sensors placed on the right upper limb (S2 and S3), the left lower limb (S4 and S5) and the lower back (S1) of a test subject. Two cameras were carried, one on the front side of the vest, C1, and one on the glasses, C2, for labeling purposes.
Figure 2. Actions performed by a subject in his daily living environment while his movements were captured by the wearable inertial sensors and the cameras recorded video. The top image of each Subfigure was captured by the Google Glass, and the bottom image was captured by the GoPro Hero 4 carried on the vest.
Figure 3. Degrees of freedom of upper and lower limbs considered in this study: 3 for shoulder joint (a,b), 3 for hip joint (a,b), 2 for elbow joint (c,d), and 2 for knee joint (c,d).
Figure 4. Templates for searching for tendencies in the time-series W. The ascending template (a) corresponds to the movement terms flexion, abduction, internal rotation and pronation. The descending template (b) corresponds to movements of extension, adduction, external rotation and supination. The neutral template (c) matches the cases without a clear tendency.
Table 1. Distribution of instances according to the actions performed by five subjects in their daily living environments.
Action           | # of Instances | Action            | # of Instances
Cooking          | 7              | Ascending stairs  | 3
Doing housework  | 11             | Descending stairs | 4
Eating           | 10             | Sitting           | 7
Grooming         | 8              | Standing          | 9
Mouth care       | 8              | Walking           | 23
Table 2. Classification results for each action using High-level features extracted with the static approach. KNN: K-nearest neighbours, NB: Naive Bayes, and AB: AdaBoost inference algorithms.
                  |        NB         |        KNN        |        AB
Action            | TPR   TNR   F1    | TPR   TNR   F1    | TPR   TNR   F1
Cooking           | 0.857 0.988 0.857 | 0.857 1.000 0.923 | 0.857 0.952 0.706
Doing housework   | 0.909 0.987 0.909 | 0.909 0.962 0.833 | 0.727 1.000 0.842
Eating            | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.700 0.962 0.700
Grooming          | 0.750 0.976 0.750 | 0.625 0.976 0.667 | 0.750 0.988 0.800
Mouth care        | 0.750 0.976 0.750 | 1.000 1.000 1.000 | 0.500 0.976 0.571
Ascending stairs  | 0.333 0.874 0.133 | 0.333 1.000 0.500 | 0.667 1.000 0.800
Descending stairs | 0.750 0.953 0.545 | 0.500 0.988 0.571 | 1.000 0.988 0.889
Sitting           | 0.857 1.000 0.923 | 0.571 0.964 0.571 | 0.857 1.000 0.923
Standing          | 1.000 1.000 1.000 | 0.889 0.963 0.800 | 1.000 0.988 0.947
Walking           | 0.435 0.970 0.571 | 0.913 0.955 0.894 | 1.000 0.955 0.939
Table 3. Classification results for each action using High-level features extracted with the dynamic approach. KNN: K-nearest neighbours, NB: Naive Bayes, and AB: AdaBoost inference algorithms.
                  |        NB         |        KNN        |        AB
Action            | TPR   TNR   F1    | TPR   TNR   F1    | TPR   TNR   F1
Cooking           | 0.857 1.000 0.923 | 1.000 1.000 1.000 | 0.571 0.976 0.615
Doing housework   | 0.818 0.962 0.783 | 0.727 0.924 0.640 | 0.727 0.949 0.696
Eating            | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.800 0.962 0.762
Grooming          | 0.625 0.951 0.588 | 0.625 0.963 0.625 | 0.500 0.951 0.500
Mouth care        | 0.875 0.976 0.824 | 0.500 1.000 0.667 | 0.625 0.963 0.625
Ascending stairs  | 0.333 1.000 0.500 | 0.667 0.954 0.444 | 0.667 1.000 0.800
Descending stairs | 0.500 1.000 0.667 | 0.500 0.977 0.500 | 0.750 0.988 0.750
Sitting           | 0.429 0.964 0.462 | 0.429 0.952 0.429 | 0.286 0.928 0.267
Standing          | 0.667 0.938 0.600 | 0.778 0.914 0.609 | 0.333 0.938 0.353
Walking           | 0.913 0.955 0.894 | 0.522 0.940 0.615 | 0.913 0.970 0.913
Table 4. Classification results for each action using High-level features extracted with both static and dynamic approaches. KNN: K-nearest neighbours, NB: Naive Bayes, and AB: AdaBoost inference algorithms.
                  |        NB         |        KNN        |        AB
Action            | TPR   TNR   F1    | TPR   TNR   F1    | TPR   TNR   F1
Cooking           | 0.857 0.988 0.857 | 1.000 1.000 1.000 | 0.857 0.964 0.750
Doing housework   | 0.909 0.975 0.870 | 1.000 0.975 0.917 | 0.727 1.000 0.842
Eating            | 1.000 1.000 1.000 | 1.000 1.000 1.000 | 0.600 0.962 0.632
Grooming          | 0.625 0.976 0.667 | 0.750 0.976 0.750 | 0.625 0.951 0.588
Mouth care        | 0.750 0.976 0.750 | 0.750 1.000 0.857 | 0.250 0.951 0.286
Ascending stairs  | 0.333 0.977 0.333 | 0.333 1.000 0.500 | 0.667 0.989 0.667
Descending stairs | 0.750 0.988 0.750 | 0.500 0.988 0.571 | 0.750 0.988 0.750
Sitting           | 0.714 0.988 0.769 | 0.429 0.964 0.462 | 0.857 1.000 0.923
Standing          | 0.889 0.988 0.889 | 1.000 0.951 0.818 | 1.000 0.988 0.947
Walking           | 0.870 0.940 0.851 | 0.870 0.955 0.870 | 1.000 0.955 0.939
Table 5. Confusion matrix of classification results of Table 4 using Naive Bayes algorithm.
Actual \ Classified as  | act01 act02 act03 act04 act05 act06 act07 act08 act09 act10
act01 Cooking           |   6     0     0     0     1     0     0     0     0     0
act02 Doing housework   |   0    10     0     1     0     0     0     0     0     0
act03 Eating            |   0     0    10     0     0     0     0     0     0     0
act04 Grooming          |   0     2     0     5     1     0     0     0     0     0
act05 Mouth care        |   1     0     0     1     6     0     0     0     0     0
act06 Ascending stairs  |   0     0     0     0     0     1     0     0     0     2
act07 Descending stairs |   0     0     0     0     0     0     3     0     0     1
act08 Sitting           |   0     0     0     0     0     0     0     5     1     1
act09 Standing          |   0     0     0     0     0     0     0     1     8     0
act10 Walking           |   0     0     0     0     0     2     1     0     0    20
Table 6. Confusion matrix of classification results of Table 4 using K-Nearest Neighbours algorithm.
Actual \ Classified as  | act01 act02 act03 act04 act05 act06 act07 act08 act09 act10
act01 Cooking           |   7     0     0     0     0     0     0     0     0     0
act02 Doing housework   |   0    11     0     0     0     0     0     0     0     0
act03 Eating            |   0     0    10     0     0     0     0     0     0     0
act04 Grooming          |   0     2     0     6     0     0     0     0     0     0
act05 Mouth care        |   0     0     0     2     6     0     0     0     0     0
act06 Ascending stairs  |   0     0     0     0     0     1     0     1     0     1
act07 Descending stairs |   0     0     0     0     0     0     2     0     0     2
act08 Sitting           |   0     0     0     0     0     0     0     3     4     0
act09 Standing          |   0     0     0     0     0     0     0     0     9     0
act10 Walking           |   0     0     0     0     0     0     1     2     0    20
Table 7. Confusion matrix of classification results of Table 4 using AdaBoost algorithm.
Actual \ Classified as  | act01 act02 act03 act04 act05 act06 act07 act08 act09 act10
act01 Cooking           |   6     0     1     0     0     0     0     0     0     0
act02 Doing housework   |   0     8     0     1     0     0     0     0     0     2
act03 Eating            |   2     0     6     0     1     0     0     0     0     1
act04 Grooming          |   0     0     0     5     3     0     0     0     0     0
act05 Mouth care        |   1     0     2     3     2     0     0     0     0     0
act06 Ascending stairs  |   0     0     0     0     0     2     1     0     0     0
act07 Descending stairs |   0     0     0     0     0     1     3     0     0     0
act08 Sitting           |   0     0     0     0     0     0     0     6     1     0
act09 Standing          |   0     0     0     0     0     0     0     0     9     0
act10 Walking           |   0     0     0     0     0     0     0     0     0    23
Table 8. Classification results using weighted averages based on three types of features: high-level, frequency-domain, and time-domain. σ is the weighted standard deviation.
Algorithm | Metric | High-Level (σ) | Frequency-Domain (σ) | Time-Domain (σ)
NB        | TPR    | 0.822 (0.136)  | 0.778 (0.168)        | 0.844 (0.255)
          | TNR    | 0.973 (0.021)  | 0.981 (0.020)        | 0.986 (0.029)
          | F1     | 0.821 (0.125)  | 0.777 (0.091)        | 0.825 (0.257)
KNN       | TPR    | 0.833 (0.205)  | 0.833 (0.146)        | 0.911 (0.189)
          | TNR    | 0.975 (0.020)  | 0.976 (0.015)        | 0.986 (0.015)
          | F1     | 0.826 (0.166)  | 0.829 (0.122)        | 0.906 (0.142)
AB        | TPR    | 0.778 (0.223)  | 0.800 (0.156)        | 0.800 (0.156)
          | TNR    | 0.971 (0.019)  | 0.973 (0.014)        | 0.980 (0.010)
          | F1     | 0.771 (0.198)  | 0.795 (0.108)        | 0.802 (0.159)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
