1. Introduction
The availability and affordability of smart home technology have driven the rapid increase in the number of smart homes. Typically, smart home technologies enable voice-activated functions, automation, monitoring, and the tracking of events such as the status of windows and doors, entry, and presence detection. Besides comfort and convenience, the integration of smart home functionality with the Internet of Things (IoT) and other communication systems creates new possibilities for assisting and monitoring the well-being of aged or disabled people [1]. In particular, activity recognition within smart homes can provide valuable information about the well-being of the smart home residents. Such information can be utilized to automatically adjust the ambient conditions of the rooms through heating, ventilation, and air conditioning (HVAC). It can also be used to detect irregularities within the residents' activities that indicate that assistance is required or that a medical emergency has occurred. In general, human activity recognition systems can be applied to many fields, such as assisted living, injury detection, personal healthcare, elderly care, fall detection, rehabilitation, entertainment, and surveillance in smart home environments [2].
In general, human activity recognition is formulated as a classification problem. It is an important research topic in pattern recognition and pervasive computing [3]. A significant body of literature on machine learning techniques has focused on the automatic recognition of activities performed by people and on the diversity of approaches and methods to address this issue [4,5]. Minarno et al. [6] compared the performance of logistic regression and support vector machines in recognizing activities such as lying down, standing, sitting, walking, and walking upstairs or downstairs. Guan et al. [7] tackled this issue using deep LSTM learners with wearable sensor data. Ramamurthy et al. [8] noted that deep learning methods applied to human activity recognition commonly represent the data better than handcrafted features, owing to their hierarchically self-derived features. Jiang et al. [9] proposed using accelerometer data and convolutional neural networks for real-time human activity recognition. Lee et al. [10] also considered accelerometer data and a convolutional neural network and obtained 92.71% recognition accuracy. Wan et al. [11] compared four neural network algorithms (convolutional, long short-term memory, bidirectional long short-term memory, and multilayer perceptron) for recognizing human behavior from smartphone accelerometer data. Murad et al. [12] noted that the size of convolutional kernels restricts the captured range of dependencies between data samples and suggested using deep recurrent neural networks instead.
This work proposes the use of two body-worn devices, one on the wrist and one on the ankle. These devices measure temperature, humidity, proximity, magnetic field, acceleration, and rotation and transmit live data to a local host computer. Based on the received data and the use of artificial neural networks, the local host computer can recognize several human activity classes. In our previous works [13], IBM SPSS Modeler and IBM SPSS Statistics, software tools commonly used to implement statistical methods, were used to implement feed-forward neural networks and logistic regression. The developed models were designed to recognize multiple predefined human activities and showed acceptable levels of recognition accuracy overall. However, a few shortcomings need to be addressed: some activity categories were too general and difficult to predict, only one test subject was used in the experiment, using two different measurement systems caused synchronization problems, and the accuracy differences between cross-validation and scoring results showed that larger datasets are required. This work aims to solve these problems with a new methodology. Since the previous works clearly showed the superiority of neural networks, this work utilizes a multilayer perceptron neural network. For simplicity of measurement and to address data synchronization issues, the use of room ambient data has been eliminated. Besides introducing new activity classes, the least consistent activity classes have been replaced with more specific ones, which results in better recognition accuracy. To increase the measurement data size, multiple test subjects were used, and new equipment was utilized to increase the sampling rate. These changes resulted in significant recognition accuracy improvements. Overall, this work aimed to increase the recognition accuracy and the number of recognizable activities, and to provide a practical solution that eliminates the typical computational limitations of wearable devices.
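For illustration, the minimal sketch below shows one plausible shape for a single streamed sample and for the combined input vector fed to the neural network. The field names, units, and axis layout are hypothetical assumptions, since the paper does not specify the exact wire format.

```python
# A hypothetical record layout for one sample streamed from a wearable
# device to the local host; field names and units are illustrative
# assumptions, not the actual protocol used in this work.
from dataclasses import dataclass

@dataclass
class WearableSample:
    device: str           # "wrist" or "ankle"
    timestamp_ms: int     # sample time in milliseconds
    temperature_c: float  # temperature reading
    humidity_pct: float   # relative humidity
    proximity: float      # proximity sensor reading
    mag: tuple[float, float, float]    # magnetic field (x, y, z)
    accel: tuple[float, float, float]  # acceleration (x, y, z)
    gyro: tuple[float, float, float]   # rotation rate (x, y, z)

def feature_vector(wrist: WearableSample, ankle: WearableSample) -> list[float]:
    """Concatenate both devices' readings into one input vector for the network."""
    def flat(s: WearableSample) -> list[float]:
        return [s.temperature_c, s.humidity_pct, s.proximity,
                *s.mag, *s.accel, *s.gyro]
    return flat(wrist) + flat(ankle)
```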
2. Related Works
In recent years, data analysis within smart homes has gained significant attention among researchers. Geraldo et al. [14] proposed an intelligent decision-making system for a residential distributed automation infrastructure based on wireless sensors and actuators. The method increased the precision of decision-making with a neural network model and reduced node energy consumption using a temporal correlation mechanism. Ueyama et al. [15] used a probabilistic technique in a remote energy-monitoring alert system. A Markov chain model was used to calculate the entropy of each monitored device, and the method identified novelties with the use of a machine learning algorithm. The results showed that the method could reduce the power consumption of the monitored equipment by 13.7%. Rocha et al. [16] proposed an intelligent decision system based on the fog computing paradigm, which provides efficient management of residential applications. The proposed solution was evaluated in both simulated and real environments. Goncalves et al. [17] determined and mapped out the physical and emotional state of home care users, implemented a participatory design that included the user within their social, psychological, and therapeutic context, and explored the flexibility of the method when applied to older users. Subbaraj et al. [18] described the process of checking the consistent behavior of a context-aware system in a smart home environment using formal modeling and verification methods. The results confirmed the consistent behavior of the context-aware system in the smart environment. Torres et al. [19] designed an offloading algorithm to ensure resource provision in a microfog and synchronize the complexity of data processing through a healthcare environment architecture, and they experimented with face recognition and fall detection. Balakrishnan et al. [20] discussed and reviewed the literature on the definition, purpose, benefits, and technologies of smart homes. Tax et al. [21] investigated the performance of several techniques for human behavior prediction in a smart home. Azzi et al. [22] proposed using a very fast decision tree for activity recognition, formulated as a classification problem in which classes correspond to activities. Sim et al. [23] proposed an acoustic information-based behavior detection algorithm for use in private spaces. The system classified human activities using acoustic information, combined elimination and similarity strategies, and established new rules.
Much of the research in the indirect activity recognition field is focused on fall detection [24,25]. Sadreazami et al. [24] utilized standoff radar and a time series-based method to detect fall incidents. Ahamed et al. [25] used accelerometer data and deep learning methods for fall detection. Other researchers took activity recognition further than fall detection by recognizing multiple human behaviors. Commonly, camera-based techniques are used to recognize multiple predefined human activities. Hsueh et al. [26] used deep learning techniques to learn long-term dependencies in a multi-view detection framework to recognize human behavior. Besides the computational burden, camera-based solutions frequently introduce privacy and security concerns for the residents. Therefore, indirect recognition methods are generally preferred, although they are often limited to presence detection and occupancy monitoring. Szczurek et al. [27] investigated occupancy determination based on time series of CO2 concentration, temperature, and relative humidity. Vanus et al. [28] designed a CO2-based method for human presence monitoring in an intelligent building. The work continued by replacing the measured CO2 with predicted values of CO2. Predictions were performed using neural networks [29], random trees, and linear regression [30].
On a larger scale, others have taken indirect recognition to a more advanced level by recognizing specific human activities. Kasteren et al. [31] introduced a sensor and annotation system for performing activity recognition in a house setting using a hidden Markov model and conditional random fields, resulting in a class accuracy of 79.4%. Nweke et al. [2] reviewed deep learning algorithms for human activity recognition using mobile and wearable sensor networks. Albert et al. [32] used mobile phones for activity recognition in Parkinson's patients. Hassan et al. [33] proposed using smartphone inertial sensors, such as accelerometers and gyroscopes, to recognize human activities; the obtained results showed a mean recognition rate of 89.61%. Zhou et al. [34] used deep learning and datasets collected from smartphones and on-body wearable devices to perform human activity recognition within the Internet of Healthcare Things. In similar studies, Kwapisz et al. [35] and Bayat et al. [36] also suggested using smartphones.
The use of a smartphone as the primary sensor is very convenient, but it comes with major drawbacks: in practice, smartphones fail to identify complicated human activities in real time. Ravi et al. [37] found that using a single triaxial accelerometer to recognize human activity can yield fairly accurate results. The work also showed the limitations of a single sensor worn near the pelvic region for activities that involve movement of only the hands or mouth. Chen et al. [38] noted that the variety of smartphone positions and orientations, as well as the coarse accuracy of their embedded sensors, can pose additional challenges. Other works investigated the use of multiple sensors. Bao et al.'s [39] implementation involved five small biaxial accelerometers worn simultaneously on different parts of the body; decision tree classifiers showed an overall accuracy rate of 84%. Furthermore, the research showed that the recognition accuracy drops only slightly when just two sensors, worn on the thigh and wrist, are used. Trost et al. [40] compared results obtained from hip- and wrist-worn accelerometer data for the recognition of seven classes of activities. On the other hand, Zhang et al. [41] noted that the computational limitations of wearable devices can also represent a challenge in real-world applications. Our implementation involves wrist- and ankle-worn devices that communicate wirelessly with a powerful local host computer, which eliminates these computational limitations.
4. Measurements and Results
The data acquisition was performed in laboratory EB412 at the new Faculty of Electrical Engineering and Computer Science building of the VSB Technical University of Ostrava. Six datasets were obtained from these measurements.
Table 3 shows the number of records in each recorded dataset, where individual letters are assigned to different test subjects and numbers represent different measurement dates. This section evaluates the recognition accuracy of the developed models with the use of cross-validation and scoring.
The analysis was performed using IBM SPSS Modeler. In the first stage, models were trained and evaluated using cross-validation.
Figure 7 shows the developed data stream. It starts with importing the data and continues with selecting the relevant data and assigning a specific type to each field. Once the input data are established, the partition node splits the data into three subsets: training (30% of the total data), testing (30%), and validation (40%). In the next stage, an MLP network is trained, tested, and validated using these partitions.
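As a rough illustration of this stream outside SPSS Modeler, the following scikit-learn sketch performs the same 30/30/40 partitioning and trains a single-hidden-layer MLP. The CSV file name, the "activity" label column, and hyperparameters such as max_iter are assumptions for illustration, not the exact settings used in this work.

```python
# A rough scikit-learn analogue of the Figure 7 stream: import the data,
# split it into 30% training / 30% testing / 40% validation partitions,
# and train a single-hidden-layer MLP.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("dataset_A1.csv")          # hypothetical file name
X = df.drop(columns=["activity"]).values    # wearable sensor features
y = df["activity"].values                   # activity class labels (1-9)

# Split off the 40% validation partition first, then halve the remainder
# into 30% training and 30% testing.
X_rest, X_val, y_rest, y_val = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

scaler = StandardScaler().fit(X_train)      # fit input scaling on training data only
mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)

print("testing accuracy:   ", mlp.score(scaler.transform(X_test), y_test))
print("validation accuracy:", mlp.score(scaler.transform(X_val), y_val))
```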
The above steps were repeated for seven model settings and six different datasets, which resulted in 42 models. These models mostly showed accuracy levels above 99%, which is considerably more accurate than similar implementations. A minimum accuracy of 94.59% was observed in dataset B1, activity class 1, with eight neurons in the hidden layer. On the other hand, many models showed 100% accuracy across multiple activity classes and neuron settings.
Table 4 shows the average accuracy of the models across all nine classes. In general, increasing the number of neurons slightly improves the accuracy, but this improvement reverses in models with more than 128 hidden layer neurons. A closer look shows that these models are limited by the maximum allowed training time (stopping rule SR3) and therefore cannot reach the lowest possible error state and the highest accuracy.
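Continuing the sketch above, the neuron-count comparison behind Table 4 can be reproduced in outline as follows. The list of hidden layer sizes is an assumption (only 8, 128, and 256 neurons are explicitly mentioned in the text), and scikit-learn offers no wall-clock stopping rule analogous to SR3, so training time is merely measured and reported here.

```python
# A hedged sketch of the hidden-layer-size sweep behind Table 4, reusing
# scaler, X_train, y_train, X_val, and y_val from the sketch above.
import time
from sklearn.neural_network import MLPClassifier

for n_hidden in (8, 16, 32, 64, 128, 256, 512):  # assumed seven model settings
    sweep_mlp = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                              max_iter=500, random_state=0)
    t0 = time.perf_counter()
    sweep_mlp.fit(scaler.transform(X_train), y_train)
    elapsed = time.perf_counter() - t0
    acc = sweep_mlp.score(scaler.transform(X_val), y_val)
    print(f"{n_hidden:4d} neurons: validation accuracy {acc:.4f} ({elapsed:.1f} s)")
```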
Table 5 presents the average accuracy of each activity class across multiple datasets. Class 4 is the most accurate and often shows 100% recognition accuracy on average; since it corresponds to relaxing with minimal movement, it is a very consistent activity and easy to recognize. All other activity classes maintained average accuracy levels above 98.86%.
The above results demonstrate extremely accurate recognition rates and the high potential of the introduced method. However, in cross-validation, the training and validation datasets are very similar, so these results indicate the accuracy of models trained with a very large dataset that includes most of the possible events. Many researchers rely only on such cross-validation results. To estimate the real performance of the models on unseen data, however, it is recommended to use entirely different datasets for training and evaluation. This process is called scoring.
Figure 8 shows a scoring data stream. Dataset A1 is used entirely for training, and dataset A2 is used only for evaluation. Since the scored models have never observed the evaluation datasets, noticeable differences in accuracy are expected in comparison with the cross-validation results. The larger the difference, the stronger the indication that a larger training dataset is required.
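The scoring experiment can be sketched as follows: train on one full dataset and evaluate on a dataset the model has never seen. File and column names are hypothetical placeholders, not the exact artifacts used in this work.

```python
# A minimal sketch of the scoring experiment in Figure 8: train on all
# of dataset A1, then evaluate on the entirely unseen dataset A2.
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

df_a1 = pd.read_csv("dataset_A1.csv")   # training dataset (assumed name)
df_a2 = pd.read_csv("dataset_A2.csv")   # evaluation dataset (assumed name)

X_a1, y_a1 = df_a1.drop(columns=["activity"]).values, df_a1["activity"].values
X_a2, y_a2 = df_a2.drop(columns=["activity"]).values, df_a2["activity"].values

scaler = StandardScaler().fit(X_a1)     # fit input scaling on training data only
scorer = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=0)
scorer.fit(scaler.transform(X_a1), y_a1)            # train on all of A1
print("scoring accuracy on A2:", scorer.score(scaler.transform(X_a2), y_a2))
```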
Table 6 shows the average scoring accuracy of the models. Scoring dataset A1 and dataset A2 against each other resulted in averages of 91.35% and 91.04%, which is impressive. On the other hand, datasets B1 and B2 experienced a more significant drop (averages of 79.45% and 77.45%). Further investigation showed that these significant accuracy drops were present only for class 7 and class 9 activities, which is the direct result of the inconsistent actions of the test subject during these activities. Scoring datasets C1 and C2 against each other resulted in 88.72% and 93.60%, which is also an impressive outcome.
Table 7 shows the average scoring accuracy of each activity class across multiple models and datasets. In general, classes 1, 2, 4, and 5 show highly accurate scoring results, whereas class 7 suffers a significant accuracy loss. A closer look shows that this loss is mainly present in the experiment using datasets B1 and B2; the other datasets performed well across all classes and models. In total, the validation accuracy averaged 99.40% and the scoring accuracy averaged 86.94%, and this difference was smaller for specific model settings. Further observations of both evaluations showed that the accuracy levels increased with the hidden layer neuron count, and this relation typically reversed after 128 or 256 neurons due to the maximum allowed training times. The model setting with 256 neurons was selected as the most suitable. The average validation and scoring accuracies of these models were 99.78% and 89.27%, respectively, a difference of approximately 10%. This is a significant improvement over previous implementations.
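One common way to obtain per-class figures such as those in Table 7 is per-class recall derived from the confusion matrix. The sketch below, continuing from the scoring example above, assumes the activity labels are the integers 1 to 9; it is an illustrative reconstruction, not the exact SPSS Modeler computation.

```python
# A hedged sketch of per-class scoring accuracy (per-class recall)
# computed from the confusion matrix of the scored model; reuses
# scorer, scaler, X_a2, and y_a2 from the scoring sketch above.
from sklearn.metrics import confusion_matrix

y_pred = scorer.predict(scaler.transform(X_a2))
cm = confusion_matrix(y_a2, y_pred)            # rows follow sorted labels 1..9
per_class = cm.diagonal() / cm.sum(axis=1)     # correct predictions per true class
for cls, acc in enumerate(per_class, start=1): # assumes labels are 1..9
    print(f"class {cls}: {acc:.4f}")
```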
5. Discussion
This study aimed to introduce a methodology that addresses most of the concerns within activity recognition research. The initial research showed that using ankle- and wrist-worn wearable devices is optimal in terms of the number of recognizable activities. Transmitting the measured body movements wirelessly and processing the data remotely reduces the computational burden on the measurement devices; essentially, this allows simpler and perhaps much smaller devices to be utilized in the future. Remote processing on a powerful local computer also alleviated most of the concerns about the computational limitations of wearable devices and smartphones. This implementation used an MLP with only a single hidden layer, which represents a simpler model and less computationally intensive training, allowing larger models to be trained better in a given time. In a direct comparison with our previous study [13], which used two hidden layers, the cross-validation accuracy was almost identical (within the margin of error). However, due to higher and more stable data acquisition rates, the scoring accuracy was significantly improved.
In addition, this study increased the number of recognizable activities to nine. A total of 84 models were developed to examine the recognition accuracy of these activity classes. The models used for cross-validation (42 models) mostly showed accuracy levels above 99%, which is considerably more accurate than similar implementations and our previous study [13]. The relaxing activity mostly showed 100% recognition accuracy, and the other activities cross-validated to accuracy levels above 98.86%. A minimum accuracy of 94.59% was observed in dataset B1, activity class 1, with eight neurons in the hidden layer. This was expected, since dataset B1 has the smallest data size (Table 3). On the other hand, many models achieved 100% accuracy across multiple activity classes and neuron settings. According to Table 4, increasing the number of neurons slightly improved the accuracy, but this effect was reversed in larger models because they exceeded the maximum allowed training time, which did not allow them to reach a minimum recognition error state. The methodology was further tested using the scoring technique, which resulted in an additional 42 models (Table 6 and Table 7). As mentioned earlier, all activities showed highly accurate scoring results, but the vacuum cleaning activity (class 7) suffered an average accuracy loss. Scoring dataset A1 and dataset A2 against each other resulted in averages of 91.35% and 91.04%, and scoring C1 and C2 against each other resulted in 88.72% and 93.60%, but datasets B1 and B2 experienced a more significant drop (averages of 79.45% and 77.45%), which was mainly caused by class 7's recognition accuracy. Since this problem exists in only one dataset, it can be ruled out as a measurement error. When the class 7 results were removed from the B1/B2 scoring experiments, the scoring accuracy was almost on par with the validation results, which shows that a sufficient amount of training data was used in this research. Further observations of both evaluations showed that the accuracy levels increased with the hidden layer neuron count, and this relation typically reversed after 128 or 256 neurons due to the maximum allowed training times. Overall, the obtained results demonstrated extremely accurate recognition and the high potential of the introduced method.
This work was aimed at introducing a methodology with high recognition accuracy and without the typical computational limitations that are described in most existing research. The novel measurement methodology of this work addressed many previous concerns, such as inaccurate predefined activity classes, the use of a single test subject, and the utilization of two different measurement systems. In future works, the obtained accuracy levels can be further improved by the use of filters and data buffering to eliminate outliers within the prediction results. In addition, the number of activity classes could be further increased.
6. Conclusions
This work addresses many concerns raised by previous research through the use of a new methodology. It utilizes a multilayer perceptron neural network and a novel data acquisition method to recognize nine different human activity classes with impressive accuracy levels. The developed models cross-validated to accuracy levels above 98% across all activity classes. Thanks to the higher data acquisition rates and the subsequently larger datasets, the accuracy difference between cross-validation and scoring was reduced to only 10%. Overall, the recognition accuracy levels were noticeably improved in comparison with the previous implementation. Allowing longer training times may increase the accuracy of larger neural networks and yield even more accurate results, and these results may be further improved by the use of filters and data buffering to eliminate outliers within the prediction results. The novelty of this work lies in the simplified recognition methods, the high recognition accuracy levels, the elimination of the computational burden through the use of a remote computer, the variety of recognizable activities, and the possibilities for integration with smart home technologies. In future works, the trained models will be used in a real-time system that allows live recognition of the smart home residents' activities, with integration and communication with smart home technologies and Internet of Things (IoT) platforms. Furthermore, the number of test subjects, the number of recognizable activities, and the accuracy levels will be increased.