Abstract
This article presents a sophisticated monitoring framework equipped with knowledge representation and computer vision capabilities that aims to provide innovative solutions and support services in the healthcare sector, focusing on clinical and non-clinical rehabilitation and care environments for people with mobility problems. In contemporary pervasive systems, most virtual agents react to humans in predefined ways and usually lack extended dialogue and cognitive competences. The presented tool aims to provide natural, multi-modal human-computer interaction by exploiting state-of-the-art technologies in computer vision, speech recognition and synthesis, knowledge representation and sensor data analysis, and by leveraging prior clinical knowledge and patient history through an intelligent, ontology-driven dialogue manager with reasoning capabilities, which can also access a web search and retrieval engine module. The framework's main contribution lies in its versatility in combining different technologies, while its inherent capability to monitor patient behaviour allows doctors and caregivers to spend less time collecting patient-related information and focus on healthcare. Moreover, by capitalising on voice, sensor and camera data, it may bolster patients' confidence and encourage them to interact naturally with the virtual agent, drastically improving their morale during recuperation.
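To make the multi-modal design concrete, the following is a minimal illustrative sketch, not the authors' implementation, of how per-modality classifier outputs (e.g. from wearable sensors and a camera) could be combined by weighted late fusion before reaching the dialogue manager; the modality names, class labels and weights are all assumed for the example.

    # Illustrative sketch only: hypothetical weighted late fusion of
    # per-modality activity-recognition scores, one plausible realisation
    # of the multi-modal monitoring pipeline described above.
    MODALITY_WEIGHTS = {"accelerometer": 0.4, "skeleton_camera": 0.6}  # assumed

    def fuse_scores(per_modality_scores):
        """Combine class-probability dicts from each modality by weighted sum."""
        fused = {}
        for modality, scores in per_modality_scores.items():
            weight = MODALITY_WEIGHTS.get(modality, 0.0)
            for label, prob in scores.items():
                fused[label] = fused.get(label, 0.0) + weight * prob
        return max(fused, key=fused.get), fused

    activity, scores = fuse_scores({
        "accelerometer":   {"walking": 0.7, "falling": 0.3},
        "skeleton_camera": {"walking": 0.2, "falling": 0.8},
    })
    print(activity)  # 'falling' wins under the assumed weights

Late fusion of this kind lets a confident vision cue outweigh an ambiguous wearable-sensor reading, which is one way a framework like this could trigger a dialogue or caregiver alert from heterogeneous inputs.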




Notes
We use the Description Logic syntax on which OWL 2 is based.
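As a purely illustrative example of this syntax (the concept and role names below are hypothetical, not taken from the paper's ontology), a fall observed by the monitoring framework could be axiomatised as

    FallEvent ⊑ Event ⊓ ∃hasParticipant.Patient ⊓ ∃observedBy.VisionModule

i.e., every fall event is an event that has a patient as participant and was observed by the vision module; axioms of this form are directly expressible in OWL 2.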
Acknowledgements
This research has been co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship & Innovation, under the call RESEARCH-CREATE-INNOVATE (project code: T1EDK-00686).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Mavropoulos, T., Symeonidis, S., Tsanousa, A. et al. Smart integration of sensors, computer vision and knowledge representation for intelligent monitoring and verbal human-computer interaction. J Intell Inf Syst 57, 321–345 (2021). https://doi.org/10.1007/s10844-021-00648-7