Abstract
Human-computer interfaces have always played a fundamental role in the usability of modern software systems and in the interpretability of their commands. With the rise of Artificial Intelligence, such interfaces have begun to bridge the gap between the user and the system, evolving into Adaptive User Interfaces (AUIs). Meta user interfaces take a further step towards the user: they aim at supporting human activities in an ambient interactive space, so that the user can control the surrounding environment and interact with it. This work proposes a meta user interface that exploits the Put-That-There paradigm to enable fast interaction through natural language and gestures. The application scenario is a video surveillance control room, where the speed of actions and reactions is fundamental for urban safety and for the security of drivers and pedestrians. The interaction targets three environments. The first is the control room itself, in which the operator can arrange the monitor views associated with the on-site cameras through vocal commands and gestures, and route the audio either to the headset or to the room speakers. The second concerns video control: moving back and forth to a particular scene showing specific events, or zooming a particular camera in and out. The third allows the operator to dispatch a rescue vehicle to a particular street in case of need. Gesture data are acquired through a Microsoft Kinect 2, which captures pointing and gestures, allowing the user to interact multimodally and thus increasing the naturalness of the interaction; the related module maps the movement information to a particular instruction, complemented by the vocal command that triggers its execution. Vocal commands are interpreted through Microsoft's LUIS (Language Understanding) framework, which supports rapid deployment of the application; furthermore, LUIS makes it possible to extend the domain-specific command list, so that the model can be continuously improved and updated. A testbed procedure investigates both system usability and multimodal recognition performance. The multimodal sentence error rate (the fraction of utterances in which even a single item is recognized incorrectly) is around 15%, resulting from the combination of possible failures in both the ASR and the gesture recognition model. However, intent classification accuracy averages around 89–92% across different users, indicating that most of the errors in multimodal sentences lie in the slot-filling task. Usability was evaluated through a task-completion paradigm (including interaction duration and per-task counts of actions on affordances), learning-curve measurements, and a posteriori questionnaires.
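The abstract does not detail the fusion algorithm, but the Put-That-There idea is to bind deictic words in the spoken command ("that", "there") to time-ordered pointing targets captured by the gesture module. The following is a minimal Python sketch of such a binding under that assumption; all names here (IntentResult, PointingEvent, fuse, MoveView) are hypothetical illustrations, not the authors' API or LUIS's message format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the NLU output and the Kinect pointing events;
# the paper's actual data structures are not specified in the abstract.

@dataclass
class IntentResult:
    intent: str        # e.g. "MoveView", as classified by the language model
    slots: dict        # e.g. {"source": "that", "target": "there"}
    confidence: float

@dataclass
class PointingEvent:
    target_id: str     # monitor/camera/street resolved from the pointing ray
    timestamp: float   # seconds, on the same clock as the utterance

DEICTIC_WORDS = {"that", "there", "this", "here"}

def fuse(nlu: IntentResult, pointings: list[PointingEvent]) -> Optional[dict]:
    """Bind deictic slots to pointing targets in temporal order."""
    events = iter(sorted(pointings, key=lambda e: e.timestamp))
    command = {"intent": nlu.intent}
    for slot, value in nlu.slots.items():
        if value.lower() in DEICTIC_WORDS:
            event = next(events, None)
            if event is None:
                return None  # unresolved deixis: reject the whole sentence
            command[slot] = event.target_id
        else:
            command[slot] = value
    return command

# Example: "put that there", pointing first at camera 12, then at monitor 4.
cmd = fuse(IntentResult("MoveView", {"source": "that", "target": "there"}, 0.93),
           [PointingEvent("camera_12", 3.1), PointingEvent("monitor_4", 4.0)])
assert cmd == {"intent": "MoveView", "source": "camera_12", "target": "monitor_4"}
```

Rejecting the whole sentence when a deictic slot has no matching pointing event mirrors the abstract's sentence-level error measure, where a single misrecognized item (spoken or gestural) invalidates the multimodal utterance.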
Cite this paper
Grazioso, M., Podda, A.S., Barra, S., Cutugno, F. (2021). Natural Interaction with Traffic Control Cameras Through Multimodal Interfaces. In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2021. Lecture Notes in Computer Science, vol. 12797. Springer, Cham. https://doi.org/10.1007/978-3-030-77772-2_33
DOI: https://doi.org/10.1007/978-3-030-77772-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77771-5
Online ISBN: 978-3-030-77772-2