Abstract
Human-computer interfaces have always played a fundamental role in the usability of modern software systems and in the interpretability of their commands. With the rise of Artificial Intelligence, such interfaces have begun to bridge the gap between the user and the system, evolving into Adaptive User Interfaces (AUIs). Meta user interfaces take a further step towards the user: they aim at supporting human activities in an ambient interactive space, so that the user can control the surrounding environment and interact with it. This work proposes a meta user interface that exploits the Put-That-There paradigm to enable fast interaction through natural language and gestures. The application scenario is a video surveillance control room, where the speed of actions and reactions is fundamental for urban safety and for the security of drivers and pedestrians. The interaction targets three environments. The first is the control room itself, in which the operator can arrange the monitor views associated with the on-site cameras through vocal commands and gestures, and route the audio either to the headset or to the room speakers. The second concerns video control: moving back and forth to a particular scene showing specific events, or zooming a particular camera in and out. The third allows the operator to dispatch a rescue vehicle to a particular street in case of need. Gesture data are acquired through a Microsoft Kinect 2, which captures pointing and gestures, allowing the user to interact multimodally and thus increasing the naturalness of the interaction; the related module maps the movement information to a particular instruction, complemented by the vocal command that triggers its execution. Vocal commands are interpreted through Microsoft's LUIS (Language Understanding) framework, which supports rapid deployment of the application; furthermore, LUIS makes it possible to extend the domain-specific command list, so that the model can be continuously improved and updated. A testbed procedure investigates both system usability and multimodal recognition performance. The multimodal sentence error rate (the fraction of utterances in which even a single item is recognized incorrectly) is around 15%, resulting from the combination of possible failures in both the ASR and the gesture recognition model. However, intent classification accuracy averages around 89–92% across different users, indicating that most of the errors in multimodal sentences lie in the slot-filling task. Usability was evaluated through a task-completion paradigm (including interaction duration and per-task counts of actions on affordances), learning-curve measurements, and a posteriori questionnaires.
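The abstract does not detail the fusion algorithm, but the Put-That-There idea is to bind deictic words in the spoken command ("that", "there") to time-ordered pointing targets captured by the gesture module. The following is a minimal Python sketch of such a binding under that assumption; all names here (IntentResult, PointingEvent, fuse, MoveView) are hypothetical illustrations, not the authors' API or LUIS's message format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the NLU output and the Kinect pointing events;
# the paper's actual data structures are not specified in the abstract.

@dataclass
class IntentResult:
    intent: str        # e.g. "MoveView", as classified by the language model
    slots: dict        # e.g. {"source": "that", "target": "there"}
    confidence: float

@dataclass
class PointingEvent:
    target_id: str     # monitor/camera/street resolved from the pointing ray
    timestamp: float   # seconds, on the same clock as the utterance

DEICTIC_WORDS = {"that", "there", "this", "here"}

def fuse(nlu: IntentResult, pointings: list[PointingEvent]) -> Optional[dict]:
    """Bind deictic slots to pointing targets in temporal order."""
    events = iter(sorted(pointings, key=lambda e: e.timestamp))
    command = {"intent": nlu.intent}
    for slot, value in nlu.slots.items():
        if value.lower() in DEICTIC_WORDS:
            event = next(events, None)
            if event is None:
                return None  # unresolved deixis: reject the whole sentence
            command[slot] = event.target_id
        else:
            command[slot] = value
    return command

# Example: "put that there", pointing first at camera 12, then at monitor 4.
cmd = fuse(IntentResult("MoveView", {"source": "that", "target": "there"}, 0.93),
           [PointingEvent("camera_12", 3.1), PointingEvent("monitor_4", 4.0)])
assert cmd == {"intent": "MoveView", "source": "camera_12", "target": "monitor_4"}
```

Rejecting the whole sentence when a deictic slot has no matching pointing event mirrors the abstract's sentence-level error measure, where a single misrecognized item (spoken or gestural) invalidates the multimodal utterance.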
Cite this paper
Grazioso, M., Podda, A.S., Barra, S., Cutugno, F. (2021). Natural Interaction with Traffic Control Cameras Through Multimodal Interfaces. In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2021. Lecture Notes in Computer Science, vol. 12797. Springer, Cham. https://doi.org/10.1007/978-3-030-77772-2_33
DOI: https://doi.org/10.1007/978-3-030-77772-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77771-5
Online ISBN: 978-3-030-77772-2