Natural Interaction with Traffic Control Cameras Through Multimodal Interfaces

  • Conference paper
  • First Online:
Artificial Intelligence in HCI (HCII 2021)

Abstract

Human-computer interfaces have always played a fundamental role in the usability of modern software systems and in the interpretability of their commands. With the rise of Artificial Intelligence, such interfaces have begun to close the gap between the user and the system itself, evolving further into Adaptive User Interfaces (AUI). Meta user interfaces are a further step towards the user: they aim at supporting human activities in an ambient interactive space, so that the user can control the surrounding space and interact with it. This work proposes a meta user interface that exploits the Put That There paradigm to enable fast interaction through natural language and gestures. The application scenario is a video surveillance control room, in which the speed of actions and reactions is fundamental for urban safety and for the security of drivers and pedestrians. The interaction is oriented towards three environments. The first is the control room itself, where the operator can arrange the views on the monitors connected to the on-site cameras through vocal commands and gestures, and route the audio either to a headset or to the room speakers. The second concerns video control, allowing the operator to move back and forth to a particular scene showing specific events, or to zoom in and out on a particular camera. The third allows the operator to dispatch a rescue vehicle to a particular street in case of need. Gesture data are acquired through a Microsoft Kinect 2, which captures pointing and gestures and lets the user interact multimodally, increasing the naturalness of the interaction; the related module maps the movement information to a particular instruction, complemented by vocal commands that trigger its execution. Vocal commands are interpreted by means of Microsoft's LUIS (Language Understanding) framework, which enables fast deployment of the application; furthermore, LUIS makes it possible to extend the domain-specific command list, so that the model can be continuously improved and updated. A testbed procedure investigates both system usability and multimodal recognition performance. The multimodal sentence error rate (the fraction of sentences in which even a single item is recognized incorrectly) is around 15%, resulting from the combination of possible failures in both the ASR and the gesture recognition model. However, intent classification accuracy ranges, on average across different users, around 89–92%, indicating that most of the errors in multimodal sentences lie in the slot-filling task. Usability has been evaluated through a task-completion paradigm (including interaction duration and counts of activity on affordances per task), learning-curve measurements, and a posteriori questionnaires.
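
To make the interaction flow concrete, the following minimal Python sketch (not taken from the paper; all class and function names are hypothetical) illustrates the Put That There style fusion described in the abstract: an intent and its slots, as a LUIS-like language-understanding step might return them, are combined with a Kinect pointing event so that deictic slots such as "that" or "there" are resolved to the object the operator is pointing at.

    # Illustrative sketch only: fuse a spoken intent with a pointing gesture.
    # Names, fields, and the fusion window are assumptions, not the authors' code.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class PointingEvent:
        target_id: str      # e.g. the monitor or camera the operator points at
        timestamp: float    # seconds, from the gesture-recognition module

    @dataclass
    class IntentResult:
        intent: str                                 # e.g. "MoveView", "ZoomCamera"
        slots: dict = field(default_factory=dict)   # entities extracted from speech
        timestamp: float = 0.0

    FUSION_WINDOW_S = 1.5   # assumed maximum gap between speech and gesture

    def fuse(intent: IntentResult, pointing: Optional[PointingEvent]) -> dict:
        """Fill deictic slots ("that"/"there") with the pointed target, if any."""
        command = {"intent": intent.intent, **intent.slots}
        needs_deixis = any(v in ("that", "there") for v in intent.slots.values())
        if needs_deixis and pointing and \
                abs(intent.timestamp - pointing.timestamp) <= FUSION_WINDOW_S:
            command = {k: (pointing.target_id if v in ("that", "there") else v)
                       for k, v in command.items()}
        return command

    if __name__ == "__main__":
        spoken = IntentResult(intent="MoveView",
                              slots={"source": "camera_12", "destination": "there"},
                              timestamp=10.2)
        pointed = PointingEvent(target_id="monitor_3", timestamp=10.6)
        # -> {'intent': 'MoveView', 'source': 'camera_12', 'destination': 'monitor_3'}
        print(fuse(spoken, pointed))

In this reading, the reported ~15% multimodal sentence error rate would count a sentence as wrong whenever any single item of the fused command (intent or slot value) is misrecognized, whether the failure originates in the ASR or in the gesture recognizer.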




Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Silvio Barra.

Editor information

Editors and Affiliations


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Grazioso, M., Podda, A.S., Barra, S., Cutugno, F. (2021). Natural Interaction with Traffic Control Cameras Through Multimodal Interfaces. In: Degen, H., Ntoa, S. (eds) Artificial Intelligence in HCI. HCII 2021. Lecture Notes in Computer Science, vol 12797. Springer, Cham. https://doi.org/10.1007/978-3-030-77772-2_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-77772-2_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77771-5

  • Online ISBN: 978-3-030-77772-2

  • eBook Packages: Computer Science, Computer Science (R0)
