Comparing alternative modalities in the context of multimodal human–robot interaction | Journal on Multimodal User Interfaces

Comparing alternative modalities in the context of multimodal human–robot interaction

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

As interactive technology advances, alternative input modalities are often used in place of conventional ones to create intuitive, efficient, and user-friendly ways of controlling and collaborating with robots. Researchers have examined the efficacy of natural interaction modalities such as gesture or voice in single-task and dual-task scenarios. These investigations have assessed the potential of such modalities in diverse applications, including online shopping, precision agriculture, and mechanical component assembly, which involve tasks such as object pointing and selection. This article addresses the impact on user performance in a practical human–robot interaction application in which a fixed-base robot is controlled through natural alternative modalities. We explored this by investigating the impact of single-task and dual-task conditions on user performance for object picking and dropping. We undertook two user studies: one on single-task scenarios, in which a fixed-base robot was used for object picking and dropping, and one on dual-task conditions, in which a mobile robot was used in a driving scenario. We measured task completion times and estimated cognitive workload with the NASA Task Load Index (TLX), a subjective, multidimensional scale that measures the cognitive workload perceived by a user. The studies revealed that the ranking of completion times for the alternative modalities remained consistent across the single-task and dual-task scenarios. However, the ranking based on perceived cognitive load differed. In the single-task study, the gesture-based modality resulted in the highest TLX score, whereas in the dual-task study the highest TLX score was associated with the eye gaze-based modality. Likewise, the speech-based modality achieved a lower TLX score than eye gaze and gesture in the single-task study, but its TLX score in the dual-task study fell between those of gesture and eye gaze. These outcomes suggest that the efficacy of alternative modalities depends not only on user preferences but also on the specific situational context.
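
As a rough illustration of how a TLX score and a modality ranking of the kind described above could be computed, the sketch below uses the unweighted ("raw") TLX variant, i.e. the mean of the six subscale ratings. The abstract does not state whether the study used raw or weighted TLX, so that choice is an assumption. The function names, modality labels, and ratings are hypothetical placeholders and are not data from the study.

```python
from statistics import mean

# The six NASA-TLX subscales (Hart and Staveland, 1988).
SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")


def raw_tlx(ratings):
    """Raw TLX: unweighted mean of the six subscale ratings (0-100 each).

    Assumption: the unweighted variant is used; the weighted variant
    would also need pairwise-comparison weights per participant.
    """
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return mean(ratings[s] for s in SUBSCALES)


def rank_by(scores):
    """Return modality names ordered from lowest to highest score."""
    return sorted(scores, key=scores.get)


# Hypothetical placeholder ratings, NOT results from the study.
example = {
    "modality_a": dict(zip(SUBSCALES, (40, 20, 30, 25, 35, 30))),
    "modality_b": dict(zip(SUBSCALES, (60, 50, 45, 40, 55, 50))),
}
tlx = {name: raw_tlx(r) for name, r in example.items()}
print(rank_by(tlx))
```

The same `rank_by` helper can be applied to mean task completion times per modality, which makes it straightforward to check whether the two rankings agree within a study and across studies.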

Author information

Corresponding author

Correspondence to Suprakas Saren.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (MP4 48719 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saren, S., Mukhopadhyay, A., Ghose, D. et al. Comparing alternative modalities in the context of multimodal human–robot interaction. J Multimodal User Interfaces 18, 69–85 (2024). https://doi.org/10.1007/s12193-023-00421-w

  • DOI: https://doi.org/10.1007/s12193-023-00421-w
