Is Spoken Language All-or-Nothing? Implications for Future Speech-Based Human-Machine Interaction

  • Chapter
Dialogues with Social Robots

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 427)

Abstract

Recent years have seen significant market penetration for voice-based personal assistants such as Apple’s Siri. However, despite this success, user take-up is frustratingly low. This article argues that there is a habitability gap caused by the inevitable mismatch between the capabilities and expectations of human users and the features and benefits provided by contemporary technology. Suggestions are made as to how such problems might be mitigated, but a more worrisome question emerges: “is spoken language all-or-nothing”? The answer, based on contemporary views on the special nature of (spoken) language, is that there may indeed be a fundamental limit to the interaction that can take place between mismatched interlocutors (such as humans and machines). However, it is concluded that interactions between native and non-native speakers, or between adults and children, or even between humans and dogs, might provide critical inspiration for the design of future speech-based human-machine interaction.

Notes

  1. See [1] for a comprehensive review of the history of speech technology R&D up to, and including, the release of Siri.

  2. It is often argued that such an approach is unimportant as users will habituate. However, habituation only occurs after sustained exposure, and a key issue here is how to increase the effectiveness of first encounters (since that has a direct impact on the likelihood of further usage).

  3. Interestingly, these ideas do appear to be having some impact on the design of contemporary autonomous social agents such as Jibo (which has a childlike and mildly robotic voice) [28].

  4. Members of the same species.

  5. Interestingly, Nass and Brave [8] noted that people speak to poor automatic speech recognition systems as if they were non-native listeners.

  6. Unfortunately, this term has already been coined to refer to a robot’s natural language abilities in robot-robot and robot-human communication [54].

References

  1. Pieraccini, R.: The Voice in the Machine. MIT Press, Cambridge (2012)

  2. Liao, S.-H.: Awareness and Usage of Speech Technology. Masters thesis, Dept. Computer Science, University of Sheffield (2015)

  3. Deng, L., Huang, X.: Challenges in adopting speech recognition. Commun. ACM 47(1), 69–75 (2004)

  4. Minker, W., Pittermann, J., Pittermann, A., Strauß, P.-M., Bühler, D.: Challenges in speech-based human-computer interfaces. Int. J. Speech Technol. 10(2–3), 109–119 (2007)

  5. Gales, M., Young, S.J.: The application of hidden Markov models in speech recognition. Found. Trends Signal Process. 1(3), 195–304 (2007)

  6. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. (2012)

  7. Moore, R.K.: Modelling data entry rates for ASR and alternative input methods. In: Proceedings of the INTERSPEECH-ICSLP, Jeju, Korea (2004)

  8. Nass, C., Brave, S.: Wired for Speech: How Voice Activates and Advances the Human-computer Relationship. MIT Press, Cambridge (2005)

  9. Moore, R.K.: From talking and listening robots to intelligent communicative machines. In: Markowitz, J. (ed.) Robots That Talk and Listen, pp. 317–335. De Gruyter, Boston (2015)

  10. Bernsen, N.O., Dybkjaer, H., Dybkjaer, L.: Designing Interactive Speech Systems: From First Ideas to User Testing. Springer, London (1998)

  11. McTear, M.F.: Spoken Dialogue Technology: Towards the Conversational User Interface. Springer, London (2004)

  12. Lopez Cozar Delgado, R.: Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment. Wiley (2005)

  13. Philips, M.: Applications of spoken language technology and systems. In: Gilbert, M., Ney, H. (eds.) IEEE/ACL Workshop on Spoken Language Technology (SLT) (2006)

  14. Tomko, S., Harris, T.K., Toth, A., Sanders, J., Rudnicky, A., Rosenfeld, R.: Towards efficient human machine speech communication. ACM Trans. Speech Lang. Process. 2(1), 1–27 (2005)

  15. Tomko, S.L.: Improving User Interaction with Spoken Dialog Systems via Shaping. Ph.D. Thesis, Carnegie Mellon University (2006)

  16. Komatani, K., Fukubayashi, Y., Ogata, T., Okuno, H.G.: Introducing utterance verification in spoken dialogue system to improve dynamic Help generation for novice users. In: Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, pp. 202–205 (2007)

  17. Schlangen, D., Skantze, G.: A general, abstract model of incremental dialogue processing. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL-09), Athens, Greece (2009)

  18. Hastie, H., Lemon, O., Dethlefs, N.: Incremental spoken dialogue systems: tools and data. In: Proceedings of NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community, pp. 15–16, Montreal, Canada (2012)

  19. Williams, J.D., Young, S.J.: Partially observable Markov decision processes for spoken dialog systems. Comput. Speech Lang. 21(2), 231–422 (2007)

  20. Gašić, M., Breslin, C., Henderson, M., Kim, D., Szummer, M., Thomson, B., Tsiakoulis, P., Young, S.J.: POMDP-based dialogue manager adaptation to extended domains. In: Proceedings of 14th SIGdial Meeting on Discourse and Dialogue, pp. 214–222, Metz, France (2013)

  21. Mori, M.: Bukimi no tani (the uncanny valley). Energy 7, 33–35 (1970)

  22. Moore, R.K.: A Bayesian explanation of the “Uncanny Valley” effect and related psychological phenomena. Nat. Sci. Rep. 2(864) (2012)

  23. Moore, R.K., Maier, V.: Visual, vocal and behavioural affordances: some effects of consistency. In: Proceedings of the 5th International Conference on Cognitive Systems (CogSys 2012), Vienna (2012)

  24. Gibson, J.J.: The theory of affordances. In: Shaw, R., Bransford, J. (eds.) Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pp. 67–82. Lawrence Erlbaum, Hillsdale (1977)

  25. Worgan, S., Moore, R.K.: Speech as the perception of affordances. Ecolog. Psychol. 22(4), 327–343 (2010)

  26. Balentine, B.: It’s Better to Be a Good Machine Than a Bad Person: Speech Recognition and Other Exotic User Interfaces at the Twilight of the Jetsonian Age. ICMI Press, Annapolis (2007)

  27. Moore, R.K., Morris, A.: Experiences collecting genuine spoken enquiries using WOZ techniques. In: Proceedings of the 5th DARPA Workshop on Speech and Natural Language, New York (1992)

  28. Jibo: The World’s First Social Robot for the Home. https://www.jibo.com

  29. Jokinen, K., Hurtig, T.: User expectations and real experience on a multimodal interactive system. In: Proceedings of the INTERSPEECH-ICSLP Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA (2006)

  30. Gardiner, A.H.: The Theory of Speech and Language. Oxford University Press, Oxford (1932)

  31. Bickerton, D.: Language and Human Behavior. University of Washington Press, Seattle (1995)

  32. Hauser, M.D.: The Evolution of Communication. The MIT Press (1997)

  33. Hauser, M.D., Chomsky, N., Fitch, W.T.: The faculty of language: what is it, who has it, and how did it evolve? Science 298, 1569–1579 (2002)

  34. Everett, D.: Language: The Cultural Tool. Profile Books, London (2012)

  35. Moore, R.K.: Spoken language processing: piecing together the puzzle. Speech Commun. 49(5), 418–435 (2007)

  36. Maturana, H.R., Varela, F.J.: The Tree of Knowledge: The Biological Roots of Human Understanding. New Science Library/Shambhala Publications, Boston (1987)

  37. Cummins, F.: Voice, (inter-)subjectivity, and real time recurrent interaction. Front. Psychol. 5, 760 (2014)

  38. Bickhard, M.H.: Language as an interaction system. New Ideas Psychol. 25(2), 171–187 (2007)

  39. Cowley, S.J. (ed.): Distributed Language. John Benjamins Publishing Company (2011)

  40. Fusaroli, R., Raczaszek-Leonardi, J., Tylén, K.: Dialog as interpersonal synergy. New Ideas Psychol. 32, 147–157 (2014)

  41. Scott-Phillips, T.: Speaking Our Minds: Why Human Communication Is Different, and How Language Evolved to Make It Special. Palgrave MacMillan (2015)

  42. Baron-Cohen, S.: Evolution of a theory of mind? In: Corballis, M., Lea, S. (eds.) The Descent of Mind: Psychological Perspectives on Hominid Evolution. Oxford University Press (1999)

  43. Malle, B.F.: The relation between language and theory of mind in development and evolution. In: Givón, T., Malle, B.F. (eds.) The Evolution of Language out of Pre-Language, pp. 265–284. Benjamins, Amsterdam (2002)

  44. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press, Chicago (1980)

  45. Feldman, J.A.: From Molecules to Metaphor: A Neural Theory of Language. Bradford Books (2008)

  46. Levinson, S.C.: Pragmatics. Cambridge University Press, Cambridge (1983)

  47. Friston, K., Kiebel, S.: Predictive coding under the free-energy principle. Phil. Trans. R. Soc. B 364(1521), 1211–1221 (2009)

  48. Rizzolatti, G., Craighero, L.: The mirror-neuron system. Annu. Rev. Neurosci. 27, 169–192 (2004)

  49. Wilson, M., Knoblich, G.: The case for motor involvement in perceiving conspecifics. Psychol. Bull. 131(3), 460–473 (2005)

  50. Pickering, M.J., Garrod, S.: Do people use language production to make predictions during comprehension? Trends Cogn. Sci. 11(3), 105–110 (2007)

  51. Garrod, S., Gambi, C., Pickering, M.J.: Prediction at all levels: forward model predictions can enhance comprehension. Lang. Cogn. Neurosci. 29(1), 46–48 (2013)

  52. Moore, R.K.: Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity. Int. J. Adv. Robot. Syst. 13(74) (2016)

  53. Fernald, A.: Four-month-old infants prefer to listen to Motherese. Infant Behav. Dev. 8, 181–195 (1985)

  54. Matson, E.T., Taylor, J., Raskin, V., Min, B.-C., Wilson, E.C.: A natural language exchange model for enabling human, agent, robot and machine interaction. In: Proceedings of the 5th International Conference on Automation, Robotics and Applications, pp. 340–345. IEEE (2011)

  55. Serpell, J.: The Domestic Dog: Its Evolution, Behaviour and Interactions with People. Cambridge University Press (1995)

Acknowledgements

This work was supported by the European Commission [grant numbers EU-FP6-507422, EU-FP6-034434, EU-FP7-231868 and EU-FP7-611971], and the UK Engineering and Physical Sciences Research Council [grant number EP/I013512/1].

Author information

Correspondence to Roger K. Moore.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Moore, R.K. (2017). Is Spoken Language All-or-Nothing? Implications for Future Speech-Based Human-Machine Interaction. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_22

  • DOI: https://doi.org/10.1007/978-981-10-2585-3_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2584-6

  • Online ISBN: 978-981-10-2585-3

  • eBook Packages: Engineering, Engineering (R0)
