Personalized Navigation that Links Speaker’s Ambiguous Descriptions to Indoor Objects for Low Vision People

  • Conference paper
Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments (HCII 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12769)

Abstract

Indoor navigation systems guide a user to a specified destination. However, current navigation systems face challenges when a user describes the destination ambiguously. This commonly happens with visually impaired people or people who are unfamiliar with a new environment. For example, in an office, a low-vision person may ask the navigator, “Take me to where I can take a rest.” The navigator may recognize each object (e.g., a desk) in the office but may not recognize where the user can take a rest. To bridge this gap in understanding of the surroundings between low-vision people and a navigator, we propose a personalized interactive navigation system that links a user’s ambiguous descriptions to indoor objects. We build a navigation system that automatically detects and describes objects in the environment using neural-network models. Further, we personalize the navigation by re-training the recognition models on previous interactive dialogues, which may contain the correspondence between the user’s understanding and the visual images or shapes of objects. In addition, we use a GPU cloud to handle the computational cost, and we smooth the navigation by locating the user’s position with Visual SLAM. We discuss further research on customizable navigation with multi-aspect perceptions of disabilities and the limitations of AI-assisted recognition.

Notes

  1. YOLOv4, https://github.com/Tianxiaomo/pytorch-YOLOv4.

  2. Image Captioning, https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning.

  3. OpenVSLAM, https://github.com/xdspacelab/openvslam.

  4. https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html.

  5. Google Glass Enterprise Edition 2, https://www.google.com/glass/tech-spec.

  6. Checkerboard, https://markhedleyjones.com/projects/calibration-checkerboard-collection.

  7. https://cloud.google.com/speech-to-text.

References

  1. Ahmetovic, D., Guerreiro, J., Ohn-Bar, E., Kitani, K.M., Asakawa, C.: Impact of expertise on interaction preferences for navigation assistance of visually impaired individuals. In: Proceedings of the 16th Web For All 2019 Conference - Personalizing the Web, W4A 2019, San Francisco, May 13–15, pp. 31:1–31:9. ACM (2019)

  2. Ahmetovic, D., Mascetti, S., Bernareggi, C., Guerreiro, J., Oh, U., Asakawa, C.: Deep learning compensation of rotation errors during navigation assistance for people with visual impairments or blindness. ACM Trans. Access. Comput. 12(4), 19:1–19:19 (2020)

  3. Ahmetovic, D., Sato, D., Oh, U., Ishihara, T., Kitani, K., Asakawa, C.: Recog: supporting blind people in recognizing personal objects. In: Bernhaupt, R., et al. (eds.) CHI 2020: CHI Conference on Human Factors in Computing Systems, Honolulu, April 25–30, pp. 1–12. ACM (2020)

  4. Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOV4: optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020)

  5. Giudice, N.A., Guenther, B.A., Kaplan, T.M., Anderson, S.M., Knuesel, R.J., Cioffi, J.F.: Use of an indoor navigation system by sighted and blind travelers: performance similarities across visual status and age. ACM Trans. Access. Comput. 13(3), 11:1–11:27 (2020)

  6. Guerreiro, J., Ahmetovic, D., Sato, D., Kitani, K., Asakawa, C.: Airport accessibility and navigation assistance for people with visual impairments. In: Brewster, S.A., Fitzpatrick, G., Cox, A.L., Kostakos, V. (eds.) Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, Glasgow, 04–09, May, p. 16. ACM (2019)

  7. Guerreiro, J., Sato, D., Asakawa, S., Dong, H., Kitani, K.M., Asakawa, C.: Cabot: designing and evaluating an autonomous navigation robot for blind people. In: Bigham, J.P., Azenkot, S., Kane, S.K. (eds.) The 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2019, Pittsburgh, 28–30, October, pp. 68–82. ACM (2019)

  8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Las Vegas, NV, USA, 27–30 June, pp. 770–778. IEEE Computer Society (2016)

  9. Idrees, A., Iqbal, Z., Ishfaq, M.: An efficient indoor navigation technique to find optimal route for blinds using QR codes. CoRR abs/2005.14517 (2020)

  10. Jabnoun, H., Hashish, M.A., Benzarti, F.: Mobile assistive application for blind people in indoor navigation. In: Jmaiel, M., Mokhtari, M., Abdulrazak, B., Aloulou, H., Kallel, S. (eds.) ICOST 2020. LNCS, vol. 12157, pp. 395–403. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51517-1_36

  11. Kayukawa, S., Ishihara, T., Takagi, H., Morishima, S., Asakawa, C.: Blindpilot: a robotic local navigation system that leads blind people to a landmark object. In: Bernhaupt, R., et al. (eds.) Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI 2020, Honolulu, 25–30 April, pp. 1–9. ACM (2020)

  12. Kuriakose, B., Shrestha, R., Sandnes, F.E.: Smartphone navigation support for blind and visually impaired people - a comprehensive analysis of potentials and opportunities. In: Antona, M., Stephanidis, C. (eds.) HCII 2020. LNCS, vol. 12189, pp. 568–583. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49108-6_41

  13. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  14. Liu, W., et al.: SSD: single shot multiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  15. Ohn-Bar, E., Guerreiro, J., Kitani, K., Asakawa, C.: Variability in reactions to instructional guidance during smartphone-based assisted navigation of blind users. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2(3), 131:1–131:25 (2018)

  16. Plikynas, D., Zvironas, A., Gudauskis, M., Budrionis, A., Daniusis, P., Sliesoraityte, I.: Research advances of indoor navigation for blind people: a brief review of technological instrumentation. IEEE Instrum. Meas. Mag. 23(4), 22–32 (2020)

  17. Sato, D., et al.: NavCog3 in the wild: large-scale blind indoor navigation assistant with semantic features. ACM Trans. Access. Comput. 12(3), 14:1–14:30 (2019)

  18. Sherstinsky, A.: Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. CoRR abs/1808.03314 (2018)

  19. Sumikura, S., Shibuya, M., Sakurada, K.: OpenVSLAM: a versatile visual SLAM framework. In: Proceedings of the 27th ACM International Conference on Multimedia MM 2019, pp. 2292–2295 (2019)

  20. Xu, K., et al.: Show, attend and tell: Neural image caption generation with visual attention. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July, JMLR Workshop and Conference Proceedings, vol. 37, pp. 2048–2057. JMLR.org (2015)

  21. Younis, A., Li, S., Jn, S., Hai, Z.: Real-time object detection using pre-trained deep learning models mobilenet-SSD. In: ICCDE 2020: The 6th International Conference on Computing and Data Engineering, Sanya, China, 4–6 January, pp. 44–48. ACM (2020)

Acknowledgement

This work was supported by the Japan Science and Technology Agency (JST CREST: JPMJCR19F2). Research Representative: Prof. Yoichi Ochiai, University of Tsukuba, Japan.

Author information

Corresponding author

Correspondence to Jun-Li Lu.

A Neural Networks used in Recognition Models

We trained our recognition models, shown in Fig. 4, as follows. For object detection, we used YOLOv4 [4], trained in the demonstration on eight object classes: “electric fan”, “monitor”, “chair”, “locker”, “door”, “microwave”, “blackboard”, and “desk”. For describing objects in an environment, we used a typical image captioning model [20]. In the demonstration, sentences of user descriptions were attached to the images of some objects. The user's spoken sentences were transcribed to text with the Google Speech-to-Text API (Footnote 7). Note that we applied transfer learning to the image captioning model, since a basic ability to generate textual descriptions of common visual images might be needed. Specifically, we continued training the image captioning model from weights pre-trained on Microsoft COCO [13].

Fig. 4. Neural networks used in recognition models.
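
The pairing of user descriptions with detected objects described above can be illustrated with a small sketch. This is not the authors' neural model: it replaces the re-trained image captioning network with a simple token-overlap (Jaccard) match, purely to show how a spoken request might be linked to an object through the descriptions attached to it. The `DetectedObject` structure, the helper names, and the example descriptions are all assumptions made for illustration; only the eight class names come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# The eight object classes named in the text.
OBJECT_CLASSES = (
    "electric fan", "monitor", "chair", "locker",
    "door", "microwave", "blackboard", "desk",
)


@dataclass
class DetectedObject:
    """Hypothetical record: a detected class label plus user descriptions attached to it."""
    label: str
    descriptions: List[str] = field(default_factory=list)


def tokens(sentence: str) -> Set[str]:
    """Lowercase a sentence and return its alphabetic words as a set."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in sentence)
    return set(cleaned.split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(request: str, objects: List[DetectedObject]) -> Optional[DetectedObject]:
    """Return the object whose attached description overlaps most with the request.

    Stand-in for the learned captioning model: returns None when nothing overlaps.
    """
    request_tokens = tokens(request)
    best, best_score = None, 0.0
    for obj in objects:
        for description in obj.descriptions:
            score = jaccard(request_tokens, tokens(description))
            if score > best_score:
                best, best_score = obj, score
    return best


if __name__ == "__main__":
    # Illustrative descriptions, as if collected from earlier dialogues with one user.
    scene = [
        DetectedObject("chair", ["a place where I can take a rest"]),
        DetectedObject("microwave", ["the box that heats my lunch"]),
        DetectedObject("door"),
    ]
    match = best_match("Take me to where I can take a rest", scene)
    print(match.label if match else "no match")  # prints "chair"
```

In the system described here, this matching role is played by the image captioning model after re-training on the user's dialogues, so the sketch only conveys the data flow from a spoken sentence to an object label.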

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Lu, JL. et al. (2021). Personalized Navigation that Links Speaker’s Ambiguous Descriptions to Indoor Objects for Low Vision People. In: Antona, M., Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Access to Media, Learning and Assistive Environments. HCII 2021. Lecture Notes in Computer Science, vol 12769. Springer, Cham. https://doi.org/10.1007/978-3-030-78095-1_30

  • DOI: https://doi.org/10.1007/978-3-030-78095-1_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78094-4

  • Online ISBN: 978-3-030-78095-1

  • eBook Packages: Computer Science, Computer Science (R0)
