Abstract
Indoor navigation systems guide a user to a specified destination. However, current systems struggle when the user provides an ambiguous description of the destination, which commonly happens with visually impaired people or people unfamiliar with a new environment. For example, in an office, a low-vision person may ask the navigator, "Take me to where I can take a rest." The navigator may recognize each object (e.g., a desk) in the office but may not recognize where the user can take a rest. To bridge this gap in environmental understanding between low-vision people and the navigator, we propose a personalized interactive navigation system that links a user's ambiguous descriptions to indoor objects. We build a navigation system that automatically detects and describes objects in the environment using neural-network models. Furthermore, we personalize the navigation by re-training the recognition models on previous interactive dialogues, which may contain correspondences between the user's understanding and the visual appearance or shape of objects. In addition, we use a GPU cloud to support the computational cost and smooth the navigation by locating the user's position with Visual SLAM. We discuss further research on customizable navigation with multi-aspect perceptions of disabilities and the limitations of AI-assisted recognition.
Notes
- 2. Image Captioning, https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning.
- 3. OpenVSLAM, https://github.com/xdspacelab/openvslam.
- 5. Google Glass Enterprise Edition 2, https://www.google.com/glass/tech-spec.
- 7.
References
Ahmetovic, D., Guerreiro, J., Ohn-Bar, E., Kitani, K.M., Asakawa, C.: Impact of expertise on interaction preferences for navigation assistance of visually impaired individuals. In: Proceedings of the 16th Web For All 2019 Conference - Personalizing the Web, W4A 2019, San Francisco, May 13–15, pp. 31:1–31:9. ACM (2019)
Ahmetovic, D., Mascetti, S., Bernareggi, C., Guerreiro, J., Oh, U., Asakawa, C.: Deep learning compensation of rotation errors during navigation assistance for people with visual impairments or blindness. ACM Trans. Access. Comput. 12(4), 19:1–19:19 (2020)
Ahmetovic, D., Sato, D., Oh, U., Ishihara, T., Kitani, K., Asakawa, C.: ReCog: supporting blind people in recognizing personal objects. In: Bernhaupt, R., et al. (eds.) CHI 2020: CHI Conference on Human Factors in Computing Systems, Honolulu, April 25–30, pp. 1–12. ACM (2020)
Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOv4: optimal speed and accuracy of object detection. CoRR abs/2004.10934 (2020)
Giudice, N.A., Guenther, B.A., Kaplan, T.M., Anderson, S.M., Knuesel, R.J., Cioffi, J.F.: Use of an indoor navigation system by sighted and blind travelers: performance similarities across visual status and age. ACM Trans. Access. Comput. 13(3), 11:1–11:27 (2020)
Guerreiro, J., Ahmetovic, D., Sato, D., Kitani, K., Asakawa, C.: Airport accessibility and navigation assistance for people with visual impairments. In: Brewster, S.A., Fitzpatrick, G., Cox, A.L., Kostakos, V. (eds.) Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, Glasgow, 04–09, May, p. 16. ACM (2019)
Guerreiro, J., Sato, D., Asakawa, S., Dong, H., Kitani, K.M., Asakawa, C.: Cabot: designing and evaluating an autonomous navigation robot for blind people. In: Bigham, J.P., Azenkot, S., Kane, S.K. (eds.) The 21st International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS 2019, Pittsburgh, 28–30, October, pp. 68–82. ACM (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Las Vegas, NV, USA, 27–30 June, pp. 770–778. IEEE Computer Society (2016)
Idrees, A., Iqbal, Z., Ishfaq, M.: An efficient indoor navigation technique to find optimal route for blinds using QR codes. CoRR abs/2005.14517 (2020)
Jabnoun, H., Hashish, M.A., Benzarti, F.: Mobile assistive application for blind people in indoor navigation. In: Jmaiel, M., Mokhtari, M., Abdulrazak, B., Aloulou, H., Kallel, S. (eds.) ICOST 2020. LNCS, vol. 12157, pp. 395–403. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51517-1_36
Kayukawa, S., Ishihara, T., Takagi, H., Morishima, S., Asakawa, C.: Blindpilot: a robotic local navigation system that leads blind people to a landmark object. In: Bernhaupt, R., et al. (eds.) Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, CHI 2020, Honolulu, 25–30 April, pp. 1–9. ACM (2020)
Kuriakose, B., Shrestha, R., Sandnes, F.E.: Smartphone navigation support for blind and visually impaired people - a comprehensive analysis of potentials and opportunities. In: Antona, M., Stephanidis, C. (eds.) HCII 2020. LNCS, vol. 12189, pp. 568–583. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49108-6_41
Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Ohn-Bar, E., Guerreiro, J., Kitani, K., Asakawa, C.: Variability in reactions to instructional guidance during smartphone-based assisted navigation of blind users. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2(3), 131:1–131:25 (2018)
Plikynas, D., Zvironas, A., Gudauskis, M., Budrionis, A., Daniusis, P., Sliesoraityte, I.: Research advances of indoor navigation for blind people: a brief review of technological instrumentation. IEEE Instrum. Meas. Mag. 23(4), 22–32 (2020)
Sato, D., et al.: NavCog3 in the wild: large-scale blind indoor navigation assistant with semantic features. ACM Trans. Access. Comput. 12(3), 14:1–14:30 (2019)
Sherstinsky, A.: Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. CoRR abs/1808.03314 (2018)
Sumikura, S., Shibuya, M., Sakurada, K.: OpenVSLAM: a versatile visual SLAM framework. In: Proceedings of the 27th ACM International Conference on Multimedia MM 2019, pp. 2292–2295 (2019)
Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: Bach, F.R., Blei, D.M. (eds.) Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July, JMLR Workshop and Conference Proceedings, vol. 37, pp. 2048–2057. JMLR.org (2015)
Younis, A., Li, S., Jn, S., Hai, Z.: Real-time object detection using pre-trained deep learning models MobileNet-SSD. In: ICCDE 2020: The 6th International Conference on Computing and Data Engineering, Sanya, China, 4–6 January, pp. 44–48. ACM (2020)
Acknowledgement
This work was supported by the Japan Science and Technology Agency (JST CREST: JPMJCR19F2). Research Representative: Prof. Yoichi Ochiai, University of Tsukuba, Japan.
A Neural Networks used in Recognition Models
We describe how we trained our recognition models, shown in Fig. 4, as follows. For detecting objects, we used the YOLOv4 model [4], trained on eight object classes for the demonstration: "electric fan", "monitor", "chair", "locker", "door", "microwave", "blackboard", and "desk". For describing objects in the environment, we used a typical image-captioning model [20]. In the demonstration, sentences of user descriptions were attached to images of the corresponding objects; the user's spoken sentences were translated by the Google API (Footnote 7). Note that we applied transfer learning to the image-captioning model, since it needs a basic ability to generate textual descriptions of common visual images. We therefore continued training the image-captioning model from weights pre-trained on Microsoft COCO [13].
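As a rough sketch of this transfer-learning step (not our exact training code), the PyTorch snippet below continues training an encoder-decoder captioning model from COCO-pre-trained weights on image/description pairs collected from dialogues. The DialogueCaptionDataset class, the fine_tune function, and the assumed encoder/decoder interfaces are hypothetical placeholders for illustration.

```python
# Hypothetical sketch: continue training a COCO-pre-trained captioning model
# (CNN encoder + recurrent decoder) on (image, user-description) pairs.
import torch
from torch.utils.data import DataLoader, Dataset


class DialogueCaptionDataset(Dataset):
    """Object images paired with transcribed user descriptions.

    Each sample is (image_tensor, padded_caption_ids, caption_length);
    captions are padded to a fixed length so the default collate works.
    """

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


def fine_tune(encoder, decoder, samples, epochs=10, lr=4e-4, device="cuda"):
    """Continue training from pre-trained weights on dialogue data.

    Assumes decoder(features, captions, lengths) returns word scores aligned
    with the target words captions[:, 1:] (teacher forcing).
    """
    loader = DataLoader(DialogueCaptionDataset(samples), batch_size=32, shuffle=True)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=0)  # assume 0 = <pad>
    optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)

    encoder.to(device).eval()    # keep the pre-trained CNN encoder frozen
    decoder.to(device).train()   # only the decoder is updated here

    for epoch in range(epochs):
        for images, captions, lengths in loader:
            images, captions = images.to(device), captions.to(device)
            with torch.no_grad():
                features = encoder(images)                 # visual features
            scores = decoder(features, captions, lengths)  # (batch, T-1, vocab)
            targets = captions[:, 1:]                      # next-word targets
            loss = criterion(scores.reshape(-1, scores.size(-1)),
                             targets.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")
```

Whether the encoder is also fine-tuned is a design choice; the sketch keeps it frozen for simplicity, whereas our actual training followed the tutorial implementation listed in Footnote 2.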