{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T17:02:21Z","timestamp":1726851741987},"reference-count":31,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,8,2]],"date-time":"2023-08-02T00:00:00Z","timestamp":1690934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100019345","name":"King Salman center For Disability Research","doi-asserted-by":"publisher","award":["KSRG-2023-021"],"id":[{"id":"10.13039\/501100019345","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MTI"],"abstract":"Vision impairment affects an individual\u2019s quality of life, posing challenges for visually impaired people (VIPs) in various aspects such as object recognition and daily tasks. Previous research has focused on developing visual navigation systems to assist VIPs, but there is a need for further improvements in accuracy, speed, and inclusion of a wider range of object categories that may obstruct VIPs\u2019 daily lives. This study presents a modified version of YOLOv4_Resnet101 as backbone networks trained on multiple object classes to assist VIPs in navigating their surroundings. In comparison to the Darknet, with a backbone utilized in YOLOv4, the ResNet-101 backbone in YOLOv4_Resnet101 offers a deeper and more powerful feature extraction network. The ResNet-101\u2019s greater capacity enables better representation of complex visual patterns, which increases the accuracy of object detection. The proposed model is validated using the Microsoft Common Objects in Context (MS COCO) dataset. Image pre-processing techniques are employed to enhance the training process, and manual annotation ensures accurate labeling of all images. The module incorporates text-to-speech conversion, providing VIPs with auditory information to assist in obstacle recognition. The model achieves an accuracy of 96.34% on the test images obtained from the dataset after 4000 iterations of training, with a loss error rate of 0.073%.<\/jats:p>","DOI":"10.3390\/mti7080077","type":"journal-article","created":{"date-parts":[[2023,8,2]],"date-time":"2023-08-02T15:17:17Z","timestamp":1690989437000},"page":"77","source":"Crossref","is-referenced-by-count":8,"title":["Enhancing Object Detection for VIPs Using YOLOv4_Resnet101 and Text-to-Speech Conversion Model"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-0067-692X","authenticated-orcid":false,"given":"Tahani Jaser","family":"Alahmadi","sequence":"first","affiliation":[{"name":"Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University (PNU), P.O. Box 84428, Riyadh 11671, Saudi Arabia"}]},{"given":"Atta Ur","family":"Rahman","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science and Engineering, GIK Institute of Engineering Sciences and Technology, Swabi 23640, Pakistan"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-7507-5267","authenticated-orcid":false,"given":"Hend Khalid","family":"Alkahtani","sequence":"additional","affiliation":[{"name":"Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University (PNU), P.O. 
Box 84428, Riyadh 11671, Saudi Arabia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-7673-5850","authenticated-orcid":false,"given":"Hisham","family":"Kholidy","sequence":"additional","affiliation":[{"name":"Department of Networks and Computer Security, SUNY Polytechnic Institute, College of Engineering, Utica, NY 13502, USA"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"118720","DOI":"10.1016\/j.eswa.2022.118720","article-title":"DeepNAVI: A deep learning based smartphone navigation assistant for people with visual impairments","volume":"212","author":"Kuriakose","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Khan, G., Tariq, Z., and Khan, M.U.G. (2019). Multi-Person Tracking Based on Faster R-CNN and Deep Appearance Features, Intechopen.","DOI":"10.5772\/intechopen.85215"},{"key":"ref_3","first-page":"1","article-title":"Third eye: Object recognition and tracking system to assist visually impaired people","volume":"218","author":"Tambe","year":"2022","journal-title":"Int. Res. J. Mod. Eng. Technol. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Rathi, M., Sahu, S., Goel, A., and Gupta, P. (2022). Personalized Health Framework for Visually Impaired. Informatica, 46.","DOI":"10.31449\/inf.v46i1.2934"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tapu, R., Mocanu, B., and Zaharia, T. (2017). DEEP-SEE: Joint Object Detection, Tracking and Recognition with Application to Visually Impaired Navigational Assistance. Sensors, 17.","DOI":"10.3390\/s17112473"},{"key":"ref_6","unstructured":"Shadi, S., Hadi, S., Nazari, M., and Hardt, W. (2023, June 02). Outdoor Navigation for Visually Impaired Based on Deep Learning. 2019. Volume 2514, pp. 97\u2013406. Available online: https:\/\/ceur-ws.org\/Vol-2514\/paper102.pdf."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Deepa, R., Tamilselvan, E., Abrar, E., and Sampath, S. (2019, January 4\u20136). Comparison of yolo, ssd, faster rcnn for real time tennis ball tracking for action decision networks. Proceedings of the International Conference on Advances in Computing and Communication Engineering (ICACCE), IEEE, Sathyamangalam, India.","DOI":"10.1109\/ICACCE46606.2019.9079965"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kim, J., Sung, J.Y., and Park, S. (2020, January 1\u20133). Comparison of Faster-RCNN, YOLO, and SSD for real-time vehicle type recognition. Proceedings of the IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Seoul, Republic of Korea.","DOI":"10.1109\/ICCE-Asia49877.2020.9277040"},{"key":"ref_9","first-page":"109","article-title":"Development smart eyeglasses for visually impaired people based on you only look once","volume":"20","author":"Hassan","year":"2022","journal-title":"Telkomnika Telecommun. Comput. Electron. Control"},{"key":"ref_10","first-page":"1","article-title":"Convolutional neural network for object detection system for blind people","volume":"11","author":"Wong","year":"2019","journal-title":"J. Telecommun. Electron. Comput. Eng."},{"key":"ref_11","first-page":"9715891","article-title":"Vision Navigator: A Smart and Intelligent Obstacle Recognition Model for Visually Impaired Users","volume":"2022","author":"Suman","year":"2022","journal-title":"Mob. Inf. 
Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"14819","DOI":"10.1109\/ACCESS.2022.3148036","article-title":"CNN-Based Object Recognition and Tracking System to Assist Visually Impaired People","volume":"10","author":"Ashiq","year":"2022","journal-title":"IEEE Access"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shamsollahi, D., Moselhi, O., and Khorasani, K. (2021, January 2\u20134). A Timely Object Recognition Method for Construction using the Mask R-CNN Architecture. Proceedings of the International Symposium on Automation and Robotics in Construction, Dubai, United Arab Emirates.","DOI":"10.22260\/ISARC2021\/0052"},{"key":"ref_14","first-page":"3434","article-title":"An assistive model of obstacle detection based on deep learning: YOLOv3 for visually impaired people","volume":"11","author":"Rachburee","year":"2021","journal-title":"Int. J. Electr. Comput. Eng."},{"key":"ref_15","first-page":"147","article-title":"Development of a Convolutional Neural Network-Based Object Recognition System for Uncovered Gutters and Bollards","volume":"5","author":"Adeyanju","year":"2022","journal-title":"ABUAD J. Eng. Res. Dev."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Rahman, M.M., Manik, M.M.H., Islam, M.M., Mahmud, S., and Kim, J.-H. (2020, January 9\u201312). An Automated System to Limit COVID-19 Using Facial Mask Detection in Smart City Network. Proceedings of the 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), Vancouver, BC, Canada.","DOI":"10.1109\/IEMTRONICS51293.2020.9216386"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.susoc.2021.08.001","article-title":"Application of deep learning and machine learning models to detect COVID-19 face masks\u2014A review","volume":"2","author":"Mbunge","year":"2021","journal-title":"Sustain. Oper. Comput."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xie, L. (2021, January 8\u201310). Analysis of Commodity image recognition based on deep learning. Proceedings of the 6th International Conference on Multimedia and Image Processing, Zhuhai, China.","DOI":"10.1145\/3449388.3449389"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"8992","DOI":"10.3390\/s110908992","article-title":"Integrating Millimeter Wave Radar with a Monocular Vision Sensor for On-Road Obstacle Detection Applications","volume":"11","author":"Wang","year":"2011","journal-title":"Sensors"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3063592","article-title":"Mobile Multi-Food Recognition Using Deep Learning","volume":"13","author":"Pouladzadeh","year":"2017","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"ref_21","unstructured":"Alahmadi, T., and Drew, S. (June, January 28). Subjective evaluation of website accessibility and usability: A survey for people with sensory disabilities. Proceedings of the 14th International Web for All Conference, Perth, Australia."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"449","DOI":"10.3233\/AIS-170441","article-title":"An approach for developing indoor navigation systems for visually impaired people using Building Information Modeling","volume":"9","author":"Ivanov","year":"2017","journal-title":"J. Ambient. Intell. Smart Environ."},{"key":"ref_23","unstructured":"Bhadani, A.K., and Sinha, A.J. (2020). A facemask detector using machine learning and image processing techniques. Eng. Sci. Technol. Int. 
J., 1\u20138."},{"key":"ref_24","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1007\/s11277-019-06294-1","article-title":"Real Time Multi Object Detection for Blind Using Single Shot Multibox Detector","volume":"107","author":"Arora","year":"2019","journal-title":"Wirel. Pers. Commun."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2265","DOI":"10.1007\/s11063-020-10197-9","article-title":"An Evaluation of RetinaNet on Indoor Object Detection for Blind and Visually Impaired Persons Assistance Navigation","volume":"51","author":"Afif","year":"2020","journal-title":"Neural Process Lett."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Alzahrani, N., and Al-Baity, H.H. (2023). Object Recognition System for the Visually Impaired: A Deep Learning Approach using Arabic Annotation. Electronics, 12.","DOI":"10.3390\/electronics12030541"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, Y.T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., and Zitnick, C.L. (2014;, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Huang, R., Pedoeem, J., and Chen, C. (2018, January 10\u201313). YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers. Proceedings of the 2018 IEEE International Conference on Big Data, Seattle, WA, USA.","DOI":"10.1109\/BigData.2018.8621865"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_31","first-page":"8403262","article-title":"Object Detection through Modified YOLO Neural Network","volume":"2020","author":"Ahmad","year":"2020","journal-title":"Sci. Program."}],"container-title":["Multimodal Technologies and Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2414-4088\/7\/8\/77\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,2]],"date-time":"2023-08-02T17:51:14Z","timestamp":1690998674000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2414-4088\/7\/8\/77"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,2]]},"references-count":31,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["mti7080077"],"URL":"https:\/\/doi.org\/10.3390\/mti7080077","relation":{},"ISSN":["2414-4088"],"issn-type":[{"value":"2414-4088","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,2]]}}}
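The abstract describes a pipeline in which the labels of detected obstacles are converted to speech so that VIPs receive auditory feedback. The paper itself provides no code, so the following is only a minimal illustrative sketch of that detection-to-speech step: detect_objects() is a hypothetical placeholder standing in for the trained YOLOv4_Resnet101 detector, and the generic pyttsx3 library is assumed for offline text-to-speech; neither is the authors' actual implementation.

```python
# Minimal sketch of a detection-to-speech flow, under the assumptions stated above.
from typing import List, Tuple

import pyttsx3  # generic offline text-to-speech engine (assumed, not the paper's TTS component)


def detect_objects(frame) -> List[Tuple[str, float]]:
    """Hypothetical placeholder for the trained YOLOv4_Resnet101 detector.

    A real implementation would run the network on `frame` and return
    (class_label, confidence) pairs for each detected obstacle.
    """
    return [("chair", 0.91), ("person", 0.88)]  # dummy output for illustration only


def announce(detections: List[Tuple[str, float]], min_conf: float = 0.5) -> None:
    """Convert sufficiently confident detections into a spoken message."""
    labels = [label for label, conf in detections if conf >= min_conf]
    if not labels:
        return  # nothing confident enough to report
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)  # slightly slower speech for clarity
    engine.say("Ahead of you: " + ", ".join(labels))
    engine.runAndWait()


if __name__ == "__main__":
    frame = None  # a camera frame would normally be captured here (e.g., via OpenCV)
    announce(detect_objects(frame))
```

Filtering detections by a confidence threshold before speaking is a design choice in this sketch, intended to keep the audio channel from being flooded with low-confidence announcements; the paper does not specify how its module rate-limits or filters spoken output.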