{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,8]],"date-time":"2024-06-08T00:33:05Z","timestamp":1717806785172},"reference-count":45,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T00:00:00Z","timestamp":1717718400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation","award":["RS-2022-00167169","RS-2022-00155911"]},{"name":"Convergence security core talent training business support program","award":["IITP-2023-RS-2023-00266615"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"Fisheye cameras play a crucial role in various fields by offering a wide field of view, enabling the capture of expansive areas within a single frame. Nonetheless, the radial distortion characteristics of fisheye lenses lead to notable shape deformation, particularly at the edges of the image, posing a significant challenge for accurate object detection. In this paper, we introduce a novel method, \u2018VP-aided fine-tuning\u2019, which harnesses the strengths of the pretraining\u2013fine-tuning paradigm augmented by visual prompting (VP) to bridge the domain gap between undistorted standard datasets and distorted fisheye image datasets. Our approach involves two key elements: the use of VPs to effectively adapt a pretrained model to the fisheye domain, and a detailed 24-point regression of objects to fit the unique distortions of fisheye images. This 24-point regression accurately defines the object boundaries and substantially reduces the impact of environmental noise. The proposed method was evaluated against existing object detection frameworks on fisheye images, demonstrating superior performance and robustness. Experimental results also showed performance improvements with the application of VP, regardless of the variety of fine-tuning method applied.<\/jats:p>","DOI":"10.3390\/rs16122054","type":"journal-article","created":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T12:05:17Z","timestamp":1717761917000},"page":"2054","source":"Crossref","is-referenced-by-count":0,"title":["Fisheye Object Detection with Visual Prompting-Aided Fine-Tuning"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"http:\/\/orcid.org\/0009-0009-1172-8543","authenticated-orcid":false,"given":"Minwoo","family":"Jeon","sequence":"first","affiliation":[{"name":"College of Software, Kyunghee University, Yongin 17104, Republic of Korea"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-4011-9981","authenticated-orcid":false,"given":"Gyeong-Moon","family":"Park","sequence":"additional","affiliation":[{"name":"College of Software, Kyunghee University, Yongin 17104, Republic of Korea"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-3241-8455","authenticated-orcid":false,"given":"Hyoseok","family":"Hwang","sequence":"additional","affiliation":[{"name":"College of Software, Kyunghee University, Yongin 17104, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2024,6,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1445","DOI":"10.1007\/s11042-013-1641-3","article-title":"Approximate model of fisheye camera based on the optical refraction","volume":"73","author":"Zhu","year":"2014","journal-title":"Multimed. 
Tools Appl."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Choi, K.H., Kim, Y., and Kim, C. (2019). Analysis of Fish-Eye Lens Camera Self-Calibration. Sensors, 19.","DOI":"10.3390\/s19051218"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Yogamani, S., Hughes, C., Horgan, J., Sistu, G., Varley, P., O\u2019Dea, D., Uric\u00e1r, M., Milz, S., Simon, M., and Amende, K. (2019, January 27\u201328). Woodscape: A multi-task, multi-camera fisheye dataset for autonomous driving. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00940"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/j.isprsjprs.2019.11.014","article-title":"Panoramic SLAM from a multiple fisheye camera rig","volume":"159","author":"Ji","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_5","unstructured":"Xiong, Y., and Turkowski, K. (1997, January 17\u201319). Creating image-based VR using a self-calibrating fisheye lens. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s11263-019-01247-4","article-title":"Deep Learning for Generic Object Detection: A Survey","volume":"128","author":"Liu","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gochoo, M., Otgonbold, M.E., Ganbold, E., Hsieh, J.W., Chang, M.C., Chen, P.Y., Dorj, B., Al Jassmi, H., Batnasan, G., and Alnajjar, F. (2023, January 17\u201323). FishEye8K: A Benchmark and Dataset for Fisheye Camera Object Detection. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada.","DOI":"10.1109\/CVPRW59228.2023.00559"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1109\/LRA.2015.2502921","article-title":"An enhanced unified camera model","volume":"1","author":"Khomutenko","year":"2015","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Coors, B., Condurache, A.P., and Geiger, A. (2018, January 8\u201314). Spherenet: Learning spherical representations for detection and classification in omnidirectional images. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_32"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Rashed, H., Mohamed, E., Sistu, G., Kumar, V.R., Eising, C., El-Sallab, A., and Yogamani, S. (2021, January 3\u20138). Generalized Object Detection on Fisheye Cameras for Autonomous Driving: Dataset, Representations and Baseline. Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00232"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lei, X., Sun, B., Peng, J., and Zhang, F. (2020, January 6\u20138). Fisheye Image Object Detection Based on an Improved YOLOv3 Algorithm. Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China.","DOI":"10.1109\/CAC51589.2020.9326859"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Xu, X., Gao, Y., Liang, H., Yang, Y., and Fu, M. (2022, January 23\u201327). Fisheye object detection based on standard image datasets with 24-points regression strategy. 
Proceedings of the 2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan.","DOI":"10.1109\/IROS47612.2022.9981891"},{"key":"ref_13","unstructured":"Wang, X., Xu, X., Gao, Y., Yang, Y., Yue, Y., and Fu, M. (2023). CRRS: Concentric Rectangles Regression Strategy for Multi-point Representation on Fisheye Images. arXiv."},{"key":"ref_14","unstructured":"Bao, H., Dong, L., Piao, S., and Wei, F. (2021, January 3\u20137). BEiT: BERT Pre-Training of Image Transformers. Proceedings of the International Conference on Learning Representations, Vienna, Austria."},{"key":"ref_15","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 9\u201311). A Simple Framework for Contrastive Learning of Visual Representations. Proceedings of the 37th International Conference on Machine Learning, PMLR, Vienna, Austria."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Chen, X., Xie, S., and He, K. (2021, January 11\u201317). An empirical study of training self-supervised vision transformers. Proceedings of the CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00950"},{"key":"ref_17","unstructured":"Yuan, L., Chen, D., Chen, Y.L., Codella, N., Dai, X., Gao, J., Hu, H., Huang, X., Li, B., and Li, C. (2021). Florence: A new foundation model for computer vision. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., and Dong, L. (2022, January 18\u201324). Swin transformer v2: Scaling up capacity and resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01170"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lester, B., Al-Rfou, R., and Constant, N. (2021, January 7\u201311). The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.emnlp-main.243"},{"key":"ref_21","unstructured":"Hu, E.J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021, January 3\u20137). LoRA: Low-Rank Adaptation of Large Language Models. Proceedings of the International Conference on Learning Representations, Virtual."},{"key":"ref_22","unstructured":"Zhang, Y., Zhou, K., and Liu, Z. (2022). Neural Prompt Search. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Jia, M., Tang, L., Chen, B.C., Cardie, C., Belongie, S., Hariharan, B., and Lim, S.N. (2022). Visual prompt tuning. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-031-19827-4_41"},{"key":"ref_24","unstructured":"Bahng, H., Jahanian, A., Sankaranarayanan, S., and Isola, P. (2022). Exploring visual prompts for adapting large-scale models. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, X.L., and Liang, P. (2021, January 1\u20136). Prefix-Tuning: Optimizing Continuous Prompts for Generation. 
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual. Volume 1: Long Papers.","DOI":"10.18653\/v1\/2021.acl-long.353"},{"key":"ref_26","first-page":"1","article-title":"Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing","volume":"55","author":"Liu","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_27","unstructured":"Kenton, J.D.M.W.C., and Toutanova, L.K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the NAACL-HLT, Minneapolis, MN, USA."},{"key":"ref_28","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020, January 26\u201330). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kornblith, S., Shlens, J., and Le, Q.V. (2019, January 15\u201320). Do better imagenet models transfer better?. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00277"},{"key":"ref_31","first-page":"11285","article-title":"TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning","volume":"Volume 33","author":"Cai","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_32","unstructured":"Xuhong, L., Grandvalet, Y., and Davoine, F. (2018, January 10\u201315). Explicit inductive bias for transfer learning with convolutional networks. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_33","unstructured":"Kumar, A., Raghunathan, A., Jones, R., Ma, T., and Liang, P. (2022, January 25\u201329). Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution. Proceedings of the International Conference on Learning Representations, Virtual."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lee, Y., Jeong, J., Yun, J., Cho, W., and Yoon, K.J. (2019, January 15\u201320). SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360\u00b0 Images. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00940"},{"key":"ref_35","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."},{"key":"ref_36","unstructured":"Elsayed, G.F., Goodfellow, I., and Sohl-Dickstein, J. (2018). Adversarial reprogramming of neural networks. arXiv."},{"key":"ref_37","unstructured":"Goodfellow, I.J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv."},{"key":"ref_38","unstructured":"Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Li, F.-F. (2009, January 20\u201325). 
ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_41","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. Proceedings, Part V 13.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_43","unstructured":"Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv."},{"key":"ref_44","unstructured":"Loshchilov, I., and Hutter, F. (2016, January 2\u20134). SGDR: Stochastic Gradient Descent with Warm Restarts. Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico."},{"key":"ref_45","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/12\/2054\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T13:02:32Z","timestamp":1717765352000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/12\/2054"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,7]]},"references-count":45,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["rs16122054"],"URL":"https:\/\/doi.org\/10.3390\/rs16122054","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,7]]}}}
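The abstract above describes adapting a pretrained detector to fisheye imagery with visual prompts (VPs). The record gives no implementation detail, so the sketch below shows one common VP formulation, the padding-style prompt of Bahng et al. (ref_24): a small frame of learnable border pixels added to every input image. The class name `PadPrompter`, the image size, and the pad width are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class PadPrompter(nn.Module):
    """Learnable pixel 'frame' added to every input image (visual prompt).

    A sketch of padding-style visual prompting (Bahng et al., ref_24);
    the VP design actually used in the paper is not specified in this
    record, so the sizes and placement here are assumptions.
    """

    def __init__(self, image_size: int = 224, pad: int = 16):
        super().__init__()
        self.inner = image_size - 2 * pad
        # Four trainable strips that together form a border around the image.
        self.top = nn.Parameter(torch.zeros(1, 3, pad, image_size))
        self.bottom = nn.Parameter(torch.zeros(1, 3, pad, image_size))
        self.left = nn.Parameter(torch.zeros(1, 3, self.inner, pad))
        self.right = nn.Parameter(torch.zeros(1, 3, self.inner, pad))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Zero-filled center: only the border pixels perturb the input.
        center = x.new_zeros(1, 3, self.inner, self.inner)
        row = torch.cat([self.left, center, self.right], dim=3)
        frame = torch.cat([self.top, row, self.bottom], dim=2)
        return x + frame  # broadcasts over the batch dimension
```

Under this setup the prompt parameters receive gradients alongside whatever the chosen fine-tuning method unfreezes (nothing, the head, or the full backbone), which is consistent with the abstract's observation that VP helps regardless of the fine-tuning method applied.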
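The abstract's second element, 24-point regression (ref_12), replaces the axis-aligned box with a multi-point contour that can follow radially distorted object shapes. The record does not spell out the parameterization; a minimal sketch, assuming the common choice of one regressed radial distance per 15° of angle around the object center (both function names here are hypothetical):

```python
import numpy as np

ANGLES = np.deg2rad(np.arange(24) * 15.0)  # 24 rays, one per 15 degrees

def decode_polygon(center, radii):
    """Turn a regressed (cx, cy) center and 24 radii into polygon vertices."""
    cx, cy = center
    xs = cx + radii * np.cos(ANGLES)
    ys = cy + radii * np.sin(ANGLES)
    return np.stack([xs, ys], axis=1)  # shape (24, 2)

def box_radii(w, h):
    """Radii that reproduce an axis-aligned w x h box, showing the
    24-point shape strictly generalizes the usual bounding box."""
    with np.errstate(divide="ignore"):
        rx = (w / 2) / np.abs(np.cos(ANGLES))  # ray hits left/right edge
        ry = (h / 2) / np.abs(np.sin(ANGLES))  # ray hits top/bottom edge
    return np.minimum(rx, ry)
```

Here `decode_polygon(center, box_radii(w, h))` traces an ordinary rectangle, while on fisheye images the regressed radii can instead bend the contour to follow the distorted silhouette; that is what lets the representation define object boundaries more tightly than a box and exclude surrounding background, in line with the abstract's noise-reduction claim.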