{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,29]],"date-time":"2024-10-29T04:16:28Z","timestamp":1730175388642,"version":"3.28.0"},"reference-count":37,"publisher":"Institution of Engineering and Technology (IET)","issue":"12","license":[{"start":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T00:00:00Z","timestamp":1721174400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Image Processing"],"published-print":{"date-parts":[[2024,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>When detecting the objects in videos, motion always leads to object deterioration, like blurring and occlusion, as well as the strange state of the object's shape and posture. Consequently, the detection of video frames will lead to a decline in accuracy by using the image object detection model. This paper proposes an online video object detection method based on the one\u2010stage detector YOLOX. First, the module for space\u2013time feature aggregation is given, which uses the space\u2013time information of past frames to enhance the feature quality of the current frame. Then, the module for result reuse is given, which incorporates the detection results of past frames to improve the detection stability of the current frame. By these two modules, the trade\u2010off between accuracy and speed of video object detection could be achieved. 
Experimental results on the ImageNet VID show the improvement of speed and accuracy of the proposed\u00a0method.<\/jats:p>","DOI":"10.1049\/ipr2.13179","type":"journal-article","created":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T09:41:52Z","timestamp":1721209312000},"page":"3356-3367","update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Video object detection via space\u2013time feature aggregation and result reuse"],"prefix":"10.1049","volume":"18","author":[{"given":"Liang","family":"Duan","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering Yunnan University Kunming China"},{"name":"Yunnan Key Laboratory of Intelligent Systems and Computing Yunnan University Kunming China"}]},{"given":"Rongfei","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering Yunnan University Kunming China"},{"name":"Yunnan Key Laboratory of Intelligent Systems and Computing Yunnan University Kunming China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-3641-1461","authenticated-orcid":false,"given":"Kun","family":"Yue","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering Yunnan University Kunming China"},{"name":"Yunnan Key Laboratory of Intelligent Systems and Computing Yunnan University Kunming China"}]},{"given":"Zhengbao","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Engineering Yunnan University Kunming China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-8449-6861","authenticated-orcid":false,"given":"Guowu","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering Yunnan University Kunming China"},{"name":"Yunnan Key Laboratory of Intelligent Systems and Computing Yunnan University Kunming 
China"}]}],"member":"265","published-online":{"date-parts":[[2024,7,17]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1049\/ipr2.12714"},{"key":"e_1_2_10_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s44196-023-00256-z"},{"key":"e_1_2_10_4_1","first-page":"497","article-title":"A feature temporal attention based interleaved network for fast video object detection","volume":"4","author":"Yang Y.","year":"2021","journal-title":"J. Ambient Intell. Hum. Comput."},{"key":"e_1_2_10_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-021-02649-z"},{"key":"e_1_2_10_6_1","doi-asserted-by":"crossref","unstructured":"Sun G. Hua Y. Hu G. Robertson N.:MAMBA: multi\u2010level aggregation via memory bank for video object detection. In:Proceedings of the AAAI Conference on Artificial Intelligence pp.2620\u20132627.AAAI Publications Washington D.C. (2021)","DOI":"10.1609\/aaai.v35i3.16365"},{"key":"e_1_2_10_7_1","doi-asserted-by":"crossref","unstructured":"Xu R. Mu F. Lee J. Mukherjee P. Chaterji S. Bagchi S. Li Y.:Smartadapt: multi\u2010branch object detection framework for videos on mobiles. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp.2528\u20132538.IEEE Piscataway NJ(2022)","DOI":"10.1109\/CVPR52688.2022.00256"},{"key":"e_1_2_10_8_1","unstructured":"Ge Z. Liu S. Wang F. Li Z. Sun J.:YOLOX: Exceeding YOLO series in 2021. arXiv:2107.08430 (2021)"},{"key":"e_1_2_10_9_1","doi-asserted-by":"crossref","unstructured":"Tian Z. Shen C. Chen H. He T.:FCOS: fully convolutional one\u2010stage object detection. In:Proceedings of the IEEE International Conference on Computer Vision pp.9627\u20139636.IEEE Piscataway NJ(2019)","DOI":"10.1109\/ICCV.2019.00972"},{"key":"e_1_2_10_10_1","doi-asserted-by":"crossref","unstructured":"Fujitake M. Sugimoto A.:Real\u2010time object detection by feature map forecast for live streaming video. 
In:Proceedings of the IEEE International Conference on Multimedia and Expo pp.1\u20136.IEEE Piscataway NJ(2021)","DOI":"10.1109\/ICME51207.2021.9428277"},{"key":"e_1_2_10_11_1","doi-asserted-by":"crossref","unstructured":"Zhu X. Wang Y. Dai J. Yuan L. Wei Y.:Flow\u2010guided feature aggregation for video object detection. In:Proceedings of the IEEE International Conference on Computer Vision pp.408\u2013417.IEEE Piscataway NJ(2017)","DOI":"10.1109\/ICCV.2017.52"},{"key":"e_1_2_10_12_1","doi-asserted-by":"crossref","unstructured":"Deng J. Pan Y. Yao T. Zhou W. Li H. Mei T.:Relation distillation networks for video object detection. In:Proceedings of the IEEE International Conference on Computer Vision pp.7023\u20137032.IEEE Piscataway NJ(2019)","DOI":"10.1109\/ICCV.2019.00712"},{"key":"e_1_2_10_13_1","doi-asserted-by":"crossref","unstructured":"Chen Y. Cao Y. Hu H. Wang L.:Memory enhanced global\u2010local aggregation for video object detection. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp.10337\u201310346.IEEE Piscataway NJ(2020)","DOI":"10.1109\/CVPR42600.2020.01035"},{"key":"e_1_2_10_14_1","doi-asserted-by":"crossref","unstructured":"Fu Z. Liu Q. Fu Z. Wang Y.:STMTrack: template\u2010free visual tracking with space\u2010time memory networks. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp.13774\u201313783.IEEE Piscataway NJ(2021)","DOI":"10.1109\/CVPR46437.2021.01356"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"Redmon J. Divvala S. Girshick R. Farhadi A.:You only look once: unified real\u2010time object detection. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp.779\u2013788.IEEE Piscataway NJ(2016)","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_2_10_16_1","doi-asserted-by":"crossref","unstructured":"Liu W. Anguelov D. Erhan D. Szegedy C. Reed S. Fu C.\u2010Y. Berg A.C.:SSD: single shot multibox detector. 
In:Proceedings of the European Conference on Computer Vision pp.21\u201337.Springer Cham(2016)","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_2_10_17_1","unstructured":"Ren S. He K. Girshick R. Sun J.:Faster R\u2010CNN: towards real\u2010time object detection with region proposal networks. In:NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems pp.91\u201399.ACM New York(2015)"},{"key":"e_1_2_10_18_1","unstructured":"Dai J. Li Y. He K. Sun J.:R\u2010FCN: object detection via region\u2010based fully convolutional networks. In:NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems pp.379\u2013387.ACM New York(2016)"},{"key":"e_1_2_10_19_1","doi-asserted-by":"crossref","unstructured":"Liang J. Chen H. Du K. Yan Y. Wang H.:Learning intra\u2010inter semantic aggregation for video object detection. In:Proceedings of the ACM International Conference on Multimedia in Asia pp.1\u20137.ACM New York(2021)","DOI":"10.1145\/3444685.3446273"},{"key":"e_1_2_10_20_1","doi-asserted-by":"crossref","unstructured":"Deng H. Hua Y. Song T. Zhang Z. Xue Z. Ma R. Robertson N. Guan H.:Object guided external memory network for video object detection. In:Proceedings of the IEEE International Conference on Computer Vision pp.6677\u20136686.IEEE Piscataway NJ(2019)","DOI":"10.1109\/ICCV.2019.00678"},{"key":"e_1_2_10_21_1","doi-asserted-by":"crossref","unstructured":"Wang S. Zhou Y. Yan J. Deng Z.:Fully motion\u2010aware network for video object detection. In:Proceedings of the European Conference on Computer Vision pp.542\u2013557.Springer Cham(2018)","DOI":"10.1007\/978-3-030-01261-8_33"},{"key":"e_1_2_10_22_1","doi-asserted-by":"crossref","unstructured":"Chen Z. Li W. Fei C. Liu B. Yu N.:Spatial\u2010temporal feature aggregation network for video object detection. 
In:Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing pp.1858\u20131862.IEEE Piscataway NJ(2020)","DOI":"10.1109\/ICASSP40776.2020.9054080"},{"key":"e_1_2_10_23_1","unstructured":"Han W. Khorrami P. Paine T.L. Ramachandran P. Babaeizadeh M. Shi H. Li J. Yan S. Huang T.S.:Seq\u2010NMS for video object detection. arXiv:1602.08465 (2016)"},{"key":"e_1_2_10_24_1","doi-asserted-by":"crossref","unstructured":"Belhassen H. Zhang H. Fresse V. Bourennane E.:Improving video object detection by Seq\u2010Bbox matching. In:Proceedings of the International Joint Conference on Computer Vision Imaging and Computer Graphics Theory and Applications pp.226\u2013233.Springer Cham(2019)","DOI":"10.5220\/0007260002260233"},{"key":"e_1_2_10_25_1","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C. Pinz A. Zisserman A.:Detect to track and track to detect. In:Proceedings of the IEEE International Conference on Computer Vision pp.3038\u20133046.IEEE Piscataway NJ(2017)","DOI":"10.1109\/ICCV.2017.330"},{"key":"e_1_2_10_26_1","doi-asserted-by":"crossref","unstructured":"Wu H. Chen Y. Wang N. Zhang Z.:Sequence level semantics aggregation for video object detection. In:Proceedings of the IEEE International Conference on Computer Vision pp.9217\u20139225.IEEE Piscataway NJ(2019)","DOI":"10.1109\/ICCV.2019.00931"},{"key":"e_1_2_10_27_1","doi-asserted-by":"crossref","unstructured":"Fujitake M. Sugimoto A.:Video representation learning through prediction for online object detection. In:Proceedings of the IEEE Winter Conference on Applications of Computer Vision pp.530\u2013539.IEEE Piscataway NJ(2022)","DOI":"10.1109\/WACVW54805.2022.00059"},{"key":"e_1_2_10_28_1","doi-asserted-by":"crossref","unstructured":"Yao C.H. Fang C. Shen X. Wan Y. Yang M.H.:Video object detection via object\u2010level temporal aggregation. 
In:Proceedings of the 16th European Conference on Computer Vision\u2013ECCV 2020 pp.160\u2013177.Springer Cham(2020)","DOI":"10.1007\/978-3-030-58568-6_10"},{"key":"e_1_2_10_29_1","doi-asserted-by":"crossref","unstructured":"Zhu X. Xiong Y. Dai J. Yuan L. Wei Y.:Deep feature flow for video recognition. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp.2349\u20132358.IEEE Piscataway NJ(2017)","DOI":"10.1109\/CVPR.2017.441"},{"key":"e_1_2_10_30_1","doi-asserted-by":"crossref","unstructured":"Jiang Z. Liu Y. Yang C. Liu J. Gao P. Zhang Q. Xiang S. Pan C.:Learning where to focus for efficient video object detection. In:Proceedings of the 16th European Conference on Computer Vision\u2013ECCV 2020 pp.18\u201334.Springer Cham(2020)","DOI":"10.1007\/978-3-030-58517-4_2"},{"key":"e_1_2_10_31_1","doi-asserted-by":"crossref","unstructured":"Dosovitskiy A. Fischer P. Ilg E. Hausser P. Hazirbas C. Golkov V. Van Der Smagt P. Cremers D. Brox T.:FlowNet: learning optical flow with convolutional networks. In:Proceedings of the IEEE International Conference on Computer Vision pp.2758\u20132766.IEEE Piscataway NJ(2015)","DOI":"10.1109\/ICCV.2015.316"},{"key":"e_1_2_10_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2019.2894261"},{"key":"e_1_2_10_33_1","unstructured":"Kim J. Koh J. Lee B. Yang S. Choi J.W.:Video object detection using object's motion context and spatio\u2010temporal feature aggregation. In:Proceedings of the International Joint Conference on Computer Vision Imaging and Computer Graphics Theory and Applications pp.226\u2013233.IEEE Piscataway NJ(2021)"},{"key":"e_1_2_10_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2990070"},{"key":"e_1_2_10_35_1","unstructured":"Shi X. Chen Z. Wang H. Yeung D.Y. Wong W.K. Woo W.c.:Convolutional LSTM network: A machine learning approach for precipitation nowcasting. 
In:NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems pp.802\u2013810.ACM New York(2015)"},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"Lin T.Y. Doll\u00e1r P. Girshick R. He K. Hariharan B. Belongie S.:Feature pyramid networks for object detection. In:Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp.2117\u20132125.IEEE Piscataway NJ(2017)","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_2_10_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_10_38_1","doi-asserted-by":"crossref","unstructured":"Negi A. Kumar K. Saini P. Kashid S.:Object detection based approach for an efficient video summarization with system statistics over cloud. In:2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical Electronics and Computer Engineering (UPCON) pp.1\u20136.IEEE Piscataway NJ(2022)","DOI":"10.1109\/UPCON56432.2022.9986376"}],"container-title":["IET Image Processing"],"original-title":[],"language":"en","deposited":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T18:33:00Z","timestamp":1730140380000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/ipr2.13179"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,17]]},"references-count":37,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["10.1049\/ipr2.13179"],"URL":"https:\/\/doi.org\/10.1049\/ipr2.13179","archive":["Portico"],"relation":{},"ISSN":["1751-9659","1751-9667"],"issn-type":[{"type":"print","value":"1751-9659"},{"type":"electronic","value":"1751-9667"}],"subject":[],"published":{"date-parts":[[2024,7,17]]},"assertion":[{"value":"2023-07-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-07-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}