{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T16:41:16Z","timestamp":1740156076201,"version":"3.37.3"},"reference-count":20,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2019,6,5]],"date-time":"2019-06-05T00:00:00Z","timestamp":1559692800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"In this paper, our goal is to improve the recognition accuracy of battlefield target aggregation behavior while maintaining the low computational cost of spatio-temporal depth neural networks. To this end, we propose a novel 3D-CNN (3D Convolutional Neural Networks) model, which extends the idea of multi-scale feature fusion to the spatio-temporal domain, and enhances the feature extraction ability of the network by combining feature maps of different convolutional layers. In order to reduce the computational complexity of the network, we further improved the multi-fiber network, and finally established an architecture\u20143D convolution Two-Stream model based on multi-scale feature fusion. Extensive experimental results on the simulation data show that our network significantly boosts the efficiency of existing convolutional neural networks in the aggregation behavior recognition, achieving the most advanced performance on the dataset constructed in this paper.<\/jats:p>","DOI":"10.3390\/sym11060761","type":"journal-article","created":{"date-parts":[[2019,6,6]],"date-time":"2019-06-06T07:38:01Z","timestamp":1559806681000},"page":"761","source":"Crossref","is-referenced-by-count":7,"title":["Battlefield Target Aggregation Behavior Recognition Model Based on Multi-Scale Feature Fusion"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1157-9041","authenticated-orcid":false,"given":"Haiyang","family":"Jiang","sequence":"first","affiliation":[{"name":"Space Engineering University, 81 Road, Huairou District, Beijing 101400, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4442-6332","authenticated-orcid":false,"given":"Yaozong","family":"Pan","sequence":"additional","affiliation":[{"name":"Space Engineering University, 81 Road, Huairou District, Beijing 101400, China"}]},{"given":"Jian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Space Engineering University, 81 Road, Huairou District, Beijing 101400, China"}]},{"given":"Haitao","family":"Yang","sequence":"additional","affiliation":[{"name":"Space Engineering University, 81 Road, Huairou District, Beijing 101400, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,6,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1049\/iet-cvi.2016.0252","article-title":"Meta-action descriptor for action recognition in RGBD video","volume":"11","author":"Huang","year":"2017","journal-title":"IET Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.cviu.2016.03.013","article-title":"Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice","volume":"150","author":"Peng","year":"2016","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Duta, I.C., Ionescu, B., Aizawa, K., and Sebe, N. (2017, January 4\u20136). 
Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos. Proceedings of the International Conference on MultiMedia Modeling (MMM), Reykjavik, Iceland.","DOI":"10.1007\/978-3-319-51811-4_30"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Li, C., Zhong, Q., Xie, D., and Pu, S. (2019, May 14). Collaborative Spatio-temporal Feature Learning for Video Action Recognition. Available online: https:\/\/arxiv.org\/pdf\/1903.01197.","DOI":"10.1109\/CVPR.2019.00806"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zolfaghari, M., Oliveira, G.L., Sedaghat, N., and Brox, T. (2017, October 22\u201329). Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.316"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wang, H., and Schmid, C. (2013, December 8\u201312). Action Recognition with Improved Trajectories. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCV.2013.441"},{"key":"ref_7","first-page":"111","article-title":"Two-Stream Convolutional Networks for Action Recognition in Videos","volume":"2","author":"Simonyan","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016, October 8\u201316). Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_9","first-page":"3468","article-title":"Spatiotemporal Residual Networks for Video Action Recognition","volume":"29","author":"Feichtenhofer","year":"2016","journal-title":"Neural Inf. Process. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Wildes, R.P. (2017, July 22\u201325). Spatiotemporal Multiplier Networks for Video Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.787"},{"key":"ref_11","unstructured":"Zhu, Y., Lan, Z., Newsam, S., and Hauptmann, A.G. (2019, May 14). Hidden Two-Stream Convolutional Networks for Action Recognition. Available online: https:\/\/arxiv.org\/pdf\/1704.0389."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1109\/TPAMI.2016.2599174","article-title":"Long-Term Recurrent Convolutional Networks for Visual Recognition and Description","volume":"39","author":"Donahue","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1510","DOI":"10.1109\/TPAMI.2017.2712608","article-title":"Long-Term Temporal Convolutions for Action Recognition","volume":"40","author":"Varol","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","unstructured":"Carreira, J., and Zisserman, A. (2017, July 22\u201325). Quo Vadis, Action Recognition? Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., and Mei, T. (2017, October 22\u201329). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.590"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. (2018, September 8\u201314). Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01267-0_19"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and Paluri, M. (2018, June 19\u201321). A Closer Look at Spatiotemporal Convolutions for Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chen, Y., Kalantidis, Y., Li, J., Yan, S., and Feng, J. (2018, September 8\u201314). Multi-Fiber Networks for Video Recognition. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_22"},{"key":"ref_19","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., He, K.M., Hariharan, B., and Belongie, S. (2019, May 14). Feature Pyramid Networks for Object Detection. Available online: https:\/\/arxiv.org\/pdf\/1612.03144."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1010003","DOI":"10.3788\/AOS201737.1010003","article-title":"Fast road detection based on RGBD images and convolutional neural network","volume":"37","author":"Qu","year":"2017","journal-title":"Acta Opt. Sinica"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/6\/761\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T06:33:47Z","timestamp":1721370827000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/6\/761"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,5]]},"references-count":20,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["sym11060761"],"URL":"https:\/\/doi.org\/10.3390\/sym11060761","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2019,6,5]]}}}
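
The abstract in this record describes fusing feature maps from 3D-convolutional layers at different depths before classification. The paper's actual architecture, layer widths, and fusion operator are not given in the metadata, so the following is only a minimal PyTorch sketch of the general multi-scale spatio-temporal fusion idea; every class name, dimension, and design choice below is an illustrative assumption, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion3D(nn.Module):
    # Toy 3D-CNN (assumed structure, not the paper's): feature maps from three
    # depths are globally pooled and concatenated, so coarse and fine
    # spatio-temporal scales are fused before the classifier.
    def __init__(self, num_classes=2):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.stage2 = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.stage3 = nn.Sequential(nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2))
        self.fc = nn.Linear(16 + 32 + 64, num_classes)  # fused feature vector

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Global average pooling collapses each scale to a vector, so maps of
        # different spatio-temporal resolutions can be fused by concatenation.
        pooled = [F.adaptive_avg_pool3d(f, 1).flatten(1) for f in (f1, f2, f3)]
        return self.fc(torch.cat(pooled, dim=1))

clip = torch.randn(2, 3, 16, 112, 112)  # two 16-frame RGB clips (hypothetical input size)
print(MultiScaleFusion3D()(clip).shape)  # torch.Size([2, 2])

A two-stream variant in the spirit of the abstract would run one such network on RGB frames and a second on optical flow and combine the two outputs, and the multi-fiber idea the authors cite (ref_18) would replace each dense Conv3d with parallel lightweight grouped "fibers" to cut the computational cost; neither detail is specified in this record, so both are left out of the sketch.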