{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T04:23:41Z","timestamp":1729225421640,"version":"3.27.0"},"reference-count":60,"publisher":"Wiley","issue":"19","license":[{"start":{"date-parts":[[2024,5,26]],"date-time":"2024-05-26T00:00:00Z","timestamp":1716681600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computer aided Civil Eng"],"published-print":{"date-parts":[[2024,10]]},"abstract":"Abstract<\/jats:title>Autonomous equipment is playing an increasingly important role in construction tasks. It is essential to equip autonomous equipment with powerful 3D detection capability to avoid accidents and inefficiency. However, there is limited research within the construction field that has extended detection to 3D. To this end, this study develops a light detection and ranging (LiDAR)\u2010based deep\u2010learning model for the 3D detection of workers on construction sites. The proposed model adopts a voxel\u2010based anchor\u2010free 3D object detection paradigm. To enhance the feature extraction capability for tough detection tasks, a novel Transformer\u2010based block is proposed, where the multi\u2010head self\u2010attention is applied in local grid regions. The detection model integrates the Transformer blocks with 3D sparse convolution to extract wide and local features while pruning redundant features in modified downsampling layers. To train and test the proposed model, a LiDAR point cloud dataset was created, which includes workers in construction sites with 3D box annotations. The experiment results indicate that the proposed model outperforms the baseline models with higher mean average precision and smaller regression errors. 
The method in the study is promising to provide worker detection with rich and accurate 3D information required by construction automation.<\/jats:p>","DOI":"10.1111\/mice.13238","type":"journal-article","created":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T04:07:11Z","timestamp":1716782831000},"page":"2990-3007","update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Deep learning framework with Local Sparse Transformer for construction worker detection in 3D with LiDAR"],"prefix":"10.1111","volume":"39","author":[{"given":"Mingyu","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Building and Real Estate Hong Kong Polytechnic University Hong Kong China"}]},{"given":"Lei","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Building and Real Estate Hong Kong Polytechnic University Hong Kong China"}]},{"given":"Shuai","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Building and Real Estate Hong Kong Polytechnic University Hong Kong China"}]},{"given":"Shuyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Building and Real Estate Hong Kong Polytechnic University Hong Kong China"}]},{"given":"Heng","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Building and Real Estate Hong Kong Polytechnic University Hong Kong China"}]}],"member":"311","published-online":{"date-parts":[[2024,5,26]]},"reference":[{"key":"e_1_2_8_2_1","unstructured":"Allinson M.(2022).Construction robotics startup Canvas launches drywall finishing robot. Robotics and Automation News.https:\/\/roboticsandautomationnews.com\/2022\/01\/27\/construction\u2010robotics\u2010startup\u2010canvas\u2010launches\u2010drywall\u2010finishing\u2010robot\/48705\/"},{"key":"e_1_2_8_3_1","doi-asserted-by":"publisher","DOI":"10.3390\/s17020286"},{"key":"e_1_2_8_4_1","doi-asserted-by":"crossref","unstructured":"Beltr\u00e1n J. Guindel C. Moreno F. M. Cruzado D. Garc\u00eda F. &De La Escalera A.(2018).BirdNet: A 3D object detection framework from LiDAR information.2018 21st International Conference on Intelligent Transportation Systems (ITSC) Maui HI (pp.3517\u20133523).https:\/\/doi.org\/10.1109\/ITSC.2018.8569311","DOI":"10.1109\/ITSC.2018.8569311"},{"key":"e_1_2_8_5_1","unstructured":"Business Research. (2023).Autonomous construction equipment market size trends and global forecast To 2032. The Business Research Company.https:\/\/www.thebusinessresearchcompany.com\/report\/autonomous\u2010construction\u2010equipment\u2010global\u2010market\u2010report"},{"key":"e_1_2_8_6_1","doi-asserted-by":"crossref","unstructured":"Caesar H. Bankiti V. Lang A. H. Vora S. Liong V. E. Xu Q. Krishnan A. Pan Y. Baldan G. &Beijbom O.(2020).nuScenes: A multimodal dataset for autonomous driving.2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Seattle WA (pp.11618\u201311628).https:\/\/doi.org\/10.1109\/CVPR42600.2020.01164","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_2_8_7_1","doi-asserted-by":"crossref","unstructured":"Charles R. Q. Su H. Kaichun M. &Guibas L. J.(2017).PointNet: Deep learning on point sets for 3D classification and segmentation.2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Honolulu HI (pp.77\u201385).https:\/\/doi.org\/10.1109\/CVPR.2017.16","DOI":"10.1109\/CVPR.2017.16"},{"key":"e_1_2_8_8_1","doi-asserted-by":"crossref","unstructured":"Chen Q. Sun L. Wang Z. Jia K. 
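The core mechanism named in the abstract, multi-head self-attention restricted to local grid regions of a voxelized point cloud, can be illustrated with a short sketch. The following is a minimal reconstruction under stated assumptions, not the authors' implementation: the `window_partition` and `LocalGridSelfAttention` names, the window size, the feature width, and the head count are all illustrative, and the published model operates on sparse voxel features with 3D sparse convolution (e.g., spconv) rather than the dense toy tensors used here.

```python
# Hedged sketch of local-grid multi-head self-attention, NOT the paper's code.
import torch
import torch.nn as nn


def window_partition(coords: torch.Tensor, window: int) -> torch.Tensor:
    """Map integer voxel coordinates (N, 3) to one local-window id per voxel."""
    win = coords // window                       # (N, 3) window indices
    # Fuse the three indices into a single id (assumes each index < 2**20).
    return (win[:, 0] << 40) | (win[:, 1] << 20) | win[:, 2]


class LocalGridSelfAttention(nn.Module):
    """Multi-head self-attention computed independently inside each local
    grid window of a voxelized scene (hedged reading of the abstract)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor, coords: torch.Tensor,
                window: int = 4) -> torch.Tensor:
        out = feats.clone()
        ids = window_partition(coords, window)
        for wid in ids.unique():                 # plain loop for clarity, not speed
            m = ids == wid
            x = feats[m].unsqueeze(0)            # (1, voxels_in_window, dim)
            y, _ = self.attn(x, x, x)            # attention only within the window
            out[m] = self.norm(feats[m] + y.squeeze(0))  # residual + LayerNorm
        return out


if __name__ == "__main__":
    # Toy input: 128 occupied voxels with random features and integer coordinates.
    feats = torch.randn(128, 64)
    coords = torch.randint(0, 32, (128, 3))
    print(LocalGridSelfAttention()(feats, coords).shape)  # torch.Size([128, 64])
```

Restricting attention to fixed local windows is what keeps the approach tractable on large outdoor LiDAR scenes: each voxel attends only to the few voxels sharing its window, so the cost grows roughly linearly with the number of occupied voxels rather than quadratically.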
&Yuille A.(2020).Object as hotspots: An anchor\u2010free 3D object detection approach via firing of hotspots. arXiv.http:\/\/arxiv.org\/abs\/1912.12791","DOI":"10.1007\/978-3-030-58589-1_5"},{"key":"e_1_2_8_9_1","doi-asserted-by":"crossref","unstructured":"Chen Y. Liu J. Zhang X. Qi X. &Jia J.(2023).VoxelNeXt: Fully sparse VoxelNet for 3D object detection and tracking. arXiv.http:\/\/arxiv.org\/abs\/2303.11301","DOI":"10.1109\/CVPR52729.2023.02076"},{"key":"e_1_2_8_10_1","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2022.937772"},{"key":"e_1_2_8_11_1","unstructured":"Dosovitskiy A. Beyer L. Kolesnikov A. Weissenborn D. Zhai X. Unterthiner T. Dehghani M. Minderer M. Heigold G. Gelly S. Uszkoreit J. &Houlsby N.(2021).An image is worth 16\u00d716 words: Transformers for image recognition at scale. arXiv.http:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_2_8_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2022.104428"},{"key":"e_1_2_8_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2018.05.003"},{"key":"e_1_2_8_14_1","unstructured":"Fey M.(2023).torch\u2010scatter: PyTorch extension library of optimized scatter operations (2.1.1) [Python].https:\/\/github.com\/rusty1s\/pytorch_scatter"},{"key":"e_1_2_8_15_1","doi-asserted-by":"crossref","unstructured":"Graham B. Engelcke M. &van derMaaten L.(2017).3D semantic segmentation with submanifold sparse convolutional networks. arXiv.http:\/\/arxiv.org\/abs\/1711.10275","DOI":"10.1109\/CVPR.2018.00961"},{"key":"e_1_2_8_16_1","unstructured":"Graham B. &van derMaaten L.(2017).Submanifold sparse convolutional networks. arXiv.http:\/\/arxiv.org\/abs\/1706.01307"},{"key":"e_1_2_8_17_1","doi-asserted-by":"crossref","unstructured":"Guo J. Han K. Wu H. Tang Y. Chen X. Wang Y. &Xu C.(2022).CMT: Convolutional neural networks meet vision transformers. arXiv.http:\/\/arxiv.org\/abs\/2107.06263","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"e_1_2_8_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3005434"},{"key":"e_1_2_8_19_1","doi-asserted-by":"crossref","unstructured":"He C. Li R. Li S. &Zhang L.(2022).Voxel set transformer: A set\u2010to\u2010set approach to 3D object detection from point clouds. arXiv.http:\/\/arxiv.org\/abs\/2203.10314","DOI":"10.1109\/CVPR52688.2022.00823"},{"key":"e_1_2_8_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3032602"},{"key":"e_1_2_8_21_1","doi-asserted-by":"crossref","unstructured":"Lai X. Liu J. Jiang L. Wang L. Zhao H. Liu S. Qi X. &Jia J.(2022).Stratified Transformer for 3D point cloud segmentation. arXiv.http:\/\/arxiv.org\/abs\/2203.14508","DOI":"10.1109\/CVPR52688.2022.00831"},{"key":"e_1_2_8_22_1","doi-asserted-by":"crossref","unstructured":"Lang A. H. Vora S. Caesar H. Zhou L. Yang J. &Beijbom O.(2019).PointPillars: Fast encoders for object detection from point clouds.2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Long Beach CA (pp.12689\u201312697).https:\/\/doi.org\/10.1109\/CVPR.2019.01298","DOI":"10.1109\/CVPR.2019.01298"},{"key":"e_1_2_8_23_1","unstructured":"Law H. &Deng J.(2019).CornerNet: Detecting objects as paired keypoints. arXiv.http:\/\/arxiv.org\/abs\/1808.01244"},{"key":"e_1_2_8_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2018.11.017"},{"key":"e_1_2_8_25_1","doi-asserted-by":"crossref","unstructured":"Li E. Wang S. Li C. Li D. Wu X. 
&Hao Q.(2020).SUSTech POINTS: A portable 3D point cloud interactive annotation platform system.2020 IEEE Intelligent Vehicles Symposium (IV) Las Vegas NV (pp.1108\u20131115).https:\/\/doi.org\/10.1109\/IV47402.2020.9304562","DOI":"10.1109\/IV47402.2020.9304562"},{"key":"e_1_2_8_26_1","unstructured":"Li J. Xia X. Li W. Li H. Wang X. Xiao X. Wang R. Zheng M. &Pan X.(2022).Next\u2010ViT: Next generation vision transformer for efficient deployment in realistic industrial scenarios. arXiv.https:\/\/arxiv.org\/abs\/2207.05501v4"},{"key":"e_1_2_8_27_1","unstructured":"Li W. Hu Y. Zhou Y. &Pham D. T.(2023).Safe human\u2010robot collaboration for industrial settings: A survey.Journal of Intelligent Manufacturing. Advance online publication.https:\/\/doi.org\/10.1007\/s10845\u2010023\u201002159\u20104"},{"key":"e_1_2_8_28_1","doi-asserted-by":"crossref","unstructured":"Lin T.\u2010Y. Goyal P. Girshick R. He K. &Doll\u00e1r P.(2018).Focal loss for dense object detection. arXiv.http:\/\/arxiv.org\/abs\/1708.02002","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_2_8_29_1","unstructured":"Liu J. Chen Y. Ye X. Tian Z. Tan X. &Qi X.(2022).Spatial pruned sparse convolution for efficient 3D object detection. arXiv.http:\/\/arxiv.org\/abs\/2209.14201"},{"key":"e_1_2_8_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_2_8_31_1","doi-asserted-by":"crossref","unstructured":"Liu Z. Lin Y. Cao Y. Hu H. Wei Y. Zhang Z. Lin S. &Guo B.(2021).Swin Transformer: Hierarchical vision Transformer using shifted windows.2021 IEEE\/CVF International Conference on Computer Vision (ICCV) Montreal QC Canada (pp.9992\u201310002).https:\/\/doi.org\/10.1109\/ICCV48922.2021.00986","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_2_8_32_1","doi-asserted-by":"crossref","unstructured":"Liu Z. Zhang Z. Cao Y. Hu H. &Tong X.(2021).Group\u2010free 3D object detection via Transformers. arXiv.http:\/\/arxiv.org\/abs\/2104.00678","DOI":"10.1109\/ICCV48922.2021.00294"},{"key":"e_1_2_8_33_1","unstructured":"Malewar A.(2019).Spot robot is ready for on\u2010site inspection at a large construction site. InceptiveMind.https:\/\/www.inceptivemind.com\/spot\u2010robot\u2010ready\u2010site\u2010inspection\u2010large\u2010construction\u2010site\/10359\/"},{"key":"e_1_2_8_34_1","unstructured":"Mao J. Shi S. Wang X. &Li H.(2022).3D object detection for autonomous driving: A review and new outlooks. arXiv.http:\/\/arxiv.org\/abs\/2206.09474"},{"key":"e_1_2_8_35_1","doi-asserted-by":"crossref","unstructured":"Mao J. Xue Y. Niu M. Bai H. Feng J. Liang X. Xu H. &Xu C.(2021).Voxel Transformer for 3D object detection. arXiv.http:\/\/arxiv.org\/abs\/2109.02497","DOI":"10.1109\/ICCV48922.2021.00315"},{"key":"e_1_2_8_36_1","doi-asserted-by":"publisher","DOI":"10.1111\/exsy.12647"},{"key":"e_1_2_8_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.12.089"},{"key":"e_1_2_8_38_1","doi-asserted-by":"crossref","unstructured":"Misra I. Girdhar R. &Joulin A.(2021).An end\u2010to\u2010end Transformer model for 3D object detection. 
arXiv.http:\/\/arxiv.org\/abs\/2109.08141","DOI":"10.1109\/ICCV48922.2021.00290"},{"key":"e_1_2_8_39_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467\u20108667.2006.00466.x"},{"key":"e_1_2_8_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2023.104856"},{"key":"e_1_2_8_41_1","first-page":"8024","article-title":"PyTorch: An imperative style, high\u2010performance deep learning library","volume":"32","author":"Paszke A.","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_8_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521\u2010019\u201004146\u20104"},{"key":"e_1_2_8_43_1","doi-asserted-by":"crossref","unstructured":"Qi C. R. Litany O. He K. &Guibas L. J.(2019).Deep Hough voting for 3D object detection in point clouds. arXiv.https:\/\/doi.org\/10.48550\/arXiv.1904.09664","DOI":"10.1109\/ICCV.2019.00937"},{"key":"e_1_2_8_44_1","unstructured":"Qi C. R. Yi L. Su H. &Guibas L. J.(2017).PointNet++: Deep hierarchical feature learning on point sets in a metric space.Advances in Neural Information Processing Systems Long Beach CA."},{"key":"e_1_2_8_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_2_8_46_1","doi-asserted-by":"publisher","DOI":"10.1061\/(ASCE)CP.1943\u20105487.0000898"},{"key":"e_1_2_8_47_1","doi-asserted-by":"crossref","unstructured":"Robosense. (2023).Automotive grade LiDAR RS\u2010LiDAR\u2010M1 RoboSense LiDAR for autonomous driving robots. Robosense.https:\/\/www.robosense.cn\/en\/rslidar\/RS\u2010LiDAR\u2010M1","DOI":"10.1088\/978-0-7503-3723-6ch1"},{"key":"e_1_2_8_48_1","doi-asserted-by":"publisher","DOI":"10.1111\/mice.12749"},{"key":"e_1_2_8_49_1","unstructured":"Smith L. N.(2018).A disciplined approach to neural network hyper\u2010parameters: Part 1\u2014Learning rate batch size momentum and weight decay. arXiv.https:\/\/doi.org\/10.48550\/arXiv.1803.09820"},{"key":"e_1_2_8_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2018.11.033"},{"key":"e_1_2_8_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2021.103670"},{"key":"e_1_2_8_52_1","doi-asserted-by":"publisher","DOI":"10.1061\/(ASCE)CP.1943\u20105487.0000845"},{"key":"e_1_2_8_53_1","unstructured":"Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A. N. Kaiser \u0141. &Polosukhin I.(2017).Attention is all you need.Advances in Neural Information Processing Systems 30 (NIPS 2017) Long Beach CA."},{"key":"e_1_2_8_54_1","doi-asserted-by":"publisher","DOI":"10.1111\/mice.12536"},{"key":"e_1_2_8_55_1","unstructured":"Yan Y.(2023).spconv: Spatial sparse convolution (2.3.6) [Python].https:\/\/github.com\/traveller59\/spconv"},{"key":"e_1_2_8_56_1","doi-asserted-by":"publisher","DOI":"10.3390\/s18103337"},{"key":"e_1_2_8_57_1","doi-asserted-by":"crossref","unstructured":"Yang B. Luo W. &Urtasun R.(2018).PIXOR: Real\u2010time 3D object detection from point clouds.2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Salt Lake City UT (pp.7652\u20137660).https:\/\/doi.org\/10.1109\/CVPR.2018.00798","DOI":"10.1109\/CVPR.2018.00798"},{"key":"e_1_2_8_58_1","doi-asserted-by":"crossref","unstructured":"Yin T. Zhou X. &Krahenbuhl P.(2021).Center\u2010based 3D object detection and tracking.2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Nashville TN (pp.11779\u201311788).https:\/\/doi.org\/10.1109\/CVPR46437.2021.01161","DOI":"10.1109\/CVPR46437.2021.01161"},{"key":"e_1_2_8_59_1","doi-asserted-by":"crossref","unstructured":"Zhou Y. 
&Tuzel O.(2018).VoxelNet: End\u2010to\u2010end learning for point cloud based 3D object detection.2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Salt Lake City UT (pp.4490\u20134499).https:\/\/doi.org\/10.1109\/CVPR.2018.00472","DOI":"10.1109\/CVPR.2018.00472"},{"key":"e_1_2_8_60_1","unstructured":"Zhu B. Jiang Z. Zhou X. Li Z. &Yu G.(2019).Class\u2010balanced grouping and sampling for point cloud 3D object detection. arXiv.http:\/\/arxiv.org\/abs\/1908.09492"},{"key":"e_1_2_8_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.autcon.2017.05.005"}],"container-title":["Computer-Aided Civil and Infrastructure Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/mice.13238","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T08:54:13Z","timestamp":1729155253000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/mice.13238"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,26]]},"references-count":60,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["10.1111\/mice.13238"],"URL":"https:\/\/doi.org\/10.1111\/mice.13238","archive":["Portico"],"relation":{},"ISSN":["1093-9687","1467-8667"],"issn-type":[{"type":"print","value":"1093-9687"},{"type":"electronic","value":"1467-8667"}],"subject":[],"published":{"date-parts":[[2024,5,26]]},"assertion":[{"value":"2023-08-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}