Abstract
Inventory of stacked goods in stereoscopic warehouses is important for modern logistics, yet this task is still performed by manual counting. With the advance of Industry 4.0 and deep learning technology, automatic inventory based on machine vision has become feasible, greatly saving labor and material costs. In this work, we first collected WSGID, an image dataset of wine boxes stacked in a stereoscopic winery warehouse. We then presented an automatic inventory method based on machine vision, consisting of a stacked-goods surface detection model and a prior-based quantity calculation algorithm. To improve detection performance, we introduced STCNet, an improved detection network based on Swin Transformer, which achieved 86.7, 82.8, and 85.9 mAP on the three sub-datasets, surpassing the baselines. To count the goods after detection, we proposed an adaptive and robust calculation algorithm, which reached an accuracy of 85.71 on the largest sub-dataset. Extensive experiments on WSGID and the COCO benchmark demonstrate the effectiveness of our approach and indicate that machine vision can effectively facilitate inventory of stacked goods in stereoscopic warehouses.
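To picture the prior-based counting step described above, the following is a minimal, hypothetical sketch rather than the authors' released algorithm: it assumes the detector returns bounding boxes of visible box faces on one stack surface, groups them into rows by a simple vertical-proximity heuristic, and multiplies the visible count by an assumed stacking-depth prior. The names (group_into_rows, estimate_quantity, depth_prior, row_tol) and the grouping rule are illustrative assumptions, not the paper's method.

```python
# Hypothetical counting sketch (not the authors' code): visible faces are
# detected on the stack surface, grouped into rows, then multiplied by a
# prior stacking depth to estimate the total quantity.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def group_into_rows(boxes: List[Box], row_tol: float = 0.5) -> List[List[Box]]:
    """Group detected faces into horizontal rows by vertical-center proximity."""
    rows: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2          # vertical center of this face
        h = box[3] - box[1]                 # face height
        if rows:
            last = rows[-1][-1]
            last_cy = (last[1] + last[3]) / 2
            if abs(cy - last_cy) < row_tol * h:
                rows[-1].append(box)        # same row as the previous face
                continue
        rows.append([box])                  # start a new row
    return rows


def estimate_quantity(boxes: List[Box], depth_prior: int) -> int:
    """Total boxes = visible faces x assumed number of boxes stacked behind each face."""
    rows = group_into_rows(boxes)
    visible = sum(len(r) for r in rows)
    return visible * depth_prior


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (12, 1, 22, 11), (0, 12, 10, 22)]
    print(estimate_quantity(detections, depth_prior=4))  # 3 visible faces * 4 deep = 12
```

In practice the paper's algorithm is described as adaptive and robust to detection noise; the fixed tolerance and fixed depth prior here are only placeholders for that logic.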
Acknowledgements
This work was supported by a grant from the Institute for Guo Qiang, Tsinghua University (No. 2019GQG0002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yin, H., Chen, C., Hao, C. et al. A Vision-based inventory method for stacked goods in stereoscopic warehouse. Neural Comput & Applic 34, 20773–20790 (2022). https://doi.org/10.1007/s00521-022-07551-4