A Vision-based inventory method for stacked goods in stereoscopic warehouse | Neural Computing and Applications

A Vision-based inventory method for stacked goods in stereoscopic warehouse

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Inventory of stacked goods in a stereoscopic warehouse is important for modern logistics. Currently, this inventory task is performed by manual counting. With the advance of Industry 4.0 and deep learning technology, automatic inventory based on machine vision has become feasible, greatly reducing labor and material costs. In this work, we first collected WSGID, an image dataset of wine boxes stacked in a stereoscopic winery warehouse. We then present an automatic inventory method based on machine vision, consisting of a stacked-goods surface detection model and a prior-based quantity calculation algorithm. To improve detection performance, we introduce STCNet, an improved detection network based on the Swin Transformer; it achieves 86.7 mAP, 82.8 mAP, and 85.9 mAP on the three sub-datasets, exceeding the baselines. To count the goods after detection, we propose an adaptive and robust calculation algorithm, and our method attains an accuracy of 85.71% on the largest sub-dataset. Extensive experiments on the WSGID and COCO benchmarks demonstrate the effectiveness of our approach. Our work indicates that machine vision can successfully automate inventory of stacked goods in stereoscopic warehouses.
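The paper's own counting algorithm is not reproduced in the abstract, but the general idea of combining detected surface faces with a stacking prior can be sketched as follows. This is an illustrative assumption only, not the authors' method: `count_with_prior`, the `boxes_per_row` prior, and the gap-based row split are all hypothetical names and choices introduced here.

```python
def count_with_prior(detections, boxes_per_row, row_gap=0.5):
    """Estimate the total number of stacked goods on one surface.

    detections: list of (x1, y1, x2, y2) face bounding boxes in image
        coordinates, as produced by a surface detector.
    boxes_per_row: prior knowledge of how many boxes fit in one row of
        the stack; missed detections within a row are corrected by it.
    row_gap: fraction of the median face height used to decide when two
        vertically sorted face centers belong to different rows.
    """
    if not detections:
        return 0
    # Median face height gives a scale for the row-splitting threshold.
    heights = sorted(y2 - y1 for _, y1, _, y2 in detections)
    median_h = heights[len(heights) // 2]
    # Sort faces by vertical center, then start a new row at large gaps.
    centers = sorted((y1 + y2) / 2 for _, y1, _, y2 in detections)
    rows = 1
    for prev, cur in zip(centers, centers[1:]):
        if cur - prev > row_gap * median_h:
            rows += 1
    # The prior makes the count robust to faces missed by the detector.
    return rows * boxes_per_row
```

Counting rows rather than raw detections is what makes such a scheme robust: a face occluded or missed by the detector does not change the estimated total as long as at least one face per row is found.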



Acknowledgements

This work was supported by a grant from the Institute for Guo Qiang, Tsinghua University (No. 2019GQG0002).

Author information


Corresponding author

Correspondence to Biqing Huang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yin, H., Chen, C., Hao, C. et al. A Vision-based inventory method for stacked goods in stereoscopic warehouse. Neural Comput & Applic 34, 20773–20790 (2022). https://doi.org/10.1007/s00521-022-07551-4

