Augmented reality and deep learning based system for assisting assembly process | Journal on Multimodal User Interfaces

  • Original Paper

Abstract

In Industry 4.0, rapidly changing customer demands drive manufacturing toward mass customization. Varied customer requirements lead to small batch sizes and many process variations. Assembly is one of the most important steps in any manufacturing process, and factory floor workers often need a guidance system to assist them with assembly tasks when the product or process varies. Existing Augmented Reality (AR) based systems detect assembly components by attaching a marker to each one, which is time consuming and laborious. This paper proposes using a state-of-the-art deep learning based object detection technique together with a regression based mapping technique to obtain the 3D locations of assembly components. Automatic detection of machine parts is followed by a multimodal interface involving both eye gaze and hand tracking to guide the manual assembly process. We propose an eye cursor to guide the user through the task and use fingertip distances along with object sizes to detect any error committed during the task. We analyzed the proposed mapping method and found that the mean mapping error was 1.842 cm. We also investigated the effectiveness of the proposed multimodal user interface by conducting two user studies. The first study indicated that the current interface design with the eye cursor enabled participants to perform the task significantly faster than the interface without it. Shop floor workers in the second user study reported that the proposed guidance system was comprehensible and easy to use for completing the assembly task. Results showed that the proposed guidance system enabled 11 end users to finish the assembly of one pneumatic cylinder within 55 s, with an average TLX score below 25 on a scale of 100 and a Cronbach's alpha of 0.8, indicating convergence of learning experience.
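
The abstract does not specify the form of the regression based mapping, so the following is only a minimal sketch of the general idea, assuming an affine least-squares map from the 2D pixel centre of a detected bounding box to a 3D location in centimetres. The function names, the synthetic calibration data, and the choice of an affine model are all illustrative assumptions and are not taken from the paper. The mean mapping error is computed here as the mean Euclidean distance between predicted and reference 3D points, which is one plausible reading of the reported metric.

```python
import numpy as np


def fit_affine_mapping(pixels, points_3d):
    """Least-squares fit of an affine map from 2D pixel centres to 3D points.

    Hypothetical helper: the paper's actual regression model is not stated
    in the abstract.
    """
    pixels = np.asarray(pixels, dtype=float)
    points_3d = np.asarray(points_3d, dtype=float)
    # Append a column of ones so the fit includes an offset term.
    design = np.hstack([pixels, np.ones((pixels.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, points_3d, rcond=None)
    return weights  # shape (3, 3): rows correspond to u, v, and the offset


def apply_mapping(weights, pixels):
    """Map 2D pixel centres to 3D points with fitted weights."""
    pixels = np.asarray(pixels, dtype=float)
    design = np.hstack([pixels, np.ones((pixels.shape[0], 1))])
    return design @ weights


def mean_mapping_error(predicted, actual):
    """Mean Euclidean distance between predicted and reference 3D points."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))


# Synthetic calibration data (assumed values, for illustration only).
rng = np.random.default_rng(0)
true_weights = np.array([
    [0.05, 0.00, 0.0],    # contribution of pixel u to (x, y, z), in cm
    [0.00, -0.05, 0.0],   # contribution of pixel v to (x, y, z), in cm
    [-16.0, 12.0, 60.0],  # constant offset, in cm
])
calib_pixels = rng.uniform(0, 640, size=(50, 2))
noise = rng.normal(0, 0.5, size=(50, 3))  # simulated measurement noise, in cm
calib_points = apply_mapping(true_weights, calib_pixels) + noise

weights = fit_affine_mapping(calib_pixels, calib_points)

# Evaluate on held-out synthetic detections.
test_pixels = rng.uniform(0, 640, size=(20, 2))
test_points = apply_mapping(true_weights, test_pixels)
error_cm = mean_mapping_error(apply_mapping(weights, test_pixels), test_points)
print(f"mean mapping error: {error_cm:.3f} cm")
```

An affine model is used only because it is the simplest regression that can be fitted in closed form; a real headset-to-workbench mapping may need a nonlinear model or depth information.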

Author information

Corresponding authors

Correspondence to Subin Raj or Pradipta Biswas.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (MP4 90558 kb)

About this article

Cite this article

Raj, S., Murthy, L.R.D., Shanmugam, T.A. et al. Augmented reality and deep learning based system for assisting assembly process. J Multimodal User Interfaces 18, 119–133 (2024). https://doi.org/10.1007/s12193-023-00428-3

  • DOI: https://doi.org/10.1007/s12193-023-00428-3
