Abstract
Convolutional neural networks (CNNs) have been widely used for object recognition and grasping-posture planning in intelligent robotic grasping (IRG). Compared with conventional image-recognition applications of CNNs, IRG demands both high recognition accuracy and high computational efficiency. However, existing methodologies for CNN architecture design often rely on human experience and numerous trial-and-error attempts, which makes obtaining an optimal CNN for IRG very challenging. To tackle this challenge, this paper develops a new differentiable architecture search (DARTS) method that accounts for the floating-point operations (FLOPs) of CNNs, named the DARTS-F method. It converts the discrete CNN architecture search into a gradient-based continuous optimization problem and considers both the prediction accuracy and the computational cost of the CNN during optimization. To identify the optimal neural network efficiently, this paper adopts a bilevel optimization that first trains the neural network weights in the inner level and then optimizes the network architecture by fine-tuning the operation variables in the outer level. In addition, a new digital twin (DT) of IRG is developed that incorporates the physics of realistic robotic grasping in the DT's virtual space, which not only improves IRG accuracy but also avoids expensive training time. In the experiments, the proposed DARTS-F method generates an optimized CNN with higher prediction accuracy and lower FLOPs than the CNN obtained by the original DARTS method. The DT framework improves the accuracy of real robotic grasping from 61% to 71%.
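To make the search mechanics concrete, the following minimal PyTorch-style sketch illustrates the continuous relaxation at the core of DARTS-style search, extended with a differentiable FLOPs term in the spirit of DARTS-F. The candidate operations, FLOP counts, loss functions, and penalty weight C are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedOp(nn.Module):
    """Softmax-weighted mixture of candidate operations on one bridge (edge)."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical candidate set; the paper's actual search space may differ.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.Identity(),
        ])
        # Rough relative FLOPs per candidate (illustrative numbers only).
        self.register_buffer("op_flops", torch.tensor([9.0, 25.0, 0.0]))

    def forward(self, x, alpha):
        w = F.softmax(alpha, dim=0)                    # continuous operation variables
        out = sum(wi * op(x) for wi, op in zip(w, self.ops))
        expected_flops = (w * self.op_flops).sum()     # differentiable FLOPs estimate
        return out, expected_flops


if __name__ == "__main__":
    # Simplified first-order bilevel alternation on a single bridge.
    op = MixedOp(channels=8)
    alpha = nn.Parameter(torch.zeros(len(op.ops)))     # architecture variables
    w_opt = torch.optim.Adam(op.parameters(), lr=1e-3) # inner level: network weights
    a_opt = torch.optim.Adam([alpha], lr=3e-4)         # outer level: architecture
    x, target = torch.randn(1, 8, 32, 32), torch.randn(1, 8, 32, 32)
    C = 1e-3                                           # FLOPs penalty weight (assumed)

    # Inner level: update operation weights on a training loss.
    out, _ = op(x, alpha)
    w_opt.zero_grad()
    F.mse_loss(out, target).backward()
    w_opt.step()

    # Outer level: update alpha on a validation loss plus the FLOPs penalty.
    out, flops = op(x, alpha)
    a_opt.zero_grad()
    (F.mse_loss(out, target) + C * flops).backward()
    a_opt.step()
```

In this sketch the outer-level objective adds the expected FLOPs of every bridge to the validation loss, so operations that are cheap and accurate receive larger architecture weights; the discrete architecture is recovered at the end by keeping the highest-weighted operation on each bridge.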










Data availability
The Cornell grasping dataset is available at https://www.kaggle.com/oneoneliu/cornell-grasp. The 3DNet dataset is available online at https://www.acin.tuwien.ac.at/en/vision-for-robotics/software-tools/3dnet-dataset/. The 3D-printed adversarial object dataset is available at https://berkeleyautomation.github.io/dex-net/. Demonstration videos of the IRG in the virtual and real spaces of the DT are available at https://www.youtube.com/watch?v=qOtc4UtcUqE. Our code may be available upon request.
Abbreviations
- g: Grasp in the global coordinate system
- p: The gripper's center position in Cartesian coordinates
- \(\theta\): The gripper's rotation angle around the z-axis
- w: The gripper's opening width
- q: A quality measure representing the grasping success rate
- \(\mathbf{I}\): An image with red, green, blue, and depth (RGB-D) channels
- H: The height of the image
- W: The width of the image
- \({g}_{I}\): Grasp in the RGB-D image coordinate system
- \({p}_{I}\): The center point of a pixel in the RGB-D image
- \({\theta }_{I}\): The gripper's rotation angle in the RGB-D image
- \({w}_{I}\): The gripper's opening width in the RGB-D image
- \({t}_{RC}\): The transformation from the camera coordinate system to the global coordinate system
- \({t}_{CI}\): The transformation from the image coordinate system to the camera coordinate system
- \({G}_{I}\): The grasp map for all pixels in the image I
- \({\Theta }_{I}\): Feature map storing the gripper's rotation angle at each pixel \({p}_{I}\)
- \({\mathbf{W}}_{I}\): Feature map storing the gripper's opening width at each pixel \({p}_{I}\)
- \({Q}_{I}\): Feature map storing the grasping success rate at each pixel \({p}_{I}\)
- F: The neural network that maps the image I to the grasp map \({G}_{I}\) (see the summary relations after this list)
- \({g}_{I}^{*}\): The best visible grasp in the image space
- \({F}^{*}\): The optimized CNN
- \({x}_{i}\): The result calculated by an operation of the previous node
- \({o}^{(i,j)}\): The operation between node \({x}_{i}\) and node \({x}_{j}\)
- \({o}_{k}(x)\): The kth operation applied to a node x
- \({\alpha }_{{o}_{k}}^{(i,j)}\): A continuous operation variable
- \({\overline{o} }^{(i,j)}\): The average operation from node \({x}_{i}\) to node \({x}_{j}\) considering all candidate operations
- O: The set of n candidate operations
- \(\boldsymbol{\alpha }\): The continuous operation vector
- \(\omega\): The weights in the inner-level function
- \({\omega }^{*}\): The optimal weights in the inner-level function
- \({\mathcal{L}}_{val}\): The validation loss function
- \({\mathcal{L}}_{tra}\): The training loss function
- \(F({o}_{kl})\): The function that calculates the floating-point operations (FLOPs) of the operation \({o}_{kl}\)
- m: The total number of bridges
- \({n}_{o}\): The total number of candidate operations for each bridge
- n: The number of losses in a loss set
- C: The constant that balances the sensitivities of the loss terms
- \({G}_{k}\): The sequential loss set
- \({\varepsilon }_{1}\): The threshold for the average loss
- \({\varepsilon }_{2}\): The threshold for the difference between the maximum and minimum loss values
- A: The predicted grasping rectangle
- B: The benchmark grasping rectangle
- \({n}_{r}\): The number of virtual robots for training in the DT framework
- \({m}_{r}\): The number of batches of projected RGB-D images
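Read together, these symbols relate as summarized below. This is our inference from the definitions above (following the standard grasp-map formulation), not equations reproduced verbatim from the paper:
\[
g=(\mathbf{p},\theta ,w,q), \qquad
{G}_{I}=({\Theta }_{I},{\mathbf{W}}_{I},{Q}_{I})=F(\mathbf{I}), \qquad
g={t}_{RC}\left({t}_{CI}\left({g}_{I}\right)\right), \qquad
{g}_{I}^{*}=\underset{{p}_{I}}{\arg\max }\;{Q}_{I}.
\]
That is, the CNN F predicts, for every pixel, the rotation angle, opening width, and success rate; the pixel with the highest predicted success rate gives the best image-space grasp, which is then transformed from image to camera to global coordinates for execution.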
Acknowledgements
The authors gratefully acknowledge the support from the Ministry of Science and Technology of China, the Zhejiang Provincial Natural Science Foundation of China, and the State Key Laboratory of Fluid Power and Mechatronic Systems. The authors also thank the reviewers and the editors for their insightful comments, which helped improve the quality of the paper.
Funding
The work was supported by the National Key Research and Development Program of China (Grant Number 2019YFB1312600), the Zhejiang Provincial Natural Science Foundation of China (Grant Number LZ22E050006), and the State Key Laboratory of Fluid Power and Mechatronic Systems (Grant Number SKLoFP_ZZ_2102).
Ethics declarations
Conflict of interest
There are no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Weifei Hu: ASME Membership (000100631478).
About this article
Cite this article
Hu, W., Shao, J., Jiao, Q. et al. A new differentiable architecture search method for optimizing convolutional neural networks in the digital twin of intelligent robotic grasping. J Intell Manuf 34, 2943–2961 (2023). https://doi.org/10.1007/s10845-022-01971-8