
A new differentiable architecture search method for optimizing convolutional neural networks in the digital twin of intelligent robotic grasping

Journal of Intelligent Manufacturing

Abstract

Convolutional neural networks (CNNs) have been widely used for object recognition and grasping-posture planning in intelligent robotic grasping (IRG). Compared with traditional image-recognition applications of CNNs, IRG demands both high recognition accuracy and high computational efficiency. However, existing methodologies for CNN architecture design often rely on human experience and numerous trial-and-error attempts, which makes obtaining an optimal CNN for IRG very challenging. To tackle this challenge, this paper develops a new differentiable architecture search (DARTS) method that accounts for the floating-point operations (FLOPs) of CNNs, named the DARTS-F method. It converts the discrete CNN architecture search into a gradient-based continuous optimization problem and considers both the prediction accuracy and the computational cost of the CNN during optimization. To identify the optimal neural network efficiently, this paper adopts a bilevel optimization scheme, which first trains the neural network weights at the inner level and then optimizes the neural network architecture by fine-tuning the operation variables at the outer level. In addition, a new digital twin (DT) of IRG is developed that models the physics of realistic robotic grasping in the DT's virtual space, which not only improves IRG accuracy but also avoids expensive physical training time. In the experiments, the proposed DARTS-F method generated an optimized CNN with higher prediction accuracy and lower FLOPs than the original DARTS method. The DT framework improved the accuracy of real robotic grasping from 61% to 71%.
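To make the search step concrete, the sketch below illustrates the continuous relaxation at the core of DARTS together with a FLOPs-penalized outer-level objective of the kind DARTS-F describes: each bridge replaces the discrete choice among candidate operations with a softmax-weighted average, and the outer level minimizes the validation loss plus a constant C times the expected FLOPs, while the inner level trains the weights on the training loss. This is a minimal PyTorch sketch under assumed details, not the authors' implementation; the names `MixedOperation` and `outer_level_objective` and the exact form of the FLOPs penalty are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOperation(nn.Module):
    """One bridge between two nodes: the discrete choice among candidate
    operations o_k is relaxed into a softmax-weighted average, so the
    operation variables alpha become differentiable (the standard DARTS
    relaxation)."""

    def __init__(self, candidate_ops, flops_per_op):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # F(o_k): FLOPs of each candidate operation, assumed precomputed
        # by a profiler (illustrative values in practice).
        self.register_buffer("op_flops",
                             torch.tensor(flops_per_op, dtype=torch.float32))
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(candidate_ops)))

    def forward(self, x):
        # Weighted sum over all candidate operations applied to x.
        weights = F.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def expected_flops(self):
        # Softmax-weighted FLOPs of this bridge.
        return (F.softmax(self.alpha, dim=-1) * self.op_flops).sum()

def outer_level_objective(val_loss, bridges, c=1e-9):
    """Assumed FLOPs-aware outer-level loss: the validation loss plus the
    expected FLOPs summed over all m bridges, balanced by the constant C."""
    return val_loss + c * sum(b.expected_flops() for b in bridges)
```

In the bilevel loop, the inner level would update the weights on the training loss with the operation variables frozen, and the outer level would then step the operation variables on this objective, approximating the optimal weights by the current ones as in first-order DARTS; after the search, each bridge keeps only its highest-weighted operation.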



Data availability

The Cornell grasping dataset is available at https://www.kaggle.com/oneoneliu/cornell-grasp. The 3DNet dataset is available online at https://www.acin.tuwien.ac.at/en/vision-for-robotics/software-tools/3dnet-dataset/. The 3D-printed adversarial object dataset is available at https://berkeleyautomation.github.io/dex-net/. Demonstration videos of the IRG in the virtual and real spaces of the DT are available at https://www.youtube.com/watch?v=qOtc4UtcUqE. Our code may be available upon request.

Abbreviations

g: A grasp in the global coordinate system
p: The gripper's center position in Cartesian coordinates
θ: The gripper's rotation angle around the z-axis
w: The gripper's opening width
q: A quality measure representing the grasping success rate
I: An image with red, green, blue, and depth (RGB-D) channels
H: The height of the image
W: The width of the image
g_I: A grasp in the RGB-D image space
p_I: The center point of a pixel in the RGB-D image
θ_I: The gripper's rotation angle in the RGB-D image
w_I: The gripper's opening width in the RGB-D image
t_RC: The transformation from the camera coordinate system to the global coordinate system
t_CI: The transformation from the image coordinate system to the camera coordinate system
G_I: A grasp map over all pixels of the image I
Θ_I: The feature map storing the gripper's rotation angle at each pixel p_I
W_I: The feature map storing the gripper's opening width at each pixel p_I
Q_I: The feature map storing the grasping success rate at each pixel p_I
F: The neural network mapping the image I to the grasp map G_I
g_I^*: The best visible grasp in the image space (see the decoding sketch after this list)
F^*: The optimized CNN
x_i: The result calculated by an operation of the previous node
o^(i,j): An operation between node x_i and node x_j
o_k(x): The k-th operation applied to a node x
α_{o_k}^(i,j): A continuous operation variable
ō^(i,j): The averaged operation from node x_i to node x_j over all candidate operations
O: The set of n candidate operations
α: The continuous operation vector
ω: The weights in the inner-level function
ω^*: The optimal weights in the inner-level function
L_val: The validation loss function
L_tra: The training loss function
F(o_kl): The function that calculates the floating-point operations (FLOPs) of the operation o_kl
m: The total number of bridges
n_o: The total number of candidate operations for each bridge
n: The number of losses in a loss set
C: The constant balancing the sensitivities of the loss
G_k: The sequential loss set
ε_1: The threshold for the average loss
ε_2: The threshold for the difference between the maximum and minimum loss values
A: The predicted grasping rectangle
B: The benchmark grasping rectangle
n_r: The number of virtual robots for training in the DT framework
m_r: The number of batches of projected RGB-D images
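To show how several of the symbols above fit together, here is a short sketch that decodes the best visible grasp g_I^* from the grasp map G_I = (Θ_I, W_I, Q_I): take the pixel p_I with the highest predicted success rate in Q_I, then read the angle and width maps at that pixel. This is an assumed decoding step, a common pattern for pixel-wise grasp maps, not necessarily the paper's exact post-processing.

```python
import numpy as np

def best_visible_grasp(q_map, theta_map, width_map):
    """Decode g_I^* from the per-pixel feature maps (each an H x W array)."""
    p_i = np.unravel_index(np.argmax(q_map), q_map.shape)  # pixel p_I
    return {
        "p_I": p_i,                 # center pixel (row, col)
        "theta_I": theta_map[p_i],  # gripper rotation angle at p_I
        "w_I": width_map[p_i],      # gripper opening width at p_I
        "q": q_map[p_i],            # predicted grasping success rate
    }
```

The image-space grasp would then be mapped to the global frame as g = t_RC(t_CI(g_I)) before execution, and a predicted rectangle A is typically scored against the benchmark rectangle B by the Jaccard index |A ∩ B| / |A ∪ B|.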


Acknowledgements

The authors gratefully acknowledge the support from the Ministry of Science and Technology of China, the Zhejiang Provincial Natural Science Foundation of China, and the State Key Laboratory of Fluid Power and Mechatronic Systems. The authors also thank the reviewers and the editors for their insightful comments, which helped improve the quality of the paper.

Funding

The work was supported by the National Key Research and Development Program of China (Grant Number 2019YFB1312600), the Zhejiang Provincial Natural Science Foundation of China (Grant Number LZ22E050006), and the State Key Laboratory of Fluid Power and Mechatronic Systems (Grant Number SKLoFP_ZZ_2102).

Author information

Corresponding authors

Correspondence to Weifei Hu or Jin Cheng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Weifei Hu: ASME Membership (000100631478).


Cite this article

Hu, W., Shao, J., Jiao, Q. et al. A new differentiable architecture search method for optimizing convolutional neural networks in the digital twin of intelligent robotic grasping. J Intell Manuf 34, 2943–2961 (2023). https://doi.org/10.1007/s10845-022-01971-8
