Parallel Implementation of the Nonlinear Semi-NMF Based Alternating Optimization Method for Deep Neural Networks | Neural Processing Letters

Abstract

For computing the weights of deep neural networks (DNNs), the backpropagation (BP) method has been widely used as the de facto standard algorithm. Because the BP method is based on stochastic gradient descent using derivatives of the objective function, it can be difficult to find appropriate values for parameters such as the learning rate. As an alternative approach for computing weight matrices, we recently proposed an alternating optimization method using linear and nonlinear semi-nonnegative matrix factorizations (semi-NMFs). In this paper, we propose a parallel implementation of the nonlinear semi-NMF based method. The experimental results show that our nonlinear semi-NMF based method and its parallel implementation have competitive advantages over conventional DNNs trained with the BP method.
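
As a rough illustration of the linear semi-NMF building block named in the abstract, the sketch below factorizes a mixed-sign matrix X as F Gᵀ with G ≥ 0, using the multiplicative update of Ding et al. [3]. It is not the authors' solver, their nonlinear variant, or their parallel implementation; the function name `semi_nmf` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (m x n, mixed sign) as F @ G.T with G >= 0.

    Illustrative only: uses the multiplicative update of Ding et al.,
    which may differ from the solver used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    G = rng.random((n, k)) + 0.1           # strictly positive start

    def pos(A):                             # positive part of A
        return (np.abs(A) + A) / 2

    def neg(A):                             # magnitude of the negative part
        return (np.abs(A) - A) / 2

    for _ in range(n_iter):
        # F is unconstrained: least-squares solution for the current G.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        XtF = X.T @ F
        FtF = F.T @ F
        # Multiplicative update keeps every entry of G nonnegative.
        num = pos(XtF) + G @ neg(FtF)
        den = neg(XtF) + G @ pos(FtF)
        G *= np.sqrt((num + eps) / (den + eps))

    # Refit F so that it is optimal for the final G.
    F = X @ G @ np.linalg.pinv(G.T @ G)
    return F, G
```

In an alternating scheme of the kind the abstract describes, a factorization step like this would be applied layer by layer, with the unconstrained factor playing the role of a weight matrix and the nonnegative factor playing the role of post-activation hidden units; that mapping is an interpretation, not a description of the paper's algorithm.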

Notes

  1. In [11], the simplified objective function (3), which discards bias vectors and sparse regularizations, was considered. Handling bias vectors and sparse regularizations requires algorithms for solving “constrained” (nonlinear) semi-NMFs with sparse regularizations, because \(\boldsymbol{1}\) is fixed in (1). Therefore, this paper also considers the simplified objective function (3). Note that we are developing methods for solving such constrained problems.
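
To illustrate why a fixed vector of ones leads to a constrained factorization, assume the bias enters a layer in the usual affine form. This is an assumption, since equation (1) is not reproduced on this page, and the symbols \(W\), \(Z\), \(\boldsymbol{b}\) and \(n\) are illustrative. The bias can then be absorbed into an augmented product:

\[
W Z + \boldsymbol{b}\,\boldsymbol{1}^{\top}
= \begin{bmatrix} W & \boldsymbol{b} \end{bmatrix}
  \begin{bmatrix} Z \\ \boldsymbol{1}^{\top} \end{bmatrix},
\qquad \boldsymbol{1} = (1,\dots,1)^{\top} \in \mathbb{R}^{n}.
\]

A plain semi-NMF would treat every row of the second factor as unknown, whereas here the last row must stay equal to \(\boldsymbol{1}^{\top}\), which is the kind of constraint the note refers to.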

References

  1. Bengio Y, Lamblin P, Popovici D, Larochelle H (2006) Greedy layer-wise training of deep networks. Proc Adv Neural Inf Process Syst 19:153–160

  2. Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: Proc. 22nd international joint conference on artificial intelligence, pp 1237–1242

  3. Ding C, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32:45–55

  4. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, pp 249–256

  5. Hinton GE, Deng L, Yu D, Dahl GE, Mohamed A, Jaitly N, Senior A, Vanhoucke V (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process Mag 29:82–97

  6. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International conference on learning representations (ICLR), San Diego

  7. LeCun Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist

  8. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324

  9. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proc. ICML

  10. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536

  11. Sakurai T, Imakura A, Inoue Y, Futamura Y (2016) Alternating optimization method based on nonnegative matrix factorizations for deep neural networks. In: Proc. ICONIP 2016

  12. Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

  13. TensorFlow, https://www.tensorflow.org/

Author information

Corresponding author

Correspondence to Akira Imakura.

Additional information

This research was supported in part by JST/ACT-I (No. JPMJPR16U6), JST/CREST, MEXT KAKENHI (No. 17K12690) and the University of Tsukuba Basic Research Support Program Type A. This research used, in part, computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project ID: hp160138) and of COMA provided by the Interdisciplinary Computational Science Program in the Center for Computational Sciences, University of Tsukuba.

About this article

Cite this article

Imakura, A., Inoue, Y., Sakurai, T. et al. Parallel Implementation of the Nonlinear Semi-NMF Based Alternating Optimization Method for Deep Neural Networks. Neural Process Lett 47, 815–827 (2018). https://doi.org/10.1007/s11063-017-9642-2

  • DOI: https://doi.org/10.1007/s11063-017-9642-2
