Abstract
For computing the weights of deep neural networks (DNNs), the backpropagation (BP) method has been widely used as the de-facto standard algorithm. Because the BP method is based on stochastic gradient descent using derivatives of the objective function, it can be difficult to choose appropriate hyperparameters such as the learning rate. As an alternative approach to computing the weight matrices, we recently proposed an alternating optimization method based on linear and nonlinear semi-nonnegative matrix factorizations (semi-NMFs). In this paper, we propose a parallel implementation of the nonlinear semi-NMF based method. Experimental results show that the nonlinear semi-NMF based method and its parallel implementation are competitive with conventional DNNs trained by the BP method.
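The abstract only summarizes the approach. Purely as a rough illustration of the idea of alternating between a linear semi-NMF-style step (fitting the output weights against a nonnegative hidden factor) and a nonlinear semi-NMF-style step (fitting the hidden-layer weights through a nonnegative activation such as ReLU), the following NumPy sketch trains a one-hidden-layer model. The update rules used here (pseudo-inverse least squares with projection onto the nonnegative orthant), the initialization, and the absence of any parallelization are illustrative assumptions, not the update formulas derived in the paper.

```python
# A minimal sketch only; not the authors' exact algorithm or its parallel version.
import numpy as np

def relu(A):
    return np.maximum(A, 0.0)

def alternating_semi_nmf(X, Y, hidden_dim, iters=20, seed=0):
    """Fit Y ~= W2 @ relu(W1 @ X) by alternating between a linear
    semi-NMF-style step (W2, nonnegative factor Z) and a nonlinear
    semi-NMF-style step (W1)."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((hidden_dim, X.shape[0]))
    Z = relu(W1 @ X)                          # nonnegative hidden factor
    for _ in range(iters):
        W2 = Y @ np.linalg.pinv(Z)            # least-squares output weights
        Z = relu(np.linalg.pinv(W2) @ Y)      # update Z, project onto Z >= 0
        W1 = Z @ np.linalg.pinv(X)            # fit relu(W1 @ X) ~= Z
        Z = relu(W1 @ X)
    W2 = Y @ np.linalg.pinv(Z)
    return W1, W2

# Tiny usage example on random data
X = np.random.rand(4, 200)                    # 4 features, 200 samples
Y = np.random.rand(3, 200)                    # 3 targets
W1, W2 = alternating_semi_nmf(X, Y, hidden_dim=16)
print("relative fit error:",
      np.linalg.norm(Y - W2 @ relu(W1 @ X)) / np.linalg.norm(Y))
```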
Notes
In [11], the simplified objective function (3), which discards bias vectors and sparse regularizations, was considered. To incorporate bias vectors and sparse regularizations, we would need algorithms for solving "constrained" (nonlinear) semi-NMFs with sparse regularization, because the all-ones vector \(\mathbf{1}\) is fixed in (1). Therefore, in this paper, we also consider the simplified objective function (3). Note that we have been developing methods for solving such constrained problems.
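The equation numbers (1) and (3) refer to the full paper and are not reproduced on this page. Purely as an illustration of what such a simplified objective looks like, a one-hidden-layer instance without bias vectors and sparse regularization could be written as follows, where \(X\) is the input data, \(Y\) the target data, \(W_1, W_2\) the weight matrices, and \(f\) a nonnegative activation such as ReLU (these symbols are assumed names, not necessarily the paper's notation):

\[
\min_{W_1,\,W_2}\ \bigl\| Y - W_2\, f(W_1 X) \bigr\|_F^2,
\qquad f(\cdot) = \max(\cdot,\, 0).
\]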
References
Bengio Y, Lamblin P, Popovici D, Larochelle H (2006) Greedy layer-wise training of deep networks. Proc Adv Neural Inf Process Syst 19:153–160
Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. Proc. 22nd International joint conference on artificial intelligence, 1237–1242
Ding CHQ, Li T, Jordan MI (2010) Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell 32:45–55
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In International conference on artificial intelligence and statistics, 249–256
Hinton GE, Deng L, Yu D, Dahl GE, Mohamed A, Jaitly N, Senior A, Vanhoucke V (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process Mag 29:82–97
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: The international conference on learning representations (ICLR), San Diego
LeCun Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2324
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proc. ICML
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536
Sakurai T, Imakura A, Inoue Y, Futamura Y (2016) Alternating optimization method based on nonnegative matrix factorizations for deep neural networks. In: Proc. ICONIP 2016
Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
TensorFlow, https://www.tensorflow.org/
Additional information
This research was supported in part by JST/ACT-I (No. JPMJPR16U6), JST/CREST, MEXT KAKENHI (No. 17K12690), and the University of Tsukuba Basic Research Support Program Type A. This research used, in part, computational resources of the K computer provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (Project ID: hp160138) and of COMA provided by the Interdisciplinary Computational Science Program in the Center for Computational Sciences, University of Tsukuba.
About this article
Cite this article
Imakura, A., Inoue, Y., Sakurai, T. et al. Parallel Implementation of the Nonlinear Semi-NMF Based Alternating Optimization Method for Deep Neural Networks. Neural Process Lett 47, 815–827 (2018). https://doi.org/10.1007/s11063-017-9642-2