
Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions

  • Conference paper
  • Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11907)

Abstract

Since resource-constrained devices hardly benefit from the trend towards ever-larger neural network (NN) architectures, there is growing interest in designing more hardware-friendly NNs. In this paper, we consider the training of NNs with discrete-valued weights and sign activation functions, which can be implemented more efficiently in terms of inference speed, memory requirements, and power consumption. We build on the framework of probabilistic forward propagation using the local reparameterization trick: instead of training a single set of NN weights, we train a distribution over these weights. This approach allows gradient-based learning by optimizing the continuous parameters of the distribution over discrete weights while simultaneously backpropagating through the sign activation. In our experiments, we investigate the influence of the number of weights on the classification performance on several benchmark datasets, and we show that our method achieves state-of-the-art performance.
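To make the approach concrete, the following Python/NumPy sketch shows a probabilistic forward pass through one fully connected layer using the local reparameterization trick, assuming ternary weights in {-1, 0, +1} parameterized by per-weight logits and a tanh relaxation of the sign activation. All names (prob_linear_sign, logits, etc.) are illustrative assumptions; this is a sketch of the general technique under these assumptions, not the authors' implementation.

    import numpy as np

    def softmax(logits, axis=-1):
        z = logits - logits.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def prob_linear_sign(x, logits, rng):
        # x: (batch, d_in); logits: (d_in, d_out, 3) over weight values {-1, 0, +1}.
        # Rather than sampling each discrete weight, propagate the mean and
        # variance of the pre-activation a = x @ W. By a central limit
        # argument, a sum of many independent weight contributions is
        # approximately Gaussian, so one Gaussian sample per pre-activation
        # suffices (the local reparameterization trick).
        values = np.array([-1.0, 0.0, 1.0])
        q = softmax(logits)                    # per-weight distributions
        w_mean = q @ values                    # E[w],   shape (d_in, d_out)
        w_var = q @ values**2 - w_mean**2      # Var[w], shape (d_in, d_out)
        a_mean = x @ w_mean                    # E[a]
        a_var = (x**2) @ w_var                 # Var[a] (independence assumption)
        a = a_mean + np.sqrt(a_var + 1e-8) * rng.standard_normal(a_mean.shape)
        return np.tanh(a)                      # smooth surrogate for sign(a)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 128))
    logits = 0.1 * rng.standard_normal((128, 32, 3))
    print(prob_linear_sign(x, logits, rng).shape)   # (4, 32)

Because the layer output depends on the logits only through differentiable means and variances, standard gradient-based optimizers can train the continuous distribution parameters even though the underlying weights are discrete.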


Notes

  1. A convolution can be cast as a matrix–vector multiplication.

  2. We only consider distributions q for which sampling and maximization are easy.

  3. Given finite integer-valued summands, the activation distribution could also be computed exactly in sub-exponential time by convolving the probabilities. However, this would be impractical for gradient-based learning.

  4. \(\mu_{i,tr}^{new} \leftarrow \xi_{bn} \mu_{i,bn} + (1 - \xi_{bn}) \mu_{i,tr}^{old}\) for \(\xi_{bn} \in (0,1)\), and similarly for \(\sigma_{i,tr}^2\); a code sketch of this update follows these notes.
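The update in note 4 is an exponential moving average of the batch statistics. As a minimal Python/NumPy sketch for illustration only (variable names such as update_tracked_stats and xi_bn are our assumptions, not taken from the paper):

    import numpy as np

    def update_tracked_stats(mu_tr, sigma2_tr, mu_bn, sigma2_bn, xi_bn=0.1):
        # One exponential-moving-average step per feature i, as in note 4:
        # mu_tr_new = xi_bn * mu_bn + (1 - xi_bn) * mu_tr_old, and likewise
        # for the variance. xi_bn must lie in (0, 1).
        mu_tr_new = xi_bn * mu_bn + (1.0 - xi_bn) * mu_tr
        sigma2_tr_new = xi_bn * sigma2_bn + (1.0 - xi_bn) * sigma2_tr
        return mu_tr_new, sigma2_tr_new

    # Usage: fold one mini-batch's statistics into the tracked statistics.
    rng = np.random.default_rng(1)
    batch = rng.standard_normal((32, 8))          # (batch, features)
    mu_tr, sigma2_tr = np.zeros(8), np.ones(8)    # tracked statistics
    mu_tr, sigma2_tr = update_tracked_stats(mu_tr, sigma2_tr,
                                            batch.mean(axis=0),
                                            batch.var(axis=0))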


Acknowledgements

This work was supported by the Austrian Science Fund (FWF) under the project number I2706-N31.

Author information


Corresponding author

Correspondence to Wolfgang Roth.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Roth, W., Schindler, G., Fröning, H., Pernkopf, F. (2020). Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol. 11907. Springer, Cham. https://doi.org/10.1007/978-3-030-46147-8_23


  • DOI: https://doi.org/10.1007/978-3-030-46147-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46146-1

  • Online ISBN: 978-3-030-46147-8

  • eBook Packages: Computer Science (R0)
