
Supplementary-architecture weight-optimization neural networks

Original Article · Neural Computing and Applications

Abstract

Research into improving artificial neural networks has delivered significant gains in learning ability, whether through manual refinement by researchers or automated design by other artificial intelligence techniques, and has largely focused on the architecture of the network or on the weight update equations used to optimize that architecture. A promising, unexplored direction is to extend the traditional definition of a neural network so that a single model consists of multiple architectures: one primary architecture and several supplementary architectures. To use the information from all of these architectures to possibly improve learning, the weight update equations are customized per set-of-weights; each equation may use the error of either the primary architecture or a supplementary architecture to update that set-of-weights, subject to constraints that ensure the updates remain valid. This concept was implemented and investigated. Grammatical evolution was used to make the complex architecture choice for each weight update equation, and it succeeded in finding optimal choice combinations for classification and regression benchmark datasets, the KDD Cup 1999 intrusion detection dataset, and the UCLA graduate admission dataset. These optimal combinations reliably outperformed traditional single-architecture neural networks at high confidence levels across all datasets. Analysing the optimal combinations with data mining tools identified clear patterns, and a theoretical explanation is provided for how these patterns may be linked to optimality. The optimal combinations were also shown to be competitive with state-of-the-art techniques on the same datasets.
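
A minimal toy sketch of this idea, written for this summary rather than taken from the paper: one shared hidden-layer weight set is reachable through both a primary output head and a supplementary output head, and a per-weight-set choice (fixed here, but selected by grammatical evolution in the paper) decides which architecture's error drives that set's gradient update. The output weight sets are always updated from their own architecture's error, loosely mirroring the "necessary constraints to ensure valid updates" mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
X = rng.normal(size=(64, 4))
y = (X @ rng.normal(size=(4, 1))).ravel()

# One shared weight set (the hidden layer) and one output weight set per architecture
W_hidden = rng.normal(scale=0.1, size=(4, 8))    # shared by both architectures
W_primary = rng.normal(scale=0.1, size=(8, 1))   # primary architecture's output weights
W_supp = rng.normal(scale=0.1, size=(8, 1))      # supplementary architecture's output weights

# Per-weight-set choice: which architecture's error updates W_hidden.
# In the paper this choice is made by grammatical evolution; here it is fixed.
hidden_error_source = "supplementary"

lr = 0.05
for _ in range(500):
    h = np.tanh(X @ W_hidden)                    # shared hidden activations
    err_primary = (h @ W_primary).ravel() - y    # error through the primary head
    err_supp = (h @ W_supp).ravel() - y          # error through the supplementary head

    # Each output weight set is always updated from its own architecture's error
    W_primary -= lr * h.T @ err_primary[:, None] / len(X)
    W_supp -= lr * h.T @ err_supp[:, None] / len(X)

    # The shared weight set is updated from whichever error the choice selects
    if hidden_error_source == "primary":
        err, W_out = err_primary, W_primary
    else:
        err, W_out = err_supp, W_supp
    grad_h = (err[:, None] @ W_out.T) * (1.0 - h ** 2)   # backprop through tanh
    W_hidden -= lr * X.T @ grad_h / len(X)

print("primary-head MSE:", float(np.mean(err_primary ** 2)))
```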


Availability of data and material

Not applicable

Code availability

The implementation code is available at the following link: https://github.com/jared-oreilly/sawo-nn


Funding

This work is based on research supported in part by the National Research Foundation of South Africa (Grant Number 46712). Opinions expressed and conclusions arrived at are those of the author and are not necessarily to be attributed to the NRF.

Author information


Corresponding author

Correspondence to Jared O’Reilly.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Full architecture set A


See all architectures in the architecture set A below:

$$\begin{array}{llll}
 & i \cdot \vert i c \vert \cdot a \cdot o & i \cdot \vert a b \vert \cdot \vert i c \vert \cdot o & i \cdot \vert i b \vert \cdot c \cdot a \cdot o \\
 & i \cdot c \cdot \vert i a \vert \cdot o & i \cdot a \cdot \vert b c \vert \cdot o & i \cdot b \cdot \vert i c \vert \cdot a \cdot o \\
i \cdot o & i \cdot \vert i c \vert \cdot \vert i a \vert \cdot o & i \cdot \vert i a \vert \cdot \vert b c \vert \cdot o & i \cdot b \cdot c \cdot \vert i a \vert \cdot o \\
i \cdot a \cdot o & i \cdot \vert a c \vert \cdot o & i \cdot a \cdot \vert i b c \vert \cdot o & i \cdot \vert i b \vert \cdot \vert i c \vert \cdot a \cdot o \\
i \cdot \vert i a \vert \cdot o & i \cdot \vert i a c \vert \cdot o & i \cdot \vert a b c \vert \cdot o & i \cdot \vert i b \vert \cdot c \cdot \vert i a \vert \cdot o \\
i \cdot b \cdot o & i \cdot b \cdot c \cdot o & i \cdot \vert i a b c \vert \cdot o & i \cdot b \cdot \vert i c \vert \cdot \vert i a \vert \cdot o \\
i \cdot \vert i b \vert \cdot o & i \cdot \vert i b \vert \cdot c \cdot o & i \cdot a \cdot c \cdot b \cdot o & i \cdot \vert b c \vert \cdot a \cdot o \\
i \cdot c \cdot o & i \cdot b \cdot \vert i c \vert \cdot o & i \cdot \vert i a \vert \cdot c \cdot b \cdot o & i \cdot \vert i b c \vert \cdot a \cdot o \\
i \cdot \vert i c \vert \cdot o & i \cdot \vert i b \vert \cdot \vert i c \vert \cdot o & i \cdot a \cdot \vert i c \vert \cdot b \cdot o & i \cdot \vert b c \vert \cdot \vert i a \vert \cdot o \\
i \cdot a \cdot b \cdot o & i \cdot c \cdot b \cdot o & i \cdot a \cdot c \cdot \vert i b \vert \cdot o & i \cdot c \cdot a \cdot b \cdot o \\
i \cdot \vert i a \vert \cdot b \cdot o & i \cdot \vert i c \vert \cdot b \cdot o & i \cdot \vert i a \vert \cdot \vert i c \vert \cdot b \cdot o & i \cdot \vert i c \vert \cdot a \cdot b \cdot o \\
i \cdot a \cdot \vert i b \vert \cdot o & i \cdot c \cdot \vert i b \vert \cdot o & i \cdot \vert i a \vert \cdot c \cdot \vert i b \vert \cdot o & i \cdot c \cdot \vert i a \vert \cdot b \cdot o \\
i \cdot \vert i a \vert \cdot \vert i b \vert \cdot o & i \cdot \vert i c \vert \cdot \vert i b \vert \cdot o & i \cdot a \cdot \vert i c \vert \cdot \vert i b \vert \cdot o & i \cdot c \cdot a \cdot \vert i b \vert \cdot o \\
i \cdot b \cdot a \cdot o & i \cdot \vert b c \vert \cdot o & i \cdot \vert a c \vert \cdot b \cdot o & i \cdot \vert i c \vert \cdot \vert i a \vert \cdot b \cdot o \\
i \cdot \vert i b \vert \cdot a \cdot o & i \cdot \vert i b c \vert \cdot o & i \cdot \vert i a c \vert \cdot b \cdot o & i \cdot \vert i c \vert \cdot a \cdot \vert i b \vert \cdot o \\
i \cdot b \cdot \vert i a \vert \cdot o & i \cdot a \cdot b \cdot c \cdot o & i \cdot \vert a c \vert \cdot \vert i b \vert \cdot o & i \cdot c \cdot \vert i a \vert \cdot \vert i b \vert \cdot o \\
i \cdot \vert i b \vert \cdot \vert i a \vert \cdot o & i \cdot \vert i a \vert \cdot b \cdot c \cdot o & i \cdot b \cdot a \cdot c \cdot o & i \cdot c \cdot \vert a b \vert \cdot o \\
i \cdot \vert a b \vert \cdot o & i \cdot a \cdot \vert i b \vert \cdot c \cdot o & i \cdot \vert i b \vert \cdot a \cdot c \cdot o & i \cdot c \cdot b \cdot a \cdot o \\
i \cdot \vert i a b \vert \cdot o & i \cdot a \cdot b \cdot \vert i c \vert \cdot o & i \cdot b \cdot \vert i a \vert \cdot c \cdot o & i \cdot \vert i c \vert \cdot b \cdot a \cdot o \\
i \cdot a \cdot c \cdot o & i \cdot \vert i a \vert \cdot \vert i b \vert \cdot c \cdot o & i \cdot b \cdot a \cdot \vert i c \vert \cdot o & i \cdot c \cdot \vert i b \vert \cdot a \cdot o \\
i \cdot \vert i a \vert \cdot c \cdot o & i \cdot \vert i a \vert \cdot b \cdot \vert i c \vert \cdot o & i \cdot \vert i b \vert \cdot \vert i a \vert \cdot c \cdot o & i \cdot c \cdot b \cdot \vert i a \vert \cdot o \\
i \cdot a \cdot \vert i c \vert \cdot o & i \cdot a \cdot \vert i b \vert \cdot \vert i c \vert \cdot o & i \cdot \vert i b \vert \cdot a \cdot \vert i c \vert \cdot o & i \cdot \vert i c \vert \cdot \vert i b \vert \cdot a \cdot o \\
i \cdot \vert i a \vert \cdot \vert i c \vert \cdot o & i \cdot \vert a b \vert \cdot c \cdot o & i \cdot b \cdot \vert i a \vert \cdot \vert i c \vert \cdot o & i \cdot \vert i c \vert \cdot b \cdot \vert i a \vert \cdot o \\
i \cdot c \cdot a \cdot o & i \cdot \vert i a b \vert \cdot c \cdot o & i \cdot b \cdot \vert a c \vert \cdot o & i \cdot c \cdot \vert i b \vert \cdot \vert i a \vert \cdot o \\
 & & i \cdot b \cdot c \cdot a \cdot o &
\end{array}$$

Appendix 2: Dataset sources and processing

The source URLs, pre-processing details and additional notes for each dataset used in this research are listed below; a sketch of the feature-preparation step follows the list. For all datasets, categorical features were one-hot encoded, and each continuous feature was either standardized, if an underlying Gaussian distribution was detected in that feature, or normalized (min-max scaled) otherwise. This preparation is necessary for neural network processing: it minimises saturation and ensures all features carry equal importance.

  1. Breast Cancer Wisconsin (Diagnostic)
     Source: data.csv from https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
     Pre-processing: Removed the id field; target is the diagnosis field.

  2. Mushroom Classification
     Source: mushrooms.csv from https://www.kaggle.com/uciml/mushroom-classification
     Pre-processing: Target is the class field.

  3. Heart Attack Analysis and Prediction
     Source: heart.csv from https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset
     Pre-processing: Target is the output field.

  4. Iris Species
     Source: Iris.csv from https://www.kaggle.com/uciml/iris
     Pre-processing: Removed the Id field; target is the Species field.

  5. Red Wine Quality
     Source: winequality-red.csv from https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
     Pre-processing: Target is the quality field.

  6. Glass Classification
     Source: glass.csv from https://www.kaggle.com/uciml/glass
     Pre-processing: Target is the Type field.

  7. Wheat Seeds
     Source: Seed_Data.csv from https://www.kaggle.com/dongeorge/seed-from-uci
     Pre-processing: Target is the target field.

  8. Boston House Price
     Source: housing.csv from https://www.kaggle.com/vikrishnan/boston-house-prices
     Pre-processing: Target is the MEDV field.
     Additional notes: The file is space-separated rather than comma-separated, so it must be parsed differently, and a header line must be added.

  9. Abalone Rings
     Source: abalone.csv from https://www.kaggle.com/rodolfomendes/abalone-dataset
     Pre-processing: Target is the Rings field.
     Additional notes: Ring values are technically discrete, but represent age, so they can be treated as continuous.

  10. 1985 Automobile Insurance
      Source: auto_clean.csv from https://www.kaggle.com/fazilbtopal/auto85
      Pre-processing: Target is the normalized-losses field.

  11. KDD Cup 99 Intrusion Detection
      Source: Train_data.csv from https://www.kaggle.com/sampadab17/network-intrusion-detection
      Pre-processing: Removed the num_outbound_cmds and is_host_login fields; target is the class field.
      Additional notes: The two removed fields were equal to 0 across all rows of the dataset, so they provided no useful information.

  12. Graduate Admission
      Source: Admission_Predict.csv from https://www.kaggle.com/mohansacharya/graduate-admissions
      Pre-processing: Removed the Serial No. field; target is the Chance of Admit field.
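
The preparation described above can be summarised in a few lines of code. The sketch below is illustrative rather than the authors' implementation: the paper does not state which normality test "detects" a Gaussian distribution, so D'Agostino's test from scipy.stats with a hypothetical threshold of 0.05 stands in for that choice, and the function assumes the target column has already been separated out and that there are no missing values.

```python
import pandas as pd
from scipy import stats
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def prepare_features(features: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """One-hot encode categoricals; scale continuous features by distribution."""
    categorical = features.select_dtypes(include=["object", "category"]).columns
    continuous = [c for c in features.columns if c not in categorical]

    # One-hot encode every categorical feature
    out = pd.get_dummies(features, columns=list(categorical))

    for col in continuous:
        # Crude "Gaussian detected" check: fail to reject normality at alpha
        _, p_value = stats.normaltest(features[col])
        scaler = StandardScaler() if p_value > alpha else MinMaxScaler()
        out[col] = scaler.fit_transform(features[[col]]).ravel()
    return out
```

The select_dtypes call treats any non-numeric column as categorical, which fits the datasets listed above but is only a heuristic.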


About this article


Cite this article

O’Reilly, J., Pillay, N. Supplementary-architecture weight-optimization neural networks. Neural Comput & Applic 34, 11177–11197 (2022). https://doi.org/10.1007/s00521-022-07035-5

