Abstract
Research efforts to improve artificial neural networks have yielded significant gains in learning ability, whether through manual design by researchers or through automated design by other artificial intelligence techniques, and have largely focused on the architecture of the network or on the weight update equations used to optimize it. A promising but unexplored direction is to extend the traditional definition of a neural network so that a single model consists of multiple architectures: one primary architecture and one or more supplementary architectures. To exploit the information from all of these architectures and potentially improve learning, the weight update equations are customized per set-of-weights; each equation may use the error of either the primary architecture or a supplementary architecture to update that set-of-weights, subject to constraints that ensure valid updates. This concept was implemented and investigated. Grammatical evolution was used to make the complex architecture choices for each weight update equation, and it succeeded in finding optimal choice combinations for classification and regression benchmark datasets, the KDD Cup 1999 intrusion detection dataset, and the UCLA graduate admission dataset. These optimal combinations reliably outperformed traditional single-architecture neural networks at high confidence levels across all datasets. The optimal combinations were analysed using data mining tools, which identified clear patterns, and a theoretical explanation is provided for how these patterns may be linked to optimality. The optimal combinations were also shown to be competitive with state-of-the-art techniques on the same datasets.
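To make the core idea concrete, below is a minimal, hypothetical sketch in PyTorch of a single model containing a primary and a supplementary architecture that share one weight set, where each weight set is updated using the error of a designated architecture. All names, layer sizes and the specific per-weight-set choices are illustrative assumptions; this is not the authors' released implementation.

# Hypothetical sketch (not the authors' code): a primary architecture (W1 -> W2)
# and a supplementary architecture (W1 -> W3) share the weight set W1. Each
# weight set is updated with the gradient of the error of a chosen architecture.
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(64, 10), torch.randn(64, 1)   # toy regression data

W1 = nn.Linear(10, 8)    # shared weight set (used by both architectures)
W2 = nn.Linear(8, 1)     # weight set of the primary architecture only
W3 = nn.Linear(8, 1)     # weight set of the supplementary architecture only

def primary(x):          # primary architecture: W1 -> W2
    return W2(torch.relu(W1(x)))

def supplementary(x):    # supplementary architecture: W1 -> W3
    return W3(torch.relu(W1(x)))

loss_fn = nn.MSELoss()
params = list(W1.parameters()) + list(W2.parameters()) + list(W3.parameters())
opt = torch.optim.SGD(params, lr=0.01)

for _ in range(200):
    opt.zero_grad()
    primary_error = loss_fn(primary(X), y)
    supplementary_error = loss_fn(supplementary(X), y)
    # Illustrative per-weight-set choice: W1 is updated from the supplementary
    # architecture's error, W2 from the primary's, W3 from the supplementary's.
    g1 = torch.autograd.grad(supplementary_error, list(W1.parameters()), retain_graph=True)
    g2 = torch.autograd.grad(primary_error, list(W2.parameters()), retain_graph=True)
    g3 = torch.autograd.grad(supplementary_error, list(W3.parameters()))
    for p, g in zip(params, g1 + g2 + g3):
        p.grad = g
    opt.step()

print("primary error after training:", float(loss_fn(primary(X), y)))

In the work described here, the per-weight-set architecture choice is not hand-picked as in the sketch but evolved with grammatical evolution, subject to constraints that keep the updates valid.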
Availability of data and material
Not applicable
Code availability
The implementation code is available at the following link: https://github.com/jared-oreilly/sawo-nn
Funding
This work is based on research supported in part by the National Research Foundation of South Africa (Grant Number 46712). Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Full architecture set A
The figures that follow show all architectures in the architecture set A.
Appendix 2: Dataset sources and processing
The source URL, pre-processing details and any additional notes for each dataset used in this research are listed below. For all datasets, categorical features were one-hot encoded, and continuous features were standardized if an underlying Gaussian distribution was detected in the feature, or normalized (min-max scaled) otherwise. This preparation is necessary for neural network processing, to minimise saturation and to ensure all features carry equal importance. A minimal illustrative sketch of this pre-processing is given after the dataset list.
1. Breast Cancer Wisconsin (Diagnostic)
Source: data.csv from https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
Pre-processing: Removed the id field; target is the diagnosis field.
2. Mushroom Classification
Source: mushrooms.csv from https://www.kaggle.com/uciml/mushroom-classification
Pre-processing: Target is the class field.
3. Heart Attack Analysis and Prediction
Source: heart.csv from https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset
Pre-processing: Target is the output field.
4. Iris Species
Source: Iris.csv from https://www.kaggle.com/uciml/iris
Pre-processing: Removed the Id field; target is the Species field.
5. Red Wine Quality
Source: winequality-red.csv from https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
Pre-processing: Target is the quality field.
6. Glass Classification
Source: glass.csv from https://www.kaggle.com/uciml/glass
Pre-processing: Target is the Type field.
7. Wheat Seeds
Source: Seed_Data.csv from https://www.kaggle.com/dongeorge/seed-from-uci
Pre-processing: Target is the target field.
8. Boston House Price
Source: housing.csv from https://www.kaggle.com/vikrishnan/boston-house-prices
Pre-processing: Target is the MEDV field.
Additional notes: The file is space-separated rather than comma-separated, so it must be parsed accordingly, and a header line must be added.
9. Abalone Rings
Source: abalone.csv from https://www.kaggle.com/rodolfomendes/abalone-dataset
Pre-processing: Target is the Rings field.
Additional notes: Ring values are technically discrete, but represent age, so they can be treated as continuous.
10. 1985 Automobile Insurance
Source: auto_clean.csv from https://www.kaggle.com/fazilbtopal/auto85
Pre-processing: Target is the normalized-losses field.
11. KDD Cup 99 Intrusion Detection
Source: Train_data.csv from https://www.kaggle.com/sampadab17/network-intrusion-detection
Pre-processing: Removed the num_outbound_cmds and is_host_login fields; target is the class field.
Additional notes: The two removed fields were equal to 0 across all rows in the dataset, so they provided no information.
12. Graduate Admission
Source: Admission_Predict.csv from https://www.kaggle.com/mohansacharya/graduate-admissions
Pre-processing: Removed the Serial No. field; target is the Chance of Admit field.
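As a point of reference, the following is a minimal sketch of the pre-processing described at the start of this appendix, written with pandas, SciPy and scikit-learn. The normality test used to detect a Gaussian distribution, and the example file and target names, are assumptions for illustration rather than details taken from this work.

# Minimal, illustrative sketch of the dataset pre-processing described above
# (assumed column handling; not the authors' exact pipeline).
import pandas as pd
from scipy.stats import normaltest
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def preprocess(df: pd.DataFrame, target: str) -> pd.DataFrame:
    features = df.drop(columns=[target])
    categorical = features.select_dtypes(include=["object", "category"]).columns
    continuous = features.columns.difference(categorical)

    # One-hot encode categorical features.
    features = pd.get_dummies(features, columns=list(categorical))

    for col in continuous:
        # Standardize if the feature looks Gaussian (D'Agostino-Pearson test),
        # otherwise min-max scale to [0, 1].
        looks_gaussian = normaltest(features[col]).pvalue > 0.05
        scaler = StandardScaler() if looks_gaussian else MinMaxScaler()
        features[col] = scaler.fit_transform(features[[col]]).ravel()

    features[target] = df[target]
    return features

# Example usage (hypothetical file and target from the list above):
# df = pd.read_csv("heart.csv")
# prepared = preprocess(df, target="output")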
About this article
Cite this article
O’Reilly, J., Pillay, N. Supplementary-architecture weight-optimization neural networks. Neural Comput & Applic 34, 11177–11197 (2022). https://doi.org/10.1007/s00521-022-07035-5