Abstract
Deep learning provides a variety of neural network-based models, known as deep neural networks (DNNs), which are successfully used in several domains to build highly accurate predictors. A key factor behind the success of DNNs over traditional machine learning models is the large amount of data that is nowadays accessible. Nevertheless, other factors linked to DNN topology may also influence the predictive performance of DNN models. In particular, fully connected deep neural networks (fc-DNNs) typically struggle to achieve good performance on small datasets because of the large number of parameters that must be learned during training, which makes them prone to over-fitting. In this paper, the authors propose using problem-specific information to impose constraints on the network architecture, transforming an fc-DNN into a partially connected DNN (pc-DNN) whose topology is driven by prior knowledge. This work compares two baseline models, the elastic net and fc-DNNs, with pc-DNNs on three synthetic datasets of different sample sizes. The synthetic data were generated to assess the benefit of using problem-specific information to drive the network architecture. Furthermore, a similar analysis is performed on a real-world dataset to show the benefits of pc-DNN models in terms of predictive performance. The results show that pc-DNNs with built-in problem-specific information clearly outperform the elastic net and fc-DNNs on most of the datasets used, both synthetic and real-world. The pc-DNN proves to be a useful model, especially on small- and medium-size datasets, where it significantly outperforms the baseline models considered in this study. Specifically, the pc-DNNs achieved AUC and MSE improvement rates of (\(8.21\%\), \(19.79\%\)) and (\(6.65\%\), \(20.54\%\)) on the small- and medium-size datasets of the two case studies analyzed, the synthetic and the real-world problem, respectively.
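To make the idea of a knowledge-driven topology concrete, the following is a minimal sketch (not the authors' implementation, whose code is linked in the Notes below) of how an fc-DNN layer can be turned into a partially connected one: a fixed binary connectivity mask, encoding problem-specific prior knowledge, zeroes out the weights of connections the prior says should not exist. It assumes a TensorFlow/Keras setting; the class name PartiallyConnected, the block-diagonal mask, and the layer sizes are purely illustrative assumptions.

import numpy as np
import tensorflow as tf

class PartiallyConnected(tf.keras.layers.Layer):
    """Dense layer whose kernel is element-wise multiplied by a fixed 0/1 mask."""

    def __init__(self, units, connectivity_mask, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        # connectivity_mask: (n_inputs, units) array of 0s and 1s derived from prior knowledge
        self.mask = tf.constant(connectivity_mask, dtype=tf.float32)
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # input_shape[-1] must match the first dimension of the mask
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="glorot_uniform")
        self.bias = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        # Only the connections allowed by the mask contribute to the output
        return self.activation(tf.matmul(inputs, self.kernel * self.mask) + self.bias)

# Hypothetical usage: 20 inputs grouped into 4 blocks of 5 features,
# each block feeding exactly one hidden unit (block-diagonal mask)
mask = np.kron(np.eye(4), np.ones((5, 1)))  # shape (20, 4)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    PartiallyConnected(4, mask, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

Because the masked weights never contribute to the forward pass, the effective number of trainable parameters is reduced, which is the mechanism by which a pc-DNN mitigates over-fitting on small datasets.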
Notes
Code available at: https://github.com/durda-ubu/pcDNNs/blob/main/code/activations.py.
Data available at: https://github.com/durda-ubu/pcDNNs/tree/main/synthetic%20dataset.
Apply for access at: http://www.juntadeandalucia.es/medioambiente/servtc5/WebClima/menu_consultas.jsp?b=s.
Acknowledgements
The authors acknowledge support through grants RTI2018-098160-B-I00 and TIN2017-88728-C2 from the Spanish Ministerio de Ciencia, Innovación y Universidades, which include ERDF funds.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Urda, D., Veredas, F.J., González-Enrique, J. et al. Deep neural networks architecture driven by problem-specific information. Neural Comput & Applic 33, 9403–9423 (2021). https://doi.org/10.1007/s00521-021-05702-7