CF-DAML: Distributed automated machine learning based on collaborative filtering

Applied Intelligence

Abstract

The search for a good machine learning (ML) model takes a long time and requires consideration of many alternatives, including data preprocessing, algorithm selection, and hyperparameter tuning methods; such exhaustive searches therefore face a combinatorial explosion. In this work, we build a new automated machine learning (AutoML) system called CF-DAML, a distributed system based on collaborative filtering (CF), to address these challenges by recommending and training suitable models for supervised learning tasks. CF-DAML first computes a set of informative meta-features for a new dataset, then uses a weighted \(l_1\)-norm (W1-norm) to accurately identify the k nearest neighbors (kNN) of the new dataset, and finally recommends to the new dataset the top N models that perform well on each of its neighbors. We also design a distributed system (DSTM) for training the models, which substantially reduces the time complexity. In addition, we develop a multilayer selective stacked ensemble system (MSSE), whose base models are selected from the suitable candidate models according to their runtimes, classification accuracies, and diversity, to enhance the stability of CF-DAML. To our knowledge, this is the first work to combine memory-based CF with a selective stacked ensemble to solve the AutoML problem. Extensive experiments are conducted on many UCI datasets, and the comparative results demonstrate that our approach outperforms current state-of-the-art methods.
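A minimal sketch of the recommendation step described above: a W1-norm kNN search over dataset meta-features, followed by pooling the top-N models of each neighbor. All names, weights, and the model store below are illustrative assumptions, not the paper's actual meta-features or DataBase.

```python
import numpy as np

def w1_distance(x, y, w):
    """Weighted l1-norm (W1-norm) distance between two meta-feature vectors."""
    return np.sum(w * np.abs(x - y))

def recommend_models(new_meta, meta_db, model_db, weights, k=5, top_n=3):
    """Return the pooled top-N models of the k nearest historical datasets."""
    dists = [w1_distance(new_meta, m, weights) for m in meta_db]
    neighbors = np.argsort(dists)[:k]          # indices of the k nearest datasets
    candidates = []
    for i in neighbors:
        # model_db[i] is assumed to be sorted by accuracy on dataset i
        candidates.extend(model_db[i][:top_n])
    return candidates                          # candidates to be trained via DSTM
```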





Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant No. 2019YFB1706202.

Author information

Corresponding author: Xiaofeng Zhou.

Appendices

Appendix A: The models and their hyperparameters in the DataBase

Each number in brackets in the first column indicates the number of times that the associated model appears in the DataBase. The second column contains the hyperparameters of the models, where \(\{a, b, \ldots\}\) denotes a discrete set of values; \([a, b]\) denotes the integer values \(a, a+1, a+2, \ldots, b\); \(U(a, b)\) denotes a uniform distribution on the interval \((a, b)\); and \(e^{\lambda}(a, b)\) denotes an exponential distribution with parameter \(\lambda\), truncated to values ranging from \(a\) to \(b\). The preprocessors and their hyperparameter configurations are shown in Table 8, and the classifiers and their hyperparameter configurations are shown in Table 9; a minimal sampling sketch follows Table 9.

Table 8 The preprocessors and their hyperparameter configurations
Table 9 The classifiers and their hyperparameter configurations
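The sketch below samples hyperparameters under the notation of this appendix: a discrete set, an integer range, a uniform distribution, and a truncated exponential. The configuration space at the end is a hypothetical SVM-style example, not the paper's DataBase.

```python
import random

def sample(space):
    """Draw one value from a hyperparameter space described by a tagged tuple."""
    kind = space[0]
    if kind == "choice":                  # {a, b, ...}: discrete values
        return random.choice(space[1])
    if kind == "int_range":               # [a, b]: a, a+1, ..., b
        return random.randint(space[1], space[2])
    if kind == "uniform":                 # U(a, b)
        return random.uniform(space[1], space[2])
    if kind == "trunc_exp":               # e^lambda(a, b): rejection sampling
        lam, a, b = space[1], space[2], space[3]
        while True:
            v = random.expovariate(lam)
            if a < v < b:
                return v
    raise ValueError(f"unknown space: {kind}")

# Illustrative configuration (names and ranges are assumptions):
config = {
    "kernel": sample(("choice", ["linear", "rbf", "poly"])),
    "degree": sample(("int_range", 2, 5)),
    "C":      sample(("trunc_exp", 1.0, 0.01, 100.0)),
}
```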

Appendix B: The distance correlation coefficients between selected meta-features

Here, we present the Dcc values for DatasetRatio (DR), InverseDatasetRatio (IDR), LogDatasetRatio (LDR), and LogInverseDatasetRatio (LIDR) in Table 10; for LogNumberOfFeatures (LOF) and NumberOfFeatures (NOF) in Table 11; for LogNumberOfInstances (LNOI) and NumberOfInstances (NOI) in Table 12; for RatioNominalToNumerical (RNoTNu) and RatioNumericalToNominal (RNuTNo) in Table 13; for NumberOfClasses (NOC) and ClassEntropy (CE) in Table 14; for NumberOfCategoricalFeatures (NOCF) and SymbolsSum (SS) in Table 15; for NumberOfNumericFeatures (NuONuF) and LogNumberOfNumericFeatures (LNuOF) in Table 16; and for PCAKurtosisFirstPC (PCAKFPC) and KurtosisMin (KM) in Table 17.

Table 10 Dcc values for {DR, IDR, LDR, LIDR}
Table 11 Dcc values for {LOF, NOF}
Table 12 Dcc values for {LNOI, NOI}
Table 13 Dcc values for {RNoTNu, RNuTNo}
Table 14 Dcc values for {NOC, CE}
Table 15 Dcc values for {NOCF, SS}
Table 16 Dcc values for {NuONuF, LNuOF}
Table 17 Dcc values for {PCAKFPC, KM}

The physical meanings of DR, LDR, IDR, and LIDR all involve comparisons between the numbers of rows and columns of a dataset. Because the correlation coefficients between LDR and the other three meta-features are very large, LDR is retained and the other three meta-features are discarded. A minimal sketch of the coefficient itself follows.
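The distance correlation coefficient (Dcc) used here can be computed from double-centered pairwise distance matrices, following Székely and Rizzo's definition. The implementation and the 0.95 threshold below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def dcc(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])   # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row means and column means, add the grand mean
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)      # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0

# E.g., keep LDR and drop DR, IDR, and LIDR when their pairwise Dcc values
# exceed a chosen redundancy threshold such as 0.95 (an assumed value).
```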

Appendix C: Detailed attributes of the experimental datasets

Table 18 The detailed attributes of the small datasets
Table 19 The detailed attributes of the medium and large datasets


Cite this article

Liu, P., Pan, F., Zhou, X. et al. CF-DAML: Distributed automated machine learning based on collaborative filtering. Appl Intell 52, 17145–17169 (2022). https://doi.org/10.1007/s10489-021-03049-z
