Abstract
Efficient classification methods can improve data quality and relevance to better optimize Internet applications such as fast search engines and accurate identification. However, in the big data era, the difficulty and volume of data processing increase drastically. To decrease the huge computational cost, heuristic algorithms have been used. In this paper, an Adapting Chemotaxis Bacterial Foraging Optimization (ACBFO) algorithm is proposed based on the basic Bacterial Foraging Optimization (BFO) algorithm. The aim of this work is to design a modified algorithm that is more suitable for data classification. The proposed algorithm has two updating strategies and one structural change. First, the adapting chemotaxis step updating strategy increases the flexibility of searching. Second, the feature subsets updating strategy better combines the proposed heuristic algorithm with the KNN classifier. Third, the nested structure of BFO has been simplified to reduce computational complexity. ACBFO is compared with BFO, BFOLIW and BPSO on 12 widely used benchmark datasets. The results show that ACBFO has a good ability to solve classification problems and achieves higher accuracy than the compared algorithms.
1 Introduction
Data classification is an essential process in data work, and sorting large-scale data brings severe challenges. For example, a good search engine needs to classify large-scale web page data; Yahoo!'s webpage taxonomy, for instance, has around 300 thousand categories [1]. Besides, many enterprises attach importance to customer relationship management systems to maintain good relations with customers, and their core function of accurate identification needs a good classification method [2]. In summary, massive amounts of data need to be processed, and efficiency becomes an important factor in the Internet environment.
Feature selection is a widely used tool for data classification with substantial efficiency and effectiveness [2]. It is applied in many areas, including text classification, chest pathology identification, facial emotion recognition [3, 4], and so on. With the exponentially increasing time of data processing in classification problems, improving computational speed while ensuring accuracy is a hot issue. Feature selection methods select representative features from massive data, and the resulting optimized subsets help improve the efficiency of classification. Cutting out irrelevant, redundant or trivial features is its core [5,6,7]. Feature selection methods can be classified along two main lines. One divides them into supervised, semi-supervised and unsupervised types according to the availability of feature labels [8]. The other divides them into filter, wrapper and embedded methods according to the structure of the algorithms [9, 10]. These methods have their own characteristics but are related to each other. Filter methods sort and screen the data before classification and delete the lower-ranking features [13, 15]. However, they often loop over all the data, which costs a lot of time on high-dimensional data. In contrast, wrapper methods randomly select features by combining classifiers with heuristic algorithms such as PSO, ACO and GA [11, 12, 16].
The classifiers often include the naive Bayes classifier, SVM, the random forest classifier and so on. Such combinations reduce the amount of calculation, but the accuracy decreases. To deal with this, the embedded method was created: it combines the advantages of filter and wrapper methods, aiming to improve the calculating effectiveness of the algorithms [17]. However, selecting an appropriate combination depends heavily on the researcher's practical experience [18].
As a popular heuristic algorithm, BFO is selected for modification. It was proposed in 2002, inspired by the survival and reproduction processes of bacteria [19]. The algorithm is good at randomly searching for optimal solutions, because its 'reproduction' and 'elimination-dispersal' strategies help individuals escape from local optima. Although it can be applied to preprocessing multidimensional data, accuracy will not increase if only the original algorithm is used. Adding an adaptive strategy changes the swimming behavior of each bacterium and diversifies its search area [25]. Besides, a supervised classification method, KNN, is adopted as the classifier.
In summary, this paper proposes ACBFO (Adapting Chemotaxis Bacterial Foraging Optimization), a wrapper-based supervised classification method that aims to realize highly efficient data classification. It is empirically faster and more accurate than the other methods most of the time, especially when dealing with large-scale data. In practice, this achievement is significant, as it can save considerable computation cost in data work. The contribution of this research is a novel method that improves the classification efficiency of the heuristic algorithm (BFO). The main goals and the organization of the paper are as follows.
1.1 Goals
Three classical heuristic algorithms will be compared: Bacterial Foraging Optimization (BFO) [19], Bacterial Foraging Optimization with linear decreasing chemotaxis (BFOLIW) [20], and Binary Particle Swarm Optimization (BPSO) [21]. The first two are bacterial foraging based algorithms that adopt the same optimization framework but differ in their chemotaxis strategies, while BPSO employs a binary mechanism based on PSO. The main aims of this research are listed below:
- A modified bacterial foraging based algorithm with adaptive chemotaxis is proposed to increase the accuracy of classification.

- An elite feature combination strategy is designed to adaptively reduce the dimension of the feature subset and thereby increase the classification efficiency.

- The reproduction and elimination mechanisms are redesigned to reduce the computation cost.
1.2 Organization
The rest of this paper is organized as follows: Sect. 2 introduces the basics of the bacterial foraging optimization algorithm. Section 3 elaborates the details of the proposed algorithm. The experimental design and results are discussed in Sect. 4. Finally, the conclusion and future work are presented in Sect. 5.
2 Bacterial Foraging Optimization Algorithm
Combining a heuristic algorithm with a certain classifier is popular in data classification nowadays, and it is implemented by means of feature selection methods. Feature selection can be realized by many approaches. Traditional approaches are based on statistics: they sort the features one by one by traversing the entire dataset, generating feature subsets and evaluating them with an evaluation function [2]. This is appropriate for small amounts of data.
A large amount of time is spent when dealing with high dimensionality. Heuristic algorithms have exceptional performance in optimization and integrate well with the evaluation function of feature selection. As one of the common heuristics, bacterial foraging optimization is a distributed optimization algorithm. It emulates the social foraging behavior of E. coli bacteria through nested chemotaxis, reproduction and elimination-dispersal processes [19].
Chemotaxis simulates the process of bacterial foraging. In this stage, bacteria swim toward places with a high concentration of nutrients. One chemotaxis operation consists of two kinds of steps [19]: a tumble followed by another tumble, or a run following a tumble. Bacteria determine the nutrient concentration at a site through special pheromones. Once a unit finds a good place, it releases an attraction pheromone to inform the other units [22]. On the contrary, if the nutrient concentration is low or noxious substances are present, it releases a repulsive pheromone warning the other units not to approach. In the BFO algorithm, this mechanism helps evaluate the fitness of the evaluation function more precisely. The position update of bacterial foraging optimization during the chemotaxis stage is

$$ \theta^{i}(j+1,k,l) = \theta^{i}(j,k,l) + C(i)\,\frac{\Delta(i)}{\sqrt{\Delta^{T}(i)\,\Delta(i)}} $$

where \( \theta^{i}(j,k,l) \) is the position of the \( i \)-th bacterium at the \( j \)-th chemotaxis, \( k \)-th reproduction and \( l \)-th elimination-dispersal step, \( C(i) \) is the chemotaxis step size and \( \Delta(i) \) is a random direction vector with components in [−1, 1]. The cell-to-cell effect of bacterial foraging optimization is

$$ J_{cc}\big(\theta,P(j,k,l)\big) = \sum_{i=1}^{S}\Big[-d_{attract}\exp\Big(-w_{attract}\sum_{m=1}^{p}\big(\theta_{m}-\theta_{m}^{i}\big)^{2}\Big)\Big] + \sum_{i=1}^{S}\Big[h_{repellent}\exp\Big(-w_{repellent}\sum_{m=1}^{p}\big(\theta_{m}-\theta_{m}^{i}\big)^{2}\Big)\Big] $$

where \( J(i,j,k,l) \) denotes the nutrient value (fitness) of the \( i \)-th bacterium, \( P(j,k,l) \) collects the positions of all \( S \) bacteria, and the coefficients \( d_{attract} \), \( w_{attract} \), \( h_{repellent} \) and \( w_{repellent} \) control the spreading rate of the attractant and repellent signals [19].
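To make the chemotaxis stage concrete, the following is a minimal Python sketch of one tumble-and-run move; the function signature, the fitness callback and the swim limit `Ns` are illustrative assumptions, not the authors' MATLAB implementation.

```python
import numpy as np

def chemotaxis_step(theta_i, C_i, fitness, Ns=4):
    """One chemotaxis move: tumble in a random direction, then keep
    swimming in that direction while the nutrient value improves."""
    delta = np.random.uniform(-1.0, 1.0, size=theta_i.shape)  # random direction
    direction = delta / np.sqrt(delta @ delta)                # unit-normalize
    J_last = fitness(theta_i)
    for _ in range(Ns):                                       # bounded run phase
        candidate = theta_i + C_i * direction
        J_new = fitness(candidate)
        if J_new < J_last:                 # better nutrients: keep swimming
            theta_i, J_last = candidate, J_new
        else:
            break
    return theta_i, J_last
```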
Reproduction updates the bacterial group in two steps. First, the bacteria are ranked by the concentration of nutrients they have accumulated from the environment [23]. Second, the worse half of the population is replaced by copies of the top 50%. The accumulated nutrients of bacterial foraging optimization are

$$ J_{health}^{i} = \sum_{j=1}^{N_{c}+1} J(i,j,k,l) $$

where \( J_{health}^{i} \) is the concentration of remaining nutrients, representing the consumption of nutrients over all \( N_{c} \) chemotaxis steps; the smaller \( J_{health}^{i} \), the higher the bacterium ranks.
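Under these definitions, the reproduction step can be sketched as sorting by accumulated health and duplicating the better half; the array-based representation is an assumption for illustration.

```python
import numpy as np

def reproduce(population, health):
    """Keep the healthier half (smaller accumulated J_health) and
    duplicate it to restore the population size."""
    order = np.argsort(health)                        # best (lowest) first
    survivors = population[order[: len(population) // 2]]
    return np.concatenate([survivors, survivors.copy()], axis=0)
```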
Elimination-dispersal happens randomly according to a user-defined probability [24]. When the condition is met, the position of a bacterium is reset, which enhances the algorithm's ability to escape from local optima. In the next section, the proposed feature selection method based on BFO is introduced.
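Elimination-dispersal can likewise be sketched as a per-bacterium random reset; the probability value `p_ed` and the search bounds are assumed parameters in the spirit of [19].

```python
import numpy as np

def eliminate_disperse(population, lower, upper, p_ed=0.25):
    """With probability p_ed, relocate a bacterium to a random position,
    helping the swarm escape local optima."""
    for i in range(len(population)):
        if np.random.rand() < p_ed:
            population[i] = np.random.uniform(lower, upper, population[i].shape)
    return population
```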
3 Adapting Chemotaxis Bacterial Foraging Optimization
The basic BFO can be used to train the classifier in feature selection. However, it often takes a long time, because the high dimensionality of the data inflates the calculation cost of the nested BFO structure. The size of the training data also has a non-negligible impact. To address these deficiencies, the Adapting Chemotaxis Bacterial Foraging Optimization algorithm (ACBFO) introduces an adaptive learning strategy in the chemotaxis stage. Meanwhile, the K Nearest Neighbor (KNN) algorithm serves as the classifier for its fast speed and ease of implementation. The basic framework of ACBFO is shown in Fig. 1.
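Since KNN serves as the wrapper's classifier, the nutrient value of a feature subset can be sketched as the KNN error rate on that subset. The snippet below assumes scikit-learn and a binary feature mask; it illustrates the idea rather than the authors' MATLAB code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_fitness(mask, X_train, y_train, X_test, y_test, k=5):
    """Nutrient J for a feature subset: the KNN classification error
    (lower is better) on the features selected by the binary mask."""
    cols = np.flatnonzero(mask)           # indices of selected features
    if cols.size == 0:
        return 1.0                        # empty subset: worst possible fitness
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train[:, cols], y_train)
    return 1.0 - clf.score(X_test[:, cols], y_test)
```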
3.1 Adapting Chemotaxis Mechanism
The basic chemotaxis step in BFO is fixed, but the movement of real bacteria is not rigid. A fixed step makes the bacteria prone to being trapped in the same local region, which does not benefit the steady development of the population. Therefore, a simple adaptive step-changing method is adopted [26]: in ACBFO, each bacterium \( i \) is assigned its own initial swimming step, which increases the diversity of the bacterial population.
During the foraging period, every unit learns from the others, which improves its probability of finding a nutrient-rich place. The learning strategy is the classical one used in the particle swarm optimization algorithm: each bacterium is guided both by \( PBest_{i} \), its personal best value, and by \( Best \), the global best value of each iteration. The pseudo code of ACBFO is given below.
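A hedged sketch of this adaptive move is given below: each bacterium tumbles with its own step size `C_i` and is additionally pulled toward its personal best and the global best, in the spirit of the classical PSO learning terms. The random coefficients `r1` and `r2` are assumptions, since the exact update appears only in the paper's pseudo code.

```python
import numpy as np

def adaptive_move(theta_i, C_i, pbest_i, gbest):
    """ACBFO-style move sketch: per-bacterium tumble plus PSO-style
    learning toward the personal best and the global best positions."""
    delta = np.random.uniform(-1.0, 1.0, size=theta_i.shape)
    tumble = C_i * delta / np.sqrt(delta @ delta)   # individual step size
    r1, r2 = np.random.rand(), np.random.rand()     # assumed uniform weights
    return theta_i + tumble + r1 * (pbest_i - theta_i) + r2 * (gbest - theta_i)
```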
3.2 Feature Combination Updating Mechanism
When dealing with high-dimensional data, removing the features that barely improve the classification both trains the classifier better and reduces the calculation time. The position update follows the chemotaxis formula of Sect. 2, using the adaptive step \( C(i) \). After this step, a judging mechanism is designed to update the feature combination for the next calculation of the nutrient \( J(i,j) \). The judging mechanism is shown in Table 1, and \( J(i,j) \) is computed by evaluating the KNN classifier on the currently selected feature subset.
The pseudo code of the feature combination updating is given below (Table 2).
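Because Table 1 is not reproduced here, the sketch below only illustrates the general shape of such a judging mechanism: a candidate subset is kept when it does not worsen the nutrient \( J \), otherwise the previous elite subset is restored. The acceptance rule is an assumption, not the paper's exact Table 1 logic.

```python
def update_feature_mask(mask, elite_mask, J_new, J_elite):
    """Illustrative judge: accept the new subset if its error-rate nutrient
    is no worse than the elite's; otherwise fall back to the elite subset."""
    if J_new <= J_elite:
        return mask.copy(), J_new          # new subset becomes the elite
    return elite_mask.copy(), J_elite      # keep the previous elite subset
```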
3.3 Bacterial Population Position Updating Mechanism
The bacterial population is updated after the feature combination updating. During the swimming period of a bacterium, if the training performance of the classifier is bad (e.g., the training error rate exceeds a threshold value a certain number of times), elimination-dispersal starts and the bacterial population is reset. Otherwise, the bacterial population is reproduced. The pseudo code of the bacterial population updating is given below (Table 3):
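The simplified, non-nested control flow described above might be sketched as follows; the `bad_rounds` counter, the `max_bad` threshold and the bounds are assumed names for the reset criterion.

```python
import numpy as np

def update_population(population, health, bad_rounds, max_bad, lower, upper):
    """Reset the whole swarm after repeated poor training (elimination-
    dispersal); otherwise keep and duplicate the healthier half (reproduction)."""
    if bad_rounds >= max_bad:            # persistent poor classifier performance
        return np.random.uniform(lower, upper, population.shape)
    order = np.argsort(health)           # smaller accumulated J = healthier
    survivors = population[order[: len(population) // 2]]
    return np.concatenate([survivors, survivors.copy()], axis=0)
```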
4 Experimental Design and Result
In this section, the proposed algorithm is compared with three classical intelligent heuristic algorithms. This paper evaluates ACBFO against binary PSO, standard BFO and its variant BFOLIW (with linear decreasing chemotaxis) empirically by comparing their classification accuracy and running time. The parameter settings and test datasets are as follows.
4.1 Parameter Setting
The parameter settings are as follows: the population size S is 50, and the dimensionality of each dataset is evenly divided into 10 parts. For example, the 11_Tumors dataset follows the rule '5:5:50', which means the number of selected features input to the algorithm ranges from 5 to 50 with a growth step of 5. The algorithm is run 30 times in each dimension. In this experiment, there was little difference in accuracy between 5 iterations and 30 iterations, so the number of iterations per run is set to 5, since the amount of computation is sufficient under the chosen population size and number of runs.
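For instance, the '5:5:50' rule corresponds to the sweep below; the configuration dictionary merely restates the settings above in code form.

```python
# Feature-count sweep for the '5:5:50' rule: 5, 10, ..., 50 selected features.
feature_counts = list(range(5, 51, 5))

settings = {
    "population_size": 50,     # S
    "runs_per_dimension": 30,  # independent runs for each feature count
    "iterations_per_run": 5,   # 5 iterations were empirically close to 30
}
```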
4.2 Datasets for Testing
The performance of the algorithms is evaluated by classification accuracy on 12 datasets that are widely used for testing feature selection algorithms. Table 4 shows the details of the datasets; they were obtained from http://www.gems-system.org/ and were previously used to test a discrete bacterial algorithm for feature selection [27]. When training the classifier, 70% of the data are randomly chosen as the training set and the remaining 30% as the testing set.
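The 70/30 split can be reproduced as in the sketch below, assuming scikit-learn; the fixed random seed is an assumption added for reproducibility.

```python
from sklearn.model_selection import train_test_split

def split_dataset(X, y):
    """Randomly hold out 30% of the samples for testing (70% for training)."""
    return train_test_split(X, y, test_size=0.3, random_state=0)
```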
4.3 Experiment Result in Accuracy
The experiments were implemented in MATLAB and aim at analyzing the accuracy and running time of the proposed algorithm. Accuracy is measured by the rule 'Accuracy = 1 − error rate of classification'. The results are shown in Fig. 2, where the abscissa represents the number of evaluated features in each evaluation and the ordinate represents the classification accuracy. The proposed ACBFO performs well most of the time, especially on SRBCT, Lung Cancer Large and Leukemia 2, where its accuracy is higher than 90%. These datasets have 50 to 200 instances, 3 to 11 classes and an average 'instances/feature attributes' ratio of 1.2%. This reflects that ACBFO is good at dealing with multi-attribute data.
As shown in Fig. 3, ACBFO does well on (g), (h) and (l), which have small 'instances/feature attributes' ratios. However, it is unsteady when dealing with data with few attributes: its accuracy falls below that of the compared algorithms several times on the 2-class datasets (i), (j) and (k). In conclusion, ACBFO can increase the accuracy and efficiency of data classification when dealing with high-dimensional datasets, but it is unstable when the dataset has fewer than 3 classes.
4.4 Experiment Result in Efficiency
Table 5 shows the average accuracy and the average computation time of each compared algorithm. As the results show, ACBFO still performs better than the other compared algorithms most of the time. Although it is surpassed several times on certain datasets, its overall performance in raising classification accuracy is good. On one hand, even though BPSO beats ACBFO on the 'German' dataset, the actual accuracy difference is only 0.002. On the other hand, the computational efficiency of ACBFO is good on nine of the twelve datasets.
It should be noted that the statistics below were acquired from a new experiment in which the population size of each algorithm was set to 5. The data in Table 5 prove that the proposed algorithm can obtain good results even when the searching group is small.
5 Conclusion and Future Work
Based on the basic BFO, a modified algorithm is proposed. ACBFO includes an adapting chemotaxis strategy and a feature updating strategy, which improve the performance of the heuristic algorithm in feature selection and classification. After testing on 12 popular benchmark datasets, the results show that it obtains better classification accuracy on datasets with smaller 'instances/feature attributes' ratios. In theory, this research finds a different way to connect BFO with the KNN classifier, acquiring good computational accuracy and efficiency. In practice, this achievement can save considerable computation cost in data work. The contribution of this paper is a novel method that improves the classification efficiency of the heuristic algorithm (BFO). Compared with BFO, BFOLIW and BPSO, the performance of ACBFO should be further enhanced by increasing its stability and computational accuracy, for example by designing a communication mechanism among different bacterial groups to prevent bacterium units from becoming too scattered, which is adverse to convergence.
References
Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., Ma, W.: Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. 7, 36–43 (2005)
Saberi, M., Theobald, M., Hussain, O.K., Chang, E., Hussain, F.K.: Interactive feature selection for efficient customer recognition in contact centers: dealing with common names. Exp. Syst. Appl. 113, 356–376 (2018)
Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. 50, 1–45 (2017)
Uysal, A.K.: An improved global feature selection scheme for text classification. Exp. Syst. Appl. 43, 82–92 (2016)
Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Komen, E., Greenspan, H.: Chest pathology identification using deep feature selection with non-medical training. Comput. Methods Biomech. Biomed. Eng.: Imaging Vis. 6, 259–263 (2018)
Nakariyakul, S., Casasent, D.P.: Improved forward floating selection algorithm for feature subset selection (2008)
Tan, P., Wang, X., Wang, Y.: Dimensionality reduction in evolutionary algorithms-based feature selection for motor imagery brain-computer interface. Swarm Evol. Comput. 52, 100597 (2020)
Li, M., Wang, H., Yang, L., Liang, Y., Shang, Z., Wan, H.: Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction. Exp. Syst. Appl. 150, 113277 (2020)
Ashfaq, R.A.R., Wang, X.Z., Huang, J.Z., Abbas, H., He, Y.L.: Fuzziness based semi-supervised learning approach for intrusion detection system. Inf. Sci. 378, 484–497 (2017)
Zhang, R., Nie, F., Li, X., Wei, X.: Feature selection with multi-view data: a survey. Inf. Fusion 50, 158–167 (2019)
Tang, B., Zhang, L.: Local preserving logistic I-relief for semi-supervised feature selection. Neurocomputing (2020)
Banisakher, M., Mohammed, D., Nguyen, V.: A new optimization approach to resource distribution using semi-supervised learning graphs. Int. J. Simul.—Syst. Sci. Technol. 19 (2018)
Shahzad, W., Rehman, Q., Ahmed, E.: Missing data imputation using genetic algorithm for supervised learning. Int. J. Adv. Comput. Sci. Appl. 8(3), 438–445 (2017)
Dong, W., Zhou, M.: A supervised learning and control method to improve particle swarm optimization algorithms. IEEE Trans. Syst. Man Cybern.: Syst. 47, 1135–1148 (2016)
Tian, M., Bo, Y., Chen, Z., Wu, P., Yue, C.: A new improved firefly clustering algorithm for SMC-PHD filter. Appl. Soft Comput. 85, 105840 (2019)
Moghaddasi, S.S., Faraji, N.: A hybrid algorithm based on particle filter and genetic algorithm for target tracking. Exp. Syst. Appl. 147, 113188 (2020)
Labani, M., Moradi, P., Ahmadizar, F., Jalili, M.: A novel multivariate filter method for feature selection in text classification problems. Eng. Appl. Artif. Intell. 70, 25–37 (2018)
Maldonado, S., López, J.: Dealing with high-dimensional class-imbalanced datasets: embedded feature selection for SVM classification. Appl. Soft Comput. 67, 94–105 (2018)
Zhu, Q.H., Yang, Y.B.: Discriminative embedded unsupervised feature selection. Pattern Recogn. Lett. 112, 219–225 (2018)
Passino, K.M.: Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Syst. Mag. 22(3), 52–67 (2002)
Kora, P., Kalva, S.R.: Hybrid bacterial foraging and particle swarm optimization for detecting bundle branch block. SpringerPlus 4(1), 1–19 (2015). https://doi.org/10.1186/s40064-015-1240-z
Too, J., Abdullah, A.R., Mohd Saad, N.: A new co-evolution binary particle swarm optimization with multiple inertia weight strategy for feature selection. Informatics 6(2), 21 (2019)
Zeema, J.L., Christopher, D.F.X.: Evolving optimized neutrosophic C means clustering using behavioral inspiration of artificial bacterial foraging (ONCMC-ABF) in the prediction of Dyslexia. J. King Saud Univ.-Comput. Inf. Sci. (2019)
Pourpanah, F., Lim, C.P., Wang, X., Tan, C.J., Seera, M., Shi, Y.: A hybrid model of fuzzy min–max and brain storm optimization for feature selection and data classification. Neurocomputing 333, 440–451 (2019)
Pan, Y., Xia, Y., Zhou, T., Fulham, M.: Cell image segmentation using bacterial foraging optimization. Appl. Soft Comput. 58, 770–782 (2017)
Majhi, R., Panda, G., Majhi, B., Sahoo, G.: Efficient prediction of stock market indices using adaptive bacterial foraging optimization (ABFO) and BFO based techniques. Exp. Syst. Appl. 36(6), 10097–10104 (2009)
Wang, H., Jing, X., Niu, B.: A discrete bacterial algorithm for feature selection in classification of microarray gene expression cancer data. Knowl.-Based Syst. 126, 8–19 (2017)
Acknowledgments
This work is partially supported by the Natural Science Foundation of China (Grant no. 71901152), Natural Science Foundation of Guangdong Province (2018A030310575), Natural Science Foundation of Shenzhen University (85303/00000155).