1 Introduction

Apple’s iPhone was released on June 29, 2007. Since then, the iPhone has evolved and enjoyed immense popularity owing to the wide variety of services it provides to users. As a consequence, the iPhone has inevitably become a prime target of hackers, and many malicious programs target it [1]. Two known root exploits on the iPhone are Libtiff and SMS fuzzing [2]. Attackers can use these exploits to steal personal data from the iPhone. Libtiff, discovered by Ormandy, opens a potential vulnerability that can be exploited when SSH is actively running [36]. Rick Farrow demonstrated how a maliciously crafted TIFF file can be opened and lead to arbitrary code execution [7]. SMS fuzzing is another iPhone exploit that allows a hacker to control the iPhone through SMS messages [5, 8]. The first iPhone worm, known as iKee, was developed by a 21-year-old Australian hacker named Ashley Towns [9]. This worm changed the iPhone’s wallpaper to a photograph of the British 1980s pop singer Rick Astley. Two weeks later, a new piece of malware named iKee.B was spotted by XS4ALL across almost all of Europe [9]. iSAM is an iPhone stealth airborne malware incorporating six different malware mechanisms [10]. It can connect back to its bot server to update its programming logic, or obey commands and unleash a synchronized attack.

The iPhone can zoom in and out and view maps, and many widgets are operated with finger touches on the screen. Moreover, the iPhone can easily access e-mail over the Internet and store personal data. Spam e-mails are sent to users’ mailboxes without their permission. The overabundance of spam e-mails not only consumes network bandwidth but also provides a hotbed for malicious programs [11]. It is therefore an important issue for iPhone users to filter spam e-mails and thereby prevent the leakage of personal data.

Traditionally, machine learning techniques formalize the clustering of a spam message collection through an objective function that maximizes the similarity between messages in clusters, as defined by the \(k\)-nearest neighbor (\(k\)NN) algorithm. A genetic algorithm with a penalty function has also been proposed for solving this clustering problem [12]. Unfortunately, the above approaches do not provide sufficient performance for filtering spam e-mails on the iPhone. In this paper, an artificial bee-based decision tree (ABBDT) is applied to filter spam e-mails for the iPhone. In the proposed approach, a decision tree is used to filter spam e-mails, and an artificial bee colony algorithm is used to improve the testing accuracy of the decision tree.

The remainder of this paper is organized as follows. Because the proposed approach is based on decision trees and the artificial bee colony, Sect. 2 first introduces both techniques. Section 3 then presents the proposed ABBDT approach for filtering spam e-mails. Experimental results are compared with those of existing algorithms in Sect. 4. Conclusions and future work are finally drawn in Sect. 5.

2 An introduction to decision trees and the artificial bee colony

The proposed ABBDT approach is based on the decision tree and the artificial bee colony (ABC). In this section, brief descriptions of both are given.

The artificial bee colony algorithm, proposed by Karaboga in 2005, simulates the foraging behavior of a bee colony using three groups of bees: employed bees (forager bees), onlooker bees (observer bees), and scouts [13]. ABC algorithms have been applied in many applications [14–21]. The ABC algorithm starts with randomly produced initial food sources that correspond to solutions for the employed bees. In the ABC algorithm, each food source has exactly one employed bee. Employed bees investigate their food sources and share the food information with the onlooker bees in the hive. The higher the quality of a food source, the larger the probability that it will be selected by an onlooker bee. The employed bee of a discarded food source becomes a scout bee that searches for a new food source. The decision tree (DT) learning algorithm, proposed by Quinlan, is a tree-like rule induction approach whose resulting rules can be easily understood [22]. DT uses partition information entropy minimization to recursively partition the data set into smaller subdivisions and thereby generates a tree structure. This tree-like structure is composed of a root node (formed from all of the data), a set of internal nodes (splits), and a set of leaf nodes. A decision tree classifies a pattern by starting at the root node of the tree and moving through it until a leaf node is reached [23–26].
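To make this traversal concrete, the following minimal Python sketch (an illustration only, not the implementation used in this paper; the node structure and attribute values are hypothetical) classifies a pattern by walking from the root node to a leaf:

```python
# Minimal decision-tree traversal sketch (illustrative only).
# Each internal node tests one attribute against a threshold;
# leaves carry a class label such as "spam" or "normal".

class Node:
    def __init__(self, attribute=None, threshold=None,
                 left=None, right=None, label=None):
        self.attribute = attribute  # attribute index tested at this node
        self.threshold = threshold  # split threshold
        self.left = left            # subtree for value <= threshold
        self.right = right          # subtree for value > threshold
        self.label = label          # class label (leaf nodes only)

def classify(node, pattern):
    """Start at the root and move through the tree until a leaf is reached."""
    while node.label is None:
        if pattern[node.attribute] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label

# Toy example: a one-split tree on a single attribute.
root = Node(attribute=0, threshold=0.5,
            left=Node(label="normal"), right=Node(label="spam"))
print(classify(root, [0.8]))  # -> "spam"
```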

3 The proposed ABBDT approach

The operating system of the iPhone was named iOS at the WWDC conference in 2010 [27]. The iOS architecture is divided into the core operating system layer, the core services layer, the media layer, and the cocoa touch layer. Each layer provides programming frameworks for the development of applications that run on top of the underlying hardware. The iOS architecture is shown in Fig. 1. Using existing tools, it is easy to collect the e-mails stored at the path “/var/mobile/library/mail” [28, 29].

Fig. 1 iOS system architecture
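As a minimal sketch of the e-mail collection step mentioned above (assuming shell access to a jailbroken device, with the mail directory copied locally; this is not one of the tools from the cited references), the mailbox files can be gathered as follows:

```python
import os

# Walk the mailbox directory mentioned above and collect message file paths.
# Assumption: the directory tree has been copied to the local machine
# (e.g., over SSH from a jailbroken device).
MAIL_ROOT = "/var/mobile/library/mail"

def collect_mail_files(root=MAIL_ROOT):
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths

if __name__ == "__main__":
    for p in collect_mail_files():
        print(p)
```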

The flow chart of the proposed ABBDT approach is shown in Fig. 2.

Fig. 2 The flow chart of the proposed algorithm

In Fig. 2, the dataset is first pre-processed into training and testing data, and then the initial solutions are randomly generated. There are 12 attributes in the iPhone e-mails dataset, as shown in Table 1. Some of these attributes are also important for spam e-mails on desktop computers [30].

Table 1 The 12 attributes in the dataset

A solution is represented as 12 attributes followed by two variables, MCs and CF, as shown in Fig. 3. The initial population consists of \(\beta \) food sources in the \(D\)-dimensional search space:

$$\begin{aligned} F({\varvec{X}}_i),\quad {\varvec{X}}_i \in R^D,\ i\in \{1,2,3,\ldots ,\beta \} \end{aligned}$$
(1)

where \(\varvec{X}_i=[x_{i1},x_{i2},\ldots ,x_{iD} ]\) is the position of the \(i\)th food source and \(F({\varvec{X}}_i)\) is the objective function, which represents the quality of the \(i\)th food source. To update a feasible food-source (solution) position \({\varvec{V}}_i=[v_{i1},v_{i2},\ldots ,v_{iD}]\) from the old one \({\varvec{X}}_i\), the ABC algorithm uses Eq. (2) as follows:

$$\begin{aligned} v_{ij}=x_{ij} +\varphi _{ij} (x_{ij}-x_{kj}) \end{aligned}$$
(2)

In Eq. (2), \(v_{ij}\) is a new feasible solution, \(k\in \{1,2,3,\ldots ,\beta \}\) and \(j\in \{1,2,3,\ldots ,D\}\) are randomly chosen indexes, \(k\) has to be different from \(i\), and \(\varphi _{ij}\) is a random number in the range \([-1, 1]\). After all employed bees complete their searches, they share the information on the nectar amounts and food-source positions with the onlooker bees on the dance area. An onlooker bee evaluates the nectar information taken from all employed bees. The probability that an onlooker bee chooses food source \(i\) is defined by Eq. (3).

Fig. 3 The representation of a solution for ABBDT

$$\begin{aligned} P_i=F({\varvec{X}}_i)\Big /\sum \limits _{k=1}^\beta F({\varvec{X}}_k) \end{aligned}$$
(3)
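As an illustrative sketch of Eqs. (2) and (3) (not the authors’ code; it assumes the population is a list of NumPy vectors and that the objective function `f` returns non-negative values, such as accuracies), the neighbor update and the fitness-proportional onlooker selection can be written as:

```python
import random
import numpy as np

def neighbor_update(population, i, f):
    """Eq. (2): perturb one dimension of X_i toward/away from a randomly
    chosen neighbor X_k (k != i), then keep the better of X_i and V_i."""
    beta, dim = len(population), len(population[i])
    k = random.choice([idx for idx in range(beta) if idx != i])
    j = random.randrange(dim)
    phi = random.uniform(-1.0, 1.0)
    v = population[i].copy()
    v[j] = population[i][j] + phi * (population[i][j] - population[k][j])
    return v if f(v) > f(population[i]) else population[i]

def onlooker_probabilities(population, f):
    """Eq. (3): probability that an onlooker bee chooses food source i."""
    fitness = np.array([f(x) for x in population])
    return fitness / fitness.sum()
```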

For a food source, the intake performance is defined as \(F/T\), where \(F\) is the amount of nectar and \(T\) is the time spent at the food source [20, 31]. If a food source cannot be further improved within a predetermined number of iterations, it is assumed to be abandoned, and the corresponding employed bee becomes a scout bee. The new random position chosen by the scout bee is described as follows:

$$\begin{aligned} x_{ij}=x_j^\mathrm{min}+\emptyset _{ij}*(x_j^\mathrm{max}-x_j^\mathrm{min}) \end{aligned}$$
(4)

where \(x_j^\mathrm{min}\) is the lower bound and \(x_j^\mathrm{max}\) is the upper bound of the food-source position in dimension \(j\), and \(\emptyset _{ij}\) is a random number in the range \([0, 1]\). Thereafter, Eqs. (2)–(4) are used to determine the best values of the 12 attributes, MCs, and CF for the DT. The values of the 12 attributes range from 0 to 1: an attribute is selected if its value is less than or equal to 0.5, and it is not selected if its value is greater than 0.5. The values of MCs and CF vary between 1 and 100. The proposed ABBDT approach can thereby select the best subset of attributes to maximize the testing accuracy; a sketch of this decoding is given after the pseudocode at the end of this section. When applied to the set of training patterns, \(\hbox {Info}(S)\) measures the average amount of information needed to identify the class of a pattern in \(S\):

$$\begin{aligned} \hbox {Info}(S)= -\sum \limits _{j=1}^k\left\{ \left[ \frac{\hbox {freq}(C_{j},S)}{|S|}\right] \log _2 \left[ \frac{\hbox {freq}(C_{j},S)}{|S|}\right] \right\} \end{aligned}$$
(5)

where \(|S|\) is the number of cases in the training set, \(C_{j}\) is a class for \(j=1,2,\ldots ,k\), \(k\) is the number of classes, and \(\hbox {freq}(C_{j},S)\) is the number of cases in \(S\) that belong to class \(C_{j}\). The expected information value \(\hbox {Info}_x(S)\) after partitioning \(S\) according to attribute \(X\) can be stated as:

$$\begin{aligned} \hbox {Info}_x(S)=\sum \limits _{j=1}^n\left[ \left( \frac{|{S_j}|}{|S|}\right) \hbox {Info}(S_j)\right] \end{aligned}$$
(6)

where \(n\) is the number of outcomes for attribute \(X\), \(S_{j}\) is the subset of \(S\) corresponding to the \(j\)th outcome, and \(|{S_j}|\) is the number of cases in the subset \(S_{j}\). The information gain for attribute \(X\) is given by

$$\begin{aligned} \hbox {Gain}(X)=\hbox {Info}(S)-\hbox {Info}_x(S) \end{aligned}$$
(7)

Then, the potential information \(\hbox {SplitInfo}(X)\) generated by dividing \(S\) into \(n\) subsets is defined as

$$\begin{aligned} \hbox {SplitInfo}(X)=-\sum \limits _{j=1}^n\left\{ \left[ \left( \frac{|S_j|}{|S|}\right) \log _2\left( \frac{|S_j|}{|S|}\right) \right] \right\} \end{aligned}$$
(8)

Finally, the gain ratio \(\hbox {GainRatio}(X)\) is calculated as

$$\begin{aligned} \hbox {GainRatio}(X)=\hbox {Gain}(X)/\hbox {SplitInfo}(X) \end{aligned}$$
(9)

where \(\hbox {GainRatio}(X)\) represents the quantity of information provided by \(X\) in the training set; the attribute with the highest \(\hbox {GainRatio}(X)\) is taken as the root of the decision tree. The proposed ABBDT approach is repeated until the stop criterion is met. Finally, the best testing accuracy and the filtered e-mails are reported.
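To make Eqs. (5)–(9) concrete, the following short Python sketch (an illustration under simple assumptions, not the authors’ implementation) computes the gain ratio of a candidate split from lists of class labels:

```python
from collections import Counter
from math import log2

def info(labels):
    """Eq. (5): average information (entropy) of a set of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(labels, subsets):
    """Eqs. (6)-(9): gain ratio of splitting `labels` into `subsets`."""
    n = len(labels)
    info_x = sum(len(s) / n * info(s) for s in subsets)            # Eq. (6)
    gain = info(labels) - info_x                                   # Eq. (7)
    split_info = -sum((len(s) / n) * log2(len(s) / n)
                      for s in subsets if s)                       # Eq. (8)
    return gain / split_info if split_info > 0 else 0.0            # Eq. (9)

# Toy example: 4 spam / 4 normal labels, split into two pure subsets.
labels = ["spam"] * 4 + ["normal"] * 4
print(gain_ratio(labels, [["spam"] * 4, ["normal"] * 4]))  # -> 1.0
```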

figure a The pseudocode of the proposed ABBDT approach
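Since the pseudocode above appears only as a figure, a minimal Python sketch of the overall ABBDT search loop is given below. It is an illustration under stated assumptions, not the authors’ implementation: the population size, abandonment limit, and the helper `evaluate` are hypothetical, where `evaluate(x)` stands for any routine that decodes a food source via `decode` and returns the DT testing accuracy for the decoded attribute subset and pruning parameters.

```python
import random
import numpy as np

D = 14           # 12 attribute flags + MCs + CF
BETA = 20        # number of food sources (assumed population size)
LIMIT = 50       # abandonment limit before a scout is sent out (assumed)
CYCLES = 1000    # maximum number of cycles, as in Sect. 4

def decode(x):
    """Decode a food source into (selected attribute indexes, MCs, CF)."""
    selected = [i for i in range(12) if x[i] <= 0.5]  # <= 0.5 means selected
    mcs = 1.0 + x[12] * 99.0                          # map [0, 1] to [1, 100]
    cf = 1.0 + x[13] * 99.0
    return selected, mcs, cf

def try_neighbor(pop, fit, trials, i, evaluate):
    """Eq. (2) with greedy replacement; failures count toward abandonment."""
    k = random.choice([m for m in range(len(pop)) if m != i])
    j = random.randrange(D)
    v = pop[i].copy()
    v[j] = float(np.clip(v[j] + random.uniform(-1, 1) * (pop[i][j] - pop[k][j]),
                         0.0, 1.0))
    fv = evaluate(v)
    if fv > fit[i]:
        pop[i], fit[i], trials[i] = v, fv, 0
    else:
        trials[i] += 1

def abbdt(evaluate):
    pop = [np.random.rand(D) for _ in range(BETA)]
    fit = [evaluate(x) for x in pop]
    trials = [0] * BETA
    for _ in range(CYCLES):
        for i in range(BETA):                    # employed-bee phase
            try_neighbor(pop, fit, trials, i, evaluate)
        probs = np.array(fit) / np.sum(fit)      # Eq. (3)
        for _ in range(BETA):                    # onlooker-bee phase
            i = int(np.random.choice(BETA, p=probs))
            try_neighbor(pop, fit, trials, i, evaluate)
        worst = int(np.argmax(trials))           # scout phase, Eq. (4)
        if trials[worst] > LIMIT:                # at most one scout per cycle
            pop[worst] = np.random.rand(D)       # bounds are [0, 1] here
            fit[worst] = evaluate(pop[worst])
            trials[worst] = 0
    best = int(np.argmax(fit))
    return decode(pop[best]), fit[best]
```

In practice, `evaluate` would train a C4.5-style decision tree on the training data restricted to the selected attributes, using the decoded MCs (minimum cases) and CF (confidence factor) as its pruning parameters, and return the accuracy on the testing data.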

4 Experimental results

In the proposed ABBDT approach, the maximum number of cycles was set to 1,000. The onlooker bees made up 50 % of the colony, the employed bees made up the other 50 %, and at most one scout bee was selected in each cycle. In ABBDT, the number of onlooker bees is taken to be equal to the number of employed bees [31]. In this paper, the simulation results are compared with those of the decision tree (DT), the back-propagation network (BPN), and the support vector machine (SVM). BPN is the most widely used neural network model, and its network behavior is determined on the basis of input–output learning pairs [32, 33]. SVM is a learning system proposed by Vapnik that uses a hypothesis space of linear functions in a high-dimensional feature space [34]. The \(k\)-nearest neighbor (\(k\)NN) algorithm is a method for classifying objects based on the closest training examples in an \(n\)-dimensional pattern space: given an unknown tuple, the classifier searches the pattern space for the \(k\) training tuples that are closest to it; these \(k\) training tuples are the \(k\) nearest neighbors of the unknown tuple [35]. Note that the parameters of the compared approaches are set to the same values as in the proposed ABBDT approach. To filter e-mails for the iPhone, a dataset of 504 e-mails in total is used, of which 304 are normal and the remaining 200 are spam. In this paper, 173 e-mails (normal and spam) are randomly selected as testing data and the others are used as training data. On this testing set, 95 spam e-mails are correctly filtered as spam (true negatives) and 69 normal e-mails are correctly filtered as normal (true positives), so the testing accuracy of the proposed ABBDT approach is \((95+69)/173\approx 94.8\,\%\). The result is shown in Table 2. From Table 2, the proposed ABBDT approach has the best performance among the compared algorithms.

Table 2 The testing accuracy for iPhone e-mails dataset

Furthermore, the spambase dataset obtained from the UCI repository is used to evaluate the performance of the proposed ABBDT approach [36]. The spambase dataset contains 4,601 instances with 57 attributes. The attributes are defined as follows: (1) 48 continuous real \([0,100]\) attributes of type word_freq_WORD, where a “word” is any string of alphanumeric characters bounded by non-alphanumeric characters or the end of the string; (2) 6 continuous real \([0,100]\) attributes of type char_freq_CHAR; (3) 1 continuous real \([1,\ldots ]\) attribute of type capital_run_length_average; (4) 1 continuous integer \([1,\ldots ]\) attribute of type capital_run_length_longest; (5) 1 continuous integer \([1,\ldots ]\) attribute of type capital_run_length_total; in addition, there is 1 nominal \(\{0,1\}\) class attribute of type spam [24]. In the spambase dataset, 1,000 e-mails are randomly selected as testing data and the others are used as training data. The testing accuracy of the proposed ABBDT approach on the spambase dataset is 93.7 %, as shown in Table 3.

Table 3 The testing accuracy for spambase dataset

Clearly, the proposed approach outperforms the other existing algorithms: the proposed ABBDT approach outperforms DT, SVM, \(k\)NN, and BPN individually.

5 Conclusions and future work

In this paper, an artificial bee-based decision tree (ABBDT) approach is applied to filter spam e-mails for the iPhone. The iPhone dataset is described by 12 attributes and contains 504 e-mails in total, while the spambase dataset contains 4,601 instances with 57 attributes. A comparison of the obtained results with those of other approaches demonstrates that the proposed ABBDT approach improves the testing accuracy on both datasets. The proposed ABBDT approach effectively finds better parameter values and thereby improves the overall testing accuracy. From the simulation results, the testing accuracy is 94.8 % for the iPhone dataset, as shown in Table 2, and 93.7 % for the spambase dataset, as shown in Table 3. This indeed shows that the proposed ABBDT approach outperforms the other approaches. In future work, more attributes could be added, and the proposed approach could be applied to build an app for the iPhone.