1 Introduction

New and effective ideas are often inspired by natural processing. Human beings often observe the nature to create new technologies. The created technologies may be different from the inspired natural phenomena. Natural computing is the field of science that develops new software, hardware or wetware tools inspired by nature for problem-solving purposes. Natural computing can also be referred to as bio-inspired computing or biologically motivated computing or computing with biological metaphors [15].

The nature-inspired computational algorithms and systems are the most popular and oldest natural computing approaches. There are two main objectives in this field: the modeling of natural phenomena and its simulation in computers is the first objective. The main goal in this direction is to develop models which can be implemented into computer systems which are able to quantitatively and qualitatively reproduce the functions of natural mechanism. Solving complex problems through algorithms and developing computing systems inspired by natural phenomena is the second objective. The stimulus for this innovation is finding techniques to solve the problems that could not be addressed properly in other traditional methods. Among many approaches within computing inspired by nature, the most well-known ones are the evolutionary algorithms, artificial neural networks, swarm intelligence and artificial immune systems.

This paper introduces a novel optimization method and its application to cluster analysis that is discovered through simulation of a simplified action of the heart organ in human beings body.

Clustering is the act of dividing a set of objects into a finite number of groups in such a way similarity of objects inside a group is maximal, while similarity of objects in different groups is minimal [6, 7]. There are many clustering algorithms available, but no uniformly adequate for all applications, data types and conditions [818]. Clustering algorithms have been used in a large variety of fields and applications ranging from biology, marketing and customer analysis, bankruptcy prediction, e-learning systems, intrusion detection, document clustering, outlier detection, medicine, image processing, etc., [1924].

The remainder of this study is organized as follows. Cluster analysis is discussed in Sect. 2. A simple explanation of the heart and circulation system is given in Sect. 3. Section 4 presents the heart optimization algorithm and its application to cluster analysis. Section 5 discusses the simulation results using four benchmark datasets. Finally, concluding remarks are made in Sect. 6.

2 Data clustering

Cluster analysis is the procedure of dividing a given set of \(n\) objects \({O} = \{ {O}_{1}, {O}_{2},\dots , {O}_{n} \}\), each of them explained by \(d\) attributes into a finite number of \(k\) partitions, also called clusters \({C} = \{ {C}_{1}, {C}_{2},\dots , {C}_{k} \}\), so that the objects in the same clusters are similar and objects in different clusters are dissimilar. To find cluster centers, the problem can be defined as an optimization problem. A famous objective function for measuring goodness of a clustering solution is the total mean-square quantization error [6]:

$$\begin{aligned} {f }({O},{C})= \sum \limits _{l=1}^k {\sum \limits _{O_i \in C_l } {d(O_i ,Z_l )} ^2} \end{aligned}$$
(1)

where \(d(O_{i}, Z_{l})\) specifies the dissimilarity between object \(O_{i}\) and cluster center \(Z_{l}\).

There are different functions for calculating the distance between objects in clustering problem. One of the popular and widely used distance functions is the Euclidean distance, where the distance between any two objects \(O_{i}\) and \(O_{j}\) is defined as follows [25]:

$$\begin{aligned} d(O_i ,O_j )=\sqrt{\sum \limits _{p=1}^d {(o{_i}{^p} -o{_j}{^p} )}{^2}} \end{aligned}$$
(2)

In this paper, the dissimilarity between the data objects is calculated by the Euclidean distance.

3 The heart and circulation system

The primary function of the human circulation system is to pump blood and traverse oxygen from the lungs to all the parts of the body. The circulation system constitutes three components namely heart, blood vessels and blood with specific functions. The heart acts as the pumping station that pumps the blood to the whole body. The blood vessels comprise of arteries, arterioles, capillaries, venules and veins they act as the channels for the transportation of blood. The main function of the blood is to transmit oxygen from the lungs to remaining organs of the body and carry carbon dioxide to be removed from the body. The circulation system also has two loops namely pulmonary loop and systemic loop. The pulmonary loops are the blood vessels that travel to and fro between the heart and lungs whereas the systemic loops are the blood vessels that connect the heart and body to and fro. The pulmonary loop gets rid of the carbon dioxide and oxidizes the blood. After the blood is oxygenated it is taken back to the heart to be pumped around the systemic loop whereby the oxygen is supplied to the whole body [2628].

The anatomy of the heart reveals four chambers namely right atria, left atria, right ventricle and left ventricle (Fig. 1). The atria act as inlet to receive blood and the ventricles act as outlet to eject blood from the heart. The blood gets ejected from the right ventricle to the lungs through the pulmonary artery. After oxidation, the blood moves into the left atria of the heart by the pulmonary vein. From the left atria the blood moves into the left ventricle and is ejected into systemic loop. The systemic loop begins with the aorta which carries it throughout the body with the help of small arteries called as artilleries. Oxygenated blood is carried by these artilleries to the capillaries where oxygen is deposited and carbon dioxide is collected. The capillaries in turn will move the blood to the major veins through venules which are smaller veins. The veins that lead back to heart are called as vena cava which brings back the deoxygenated blood to the heart through the right atria. The cycle is repeated when the blood moves from the right atria to the right ventricle.

Fig. 1
figure 1

The anatomy of the heart and associated vessels

In summary, the atria receive blood returning to the heart from the body and ventricles eject blood from the heart to the body. Arteries carry blood from the heart and veins bring blood back to the heart. Pulmonary loop connects heart and lungs whereas the systemic loop connects to the body.

4 Heart optimization algorithm

Most of the natural computing approaches are based on related simplified mechanisms or processes found in natural phenomena. There are various reasons for this simplification. The first and foremost reason is to allow computations with large number of tractable entities. At the same time it is beneficial to emphasize the minimum features needed to allow some aspects of a system to be reproduced and monitor some evolving properties.

Based on the above description, our proposed approach is a simplified and abstract version of the heart and blood circulatory system. This algorithm starts with some initial candidate solutions over the search space. The best candidate solution is selected to be the heart and all the other candidates form the blood molecules.

After creating the heart and blood molecules, the heart starts expansion and attracts the blood molecules and all the blood molecules start moving towards the heart. The movement of blood molecules toward the heart is formulated as follows:

$$\begin{aligned} x_{i}^{\mathrm{new}} \!=\! x_{i}^{\mathrm{current}} \,+\,\, \mathrm{rand}\times \left( x_{\mathrm{heart}}- x_{i}^{\mathrm{current}}\right) \quad i\!=\!1,2,\dots ,N \end{aligned}$$
(3)

where \(x_{i}^{\mathrm{current}}\) and \(x_{i}^{\mathrm{new}}\) are the current and new positions of the ith molecule in the search space, respectively, while \(x_{\mathrm{heart}}\) is the position of the current heart in the search space. rand is a random number between 0 and 1. \(N\) is the total number of blood molecules.

The fitness value of each molecule is again calculated by the objective function. If there is a molecule which has a better fitness value than the heart, they exchange their positions and that molecule becomes the new heart and the old heart changes to a normal molecule.

During the moving of the molecules toward the heart, there is possibility of entering molecules into the heart. In such a case, the contraction of the heart is happened and heart pumps these molecules to outside (search space). This action causes that the candidate solutions search the problem space efficiently. Without this action, some candidate solutions will reach the heart position or very near to it and will stay in the same place in the search space without any movement and activity. But the pump action enforces blood molecules (candidate solutions) to go away from the heart and start new search and consequently increases the chance of finding global optima. The pump action of the heart is formulated as follows:

$$\begin{aligned} x_{i}^{\mathrm{new}} = x_{\mathrm{heart}} \,+\,\mathrm{rand}\,\times \,p\,\times \,x_{\mathrm{heart}} \end{aligned}$$
(4)

where \(x_{i}^{\mathrm{new}}\) is the new position of the trapped blood molecule (that blood molecule which has entered into the heart or has reached the same position as the heart in the search space). \(x_{\mathrm{heart}}\) is the position of the current heart in the search space. rand is a random number in the interval [0,1]. \(p\) is a decreasing function of time, which controls the power of the heart. It is initialized to 1 at the beginning of the algorithm and is linearly decreased to 0 as the iterations elapse. By doing this, the heart spread the blood molecules (candidate solutions) over a much wider area in the search space at the early iterations, which assures the global convergence or the exploration of the algorithm and to prevent to stick on a local solution. At the later iterations of the algorithm, when the value of \(p\) is decreased, the heart spread the blood molecules (candidate solutions) in a small area of search space near the heart and increases the exploitation capability of the algorithm.

These expansion and contraction actions of the heart are carried out repeatedly and cause blood molecules (candidate solutions) to move in the problem space toward the heart and vice versa until the termination criterion has been met. Here, a maximum number of iterations are used as the termination criterion. The movement of candidate solutions toward the heart will hopefully cause all the candidate solutions to converge to a (near) optimal solution.

To apply heart optimization algorithm for data clustering a 1-dimension array is used to represent a candidate solution to the clustering problem. Each candidate solution is regarded as a set of \(k\) initial cluster centers where each cell in array is a cluster center dimension. Figure 2 shows an example of a candidate solution for a problem with three clusters, where each data object has four attributes.

Fig. 2
figure 2

Example of a candidate solution

In our proposed algorithm the following steps (Fig. 3) should be executed and repeated based on the description above.

Fig. 3
figure 3

The main steps of the proposed Heart algorithm

5 Experimental results

We have compared the performance of the proposed heart algorithm on data clustering with four other approaches including k-means [6], particle swarm optimization (PSO) [12], gravitational search algorithm (GSA) [14] and big bang-big crunch algorithm (BB-BC) [13]. Six popular benchmark datasets have been used to assess the performance of clustering algorithms. The datasets are Iris, Wine, Contraceptive Method Choice (CMC), Wisconsin breast cancer, Glass and vowel datasets taken from Machine Learning Laboratory which have the following characteristics [29]:

Iris (\(n=150, d=4, k=3\)): The dataset consists of three categories that consign a type of iris plant and each category contains 50 objects. One of the categories is linearly separable form the other two which are not linearly separable. The iris dataset consists of 150 instances with four numeric features without any missing attribute value. The attributes in the iris dataset are sepal length in cm, sepal width in cm, petal length in cm and petal width in cm.

Wine (\(n=178, d=13, k=3\)): This dataset contains chemical analysis results of wines grown by tree different agriculturists from the same region in Italy. The analysis determines the quantities of 13 constituents form each type of the wine. The wine dataset consists of 178 instances and 13 continuous-numeric attributes without any missing attribute values. Each category has 59, 71 and 48 objects, respectively.

CMC (\(n = 1473, d = 9, k = 3\)) also referred as Contraceptive Method Choice: This is a subset of the Indonesian national contraceptive prevalence survey of the year 1987. The respondents of the survey were married pregnant women or those who did not know their pregnancy status at the time of interview. The survey was conducted to predict the choice of contraceptive methods among women based on their demographic and socioeconomic attributes. 334 objects use long-term methods 510 objects used short-term method whereas 629 objects used none of the contraceptive methods.

Wisconsin breast cancer (\(n = 683, d = 9, k = 2\)): This dataset consists of 683 objects with 9 distinct features such as thickness of clump, uniformity of cell size, uniform cell shape, minor adhesion, solitary epithelial cell size, bare nuclei, weak chromatin, normal nucleoli, and mitoses. The dataset constitutes two categories such as malignant with 444 objects and benign with 239 objects.

Glass (\(n=214, d=9, k=6\)): This dataset contains samples from six different types of glass: tableware, building windows float processed, vehicle windows float processed, building windows non-float processed, containers, and headlamps. There are 214 instances with 9 numeric attributes in glass dataset.

Vowel (\(n=871, d=3, k=6\)): This dataset consists of 871 Indian Telugu vowel sounds. It has six overlapping classes and three attributes for each instance corresponding to the first, second, and third vowel frequencies.

Table 1 shows the quality of the solutions found by clustering algorithms on above mentioned datasets. The sum of the intra-cluster distance is used to measure the quality of the resulting clusters (Eq. 1). Clearly, the small value for the intra-cluster distance show the high quality clusters and vice versa. The results are given in terms of the best, average and worst values of the intra-cluster distance after 50 independent runs for each of the six datasets. Moreover, the standard deviation of solutions (STD) for each algorithm is given to evaluate the reliability and stability of algorithms. A low standard deviation indicates that the respective algorithm is more reliable and stable to find optimal solution.

Table 1 Experimental results of clustering algorithms

From Table 1, we can see that the Heart algorithm has achieved the best performance in terms of the average, best, and worst inter-cluster distances on all the test datasets. Only on the Iris dataset, the result of two algorithms including GSA and BB-BC is comparable with the proposed Heart algorithm. For the Wine dataset, the best, average and worst intra-cluster distances obtained by Heart are 16293.48995, 16294.74339 and 16296.31318, greatly better than other compared methods. On the Glass dataset, the Heart algorithm provides the optimum value of 210.52375, while the best solution found by other algorithms is for the K-means algorithm which is 215.67753. For the other three data sets, as seen from the results, the Heart algorithm performs far superior to other methods. The most interesting thing about the results is that in four of the test datasets, even the worst solution of the Heart algorithm is better than the best solutions found by the other algorithms.

In general, the simulation results indicate that the proposed Heart algorithm is a robust and precise approach for data clustering. This algorithm provides the optimum value and small standard deviation as opposed to other methods.

Tables 2, 3, 4, 5, 6 and 7 show the best centroids obtained by the Heart algorithm on the test datasets. These centroids are present to support and validate the results that are presented in Table 1. One can easily reach to the best value found by the Heart algorithm in Table 1 by assigning the data objects in each dataset to the corresponding centroids in this section. For example, by considering the Iris dataset, after assigning each of the 150 data objects within this dataset to the nearest centroid among three centroids that calculated and reported in Table 2 for this dataset by the Heart algorithm, the best solution found by the Heart algorithm in the Iris dataset which is reported in Table 1 should be reached (96.65667). Otherwise there is an inconsistency in the results and either the solutions in Table 1 or best centroids in Table 2 or both of them are not true.

Table 2 Best centroids obtained by the Heart algorithm on the Iris dataset
Table 3 Best centroids obtained by the Heart algorithm on the Wine dataset
Table 4 Best centroids obtained by the Heart algorithm on the Cancer dataset
Table 5 Best centroids obtained by the Heart algorithm on the CMC dataset
Table 6 Best centroids obtained by the Heart algorithm on the Glass dataset
Table 7 Best centroids obtained by the Heart algorithm on the Vowel dataset

6 Conclusion

Modeling the behavior of natural phenomena for the purpose of search and problem solving is an active research area. In this paper, a new optimization algorithm is developed to solve clustering problems which is inspired by the action of the heart and circulatory system. To evaluate the performance of the proposed algorithm, it is compared with other popular algorithms using four benchmark datasets. The experimental results indicate the high potential and efficiency of the Heart algorithm. This research leads to various dimensions of research directions which could be combined with local search strategy or fused with other Meta heuristic algorithms. Application of this proposed algorithm to solve other optimization problems can be considered as possible future research.