1 Introduction

In many engineering applications one is faced with the problem that several objectives have to be optimized concurrently, resulting in a multi-objective optimization problem (MOP). For the treatment of MOPs, stochastic search algorithms such as multi-objective evolutionary algorithms (MOEAs) have caught the interest of many researchers (see, e.g., [2, 4, 5, 19, 27] and references therein). Reasons for this include that MOEAs are applicable to a wide range of problems and are of a global nature, so that, in principle, they do not depend on the initial candidate set (i.e., the initial population). Further, due to their set-based approach they allow the computation of a finite-size representation of the entire Pareto set in a single run of the algorithm.

In this paper, we extend the analysis started in ([23], Sect. 4) on the behavior of multi-objective stochastic local search (MOSLS), where it has been shown that for unconstrained problems a pressure both toward and along the Pareto front is already inherent in the search. Thus, MOSLS represents a powerful tool within MOEAs, as it is crucial for the guidance of the population’s individuals.

Following this study, and based on the Karush-Kuhn-Tucker (KKT) equations for optimality, we identify in the present paper the subspaces that allow a movement along the Pareto front for points x near the solution set of constrained MOPs. We show that a neighborhood sampling based on these subspaces provides enough information to perform a promising search movement. By doing so, this sampling shares some characteristics with multi-objective continuation methods (e.g., [9, 14, 15, 18]). Such methods are certainly much more efficient for the detection of further KKT points near a given solution; in contrast, our sampling methods do not require explicit Jacobian or Hessian information of the objective functions and are thus better suited for use within evolutionary strategies.

Next, we consider two direct applications of this analysis. The first is the experimental study of a particular population-based algorithm that merely uses the proposed neighborhood search and omits any ‘swarm-like’ behavior. Numerical results on unconstrained and constrained MOPs indicate that this algorithm, the simple neighborhood search (SNS), already yields satisfactory results on many test problems with different characteristics. A preliminary study of this work can be found in [23], which is restricted to unconstrained problems.

Second, the subspace polynomial mutation (SPM) operator is proposed as an alternative variation operator for MOEAs. As related work, several techniques to perform specialized mutation movements have been proposed in the literature. For example, mutation has been guided by “small-moves heuristics” [24], by differences in the objective values (in the manner of a gradient-based search) [13], or by quasi-random sampling used to modify the mutation of CMA-ES—later extended to other heuristics [25]. All these proposals deal with unconstrained single-objective problems. For constrained cases, most of the existing work addresses feasibility control and penalty functions [16] for single-objective problems; for the multi-objective case, constraint handling is usually left to the selection process of the evolutionary algorithm. In contrast, the SPM proposed here takes advantage of the studied neighborhood sampling to perform promising mutations when dealing with constrained MOPs. SPM is presented and tested in order to explore the potential of this proposal and to open the discussion of future research aspects on this subject. We conjecture that these new insights will help in the design of powerful sampling techniques and efficient variation operators for multi-objective memetic algorithms.

The remainder of this work is organized as follows: in Sect. 2, we briefly state the background required for the understanding of this paper and recall the analysis of the behavior of MOSLS for unconstrained MOPs. Based on this, we identify in Sect. 3 the subspaces in which a movement along the Pareto front can be achieved for points x near the solution set of (inequality and equality) constrained problems. In Sect. 4, we propose SNS, which merely uses the proposed neighborhood sampling, and show numerical results and comparisons on several unconstrained and constrained benchmark problems. A second application of these insights is presented in Sect. 5: based on the subspace computation, the mutation operator SPM is proposed, and its effectiveness is tested and analyzed to show the potential of this approach for the construction of specialized operators. Finally, we conclude in Sect. 6 and give possible paths for future research.

2 Background

In the following we consider continuous multi-objective optimization problems (MOPs) that can be expressed as

$$\begin{aligned} \min _{x\in Q} F(x), \end{aligned}$$
(1)

where \(Q\subset \mathbb {R}^n\) is the domain and \(F:Q\rightarrow \mathbb {R}^k\) is defined as the vector of the objective functions

$$\begin{aligned} F(x) = (f_1(x),\ldots ,f_k(x))^T, \end{aligned}$$
(2)

and where each objective \(f_i:Q\rightarrow \mathbb {R}\) is assumed for simplicity to be continuously differentiable. We stress, however, that the sampling algorithms proposed in this paper do not use any gradient information. The domain Q can be expressed as

$$\begin{aligned} Q := \{ x\in \mathbb {R}^n\; : \; g_i(x)\le 0,\; i=1,\ldots , m,\; h_j(x)=0,\; j=1,\ldots ,p\}, \end{aligned}$$
(3)

where the \(g_i\)’s denote the inequality constraints and the \(h_j\)’s the equality constraints.

The optimality of a MOP is defined by the concept of dominance.

Definition 1

 

  (a) Let \(v,w\in \mathbb {R}^k\). Then the vector v is less than w (\(v<_pw\)), if \(v_i< w_i\) for all \(i\in \{1,\ldots ,k\}\). The relation \(\le _p\) is defined analogously.

  (b) A vector \(y\in Q\) is dominated by a vector \(x\in Q\) (\(x\prec y\)) with respect to (1) if \(F(x)\le _p F(y)\) and \(F(x)\ne F(y)\), else y is called non-dominated by x.

  (c) A point \(x\in Q\) is called (Pareto) optimal or a Pareto point if there is no \(y\in Q\) which dominates x.

  (d) The set \(P_Q\) of all Pareto optimal solutions is called the Pareto set and its image \(F(P_Q)\) the Pareto front.
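To make the dominance relation concrete, the following small Python sketch (our own helper, not taken from the referenced algorithms) checks Definition 1 (b) for two objective vectors:

```python
import numpy as np

def dominates(fx, fy):
    """Return True if fx dominates fy: fx <=_p fy component-wise and fx != fy."""
    fx, fy = np.asarray(fx, dtype=float), np.asarray(fy, dtype=float)
    return bool(np.all(fx <= fy) and np.any(fx < fy))

# (1, 2) dominates (1, 3); (1, 3) and (2, 1) are mutually non-dominated.
assert dominates([1, 2], [1, 3])
assert not dominates([1, 3], [2, 1]) and not dominates([2, 1], [1, 3])
```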

It is known that if x is a solution of (1), then there exist vectors \(\alpha \in \mathbb {R}^k\), \(\lambda \in \mathbb {R}^m\), and \(\mu \in \mathbb {R}^p\) such that \((x,\alpha ,\lambda ,\mu )\) satisfies

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^k \alpha _i \nabla f_i(x) + \sum _{i=1}^m \lambda _i\nabla g_i(x) + \sum _{i=1}^p \mu _i\nabla h_i(x) = 0\\&\quad \alpha _i \ge 0,\quad i=1,\ldots ,k\\&\quad \sum _{i=1}^k \alpha _i = 1\\&\quad \lambda _i\ge 0,\quad i=1,\ldots ,m\\&\quad \lambda _i g_i(x) = 0,\quad i=1,\ldots ,m. \end{aligned} \end{aligned}$$
(4)

Points x for which such vectors \(\alpha \), \(\lambda \), and \(\mu \) exist so that \((x,\alpha ,\lambda ,\mu )\) satisfies (4) are called Karush-Kuhn-Tucker (KKT) points, after the works of Karush [10] and Kuhn and Tucker [11]. If \(\alpha _i\ne 0\), \(i=1,\ldots ,k\), the vector \(\alpha \) has a geometrical meaning which will be crucial for our considerations, namely that it is orthogonal to the linearized Pareto front at F(x) [9].

Note that for the unconstrained case (i.e., \(Q=\mathbb {R}^n\)), the first equation in (4) reads as

$$\begin{aligned} J^T \alpha = 0, \end{aligned}$$
(5)

where \(J := J(x)\) denotes the Jacobian of F at x, i.e.,

$$\begin{aligned} J(x) = \begin{pmatrix} \nabla f_1(x)^T\\ \vdots \\ \nabla f_k(x)^T \end{pmatrix} \in \mathbb {R}^{k\times n}. \end{aligned}$$

(6)

Next, we state some relations between line search and stochastic local search that will be helpful in the sequel. In line search, a point \(x\in \mathbb {R}^n\) in decision space is given together with a search direction \(\nu \in \mathbb {R}^n\). The new candidate solution \(x_{new}\) is then found via

$$\begin{aligned} x_{new} = x + t\nu , \end{aligned}$$
(7)

where \(t>0\) is a chosen step size. While the movement in decision space, i.e., \(x_{new}-x\), is apparently given by \(t \nu \), the corresponding movement in objective space is given by

$$\begin{aligned} d_\nu = J\nu \end{aligned}$$
(8)

for infinitesimal step sizes t. To see this, consider that

$$\begin{aligned} \lim _{t\rightarrow 0} \frac{f_i(x+t\nu ) - f_i(x)}{t} = \nabla f_i(x)^T\nu = (J\nu )_i,\quad i=1,\ldots ,k. \end{aligned}$$
(9)

On the other hand, in neighborhood sampling, one chooses from a given point x a point \(y\in N(x)\) from a (small) neighborhood N(x) of x. Note that y can be written as

$$\begin{aligned} y = x + y - x = x + \underbrace{\Vert y-x\Vert }_{=:\tilde{t}}\underbrace{ \frac{y-x}{\Vert y-x\Vert }}_{=:\tilde{\nu }} = x + \tilde{t}\tilde{\nu }. \end{aligned}$$
(10)

That is, y can be formulated as in (7), i.e., as a result of a line search.

2.1 Behavior of MOSLS for unconstrained MOPs

For the convenience of the reader, we recall in the following the behavior of MOSLS for unconstrained MOPs taken from [23]. That is, we analyze the “movement” described in objective space by the difference vector \(F(y)-F(x)\), where x is a given candidate solution and \(y\in N(x)\) is the result of a neighborhood sampling. In the following, we make separate considerations for the cases that x is (1) ‘far away’ from the solution set, (2) ‘near’ to it, or (3) ‘in between’.

Case 1 (x is ‘far away’ from the Pareto set): At least in early stages of the search one can assume that many individuals of the population are far from being optimal. In [3] it has been observed that for points x that are far away from the Pareto set the objectives’ gradients point in nearly the same direction. For the sake of simplicity, we assume for now the extreme case, namely that all gradients point in the same direction. That is, we assume that

$$\begin{aligned} \nabla f_i(x) = \mu _i \nabla f_1(x),\quad i=1,\ldots ,k, \end{aligned}$$
(11)

where all \(\mu _i\)’s are positive. Then, for a search direction \(\nu \) the corresponding movement in objective space is given by

$$\begin{aligned} J\nu = \begin{pmatrix} \nabla f_1(x)^T\nu \\ \vdots \\ \nabla f_k(x)^T\nu \end{pmatrix} = \left( \nabla f_1(x)^T\nu \right) \begin{pmatrix} \mu _1\\ \vdots \\ \mu _k \end{pmatrix} = \left( \nabla f_1(x)^T\nu \right) \mu , \end{aligned}$$

(12)

where \(\mu =(\mu _1,\ldots ,\mu _k)^T\). Note that (12) describes a one-dimensional movement. Since the kernel of J is in this case a proper subspace of \(\mathbb {R}^n\) (of dimension \(n-1\)), a randomly chosen vector \(\nu \) satisfies \(J\nu \ne 0\) with probability one. That is, for a randomly chosen \(\nu \) one obtains a movement in direction \(\mu \), and since \(\mu >_p 0\), MOSLS generates either dominating or dominated solutions. In case a dominated solution is found, one can simply reverse the search direction [i.e., consider \(-\tilde{\nu } = (x-y)/\Vert x-y\Vert _2\) instead of \(\tilde{\nu }\) in (10)] to obtain a sample that dominates x.

Case 2 (xis ‘near’ to the Pareto set): The other extreme situation is that a candidate solution is already very near to the Pareto set. We consider again the ideal situation, that is, that x is a KKT point. Thus, there exists a vector \(\alpha \in \mathbb {R}^k\) with \(\alpha _i>0\), \(i=1,\ldots ,k\), \(\sum _{i=1}^k\alpha _i = 1\), and \(J^T\alpha = 0\). We obtain for a search vector \(\nu \)

$$\begin{aligned} \langle J\nu ,\alpha \rangle = \langle \nu , J^T\alpha \rangle = 0. \end{aligned}$$
(13)

That is, similarly to the above discussion, we now obtain a movement orthogonal to \(\alpha \) and thus along the linearized Pareto front at F(x), as \(\alpha \) is orthogonal to this set. If we assume that the rank of J is \(k-1\), we have again that this movement is performed with probability one for a randomly chosen search direction \(\nu \) in decision space.

Case 3 (x is ‘in between’): Apparently, candidate solutions do not have to be far away from nor very close to the solution set, but can be ‘in between’. Consequently, the samples generated by MOSLS will not lead to clear movements either toward the Pareto front (or at least yielding dominated or dominating solutions) or along the Pareto front. Instead, other movements can be observed. In fact, if the rank of J is k (i.e., maximal), then for every direction \(d\in \mathbb {R}^k\) in objective space there exists an \((n-k)\)-dimensional affine subspace S(d) of \(\mathbb {R}^n\) such that \(J\nu =d\) for all \(\nu \in S(d)\).

Example 1

Consider the convex unconstrained bi-objective problem

$$\begin{aligned} f_i(x) = \Vert x-a^i\Vert ^2,\quad i=1,2, \end{aligned}$$
(14)

where \(n=2\), \(a^1=(1,1)^T\), and \(a^2=(-1,-1)^T\). The Pareto set is the line segment that connects \(a^1\) and \(a^2\). Figure 1 shows 100 randomly chosen neighboring samples from points with different distances to the Pareto set. For \(\delta =0.25,\) the neighborhood is defined as

$$\begin{aligned} \mathscr {B}_\delta (x) = \{y\in \mathbb {R}^n\; : \; \Vert y-x\Vert _\infty \le \delta \}. \end{aligned}$$
(15)

The first test point is \(x^0 = (30,-30)^T\), which is far away from the Pareto set. As the sample points are chosen uniformly at random from \(\mathscr {B}_\delta (x^0)\), no clear movement in decision space can be observed. In objective space, however, a clear movement toward (and away from) the Pareto front can be seen. The next test point is the KKT point \(x^2=(0,0)^T\). Now, a clear movement along the Pareto front can be observed regardless of the respective movements in decision space. Finally, the last candidate solution is \(x^1=(1,-1)^T\), which is in between. Here, movements in all directions in objective space are possible via MOSLS.
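The qualitative behavior described in this example can be reproduced with a few lines of Python (our own sketch, not the original code of [23]); it simply inspects the movements \(F(y)-F(x)\) in objective space for the three test points:

```python
import numpy as np

a1, a2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])

def F(x):
    # MOP (14): f_i(x) = ||x - a^i||^2, i = 1, 2
    return np.array([np.sum((x - a1) ** 2), np.sum((x - a2) ** 2)])

rng = np.random.default_rng(0)
delta = 0.25
for label, x in [("far away", np.array([30.0, -30.0])),
                 ("near (KKT)", np.array([0.0, 0.0])),
                 ("in between", np.array([1.0, -1.0]))]:
    Y = x + rng.uniform(-delta, delta, size=(100, 2))   # samples from B_delta(x), cf. (15)
    moves = np.array([F(y) - F(x) for y in Y])          # movements in objective space
    angles = np.degrees(np.arctan2(moves[:, 1], moves[:, 0]))
    print(label, "angles of F(y)-F(x):", round(angles.min(), 1), "to", round(angles.max(), 1))
```

For the ‘near’ point the movements cluster along the direction of the linearized Pareto front, whereas for the ‘in between’ point all directions occur, in line with the discussion above.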

In summary, the above rather simple analysis already reveals that a pressure both toward and along the Pareto front is inherent in MOSLS for unconstrained MOPs, and that this movement is performed ‘automatically’ depending on the distance of the candidate solution x to the set of optimal solutions. Further, if x is neither far away from nor near the solution set, the search can be performed in all possible directions in objective space. The latter gives the population (i.e., a set of individuals) the possibility to diverge and, e.g., to detect multiple connected components of the Pareto front. The results presented in Sects. 4 and 5 will show that the neighborhood sampling based on these insights opens new paths for the design of effective variation operators for MOEAs.

Fig. 1

Neighborhood sampling for MOP (14) starting from three different candidate solutions [23] a When x is ‘far away’ from the Pareto set. b When x is ‘near’ to the Pareto set. c When x is ‘in between’

To close this section, we include the description of the performance measure \(\varDelta _{p}\) [21], which is based on the Hausdorff distance between finite sets A and B. It has been proven in [20] that the \(\varDelta _p\) indicator prefers evenly spaced points along the Pareto front, specifically for bi-objective problems.

Considering \(N:=|A|\), \(M:=|B|\), and \(dist(a_{i},B):= \min _{b \in B}\Vert a_{i}- b\Vert \),

$$\begin{aligned} GD_{p}(A,B): =\left( \frac{1}{N} \sum _{i=1}^{N} dist(a_{i},B)^{p} \right) ^{1/p}, \end{aligned}$$

and

$$\begin{aligned} IGD_{p}(A,B): = \left( \frac{1}{M}\sum _{i=1}^{M} dist(b_{i},A)^{p} \right) ^{1/p}, \end{aligned}$$

the indicator is defined as:

$$\begin{aligned} \varDelta _{p} := \max \{ GD_{p}(A,B) , IGD_{p}(A,B) \}. \end{aligned}$$

For the experiments of this paper the value of \(p=2\) is considered.
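For reference, the indicator can be computed with a few lines of Python (a minimal sketch; in practice B is a discretization of the true Pareto front and A the obtained approximation):

```python
import numpy as np

def gd_p(A, B, p=2):
    # GD_p(A, B): p-power averaged distances from the points of A to the set B
    d = np.array([np.min(np.linalg.norm(B - a, axis=1)) for a in A])
    return np.mean(d ** p) ** (1.0 / p)

def delta_p(A, B, p=2):
    # Delta_p = max{GD_p(A, B), IGD_p(A, B)}, where IGD_p(A, B) = GD_p(B, A)
    return max(gd_p(A, B, p), gd_p(B, A, p))

# Toy example: three approximation points versus a discretized linear front
B = np.column_stack([np.linspace(0, 1, 101), 1 - np.linspace(0, 1, 101)])
A = np.array([[0.0, 1.05], [0.5, 0.55], [1.0, 0.02]])
print(delta_p(A, B))   # small value: A is close to and spread along B
```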

3 Neighborhood sampling for constrained MOPs

Here we extend the above considerations on the behavior of MOSLS to constrained MOPs. In particular, based on the KKT equations, we identify the subspaces in which the samples have to be chosen such that a movement along the Pareto front is obtained for optimal (and nearly optimal) solutions, as this is the most challenging aspect for variation operators within MOEAs. We first address inequality constrained and then equality constrained problems, as they require separate treatment.

3.1 Inequality constrained MOPs

In the following we consider MOPs of the form

$$\begin{aligned} \begin{aligned} \min _{x}&\; F(x)\\ \text{ s.t. }&\; g_i(x)\le 0,\quad i=1,\ldots ,m. \end{aligned} \end{aligned}$$
(16)

Let x be a KKT point of (16) and \(g_{j_1},\ldots , g_{j_l}\), \(1\le l\le m\), be the active inequality constraints at x. Then there exists \(\alpha \in \mathbb {R}^k\) with \(\alpha _i\ge 0\), \(i=1,\ldots ,k\), \(\sum _{i=1}^k \alpha _i=1\), and \(\lambda \in \mathbb {R}^l\) such that

$$\begin{aligned} \sum _{i=1}^k \alpha _i \nabla f_i(x) + \sum _{i=1}^l \lambda _i \nabla g_{j_i}(x) = J^T\alpha + G^T\lambda = 0, \end{aligned}$$
(17)

where

$$\begin{aligned} G := \begin{pmatrix} \nabla g_{j_1}(x)^T\\ \vdots \\ \nabla g_{j_l}(x)^T \end{pmatrix} \in \mathbb {R}^{l\times n}. \end{aligned}$$

(18)

Now assume we are given a search direction \(\nu \in \mathbb {R}^n\) such that

$$\begin{aligned} \nabla g_{j_i}(x)^T \nu = 0,\quad i=1,\ldots ,l, \end{aligned}$$
(19)

i.e., that \(\nu \) is orthogonal to all gradients of the active constraints at x (note that (19) is equivalent to \(G\nu =0\)).

Then, according to Eq. (8) and the related discussion, a movement along \(\nu \) (in decision space) leads to a movement along the linearized Pareto front at F(x) (in objective space):

$$\begin{aligned} \langle J\nu , \alpha \rangle = \langle \nu , J^T\alpha \rangle = - \langle \nu ,G^T\lambda \rangle =- \langle G\nu ,\lambda \rangle = 0. \end{aligned}$$
(20)

That is, \(J\nu \) is orthogonal to \(\alpha \) which is orthogonal to the linearized Pareto front at F(x).

It remains to find such kernel vectors of G. For this, consider first linear inequalities that are of the form

$$\begin{aligned} g_i(x) = a_i^Tx - b_i, \end{aligned}$$
(21)

where \(a_i\in \mathbb {R}^n\) and \(b_i\in \mathbb {R}\). The gradient of \(g_i\) is given by

$$\begin{aligned} \nabla g_i(x) = a_i. \end{aligned}$$
(22)

To obtain vectors \(\nu \) that are orthogonal to all \(a_i\)’s we can proceed as follows: define

$$\begin{aligned} A := (a_{j_1},\ldots ,a_{j_l})\in \mathbb {R}^{n\times l} \end{aligned}$$
(23)

and compute a QR factorization [17] of A, i.e.,

$$\begin{aligned} A = QR = (q_1,\ldots ,q_l,q_{l+1},\ldots ,q_n)R, \end{aligned}$$
(24)

where \(Q\in \mathbb {R}^{n\times n}\) is an orthogonal matrix and R is a (generalized) right upper triangular matrix. By construction, we have \(a_{j_i}\in span\{q_1,\ldots ,q_l\}\), \(i=1,\ldots , l;\) and thus, \(\{q_{l+1},\ldots ,q_n\}\) forms an orthonormal basis (ONB) of the desired kernel of G.

A random direction \(\nu \in ker(G)\) can now be computed as follows:

  1. Set \(Q_2:= (q_{l+1},\ldots ,q_n)\in \mathbb {R}^{n\times (n-l)}.\)

  2. Choose \(\xi \in \mathbb {R}^{n-l}\) at random.

  3. Set \(\nu := Q_2\xi /\Vert Q_2\xi \Vert _2.\)

The resulting neighborhood sampling for linear inequality constraints, which is based on the above discussion, is outlined in Algorithm 1. Here, \(t_{max}\) is a maximal step size such that \(x+t_{max}\nu \) fulfills the box constraints (in case they are present); it may change over the iterations.
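The core of Algorithm 1, i.e., drawing a random unit direction in \(ker(G)\) via the QR factorization (24), can be sketched in a few lines of Python (our own illustration; the step-size control via \(t_{max}\) and the handling of box constraints are omitted):

```python
import numpy as np

def random_kernel_direction(A, rng=np.random.default_rng()):
    """A = (a_{j_1}, ..., a_{j_l}) in R^{n x l}: gradients of the active linear
    constraints as columns. Returns a random unit vector orthogonal to all of
    them, i.e., a direction in ker(G)."""
    n, l = A.shape
    Q, _ = np.linalg.qr(A, mode="complete")   # full QR: Q = (q_1, ..., q_n)
    Q2 = Q[:, l:]                             # ONB of the orthogonal complement, cf. step 1
    xi = rng.standard_normal(n - l)           # step 2: random coefficients
    nu = Q2 @ xi                              # step 3 (before normalization)
    return nu / np.linalg.norm(nu)

# One active constraint g(x) = (1/3) x_1 - x_2 + 0.1 <= 0, cf. Example 2
A = np.array([[1.0 / 3.0], [-1.0]])
nu = random_kernel_direction(A, np.random.default_rng(0))
print(A.T @ nu)   # ~0: nu is orthogonal to the active constraint gradient
```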

[Algorithm 1 (NS-LI): neighborhood sampling for MOPs with linear inequality constraints]
Fig. 2

Neighborhood search for a MOP with linear inequality constraints (considering \(x^0\) from Example 2)

Fig. 3

Neighborhood search for a MOP with box constraints (considering \(x^1\) from Example 2)

We stress that an application of Algorithm 1 does not require any additional function evaluation. The time complexity of Algorithm 1 is determined by the QR factorization, which requires \(O(n^3)\) flops [17]. Further, we stress that the constraints have to be given explicitly, since otherwise the operator cannot be constructed.

For the special case that the domain Q of the MOP is given by box constraints, the sampling can be simplified. In that case, we have

$$\begin{aligned} g_i(x) = \pm x_{j_i} - b_i \le 0, \end{aligned}$$
(25)

where \(b_i\in \mathbb {R}\) and

$$\begin{aligned} \nabla g_i(x) = \pm e_{j_i}, \end{aligned}$$
(26)

where \(e_j\) denotes the j-th canonical vector. Define

$$\begin{aligned} I := \{1,\ldots ,n\} \backslash \{j_1,\ldots ,j_l\}, \end{aligned}$$
(27)

then \(Q_2\) is given by

$$\begin{aligned} Q_2 = (e_{j_{l+1}},\ldots ,e_{j_n}). \end{aligned}$$
(28)

Thus, the neighborhood search can be performed as follows:

  1. Choose \(\tilde{y}\) at random from a neighborhood of x.

  2. Set y as

$$\begin{aligned} y_i = {\left\{ \begin{array}{ll} \tilde{y}_i &{} \quad \text {for } i\in I,\\ x_i &{} \quad \text {for } i\in \{j_1,\ldots ,j_l\}. \end{array}\right. } \end{aligned}$$

(29)
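A small Python sketch of this box-constraint case follows; the detection of active bounds via a tolerance eps and the final clipping are our own additions:

```python
import numpy as np

def box_constrained_sample(x, lower, upper, delta=0.1, eps=1e-9,
                           rng=np.random.default_rng()):
    """Sample from a neighborhood of x, but keep the coordinates whose box
    constraints are active at x fixed, cf. (27)-(29)."""
    y_tilde = x + rng.uniform(-delta, delta, size=x.size)    # step 1: regular sample
    active = (np.abs(x - lower) < eps) | (np.abs(x - upper) < eps)
    y = np.where(active, x, y_tilde)                         # step 2: fix active coordinates
    return np.clip(y, lower, upper)                          # keep the sample inside the box

# x lies on the lower bound of x_2, cf. Example 2 with constraint (31)
x = np.array([-0.4, 0.0])
print(box_constrained_sample(x, lower=np.array([-2.0, 0.0]), upper=np.array([2.0, 2.0])))
```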

Example 2

We re-consider MOP (14) but this time we impose the constraint

$$\begin{aligned} g(x)=\frac{1}{3}x_1-x_2+0.1\le 0. \end{aligned}$$
(30)

Figure 2 shows the Pareto set and front of the resulting MOP as well as a neighborhood sampling around \(x^0=(-0.07,0.07)^T\). As \(n=2\) and \(l=1,\) the affine subspace \(S= x^0+\mathbb {R} q_2\) coincides around \(x^0\) with the Pareto set. Thus, a sampling around \(x^0\) within S leads to a movement along the Pareto front.

Next, we consider again Example 2 but replace the constraint (30) by

$$\begin{aligned} g(x) = -x_2\le 0. \end{aligned}$$
(31)

The point \(x^1 = (-0.4,0)^T\) is Pareto optimal. Thus, a sampling around \(x^1\) within \(\tilde{S} = x^1 + \mathbb {R}e_1\) leads, as for the above example, to a movement along the Pareto front (see Fig. 3).

Fig. 4

Neighborhood search for a MOP with general inequality constraints (considering \(x^0\) from Example 3)

Finally, we address the case when general inequalities are present. For this, assume for a moment that we are given the search directions \(\nu _1,\ldots ,\nu _r\in \mathbb {R}^n\) as well as the directional derivatives

$$\begin{aligned} \langle \nabla g_{j_i}(x),\nu _s\rangle ,\quad i=1,\dots ,l,\; s=1,\ldots ,r, \end{aligned}$$
(32)

and further, that we are interested in directions \(\nu \) within the subspace \(span\{\nu _1,\ldots ,\nu _r\}\) such that

$$\begin{aligned} 0 = \langle \nabla g_{j_i}(x),\nu \rangle = \Big \langle \nabla g_{j_i}(x), \sum _{s=1}^r \xi _s \nu _s \Big \rangle = \sum _{s=1}^r \xi _s \langle \nabla g_{j_i}(x),\nu _s\rangle ,\quad i=1,\ldots ,l. \end{aligned}$$
(33)

Define

$$\begin{aligned} V := (\nu _1,\dots ,\nu _r)\in \mathbb {R}^{n\times r}, \end{aligned}$$
(34)

and then,

$$\begin{aligned} GV = \left( \langle \nabla g_{j_i}(x),\nu _s\rangle \right) _{\begin{array}{c} {i=1,\dots ,l}\\ {s=1,\ldots ,r} \end{array}} \in \mathbb {R}^{l\times r} \end{aligned}$$
(35)

is composed of the directional derivatives from (32). Here we assume that \(r>l\). Note that (33) is equivalent to

$$\begin{aligned} GV\xi = 0, \end{aligned}$$
(36)

i.e., we are again interested in the kernel, this time of GV. To find an ONB of ker(GV) we can again utilize a QR factorization: compute

$$\begin{aligned} (GV)^T = QR = (q_1,\ldots ,q_l,q_{l+1},\ldots ,q_r)R \end{aligned}$$
(37)

and define

$$\begin{aligned} Q_3 := (q_{l+1},\ldots ,q_r)\in \mathbb {R}^{r\times (r-l)}. \end{aligned}$$
(38)

Then, the column vectors of \(Q_3\) build the desired ONB of the kernel of GV, and we can proceed as above to find randomly chosen kernel vectors.

The above method can be realized via (gradient-free) neighborhood sampling as follows: let \(x_1,\ldots , x_r\) be neighboring solutions of x and define the search directions via

$$\begin{aligned} \nu _i := \frac{x_i - x}{\Vert x_i-x\Vert },\quad i=1,\ldots ,r. \end{aligned}$$
(39)

Further, the matrix GV can be approximated via the following estimates of the directional derivatives:

$$\begin{aligned} \langle \nabla g_{j_i}(x),\nu _s\rangle \approx \frac{g_{j_i}(x_s) - g_{j_i}(x)}{\Vert x_s-x\Vert } ,\quad i=1,\dots ,l,\; s=1,\ldots ,r. \end{aligned}$$
(40)

The error in (40) is of order \(O(\Vert x_s-x\Vert )\), which can be seen by considering the function \(g_{i,\nu }:\mathbb {R}\rightarrow \mathbb {R}\), \(g_{i,\nu }(t) := g_i(x+t\nu )\).

Algorithm 2 puts the above discussion together and defines a neighborhood sampling for MOPs with general inequality constraints. The sampling is in principle more expensive than NS-LI, as several samples have to be considered before a predictor y can be generated such that \(F(y)-F(x)\) points along the linearized Pareto front (though certainly existing information such as the neighboring individuals from the current population can be used). Further, the search is restricted to the subspace \(span\{\nu _1,\ldots ,\nu _r\}\). In our computations, we have fixed the value of r to five and have omitted the exploitation of existing information. More sophisticated strategies are conceivable and are the subject of ongoing work.
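The following Python sketch illustrates the essential steps of Algorithm 2 (a hedged illustration of ours; the generation of the neighboring samples, the choice of r, and the subsequent line search are not reproduced):

```python
import numpy as np

def general_ineq_direction(x, neighbors, g_active, rng=np.random.default_rng()):
    """Estimate GV from r neighboring samples via (39)-(40), compute an ONB of
    ker(GV) via the QR factorization (37), and return a random search direction
    nu in span{nu_1, ..., nu_r} satisfying (33). g_active holds the functions of
    the active constraints g_{j_1}, ..., g_{j_l}."""
    V = np.array([(xs - x) / np.linalg.norm(xs - x) for xs in neighbors]).T  # n x r, cf. (39)
    r, l = V.shape[1], len(g_active)
    GV = np.array([[(g(xs) - g(x)) / np.linalg.norm(xs - x) for xs in neighbors]
                   for g in g_active])                                       # l x r, cf. (40)
    Q, _ = np.linalg.qr(GV.T, mode="complete")   # (GV)^T = QR with Q in R^{r x r}
    Q3 = Q[:, l:]                                # ONB of ker(GV), cf. (38)
    nu = V @ (Q3 @ rng.standard_normal(r - l))
    return nu / np.linalg.norm(nu)

# Active constraint g(x) = -x_1^2 + x_2 + 1 <= 0 of Example 3, r = 5 neighbors
g = lambda z: -z[0] ** 2 + z[1] + 1.0
x = np.array([-0.6995, -0.7028])
rng = np.random.default_rng(1)
neighbors = [x + rng.uniform(-0.05, 0.05, 2) for _ in range(5)]
print(general_ineq_direction(x, neighbors, [g], rng))
```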

[Algorithm 2: neighborhood sampling for MOPs with general inequality constraints]

Example 3

We re-consider MOP (14), and impose the following nonlinear constraint

$$\begin{aligned} g(x)=-x_1^2+x_2+1\le 0. \end{aligned}$$
(41)

Figures 4, 5 and 6 show the Pareto set and front of the resulting MOP, as well as a neighborhood sampling around the three different points \(x^0=(-0.6995,-0.7028)^T\), \(x^1=(0.20534, -0.95659)^T\), and \(x^2=(-0.90317, -0.18197)^T\). As \(n=2\) and \(l=1,\) the affine subspace \(S= x^i+\mathbb {R} q_2\) coincides around \(x^i\) with the Pareto set in all three cases. Thus, a sampling around \(x^i\) within S leads to a movement along the Pareto front also when general inequality constraints are present.

Fig. 5

Neighborhood search for a MOP with general inequality constraints (considering \(x^1\) from Example 3)

Fig. 6

Neighborhood search for MOP with general inequality constraints (considering \(x^2\) from Example 3)

3.2 Equality constrained MOPs

Now we consider problems of the form

$$\begin{aligned} \begin{aligned} \min _{x}&\; F(x)\\ \text{ s.t. }&\; h_i(x)= 0,\quad i=1,\ldots ,p. \end{aligned} \end{aligned}$$
(42)
Fig. 7

Neighborhood sampling for the penalized MOP described in Example 4, using \(C = 10\)

Fig. 8

Neighborhood sampling for the penalized MOP described in Example 4, using \(C = 100\)

One possibility for the treatment of equality constraints is to utilize again the KKT equations: if x is a KKT point then there exist \(\alpha \in \mathbb {R}^k\) with \(\alpha _i\ge 0\), \(i=1,\ldots ,k\), \(\sum _{i=1}^k \alpha _i=1\), and \(\mu \in \mathbb {R}^p\) such that

$$\begin{aligned} \sum _{i=1}^k \alpha _i \nabla f_i(x) + \sum _{i=1}^p \mu _i \nabla h_i(x) = J^T\alpha + H^T\mu = 0, \end{aligned}$$
(43)

where

$$\begin{aligned} H := \begin{pmatrix} \nabla h_1(x)^T\\ \vdots \\ \nabla h_p(x)^T \end{pmatrix} \in \mathbb {R}^{p\times n}. \end{aligned}$$

(44)

Assume for a search direction \(\nu \in \mathbb {R}^n\) it holds

$$\begin{aligned} \nabla h_i(x)^T \nu = 0,\quad i=1,\ldots ,p, \end{aligned}$$
(45)

then we obtain

$$\begin{aligned} \langle J\nu , \alpha \rangle = \langle \nu , J^T\alpha \rangle = - \langle \nu ,H^T\mu \rangle =- \langle H\nu ,\mu \rangle = 0. \end{aligned}$$
(46)

That is, we can proceed analogously to the above discussion for linear and general inequalities. On the other hand, equality constraints are typically not handled as such in MOEAs: as equality constraints reduce the dimension of the domain, the probability that a sampling operator detects a feasible solution is zero. As an alternative, penalization strategies, for instance, are considered, which transform the constrained problem into an unconstrained one. That is, for equality constrained problems each objective \(f_i\) is replaced by

$$\begin{aligned} \tilde{f}_i (x) = f_i(x) + C\sum _{j=1}^p h_j(x)^2,\quad i=1,\ldots ,k, \end{aligned}$$
(47)

where \(C>0\) is a given constant. Doing so, the sampling is in principle performed as in the unconstrained case. One difference, however, is the choice of C, as it has a certain impact on the movements; this is discussed in the following.
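A minimal Python sketch of this penalization approach (using the setting of Example 4 below as illustration):

```python
import numpy as np

def penalized_objectives(F, h_list, C):
    """Return F~ of (47): each objective is augmented by C * sum_j h_j(x)^2."""
    def F_tilde(x):
        penalty = C * sum(h(x) ** 2 for h in h_list)
        return F(x) + penalty
    return F_tilde

# MOP (14) with the equality constraint h(x) = x_2 = 0, cf. Example 4
a1, a2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
F = lambda x: np.array([np.sum((x - a1) ** 2), np.sum((x - a2) ** 2)])
F_tilde = penalized_objectives(F, [lambda x: x[1]], C=10.0)
x = np.array([-0.5, 0.1])
print(F(x), F_tilde(x))   # the violation of h shifts both penalized objectives upward
```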

Example 4

We consider again MOP (14) with the additional equality constraint

$$\begin{aligned} h(x) = x_2 = 0 \end{aligned}$$
(48)

so that the Pareto set is the line segment connecting the points \((-1,0)^T\) and \((1,0)^T\). Figures 7 and 8 show a neighborhood sampling for the penalized problem \(\tilde{F} = (\tilde{f}_1,\ldots ,\tilde{f}_k)^T\) around the optimal point \(x^0 = (-0.5,0)^T\) for the values \(C_1 = 10\) and \(C_2 = 100\), respectively. It can be observed that the penalty coefficient influences the behavior of the neighborhood search in objective space: as the penalty coefficient grows, more sample points are mapped away from the Pareto front.

4 First application: simple neighborhood search (SNS)

In order to investigate the behavior of the neighborhood sampling we introduce here the simplest population-based algorithm that uses these insights. The simple neighborhood search (SNS) merely uses the neighborhood sampling described above and omits any ‘swarm-like’ strategy (e.g., crossover or parent selection) for the construction of the search direction (i.e., of a new individual). An unconstrained version of this approach was already introduced in [23]; there, experimental tests on simple unconstrained problems were performed, comparing against random sampling. In this section, we include a state-of-the-art MOEA in the comparisons; we also extend the proposal to constrained MOPs. The pseudocode of SNS is presented in Algorithm 3. First, an initial population \(A_0\subset \mathbb {R}^n\) is chosen at random. Then, each iteration i consists of computing a sample \(b = b(a)\) for every \(a \in A_i,\) based on two cases: (i) if a certain constraint is active at a, the sampling is made within the corresponding subspace using Algorithm 1 or Algorithm 2; otherwise, (ii) a regular neighborhood sampling is performed. Finally, the new archive \(A_{i+1}\) is selected by the archiver ArchiveUpdateTight2, described in [22], considering the previous archive \(A_i\) and the union of the newly created samples b(a) for every \(a \in A_{i}.\) For the reader’s convenience we include it here as Algorithm 4.
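The overall structure of SNS can be sketched as follows (a heavily simplified illustration of ours: the case distinction between Algorithms 1/2 and the regular sampling is hidden in the sample argument, and ArchiveUpdateTight2 [22] is replaced by a plain nondominance filter):

```python
import numpy as np

def sns(F, sample, initial, n_iter=50):
    """Simplified sketch of SNS (Algorithm 3). 'sample(a)' generates one
    neighborhood sample b(a) per individual; 'initial' is the random
    initial population A_0."""
    def nondominated(points):
        fvals = [F(p) for p in points]
        keep = [i for i, fi in enumerate(fvals)
                if not any(np.all(fj <= fi) and np.any(fj < fi) for fj in fvals)]
        return [points[i] for i in keep]

    A = list(initial)
    for _ in range(n_iter):
        B = [sample(a) for a in A]        # one sample per archive member
        A = nondominated(A + B)           # simplified archive update
    return A
```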

[Algorithm 3: simple neighborhood search (SNS)]
[Algorithm 4: ArchiveUpdateTight2]

4.1 Numerical results

Table 1 Numerical results for applying SNS, GS and NSGA-II on unconstrained MOPs, fixing a budget of 1000 function calls. These results are averaged over 30 independent runs (D = disconnected, MM = multimodal)
Table 2 Numerical results when applying SNS, SNS-U, GS and NSGA-II on constrained MOPs, fixing a budget of 1000 function calls. Results are averaged over 30 independent runs (LC = linear constraints, NLC = non-linear constraints)

This section is dedicated to empirically observing the impact of the above described neighborhood sampling as used by SNS. As test MOPs we have chosen 12 constrained and 14 unconstrained problems [4, 8, 28]. These benchmarks are widely used and exhibit different features such as different values of n and k, different shapes of the Pareto front, different modalities, and constraints of different nature. In order to compare our results, we have chosen (i) GS (global sampling), which selects candidate solutions from the given domain at random (and is thus the counterpart of SNS), and (ii) NSGA-II [7], which is a state-of-the-art MOEA. Further, for the constrained problems we use (iii) SNS-U, the naive neighborhood sampling—that is, every sample b(a) from a is chosen uniformly at random from \(\mathscr {B}_{t_{max}}(a)\) [see (15)]. This is done in order to see the direct impact of the proposed neighborhood sampling. In all cases we allow a relatively low budget of 1000 function evaluations in order to measure the ability of each method to quickly identify the region around the Pareto front. As performance measure we consider the indicator \(\varDelta _2\), which aims, roughly speaking, for evenly spread solutions along the Pareto front [21]. This indicator also relates to the notions of spread and convergence frequently used in the evolutionary multi-objective optimization (EMO) community.

The numerical results for the tested unconstrained MOPs are shown in Table 1. SNS obtains the best \(\varDelta _2\) values in 13 out of 14 cases, and the difference in the indicator values is significant in almost all of these cases. Only for the ZDT4 problem is SNS outperformed by GS, which was anticipated since this problem is highly multi-modal. For such problems, global strategies are generally more promising, since local search is likely to get stuck in local minima. SNS also significantly outperforms NSGA-II on many problems; however, we stress that this work does not intend to show superiority of this method over NSGA-II, since the latter has not been designed to obtain good results within a relatively low number of function calls. It is also worth recalling that SNS does not use any swarm-like strategy such as the most common evolutionary operators; it merely uses a neighborhood sampling based on the subspace induced by the constraints.

The numerical results for the tested constrained MOPs are shown in Table 2. The superiority of SNS is not as clear-cut any more, as the problems are more complex and lead, similarly to multi-modal MOPs, to greater difficulties for pure local search. However, SNS still outperforms the other algorithms in 8 out of the 12 examples. Further, the comparison of SNS with SNS-U shows that the proposed neighborhood sampling has a definite positive effect on the performance of the algorithm. The parameters used are shown in Table 3.

5 Second application: subspace polynomial mutation

The success of variation operators in evolutionary algorithms can be assessed by the immediate survival chances of their newly generated individuals. Though feasibility directly determines the survival chances of an individual, regular variation operators in MOEAs do not consider any information from the MOP constraints. In this section, we introduce the subspace polynomial mutation (SPM) operator; it arises from the above discussed neighborhood sampling and is particularly designed to deal with constrained MOPs. Given a population individual, SPM stochastically generates a new individual within a suitable subspace that promotes feasibility. This is important since the survival rate of the mutated individual is thereby increased, i.e., the mutation is more likely to be successful. Also, the exploitation of this particular subspace is very convenient for the generation of candidate points that likely lie along the Pareto set of the constrained MOP.

A popular mutation scheme for MOEAs is the polynomial mutation (PM) [6]. The main part of PM is described by

$$\begin{aligned} p' = {\left\{ \begin{array}{ll} p+\bar{\delta }_{L}(p-x_i^{(L)}) &{} \quad \text { for } u \le 0.5,\\ p+\bar{\delta }_{R}(x_i^{(U)}-p) &{} \quad \text { for } u > 0.5,\\ \end{array}\right. } \end{aligned}$$
(49)

where \(u\in [0,1]\) is chosen at random, p is the value of a particular variable i of a parent solution, \(p'\) is the corresponding mutated value, and \(\bar{\delta }_{L},\bar{\delta }_{R}\) come from a particular probability distribution. The lower and upper limits of each variable, \(x_i^{(L)}\) and \(x_i^{(U)}\), are also taken into account. SPM adapts PM to effectively deal with constrained MOPs; it is presented in Algorithm 5. Instead of perturbing each canonical coordinate variable, as done in (49), SPM performs movements along the basis vectors of a suitable mutation subspace. It starts with a parent solution x and computes the mutation subspace—considering the MOP constraints—as described in Algorithm 6. Then, a perturbation is made along each basis direction of the mutation subspace. It is important to notice that the mutation strength \(t_{i}\) (line 12 of Algorithm 5) is controlled separately for each direction.
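The following Python sketch conveys the main idea of SPM: the perturbation is carried out along the basis vectors of the mutation subspace rather than along the canonical coordinates. It is a hedged illustration of ours; the exact polynomial distribution and the per-direction strength control of Algorithm 5 are simplified:

```python
import numpy as np

def spm_like_mutation(x, Q2, t_max=0.2, eta_m=20.0, rng=np.random.default_rng()):
    """Perturb x along the columns of Q2 (an ONB of the mutation subspace,
    cf. Sect. 3) with a PM-style polynomial step per direction."""
    y = x.copy()
    for i in range(Q2.shape[1]):
        u = rng.random()
        if u <= 0.5:
            delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
        else:
            delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
        y = y + t_max * delta * Q2[:, i]   # mutation strength controlled per direction
    return y

# Mutation subspace spanned by q_2 = (3, 1)/sqrt(10), which is orthogonal to the
# gradient of the active constraint (30); x^0 as in Example 2
Q2 = np.array([[3.0], [1.0]]) / np.sqrt(10.0)
print(spm_like_mutation(np.array([-0.07, 0.07]), Q2))
```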

[Algorithm 5: subspace polynomial mutation (SPM)]
[Algorithm 6: computation of the mutation subspace]

The implementation of SPM within a MOEA is described in Algorithms 7 and 8. In order to investigate the advantages of SPM over polynomial mutation when dealing with constrained MOPs, we replaced the mutation module of the algorithms NSGA-II [7] and GDE3 [12]. Seeking to apply SPM to promising solutions, we consider the following definition:

Table 3 Parameters of the algorithm NSGAII

Definition 2

Let P be the current population, \(x \in P\) a candidate solution, and \(\delta \in \mathbb {R}_{+}\). We say that \(p\in P\setminus \{x\}\) is a \(\delta \)-close neighbor of x if and only if \(p \in \mathscr {B}_\delta (x).\)

[Algorithm 7]
[Algorithm 8]

Then, the use of SPM is considered at every generation of the MOEA, when the parent population P undergoes mutation for variation. Its application, however, depends on the fulfillment of the following three conditions for \(x\in P:\)

  1. ParRank(x) should be equal to 1. Here, ParRank comes from the Pareto ranking of the nondominated sorting procedure of the MOEA.

  2. Candidate solution x has at least one active constraint. We consider \(g_{j_l}\) to be an active constraint at x if \(|g_{j_l}(x)|<\varepsilon \), where \(\varepsilon \in \mathbb {R}_{{+}}\) is a problem-dependent parameter.

  3. Candidate solution x has at least r \(\delta \)-close neighbors.

In this way, SPM is applied only when it is detected that it can be useful, and its frequency of use is controlled by the conditions listed above. Otherwise, the original mutation of the MOEA is applied (still subject to the original mutation probability \(p_{m}\)). It is worth emphasizing that, although SPM is a cost-free operator in terms of function evaluations, its application is controlled by the conditions above in order not to interfere with the population diversity during the evolutionary process.

Table 4 Properties of the selected test problems (k: number of objectives, n: number of decision variables, D = disconnected Pareto front, LC = linear constraints are present, NLC = non-linear constraints are present)
Table 5 Parameters of the algorithms NSGAII and NSGAII+SPM
Table 6 Parameters of the algorithms GDE3 and GDE3+SPM
Table 7 Parameters of MTS algorithm
Table 8 Values obtained for the indicator \(\varDelta _2\) after 10,000 objective function evaluations. A smaller value corresponds to a better quality of the solution approximation. The better value between each proposal and its base MOEA is emphasized in bold. The indicator value of the best approximation among the five algorithms is underlined for each problem. The results are averaged over 30 independent runs
Table 9 Values obtained for the hypervolume indicator after 10,000 objective function evaluations. A larger value corresponds to a better quality of the solution approximation. The better value between each proposal and its base MOEA is emphasized in bold. The indicator value of the best approximation among the five algorithms is underlined for each problem. The results are averaged over 30 independent runs. The reference points used were \(r=[6,6]\) for CTP1 to CTP6 and Tanaka, and \(r=[-50,100]\) for Osyczka
Fig. 9

Partial views of the Pareto front for the Osyczka test problem in a particular run. SPM is able to fill gaps in certain parts of the front

5.1 Numerical results

The performance of the algorithms (i) NSGA-II\(+\)SPM and (ii) GDE3\(+\)SPM is here compared against that of their corresponding base MOEAs, NSGA-II and GDE3, which are provided only with the traditional polynomial mutation. A preliminary study of this proposal was published in [1]; there, several tests on simple functions—like those described in Table 2—were performed. Here, more complicated constrained MOPs are considered, in particular the CTP suite [5]. The selection of the test problems was based on the kind of constraints that each of them considers and their relationship with the constrained Pareto front—more insights on this subject are given below. The features of each test problem are summarized in Table 4. The parameter values used in these experiments, for each algorithm, are given in Tables 5, 6 and 7. It is worth noticing that the intention of this study is not to show a supremacy of the method over other MOEAs; instead, we want to show that the proposed neighborhood sampling opens wide possibilities for the design of specialized operators for MOEAs and multi-objective memetic algorithms. For completeness, a comparison against the algorithm MTS is also included. It is worth noticing that MTS presented some trouble managing constraints during the search, possibly due to its selection process, and for this reason it was not suitable for a hybridization with SPM. Also, the solutions obtained by MTS were filtered to exclude dominated solutions, which were still present, when computing the results presented in Tables 8 and 9.

Fig. 10

Partial view of the Pareto front for the Tanaka test problem in a particular run. SPM is able to fill gaps in certain parts of the front

Overall performance: the comparison of the two mutation schemes, for each test problem, is presented in two tables. First, in Table 8 the performance indicator \(\varDelta _{2}\) (see [21]) is used to measure their effectiveness. This performance indicator is based on the Hausdorff distance between the solution set and the approximation set, which makes it a good choice for assessing the solution quality of the population over each generation. The better result of each comparison is shown in bold in the table. According to this indicator, it can be observed that the use of SPM improves the performance of the base algorithm in all cases (sixteen, considering the two hybridizations) but one. The results for this particular problem, CTP5, are discussed below. The best value among the five algorithms is underlined in the table; it is worth noticing that in all cases it corresponds to an algorithm that uses SPM.

Second, in Table 9 the hypervolume indicator [29] is used to assess the results. In this case, SPM shows an improvement in fourteen out of sixteen cases; the exceptions are again CTP5 and, for this indicator, also CTP2, and we discuss these cases in the following. It is worth noticing that, except for these two cases, the best value among the five algorithms again corresponds to an algorithm that uses SPM.

Fig. 11

a CTP2 entire Pareto front, b partial view of one Pareto front component; SPM is able to generate the extremes of the component

Fig. 12

Constrained Pareto front for problem CTP5. It can be seen from a that SPM achieves a refinement of degenerate components when applied

Filling gaps: Figures 9 and 10 illustrate the typical behavior of a MOEA combined with SPM. The SPM operator is useful for filling gaps along the constrained Pareto front of the problem. This is possible since SPM takes advantage of the information in the MOP constraints to generate successful mutations, producing in this way individuals that are unlikely to be produced by the PM operator as traditionally defined. In the case of the CTP2 problem, the Pareto front consists of several convex disconnected components (Fig. 11a). It can be noticed from Fig. 11b that SPM promotes the generation of the extremes of such components. This behavior was observed in all cases in which the SPM operator was applied.

Solution refining: the constrained Pareto front of CTP5 is shown in Fig. 12. It is formed by a large continuous component and a set of discrete components—one point each. Since this is a degenerate case, it is not easy for traditional evolutionary variation operators to produce improvements for solutions located on the discrete components. On the other hand, it can be seen from Fig. 12a that SPM certainly helps in this situation when it is applied. It is noticeable that it is precisely for this problem that the overall performance of SPM does not look effective in Tables 8 and 9. This is due to the population and archive management performed by NSGA-II (and most MOEAs): in this case, SPM is more likely applied on the continuous component, where, however, its effect is not significant for this problem. This motivates the development of a suitable archiving strategy to exploit these situations and to achieve a beneficial interleaving with memetic operators, more effective than the three application criteria proposed above. This is subject of ongoing work.

As a local searcher: an extension of this approach could consist in dropping the third condition for application, i.e., the existence of at least r \(\delta \)-close neighbors in the population, and instead stochastically generating the necessary neighbors for a given solution. This disrupts the cost-free character of the SPM operator, but it would still constitute a low-cost local searcher (i.e., no more than \(l+1\) function evaluations) specialized for constrained MOPs. In this case, memetic design aspects should be taken into account, constituting a promising branch for future work.

In this section we introduced SPM as a specialized mutation operator, an alternative to the polynomial mutation for constrained MOPs. SPM is a variation operator that is able to produce individuals inside a particular subspace, defined by the active constraints of the MOP, in order to explore along the Pareto front of a constrained MOP. This feature increases the survival rate of the mutated solutions, i.e., influences the mutation success. The operator is applied in a dynamic way: more often when the algorithm detects that it is promising to do so. An experimental investigation of the use of SPM was performed on eight nonlinearly constrained test MOPs. The theoretical advantages of this operator were confirmed by the numerical results.

6 Conclusions and future work

In this paper, we extended the investigation of the behavior of multi-objective stochastic local search (MOSLS). Based on the observation that a pressure both toward and along the Pareto front is already inherent in MOSLS, and on the first-order conditions for optimality, we have adapted a neighborhood sampling for constrained MOPs such that a movement along the Pareto front is obtained for candidate solutions that are near the solution set. Finally, we have shown the impact of MOSLS and the new neighborhood sampling by considering two applications. First, we presented the algorithm SNS, which merely uses these strategies, and have empirically shown that (though free of any ‘swarm-like’ strategies) it already produces competitive results for both unconstrained and constrained problems. Second, the SPM operator was presented; it produces individuals inside a particular subspace, exploiting information from the active constraints of the MOP, in order to explore along the constrained Pareto front. This feature increases the survival rate of the mutated solutions, i.e., influences the mutation success. Some memetic aspects of these experiments were analyzed and discussed through the presentation of the numerical results. This study hopefully opens a wide window of potential applications of these sampling strategies toward building more efficient memetic algorithms.

Though the results on the new neighborhood sampling are already very promising, there are some aspects that have to be taken care of in the future. For instance, the efficient handling of equality constraints, which has been almost left out in this study, is of great importance for the effective treatment of MOPs. Also, other memetic aspects inherent to the inclusion of neighborhood-sampling-based operators into population strategies should be investigated, and are subject of ongoing work. For example, our approach is (theoretically) not limited to a few objectives. Since the computational cost is low due to the information extracted from the population, the efficiency is not affected when the number of objectives increases. Nevertheless, a computational issue is that current archiving strategies can lose improved individuals, and this problem could be magnified when the number of objectives increases. Hence, specialized archiving strategies should be designed along this path.