Abstract
Support vector machines (SVMs) have drawn wide attention in fields such as image classification, pattern recognition and disease diagnosis. Nevertheless, they require substantial memory and run slowly on large-scale datasets. To reduce the computational cost and storage requirements, we first design a new sparse and robust SVM model based on our truncated smoothly clipped absolute deviation (SCAD) loss, and then establish its first-order necessary and sufficient optimality conditions through a newly defined proximal stationary point. Following the idea of the proximal stationary point, we define the Lts support vectors of Lts-SVM and show that they form a small portion of the whole training set, which allows us to introduce a novel working set at each step. We then present a new alternating direction method of multipliers with a working set (Lts-ADMM) for Lts-SVM, and prove that the proposed algorithm not only converges to a local minimizer of Lts-SVM but also has relatively low computational complexity. Finally, extensive numerical experiments demonstrate that the proposed algorithm outperforms nine leading state-of-the-art methods in classification accuracy, number of support vectors and computational speed on large-scale datasets.





Acknowledgements
The authors sincerely thank the associate editor and the five referees for their constructive comments, which have significantly improved the quality of the paper. This work is supported by the National Natural Science Foundation of China (11971052, 11871183), the Changsha Municipal Natural Science Foundation (kq2208214), and the Scientific Research Fund of Hunan Provincial Education Department (22C0152).
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proofs of all theorems
1.1 A.1 Proof of Lemma 3.1
Proof
From (7), it is evident that Lts(p) is separable, which yields (9). Next, we split the proof of the explicit expression (10) into the following two cases:
-
(i)
For pi ≠ 0, ℓts(pi) is differentiable. Therefore, we obtain ∇ℓts(pi) = 0 for pi ≥ 1 or pi < 0, ∇ℓts(pi) = 8(1 − pi)/3 for pi ∈ [1/2,1) and ∇ℓts(pi) = 4/3 for pi ∈ (0,1/2).
-
(ii)
For pi = 0, ℓts(pi) is non-differentiable. Based on Definition 3.1, the one-sided pieces give ∇ℓts(pi) = 0 for pi < 0 and ∇ℓts(pi) = 4/3 for pi ∈ (0,1/2). Hence, we obtain ∂ℓts(0) = [0,4/3].
Therefore, (10) holds. This completes the proof. □
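The derivative pieces above lend themselves to a quick numerical sanity check. The sketch below uses a loss reconstructed from exactly these pieces — zero on p ≤ 0, slope 4/3 on (0,1/2), the smoothly clipped quadratic 1 − 4(1 − p)²/3 on [1/2,1), and the constant 1 on p ≥ 1 — which is consistent with (10) but is our reconstruction, not a verbatim copy of definition (7).

```python
def ell_ts(p):
    # Truncated SCAD-type loss reconstructed from the derivative
    # pieces of Lemma 3.1 (an assumption; the paper defines it in (7)).
    if p <= 0.0:
        return 0.0
    if p < 0.5:
        return 4.0 * p / 3.0
    if p < 1.0:
        return 1.0 - 4.0 * (1.0 - p) ** 2 / 3.0
    return 1.0

def grad_ell_ts(p):
    # Piecewise derivative from Lemma 3.1 (valid away from p = 0).
    if p >= 1.0 or p < 0.0:
        return 0.0
    if p >= 0.5:
        return 8.0 * (1.0 - p) / 3.0
    return 4.0 / 3.0

# Central finite differences agree with the stated pieces at
# differentiable points; at p = 0 the subdifferential is [0, 4/3].
h = 1e-6
for p in (-0.7, 0.2, 0.7, 1.3):
    fd = (ell_ts(p + h) - ell_ts(p - h)) / (2.0 * h)
    assert abs(fd - grad_ell_ts(p)) < 1e-4
```

Continuity at the breakpoints 1/2 and 1 (both sides give 2/3 and 1, respectively) is what makes the loss "smoothly clipped", and its boundedness by 1 is the source of the robustness to outliers.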
1.2 A.2 Proof of Lemma 3.2
Proof
According to (5) and (11), \(\text {prox}_{\lambda \tau \ell _{ts}}(r)\) is the minimizer of the following function
Evidently, the minimizers of ϕ1(z), ϕ2(z), ϕ3(z), ϕ4(z) and ϕ5(z) are attained at \(z^{*}_{1}=r\), \(z^{*}_{2}=\frac {3r-8\lambda \tau }{3-8\lambda \tau }\), \(z^{*}_{3}=r-4\lambda \tau /3\), \(z^{*}_{4}=0\) and \(z^{*}_{5}=r\), respectively.
-
(i)
For λτ ∈ (0,3/8), we compare the five values \(\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3})\), \(\phi (z^{*}_{4})\) and \(\phi (z^{*}_{5})\) and obtain the following conclusions:
-
(a1)
As r ≥ 1, we get \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4}),\phi (z^{*}_{5})\}>\phi (z^{*}_{1})\), which means \(z^{*}=z^{*}_{1}=r\).
-
(a2)
As \(r\in [\frac {1}{2}+\frac {4\lambda \tau }{3},1)\), we obtain \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{3}), \phi (z^{*}_{4}),\phi (z^{*}_5)\}> \phi (z^{*}_2)\), which implies \(z^{*}=z^{*}_2=\frac {3r-8\lambda \tau }{3-8\lambda \tau }\).
-
(a3)
As \(r\in (\frac {4\lambda \tau }{3},\frac {1}{2}+\frac {4\lambda \tau }{3})\), we have \(\min \limits \{\phi (z^{*}_1), \phi (z^{*}_2), \phi (z^{*}_4), \phi (z^{*}_{5}) \}>\phi (z^{*}_{3})\), which implies \(z^{*}=z^{*}_{3}=r-4\lambda \tau /3\).
-
(a4)
As \(r\in [0,\frac {4\lambda \tau }{3}]\), we have \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}), \phi (z^{*}_5)\}>\phi (z^{*}_4)\), which gives \(z^{*}=z^{*}_4=0\).
-
(a5)
As r < 0, we derive \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3), \phi (z^{*}_4)\}>\phi (z^{*}_5)\), which gives \(z^{*}=z^{*}_5=r\). By (a1)-(a5), we get (12).
-
(ii)
For λτ ∈ [3/8,9/8), by comparing the five values \(\phi (z^{*}_1),\phi (z^{*}_{2}),\phi (z^{*}_3), \phi (z^{*}_4)\) and \(\phi (z^{*}_5)\), we obtain the following conclusions:
-
(b1)
As \(r>\frac {3}{4}+\frac {2\lambda \tau }{3}\), we get \(\min \limits \{\phi (z^{*}_2),\phi (z^{*}_3),\phi (z^{*}_4),\) \(\phi (z^{*}_5)\} >\phi (z^{*}_{1})\), which gives \(z^{*}=z^{*}_{1}=r\).
-
(b2)
As \(r=\frac {3}{4}+\frac {2\lambda \tau }{3}\), we derive \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{4}),\) \(\phi (z^{*}_{5})\}> \phi (z^{*}_{1})=\phi (z^{*}_{3})\), which gives \(z^{*}=z^{*}_{1}=r\) or \(z^{*}= z^{*}_{3}=r-4\lambda \tau /3\).
-
(b3)
As \(r\in (\frac {4\lambda \tau }{3},\frac {3}{4}+\frac {2\lambda \tau }{3})\), we obtain \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}), \phi (z^{*}_4),\phi (z^{*}_5)\}>\phi (z^{*}_3)\), which means \(z^{*}=z^{*}_3=r-4\lambda \tau /3\).
-
(b4)
As \(r\in [0,\frac {4\lambda \tau }{3}]\), we have \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3), \phi (z^{*}_{5})\}>\phi (z^{*}_{4})\), which means \(z^{*}=z^{*}_{4}=0\).
-
(b5)
As r < 0, we get \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4})\}> \phi (z^{*}_{5})\), which gives \(z^{*}=z^{*}_{5}=r\). By (b1)-(b5), we get (13).
-
(iii)
For λτ ≥ 9/8, by comparing the five values \(\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}), \phi (z^{*}_{4})\) and \(\phi (z^{*}_{5})\), we obtain the following conclusions:
-
(c1)
As \(r>\sqrt {2\lambda \tau }\), we get \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4}), \) \(\phi (z^{*}_{5})\}> \phi (z^{*}_{1})\), which implies \(z^{*}=z^{*}_{1}=r\).
-
(c2)
As \(r=\sqrt {2\lambda \tau }\), we obtain \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{5})\}> \phi (z^{*}_{1})=\phi (z^{*}_{4})\), which means \(z^{*}=z^{*}_{1}=r\) or \(z^{*}=z^{*}_{4}=0\).
-
(c3)
As \(r\in [0,\sqrt {2\lambda \tau })\), we derive \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\) \(\phi (z^{*}_{3}), \phi (z^{*}_5)\}>\phi (z^{*}_4)\), which means \(z^{*}=z^{*}_4=0\).
-
(c4)
As r < 0, we obtain \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3),\) \(\phi (z^{*}_4)\}> \phi (z^{*}_5)\), which gives \(z^{*}=z^{*}_5=r\). By (c1)-(c4), we get (14), which completes the whole proof.
□
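The closed forms (12)-(14) can be verified against a brute-force minimisation of the defining objective λτℓts(z) + (z − r)²/2. The following sketch is an assumption-laden illustration: ℓts is our reconstruction from the derivative pieces of Lemma 3.1, and at tie points (e.g. r = √(2λτ) in case (iii)) only one of the two equally good minimisers is returned.

```python
import numpy as np

def ell_ts(p):
    # loss reconstructed from Lemma 3.1's derivative pieces (assumption)
    if p <= 0.0:
        return 0.0
    if p < 0.5:
        return 4.0 * p / 3.0
    if p < 1.0:
        return 1.0 - 4.0 * (1.0 - p) ** 2 / 3.0
    return 1.0

def prox_ell_ts(r, lt):
    # closed-form prox of lt * ell_ts from (12)-(14); lt = lambda * tau
    if lt < 3.0 / 8.0:                        # case (i), cf. (12)
        if r >= 1.0:
            return r
        if r >= 0.5 + 4.0 * lt / 3.0:
            return (3.0 * r - 8.0 * lt) / (3.0 - 8.0 * lt)
        if r > 4.0 * lt / 3.0:
            return r - 4.0 * lt / 3.0
        return 0.0 if r >= 0.0 else r
    if lt < 9.0 / 8.0:                        # case (ii), cf. (13)
        if r >= 0.75 + 2.0 * lt / 3.0:        # tie at equality; pick r
            return r
        if r > 4.0 * lt / 3.0:
            return r - 4.0 * lt / 3.0
        return 0.0 if r >= 0.0 else r
    if r >= np.sqrt(2.0 * lt):                # case (iii), cf. (14)
        return r                              # tie at equality; pick r
    return 0.0 if r >= 0.0 else r

# brute-force check: compare objective values on a fine grid
z = np.linspace(-3.0, 4.0, 70001)
loss_grid = np.array([ell_ts(zz) for zz in z])
for lt in (0.2, 0.7, 1.5):                    # one lt per regime
    for r in (-1.0, 0.1, 0.6, 0.9, 1.2, 2.5):
        obj = lt * loss_grid + 0.5 * (z - r) ** 2
        p = prox_ell_ts(r, lt)
        assert lt * ell_ts(p) + 0.5 * (p - r) ** 2 <= obj.min() + 1e-6
```

Note the qualitative change across the regimes: for small λτ the operator shrinks moderate residuals, while for λτ ≥ 9/8 it degenerates to a hard threshold at √(2λτ), consistent with the hard-thresholding behaviour used in the later proofs.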
1.3 A.3 Proof of Lemma 3.3
Proof
From the separability of Lts(u), it is evident that \([\text {prox}_{\lambda \tau L_{ts}}({ \textbf {r}})]_i=\text {prox}_{\lambda \tau \ell _{ts}}({ r_i})\), where
Therefore, (15) holds. This completes the proof. □
1.4 A.4 Proof of Theorem 3.1
Proof
Let (w∗;b∗;p∗) be a local minimizer of (8). It is evident that there exists \(\boldsymbol {\theta }^{*}\in \mathbb {R}^{m}\) such that (w∗;b∗;p∗) is a KKT point of (8). In other words, we have
According to (16) and (34), to show (16) it suffices to prove that for any 0 < λ ≤ λ∗, if (𝜃∗;p∗) satisfies \(\textbf {0}\in \boldsymbol {\theta }^{*}+\tau \partial L_{ts}(\textbf {p}^{*})\), then \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\). From \(\textbf {0}\in \boldsymbol {\theta }^{*}+\tau \partial L_{ts}(\textbf {p}^{*})\), (9) and (10), we have that
Based on Lemma 3.2, we prove (16) by three scenarios: λτ ∈ (0,3/8), λτ ∈ [3/8,9/8) and λτ ≥ 9/8.
- Case (i)::
-
For τ > 0 and λ ∈ (0,λ∗] satisfying λτ ∈ (0,3/8), we consider the following five cases.
-
(i)
For \(i\in \mathbb {S}^{*}\), we have \(p^{*}_{i}>1\) and \(\theta ^{*}_{i}=0\) by (35), which implies that
$$ u_{i}^{*}:=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}>1. \tag{36} $$
-
(ii)
For \(i\in \mathbb {T}^{*}\), we have \(p^{*}_{i}\in [1/2,1]\) and obtain \(\theta ^{*}_{i}=-8(1-p^{*}_{i})\tau /3\) from (35). Hence, we derive
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}+8\lambda\tau(1-p^{*}_{i})/3, \tag{37} $$
which together with \(\lambda \leq \lambda ^{*}_{1}=3/(8\tau )\) gives
$$ u_{i}^{*}\leq p^{*}_{i}+8(3/(8\tau))\tau (1-p^{*}_{i})/3=1. \tag{38} $$
From (37), \(p^{*}_{i}\in [1/2,1)\) and \(\lambda \leq \lambda ^{*}_{1}=3/(8\tau )\), we obtain
$$ u_{i}^{*}\geq 1/2+4\lambda\tau/3, $$
which together with (38) results in
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[1/2+4\lambda\tau/3, 1]. $$
-
(iii)
For \(i\in \mathbb {E}^{*}\), we have \(p^{*}_{i}\in (0,1/2)\) and obtain \(\theta ^{*}_{i}=-4\tau /3\) from (35). Therefore, we have
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}+4\lambda\tau/3, \tag{39} $$
which combined with \(p^{*}_{i}\in (0,1/2)\) yields
$$ u_{i}^{*}=p^{*}_{i}+4\lambda\tau/3\in(4\lambda\tau/3,1/2+4\lambda\tau/3). \tag{40} $$
-
(iv)
For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3,0]\) by (35), which yields
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,4\lambda\tau/3]. $$ -
(v)
For \(i\in \mathbb {O}^{*}\), we get \(p^{*}_i<0\) and \(\theta _i^{*}=0\) by (35), which results in
$$ u^{*}_{i}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$
Based on the above (i)-(v), we obtain (12). This completes the proof of Case (i). □
- Case (ii)::
-
For τ > 0 and λ ∈ (0,λ∗] satisfying λτ ∈ [3/8,9/8), we consider five cases as below.
-
(i)
For \(i\in \mathcal { S}^{*}\), we have \(p^{*}_{i}>3/4\). By \(\lambda \leq \lambda _{2}^{*}=(12p^{*}_{i}-9)/(8\tau )\), we obtain
$$ p^{*}_{i}\geq 3/4+2\lambda\tau/3, $$
which combined with λτ ∈ [3/8,9/8) gives \(p^{*}_{i}\geq 1\) and \(\theta ^{*}_{i}=0\) by (35). Therefore, we get
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}\geq3/4+2\lambda\tau/3. $$
(ii)
For \(i\in \mathcal { I}^{*}\), we have \(p^{*}_i=3/4\) and \(\theta ^{*}_i=-8(1-p^{*}_i)\tau /3\) from (35). Hence, we obtain
$$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=3/4+2\lambda\tau/3. \end{array} $$ -
(iii)
For \(i\in \mathcal { T}^{*}\), we have \(p^{*}_{i}\in (0,3/4)\). From \(\lambda \leq \lambda _{3}^{*}=3(3-4p^{*}_{i})/(8\tau) \) and λτ ∈ [3/8,9/8), we obtain
$$ p^{*}_{i}\leq 3/4-2\lambda\tau/3\in(0,1/2], $$
which together with (35) yields \(\theta ^{*}_{i}=-4 \tau /3\) and
$$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in(4\lambda \tau/3,3/4+2\lambda \tau/3). \end{array} $$ -
(iv)
For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3,0]\) by (35), which gives
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,4\lambda\tau/3]. $$ -
(v)
For \(i\in \mathbb {O}^{*}\), we have \(p^{*}_i<0\) and \(\theta _i^{*}=0\), which implies
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$
From the above (i)-(v), we obtain (13). This completes the proof of Case (ii). □
- Case (iii)::
-
For τ > 0 and λ ∈ (0,λ∗] satisfying λτ ≥ 9/8, we consider three cases as follows.
-
(i)
For \(i\in T^{*}:=\mathbb {S}^{*} \cup \mathbb {T}^{*}\cup \mathbb {E}^{*}\), we have \(p^{*}_{i}>0\). By \(\lambda \leq \lambda _{5}^{*}\), we have
$$ p^{*}_{i}\geq \sqrt{2\lambda_{5}^{*}\tau}\geq\sqrt{2\lambda\tau}, $$
which together with λτ ≥ 9/8 gives \(\sqrt {2\lambda \tau }\geq 3/2\) and \(\theta ^{*}_{i}=0\) by (35). Hence, we obtain
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}\geq\sqrt{2\lambda\tau}. $$ -
(ii)
For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_{i}=0\) and \(\theta ^{*}_{i}\in [-4\tau /3, 0]\) by (35). From \(\lambda \leq \lambda _{4}^{*}\leq 9/(8\tau) \), we have
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,3/2], $$
which together with λτ ≥ 9/8 gives \(\sqrt {2\lambda \tau }\geq 3/2\) and
$$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,\sqrt{2\lambda\tau}]. \end{array} $$ -
(iii)
For \(i\in \mathbb {O}^{*}\), we have \(p^{*}_i<0\) and \(\theta _i^{*}=0\), which means
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$
From the above (i)-(iii), we obtain (14). This completes the whole proof. □
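The fixed-point mechanism behind Case (i) can be illustrated numerically: pairing each p* with a multiplier θ* drawn from the pattern in (35) makes p* a fixed point of the proximal mapping (12). The concrete values of τ and λ, and the choice θ* = −τ/2 at p* = 0, are illustrative assumptions.

```python
def prox_case_i(r, lt):
    # closed-form prox of lt * ell_ts for lt = lambda * tau in (0, 3/8), cf. (12)
    if r >= 1.0:
        return r
    if r >= 0.5 + 4.0 * lt / 3.0:
        return (3.0 * r - 8.0 * lt) / (3.0 - 8.0 * lt)
    if r > 4.0 * lt / 3.0:
        return r - 4.0 * lt / 3.0
    return 0.0 if r >= 0.0 else r

def theta_star(p, tau):
    # multiplier pattern of (35): theta* in -tau * subdiff ell_ts(p*)
    if p >= 1.0 or p < 0.0:
        return 0.0
    if p >= 0.5:
        return -8.0 * (1.0 - p) * tau / 3.0
    if p > 0.0:
        return -4.0 * tau / 3.0
    return -0.5 * tau            # p = 0: any value in [-4*tau/3, 0] works

tau, lam = 1.0, 0.2              # lambda * tau = 0.2 lies in (0, 3/8)
for p in (-0.4, 0.0, 0.3, 0.7, 1.5):
    u = p - lam * theta_star(p, tau)
    assert abs(prox_case_i(u, lam * tau) - p) < 1e-9
```

This is exactly the statement that, for sufficiently small λ, KKT points of (8) are proximal stationary points; Theorem 3.2 then upgrades proximal stationarity to local optimality.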
1.5 A.5 Proof of Theorem 3.2
Proof
Firstly, we denote Ω := {χ := (w;b;p) : p + Mw + by = 1} and χ∗ := (w∗;b∗;p∗). For any χ ∈ Ω, we have p + Mw + by = 1, which combined with (16) results in
According to the convexity of ∥w∥2, we derive
Based on the above facts, we will show that χ∗ is a local minimizer of (8). That is, there is a neighborhood N(χ∗,ρ) := {χ : ∥χ −χ∗∥≤ ρ} of χ∗ ∈Ω with ρ > 0 such that for any χ ∈Ω∩ N(χ∗,ρ), we have
To show (42), we consider the following three scenarios: λτ ∈ (0,3/8), λτ ∈ [3/8,9/8) and λτ ≥ 9/8.
- Case (i)::
-
For λτ ∈ (0,3/8), define \(\textbf {s}^{*}:=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),
where \(\widetilde {\tau }:=4\lambda \tau /3\) and \(\overline {\tau }:=1/2+4\lambda \tau /3\). From (12), (15) and (43), it is evident that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to
which is equivalent to
Combined with (43), this leads to
Denote \(\mathcal {A}^{*}:=\mathcal { A}_2^{*}\cup \mathcal { A}_3^{*}\cup \mathcal { A}_4^{*}\) and \(\overline {\mathcal { A}}^{*}:=\mathcal { A}_1^{*}\cup \mathcal { A}_5^{*}\). To prove (42), by (42)-(44), we only need to show two facts:
According to (5) and (42)-(44), we yield desired conclusion:
-
(i)
For \(i\in {\mathscr{H}}^{*}:=\mathcal { A}_1^{*}\cup \mathcal { A}_3^{*}\cup \mathcal { A}_4^{*}\cup \mathcal { A}_5^{*}\), \(\ell _{ts}(p^{*}_{i})\) is differentiable. From Definition 3.1, there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\rho })\) with \(\overline {\rho }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\rho })\), we have
$$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle\tau \nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$which together with \(-\tau \nabla \ell _{ts}(p^{*}_{i})=-4\tau /3= \theta ^{*}_{i}\) for \(i\in \mathcal { A}_{3}^{*}\) and \(-\tau \nabla \ell _{ts}(p^{*}_{i})=-8\tau (1-p^{*}_{i})/3= \theta ^{*}_{i}\) for \(i\in \mathcal {A}_{4}^{*}\) gives (47), and which combined with \(\nabla \ell _{ts}(p^{*}_{i})=0\) for \(i\in \mathcal {A}_{1}^{*}\cup \mathcal {A}_{5}^{*}\) gives (48).
-
(ii)
For \(i\in \mathcal { A}_2^{*}\), \(\ell _{ts}(p^{*}_{i})\) is non-differentiable. From Definition 3.1, there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\rho })\) with \(\widetilde {\rho }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\rho })\), we have
$$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle\tau \partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$which together with \(-\tau \partial \ell _{ts}(p^{*}_i)=\theta _i^{*}\in [-4\tau /3,0]\), for \(i\in \mathcal { A}_2^{*}\) leads to (47).
By the above (i) and (ii), we show that (w∗;b∗;p∗) is a local minimizer of problem (8) in a local region Ω ∩ N(χ∗,ρ∗) with \(\rho ^{*}:=\min \limits \{\overline {\rho },\widetilde {\rho }\}\). This completes the proof of Case (i).
- Case (ii)::
-
For λτ ∈ [3/8,9/8), let \(\textbf {s}^{*}=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),
where \(\widetilde {\tau }=4\lambda \tau /3\) and \(\widehat {\tau }:=3/4+2\lambda \tau /3\). From (15) and (13), we get that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to
which is identical to
Combined with (49), this gives
Define \(\mathcal { C}^{*}:=\mathcal { C}_2^{*}\cup \mathcal { C}_3^{*}\) and \(\overline {\mathcal { C}}^{*}:=\mathcal { C}_1^{*}\cup \mathcal { C}_4^{*}\). To show (42), according to (42) and (44), we only need to prove two facts:
Based on (5), (49) and (51), we obtain the desired conclusion:
-
(i)
For \(i\in C^{*}:=\mathcal { C}^{*}_1\cup \mathcal { C}^{*}_3\cup \mathcal { C}^{*}_4\), from λτ ∈ [3/8,9/8), we get \((0,\frac {3}{4}-\frac {2\lambda \tau }{3}]\subset (0,1/2]\) and \(\frac {3}{4}+\frac {2\lambda \tau }{3}\geq 1\), which means that the \(\ell _{ts}(p^{*}_i)\) is differentiable. From Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\eta })\) with \(\overline {\eta }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\eta })\), we obtain
$$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$which together with \(-\tau \nabla \ell _{ts}(p^{*}_{i})=-\frac {4\tau }{3}=\theta _{i}^{*}\) for \(i\in \mathcal { C}^{*}_{3}\) gives (51), and which combined with \(-\tau \nabla \ell _{ts}(p^{*}_{i})=0=\theta _{i}^{*}\) for \(i\in \mathcal {C}^{*}_{1}\cup \mathcal {C}^{*}_{4}\) gives (52).
-
(ii)
For \(i\in \mathcal { C}_2^{*}\), \(\ell _{ts}(p^{*}_{i})\) is non-differentiable. Based on Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\eta })\) with \(\widetilde {\eta }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\eta })\), we have
$$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$which together with \(-\tau \partial \ell _{ts}(p^{*}_{i})=\theta _{i}^{*}\in [-\frac {4\tau }{3},0]\) gives (51).
By the above (i) and (ii), we prove that (w∗;b∗;p∗) is a local minimizer of problem (8) in a local region Ω ∩ N(χ∗,η∗) with \(\eta ^{*}:=\min \limits \{\overline {\eta },\widetilde {\eta }\}\), which completes the proof of Case (ii).
- Case (iii)::
-
For λτ ≥ 9/8, let \(\textbf {s}^{*}=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),
Based on (15) and (14), we have that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to
which is identical to
Combined with (53), this gives
Define \(\mathcal { E}^{*}:=\mathcal { E}^{*}_2,~~ \overline {\mathcal { E}}^{*}:=\mathcal { E}_1^{*}\cup \mathcal { E}_3^{*}\). To show (42), we only need to prove two inequalities:
By (5), (53) and (55), we obtain the desired conclusion:
-
(i)
For \(i\in \mathcal { E}^{*}_1\cup \mathcal { E}^{*}_3\), by λτ ≥ 9/8, we get \(p_i^{*}\geq \sqrt {2 \lambda \tau }\geq 3/2\), which means that \(\ell _{ts}(p^{*}_i)\) is differentiable. By Definition 3.1, there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\kappa })\) with \(\overline {\kappa }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\kappa })\), we have
$$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau \nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$which together with \(\tau \nabla \ell _{ts}(p^{*}_i)=0=\theta ^{*}_i\) obtains (56).
-
(ii)
For \(i\in \mathcal { E}_2^{*}\), \(\ell _{ts}(p^{*}_{i})\) is non-differentiable. According to Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\kappa })\) with \(\widetilde {\kappa }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\kappa })\), we have
$$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$which together with \(-\tau \partial \ell _{ts}(p^{*}_{i})=\theta _{i}^{*}\in [-\sqrt {2\tau /\lambda },0]\) results in (55).
By the above (i) and (ii), we show that (w∗;b∗;p∗) is a local minimizer of problem (8) in a local region Ω ∩ N(χ∗,κ∗) with \(\kappa ^{*}:=\min \limits \{\overline {\kappa },\widetilde {\kappa }\}\). This completes the proof of Case (iii). □
1.6 A.6 Proof of Theorem 3.3
Proof
For λτ ∈ (0,3/8), let (w∗;b∗;p∗) be a proximal stationary point of (8). Based on Theorem 3.2 and (44), we obtain
Based on \(M=[y_1\textbf {x}_{1}~\cdots ~y_m\textbf {x}_{m}]^{\top }\in {\mathbb {R}}^{m\times n}\) and the first equation of (16), we can obtain that
which leads to (17). Moreover, according to (44), we get \(p_i^{*}=0, i\in \mathcal { A}^{*}_2\), \(p_i^{*}\in (0,1/2), i\in \mathcal { A}^{*}_3\) and \(p_i^{*}\in [1/2,1), i\in \mathcal {A}^{*}_4\). This, together with the definition of M and the third equation of (16), results in (18). This completes the proof. □
1.7 A.7 Proof of Theorem 3.4
Proof
For λτ ∈ [3/8,9/8), let (w∗;b∗;p∗) with \(\boldsymbol {\theta }^{*}\in \mathbb {R}^m\) be a proximal stationary point. By Theorem 3.2 and (51), we have
which together with the matrix M and (16) gives
By (51), we get \(p_i^{*}=0, i\in \mathcal { C}^{*}_2\) and \(p_i^{*}\in (0,\frac {3}{4}-\frac {2\lambda \tau }{3}], i \in \mathcal { C}^{*}_3\). Together with (16) and M, this gives (20). This completes the proof. □
1.8 A.8 Proof of Theorem 3.5
Proof
For λτ ≥ 9/8, because (w∗;b∗;p∗) with \(\boldsymbol {\theta }^{*}\in \mathbb {R}^m\) is a proximal stationary point of (8), from Theorem 3.2 and (55), we get
which combined with M and (16) leads to
In addition, based on (55), we get \(p_i^{*}=0, i\in \mathcal { E}^{*}\), which together with (16) and M results in (22). This completes the proof. □
1.9 A.9 Proof of Theorem 3.6
Proof
It is observed that as \(k\rightarrow \infty \), the working set \(C_k\subseteq [m]\) can take only finitely many values, so we can find a subset \({\Upsilon }\subseteq \{1,2,3,\cdots \}\) such that
We now define the sequence Ψk := (wk,bk,pk,𝜃k) and its limit point Ψ∗ := (w∗,b∗,p∗,𝜃∗), i.e., \(\{\boldsymbol {\Psi }^k\}\rightarrow \boldsymbol {\Psi }^{*}\), which implies \(\{\boldsymbol {\Psi }^i\}_{i\in {\Upsilon }}\rightarrow \boldsymbol {\Psi }^{*} \) and \(\{\boldsymbol {\Psi }^{i+1}\}_{i\in {\Upsilon }}\rightarrow \boldsymbol {\Psi }^{*}\). Taking the limit of (33) along Υ, i.e., \(k\in {\Upsilon }, k\rightarrow \infty \), we derive
which gives \({\boldsymbol {\Phi }}^{*}_{C}= \textbf {0}\). Taking the limit of sk along Υ, we obtain
As for (26), we take the limit along Υ and get
where \(\overline {\textbf {s}}^{*}:=\textbf {s}^{*}-\frac {4\lambda \tau }{3} \), \(\widehat {\textbf {s}}^{*}:=(3\textbf {s}^{*}-8\lambda \tau )/(3-8\lambda \tau )\). Then, we have
Therefore, we get \({\boldsymbol {\Phi }}^{*}_{\overline {C}}=\textbf {0}\) and Φ∗ = 0. Again from (60), we have s∗ = p∗−𝜃∗/η, which together with the Lts proximal operator (15) and (61) yields
For (29), we take the limit along Υ and obtain
where ξ∗ = −(p∗ + b∗y −1 + 𝜃∗/η), and the last two equations hold because \({\boldsymbol {\Phi }}^{*}_{C}= \textbf {0}\) by (59) and Φ∗ = p∗ + Mw∗ + b∗y −e = 0. Thus, the last equation gives
Finally, for (32), we take the limit along Υ and obtain
which yields 〈y,𝜃∗〉 = 0, where the third equation holds due to Φ∗ = 0. To summarize, we obtain
which proves that (w∗;b∗;p∗) is a proximal stationary point with λ = 1/η. According to Theorem 3.2, it is also a local minimizer of (8), which completes the proof. □
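To make the limiting argument concrete, the sketch below runs a plain ADMM of the kind analysed above on a tiny separable toy problem, with the p-update given by the proximal operator of Lemma 3.2 at λ = 1/η, as in the conclusion of the theorem. It is a schematic sketch only: the update order, the penalty η, the synthetic data, and the restriction to the λτ ∈ [3/8,9/8) prox branch are illustrative assumptions, and no working-set strategy is used.

```python
import numpy as np

def prox_ts(r, lt):
    # prox of lt * ell_ts for lt = lambda * tau in [3/8, 9/8), cf. (13)
    return np.where(r >= 0.75 + 2.0 * lt / 3.0, r,
           np.where(r > 4.0 * lt / 3.0, r - 4.0 * lt / 3.0,
           np.where(r >= 0.0, 0.0, r)))

# tiny linearly separable toy data (assumed, for illustration only)
X = np.array([[2., 1.], [3., 0.], [2., -1.], [3., 2.],
              [-2., 1.], [-3., 0.], [-2., -1.], [-3., -2.]])
y = np.array([1., 1., 1., 1., -1., -1., -1., -1.])
m, n = X.shape
M = y[:, None] * X                      # rows y_i x_i, as in the proofs

tau, eta = 1.0, 1.0                     # lambda = 1/eta, so lambda * tau = 1
w, b = np.zeros(n), 0.0
p, th = np.zeros(m), np.zeros(m)
A = np.eye(n) + eta * M.T @ M           # normal equations of the w-step
for _ in range(300):
    # minimize the augmented Lagrangian of p + Mw + by = 1 blockwise
    w = np.linalg.solve(A, eta * M.T @ (1.0 - p - b * y - th / eta))
    b = y @ (1.0 - p - M @ w - th / eta) / m
    p = prox_ts(1.0 - M @ w - b * y - th / eta, tau / eta)
    th = th + eta * (p + M @ w + b * y - 1.0)
```

On this toy instance the iterates settle near a separating hyperplane with unit margins on the active points; in this sketch a stationary w satisfies w = −M⊤θ, so only the indices with θ_i ≠ 0 (the Lts support vectors) contribute, which is the observation behind the working-set strategy.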
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Shao, Y. Sparse and robust SVM classifier for large scale classification. Appl Intell 53, 19647–19671 (2023). https://doi.org/10.1007/s10489-023-04511-w