Abstract
Support vector machines (SVMs) have drawn wide attention in fields such as image classification, pattern recognition and disease diagnosis. Nevertheless, they require substantial memory and run slowly on large-scale datasets. To reduce the computational cost and storage requirements, we first design a new sparse and robust SVM model based on our truncated smoothly clipped absolute deviation (SCAD) loss, and then establish its first-order necessary and sufficient optimality conditions through a newly defined proximal stationary point. Following the idea of the proximal stationary point, we define the Lts support vectors of Lts-SVM and show that they form a small portion of the whole training set, which allows us to introduce a novel working set at each step. We then present a new alternating direction method of multipliers with a working set (Lts-ADMM) for Lts-SVM, and prove that the proposed algorithm not only converges to a local minimizer of Lts-SVM but also has relatively low computational complexity. Finally, extensive numerical experiments demonstrate that the proposed algorithm outperforms nine leading state-of-the-art methods in classification accuracy, number of support vectors and computational speed on large-scale datasets.





Acknowledgements
The authors sincerely thank the associate editor and the five referees for their constructive comments, which have significantly improved the quality of the paper. This work is supported by the National Natural Science Foundation of China (11971052, 11871183), the Changsha Municipal Natural Science Foundation (kq2208214), and the Scientific Research Fund of Hunan Provincial Education Department (22C0152).
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proofs of all theorems
1.1 A.1 Proof of Lemma 3.1
Proof
From (7), it is evident that Lts(p) is separable, which yields (9). Next, we split the proof of the explicit expression (10) into the following two cases:
-
(i)
For pi ≠ 0, ℓts(pi) is differentiable. Therefore, we obtain ∇ℓts(pi) = 0 for pi ≥ 1 or pi < 0, ∇ℓts(pi) = 8(1 − pi)/3 for pi ∈ [1/2,1) and ∇ℓts(pi) = 4/3 for pi ∈ (0,1/2).
-
(ii)
For pi = 0, ℓts(pi) is non-differentiable. Based on Definition 3.1, the one-sided pieces give ∇ℓts(pi) = 0 for pi < 0 and ∇ℓts(pi) = 4/3 for pi ∈ (0,1/2). Hence, we obtain ∂ℓts(0) = [0,4/3].
Therefore, (10) holds. This completes the proof. □
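The derivative pieces above lend themselves to a quick numerical sanity check. The sketch below uses a loss reconstructed from exactly these pieces — zero on p ≤ 0, slope 4/3 on (0,1/2), the smoothly clipped quadratic 1 − 4(1 − p)²/3 on [1/2,1), and the constant 1 on p ≥ 1 — which is consistent with (10) but is our reconstruction, not a verbatim copy of definition (7).

```python
def ell_ts(p):
    # Truncated SCAD-type loss reconstructed from the derivative
    # pieces of Lemma 3.1 (an assumption; the paper defines it in (7)).
    if p <= 0.0:
        return 0.0
    if p < 0.5:
        return 4.0 * p / 3.0
    if p < 1.0:
        return 1.0 - 4.0 * (1.0 - p) ** 2 / 3.0
    return 1.0

def grad_ell_ts(p):
    # Piecewise derivative from Lemma 3.1 (valid away from p = 0).
    if p >= 1.0 or p < 0.0:
        return 0.0
    if p >= 0.5:
        return 8.0 * (1.0 - p) / 3.0
    return 4.0 / 3.0

# Central finite differences agree with the stated pieces at
# differentiable points; at p = 0 the subdifferential is [0, 4/3].
h = 1e-6
for p in (-0.7, 0.2, 0.7, 1.3):
    fd = (ell_ts(p + h) - ell_ts(p - h)) / (2.0 * h)
    assert abs(fd - grad_ell_ts(p)) < 1e-4
```

Continuity at the breakpoints 1/2 and 1 (both sides give 2/3 and 1, respectively) is what makes the loss "smoothly clipped", and its boundedness by 1 is the source of the robustness to outliers.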
1.2 A.2 Proof of Lemma 3.2
Proof
According to (5) and (11), \(\text {prox}_{\lambda \tau \ell _{ts}}(r)\) is the minimizer of the following function
Evidently, the minimizers of ϕ1(z), ϕ2(z), ϕ3(z), ϕ4(z) and ϕ5(z) are attained at \(z^{*}_{1}=r\), \(z^{*}_{2}=\frac {3r-8\lambda \tau }{3-8\lambda \tau }\), \(z^{*}_{3}=r-4\lambda \tau /3\), \(z^{*}_{4}=0\) and \(z^{*}_{5}=r\), respectively.
-
(i)
For λτ ∈ (0,3/8), we compare the five values \(\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3})\), \(\phi (z^{*}_{4})\) and \(\phi (z^{*}_{5})\) and obtain the following conclusions:
-
(a1)
As r ≥ 1, we get \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4}),\phi (z^{*}_{5})\}>\phi (z^{*}_{1})\), which means \(z^{*}=z^{*}_{1}=r\).
-
(a2)
As \(r\in [\frac {1}{2}+\frac {4\lambda \tau }{3},1)\), we obtain \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{3}), \phi (z^{*}_{4}),\phi (z^{*}_5)\}> \phi (z^{*}_2)\), which implies \(z^{*}=z^{*}_2=\frac {3r-8\lambda \tau }{3-8\lambda \tau }\).
-
(a3)
As \(r\in (\frac {4\lambda \tau }{3},\frac {1}{2}+\frac {4\lambda \tau }{3})\), we have \(\min \limits \{\phi (z^{*}_1), \phi (z^{*}_2), \phi (z^{*}_4), \phi (z^{*}_{5}) \}>\phi (z^{*}_{3})\), which implies \(z^{*}=z^{*}_{3}=r-4\lambda \tau /3\).
-
(a4)
As \(r\in [0,\frac {4\lambda \tau }{3}]\), we have \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}), \phi (z^{*}_5)\}>\phi (z^{*}_4)\), which gives \(z^{*}=z^{*}_4=0\).
-
(a5)
As r < 0, we derive \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3), \phi (z^{*}_4)\}>\phi (z^{*}_5)\), which gives \(z^{*}=z^{*}_5=r\). By (a1)-(a5), we get (12).
-
(ii)
For λτ ∈ [3/8,9/8), by comparing the five values \(\phi (z^{*}_1),\phi (z^{*}_{2}),\phi (z^{*}_3), \phi (z^{*}_4)\) and \(\phi (z^{*}_5)\), we obtain the following conclusions:
-
(b1)
As \(r>\frac {3}{4}+\frac {2\lambda \tau }{3}\), we get \(\min \limits \{\phi (z^{*}_2),\phi (z^{*}_3),\phi (z^{*}_4),\) \(\phi (z^{*}_5)\} >\phi (z^{*}_{1})\), which gives \(z^{*}=z^{*}_{1}=r\).
-
(b2)
As \(r=\frac {3}{4}+\frac {2\lambda \tau }{3}\), we derive \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{4}),\) \(\phi (z^{*}_{5})\}> \phi (z^{*}_{1})=\phi (z^{*}_{3})\), which gives \(z^{*}=z^{*}_{1}=r\) or \(z^{*}= z^{*}_{3}=r-4\lambda \tau /3\).
-
(b3)
As \(r\in (\frac {4\lambda \tau }{3},\frac {3}{4}+\frac {2\lambda \tau }{3})\), we obtain \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}), \phi (z^{*}_4),\phi (z^{*}_5)\}>\phi (z^{*}_3)\), which means \(z^{*}=z^{*}_3=r-4\lambda \tau /3\).
-
(b4)
As \(r\in [0,\frac {4\lambda \tau }{3}]\), we have \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3), \phi (z^{*}_{5})\}>\phi (z^{*}_{4})\), which means \(z^{*}=z^{*}_{4}=0\).
-
(b5)
As r < 0, we get \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4})\}> \phi (z^{*}_{5})\), which gives \(z^{*}=z^{*}_{5}=r\). By (b1)-(b5), we get (13).
-
(iii)
For λτ ≥ 9/8, by comparing the five values \(\phi (z^{*}_{1}),\phi (z^{*}_{2}),\phi (z^{*}_{3}), \phi (z^{*}_{4})\) and \(\phi (z^{*}_{5})\), we obtain the following conclusions:
-
(c1)
As \(r>\sqrt {2\lambda \tau }\), we get \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{4}), \) \(\phi (z^{*}_{5})\}> \phi (z^{*}_{1})\), which implies \(z^{*}=z^{*}_{1}=r\).
-
(c2)
As \(r=\sqrt {2\lambda \tau }\), we obtain \(\min \limits \{\phi (z^{*}_{2}),\phi (z^{*}_{3}),\phi (z^{*}_{5})\}> \phi (z^{*}_{1})=\phi (z^{*}_{4})\), which means \(z^{*}=z^{*}_{1}=r\) or \(z^{*}=z^{*}_{4}=0\).
-
(c3)
As \(r\in [0,\sqrt {2\lambda \tau })\), we derive \(\min \limits \{\phi (z^{*}_{1}),\phi (z^{*}_{2}),\) \(\phi (z^{*}_{3}), \phi (z^{*}_5)\}>\phi (z^{*}_4)\), which means \(z^{*}=z^{*}_4=0\).
-
(c4)
As r < 0, we obtain \(\min \limits \{\phi (z^{*}_1),\phi (z^{*}_2),\phi (z^{*}_3),\) \(\phi (z^{*}_4)\}> \phi (z^{*}_5)\), which gives \(z^{*}=z^{*}_5=r\). By (c1)-(c4), we get (14), which completes the whole proof.
□
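The closed forms (12)-(14) can be verified against a brute-force minimisation of the defining objective λτℓts(z) + (z − r)²/2. The following sketch is an assumption-laden illustration: ℓts is our reconstruction from the derivative pieces of Lemma 3.1, and at tie points (e.g. r = √(2λτ) in case (iii)) only one of the two equally good minimisers is returned.

```python
import numpy as np

def ell_ts(p):
    # loss reconstructed from Lemma 3.1's derivative pieces (assumption)
    if p <= 0.0:
        return 0.0
    if p < 0.5:
        return 4.0 * p / 3.0
    if p < 1.0:
        return 1.0 - 4.0 * (1.0 - p) ** 2 / 3.0
    return 1.0

def prox_ell_ts(r, lt):
    # closed-form prox of lt * ell_ts from (12)-(14); lt = lambda * tau
    if lt < 3.0 / 8.0:                        # case (i), cf. (12)
        if r >= 1.0:
            return r
        if r >= 0.5 + 4.0 * lt / 3.0:
            return (3.0 * r - 8.0 * lt) / (3.0 - 8.0 * lt)
        if r > 4.0 * lt / 3.0:
            return r - 4.0 * lt / 3.0
        return 0.0 if r >= 0.0 else r
    if lt < 9.0 / 8.0:                        # case (ii), cf. (13)
        if r >= 0.75 + 2.0 * lt / 3.0:        # tie at equality; pick r
            return r
        if r > 4.0 * lt / 3.0:
            return r - 4.0 * lt / 3.0
        return 0.0 if r >= 0.0 else r
    if r >= np.sqrt(2.0 * lt):                # case (iii), cf. (14)
        return r                              # tie at equality; pick r
    return 0.0 if r >= 0.0 else r

# brute-force check: compare objective values on a fine grid
z = np.linspace(-3.0, 4.0, 70001)
loss_grid = np.array([ell_ts(zz) for zz in z])
for lt in (0.2, 0.7, 1.5):                    # one lt per regime
    for r in (-1.0, 0.1, 0.6, 0.9, 1.2, 2.5):
        obj = lt * loss_grid + 0.5 * (z - r) ** 2
        p = prox_ell_ts(r, lt)
        assert lt * ell_ts(p) + 0.5 * (p - r) ** 2 <= obj.min() + 1e-6
```

Note the qualitative change across the regimes: for small λτ the operator shrinks moderate residuals, while for λτ ≥ 9/8 it degenerates to a hard threshold at √(2λτ), consistent with the hard-thresholding behaviour used in the later proofs.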
1.3 A.3 Proof of Lemma 3.3
Proof
From the separability of Lts(u), it is evident that \([\text {prox}_{\lambda \tau L_{ts}}({ \textbf {r}})]_i=\text {prox}_{\lambda \tau \ell _{ts}}({ r_i})\), where
Therefore, (15) holds. This completes the proof. □
1.4 A.4 Proof of Theorem 3.1
Proof
Let (w∗;b∗;p∗) be a local minimizer of (8). It is evident that there exists \(\boldsymbol {\theta }^{*}\in \mathbb {R}^{m}\) such that (w∗;b∗;p∗) is a KKT point of (8). In other words, we have
According to (16) and (34), to show (16) it suffices to prove that for any 0 < λ ≤ λ∗, if (𝜃∗;p∗) satisfies \(\textbf {0}\in \boldsymbol {\theta }^{*}+\tau \partial L_{ts}(\textbf {p}^{*})\), then \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\). From \(\textbf {0}\in \boldsymbol {\theta }^{*}+\tau \partial L_{ts}(\textbf {p}^{*})\), (9) and (10), we have that
Based on Lemma 3.2, we prove (16) by three scenarios: λτ ∈ (0,3/8), λτ ∈ [3/8,9/8) and λτ ≥ 9/8.
- Case (i)::
-
For τ > 0 and λ ∈ (0,λ∗] satisfying λτ ∈ (0,3/8), we consider the following five cases.
-
(i)
For \(i\in \mathbb {S}^{*}\), we have \(p^{*}_{i}>1\) and \(\theta ^{*}_{i}=0\) by (35), which implies that
$$ u_{i}^{*}:=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}>1. \tag{36} $$
-
(ii)
For \(i\in \mathbb {T}^{*}\), we have \(p^{*}_{i}\in [1/2,1]\) and obtain \(\theta ^{*}_{i}=-8(1-p^{*}_{i})\tau /3\) from (35). Hence, we derive
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}+8\lambda\tau(1-p^{*}_{i})/3, \tag{37} $$
which together with \(\lambda \leq \lambda ^{*}_{1}=3/(8\tau )\) gives
$$ u_{i}^{*}\leq p^{*}_{i}+8(3/(8\tau))\tau (1-p^{*}_{i})/3=1. \tag{38} $$
From (37), \(p^{*}_{i}\in [1/2,1)\) and \(\lambda \leq \lambda ^{*}_{1}=3/(8\tau )\), we obtain
$$ u_{i}^{*}\geq 1/2+4\lambda\tau/3, $$
which together with (38) results in
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[1/2+4\lambda\tau/3, 1]. $$
-
(iii)
For \(i\in \mathbb {E}^{*}\), we have \(p^{*}_{i}\in (0,1/2)\) and obtain \(\theta ^{*}_{i}=-4\tau /3\) from (35). Therefore, we have
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}+4\lambda\tau/3, \tag{39} $$
which combined with \(p^{*}_{i}\in (0,1/2)\) yields
$$ u_{i}^{*}=p^{*}_{i}+4\lambda\tau/3\in(4\lambda\tau/3,1/2+4\lambda\tau/3). \tag{40} $$
-
(iv)
For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3,0]\) by (35), which yields
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,4\lambda\tau/3]. $$ -
(v)
For \(i\in \mathbb {O}^{*}\), we get \(p^{*}_i<0\) and \(\theta _i^{*}=0\) by (35), which results in
$$ u^{*}_{i}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$
Based on the above (i)-(v), we obtain (12). This completes the proof of Case (i). □
- Case (ii)::
-
For τ > 0 and λ ∈ (0,λ∗] satisfying λτ ∈ [3/8,9/8), we consider five cases as below.
-
(i)
For \(i\in \mathcal { S}^{*}\), we have \(p^{*}_{i}>3/4\). By \(\lambda \leq \lambda _{2}^{*}=(12p^{*}_{i}-9)/(8\tau )\), we obtain
$$ p^{*}_{i}\geq 3/4+2\lambda\tau/3, $$
which combined with λτ ∈ [3/8,9/8) gives \(p^{*}_{i}\geq 1\) and \(\theta ^{*}_{i}=0\) by (35). Therefore, we get
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}\geq3/4+2\lambda\tau/3. $$
(ii)
For \(i\in \mathcal { I}^{*}\), we have \(p^{*}_i=3/4\) and \(\theta ^{*}_i=-8(1-p^{*}_i)\tau /3\) from (35). Hence, we obtain
$$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=3/4+2\lambda\tau/3. \end{array} $$ -
(iii)
For \(i\in \mathcal { T}^{*}\), we have \(p^{*}_{i}\in (0,3/4)\). From \(\lambda \leq \lambda _{3}^{*}=3(3-4p^{*}_{i})/(8\tau) \) and λτ ∈ [3/8,9/8), we obtain
$$ p^{*}_{i}\leq 3/4-2\lambda\tau/3\in(0,1/2], $$
which together with (35) yields \(\theta ^{*}_{i}=-4 \tau /3\) and
$$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in(4\lambda \tau/3,3/4+2\lambda \tau/3). \end{array} $$ -
(iv)
For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_i=0\) and \(\theta ^{*}_i\in [-4\tau /3,0]\) by (35), which gives
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,4\lambda\tau/3]. $$ -
(v)
For \(i\in \mathbb {O}^{*}\), we have \(p^{*}_i<0\) and \(\theta _i^{*}=0\), which implies
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$
From the above (i)-(v), we obtain (13). This completes the proof of Case (ii). □
- Case (iii)::
-
For τ > 0 and λ ∈ (0,λ∗] satisfying λτ ≥ 9/8, we consider three cases as follows.
-
(i)
For \(i\in T^{*}:=\mathbb {S}^{*} \cup \mathbb {T}^{*}\cup \mathbb {E}^{*}\), we have \(p^{*}_{i}>0\). By \(\lambda \leq \lambda _{5}^{*}\), we have
$$ p^{*}_{i}\geq \sqrt{2\lambda_{5}^{*}\tau}\geq\sqrt{2\lambda\tau}, $$
which together with λτ ≥ 9/8 gives \(\sqrt {2\lambda \tau }\geq 3/2\) and \(\theta ^{*}_{i}=0\) by (35). Hence, we obtain
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}\geq\sqrt{2\lambda\tau}. $$ -
(ii)
For \(i\in \mathbb {I}^{*}\), we have \(p^{*}_{i}=0\) and \(\theta ^{*}_{i}\in [-4\tau /3, 0]\) by (35). From \(\lambda \leq \lambda _{4}^{*}\leq 9/(8\tau) \), we have
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,3/2], $$
which together with λτ ≥ 9/8 gives \(\sqrt {2\lambda \tau }\geq 3/2\) and
$$ \begin{array}{@{}rcl@{}} u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}\in[0,\sqrt{2\lambda\tau}]. \end{array} $$ -
(iii)
For \(i\in \mathbb {O}^{*}\), we have \(p^{*}_i<0\) and \(\theta _i^{*}=0\), which means
$$ u_{i}^{*}=p^{*}_{i}-\lambda \theta^{*}_{i}=p^{*}_{i}<0. $$
From the above (i)-(iii), we obtain (14). This completes the whole proof. □
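The fixed-point mechanism behind Case (i) can be illustrated numerically: pairing each p* with a multiplier θ* drawn from the pattern in (35) makes p* a fixed point of the proximal mapping (12). The concrete values of τ and λ, and the choice θ* = −τ/2 at p* = 0, are illustrative assumptions.

```python
def prox_case_i(r, lt):
    # closed-form prox of lt * ell_ts for lt = lambda * tau in (0, 3/8), cf. (12)
    if r >= 1.0:
        return r
    if r >= 0.5 + 4.0 * lt / 3.0:
        return (3.0 * r - 8.0 * lt) / (3.0 - 8.0 * lt)
    if r > 4.0 * lt / 3.0:
        return r - 4.0 * lt / 3.0
    return 0.0 if r >= 0.0 else r

def theta_star(p, tau):
    # multiplier pattern of (35): theta* in -tau * subdiff ell_ts(p*)
    if p >= 1.0 or p < 0.0:
        return 0.0
    if p >= 0.5:
        return -8.0 * (1.0 - p) * tau / 3.0
    if p > 0.0:
        return -4.0 * tau / 3.0
    return -0.5 * tau            # p = 0: any value in [-4*tau/3, 0] works

tau, lam = 1.0, 0.2              # lambda * tau = 0.2 lies in (0, 3/8)
for p in (-0.4, 0.0, 0.3, 0.7, 1.5):
    u = p - lam * theta_star(p, tau)
    assert abs(prox_case_i(u, lam * tau) - p) < 1e-9
```

This is exactly the statement that, for sufficiently small λ, KKT points of (8) are proximal stationary points; Theorem 3.2 then upgrades proximal stationarity to local optimality.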
1.5 A.5 Proof of Theorem 3.2
Proof
Firstly, we denote Ω := {χ := (w;b;p) : p + Mw + by = 1} and χ∗ := (w∗;b∗;p∗). For any χ ∈ Ω, we have p + Mw + by = 1, which combined with (16) results in
According to the convexity of ∥w∥2, we derive
Based on the above facts, we will show that χ∗ is a local minimizer of (8). That is, there is a neighborhood N(χ∗,ρ) := {χ : ∥χ −χ∗∥≤ ρ} of χ∗ ∈Ω with ρ > 0 such that for any χ ∈Ω∩ N(χ∗,ρ), we have
To show (42), we consider the following three scenarios: λτ ∈ (0,3/8), λτ ∈ [3/8,9/8) and λτ ≥ 9/8.
- Case (i)::
-
For λτ ∈ (0,3/8), define \(\textbf {s}^{*}:=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),
where \(\widetilde {\tau }:=4\lambda \tau /3\) and \(\overline {\tau }:=1/2+4\lambda \tau /3\). From (12), (15) and (43), it is evident that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to
which is equivalent to
Combined with (43), this leads to
Denote \(\mathcal {A}^{*}:=\mathcal { A}_2^{*}\cup \mathcal { A}_3^{*}\cup \mathcal { A}_4^{*}\) and \(\overline {\mathcal { A}}^{*}:=\mathcal { A}_1^{*}\cup \mathcal { A}_5^{*}\). To prove (42), by (42)-(44), we only need to show two facts:
According to (5) and (42)-(44), we yield desired conclusion:
-
(i)
For \(i\in {\mathscr{H}}^{*}:=\mathcal { A}_1^{*}\cup \mathcal { A}_3^{*}\cup \mathcal { A}_4^{*}\cup \mathcal { A}_5^{*}\), \(\ell _{ts}(p^{*}_{i})\) is differentiable. From Definition 3.1, there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\rho })\) with \(\overline {\rho }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\rho })\), we have
$$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle\tau \nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$which together with \(-\tau \nabla \ell _{ts}(p^{*}_{i})=-4\tau /3= \theta ^{*}_{i}\) for \(i\in \mathcal { A}_{3}^{*}\) and \(-\tau \nabla \ell _{ts}(p^{*}_{i})=-8\tau (1-p^{*}_{i})/3= \theta ^{*}_{i}\) for \(i\in \mathcal {A}_{4}^{*}\) gives (47), and which combined with \(\nabla \ell _{ts}(p^{*}_{i})=0\) for \(i\in \mathcal {A}_{1}^{*}\cup \mathcal {A}_{5}^{*}\) gives (48).
-
(ii)
For \(i\in \mathcal { A}_2^{*}\), \(\ell _{ts}(p^{*}_{i})\) is non-differentiable. From Definition 3.1, there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\rho })\) with \(\widetilde {\rho }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\rho })\), we have
$$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle\tau \partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$which together with \(-\tau \partial \ell _{ts}(p^{*}_i)=\theta _i^{*}\in [-4\tau /3,0]\), for \(i\in \mathcal { A}_2^{*}\) leads to (47).
By the above (i) and (ii), we show that (w∗;b∗;p∗) is a local minimizer of problem (8) in a local region Ω ∩ N(χ∗,ρ∗) with \(\rho ^{*}:=\min \limits \{\overline {\rho },\widetilde {\rho }\}\). This completes the proof of Case (i).
- Case (ii)::
-
For λτ ∈ [3/8,9/8), let \(\textbf {s}^{*}=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),
where \(\widetilde {\tau }=4\lambda \tau /3\) and \(\widehat {\tau }:=3/4+2\lambda \tau /3\). From (15) and (13), we get that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to
which is identical to
Combined with (49), this gives
Define \(\mathcal { C}^{*}:=\mathcal { C}_2^{*}\cup \mathcal { C}_3^{*}\) and \(\overline {\mathcal { C}}^{*}:=\mathcal { C}_1^{*}\cup \mathcal { C}_4^{*}\). To show (42), according to (42) and (44), we only need to prove two facts:
Based on (5), (49) and (51), we obtain the desired conclusion:
-
(i)
For \(i\in C^{*}:=\mathcal { C}^{*}_1\cup \mathcal { C}^{*}_3\cup \mathcal { C}^{*}_4\), from λτ ∈ [3/8,9/8), we get \((0,\frac {3}{4}-\frac {2\lambda \tau }{3}]\subset (0,1/2]\) and \(\frac {3}{4}+\frac {2\lambda \tau }{3}\geq 1\), which means that the \(\ell _{ts}(p^{*}_i)\) is differentiable. From Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\eta })\) with \(\overline {\eta }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\eta })\), we obtain
$$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$which together with \(-\tau \nabla \ell _{ts}(p^{*}_{i})=-\frac {4\tau }{3}=\theta _{i}^{*}\) for \(i\in \mathcal { C}^{*}_{3}\) gives (51), and which combined with \(-\tau \nabla \ell _{ts}(p^{*}_{i})=0=\theta _{i}^{*}\) for \(i\in \mathcal {C}^{*}_{1}\cup \mathcal {C}^{*}_{4}\) gives (52).
-
(ii)
For \(i\in \mathcal { C}_2^{*}\), \(\ell _{ts}(p^{*}_{i})\) is non-differentiable. Based on Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\eta })\) with \(\widetilde {\eta }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\eta })\), we have
$$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$which together with \(-\tau \partial \ell _{ts}(p^{*}_{i})=\theta _{i}^{*}\in [-\frac {4\tau }{3},0]\) gives (51).
By the above (i) and (ii), we prove that (w∗;b∗;p∗) is a local minimizer of problem (8) in a local region Ω ∩ N(χ∗,η∗) with \(\eta ^{*}:=\min \limits \{\overline {\eta },\widetilde {\eta }\}\), which completes the proof of Case (ii).
- Case (iii)::
-
For λτ ≥ 9/8, let \(\textbf {s}^{*}=\textbf {p}^{*}-\lambda \boldsymbol {\theta }^{*}\in \mathbb {R}^m\),
Based on (15) and (14), we have that \(\textbf {p}^{*}\in \text {prox}_{\lambda \tau L_{ts}}(\textbf {p}^{*}-\lambda {\boldsymbol {\theta }^{*}})\) is equivalent to
which is identical to
Combined with (53), this gives
Define \(\mathcal { E}^{*}:=\mathcal { E}^{*}_2,~~ \overline {\mathcal { E}}^{*}:=\mathcal { E}_1^{*}\cup \mathcal { E}_3^{*}\). To show (42), we only need to prove two inequalities:
By (5), (53) and (55), we obtain the desired conclusion:
-
(i)
For \(i\in \mathcal { E}^{*}_1\cup \mathcal { E}^{*}_3\), by λτ ≥ 9/8, we get \(p_i^{*}\geq \sqrt {2 \lambda \tau }\geq 3/2\), which means that \(\ell _{ts}(p^{*}_i)\) is differentiable. By Definition 3.1, there exists a neighborhood \(N(\boldsymbol {\chi }^{*},\overline {\kappa })\) with \(\overline {\kappa }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\overline {\kappa })\), we have
$$ \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau \nabla\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, $$which together with \(\tau \nabla \ell _{ts}(p^{*}_i)=0=\theta ^{*}_i\) obtains (56).
-
(ii)
For \(i\in \mathcal { E}_2^{*}\), \(\ell _{ts}(p^{*}_{i})\) is non-differentiable. According to Definition 3.1, there is a neighborhood \(N(\boldsymbol {\chi }^{*},\widetilde {\kappa })\) with \(\widetilde {\kappa }>0\) such that for any \(\boldsymbol {\chi }\in {\Omega }\cap N(\boldsymbol {\chi }^{*},\widetilde {\kappa })\), we have
$$ \begin{array}{@{}rcl@{}} \tau \ell_{ts}(p_{i})-\tau \ell_{ts}(p^{*}_{i})-\langle \tau\partial\ell_{ts}(p^{*}_{i}) ,p_{i}-p^{*}_{i} \rangle\geq0, \end{array} $$which together with \(-\tau \partial \ell _{ts}(p^{*}_{i})=\theta _{i}^{*}\in [-\sqrt {2\tau /\lambda },0]\) results in (55).
By the above (i) and (ii), we show that (w∗;b∗;p∗) is a local minimizer of problem (8) in a local region Ω ∩ N(χ∗,κ∗) with \(\kappa ^{*}:=\min \limits \{\overline {\kappa },\widetilde {\kappa }\}\). This completes the proof of Case (iii). □
1.6 A.6 Proof of Theorem 3.3
Proof
For λτ ∈ (0,3/8), let (w∗;b∗;p∗) be a proximal stationary point of (8). Based on Theorem 3.2 and (44), we obtain
Based on \(M=[y_1\textbf {x}_{1}~\cdots ~y_m\textbf {x}_{m}]^{\top }\in {\mathbb {R}}^{m\times n}\) and the first equation of (16), we can obtain that
which leads to (17). Moreover, according to (44), we get \(p_i^{*}=0, i\in \mathcal { A}^{*}_2\), \(p_i^{*}\in (0,1/2), i\in \mathcal { A}^{*}_3\) and \(p_i^{*}\in [1/2,1), i\in \mathcal {A}^{*}_4\). This, together with the definition of M and the third equation of (16), results in (18). This completes the proof. □
1.7 A.7 Proof of Theorem 3.4
Proof
For λτ ∈ [3/8,9/8), let (w∗;b∗;p∗) with \(\boldsymbol {\theta }^{*}\in \mathbb {R}^m\) be a proximal stationary point. By Theorem 3.2 and (51), we have
which together with the matrix M and (16) gives
By (51), we get \(p_i^{*}=0, i\in \mathcal { C}^{*}_2\) and \(p_i^{*}\in (0,\frac {3}{4}-\frac {2\lambda \tau }{3}], i \in \mathcal { C}^{*}_3\). Together with (16) and M, this gives (20). This completes the proof. □
1.8 A.8 Proof of Theorem 3.5
Proof
For λτ ≥ 9/8, because (w∗;b∗;p∗) with \(\boldsymbol {\theta }^{*}\in \mathbb {R}^m\) is a proximal stationary point of (8), from Theorem 3.2 and (55), we get
which combined with M and (16) leads to
In addition, based on (55), we get \(p_i^{*}=0, i\in \mathcal { E}^{*}\), which together with (16) and M results in (22). This completes the proof. □
1.9 A.9 Proof of Theorem 3.6
Proof
It is observed that as \(k\rightarrow \infty \), the working set \(C_k\subseteq [m]\) can take only finitely many values, so we can find a subset \({\Upsilon }\subseteq \{1,2,3,\cdots \}\) such that
We now define the sequence Ψk := (wk,bk,pk,𝜃k) and its limit point Ψ∗ := (w∗,b∗,p∗,𝜃∗), i.e., \(\{\boldsymbol {\Psi }^k\}\rightarrow \boldsymbol {\Psi }^{*}\), which implies \(\{\boldsymbol {\Psi }^i\}_{i\in {\Upsilon }}\rightarrow \boldsymbol {\Psi }^{*} \) and \(\{\boldsymbol {\Psi }^{i+1}\}_{i\in {\Upsilon }}\rightarrow \boldsymbol {\Psi }^{*}\). Taking the limit of (33) along Υ, i.e., \(k\in {\Upsilon }, k\rightarrow \infty \), we derive
which gives \({\boldsymbol {\Phi }}^{*}_{C}= \textbf {0}\). Taking the limit of sk along Υ, we obtain
As for (26), we take the limit along Υ and get
where \(\overline {\textbf {s}}^{*}:=\textbf {s}^{*}-\frac {4\lambda \tau }{3} \), \(\widehat {\textbf {s}}^{*}:=(3\textbf {s}^{*}-8\lambda \tau )/(3-8\lambda \tau )\). Then, we have
Therefore, we get \({\boldsymbol {\Phi }}^{*}_{\overline {C}}=\textbf {0}\) and Φ∗ = 0. Again from (60), we have s∗ = p∗−𝜃∗/η, which together with the Lts proximal operator (15) and (61) yields
For (29), we take the limit along Υ and obtain
where ξ∗ = −(p∗ + b∗y −1 + 𝜃∗/η), and the last two equations hold because \({\boldsymbol {\Phi }}^{*}_{C}= \textbf {0}\) by (59) and Φ∗ = p∗ + Mw∗ + b∗y −e = 0. Thus, the last equation gives
Finally, for (32), we take the limit along Υ and obtain
which yields 〈y,𝜃∗〉 = 0, where the third equation holds due to Φ∗ = 0. To summarize, we obtain
which proves that (w∗;b∗;p∗) is a proximal stationary point with λ = 1/η. According to Theorem 3.2, it is also a local minimizer of (8), which completes the proof. □
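To make the limiting argument concrete, the sketch below runs a plain ADMM of the kind analysed above on a tiny separable toy problem, with the p-update given by the proximal operator of Lemma 3.2 at λ = 1/η, as in the conclusion of the theorem. It is a schematic sketch only: the update order, the penalty η, the synthetic data, and the restriction to the λτ ∈ [3/8,9/8) prox branch are illustrative assumptions, and no working-set strategy is used.

```python
import numpy as np

def prox_ts(r, lt):
    # prox of lt * ell_ts for lt = lambda * tau in [3/8, 9/8), cf. (13)
    return np.where(r >= 0.75 + 2.0 * lt / 3.0, r,
           np.where(r > 4.0 * lt / 3.0, r - 4.0 * lt / 3.0,
           np.where(r >= 0.0, 0.0, r)))

# tiny linearly separable toy data (assumed, for illustration only)
X = np.array([[2., 1.], [3., 0.], [2., -1.], [3., 2.],
              [-2., 1.], [-3., 0.], [-2., -1.], [-3., -2.]])
y = np.array([1., 1., 1., 1., -1., -1., -1., -1.])
m, n = X.shape
M = y[:, None] * X                      # rows y_i x_i, as in the proofs

tau, eta = 1.0, 1.0                     # lambda = 1/eta, so lambda * tau = 1
w, b = np.zeros(n), 0.0
p, th = np.zeros(m), np.zeros(m)
A = np.eye(n) + eta * M.T @ M           # normal equations of the w-step
for _ in range(300):
    # minimize the augmented Lagrangian of p + Mw + by = 1 blockwise
    w = np.linalg.solve(A, eta * M.T @ (1.0 - p - b * y - th / eta))
    b = y @ (1.0 - p - M @ w - th / eta) / m
    p = prox_ts(1.0 - M @ w - b * y - th / eta, tau / eta)
    th = th + eta * (p + M @ w + b * y - 1.0)
```

On this toy instance the iterates settle near a separating hyperplane with unit margins on the active points; in this sketch a stationary w satisfies w = −M⊤θ, so only the indices with θ_i ≠ 0 (the Lts support vectors) contribute, which is the observation behind the working-set strategy.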
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Shao, Y. Sparse and robust SVM classifier for large scale classification. Appl Intell 53, 19647–19671 (2023). https://doi.org/10.1007/s10489-023-04511-w