Extended ADMM and BCD for nonseparable convex minimization models with quadratic coupling terms: convergence analysis and insights | Mathematical Programming

  • Full Length Paper
  • Series A
  • Published:
Mathematical Programming

Abstract

In this paper, we establish the convergence of the proximal alternating direction method of multipliers (ADMM) and block coordinate descent (BCD) method for nonseparable minimization models with quadratic coupling terms. The novel convergence results presented in this paper answer several open questions that have been the subject of considerable discussion. We first extend the 2-block proximal ADMM to linearly constrained convex optimization with a coupled quadratic objective function, an area where theoretical understanding is currently lacking, and prove that the sequence generated by the proximal ADMM converges in a point-wise manner to a primal-dual solution pair. Moreover, we apply randomly permuted ADMM (RPADMM) to nonseparable multi-block convex optimization, and prove its expected convergence for a class of nonseparable quadratic programming problems. When the linear constraint vanishes, the 2-block proximal ADMM and RPADMM reduce to the 2-block cyclic proximal BCD method and randomly permuted BCD (RPBCD), respectively. Our study provides the first iterate convergence result for 2-block cyclic proximal BCD without assuming the boundedness of the iterates. We also theoretically establish the expected iterate convergence result concerning multi-block RPBCD for convex quadratic optimization. In addition, we demonstrate that RPBCD may have a worse convergence rate than cyclic proximal BCD for 2-block convex quadratic minimization problems. Although the results on RPADMM and RPBCD are restricted to quadratic minimization models, they provide some interesting insights: (1) random permutation makes ADMM and BCD more robust for multi-block convex minimization problems; (2) cyclic BCD may outperform RPBCD for “nice” problems, and RPBCD should be applied with caution when solving general convex optimization problems, especially those with only a few blocks.

Notes

  1. The models considered in [29, 31] are more general than problem (1): the authors of [29, 31] allow a general nonseparable smooth function in the objective, whereas in (1) the coupled objective is a quadratic function.

References

  1. Agarwal, A., Negahban, S., Wainwright, M.J.: Noisy matrix decomposition via convex relaxation: optimal rates in high dimensions. Ann. Stat. 40(2), 1171–1197 (2012)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  3. Beck, A.: On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM J. Optim. 25(1), 185–209 (2015)

  4. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(2), 2037–2060 (2013)

  5. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena-Scientific, Belmont (1999)

  6. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, 2nd edn. Athena Scientific, Belmont (1997)

  7. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)

  8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  9. Cai, X., Han, D., Yuan, X.: On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)

  10. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1), 57–79 (2016)

  11. Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstract and Applied Analysis, 2013, Article ID 183961, 7 pages

  12. Chen, L., Sun, D., Toh, K.-C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66(2), 327–343 (2017)

  13. Chen, L., Sun, D., Toh, K.-C.: An efficient inexact symmetric Gauss-Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161(1), 237–270 (2017)

  14. Cui, Y., Li, X., Sun, D., Toh, K.-C.: On the convergence properties of a majorized alternating direction method of multipliers for linearly constrained convex optimization problems with coupled objective functions. J. Optim. Theory Appl. 169(3), 1013–1041 (2016)

  15. Davis, D., Yin, W.: Convergence rate analysis of several splitting schemes. UCLA CAM Report, 14–51 (2014)

  16. Deng, W., Lai, M., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. J. Sci. Comput. 71(2), 712–736 (2017)

  17. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)

  18. Eckstein, J., Bertsekas, D.P.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1), 293–318 (1992)

  19. Feng, C., Xu, H., Li, B.C.: An alternating direction method approach to cloud traffic management. IEEE Trans. Parallel Distrib. Syst. 28(8), 2145–2158 (2017)

  20. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2, 17–40 (1976)

  21. Gao, X., Zhang, S.: First-order algorithms for convex optimization with nonseparable objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)

  22. Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer, New York (1984)

  23. Glowinski, R., Marroco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Revue Française d’Automatique, Informatique et Recherche Opérationnelle 9, 41–76 (1975)

  24. Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)

  25. Han, D., Yuan, X., Zhang, W., Cai, X.: An ADM-based splitting method for separable convex programming. Comput. Optim. Appl. 54, 343–369 (2013)

  26. He, B., Tao, M., Yuan, X.: A splitting method for separable convex programming. IMA J. Numer. Anal. 35, 394–426 (2015)

  27. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)

  28. He, B., Yuan, X.: On the O\((1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  29. Hong, M., Chang, T., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization. arXiv:1401.7079 (2014)

  30. Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162, 165–199 (2017)

  31. Hong, M., Luo, Z., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)

  32. Hong, M., Wang, X., Razaviyayn, M., Luo, Z.: Iteration complexity analysis of block coordinate descent methods. arXiv:1310.6957v2 (2014)

  33. Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia Pac. J. Oper. Res. 32, 1550024 (2015)

  34. Li, M., Sun, D., Toh, K.-C.: A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization. SIAM J. Optim. 26(2), 922–950 (2016)

  35. Li, X., Sun, D., Toh, K.-C.: A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions. Math. Program. 155(1), 333–373 (2016)

  36. Lin, T., Ma, S., Zhang, S.: Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. J. Sci. Comput. 69(1), 52–81 (2016)

  37. Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25, 1478–1497 (2015)

  38. Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)

  39. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152, 615–642 (2015)

  40. Monteiro, R., Svaiter, B.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)

  41. Mota, J.F.C., Xavier, J.M.F., Aguiar, P.M.F., Puschel, M.: Distributed optimization with local domains: Applications in MPC and network flows. IEEE Trans. Autom. Control 60(7), 2004–2009 (2015)

  42. Peng, Y.G., Ganesh, A., Wright, J., Xu, W.L., Ma, Y.: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233–2246 (2012)

  43. Razaviyayn, M., Hong, M., Luo, Z.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Optim. 23(2), 1126–1153 (2013)

  44. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(2), 1–38 (2014)

  45. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14, 567–599 (2013)

  46. Shefi, R., Teboulle, M.: On the rate of convergence of the proximal alternating linearized minimization algorithm for convex problems. EURO J. Comput. Optim. 4(1), 27–46 (2016)

  47. Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-block constraints. SIAM J. Optim. 25(2), 882–915 (2015)

  48. Sun, R., Luo, Z., Ye, Y.: On the expected convergence of randomly permuted ADMM. arXiv:1503.06387v1 (2015)

  49. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)

  50. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)

  51. Wright, S.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)

  52. Zhang, Y.: Convergence of a class of stationary iterative methods for saddle point problems. Rice University Technical Report TR10-24 (2010)

Acknowledgements

Caihua Chen was supported by the National Natural Science Foundation of China [Grant No. 11401300, 71732003, 71673130]. Min Li was supported by the National Natural Science Foundation of China [Grant No. 11771078, 71390335, 71661147004]. Xin Liu was supported by the National Natural Science Foundation of China [Grant No. 11622112, 11471325, 91530204, 11331012, 11461161005, 11688101], the National Center for Mathematics and Interdisciplinary Sciences, CAS, the Youth Innovation Promotion Association, CAS, and the Key Research Program of Frontier Sciences, CAS. Yinyu Ye was supported by the AFOSR Grant [Grant No. FA9550-12-1-0396]. The authors would like to thank Dr. Ji Liu from the University of Rochester and Dr. Ruoyu Sun from Stanford University for helpful discussions on the block coordinate descent method. The authors would also like to thank the associate editor and two anonymous referees for their detailed and valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Xin Liu.

Appendices

Appendix A

The proof of Lemma 2 is similar to, but not exactly the same as, that of [48, Lemma 2]. Since S is allowed to be singular here, we also need to show the positive definiteness of Q by mathematical induction. For completeness, we provide a concise proof here. Interested readers are referred to [48] for the motivation and other details of this proof.

Lemma 2 actually reveals a linear algebra property, and is essentially independent of H, A and \(\beta \) if we define \(L_\sigma \) directly by S. For brevity, we restate the main assertion to be proved as follows:

$$\begin{aligned} \mathrm {eig}(QS)\subset \left[ 0,\frac{4}{3}\right) , \end{aligned}$$
(74)

where \(S\in \mathbb {R}^{d\times d}\) is positive semidefinite, \(S_{ii}\in \mathbb {R}^{d_i\times d_i}\) (\(i=1,\ldots ,n\)) is positive definite,

$$\begin{aligned} (L_{\sigma })_{\sigma (i),\sigma (j)}:=\left\{ \begin{array}{l@{\quad }l} S_{\sigma (i)\sigma (j)}, &{} \hbox {if} \;\; 1\le j \le i \le n,\\ 0, &{} \hbox {otherwise}, \end{array} \right. \qquad Q:=\frac{1}{n!}\sum \limits _{\sigma \in \Gamma } L^{-1}_\sigma , \end{aligned}$$
(75)

and \(\Gamma \) is a set consisting of all permutations of \((1,\ldots ,n)\).
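As a numerical sanity check of (74) and (75), the following sketch enumerates all permutations for a small instance with scalar blocks (\(d_i=1\)) and a singular \(S\). The helper names `build_L` and `average_inverse`, the random test matrix, and the use of NumPy are illustrative assumptions and do not come from the paper.

```python
import itertools
import numpy as np

def build_L(S, sigma):
    """L_sigma from (75) with scalar blocks: keep S[sigma[i], sigma[j]] for j <= i."""
    n = S.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            L[sigma[i], sigma[j]] = S[sigma[i], sigma[j]]
    return L

def average_inverse(S):
    """Q from (75): the mean of inv(L_sigma) over all n! permutations."""
    perms = list(itertools.permutations(range(S.shape[0])))
    return sum(np.linalg.inv(build_L(S, p)) for p in perms) / len(perms)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # rank 3, so S = A^T A is singular
S = A.T @ A                       # positive semidefinite with positive diagonal
Q = average_inverse(S)
eigs = np.linalg.eigvals(Q @ S).real
print(eigs.min(), eigs.max())     # Lemma 2 predicts values in [0, 4/3)
```

Brute-force enumeration costs \(n!\) matrix inversions, so this is only practical for a handful of blocks; it is meant as a check of the statement, not as a way to compute \(Q\) in practice.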

For brevity of notation, we define the block permutation matrix \(P_k\) as follows:

$$\begin{aligned} (P_k)_{ij}:=\left\{ \begin{array}{l@{\quad }l} I_{d_i}, &{} \text{ if }\, 1\le i=j\le k-1;\\ I_{d_i}, &{} \text{ if }\, k+1\le i=j+1\le n ;\\ I_{d_i}, &{} \text{ if }\, i=k,\,j=n;\\ 0_{d_i\times d_j}, &{} \text{ if }\, 1\le j\le k-1,\, i\ne j;\\ 0_{d_i\times d_{j+1}}, &{}\text{ if }\, k\le j \le n-1,\, i\ne j+1;\\ 0_{d_i\times d_k}, &{}\text{ otherwise. } \end{array} \right. \end{aligned}$$
(76)

It can be easily verified that \(P_k^{\top }= P_k^{-1}\), and \(P_n=I_d\). For \(k\in \{1,\ldots ,n\}\), we define \(\Gamma _k:=\{\sigma '\mid \sigma ' \text{ is } \text{ a } \text{ permutation } \text{ of }\, (1,\ldots ,k-1,k+1,\ldots ,n)\}\). For any \(\sigma '\in \Gamma _k\), we define \(L_{\sigma '}\in \mathbb {R}^{(d-d_k)\times (d-d_k)}\) as follows:

$$\begin{aligned} (L_{\sigma '})_{\sigma '(i),\sigma '(j)}:=\left\{ \begin{array}{l@{\quad }l} S_{\sigma '(i)\sigma '(j)}, &{} \hbox {if} \;\; 1\le j \le i \le n-1,\\ 0, &{} \hbox {otherwise}. \end{array} \right. \end{aligned}$$
(77)

We define \(\hat{Q}_k\in \mathbb {R}^{(d-d_k)\times (d-d_k)}\) by

$$\begin{aligned} \hat{Q}_k := \frac{1}{|\Gamma _k|}\sum \limits _{\sigma '\in \Gamma _k}L_{\sigma '}^{-1},\qquad k=1,\ldots ,n, \end{aligned}$$
(78)

and \(W_k\) as the k-th block-column of S excluding the block \(S_{kk}\), i.e.

$$\begin{aligned} W_k =[S_{k1},\ldots ,S_{k,k-1},S_{k,k+1},\ldots ,S_{kn}]^{\top }. \end{aligned}$$
(79)

Due to the positive semi-definiteness of S, and by a slight abuse of the notation A, there exists \(A\in \mathbb {R}^{d\times d}\) satisfying

$$\begin{aligned} S=A^{\top }A. \end{aligned}$$
(80)

Let \(A_i\in \mathbb {R}^{d\times d_i}\) (\(i=1,\ldots ,n\)) be the column blocks of A, and it is clear that \(S_{ij} = A_i^{\top }A_j\) for all \(1\le i,j\le n\). For convenience, we define

$$\begin{aligned} \hat{A}_k:=[A_1,\ldots ,A_{k-1},A_{k+1},\ldots ,A_n], \end{aligned}$$
(81)

so that \(AP_k = [\hat{A}_k,A_k]\).
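The action of \(P_k\) in (76) can be checked on a small example. The sketch below builds \(P_k\) by reordering the columns of the identity; the function name `block_permutation`, the block sizes, and the random matrix are hypothetical choices made only for illustration.

```python
import numpy as np

def block_permutation(dims, k):
    """P_k from (76); dims = [d_1, ..., d_n], and k is 1-based as in the text."""
    offsets = np.concatenate(([0], np.cumsum(dims)))
    blocks = [np.arange(offsets[i], offsets[i + 1]) for i in range(len(dims))]
    # Column order: blocks 1..k-1, then k+1..n, then block k last.
    order = blocks[:k - 1] + blocks[k:] + [blocks[k - 1]]
    return np.eye(offsets[-1])[:, np.concatenate(order)]

dims = [2, 1, 3]
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
P2 = block_permutation(dims, 2)

# A P_2 should equal [A_1, A_3, A_2], i.e. [A_hat_2, A_2].
expected = np.hstack([A[:, 0:2], A[:, 3:6], A[:, 2:3]])
print(np.allclose(A @ P2, expected), np.allclose(P2.T @ P2, np.eye(6)))
```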

To clarify the structure of the proof, we introduce the following two lemmas.

Lemma 7

Let \(S\in \mathbb {R}^{d\times d}\) be a positive semidefinite matrix, and let \(L_\sigma \), Q, \(\hat{Q}_k\) and \(P_k\) be defined by (48), (50), (78) and (76), respectively. It holds that

$$\begin{aligned} Q=\frac{1}{n}\sum \limits _{k=1}^n P_k Q_k P_k^{\top }, \end{aligned}$$
(82)

where

$$\begin{aligned} Q_k:=\left[ \begin{array}{c@{\quad }c} \hat{Q}_k &{} -\frac{1}{2}\hat{Q}_k W_k\\ -\frac{1}{2}W_k^{\top }\hat{Q}_k &{} I_{d_k} \end{array} \right] . \end{aligned}$$
(83)

Proof

For \(\sigma '\in \Gamma _k\), we partition \(L_{\sigma '}\) as follows:

$$\begin{aligned} L_{\sigma '} =\left[ \begin{array}{c@{\quad }c} Z_{11} &{} Z_{12}\\ Z_{21} &{} Z_{22} \end{array} \right] . \end{aligned}$$
(84)

Here the sizes of \(Z_{11}\) and \(Z_{22}\) are \((d_1+\cdots + d_{k-1})\times (d_1+\cdots + d_{k-1})\) and \((d_{k+1}+\cdots + d_{n})\times (d_{k+1}+\cdots + d_{n})\), respectively. The sizes of \(Z_{12}\) and \(Z_{21}\) can be determined accordingly. We denote

$$\begin{aligned} U_k=(A_1,\ldots ,A_{k-1}),\qquad V_k = (A_{k+1},\ldots ,A_n), \end{aligned}$$

which implies

$$\begin{aligned} W_k = [U_k,V_k]^{\top }A_k = \left[ \begin{array}{c} U_k^{\top }A_k\\ V_k^{\top }A_k \end{array} \right] . \end{aligned}$$
(85)

It is then easy to verify that

$$\begin{aligned} L_{(\sigma ',k)} = \left[ \begin{array}{ccc} Z_{11} &{} U_k^{\top }A_k &{} Z_{12}\\ 0 &{} I_{d_k} &{} 0\\ Z_{21} &{} V_k^{\top }A_k &{} Z_{22} \end{array} \right] . \end{aligned}$$

Multiplying both sides of the above relationship on the left by \(P_k^{\top }\) and on the right by \(P_k\), we obtain

$$\begin{aligned} P_k^{\top }L_{(\sigma ',k)} P_k = P_k^{\top } \left[ \begin{array}{ccc} Z_{11} &{} Z_{12} &{} U_k^{\top }A_k \\ 0 &{} 0 &{} I_{d_k}\\ Z_{21} &{} Z_{22} &{} V_k^{\top }A_k \end{array} \right] = \left[ \begin{array}{ccc} Z_{11} &{} Z_{12} &{} U_k^{\top }A_k \\ Z_{21} &{} Z_{22} &{} V_k^{\top }A_k\\ 0 &{} 0 &{} I_{d_k}\\ \end{array} \right] = \left[ \begin{array}{cc} L_{\sigma '} &{} W_k\\ 0 &{} I_{d_k} \end{array} \right] .\nonumber \\ \end{aligned}$$
(86)

Taking the inverse of both sides of (86), we obtain

$$\begin{aligned} P_k^{\top }L_{(\sigma ',k)}^{-1}P_k = \left[ \begin{array}{cc} L_{\sigma '}^{-1}&{} -L_{\sigma '}^{-1}W_k\\ 0 &{} I_{d_k} \end{array} \right] . \end{aligned}$$
(87)

Summing up (87) for all \(\sigma '\in \Gamma _k\) and dividing by \(|\Gamma _k|\), we get

$$\begin{aligned} \frac{1}{|\Gamma _k|} \sum \limits _{\sigma '\in \Gamma _k} P_k^{\top }L_{(\sigma ',k)}^{-1}P_k= & {} \left[ \begin{array}{cc} \frac{1}{|\Gamma _k|} \sum \limits _{\sigma '\in \Gamma _k} L_{(\sigma ')}^{-1}&{} -\frac{1}{|\Gamma _k|} \sum \limits _{\sigma '\in \Gamma _k} L_{(\sigma ')}^{-1}W_k\\ 0 &{} I_{d_k} \end{array} \right] \nonumber \\ {}= & {} \left[ \begin{array}{cc} \hat{Q}_k &{} -\hat{Q}_k W_k\\ 0 &{} I_{d_k} \end{array} \right] . \end{aligned}$$
(88)

Here, the last equality follows from (78). By the definition of \(L_\sigma \), it is easy to verify that \(L_{\sigma }^{\top }= L_{\bar{\sigma }}\), where \(\bar{\sigma }\) is a “reverse permutation” of \(\sigma \) that satisfies \(\bar{\sigma }(i)=\sigma (n+1-i)\) (\(i=1,\ldots ,n\)). Thus we have \(L_{(\sigma ',k)}=L_{(k,\bar{\sigma }')}^{\top }\), where \(\bar{\sigma }'\) is a reverse permutation of \(\sigma '\). Summing over all \(\sigma '\), we get

$$\begin{aligned} \sum \limits _{\sigma '\in \Gamma _k}L_{(\sigma ',k)}^{-1}= \sum \limits _{\sigma '\in \Gamma _k} L_{(k,\bar{\sigma }')}^{-\top }= \sum \limits _{\sigma '\in \Gamma _k} L_{(k,\sigma ')}^{-\top }, \end{aligned}$$

where the last equality follows from the fact that summing over \(\bar{\sigma }'\) is the same as summing over \(\sigma '\). Thus, we have

$$\begin{aligned} \frac{1}{|\Gamma _k|} \sum \limits _{\sigma '\in \Gamma _k} P_k^{\top }L_{(k,\sigma ')}^{-1}P_k =\left( \frac{1}{|\Gamma _k|} \sum \limits _{\sigma '\in \Gamma _k} P_k^{\top }L_{(\sigma ',k)}^{-1}P_k\right) ^{\top }= \left[ \begin{array}{cc} \hat{Q}_k &{} 0\\ -W_k^{\top }\hat{Q}_k &{} I_{d_k} \end{array} \right] . \end{aligned}$$

Here, the last equality uses the symmetry of \(\hat{Q}_k\). Combining the above relation, (88) and the definition of \(Q_k\), we have

$$\begin{aligned} \frac{1}{2|\Gamma _k|}P_k^{\top }\sum \limits _{\sigma '\in \Gamma _k} \left( L_{(k,\sigma ')}^{-1}+ L_{(\sigma ',k)}^{-1}\right) P_k = \left[ \begin{array}{cc} \hat{Q}_k &{} -\frac{1}{2}\hat{Q}_k W_k\\ -\frac{1}{2}W_k^{\top }\hat{Q}_k &{} I_{d_k} \end{array} \right] =Q_k. \end{aligned}$$
(89)

Using the definition of \(P_k\) and the fact that \(|\Gamma _k|=(n-1)!\), we can rewrite (89) as

$$\begin{aligned} P_kQ_kP_k^{\top }= \frac{1}{2(n-1)!}\sum \limits _{\sigma '\in \Gamma _k} \left( L_{(k,\sigma ')}^{-1}+ L_{(\sigma ',k)}^{-1}\right) . \end{aligned}$$

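Summing up the above relation for \(k=1,\ldots ,n\), dividing by n, and noting that every \(\sigma \in \Gamma \) can be written exactly once as \((k,\sigma ')\) and exactly once as \((\sigma ',k)\), we arrive at (82).\(\square \)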
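Identity (82) can be verified numerically on a small instance. The sketch below assumes scalar blocks and unit diagonal \(S_{ii}=1\) (the normalization under which \(Q_k\) in (83) has an identity block); the helper `average_inverse` and the random data are illustrative assumptions.

```python
import itertools
import numpy as np

def average_inverse(S):
    """Mean of inv(L_sigma) over all permutations, with L_sigma as in (75), scalar blocks."""
    n = S.shape[0]
    perms = list(itertools.permutations(range(n)))
    total = np.zeros((n, n))
    for sigma in perms:
        L = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1):
                L[sigma[i], sigma[j]] = S[sigma[i], sigma[j]]
        total += np.linalg.inv(L)
    return total / len(perms)

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
A /= np.linalg.norm(A, axis=0)          # unit columns, so S_ii = 1
S = A.T @ A
Q = average_inverse(S)

rhs = np.zeros((n, n))
for k in range(n):                      # k is 0-based here
    rest = [i for i in range(n) if i != k]
    Q_hat = average_inverse(S[np.ix_(rest, rest)])     # (78)
    W = S[rest, k].reshape(-1, 1)                      # (79)
    Q_k = np.block([[Q_hat, -0.5 * Q_hat @ W],
                    [-0.5 * W.T @ Q_hat, np.eye(1)]])  # (83)
    P_k = np.eye(n)[:, rest + [k]]                     # moves block k to the end
    rhs += P_k @ Q_k @ P_k.T / n

gap = np.abs(Q - rhs).max()
print(gap)   # should be at the level of rounding error
```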

Lemma 8

Let Q, \(\hat{Q}_k\), \(Q_k\), A, \(\hat{A}_n\) and \(W_n\) be defined by (50), (78), (83), (80), (81) and (79). Suppose \(\hat{Q}_n\succ 0\) and

$$\begin{aligned} \mathrm {eig}(\hat{Q}_n \hat{A}_n^{\top }\hat{A}_n)\subset \left[ 0,\frac{4}{3}\right) . \end{aligned}$$
(90)

It holds that

$$\begin{aligned} \mathrm {eig}(AQ_nA^{\top })\subset \left[ 0,\frac{4}{3}\right) . \end{aligned}$$
(91)

Proof

For simplicity, we write W, \(\hat{Q}\) and \(\hat{A}\) in place of \(W_n\), \(\hat{Q}_n\) and \(\hat{A}_n\), respectively.

Since \(\hat{Q}\succ 0\), we have \(\Theta :=W^{\top }\hat{Q}W \succeq 0\). Recalling that \(W=\hat{A}^{\top }A_n\), that \(S_{nn}=A_n^{\top }A_n = I_{d_n}\), and that \(\rho (\hat{A}\hat{Q}\hat{A}^{\top })=\rho (\hat{Q}\hat{A}^{\top }\hat{A})<\frac{4}{3}\) by (90), we obtain

$$\begin{aligned} \rho (\Theta )= & {} \max \limits _{v\in \mathbb {R}^{d_n},\,||v||=1}\,v^{\top }A_n^{\top }\hat{A}\hat{Q}\hat{A}^{\top }A_nv \nonumber \\\le & {} \rho (\hat{A}\hat{Q}\hat{A}^{\top }) \max \limits _{v\in \mathbb {R}^{d_n},\,||v||=1}\,||A_nv||_2^2 <\frac{4}{3}||A_n||^2_{2}=\frac{4}{3}. \end{aligned}$$
(92)

Hence, we obtain

$$\begin{aligned} 0\preceq \Theta \prec \frac{4}{3}I_{d_n}. \end{aligned}$$
(93)

Recalling the definition (83), we have

$$\begin{aligned} Q_n = \left[ \begin{array}{cc} I_{d-d_n} &{} 0\\ -\frac{1}{2}W^{\top }&{} I_{d_n} \end{array} \right] \, \left[ \begin{array}{cc} \hat{Q}&{} 0\\ 0 &{} I_{d_n} - \frac{1}{4}W^{\top }\hat{Q}W \end{array} \right] \, \left[ \begin{array}{cc} I &{} -\frac{1}{2}W\\ 0 &{} I_{d_n} \end{array} \right] =J \left[ \begin{array}{cc} \hat{Q}&{} 0\\ 0 &{} C \end{array} \right] J^{\top }, \end{aligned}$$
(94)

where \(J:= \left[ \begin{array}{cc} I_{d-d_n} &{} 0\\ -\frac{1}{2}W^{\top }&{} I_{d_n} \end{array} \right] \) and \(C:=I_{d_n} - \frac{1}{4}W^{\top }\hat{Q}W = I_{d_n}-\frac{1}{4}\Theta \). By (93), we have \(C\succ \frac{2}{3}I_{d_n}\succ 0\). Together with \(\hat{Q}\succ 0\) and the nonsingularity of J, this implies \(Q_n\succ 0\). Thus, we directly obtain \( \mathrm {eig}(AQ_nA^{\top }) \subset \left[ 0,\infty \right) \). It remains to show

$$\begin{aligned} \rho (AQ_nA^{\top }) <\frac{4}{3}. \end{aligned}$$
(95)
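The factorization (94) is a purely algebraic identity and can be checked with arbitrary data. In the sketch below, `Q_hat` is a generic symmetric positive definite stand-in and `W` a random matrix; both are assumptions made for illustration and are not constructed from permutations.

```python
import numpy as np

rng = np.random.default_rng(2)
m, dn = 4, 2                              # m plays the role of d - d_n
G = rng.standard_normal((m, m))
Q_hat = G @ G.T + np.eye(m)               # symmetric positive definite stand-in
W = 0.3 * rng.standard_normal((m, dn))

# Q_n as in (83) with k = n.
Q_n = np.block([[Q_hat, -0.5 * Q_hat @ W],
                [-0.5 * W.T @ Q_hat, np.eye(dn)]])

# The three factors of (94).
J = np.block([[np.eye(m), np.zeros((m, dn))],
              [-0.5 * W.T, np.eye(dn)]])
C = np.eye(dn) - 0.25 * W.T @ Q_hat @ W
middle = np.block([[Q_hat, np.zeros((m, dn))],
                   [np.zeros((dn, m)), C]])

gap = np.abs(Q_n - J @ middle @ J.T).max()
print(gap)   # should be at the level of rounding error
```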

Denote \(\hat{B}:=\hat{A}^{\top }\hat{A}\), then we can write S as

$$\begin{aligned} S=A^{\top }A = \left[ \begin{array}{cc} \hat{B}&{} W\\ W^{\top }&{} I_{d_n} \end{array} \right] . \end{aligned}$$

We can reformulate \(\rho (AQ_nA^{\top })\) as follows:

$$\begin{aligned} \rho (AQ_nA^{\top }) = \rho \left( AJ \left[ \begin{array}{cc} \hat{Q}&{} 0\\ 0 &{} C \end{array} \right] J^{\top }A^{\top }\right) =\rho \left( \left[ \begin{array}{cc} \hat{Q}&{} 0\\ 0 &{} C \end{array} \right] J^{\top }A^{\top }A J \right) . \end{aligned}$$
(96)

It is easy to verify that

$$\begin{aligned} J^{\top }A^{\top }A J = \left[ \begin{array}{cc} I_{d-d_n} &{} -\frac{1}{2}W\\ 0 &{} I_{d_n} \end{array} \right] \, \left[ \begin{array}{cc} \hat{B}&{} W\\ W^{\top }&{} I \end{array} \right] \, \left[ \begin{array}{cc} I_{d-d_n} &{} 0\\ -\frac{1}{2}W^{\top }&{} I_{d_n} \end{array} \right] = \left[ \begin{array}{cc} \hat{B}-\frac{3}{4}WW^{\top }&{} \frac{1}{2}W\\ \frac{1}{2}W^{\top }&{} I_{d_n} \end{array} \right] . \end{aligned}$$

Thus,

$$\begin{aligned} Z:= \left[ \begin{array}{cc} \hat{Q}&{} 0\\ 0 &{} C \end{array} \right] J^{\top }A^{\top }A J = \left[ \begin{array}{cc} \hat{Q}\hat{B}-\frac{3}{4}\hat{Q}WW^{\top }&{} \frac{1}{2}\hat{Q}W\\ \frac{1}{2}CW^{\top }&{} C \end{array} \right] . \end{aligned}$$
(97)

According to (96), it suffices to prove \(\rho (Z)<\frac{4}{3}\). Suppose \(\lambda \) is an arbitrary eigenvalue of Z, and \(v\in \mathbb {R}^d\) is an associated eigenvector. In the rest of the proof, we only need to show that

$$\begin{aligned} \lambda <\frac{4}{3} \end{aligned}$$
(98)

holds. Since \(\lambda \) is arbitrary, we then have \(\rho (Z)<\frac{4}{3}\), which implies (95), and hence (91).

Partition v into \(v= \left[ \begin{array}{c} v_1\\ v_0 \end{array} \right] \), where \(v_1\in \mathbb {R}^{d-d_n}\), \(v_0\in \mathbb {R}^{d_n}\). Then, \(Zv=\lambda v\) implies that

$$\begin{aligned} \left( \hat{Q}\hat{B}-\frac{3}{4}\hat{Q}WW^{\top }\right) v_1 + \frac{1}{2} \hat{Q}W v_0= & {} \lambda v_1,\end{aligned}$$
(99)
$$\begin{aligned} \frac{1}{2}CW^{\top }v_1 + Cv_0= & {} \lambda v_0. \end{aligned}$$
(100)

If \(\lambda I_{d_n}-C\) is singular, then \(\lambda \) is an eigenvalue of C. By the definition of C and (93), we have \(\frac{2}{3}I_{d_n}\prec C=I_{d_n}-\frac{1}{4}\Theta \preceq I_{d_n}\), which implies that \(\lambda \le 1\), and thus inequality (98) holds. In the following, we assume \(\lambda I_{d_n}-C\) is nonsingular. An immediate consequence is \(v_1\ne 0\): otherwise (100) would give \((\lambda I_{d_n}-C)v_0=0\) and hence \(v_0=0\), contradicting \(v\ne 0\).

By (100), we obtain \(v_0=\frac{1}{2}(\lambda I_{d_n}-C)^{-1}CW^{\top }v_1\). Substituting this explicit formula into (99), we obtain

$$\begin{aligned} \lambda v_1= & {} \left( \hat{Q}\hat{B}-\frac{3}{4}\hat{Q}WW^{\top }\right) v_1 + \frac{1}{4} \hat{Q}W (\lambda I_{d_n}-C)^{-1}CW^{\top }v_1\nonumber \\= & {} (\hat{Q}\hat{B}+\hat{Q}W\Phi W^{\top })v_1, \end{aligned}$$
(101)

where \( \Phi := -I_{d_n} +\lambda [(4\lambda -4) I_{d_n} +\Theta ]^{-1}\). Since \(\Theta \) is a symmetric matrix, \(\Phi \) is also symmetric.
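The simplification behind (101) is that \(-\frac{3}{4}I_{d_n}+\frac{1}{4}(\lambda I_{d_n}-C)^{-1}C\) equals \(\Phi \) when \(C=I_{d_n}-\frac{1}{4}\Theta \). A quick numerical check is sketched below; the particular \(\Theta \) and the value of `lam` are arbitrary illustrative choices satisfying (93) and the nonsingularity assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
dn = 3
G = rng.standard_normal((dn, dn))
Theta = G @ G.T
Theta *= 1.2 / np.linalg.eigvalsh(Theta).max()   # enforce 0 <= Theta < (4/3) I, as in (93)
I = np.eye(dn)
C = I - 0.25 * Theta
lam = 1.25                                       # not an eigenvalue of C, since eig(C) <= 1

lhs = -0.75 * I + 0.25 * np.linalg.inv(lam * I - C) @ C
Phi = -I + lam * np.linalg.inv((4 * lam - 4) * I + Theta)

gap = np.abs(lhs - Phi).max()
print(gap, np.allclose(Phi, Phi.T))
```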

Suppose first that \(\lambda _{\max }(\Phi )>0\). The definition of \(\Phi \) gives us

$$\begin{aligned} \theta \in \mathrm {eig}(\Theta ) \Leftrightarrow -1+\frac{\lambda }{(4\lambda -4)+\theta } \in \mathrm {eig}(\Phi ). \end{aligned}$$

Together with \(\lambda _{\max }(\Phi )>0\), there exists \(\theta \in \mathrm {eig}(\Theta )\) such that \(-1+\frac{\lambda }{(4\lambda -4)+\theta }>0\). If \(\lambda \le 1\), (98) already holds. Otherwise, \(\lambda >1\), which together with \(\theta \ge 0\) implies \(1<\frac{\lambda }{(4\lambda -4)+\theta }\le \frac{\lambda }{4\lambda -4}\), i.e. \(4\lambda -4<\lambda \), and then (98) holds.

Now we assume \(\lambda _{\max }(\Phi )\le 0\), i.e. \(\Phi \preceq 0\). By assumption (90), we have \(\hat{\lambda }:=\rho (\hat{Q}\hat{B})= \rho (\hat{Q}\hat{A}^{\top }\hat{A}) \in \left[ 0,\frac{4}{3}\right) \). Due to the positive definiteness of \(\hat{Q}\), there exists a nonsingular \(U\in \mathbb {R}^{(d-d_n)\times (d-d_n)}\) such that \(\hat{Q}=U^{\top }U\). Let \(Y:=UW\Phi W^{\top }U^{\top }\in \mathbb {R}^{(d-d_n)\times (d-d_n)}\).

We have \(v^{\top }Yv =v^{\top }UW\Phi W^{\top }U^{\top }v =(W^{\top }U^{\top }v)^{\top }\Phi (W^{\top }U^{\top }v) \le 0\) for all \(v\in \mathbb {R}^{d-d_n}\), where the inequality follows from \(\Phi \preceq 0\). Thus, \(Y\preceq 0\). Pick an arbitrary g satisfying \(g>\rho (Y)\). Then, it holds that

$$\begin{aligned} \rho (g I_{d-d_n} +Y)\le g. \end{aligned}$$
(102)

From (101), we can conclude that \((g+\lambda ) v_1 = (\hat{Q}\hat{B}+\hat{Q}W\Phi W^{\top }+ g I_{d-d_n})v_1\). Consequently,

$$\begin{aligned} g+\lambda \in \mathrm {eig}( \hat{Q}\hat{B}+\hat{Q}W\Phi W^{\top }+ g I_{d-d_n} ) = \mathrm {eig}(U\hat{B}U^{\top }+UW\Phi W^{\top }U^{\top }+g I_{d-d_n}), \end{aligned}$$

which implies

$$\begin{aligned} g+\lambda\le & {} \rho (U\hat{B}U^{\top }+Y +gI) \le \rho (U\hat{B}U^{\top }) +\rho (Y+gI)\nonumber \\= & {} \hat{\lambda }+\rho (Y+gI) \le \hat{\lambda }+ g, \end{aligned}$$
(103)

where the last inequality follows from (102). The relation (103) directly gives us that \(\lambda \le \hat{\lambda }<\frac{4}{3}\). Namely, (98) also holds in this case.

We have completed the proof.\(\square \)
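Lemma 8 can be illustrated on a small random instance. The sketch below assumes scalar blocks with unit-norm columns of \(A\) (so that \(S_{ii}=1\)), builds \(\hat{Q}_n\) by brute-force enumeration, and compares the spectra appearing in (90) and (91); the helper `average_inverse` and the test data are illustrative assumptions.

```python
import itertools
import numpy as np

def average_inverse(S):
    """Mean of inv(L_sigma) over all permutations, with L_sigma as in (75), scalar blocks."""
    n = S.shape[0]
    perms = list(itertools.permutations(range(n)))
    total = np.zeros((n, n))
    for sigma in perms:
        L = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1):
                L[sigma[i], sigma[j]] = S[sigma[i], sigma[j]]
        total += np.linalg.inv(L)
    return total / len(perms)

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))
A /= np.linalg.norm(A, axis=0)            # unit columns, so S_ii = 1
A_hat, A_n = A[:, :-1], A[:, -1:]         # (81) with k = n

Q_hat = average_inverse(A_hat.T @ A_hat)  # (78) with k = n
W = A_hat.T @ A_n                         # (85) with k = n
Q_n = np.block([[Q_hat, -0.5 * Q_hat @ W],
                [-0.5 * W.T @ Q_hat, np.eye(1)]])

hyp = np.linalg.eigvals(Q_hat @ A_hat.T @ A_hat).real   # spectrum in (90)
M = A @ Q_n @ A.T
out = np.linalg.eigvalsh((M + M.T) / 2)                 # spectrum in (91)
print(hyp.max(), out.min(), out.max())
```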

Now we are ready to present the main proof of Lemma 2.

Proof of Lemma 2

Without loss of generality, we assume \(S_{ii}=I_{d_i}\) (\(i=1,\ldots ,n\)). Otherwise, we denote

$$\begin{aligned} D:=\mathrm {Diag}\left( S_{11}^{-\frac{1}{2}},\ldots , S_{nn}^{-\frac{1}{2}}\right) . \end{aligned}$$

If we set \(\tilde{S} =DSD\) and define \(\tilde{L}_{\sigma }\) and \(\tilde{Q}\) by (75) with \(\tilde{S}\) in place of S, then \(\tilde{L}_{\sigma }=DL_{\sigma }D\), and it is easy to verify that \(\tilde{Q} = D^{-1}QD^{-1}\). It holds that

$$\begin{aligned} \mathrm {eig}(\tilde{Q}\tilde{S})=\mathrm {eig}(D^{-1}QD^{-1}DSD)= \mathrm {eig}(D^{-1}QSD)=\mathrm {eig}(QS), \end{aligned}$$

and \(\tilde{S}_{ii}=I_{d_i}\) (\(i=1,\ldots ,n\)).
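The normalization step can be checked with blocks of size larger than one. The following sketch uses hypothetical helpers `block_slices`, `average_inverse` and `inv_sqrt`, together with a random positive definite \(S\), to compare \(\tilde{Q}\) with \(D^{-1}QD^{-1}\) and the two spectra.

```python
import itertools
import numpy as np

def block_slices(dims):
    offsets = np.concatenate(([0], np.cumsum(dims)))
    return [slice(offsets[i], offsets[i + 1]) for i in range(len(dims))]

def average_inverse(S, dims):
    """Q from (75) for block sizes dims = [d_1, ..., d_n]."""
    blk = block_slices(dims)
    perms = list(itertools.permutations(range(len(dims))))
    total = np.zeros_like(S)
    for sigma in perms:
        L = np.zeros_like(S)
        for i in range(len(dims)):
            for j in range(i + 1):
                L[blk[sigma[i]], blk[sigma[j]]] = S[blk[sigma[i]], blk[sigma[j]]]
        total += np.linalg.inv(L)
    return total / len(perms)

def inv_sqrt(M):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

rng = np.random.default_rng(7)
dims = [2, 1, 2]
A = rng.standard_normal((5, 5))
S = A.T @ A + np.eye(5)                   # positive definite test matrix
Q = average_inverse(S, dims)

D = np.zeros_like(S)
for b in block_slices(dims):
    D[b, b] = inv_sqrt(S[b, b])           # D = Diag(S_11^{-1/2}, ..., S_nn^{-1/2})
S_t = D @ S @ D
Q_t = average_inverse(S_t, dims)
D_inv = np.linalg.inv(D)

eig_orig = np.sort(np.linalg.eigvals(Q @ S).real)
eig_scaled = np.sort(np.linalg.eigvals(Q_t @ S_t).real)
print(np.allclose(Q_t, D_inv @ Q @ D_inv), np.allclose(eig_orig, eig_scaled))
```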

It follows from the definition of A, (80), that \(\mathrm {eig}(QS)=\mathrm {eig}(AQA^{\top })\). Now we use mathematical induction to prove this lemma. Firstly, the assertion (74) and \(Q\succ 0\) hold when \(n=1\), as \(QS=I\) in this case. Next, we will prove the lemma for any \(n\ge 2\) given that the assertion (74) and \(Q\succ 0\) hold for \(n-1\).

By using Lemma 7, it directly follows from (82) that \(AQA^{\top }= \frac{1}{n}\sum \limits _{k=1}^n AP_k Q_kP_k^{\top }A^{\top }\). Consequently,

$$\begin{aligned}&\frac{1}{n}\sum \limits _{k=1}^n \lambda _{\min }(AP_kQ_kP_k^{\top }A^{\top }) \le \lambda _{\min }(AQA^{\top })\le \lambda _{\max }(AQA^{\top })\nonumber \\&\quad \le \frac{1}{n}\sum \limits _{k=1}^n \lambda _{\max }(AP_kQ_kP_k^{\top }A^{\top }). \end{aligned}$$
(104)
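The outer bounds in (104) are an instance of Weyl's inequality: the smallest eigenvalue of an average of symmetric matrices is at least the average of the smallest eigenvalues, and similarly for the largest. A minimal numerical illustration, using generic random symmetric matrices as stand-ins for \(AP_kQ_kP_k^{\top }A^{\top }\), is sketched below.

```python
import numpy as np

rng = np.random.default_rng(4)
mats = []
for _ in range(5):
    G = rng.standard_normal((4, 4))
    mats.append((G + G.T) / 2)            # generic symmetric stand-ins

avg = sum(mats) / len(mats)
ev = np.linalg.eigvalsh(avg)              # ascending eigenvalues of the average
lower = np.mean([np.linalg.eigvalsh(M)[0] for M in mats])
upper = np.mean([np.linalg.eigvalsh(M)[-1] for M in mats])
print(lower <= ev[0], ev[-1] <= upper)
```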

By the induction assumptions and Lemma 8, we obtain the relationship (91). Because the blocks play symmetric roles and \(AP_k=[\hat{A}_k,A_k]\), the same argument applies with n replaced by any k, so the relationship (91) implies

$$\begin{aligned} \mathrm {eig}(AP_kQ_kP_k^{\top }A^{\top })\subset \left[ 0,\frac{4}{3}\right) ,\qquad \text{ for } \text{ all }\, k=1,\ldots ,n. \end{aligned}$$
(105)

Substituting (105) into (104), we prove the assertion (74) for n, and hence complete the proof of Lemma 2.

Appendix B

Proof of Lemma 3

For convenience, we use the notation

$$\begin{aligned} g(\lambda ;S,T):= \mathrm{det}\big [(\lambda -1)^2 I + (2\lambda -1) S + (\lambda -1) T \big ]. \end{aligned}$$

We prove this lemma by mathematical induction on the dimension d. When \(d=1\), it is easily seen that

$$\begin{aligned} g(\lambda ;S,T) = \left\{ \begin{array}{l@{\quad }l} (\lambda -1)^0 [(\lambda -1)^2 + (2\lambda -1) S + (\lambda -1)T] &{} \hbox {if}\;S\ne 0,\\ (\lambda -1)^1 (\lambda -1 +T) &{} \hbox {if}\; S=0,\, T\ne 0,\\ (\lambda -1)^2 \cdot 1 &{}\hbox {if}\; S =0,\, T=0, \end{array}\right. \end{aligned}$$

which means that Lemma 3 holds in this case. Suppose this lemma is valid for \(d\le k-1\). Consider the case where \(d =k\).

  1. Case 1:

    \(S\succ 0\). In this case, \(\mathrm{Rank}(S) = \mathrm{Rank}(S+T)=k\) and then \(l = 0\). Because

    $$\begin{aligned} g(\lambda ;S,T)=(\lambda -1)^{l} g(\lambda ;S,T) \qquad \mathrm{and}\qquad g(1;S,T) = \mathrm{det}(S) >0, \end{aligned}$$

    Lemma 3 holds in this case.

  2. Case 2:

    \(S\succeq 0\) but not positive definite. Let S admit the following eigenvalue decomposition

    $$\begin{aligned} P^\top SP = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 0 &{} &{} &{} &{} &{}\\ &{}\ddots &{} &{}&{}&{}\\ &{} &{} 0 &{} &{}&{}\\ &{} &{} &{} s_1 &{} &{} \\ &{}&{}&{}&{}\ddots &{}\\ &{}&{}&{}&{}&{} s_t \end{array} \right] :=D, \end{aligned}$$

    where P is an orthogonal matrix and \(s_i>0\). If we let \(W = P^\top TP\succeq 0\), then

    $$\begin{aligned} g(\lambda ;S,T) = g(\lambda ;D,W). \end{aligned}$$

    The proof proceeds by considering the following two subcases.

    1. Case 2.1:

      \(W_{11}=0\). Since W is positive semidefinite, \(W_{1i}= W_{i1}=0\) for \(i=1,2,\ldots ,k\). Note that

      $$\begin{aligned} g(\lambda ;D,W) = (\lambda -1)^2 g(\lambda ;D',W') \end{aligned}$$

      where \(D'\) and \(W'\) are the submatrices of D and W obtained by deleting the first row and column. As we have assumed that Lemma 3 holds for \(d =k-1\), there exists a polynomial p(x) such that

      $$\begin{aligned} g(\lambda ;D,W) = (\lambda -1)^2(\lambda -1)^{2k-2-\mathrm{Rank}(D') -\mathrm{Rank}(D'+W')}p(\lambda ). \end{aligned}$$

      Note that \(\mathrm{Rank}(D')=\mathrm{Rank}(D)= \mathrm{Rank}(S)\) and \(\mathrm{Rank}(D'+W')=\mathrm{Rank}(D+W)= \mathrm{Rank}(S+T)\). Thus, we have

      $$\begin{aligned} g(\lambda ;S,T) = (\lambda -1)^{2k- \mathrm{Rank}(S)-\mathrm{Rank}(S+T)}p(\lambda ), \end{aligned}$$

      which implies that Lemma 3 is true for \(d=k\) in this subcase.

    2. Case 2.2:

      \(W_{11}\ne 0\). Without loss of generality, assume \(W_{11}=1\). Let \(w^\top =[W_{12},\ldots ,W_{1k}]\). By direct calculation, we obtain

      $$\begin{aligned} g(\lambda ;D,W) = (\lambda -1)^2 g(\lambda ;D',W') + (\lambda -1)g(\lambda ;D',W'- ww^\top ). \end{aligned}$$

      Since \(\mathrm{Rank}(D'+W')\le \mathrm{Rank}(D+W) =\mathrm{Rank}(S+T)\), there exists a polynomial \(p_1(x)\) such that

      $$\begin{aligned} g(\lambda ;D',W') = (\lambda -1)^{2k-2-\mathrm{Rank}(S) - \mathrm{Rank}(S+T)}p_1(\lambda ), \end{aligned}$$

      where \(p_1(1)\ge 0\). On the other hand, since \(\mathrm{Rank}(D'+W'-ww^\top ) = \mathrm{Rank}(D+W)-1 =\mathrm{Rank}(S+T)-1\), there exists a polynomial \(p_2(x)\) such that

      $$\begin{aligned} g(\lambda ;D', W'- ww^\top )= (\lambda -1)^{2k-1-\mathrm{Rank}(S)-\mathrm{Rank}(S+T)}p_2(\lambda ), \end{aligned}$$

      where \(p_2(1)>0\). Therefore,

      $$\begin{aligned} g(\lambda ;S,T) =(\lambda -1)^{2k-\mathrm{Rank}(S)-\mathrm{Rank}(S+T)}(p_1(\lambda )+p_2(\lambda )) \end{aligned}$$

      and then Lemma 3 holds for this subcase.
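The determinant identity stated at the start of Case 2.2 can be recovered as follows. This is a sketch; the shorthand \(M':=(\lambda -1)^2 I + (2\lambda -1) D' + (\lambda -1) W'\) is introduced only here, and it uses \(D_{11}=0\) and \(W_{11}=1\). Splitting the first column and then eliminating with the pivot 1 gives

$$\begin{aligned} g(\lambda ;D,W)&= \mathrm{det}\left[ \begin{array}{cc} (\lambda -1)^2+(\lambda -1) & (\lambda -1)w^\top \\ (\lambda -1)w & M' \end{array} \right] \\&= \mathrm{det}\left[ \begin{array}{cc} (\lambda -1)^2 & (\lambda -1)w^\top \\ 0 & M' \end{array} \right] + (\lambda -1)\,\mathrm{det}\left[ \begin{array}{cc} 1 & (\lambda -1)w^\top \\ w & M' \end{array} \right] \\&= (\lambda -1)^2\,\mathrm{det}(M') + (\lambda -1)\,\mathrm{det}\big (M'-(\lambda -1)ww^\top \big ). \end{aligned}$$

Since \(M'-(\lambda -1)ww^\top = (\lambda -1)^2 I + (2\lambda -1) D' + (\lambda -1)(W'-ww^\top )\), the last determinant equals \(g(\lambda ;D',W'-ww^\top )\).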

This completes the proof.\(\square \)
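The factorization established in this proof can be checked on small examples with exact arithmetic. The sketch below assumes, as the base case and Case 1 suggest, that Lemma 3 asserts \(g(\lambda ;S,T)=(\lambda -1)^{l}p(\lambda )\) with \(l=2d-\mathrm{Rank}(S)-\mathrm{Rank}(S+T)\) and \(p(1)\ne 0\) for positive semidefinite S and T. It works in the shifted variable \(u=\lambda -1\), so each entry of the matrix is \(S_{ij}+(2S_{ij}+T_{ij})u+\delta _{ij}u^2\) and the multiplicity of the root \(\lambda =1\) is the number of vanishing low-order coefficients. All function names are hypothetical helpers, not part of the paper.

```python
from fractions import Fraction


def poly_add(a, b):
    """Add two polynomials given as low-order-first coefficient lists."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_mul(a, b):
    """Multiply two polynomials given as low-order-first coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_det(M):
    """Determinant of a small matrix of polynomials (Laplace expansion)."""
    if len(M) == 1:
        return M[0][0]
    total = [Fraction(0)]
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        sign = -1 if j % 2 else 1
        term = poly_mul(entry, poly_det(minor))
        total = poly_add(total, [sign * c for c in term])
    return total


def rank(M):
    """Rank of a rational matrix by Gaussian elimination over Fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def factor_at_one(S, T):
    """Return (m, c) with g = (lambda-1)^m * p(lambda) and p(1) = c != 0."""
    d = len(S)
    # Entry (i, j) as a polynomial in u = lambda - 1.
    M = [[[Fraction(S[i][j]), Fraction(2 * S[i][j] + T[i][j]),
           Fraction(int(i == j))] for j in range(d)] for i in range(d)]
    coeffs = poly_det(M)
    m = next(i for i, c in enumerate(coeffs) if c != 0)
    return m, coeffs[m]


def predicted_exponent(S, T):
    """Exponent 2d - Rank(S) - Rank(S+T) assumed from the lemma."""
    d = len(S)
    s_plus_t = [[S[i][j] + T[i][j] for j in range(d)] for i in range(d)]
    return 2 * d - rank(S) - rank(s_plus_t)


if __name__ == "__main__":
    S = [[0, 0], [0, 1]]
    T = [[1, 1], [1, 1]]
    print(factor_at_one(S, T), predicted_exponent(S, T))
```

Exact rational arithmetic is used so that the multiplicity count is not affected by rounding. The Laplace expansion is only practical for very small d.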

Appendix C

Proof of Lemma 4

It is easily seen that

$$\begin{aligned} \mathrm{Rank}(S) +\mathrm{Rank}(\beta A^\top A)= \mathrm{Rank} \left[ \begin{array}{c@{\quad }c} S &{} 0 \\ 0 &{} \beta AA^\top \end{array} \right] , \end{aligned}$$

and therefore we need only prove that

$$\begin{aligned} \mathrm{Rank} \left[ \begin{array}{c@{\quad }c} S &{} -A^{\top }\\ \beta A &{} 0 \end{array} \right] = \mathrm{Rank} \left[ \begin{array}{c@{\quad }c} S &{} 0 \\ 0 &{} \beta AA^\top \end{array} \right] . \end{aligned}$$
(106)

Indeed, consider the following linear system

$$\begin{aligned} \left[ \begin{array}{c@{\quad }c} S &{} -A^{\top }\\ \beta A &{} 0 \end{array} \right] \left[ \begin{array}{c} x \\ \mu \end{array} \right] =0, \end{aligned}$$
(107)

which is equivalent to

$$\begin{aligned} \left\{ \begin{array}{l} Sx - A^\top \mu =0,\\ Ax =0. \end{array} \right. \end{aligned}$$

It then holds that

$$\begin{aligned} x^\top Sx = x^\top A^\top \mu = (Ax)^\top \mu =0, \end{aligned}$$

and therefore \(Sx =0\) and \(A^\top \mu =0\), because \(S=H+\beta A^\top A\) is positive semidefinite. This means that

$$\begin{aligned} \left[ \begin{array}{c@{\quad }c} S &{} 0 \\ 0 &{} \beta AA^{\top } \end{array}\right] \left[ \begin{array}{c} x \\ \mu \end{array}\right] =0. \end{aligned}$$
(108)

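On the other hand, it is not difficult to verify that any solution of (108) is also a solution of (107); in other words, the linear systems (107) and (108) are equivalent. The two coefficient matrices therefore have the same null space and, having the same number of columns, the same rank. As a result, the rank equality (106) holds, which completes the proof. \(\square \)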
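The rank equality (106) can be checked numerically on small instances. The sketch below assumes the setting suggested by the proof, namely \(S=H+\beta A^\top A\) with H positive semidefinite and \(\beta >0\). It compares the rank of the block matrix in (107) with \(\mathrm{Rank}(S)+\mathrm{Rank}(AA^\top )\), using exact rational arithmetic. The function names and the test matrices are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def rank(M):
    """Rank of a rational matrix by Gaussian elimination over Fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        r += 1
    return r


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def rank_pair(H, A, beta):
    """Return (rank of [[S, -A^T], [beta*A, 0]], Rank(S) + Rank(A A^T))."""
    n, m = len(H), len(A)
    At = transpose(A)
    AtA = matmul(At, A)
    # S = H + beta * A^T A, assumed positive semidefinite (H PSD, beta > 0).
    S = [[H[i][j] + beta * AtA[i][j] for j in range(n)] for i in range(n)]
    top = [S[i] + [-At[i][j] for j in range(m)] for i in range(n)]
    bottom = [[beta * A[i][j] for j in range(n)] + [0] * m for i in range(m)]
    return rank(top + bottom), rank(S) + rank(matmul(A, At))


if __name__ == "__main__":
    H = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
    A = [[0, 1, 0]]
    print(rank_pair(H, A, 2))
```

The two returned numbers should agree whenever the assumptions hold. A rank-deficient A, such as one with proportional rows, is a useful case to try because both S and \(AA^\top \) are then singular.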


Cite this article

Chen, C., Li, M., Liu, X. et al. Extended ADMM and BCD for nonseparable convex minimization models with quadratic coupling terms: convergence analysis and insights. Math. Program. 173, 37–77 (2019). https://doi.org/10.1007/s10107-017-1205-9

  • DOI: https://doi.org/10.1007/s10107-017-1205-9
