Abstract
The aim of this paper is to introduce and study a two-step debiasing method for variational regularization. After solving the standard variational problem, the key idea is to add a consecutive debiasing step minimizing the data fidelity on an appropriate set, the so-called model manifold. The latter is defined by Bregman distances or infimal convolutions thereof, using the (uniquely defined) subgradient appearing in the optimality condition of the variational method. For particular settings, such as anisotropic \(\ell ^1\) and TV-type regularization, previously used debiasing techniques are shown to be special cases. The proposed approach is, however, easily applicable to a wider range of regularizations. The two-step debiasing is shown to be well-defined and to optimally reduce bias in a certain setting. In addition to visual and PSNR-based evaluations, different notions of bias and variance decompositions are investigated in numerical studies. The improvements offered by the proposed scheme are demonstrated, and its performance is shown to be comparable to optimal results obtained with Bregman iterations.
References
Benning, M., Burger, M.: Ground states and singular vectors of convex variational regularization methods. Methods Appl. Anal. 20, 295–334 (2013)
Borwein, J., Zhu, Q.: Techniques of Variational Analysis. Springer Science & Business Media (2006)
Bredies, K., Kunisch, K., Pock, T.: Total generalized variation. SIAM J. Imaging Sci. 3, 492–526 (2010)
Bredies, K., Pikkarainen, H.: Inverse problems in spaces of measures. ESAIM Control Optim. Calc. Var. 19(1), 190–218 (2013)
Burger, M.: Bregman distances in inverse problems and partial differential equations. In: Hiriart-Urruty, J., Korytowski, A., Maurer, H., Szymkat, M. (eds.) Advances in Mathematical Modeling, Optimization, and Optimal Control. Springer, New York (2016)
Burger, M., Gilboa, G., Moeller, M., Eckardt, L., Cremers, D.: Spectral decompositions using one-homogeneous functionals. arXiv:1601.02912 (2016)
Burger, M., Gilboa, G., Osher, S., Xu, J.: Nonlinear inverse scale space methods. Commun. Math. Sci. 4, 178–212 (2006)
Burger, M., Moeller, M., Benning, M., Osher, S.: An adaptive inverse scale space method for compressed sensing. Math. Comput. 82, 269–299 (2013)
Burger, M., Osher, S.: Convergence rates of convex variational regularization. Inverse Probl. 20, 1411–1421 (2004)
Burger, M., Osher, S.: A guide to the TV zoo. In: Level Set and PDE Based Reconstruction Methods in Imaging, pp. 1–70. Springer International Publishing (2013)
Burger, M., Resmerita, E., He, L.: Error estimation for Bregman iterations and inverse scale space methods in image restoration. Computing 81, 109–135 (2007)
Candes, E.J., Plan, Y.: A probabilistic and RIPless theory of compressed sensing. IEEE Trans. Inf. Theory 57(11), 7235–7254 (2011)
Caselles, V., Chambolle, A., Novaga, M.: Total variation in imaging. In: Handbook of Mathematical Methods in Imaging, pp. 1016–1057. Springer Science & Business Media (2011)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Combettes, P.L., Salzo, S., Villa, S.: Regularized learning schemes in Banach space. Preprint, arXiv:1410.6847 (2014)
Deledalle, C.A., Papadakis, N., Salmon, J.: On debiasing restoration algorithms: applications to total-variation and nonlocal-means. In: Scale Space and Variational Methods in Computer Vision 2015, Lecture Notes in Computer Science, 9087, pp. 129–141 (2015)
Deledalle, C.A., Papadakis, N., Salmon, J., Vaiter, S.: CLEAR: Covariant LEAst-Square Re-fitting with Applications to Image Restoration. Preprint, arXiv:1606.05158 (2016)
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems. SIAM, Philadelphia (1999)
Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)
Figueiredo, M., Nowak, R., Wright, S.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1(4), 586–598 (2007)
Hintermüller, M., Papafitsoros, K., Rautenberg, C.: Analytical aspects of spatially adapted total variation regularisation. Preprint, arXiv:1609.01074 (2016)
Lederer, J.: Trust, but verify: benefits and pitfalls of least-squares refitting in high dimensions. Preprint, arXiv:1306.0113v1 (2013)
Min, J., Vonesch, C., Kirshner, H., Carlini, L., Olivier, N., Holden, S., Manley, S., Ye, J.C., Unser, M.: FALCON: fast and unbiased reconstruction of high-density super-resolution microscopy data. Sci. Rep. 4, 4577 (2014)
Moeller, M., Brinkmann, E.-M., Burger, M., Seybold, T.: Color Bregman TV. SIAM J. Imaging Sci. 7(4), 2771–2806 (2014)
Moeller, M., Burger, M.: Multiscale methods for polyhedral regularizations. SIAM J. Optim. 23, 1424–1456 (2013)
Osher, S., Ruan, F., Xiong, J., Yao, Y., Yin, W.: Sparse recovery via differential inclusions. Appl. Comput. Harmon. Anal. 41(2), 436–469 (2016)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. SIAM Multiscale Model. Simul. 4, 460–489 (2005)
Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford-Shah functional. In: IEEE 12th International Conference on Computer Vision 2009, pp. 1133–1140 (2009)
Resmerita, E.: Regularization of ill-posed problems in Banach spaces: convergence rates. Inverse Probl. 21(4), 1303 (2005)
Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton (1997)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 60, 259–268 (1992)
Scherzer, O., Groetsch, C.: Inverse scale space theory for inverse problems. In: Scale-Space and Morphology in Computer Vision, Lecture Notes in Computer Science 2106, pp. 317–325. Springer (2001)
Shapiro, A.: On concepts of directional differentiability. J. Optim. Theory Appl. 66(3), 477–487 (1990)
Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.S.: Regularization Methods in Banach Spaces. DeGruyter, Berlin (2012)
Tadmor, E., Nezzar, S., Vese, L.: A multiscale image representation using hierarchical (BV, L2) decompositions. Multiscale Model. Simul. 2, 554–579 (2004)
Acknowledgements
This work was supported by ERC via Grant EU FP 7-ERC Consolidator Grant 615216 LifeInverse. MB acknowledges support by the German Science Foundation DFG via EXC 1003 Cells in Motion Cluster of Excellence, Münster, Germany.
Appendix
We have included some examples and proofs in the Appendix in order not to interrupt the flow of the paper. These are in particular the proof of the shrinkage formula and the calculation of the corresponding directional derivatives for isotropic and anisotropic shrinkage in Example 1, and the calculation of the infimal convolution of two \(\ell ^1\)-Bregman distances in Example 4.
1.1 Shrinkage
Let \(f \in \ell ^2(\mathbb {R}^d)\) be a vector-valued signal for \(d \in \mathbb {N}\). Then the solution of the isotropic shrinkage problem
is given by the isotropic soft-thresholding
Proof
We first point out that the objective allows us to exploit strong duality. Following [2, Theorem 4.4.3 and Lemma 4.3.1], strong duality holds if
Since the \(\ell ^1(\mathbb {R}^d)\)-norm has full domain and the \(\ell ^2(\mathbb {R}^d)\)-norm is continuous everywhere, this is trivially fulfilled. Hence, by the dual definition of the \(\ell ^1(\mathbb {R}^d)\)-norm, we find
where we used strong duality to interchange the infimum and the supremum. We can explicitly compute the minimizer for u as \(u = f-r\) and hence
This supremum can be computed explicitly pointwise with the corresponding Lagrangian
with \(\lambda \le 0\). Note that both the objective function and the constraints are continuously differentiable and that Slater’s condition holds. Optimality with respect to \(r_i\) yields
and hence
We distinguish two cases:
If \(|r_i| = \alpha \), then \(\alpha (1- 2\lambda ) = |f_i|\) and
The nonpositivity of \(\lambda \) implies that \(|f_i| \ge \alpha \). In case \(|r_i| < \alpha \), we obtain that \(\lambda = 0\) and hence \(r_i = f_i\) and \(u_i = 0\) when \(|f_i| < \alpha \). Note that since \(f \in \ell ^2(\mathbb {R}^d)\) there exists a finite N such that \(|f_i| \le \alpha \) for all \(i > N\). Hence trivially \(u_\alpha (f) \in \ell ^1(\mathbb {R}^d)\) as \(\sum _{i \in \mathbb {N}} \vert \left[ u_\alpha (f)\right] _i \vert \) is a finite sum. This yields the assertion. \(\square \)
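For illustration, the following small numerical sketch implements the isotropic soft-thresholding just derived, together with its scalar (\(d = 1\)) specialization discussed in the remark below, assuming the usual closed forms \([u_\alpha (f)]_i = \max (|f_i|-\alpha ,0)\,f_i/|f_i|\) and \(\mathrm {sign}(f_i)\max (|f_i|-\alpha ,0)\). The function names are ours; the snippet is only a sanity check, not part of the argument.

```python
import numpy as np

def isotropic_shrinkage(f, alpha):
    """Isotropic soft-thresholding (assumed closed form): every vector-valued
    component f_i in R^d is shortened by alpha or set to zero."""
    norms = np.linalg.norm(f, axis=-1, keepdims=True)           # |f_i|
    scale = np.maximum(1.0 - alpha / np.maximum(norms, 1e-15), 0.0)
    return scale * f

def anisotropic_shrinkage(f, alpha):
    """Scalar (d = 1) soft-thresholding, applied componentwise."""
    return np.sign(f) * np.maximum(np.abs(f) - alpha, 0.0)

f = np.array([[3.0, 4.0], [0.3, -0.2], [0.0, 1.0]])             # three components in R^2
print(isotropic_shrinkage(f, alpha=1.0))                        # [[2.4 3.2] [0. 0.] [0. 0.]]
print(anisotropic_shrinkage(np.array([2.0, -0.4, 0.9]), alpha=1.0))
```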
Remark
For \(d = 1\) and a square-summable sequence \(f \in \ell ^2\), we immediately obtain the anisotropic case: the solution to
for \(\alpha > 0\) is given by
Directional derivative: The computation of the directional derivative requires a little more work. First, let us compute the directional derivative of the function \(F :\mathbb {R}^d\backslash \{0\} \rightarrow \mathbb {R}\), \(x \mapsto \frac{1}{|x|}\), in the direction \(g \in \mathbb {R}^d\). We define \(G :\mathbb {R}^d\backslash \{0\} \rightarrow \mathbb {R}\), \(x \mapsto \frac{1}{|x|^2}\), and calculate
Then, by the chain rule, we obtain
Let us further define the projection of a vector \(x \in \mathbb {R}^d\) onto another vector \(y \in \mathbb {R}^d\backslash \{0\}\) as
We now have to compute
and we can distinguish four cases:
Consider first the case \(|f_i| > \alpha \). Then for t small enough we have \(|f_i + t g_i| > \alpha \) and hence
For \(|f_i| < \alpha \) and t small enough, we easily find \(|f_i + t g_i| < \alpha \) and hence
In case \(|f_i| = \alpha \) we need to distinguish whether \(|f_i + t g_i| > \alpha \) or \(|f_i + t g_i| \le \alpha \) for arbitrarily small t. We hence compute
which holds for arbitrarily small t only if \(f_i \cdot g_i \ge 0\). Analogously, \(|f_i + t g_i| < \alpha \) holds for small t only if \(f_i \cdot g_i < 0\).
Hence let now \(|f_i| = \alpha \) and \(f_i \cdot g_i \ge 0\). Then we obtain
Using \(\alpha = |f_i|\), we find
In the last case \(|f_i| = \alpha \) and \(f_i \cdot g_i < 0\), we find
Summing up we have
It remains to show that
for \(t \rightarrow 0^+\). Again, since \(f \in \ell ^2(\mathbb {R}^d)\), there exists \(N \in \mathbb {N}\) such that \(|f_i| < \alpha \) and hence \([\mathrm {d}u_\alpha (f;g)]_i = 0\) for all \(i > N\). The difference quotient vanishes as well for all \(i > N\); hence the above \(\ell ^1\) norm is a finite sum, and we trivially obtain convergence in \(\ell ^1(\mathbb {R}^d)\).
Remark
For \(d=1\) and \(f \in \ell ^2\) we obtain the anisotropic result:
where we mention that here \(\Pi _{f_i}(g_i) = g_i\).
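The case distinction above can be verified numerically. The following sketch compares a piecewise formula assembled from these cases (our reading of the computation, with \(\Pi _{f_i}(g_i)\) denoting the projection of \(g_i\) onto \(f_i\)) against one-sided difference quotients of the isotropic soft-thresholding; it is meant as a plausibility check rather than part of the proof.

```python
import numpy as np

def iso_shrink(f_i, alpha):
    """Isotropic soft-thresholding of a single component f_i in R^d."""
    n = np.linalg.norm(f_i)
    return max(1.0 - alpha / max(n, 1e-15), 0.0) * f_i

def proj(x, y):
    """Projection of x onto span(y): Pi_y(x) = (x.y / |y|^2) y."""
    return (np.dot(x, y) / np.dot(y, y)) * y

def directional_derivative(f_i, g_i, alpha):
    """Piecewise formula assembled from the case distinction above (our reading)."""
    nf = np.linalg.norm(f_i)
    if nf > alpha:
        return g_i - (alpha / nf) * (g_i - proj(g_i, f_i))
    if np.isclose(nf, alpha) and np.dot(f_i, g_i) >= 0:
        return proj(g_i, f_i)
    return np.zeros_like(f_i)

alpha, t = 1.0, 1e-6
cases = [(np.array([2.0, 1.0]), np.array([0.3, -0.5])),   # |f_i| > alpha
         (np.array([0.6, 0.8]), np.array([0.6, 0.8])),    # |f_i| = alpha, f_i.g_i > 0
         (np.array([0.2, 0.1]), np.array([1.0, 0.0]))]    # |f_i| < alpha
for f_i, g_i in cases:
    fd = (iso_shrink(f_i + t * g_i, alpha) - iso_shrink(f_i, alpha)) / t
    print(directional_derivative(f_i, g_i, alpha), fd)    # the two columns should agree
```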
Model manifold: The corresponding (isotropic) model manifold is given by
Analogously to the anisotropic case discussed in Example 1, the model manifold allows for arbitrary elements, here even including the direction, if the magnitude \(|f_i|\) of the signal is strictly above the threshold parameter \(\alpha \). As already discussed in Example 1, \(|f_i| = \alpha \) is the odd case of the three, since in contrast to \(|f_i| > \alpha \) it only allows for changes in the direction of the signal \(f_i\). If we exclude that case, we again find a linear derivative, hence a Gâteaux derivative and even a Fréchet derivative. Accordingly, isotropic shrinkage is the immediate generalization of anisotropic shrinkage, which we recover as a special case for \(d = 1\).
Summing up, the debiasing procedure on this manifold again yields the solution of hard thresholding:
Note that we again maintain the signal directly on the threshold.
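In code, this debiasing step amounts to the following hard-thresholding sketch (denoising setting, our own naming): every component whose magnitude reaches the threshold is restored to the data, the rest remain zero.

```python
import numpy as np

def iso_hard_threshold(f, alpha):
    """Debiasing on the isotropic model manifold in the denoising case (sketch):
    components with |f_i| >= alpha are restored to f_i, all others stay zero."""
    norms = np.linalg.norm(f, axis=-1, keepdims=True)
    return np.where(norms >= alpha, f, 0.0)

f = np.array([[3.0, 4.0], [0.3, -0.2], [0.6, 0.8]])
print(iso_hard_threshold(f, alpha=1.0))   # keeps the first and the last component
```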
1.2 Infimal Convolution of \(\ell ^1\) Bregman Distances
Theorem 5
Let \(\Gamma :\ell ^2(\mathbb {R}^n) \rightarrow \ell ^1(\mathbb {R}^m)\) be linear and bounded and \(J(u) = \Vert \Gamma u \Vert _{\ell ^1(\mathbb {R}^m)}\) for \(m,n \in \mathbb {N}\). Let further \(q_\alpha \in \partial \Vert \cdot \Vert _{\ell ^1(\mathbb {R}^m)} (\Gamma u_\alpha )\) such that \(p_\alpha = \Gamma ^* q_\alpha \). Then
Proof
\(\square \)
Note that we get equality for surjective \(\Gamma \) in Theorem 5.
Theorem 6
Let \(v,u \in \ell ^1(\mathbb {R}^m)\) and \(q \in \partial \Vert v\Vert _{\ell ^1(\mathbb {R}^m)}\). Then
with \(G :\mathbb {R}^m \times \mathbb {R}^m \rightarrow \mathbb {R}\) defined as
where \(\varphi _i\) denotes the angle between \(u_i\) and \(q_i\), i.e., \(\cos (\varphi _i) |u_i| |q_i| = u_i \cdot q_i\) with \(\varphi _i := 0\) for \(q_i = 0\) or \(u_i = 0\).
Proof
Let
Since \((f_1 \Box f_2)^* = f_1^* + f_2^*\) and by the definition of the biconjugate, we know that
(1) We shall first compute the right-hand side. We have
$$\begin{aligned} f_1^*(w)&= \iota _{B^{\infty }(1)}(w +q), \\ f_2^*(w)&= \iota _{B^{\infty }(1)}(w -q), \end{aligned}$$
where \(\iota _{B^{\infty }(1)}\) denotes the characteristic function of the \(\ell ^{\infty }(\mathbb {R}^m)\)-ball
Thus
Taking into account the specific form of these constraints, we can carry out the computation pointwise, i.e.,
From now on we drop the dependence on i for simplicity.
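Dropping the index, the pointwise problem can therefore be written (our sketch, consistent with the constraints above and the Lagrangian (8.2) set up below) as

$$\begin{aligned} \sup _{w \in \mathbb {R}^m} \; w \cdot u \quad \text {subject to} \quad |w - q| \le 1, \quad |w + q| \le 1. \end{aligned}$$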
Let us first consider the case \(|q| = 1\). We immediately deduce that \(w = 0\) and \(u \cdot w = 0\).
Hence we assume \(|q| < 1\) from now on, and set up the corresponding Lagrangian
$$\begin{aligned} \mathcal {L}(w, \lambda , \mu ) =&- w \cdot u + \lambda (|w - q|^2 -1) \\&+ \mu (|w + q|^2 -1). \end{aligned}$$ (8.2)
Both the objective functional and the constraints are differentiable, so every optimal point of (8.2) has to fulfill the four Karush–Kuhn–Tucker conditions, namely
Slater’s condition implies the existence of Lagrange multipliers for a KKT-point of (8.2). The first KKT-condition yields
Let us first remark that the case \(u=0\) causes the objective function to vanish anyway; hence we assume \(u\ne 0\) in the following.
Then let us address the case \(q=0\), in which (8.3) yields
$$\begin{aligned} u = 2 (\lambda + \mu ) w. \end{aligned}$$
In case \(|w| = 1\) we find that \(2 (\lambda + \mu ) = |u|\), hence \(w = \frac{u}{|u|}\). We infer
$$\begin{aligned} w \cdot u = \frac{u \cdot u}{|u|} = |u|. \end{aligned}$$
Note that for \(|w| < 1\), we find that \(\lambda = \mu = 0\) and hence \(u = 0\).
If \(q \ne 0\), we can distinguish four cases:
1st case: \(|w -q|^2 <1, |w +q|^2 = 1\). Thus \(\lambda = 0\) and (8.3) yields
$$\begin{aligned} u = 2 \mu (w +q). \end{aligned}$$
Since \(|w + q|^2 = 1\), we deduce \(\mu = |u|/2\), so
and finally for the value of the objective function
2nd case: \(|w +q|^2 <1, |w -q|^2 = 1\).
We analogously find
The first two cases thus occur whenever (insert w into the conditions)
We calculate
Hence \(q\cdot u >0\) and
In the second case, we analogously find
hence \(q \cdot u <0\) and
so we may summarize the first two cases as
whenever \(|q| < |\cos (\varphi )|\).
3rd case: \(|w -q|^2 =1, |w +q|^2 = 1\).
At first we observe that from
we may deduce that \(w \cdot q = 0\). Therefore we have
We multiply the optimality condition (8.3) by q and obtain
Multiplying (8.3) by w yields
and another multiplication of (8.3) by u yields
where we inserted the previous results in the last two steps. We rearrange and find
Note that \(|w| > 0\) since \(|q| < 1\). This finally leads us to
4th case: \(|w -q|^2< 1, |w +q|^2 < 1\).
Here the first KKT-condition yields \(u =0\), which can only occur if the objective function \(w\cdot u\) vanishes anyway. Summing up, we have
(2) It remains to show that
$$\begin{aligned} (f_1 \Box f_2)(u)&= \inf _{z \in \ell ^1(\mathbb {R}^m)} \sum _{i \in \mathbb {N}} g_i(z_i) \\&\le (f_1^* + f_2^*)^*(u), \end{aligned}$$
where
Again we need to distinguish four cases.
1st case: If \(|q_i| < \cos (\varphi _i)\), we have \(q_i \cdot u_i > 0\) and we can choose \(z_i = 0\) to obtain
2nd case: Analogously if \(|q_i| < -\cos (\varphi _i)\), we have \(q_i \cdot u_i < 0\) and choose \(z_i = u_i\), thus
3rd case: If \(|q_i| = 1\), we compute for \(z_i = \frac{u_i}{2} - \frac{c}{2}q_i\), \(c > 0\),
Using a Taylor expansion around q we obtain
Hence with \(|q_i| = 1\) we find
for \(c \rightarrow \infty \). Hence for every \(\varepsilon > 0\) there exists a \(c_i > 0\) such that \(g_i(z_i) \le \varepsilon / 2^i\).
4th case: Finally, if \(|q_i| \ge |\cos (\varphi _i)|\) and \(|q_i| < 1\), we pick \(z_i = 2 \lambda _i (w_i - q_i)\), with \(\lambda _i\) and \(w_i\) being the Lagrange multiplier and the dual variable from the above computation of \((f_1^* + f_2^*)^*\). It is easy to see that
Hence we define \(z := (z_i)_i\) such that
Let \(z^N\) denote z truncated at index \(N \in \mathbb {N}\), i.e.,
Then trivially \(z^N \in \ell ^1(\mathbb {R}^m)\) and we compute
as \(N \rightarrow \infty \). This completes the proof. \(\square \)
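As a final plausibility check, the pointwise value can be compared against a brute-force evaluation of the infimal convolution. The closed form used below is our reconstruction from the four cases of the proof, namely \(|u|\,(1-|q||\cos (\varphi )|)\) whenever \(|q| < |\cos (\varphi )|\) and \(|u| \sin (\varphi )\sqrt{1-|q|^2}\) otherwise, and should be read with that caveat; all code names are ours.

```python
import numpy as np

def g_bruteforce(u, q, extent=4.0, n=801):
    """Pointwise infimal convolution  inf_z (|z| - q.z) + (|u - z| + q.(u - z))
    evaluated by a coarse grid search over z in R^2 (for checking only)."""
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    Z = np.stack([X, Y], axis=-1)
    f1 = np.linalg.norm(Z, axis=-1) - Z @ q
    UZ = u - Z
    f2 = np.linalg.norm(UZ, axis=-1) + UZ @ q
    return float((f1 + f2).min())

def g_closed_form(u, q):
    """Our reconstruction of the pointwise value from the case study above."""
    nu, nq = np.linalg.norm(u), np.linalg.norm(q)
    if nu == 0.0:
        return 0.0
    cos_phi = np.dot(u, q) / (nu * max(nq, 1e-15))
    if nq < abs(cos_phi):                                    # 1st and 2nd case
        return nu - abs(np.dot(u, q))
    sin_phi = np.sqrt(max(1.0 - cos_phi ** 2, 0.0))
    return nu * sin_phi * np.sqrt(max(1.0 - nq ** 2, 0.0))   # 3rd case and |q| = 1

u = np.array([1.0, 0.5])
for q in [np.array([0.2, 0.1]), np.array([0.9, 0.0]), np.array([0.3, -0.4])]:
    print(g_closed_form(u, q), g_bruteforce(u, q))           # values should roughly agree
```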