Abstract
This study proposes multivariate kernel density estimation by a stagewise minimization algorithm based on U-divergence and a simple dictionary. The dictionary consists of appropriate scalar bandwidth matrices and a subset of the original data. The resulting estimator has data-adaptive weighting parameters and bandwidth matrices, and provides a sparse representation of the kernel density estimate. We derive a non-asymptotic error bound for the estimator obtained via the proposed stagewise minimization algorithm. Simulation studies confirm that the proposed estimator performs as well as, and sometimes better than, other well-known density estimators.
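To make the algorithmic idea concrete, the following is a minimal, illustrative Python sketch, not the estimator analyzed in the paper: it specializes the U-divergence to KL, builds a dictionary of Gaussian words N_d(c, h^2 I) centered at a random data subset with a few scalar bandwidths, and at each stage greedily mixes in the word and weight pi that maximize the empirical log-likelihood. The function name stagewise_kde, the defaults, and the stopping rule are our own choices.

```python
import numpy as np

def stagewise_kde(X, bandwidths=(0.3, 0.6, 1.2), n_words=50,
                  stages=30, pi_grid=None, seed=0):
    """Greedy stagewise mixture-of-Gaussians density estimate (sketch)."""
    rng = np.random.default_rng(seed)
    if pi_grid is None:
        pi_grid = np.linspace(0.05, 0.95, 19)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Simple dictionary: Gaussian words N_d(c, h^2 I), centers drawn from the data.
    centers = X[rng.choice(n, size=min(n_words, n), replace=False)]
    dictionary = [(c, h) for c in centers for h in bandwidths]

    def word(x, c, h):
        # Density of N_d(c, h^2 I) evaluated at the rows of x.
        q = np.sum((x - c) ** 2, axis=1) / h ** 2
        return np.exp(-0.5 * q) / (np.sqrt(2.0 * np.pi) * h) ** d

    words = np.stack([word(X, c, h) for c, h in dictionary])   # (W, n)
    w = np.zeros(len(dictionary))
    w[np.argmax(np.mean(np.log(words), axis=1))] = 1.0         # start from the best single word

    for _ in range(stages):
        f = w @ words                                          # current fit at the data points
        # All candidate updates (1 - pi) * f + pi * word, evaluated at once.
        cand = ((1.0 - pi_grid)[:, None, None] * f[None, None, :]
                + pi_grid[:, None, None] * words[None, :, :])
        ll = np.mean(np.log(cand), axis=2)                     # (P, W) mean log-likelihoods
        i, j = np.unravel_index(np.argmax(ll), ll.shape)
        if ll[i, j] <= np.mean(np.log(f)):                     # no word improves the fit: stop
            break
        w *= 1.0 - pi_grid[i]                                  # convex, hence sparse, weight update
        w[j] += pi_grid[i]

    def density(x):
        # Evaluate the fitted mixture at the rows of x.
        return w @ np.stack([word(np.atleast_2d(x), c, h) for c, h in dictionary])

    return density, w

# Example: density, w = stagewise_kde(np.random.default_rng(1).normal(size=(400, 2)))
```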
References
Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559
Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phys 7(3):200–217
Dukkipati A (2010) On Kolmogorov-Nagumo averages and nonextensive entropy. In: 2010 International Symposium on Information Theory and its Applications (ISITA)
Duong T, Hazelton ML (2003) Plug-in bandwidth matrices for bivariate kernel density estimation. J Nonparametric Stat 15(1):17–30
Friedman JH, Stuetzle W, Schroeder A (1984) Projection pursuit density estimation. J Am Stat Assoc 79(387):599–608
Girolami M, He C (2003) Probability density estimation from optimally condensed data samples. IEEE Trans Pattern Anal Mach Intell 25:1253–1264
Klemelä J (2007) Density estimation with stagewise optimization of the empirical risk. Mach Learn 67(1–2):13–32
Minami M, Eguchi S (2002) Robust blind source separation by beta-divergence. Neural Comput 14:1859–1886
Murata N, Takenouchi T, Kanamori T, Eguchi S (2004) Information geometry of U-boost and Bregman divergence. Neural Comput 16:1437–1481
Naito K, Eguchi S (2013) Density estimation with minimization of U-divergence. Mach Learn 90(1):29–57
Nash JW, Sellers LT, Talbot RS, Cawthorn JA, Ford BW (1994) The population biology of abalone (Haliotis species) in Tasmania. I. Blacklip abalone (H. rubra) from the north coast and islands of Bass Strait. Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288)
Ridgeway G (2002) Looking for lumps: boosting and bagging for density estimation. Comput Stat Data Anal 38(4):379–392
Schott JR (2017) Matrix analysis for statistics, 3rd edn. Wiley
Scott DW (2015) Multivariate density estimation: theory, practice, and visualization, 2nd edn. Wiley
Wand MP, Jones MC (1993) Comparison of smoothing parameterizations in bivariate kernel density estimation. J Am Stat Assoc 88:520–528
Zhang C, Jiang Y, Shang Z (2009) New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation. Can J Stat 37:119–139
Zhang C, Jiang Y, Chai Y (2010) Penalized Bregman divergence for large-dimensional regression and classification. Biometrika 97:551–566
Acknowledgements
Kiheiji NISHIDA is grateful for the financial support from Hyogo Medical University Grant for Research Promotion, 2022. Kanta NAITO gratefully acknowledges the financial support from KAKENHI 19K11851. We sincerely thank the editor and two anonymous reviewers for taking the time to review our manuscript and providing many constructive comments.
Appendices
Appendix A
To prove Theorem 1, we need the following Lemmas 1–5. Let \({\tilde{f}}({\textbf{x}})\) be any density estimator and let
where \(p_{j} \ge 0\), \(\sum _{j=1}^{N} p_{j} = 1\), and \(\phi _{j}({\textbf{x}}|{\textbf{X}}^{*}) \in D\). We consider two functions of the variable \(\pi \in [0, 1]\) defined as follows:
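For orientation, a sketch of the structure involved, assuming the construction of Naito and Eguchi (2013), where \(u = U'\) and \(\xi = u^{-1}\): the combined estimator is \(u\) applied to a convex combination in \(co(\xi (D))\), that is,
\[
f^{*}({\textbf{x}}|{\textbf{X}}^{*}) = u\Biggl(\sum _{j=1}^{N} p_{j}\,\xi \bigl(\phi _{j}({\textbf{x}}|{\textbf{X}}^{*})\bigr)\Biggr).
\]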
Then, Lemmas 1–3 are obtained as follows:
Lemma 1
Let \({\tilde{f}}({\textbf{x}})\) be a density estimator of \(f({\textbf{x}})\) and let \(f^{*}({\textbf{x}}|{\textbf{X}}^{*})\) be the estimator given in (19). For \(0 \le \pi \le 1\), it then follows that
Lemma 2
Let \({\tilde{f}}({\textbf{x}})\) be a density estimator of \(f({\textbf{x}})\) and let \(f^{*}({\textbf{x}}|{\textbf{X}}^{*})\) be the estimator given in (19). Then for any \(\pi \in [0, 1]\),
Lemma 3
Let \(f^{*}({\textbf{x}}|{\textbf{X}}^{*})\) be as given in (19) and let
for some \(\sum _{l=1}^{k} q_{l} \xi ({\tilde{\phi }}_{l}(\cdot |{\textbf{X}}^{*})) \in co(\xi (D))\). Under Assumption 1, it follows that
for any \(\pi \in [0,1]\).
In Lemmas 4–5, let \(W = \{ (\lambda _{\phi })_{\phi \in D} \mid \lambda _{\phi } \ge 0, \ \sum _{\phi \in D}\lambda _{\phi }=1, \ \#\{\phi \in D : \lambda _{\phi } > 0 \} < \infty \}\), and define the following convex combination:
where \(\Lambda = (\lambda _{\phi })_{\phi \in D} \in W\). Then, we obtain Lemmas 4–5.
Lemma 4
For the stagewise minimization density estimator \({\widehat{f}}({\textbf{x}}|{\textbf{X}}^{*})\), it holds under Assumption 1 that
Lemma 5
For \(\tau > 0\), let \({\widehat{f}}(\cdot |{\textbf{X}}^{*}) \in \{ f(\cdot , \Lambda ) \mid \Lambda \in W\}\) be such that
Then,
If we replace the dictionary in the proofs of Klemelä (2007) and Naito and Eguchi (2013) with the one in (9), we obtain Lemmas 1–5. Using Lemmas 4–5 in conjunction with Lemmas 1–3, we obtain the non-asymptotic error bound given in Theorem 1.
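For reference, a sketch of the stagewise construction underlying Lemmas 4–5, in the style of Klemelä (2007) and Naito and Eguchi (2013) (our paraphrase, not the paper's exact specification): at stage \(k\), the estimator is updated by
\[
{\widehat{f}}_{k+1}(\cdot |{\textbf{X}}^{*}) = u\Bigl((1-\pi _{k})\,\xi \bigl({\widehat{f}}_{k}(\cdot |{\textbf{X}}^{*})\bigr) + \pi _{k}\,\xi \bigl(\phi _{k}(\cdot |{\textbf{X}}^{*})\bigr)\Bigr),
\]
where \(\phi _{k} \in D\) and \(\pi _{k} \in [0,1]\) are chosen to minimize the empirical U-loss at each stage.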
Appendix B
We obtain Theorem 2 using Lemma 6.
Lemma 6
For any \(h({\textbf{x}}|{\textbf{X}}^{*}) \in co(\xi (D))\), let \(g({\textbf{x}}|{\textbf{X}}^{*}) = u(h({\textbf{x}}|{\textbf{X}}^{*}))\) and let \(g_{c}({\textbf{x}}|{\textbf{X}}^{*}) = v_{g}^{-1}g({\textbf{x}}|{\textbf{X}}^{*})\), where
Under Assumption 2, we have
If we replace the dictionary in the proofs of Klemelä (2007) and Naito and Eguchi (2013) with the one in (9), we obtain Lemma 6.
Appendix C
Using the convexity of \(U(t) = \exp (t)\), we can verify for the KL divergence that
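To see the role of convexity here, note that in the KL case \(u(t) = \exp (t)\) and \(\xi (t) = \log t\), so Jensen's inequality gives, for weights \(p_{j} \ge 0\) with \(\sum _{j} p_{j} = 1\),
\[
\exp \Bigl(\sum _{j} p_{j} \log \phi _{j}({\textbf{x}}|{\textbf{X}}^{*})\Bigr) = \prod _{j} \phi _{j}({\textbf{x}}|{\textbf{X}}^{*})^{p_{j}} \le \sum _{j} p_{j}\,\phi _{j}({\textbf{x}}|{\textbf{X}}^{*}),
\]
which we presume is the type of inequality verified in the display.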
We now evaluate the constant \(B_{KL}({\textbf{X}}^{*})^{2}\). Let \(h_{a}\), \(h_{b}\), and \(h_{c}\) be three different scalar bandwidths, which by assumption are not random. Let \({\textbf{X}}_{i}^{*}\), \({\textbf{X}}_{j}^{*}\), and \({\textbf{X}}_{k}^{*}\) be the respective means of the words. In what follows, we denote by \(\phi _{a}({\textbf{x}}|{\textbf{X}}_{i}^{*})\) the density of the d-dimensional normal distribution \(N_{d}({\textbf{X}}_{i}^{*}, h_{a}^{2}{\textbf{I}})\), and by \(\phi (\cdot )\) the density of the d-dimensional standard normal distribution. For notational convenience, we define \(h_{ab} \equiv h_{a}/h_{b}\) and \(h_{ac} \equiv h_{a}/h_{c}\). Then, we obtain
where the last equality follows from the change of variable \({\textbf{t}} = {\textbf{x}} - {\textbf{X}}_{i}^{*}\), and we define
The symbol \(\Vert {\textbf{x}} \Vert\) denotes \(({\textbf{x}}^{T}{\textbf{x}})^{1/2}\). Using the fact that the odd-order moments of the normal distribution are zero, together with Theorem 11.22 in Schott (2017, p. 480), we obtain
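As a concrete instance of the moments involved: for \({\textbf{z}} \sim N_{d}({\textbf{0}}, {\textbf{I}})\) one has \(E[\Vert {\textbf{z}} \Vert ^{2}] = d\) and \(E[\Vert {\textbf{z}} \Vert ^{4}] = d(d+2)\), since \(\Vert {\textbf{z}} \Vert ^{2} \sim \chi ^{2}_{d}\); the general mixed moments appearing above are presumably handled by the cited result in Schott (2017).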
We define
where \(h_{min}\) and \(h_{max}\) are the minimum and the maximum bandwidths in the dictionary, respectively. We also define
Then, we obtain
and
Therefore, using (21), (22), and (23) in (20), we obtain the upper bound
\(\square\)
Appendix D
We prove that the finiteness of \(E_{{\textbf{X}}^{*}}[B_{U}({\textbf{X}}^{*})^{2}]\) in (15) is ensured, in the case of the KL divergence, if the fourth moment of \({\textbf{X}}_{i}^{*}\) is finite, as described in Remark 3. Taking the expectation of the right-hand side of (24), we need to evaluate
Furthermore, we obtain
Hence, it suffices to assume \(E_{{\textbf{X}}^{*}} \Bigl [ \Vert {\textbf{X}}_{i}^{*} \Vert ^{4} \Bigr ] < \infty\). \(\square\)
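A short remark on why the fourth moment suffices: cross terms such as \(\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*} \Vert ^{4}\) are presumably controlled through the triangle inequality combined with the convexity of \(t \mapsto t^{4}\), which gives
\[
\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*} \Vert ^{4} \le \bigl(\Vert {\textbf{X}}_{i}^{*} \Vert + \Vert {\textbf{X}}_{j}^{*} \Vert \bigr)^{4} \le 8\bigl(\Vert {\textbf{X}}_{i}^{*} \Vert ^{4} + \Vert {\textbf{X}}_{j}^{*} \Vert ^{4}\bigr),
\]
so every such term has finite expectation under \(E_{{\textbf{X}}^{*}}\bigl[\Vert {\textbf{X}}_{i}^{*} \Vert ^{4}\bigr] < \infty\).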