Abstract
The problem of testing the equality of mean vectors for high-dimensional data has been intensively investigated in the literature. However, most existing tests impose strong assumptions on the underlying group covariance matrices, which may not be satisfied or can hardly be checked in practice. In this article, an F-type test for two-sample Behrens–Fisher problems for high-dimensional data is proposed and studied. When the two samples are normally distributed and the null hypothesis is valid, the proposed F-type test statistic is shown to be an F-type mixture, that is, a ratio of two independent \(\chi ^2\)-type mixtures. Under some regularity conditions and the null hypothesis, it is shown that the proposed F-type test statistic and this F-type mixture have the same normal and non-normal limits. It is therefore justified to approximate the null distribution of the proposed F-type test statistic by that of the F-type mixture, resulting in the so-called normal reference F-type test. Since the F-type mixture is a ratio of two independent \(\chi ^2\)-type mixtures, we apply the Welch–Satterthwaite \(\chi ^2\)-approximation separately to the distributions of its numerator and denominator, resulting in an approximating F-distribution whose degrees of freedom can be consistently estimated from the data. The asymptotic power of the proposed F-type test is established. Two simulation studies show that, in terms of size control, the proposed F-type test outperforms two existing competitors. The good performance of the proposed F-type test is also illustrated by a COVID-19 data example.
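The Welch–Satterthwaite step mentioned above can be illustrated with a minimal sketch. It assumes a generic \(\chi ^2\)-type mixture \(\sum _r c_r A_r\) with known positive coefficients \(c_r\) and independent \(A_r\sim \chi ^2_1\), and matches its first two moments to those of \(\beta \chi ^2_d\). The function names and the example coefficients are hypothetical and are not taken from the article; in particular, this is not the article's data-based estimator of the degrees of freedom.

```python
from typing import Sequence, Tuple


def welch_satterthwaite(coeffs: Sequence[float]) -> Tuple[float, float]:
    """Two-moment matching of sum_r c_r * A_r (A_r i.i.d. chi^2_1)
    to beta * chi^2_d. Returns (beta, d)."""
    if len(coeffs) == 0 or any(c <= 0 for c in coeffs):
        raise ValueError("coeffs must be a non-empty sequence of positive numbers")
    s1 = sum(coeffs)                 # mean of the mixture
    s2 = sum(c * c for c in coeffs)  # half the variance of the mixture
    beta = s2 / s1                   # scale: solves beta * d = s1
    d = s1 * s1 / s2                 # degrees of freedom: solves 2 * beta^2 * d = 2 * s2
    return beta, d


def f_reference_df(num_coeffs: Sequence[float],
                   den_coeffs: Sequence[float]) -> Tuple[float, float]:
    """Degrees of freedom (d1, d2) of the F-distribution approximating the ratio
    of two independent mixtures, each divided by its own mean (assumed setup)."""
    _, d1 = welch_satterthwaite(num_coeffs)
    _, d2 = welch_satterthwaite(den_coeffs)
    return d1, d2


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
    print(welch_satterthwaite([4.0, 1.0, 1.0]))
    print(f_reference_df([4.0, 1.0, 1.0], [1.0] * 10))
```

Because \(\beta \chi ^2_d\) divided by its mean \(\beta d\) equals \(\chi ^2_d/d\), the scale factors cancel in the ratio and only the two degrees of freedom are needed for the reference F-distribution in this sketch.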
References
Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley series in probability and statistics. Wiley, Hoboken
Bai ZD, Saranadasa H (1996) Effect of high dimension: by an example of a two sample problem. Stat Sin 6(2):311–329
Chen SX, Qin Y-L (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38(2):808–835
Dempster AP (1958) A high dimensional two sample significance test. Ann Math Stat 29(4):995–1010
Dempster AP (1960) A significance test for the separation of two highly multivariate small samples. Biometrics 16(1):41–50
Fisher RA (1935) The fiducial argument in statistical inference. Ann Eugen 6(4):391–398
Fisher RA (1939) The comparison of samples with possibly unequal variances. Ann Eugen 9(2):174–180
James G (1954) Tests of linear hypotheses in univariate and multivariate analysis when the ratios of the population variances are unknown. Biometrika 41(1/2):19–43
Johansen S (1980) The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika 67(1):85–92
Liu X, Guo J, Zhou B, Zhang J-T (2016) Two simple tests for heteroscedastic two-way ANOVA. Stat Res Lett 5(6):6–16
Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biom Bull 2(6):110–114
Scheffé H (1970) Practical solutions of the Behrens-Fisher problem. J Am Stat Assoc 65(332):1501–1508
Srivastava MS, Fujikoshi Y (2006) Multivariate analysis of variance with fewer observations than the dimension. J Multivar Anal 97(9):1927–1940. https://doi.org/10.1016/j.jmva.2005.08.010
Tang S, Tsui K-W (2007) Distributional properties for the generalized p-value for the Behrens-Fisher problem. Stat Probab Lett 77(1):1–8. https://doi.org/10.1016/j.spl.2006.05.005
Thair SA, He YD, Hasin-Brumshtein Y, Sakaram S, Pandya R, Toh J, Rawling D, Remmel M, Coyle S, Dalekos GN (2021) Transcriptomic similarities and differences in host response between SARS-CoV-2 and other viral infections. iScience 24(1):101947
Welch BL (1947) The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34(1/2):28–35
Yao Y (1965) An approximate degrees of freedom solution to the multivariate Behrens-Fisher problem. Biometrika 52(1/2):139–147
Zhang J-T (2005) Approximate and asymptotic distributions of chi-squared-type mixtures with applications. J Am Stat Assoc 100(469):273–285
Zhang J-T (2011) Two-way MANOVA with unequal cell sizes and unequal cell covariance matrices. Technometrics 53(4):426–439
Zhang J-T (2012) An approximate Hotelling \(T^2\)-test for heteroscedastic one-way MANOVA. Open J Stat 2(1):1–11
Zhang J-T (2013) Analysis of variance for functional data. Chapman and Hall/CRC, New York
Zhang J-T (2013) Tests of linear hypotheses in the ANOVA under heteroscedasticity. Int J Adv Stat Probab 1(2):9–24
Zhang J-T, Zhu T (2022) A further study on Chen-Qin’s test for two-sample Behrens-Fisher problems for high-dimensional data. J Stat Theory Pract 16(1):1
Zhang J-T, Guo J, Zhou B, Liu X (2016) A modified Bartlett test for heteroscedastic two-way MANOVA. J Adv Stat 1(2):94–108
Zhang J-T, Guo J, Zhou B, Cheng M-Y (2020) A simple two-sample test in high dimensions based on \(L^2\)-norm. J Am Stat Assoc 115(530):1011–1027
Zhang J-T, Zhou B, Guo J, Zhu T (2021) Two-sample Behrens-Fisher problems for high-dimensional data: a normal reference approach. J Stat Plan Inference 213:142–161
Acknowledgements
Zhang and Zhu’s research was partially supported by the National University of Singapore academic research grant (22-5699-A0001) and the National Institute of Education (NIE) start-up grant (NIE-SUG 6-22 ZTM), respectively. The authors thank the Editor in Chief and the anonymous reviewers for their constructive comments and suggestions, which helped us improve the article substantially.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix A. Technical proofs
Proof of Theorem 1
We shall apply Theorems 1 and 2 of Zhang et al. (2021) for the proof of this theorem. Notice that under Conditions C1–C3, by applying Theorem 2 of Zhang et al. (2021), \({\text {tr}}(\hat{{{\varvec{\Sigma }}}}_n)\) is ratio-consistent for \({\text {tr}}({{\varvec{\Sigma }}}_n)\) uniformly for all p. We can write
From (13), we have \({\text {E}}(S_{n,p,0}^*)={\text {tr}}({{\varvec{\Sigma }}}_n)\) and
Under Condition C3, as \(n\rightarrow \infty\), \({\text {Var}}\left[ S_{n,p,0}^*/{\text {tr}}({{\varvec{\Sigma }}}_n)\right] \rightarrow 0\) uniformly for all p. That is, \(S_{n,p,0}^*/{\text {tr}}({{\varvec{\Sigma }}}_n){\mathop {\longrightarrow }\limits ^{P}}1\) uniformly for all p. Therefore, we can write
Then under Conditions C1–C4, as \(n,p\rightarrow \infty\), Theorem 1(a) and (17) follow directly from Theorem 1(a) of Zhang et al. (2021), and under Conditions C1–C3 and C5, as \(n,p\rightarrow \infty\), Theorem 1(b) and (17) follow directly from Theorem 1(b) of Zhang et al. (2021). \(\square\)
Proof of Theorem 2
We first prove (a). Under Conditions C1–C4, Theorem 1(a) indicates that as \(n,p\rightarrow \infty\) we have \((F_{n,p,0}-1)/\sqrt{2/d_1}{\mathop {\longrightarrow }\limits ^{L}}\zeta\). In addition, under Conditions C1–C3, as \(n\rightarrow \infty\), we have \(\hat{d}_1/d_1{\mathop {\longrightarrow }\limits ^{P}}1\) and \(\hat{d}_2/d_2{\mathop {\longrightarrow }\limits ^{P}}1\) uniformly for all p. Therefore, under the given conditions, we have
Next we prove (b). Under Conditions C1–C3 and C5, Theorem 1(b) indicates that as \(n\rightarrow \infty\), we have \((F_{n,p,0}-1)/\sqrt{2/d_1}{\mathop {\longrightarrow }\limits ^{L}}\mathcal {N}(0,1)\). By Remark 2, we have \([F_{d_1,d_2}(\alpha )-1]/\sqrt{2/d_1}\rightarrow z_\alpha\) when \(d_2\rightarrow \infty\). Therefore, under the given conditions, we have
where \(\Phi (\cdot )\) denotes the cumulative distribution function of \(\mathcal {N}(0,1)\).
About this article
Cite this article
Zhu, T., Wang, P. & Zhang, JT. Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test. Comput Stat 39, 3207–3230 (2024). https://doi.org/10.1007/s00180-023-01433-6
DOI: https://doi.org/10.1007/s00180-023-01433-6