Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test

Zhu, Tianming; Wang, Pengfei; Zhang, Jin-Ting

doi:10.1007/s00180-023-01433-6

Original Paper
Published: 24 November 2023

Volume 39, pages 3207–3230, (2024)

Computational Statistics

Abstract

The problem of testing the equality of mean vectors for high-dimensional data has been intensively investigated in the literature. However, most existing tests impose strong assumptions on the underlying group covariance matrices, which may not be satisfied or can hardly be checked in practice. In this article, an F-type test for two-sample Behrens–Fisher problems for high-dimensional data is proposed and studied. When the two samples are normally distributed and the null hypothesis holds, the proposed F-type test statistic is shown to be an F-type mixture, that is, a ratio of two independent $\chi ^2$-type mixtures. Under some regularity conditions and the null hypothesis, it is shown that the proposed F-type test statistic and this F-type mixture have the same normal and non-normal limits. It is therefore justified to approximate the null distribution of the proposed F-type test statistic by that of the F-type mixture, resulting in the so-called normal reference F-type test. Since the F-type mixture is a ratio of two independent $\chi ^2$-type mixtures, we apply the Welch–Satterthwaite $\chi ^2$-approximation to the distributions of its numerator and denominator separately, resulting in an approximate F-distribution whose degrees of freedom can be consistently estimated from the data. The asymptotic power of the proposed F-type test is established. Two simulation studies show that, in terms of size control, the proposed F-type test outperforms two existing competitors. The good performance of the proposed F-type test is also illustrated by a COVID-19 data example.
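The two-moment matching behind a Welch–Satterthwaite $\chi ^2$-approximation can be sketched as follows. This is a generic illustration, not the authors' implementation: the function names and the coefficient lists are hypothetical, and the mixture is assumed to have the form $\sum _r c_r A_r$ with independent $A_r\sim \chi ^2_1$ and positive coefficients $c_r$. In the article the degrees of freedom are estimated from the data rather than computed from known coefficients.

```python
def welch_satterthwaite(coeffs):
    """Approximate sum_r c_r * A_r (A_r iid chi2 with 1 df) by beta * chi2_d.

    beta and d are chosen so that the mean (sum of c_r) and the variance
    (2 * sum of c_r^2) of the mixture are matched exactly.
    """
    s1 = sum(coeffs)                 # mean of the mixture
    s2 = sum(c * c for c in coeffs)  # half of the variance of the mixture
    beta = s2 / s1                   # scale of the approximating chi-square
    d = s1 * s1 / s2                 # approximate degrees of freedom
    return beta, d


def f_reference_df(numerator_coeffs, denominator_coeffs):
    """Degrees of freedom (d1, d2) of the approximating F-distribution.

    After each mixture is divided by its mean, it behaves like chi2_d / d,
    so the ratio of the two independent mixtures is approximated by F(d1, d2).
    """
    _, d1 = welch_satterthwaite(numerator_coeffs)
    _, d2 = welch_satterthwaite(denominator_coeffs)
    return d1, d2


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    print(welch_satterthwaite([3.0, 1.0]))
    print(f_reference_df([3.0, 1.0], [1.0] * 20))
```

With equal coefficients the mixture is exactly a scaled $\chi ^2$ variable, so the approximation returns the number of terms as the degrees of freedom; unequal coefficients always give fewer.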

References

Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley series in probability and statistics. Wiley, Hoboken
Bai ZD, Saranadasa H (1996) Effect of high dimension: by an example of a two sample problem. Stat Sin 6(2):311–329
Chen SX, Qin Y-L (2010) A two-sample test for high-dimensional data with applications to gene-set testing. Ann Stat 38(2):808–835
Dempster AP (1958) A high dimensional two sample significance test. Ann Math Stat 29(4):995–1010
Dempster AP (1960) A significance test for the separation of two highly multivariate small samples. Biometrics 16(1):41–50
Fisher RA (1935) The fiducial argument in statistical inference. Ann Eugen 6(4):391–398
Fisher RA (1939) The comparison of samples with possibly unequal variances. Ann Eugen 9(2):174–180
James G (1954) Tests of linear hypotheses in univariate and multivariate analysis when the ratios of the population variances are unknown. Biometrika 41(1/2):19–43
Johansen S (1980) The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika 67(1):85–92
Liu X, Guo J, Zhou B, Zhang J-T (2016) Two simple tests for heteroscedastic two-way ANOVA. Stat Res Lett 5(6):6–16
Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biom Bull 2(6):110–114
Scheffé H (1970) Practical solutions of the Behrens-Fisher problem. J Am Stat Assoc 65(332):1501–1508
Srivastava MS, Fujikoshi Y (2006) Multivariate analysis of variance with fewer observations than the dimension. J Multivar Anal 97(9):1927–1940. https://doi.org/10.1016/j.jmva.2005.08.010
Tang S, Tsui K-W (2007) Distributional properties for the generalized p-value for the Behrens-Fisher problem. Stat Probab Lett 77(1):1–8. https://doi.org/10.1016/j.spl.2006.05.005
Thair SA, He YD, Hasin-Brumshtein Y, Sakaram S, Pandya R, Toh J, Rawling D, Remmel M, Coyle S, Dalekos GN (2021) Transcriptomic similarities and differences in host response between SARS-CoV-2 and other viral infections. Iscience 24(1):101947
Welch BL (1947) The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34(1/2):28–35
Yao Y (1965) An approximate degrees of freedom solution to the multivariate Behrens-Fisher problem. Biometrika 52(1/2):139–147
Zhang J-T (2005) Approximate and asymptotic distributions of chi-squared-type mixtures with applications. J Am Stat Assoc 100(469):273–285
Zhang J-T (2011) Two-way MANOVA with unequal cell sizes and unequal cell covariance matrices. Technometrics 53(4):426–439
Zhang J-T (2012) An approximate Hotelling $T^2$-test for heteroscedastic one-way MANOVA. Open J Stat 2(1):1–11
Zhang J-T (2013) Analysis of variance for functional data. Chapman and Hall/CRC, New York
Zhang J-T (2013) Tests of linear hypotheses in the ANOVA under heteroscedasticity. Int J Adv Stat Probab 1(2):9–24
Zhang J-T, Zhu T (2022) A further study on Chen-Qin’s test for two-sample Behrens-Fisher problems for high-dimensional data. J Stat Theory Pract 16(1):1
Zhang J-T, Guo J, Zhou B, Liu X (2016) A modified Bartlett test for heteroscedastic two-way MANOVA. J Adv Stat 1(2):94–108
Zhang J-T, Guo J, Zhou B, Cheng M-Y (2020) A simple two-sample test in high dimensions based on $L^2$-norm. J Am Stat Assoc 115(530):1011–1027
Zhang J-T, Zhou B, Guo J, Zhu T (2021) Two-sample Behrens-Fisher problems for high-dimensional data: a normal reference approach. J Stat Plan Inference 213:142–161

Acknowledgements

Zhang and Zhu’s research was partially supported by the National University of Singapore academic research grant (22-5699-A0001) and the National Institute of Education (NIE) start-up grant (NIE-SUG 6-22 ZTM), respectively. The authors thank the Editor in Chief and the anonymous reviewers for their constructive comments and suggestions, which helped us improve the article substantially.

Author information

Authors and Affiliations

National Institute of Education, Nanyang Technological University, 1 Nanyang Walk, Singapore, 637616, Singapore
Tianming Zhu & Pengfei Wang
Department of Statistics and Data Science, National University of Singapore, 6 Science Drive 2, Singapore, 117546, Singapore
Jin-Ting Zhang

Corresponding author

Correspondence to Tianming Zhu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (zip 3 KB)

Appendix A. Technical proofs

Proof of Theorem 1

We apply Theorems 1 and 2 of Zhang et al. (2021) to prove this theorem. Under Conditions C1–C3, Theorem 2 of Zhang et al. (2021) shows that ${\text {tr}}(\hat{{{\varvec{\Sigma }}}}_n)$ is ratio-consistent for ${\text {tr}}({{\varvec{\Sigma }}}_n)$ uniformly for all $p$. We can therefore write

$$\begin{aligned} F_{n,p,0}=\frac{T_{n,p,0}}{{\text {tr}}({{\varvec{\Sigma }}}_n)}[1+o_p(1)],\; \text{ and } \;\tilde{F}_{n,p,0}=\frac{T_{n,p,0}-{\text {tr}}({{\varvec{\Sigma }}}_n)}{\sqrt{2{\text {tr}}({{\varvec{\Sigma }}}_n^2)}}[1+o_p(1)]. \end{aligned}$$

From (13), we have ${\text {E}}(S_{n,p,0}^*)={\text {tr}}({{\varvec{\Sigma }}}_n)$ and

$$\begin{aligned} {\text {Var}}\left[ S_{n,p,0}^*/{\text {tr}}({{\varvec{\Sigma }}}_n)\right] =\frac{2[n_2^{2}(n_1-1)^{-1}{\text {tr}}({{\varvec{\Sigma }}}_1^2)+n_1^{2}(n_2-1)^{-1}{\text {tr}}({{\varvec{\Sigma }}}_2^2)]}{n_2^{2}{\text {tr}}^2({{\varvec{\Sigma }}}_1)+n_1^{2}{\text {tr}}^2({{\varvec{\Sigma }}}_2)+2n_1n_2{\text {tr}}({{\varvec{\Sigma }}}_1){\text {tr}}({{\varvec{\Sigma }}}_2)}. \end{aligned}$$

Under Condition C3, as $n\rightarrow \infty$, ${\text {Var}}\left[ S_{n,p,0}^*/{\text {tr}}({{\varvec{\Sigma }}}_n)\right] \rightarrow 0$ uniformly for all $p$. Since ${\text {E}}\left[ S_{n,p,0}^*/{\text {tr}}({{\varvec{\Sigma }}}_n)\right] =1$, Chebyshev's inequality then gives $S_{n,p,0}^*/{\text {tr}}({{\varvec{\Sigma }}}_n){\mathop {\longrightarrow }\limits ^{P}}1$ uniformly for all $p$. Therefore, we can write
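The uniform convergence of the variance can be checked numerically. Because ${\text {tr}}({{\varvec{\Sigma }}}_i^2)\le {\text {tr}}^2({{\varvec{\Sigma }}}_i)$ for any covariance matrix, the variance displayed above is at most $2/[\min (n_1,n_2)-1]$, a bound that does not involve $p$. The sketch below evaluates both quantities; the function names are hypothetical, the traces are passed in as plain numbers, and it is assumed (Condition C3 is not restated in this excerpt) that C3 forces $\min (n_1,n_2)\rightarrow \infty$.

```python
def denominator_variance(n1, n2, tr1, tr2, tr1_sq, tr2_sq):
    """Var[S* / tr(Sigma_n)] as given by the displayed formula.

    tr1, tr2       : tr(Sigma_1), tr(Sigma_2)
    tr1_sq, tr2_sq : tr(Sigma_1^2), tr(Sigma_2^2)
    """
    numerator = 2.0 * (n2**2 * tr1_sq / (n1 - 1) + n1**2 * tr2_sq / (n2 - 1))
    # n2^2 tr1^2 + n1^2 tr2^2 + 2 n1 n2 tr1 tr2 is a perfect square.
    denominator = (n2 * tr1 + n1 * tr2) ** 2
    return numerator / denominator


def variance_bound(n1, n2):
    """Dimension-free upper bound, valid because tr(S^2) <= tr(S)^2."""
    return 2.0 / (min(n1, n2) - 1)


if __name__ == "__main__":
    # Illustrative case: identity covariance matrices, so tr = tr of square = p.
    for p in (5, 50, 500):
        var = denominator_variance(11, 11, p, p, p, p)
        print(p, var, variance_bound(11, 11))
```

If the denominator is matched to $\chi ^2_{d}/d$ by its first two moments, as in a Welch–Satterthwaite approximation, then $2$ divided by this variance plays the role of the degrees of freedom; that reading is an interpretation of the formula, not a statement taken from this excerpt.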

$$\begin{aligned} F^*_{n,p,0}=\frac{T^*_{n,p,0}}{{\text {tr}}({{\varvec{\Sigma }}}_n)}[1+o_p(1)],\; \text{ and } \;\tilde{F}^*_{n,p,0}=\frac{T^*_{n,p,0}-{\text {tr}}({{\varvec{\Sigma }}}_n)}{\sqrt{2{\text {tr}}({{\varvec{\Sigma }}}_n^2)}}[1+o_p(1)]. \end{aligned}$$

Then under Conditions C1–C4, as $n,p\rightarrow \infty$, Theorem 1(a) and (17) follow directly from Theorem 1(a) of Zhang et al. (2021), and under Conditions C1–C3 and C5, as $n,p\rightarrow \infty$, Theorem 1(b) and (17) follow directly from Theorem 1(b) of Zhang et al. (2021). $\square$

Proof of Theorem 2

We first prove (a). Under Conditions C1–C4, Theorem 1(a) indicates that as $n,p\rightarrow \infty$ we have $(F_{n,p,0}-1)/\sqrt{2/d_1}{\mathop {\longrightarrow }\limits ^{L}}\zeta$. In addition, under Conditions C1–C3, as $n\rightarrow \infty$, we have $\hat{d}_1/d_1{\mathop {\longrightarrow }\limits ^{P}}1$ and $\hat{d}_2/d_2{\mathop {\longrightarrow }\limits ^{P}}1$ uniformly for all $p$. Therefore, under the given conditions, we have

$$\begin{aligned} \begin{aligned}&\quad \Pr \left[ F_{n,p}\ge F_{\hat{d}_1,\hat{d}_2}(\alpha )\right] \\&=\Pr \left[ F_{n,p,0}\ge F_{\hat{d}_1,\hat{d}_2}(\alpha )-\frac{n_1n_2n^{-1}\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2}{{\text {tr}}(\hat{{{\varvec{\Sigma }}}}_n)}\right] [1+o(1)]\\&=\Pr \left[ \frac{F_{n,p,0}-1}{\sqrt{2/d_1}}\ge \frac{F_{\hat{d}_1,\hat{d}_2}(\alpha )-1}{\sqrt{2/d_1}}-\frac{n_1n_2n^{-1}\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2}{\sqrt{2/d_1}{\text {tr}}(\hat{{{\varvec{\Sigma }}}}_n)}\right] [1+o(1)]\\&=\Pr \left\{ \zeta \ge \frac{F_{d_1,d_2}(\alpha )-1}{\sqrt{2/d_1}}-\frac{n\tau (1-\tau )\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2}{\left[ 2{\text {tr}}({{\varvec{\Sigma }}}^2)\right] ^{1/2}}\right\} [1+o(1)].\\ \end{aligned} \end{aligned}$$

Next we prove (b). Under Conditions C1–C3 and C5, Theorem 1(b) indicates that as $n\rightarrow \infty$, we have $(F_{n,p,0}-1)/\sqrt{2/d_1}{\mathop {\longrightarrow }\limits ^{L}}\mathcal {N}(0,1)$. By Remark 2, we have $[F_{d_1,d_2}(\alpha )-1]/\sqrt{2/d_1}\rightarrow z_\alpha$ when $d_2\rightarrow \infty$. Therefore, under the given conditions, we have

$$\begin{aligned} \begin{aligned}&\quad \Pr \left[ F_{n,p}\ge F_{\hat{d}_1,\hat{d}_2}(\alpha )\right] \\&=\Pr \left[ F_{n,p,0}\ge F_{\hat{d}_1,\hat{d}_2}(\alpha )-\frac{n_1n_2n^{-1}\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2}{{\text {tr}}(\hat{{{\varvec{\Sigma }}}}_n)}\right] [1+o(1)]\\&=\Pr \left[ \frac{F_{n,p,0}-1}{\sqrt{2/d_1}}\ge \frac{F_{\hat{d}_1,\hat{d}_2}(\alpha )-1}{\sqrt{2/d_1}}-\frac{n_1n_2n^{-1}\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2}{\sqrt{2/d_1}{\text {tr}}(\hat{{{\varvec{\Sigma }}}}_n)}\right] [1+o(1)]\\&=\Phi \left\{ -z_{\alpha }+\frac{n\tau (1-\tau )\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2}{\left[ 2{\text {tr}}({{\varvec{\Sigma }}}^2)\right] ^{1/2}}\right\} [1+o(1)],\\ \end{aligned} \end{aligned}$$

where $\Phi (\cdot )$ denotes the cumulative distribution function of $\mathcal {N}(0,1)$. $\square$
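The normal-limit power expression in part (b) is straightforward to evaluate. The following sketch uses only the Python standard library; the function name and the example inputs are hypothetical, $z_\alpha$ is taken to be the upper $100\alpha$ percentile of $\mathcal {N}(0,1)$, and ${\text {tr}}({{\varvec{\Sigma }}}^2)$ is supplied directly as a number because the definition of ${{\varvec{\Sigma }}}$ is not restated in this excerpt.

```python
from statistics import NormalDist


def asymptotic_power(n, tau, delta_sq, tr_sigma_sq, alpha=0.05):
    """Leading term of the power in Theorem 2(b).

    n           : total sample size n1 + n2
    tau         : limiting proportion n1 / n, strictly between 0 and 1
    delta_sq    : squared Euclidean norm of mu_1 - mu_2
    tr_sigma_sq : tr(Sigma^2), passed in as a known quantity
    alpha       : nominal significance level
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # upper alpha quantile
    shift = n * tau * (1.0 - tau) * delta_sq / (2.0 * tr_sigma_sq) ** 0.5
    return std_normal.cdf(-z_alpha + shift)


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the article.
    for delta_sq in (0.0, 0.2, 0.5, 1.0):
        print(delta_sq, asymptotic_power(100, 0.5, delta_sq, 50.0))
```

When $\Vert {{\varvec{\mu }}}_1-{{\varvec{\mu }}}_2\Vert ^2=0$ the expression reduces to the nominal level $\alpha$, and it increases with the squared mean difference and with $n$.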

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhu, T., Wang, P. & Zhang, JT. Two-sample Behrens–Fisher problems for high-dimensional data: a normal reference F-type test. Comput Stat 39, 3207–3230 (2024). https://doi.org/10.1007/s00180-023-01433-6

Received: 28 December 2022
Accepted: 23 October 2023
Published: 24 November 2023
Issue Date: September 2024
DOI: https://doi.org/10.1007/s00180-023-01433-6
