Kernel density estimation of allele frequency including undetected alleles
PeerJ. 2024 Apr 22:12:e17248.
doi: 10.7717/peerj.17248. eCollection 2024.


Satoshi Aoki et al. PeerJ. 2024.

Abstract

Whereas undetected species contribute to the estimation of species diversity, undetected alleles have not been used to estimate genetic diversity. Although random sampling guarantees unbiased estimation of allele frequency and genetic diversity measures, using undetected alleles may provide biased but more precise estimators that are useful for conservation. We devised a new kernel density estimation (KDE) method for allele frequency that includes undetected alleles, and tested it in the estimation of allele frequency and nucleotide diversity using populations generated by coalescent simulation as well as real population data. Contrary to expectations, nucleotide diversity estimated by KDE had worse bias and accuracy. Allele frequency estimated by KDE was also worse, except when the sample size was small. These results might be due to the finiteness of the population and/or the curse of dimensionality. In conclusion, KDE of allele frequency does not contribute to genetic diversity estimation.

Keywords: Allele frequency; Genetic diversity; Kernel density estimation.
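
For readers unfamiliar with nucleotide diversity, the following is a minimal sketch of the standard frequency-based definition (the sum over allele pairs of the product of their frequencies and their per-site difference). It is not taken from the paper: the function name `nucleotide_diversity` and the two-allele input are illustrative assumptions, and no sample-size correction is applied.

```python
def nucleotide_diversity(freqs):
    """Frequency-based nucleotide diversity (assumed form, no sample-size correction).

    freqs maps each allele (equal-length sequence string) to its frequency.
    """
    alleles = list(freqs)
    length = len(alleles[0])
    pi = 0.0
    # Sum over ordered pairs; identical alleles contribute zero.
    for a in alleles:
        for b in alleles:
            differences = sum(x != y for x, y in zip(a, b))
            pi += freqs[a] * freqs[b] * differences / length
    return pi


# Hypothetical sample: alleles "aa" and "at" at equal frequency.
sample_freqs = {"aa": 0.5, "at": 0.5}
pi_sample = nucleotide_diversity(sample_freqs)
```

The same function accepts any frequency table, so in principle it could be applied to sample frequencies and to KDE-smoothed frequencies alike for comparison.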


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Sample allele frequencies and allele frequencies estimated by KDE on sequence space using the least squares cross validation.
The original samples are three “aa” and three “at” sequences, shown as gray bars; the estimated frequencies are shown as a line graph. The alleles “ag” and “ac” have slightly higher frequencies because each is at distance one from both detected alleles, “aa” and “at”. The alleles “gg” to “tc” have the lowest frequencies because they are the most distant from the detected alleles.
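
The idea behind Figure 1 can be illustrated with a minimal sketch. It assumes a simple kernel whose weight decays geometrically with Hamming distance and a fixed, hand-picked `decay` value; the paper instead selects its bandwidth by cross validation, and its actual kernel is not given here, so the function `kde_allele_frequencies` and its parameters are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def kde_allele_frequencies(samples, alphabet="acgt", decay=0.2):
    """Smooth sample counts over every possible sequence of the same length.

    Assumed kernel: each sample contributes decay**distance to an allele,
    so undetected alleles near detected ones receive non-zero frequency.
    """
    length = len(samples[0])
    alleles = ["".join(p) for p in product(alphabet, repeat=length)]
    raw = {a: sum(decay ** hamming(a, s) for s in samples) for a in alleles}
    total = sum(raw.values())
    # Normalize so the estimated frequencies sum to one.
    return {a: weight / total for a, weight in raw.items()}


# The sample from Figure 1: three "aa" and three "at" sequences.
samples = ["aa"] * 3 + ["at"] * 3
freqs = kde_allele_frequencies(samples)
```

Under these assumptions the detected alleles keep the highest frequencies, “ag” and “ac” come next because they are one step from both detected alleles, and alleles such as “gg” that are two steps from both receive the lowest frequencies, which matches the qualitative pattern described in the caption.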
Figure 2. Sample allele frequencies and allele frequencies estimated by KDE on distance space using the least squares cross validation.
The original samples are three “aa” and three “at” sequences, shown as gray bars; the estimated frequencies are shown as a line graph.
Figure 3. The accuracy of nucleotide diversity under mutation rate 0.01.
The five numbered groups correspond to the following parameter sets: 1. Without KDE. 2. KDE using LSCV and mutation number 1. 3. KDE using LSCV and mutation number 2. 4. KDE using LCV and mutation number 1. 5. KDE using LCV and mutation number 2.
Figure 4. The accuracy of nucleotide diversity under mutation rate 0.1.
The five numbered groups correspond to the following parameter sets: 1. Without KDE. 2. KDE using LSCV and mutation number 1. 3. KDE using LSCV and mutation number 2. 4. KDE using LCV and mutation number 1. 5. KDE using LCV and mutation number 2.
Figure 5. The allele frequency concordance rate under mutation rate 0.01.
The five numbered groups correspond to the following parameter sets: 1. Without KDE. 2. KDE using LSCV and mutation number 1. 3. KDE using LSCV and mutation number 2. 4. KDE using LCV and mutation number 1. 5. KDE using LCV and mutation number 2.
Figure 6. The allele frequency concordance rate under mutation rate 0.1.
The five numbered groups correspond to the following parameter sets: 1. Without KDE. 2. KDE using LSCV and mutation number 1. 3. KDE using LSCV and mutation number 2. 4. KDE using LCV and mutation number 1. 5. KDE using LCV and mutation number 2.
Figure 7. The accuracy of nucleotide diversity in the real data experiment.
The three numbered groups correspond to the following parameter sets: 1. Without KDE. 2. KDE using LSCV and mutation number 1. 3. KDE using LCV and mutation number 1. The error bars show the standard deviations.
Figure 8. The allele frequency concordance rate in the real data experiment.
The three numbered groups correspond to the following parameter sets: 1. Without KDE. 2. KDE using LSCV and mutation number 1. 3. KDE using LCV and mutation number 1. The error bars show the standard deviations.



Grants and funding

This study was supported by JSPS KAKENHI Grant Number JP22J00445. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
