{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,7]],"date-time":"2024-09-07T00:34:47Z","timestamp":1725669287532},"reference-count":45,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2024,2,15]],"date-time":"2024-02-15T00:00:00Z","timestamp":1707955200000},"content-version":"vor","delay-in-days":14,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,7,3]]},"abstract":"Precisely defining and mapping all cytosine (C) positions and their clusters, known as CpG islands (CGIs), as well as their methylation status, are pivotal for genome-wide epigenetic studies, especially when population-centric reference genomes are ready for timely application. Here, we first align the two high-quality reference genomes, T2T-YAO and T2T-CHM13, from different ethnic backgrounds in a base-by-base fashion and compute their genome-wide density-defined and position-defined CGIs. Second, by mapping some representative genome-wide methylation data from selected organs onto the two genomes, we find that there are about 4.7%\u20135.8% sequence divergency of variable categories depending on quality cutoffs. Genes among the divergent sequences are mostly associated with neurological functions. Moreover, CGIs associated with the divergent sequences are significantly different with respect to CpG density and observed CpG\/expected CpG (O\/E) ratio between the two genomes. Finally, we find that the T2T-YAO genome not only has a greater CpG coverage than that of the T2T-CHM13 genome when whole-genome bisulfite sequencing (WGBS) data from the European and American populations are mapped to each reference, but also shows more hyper-methylated CpG sites as compared to the T2T-CHM13 genome. 
Our study suggests that future genome-wide epigenetic studies of the Chinese populations rely on both acquisition of high-quality methylation data and subsequent precision CGI mapping based on the Chinese T2T reference.","DOI":"10.1093\/gpbjnl\/qzae009","type":"journal-article","created":{"date-parts":[[2024,2,2]],"date-time":"2024-02-02T14:31:55Z","timestamp":1706884315000},"source":"Crossref","is-referenced-by-count":0,"title":["CpG Island Definition and Methylation Mapping of the T2T-YAO Genome"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-8608-5903","authenticated-orcid":false,"given":"Ming","family":"Xiao","sequence":"first","affiliation":[{"name":"College of Computer Science, Sichuan University , Chengdu 610065, China"}]},{"ORCID":"http:\/\/orcid.org\/0009-0006-8597-5924","authenticated-orcid":false,"given":"Rui","family":"Wei","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University , Chengdu 610065, China"},{"name":"Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences , Hangzhou 310024, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-7599-2700","authenticated-orcid":false,"given":"Jun","family":"Yu","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation , Beijing 100101, China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049, China"}]},{"ORCID":"http:\/\/orcid.org\/0009-0000-3797-7040","authenticated-orcid":false,"given":"Chujie","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University , Chengdu 610065, 
China"}]},{"ORCID":"http:\/\/orcid.org\/0009-0008-4105-030X","authenticated-orcid":false,"given":"Fengyi","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University , Chengdu 610065, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-3708-1727","authenticated-orcid":false,"given":"Le","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University , Chengdu 610065, China"},{"name":"Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences , Hangzhou 310024, China"}]}],"member":"286","published-online":{"date-parts":[[2024,2,1]]},"reference":[{"key":"2024081700404836300_qzae009-B1","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1016\/j.bpj.2016.12.029","article-title":"Optical trapping nanometry of hypermethylated CPG-Island DNA","volume":"112","author":"Pongor","year":"2017","journal-title":"Biophys J"},{"key":"2024081700404836300_qzae009-B2","first-page":"319","article-title":"Position-defined CpG islands provide complete co-methylation indexing for human genes","author":"Xiao","year":"2022","journal-title":"International Conference on Intelligent Computing"},{"key":"2024081700404836300_qzae009-B3","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1016\/S0140-6736(18)31268-6","article-title":"Principles of DNA methylation and their implications for biology and medicine","volume":"392","author":"Dor","year":"2018","journal-title":"Lancet"},{"key":"2024081700404836300_qzae009-B4","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.cell.2013.12.019","article-title":"Reversing DNA methylation: mechanisms, genomics, and biological 
functions","volume":"156","author":"Wu","year":"2014","journal-title":"Cell"},{"key":"2024081700404836300_qzae009-B5","doi-asserted-by":"crossref","first-page":"2655","DOI":"10.1038\/s41467-021-22639-6","article-title":"The genomic loci of specific human tRNA genes exhibit ageing-related DNA hypermethylation","volume":"12","author":"Acton","year":"2021","journal-title":"Nat Commun"},{"key":"2024081700404836300_qzae009-B6","doi-asserted-by":"crossref","first-page":"23421","DOI":"10.1038\/srep23421","article-title":"CpG methylation patterns of human mitochondrial DNA","volume":"6","author":"Liu","year":"2016","journal-title":"Sci Rep"},{"key":"2024081700404836300_qzae009-B7","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1086\/302065","article-title":"Methylation levels at selected CpG sites in the factor VIII and FGFR3 genes, in mature female and male germ cells: implications for male-driven evolution","volume":"63","author":"El-Maarri","year":"1998","journal-title":"Am J Hum Genet"},{"key":"2024081700404836300_qzae009-B8","doi-asserted-by":"crossref","first-page":"eabj5089","DOI":"10.1126\/science.abj5089","article-title":"Epigenetic patterns in a complete human genome","volume":"376","author":"Gershman","year":"2022","journal-title":"Science"},{"key":"2024081700404836300_qzae009-B9","doi-asserted-by":"crossref","first-page":"eabl3533","DOI":"10.1126\/science.abl3533","article-title":"A complete reference genome improves analysis of human genetic variation","volume":"376","author":"Aganezov","year":"2022","journal-title":"Science"},{"key":"2024081700404836300_qzae009-B10","doi-asserted-by":"crossref","first-page":"1085","DOI":"10.1016\/j.gpb.2023.08.001","article-title":"T2T-YAO: a telomere-to-telomere assembled diploid reference genome for Han Chinese","volume":"21","author":"He","year":"2023","journal-title":"Genomics Proteomics 
Bioinformatics"},{"key":"2024081700404836300_qzae009-B11","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1093\/bib\/bbz134","article-title":"CpG-island-based annotation and analysis of human housekeeping genes","volume":"22","author":"Zhang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024081700404836300_qzae009-B12","doi-asserted-by":"crossref","first-page":"2148","DOI":"10.1109\/TCBB.2019.2935971","article-title":"CGIDLA: developing the web server for CpG island related density and LAUPs (Lineage-associated Underrepresented Permutations) study","volume":"17","author":"Xiao","year":"2019","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2024081700404836300_qzae009-B13","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1002\/ijc.33860","article-title":"A disease network-based deep learning approach for characterizing melanoma","volume":"150","author":"Lai","year":"2022","journal-title":"Int J Cancer"},{"key":"2024081700404836300_qzae009-B14","doi-asserted-by":"crossref","first-page":"1554","DOI":"10.1093\/bioinformatics\/btz542","article-title":"Revealing dynamic regulations and the related key proteins of myeloma-initiating cells by integrating experimental data into a systems biological model","volume":"37","author":"Zhang","year":"2021","journal-title":"Bioinformatics"},{"key":"2024081700404836300_qzae009-B15","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.csbj.2020.02.008","article-title":"Exploring the computational methods for protein-ligand binding site prediction","volume":"18","author":"Zhao","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2024081700404836300_qzae009-B16","first-page":"1","article-title":"Building up a robust risk mathematical platform to predict colorectal 
cancer","volume":"2017","author":"Zhang","year":"2017","journal-title":"Complexity"},{"key":"2024081700404836300_qzae009-B17","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1093\/jmcb\/mjx056","article-title":"EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients","volume":"9","author":"Zhang","year":"2017","journal-title":"J Mol Cell Biol"},{"key":"2024081700404836300_qzae009-B18","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1186\/s12864-016-3256-3","article-title":"Exploring the key genes and signaling transduction pathways related to the survival time of glioblastoma multiforme patients by a novel survival analysis model","volume":"18","author":"Xia","year":"2017","journal-title":"BMC Genomics"},{"key":"2024081700404836300_qzae009-B19","doi-asserted-by":"crossref","first-page":"3624","DOI":"10.1093\/bioinformatics\/bty392","article-title":"Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA)","volume":"34","author":"Zhang","year":"2018","journal-title":"Bioinformatics"},{"key":"2024081700404836300_qzae009-B20","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1038\/s41438-021-00494-2","article-title":"Bioinformatic analysis of chromatin organization and biased expression of duplicated genes between two poplars with a common whole-genome duplication","volume":"8","author":"Zhang","year":"2021","journal-title":"Hortic Res"},{"key":"2024081700404836300_qzae009-B21","doi-asserted-by":"crossref","first-page":"e1007069","DOI":"10.1371\/journal.pcbi.1007069","article-title":"Comprehensively benchmarking applications for detecting copy number variation","volume":"15","author":"Zhang","year":"2019","journal-title":"PLoS Comput Biol"},{"key":"2024081700404836300_qzae009-B22","doi-asserted-by":"crossref","first-page":"673","DOI":"10.26599\/TST.2022.9010038","article-title":"Developing a 
physiological signal-based, mean threshold and decision-level fusion algorithm (PMD) for emotion recognition","volume":"28","author":"Zhang","year":"2023","journal-title":"Tsinghua Sci Technol"},{"key":"2024081700404836300_qzae009-B23","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad121","article-title":"NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes","volume":"39","author":"He","year":"2023","journal-title":"Bioinformatics"},{"key":"2024081700404836300_qzae009-B24","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/s41586-023-05976-y","article-title":"Recombination between heterologous human acrocentric chromosomes","volume":"617","author":"Guarracino","year":"2023","journal-title":"Nature"},{"key":"2024081700404836300_qzae009-B25","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: Gene Ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024081700404836300_qzae009-B26","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1186\/1471-2105-7-446","article-title":"CpGcluster: a distance-based algorithm for CpG-island detection","volume":"7","author":"Hackenberg","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2024081700404836300_qzae009-B27","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1038\/s41392-022-00994-0","article-title":"Artificial intelligence in cancer target identification and drug discovery","volume":"7","author":"You","year":"2022","journal-title":"Signal Transduct Target Ther"},{"key":"2024081700404836300_qzae009-B28","doi-asserted-by":"crossref","first-page":"e202101302","DOI":"10.26508\/lsa.202101302","article-title":"Expression regulation of genes is linked to their CpG density distributions around transcription start 
sites","volume":"5","author":"Tian","year":"2022","journal-title":"Life Sci Alliance"},{"key":"2024081700404836300_qzae009-B29","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.csbj.2022.11.037","article-title":"A network medicine approach for identifying diagnostic and prognostic biomarkers and exploring drug repurposing in human cancer","volume":"21","author":"Zhang","year":"2023","journal-title":"Comput Struct Biotechnol J"},{"key":"2024081700404836300_qzae009-B30","doi-asserted-by":"crossref","first-page":"1651","DOI":"10.3390\/e24111651","article-title":"Spatiotemporal transformer neural network for time-series forecasting","volume":"24","author":"You","year":"2022","journal-title":"Entropy (Basel)"},{"key":"2024081700404836300_qzae009-B31","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.neucom.2020.10.118","article-title":"Denoising of MR and CT images using cascaded multi-supervision convolutional neural networks with progressive training","volume":"469","author":"Song","year":"2022","journal-title":"Neurocomputing"},{"key":"2024081700404836300_qzae009-B32","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1016\/j.eng.2019.06.008","article-title":"A brief review of artificial intelligence applications and algorithms for psychiatric disorders","volume":"6","author":"Liu","year":"2020","journal-title":"Engineering"},{"key":"2024081700404836300_qzae009-B33","doi-asserted-by":"crossref","first-page":"1123652","DOI":"10.3389\/fimmu.2023.1123652","article-title":"Discovering hematoma-stimulated circuits for secondary brain injury after intraventricular hemorrhage by spatial transcriptome analysis","volume":"14","author":"Zhang","year":"2023","journal-title":"Front Immunol"},{"key":"2024081700404836300_qzae009-B34","doi-asserted-by":"crossref","first-page":"3092","DOI":"10.1016\/j.apsb.2021.05.032","article-title":"MCDB: a comprehensive curated mitotic catastrophe database for retrieval, protein sequence alignment, and 
target prediction","volume":"11","author":"Zhang","year":"2021","journal-title":"Acta Pharm Sin B"},{"key":"2024081700404836300_qzae009-B35","doi-asserted-by":"crossref","first-page":"D501","DOI":"10.1093\/nar\/gki025","article-title":"NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"33","author":"Pruitt","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2024081700404836300_qzae009-B36","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1016\/j.gpb.2021.04.001","article-title":"Genome Warehouse:\u00a0","volume":"19","author":"Chen","year":"2021","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2024081700404836300_qzae009-B37","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/0022-2836(87)90689-9","article-title":"CpG islands in vertebrate genomes","volume":"196","author":"Gardiner-Garden","year":"1987","journal-title":"J Mol Biol"},{"key":"2024081700404836300_qzae009-B38","doi-asserted-by":"crossref","first-page":"D1115","DOI":"10.1093\/nar\/gkab959","article-title":"The UCSC genome browser database: 2022 update","volume":"50","author":"Lee","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024081700404836300_qzae009-B39","doi-asserted-by":"crossref","first-page":"D882","DOI":"10.1093\/nar\/gkz1062","article-title":"New developments on the Encyclopedia of DNA Elements (ENCODE) data portal","volume":"48","author":"Luo","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024081700404836300_qzae009-B40","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1038\/cr.2011.23","article-title":"Regulation and function of DNA methylation in plants and animals","volume":"21","author":"He","year":"2011","journal-title":"Cell Res"},{"key":"2024081700404836300_qzae009-B41","doi-asserted-by":"crossref","first-page":"e1005944","DOI":"10.1371\/journal.pcbi.1005944","article-title":"MUMmer4: a fast and versatile genome alignment 
system","volume":"14","author":"Mar\u00e7ais","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2024081700404836300_qzae009-B42","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1089\/omi.2011.0118","article-title":"clusterProfiler: an R package for comparing biological themes among gene clusters","volume":"16","author":"Yu","year":"2012","journal-title":"OMICS"},{"key":"2024081700404836300_qzae009-B43","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1016\/S0168-9525(00)02024-2","article-title":"EMBOSS: the European molecular biology open software suite","volume":"16","author":"Rice","year":"2000","journal-title":"Trends Genet"},{"key":"2024081700404836300_qzae009-B44","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1093\/bioinformatics\/bty690","article-title":"gemBS: high throughput processing for DNA methylation data from bisulfite sequencing","volume":"35","author":"Merkel","year":"2019","journal-title":"Bioinformatics"},{"key":"2024081700404836300_qzae009-B45","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1038\/nature12433","article-title":"Charting a dynamic DNA methylation landscape of the human genome","volume":"500","author":"Ziller","year":"2013","journal-title":"Nature"}],"container-title":["Genomics, Proteomics & 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/gpb\/advance-article-pdf\/doi\/10.1093\/gpbjnl\/qzae009\/56682859\/qzae009.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gpb\/article-pdf\/22\/2\/qzae009\/58839389\/qzae009.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/gpb\/article-pdf\/22\/2\/qzae009\/58839389\/qzae009.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,17]],"date-time":"2024-08-17T00:41:04Z","timestamp":1723855264000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/gpb\/article\/doi\/10.1093\/gpbjnl\/qzae009\/7596638"}},"subtitle":[],"editor":[{"given":"Peng","family":"Cui","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,2,1]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,7,3]]}},"URL":"https:\/\/doi.org\/10.1093\/gpbjnl\/qzae009","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.12.02.568720","asserted-by":"object"}]},"ISSN":["1672-0229","2210-3244"],"issn-type":[{"type":"print","value":"1672-0229"},{"type":"electronic","value":"2210-3244"}],"subject":[],"published-other":{"date-parts":[[2024,4]]},"published":{"date-parts":[[2024,2,1]]}}}