MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions
PLoS Comput Biol. 2017 Jul 24;13(7):e1005647.
doi: 10.1371/journal.pcbi.1005647. eCollection 2017 Jul.


Koon-Kiu Yan et al. PLoS Comput Biol. 2017.

Abstract

Genome-wide proximity-ligation-based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). A visual examination of the chromosomal contact map, however, shows that the organization of these domains is neither simple nor obvious: TADs span various length scales and, in many cases, are nested. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component is the derivation of an appropriate background model for contacts in a random chain, obtained by numerically solving a set of matrix equations. The background model preserves both the observed coverage of each genomic bin and the distance dependence of the contact frequency between any pair of bins in the empirical map. In addition, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr", standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TFs) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions.
To further explore the interplay between TADs and epigenetic marks, and because tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework for exploring multi-scale structures in Hi-C contact maps.
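The background model described in the abstract can be illustrated with a small NumPy sketch. Purely as an assumption for illustration, it uses the product form E_ij = b_i · b_j · f(|i − j|), where the per-bin factors b carry coverage and f carries the distance decay, and fits it by alternating a damped (square-root) multiplicative step on b with an exact rescaling of f. At convergence, the row sums of E match the bin coverages and each diagonal of E sums to the observed contact total at that distance. The function name `background_model`, the functional form, and the update scheme are hypothetical; they are not the matrix equations solved in the paper. The sketch also assumes that every bin has nonzero coverage.

```python
import numpy as np


def background_model(W, n_iter=1000):
    """Hypothetical sketch: fit E_ij = b_i * b_j * f(|i - j|) to a symmetric
    contact map W so that E keeps W's per-bin coverage (row sums) and its
    per-distance totals (diagonal sums). Not the paper's exact equations."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])      # genomic distance |i - j| in bins
    coverage = W.sum(axis=1)                         # observed coverage of each bin
    diag_obs = np.bincount(dist.ravel(), weights=W.ravel(), minlength=n)
    b = np.ones(n)                                   # per-bin factors
    f = np.ones(n)                                   # distance-decay factors
    for _ in range(n_iter):
        # Multiplicative step on b toward the observed coverage; the square root
        # damps the update because b_i enters both row i and column i of E.
        E = np.outer(b, b) * f[dist]
        b *= np.sqrt(coverage / E.sum(axis=1))      # assumes coverage > 0
        # Exact rescaling of f so every diagonal of E sums to the observed total.
        diag_exp = np.bincount(dist.ravel(), weights=np.outer(b, b).ravel(), minlength=n)
        f = diag_obs / diag_exp
    return np.outer(b, b) * f[dist]


# Toy check: a map that already has this product form should be reproduced.
dist6 = np.abs(np.subtract.outer(np.arange(6), np.arange(6)))
c = np.array([1.0, 2.0, 3.0, 2.0, 1.0, 1.5])
W = np.outer(c, c) / (1.0 + dist6)
E = background_model(W)
print(np.abs(E.sum(axis=1) - W.sum(axis=1)).max())  # coverage mismatch, should be tiny
```

The square root on the b step is a design choice: because b_i multiplies both row i and column i, an undamped update tends to overshoot, while the halved step in log space increases the Poisson likelihood of the fit at every iteration.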


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of MrTADFinder.
The input of MrTADFinder is an intra-chromosomal contact map W, from which a null model E is derived. Given a particular resolution γ, the chromosome is partitioned such that the objective function Q is maximized. The optimization is performed by a modified Louvain algorithm, shown on the right. The algorithm is stochastic because nodes are updated in random order. After multiple trials, a boundary score is computed for every pair of adjacent bins; adjacent bins that are robustly assigned to two different TADs form a consensus boundary. The output of MrTADFinder is a set of consensus domains bounded by the consensus boundaries.
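The objective and the consensus step in this figure can be sketched in NumPy. The sketch assumes a generic resolution-modularity form, Q = (1/2m) Σ_ij (W_ij − γ E_ij) δ(c_i, c_j) with 2m = Σ_ij W_ij, and takes the boundary score between adjacent bins i and i + 1 to be the fraction of trials that place them in different domains. The function names, this exact normalization, and the 0.9 cut-off are hypothetical and may differ from MrTADFinder; the toy example uses the Newman-Girvan null k_i k_j / 2m as a stand-in for E, and the Louvain optimization itself is not reproduced.

```python
import numpy as np


def modularity(W, E, labels, gamma):
    """Resolution-modularity Q of a domain assignment (generic form, assumed)."""
    W = np.asarray(W, dtype=float)
    E = np.asarray(E, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]       # delta(c_i, c_j)
    return float(((W - gamma * E) * same).sum() / W.sum())


def boundary_scores(partitions):
    """Fraction of trials in which bins i and i+1 are assigned to different domains.

    partitions: array of shape (n_trials, n_bins) holding domain labels per trial.
    """
    P = np.asarray(partitions)
    return (P[:, 1:] != P[:, :-1]).mean(axis=0)


def consensus_boundaries(partitions, threshold=0.9):
    """Indices i with a boundary between bins i and i+1 in >= threshold of trials."""
    return np.flatnonzero(boundary_scores(partitions) >= threshold)


# Toy map: two blocks of three bins; the Newman-Girvan null stands in for E.
W = np.ones((6, 6))
W[:3, :3] = 5.0
W[3:, 3:] = 5.0
k = W.sum(axis=1)
E = np.outer(k, k) / W.sum()
print(modularity(W, E, [0, 0, 0, 1, 1, 1], gamma=1.0))  # 0.333... (two domains)
print(modularity(W, E, [0] * 6, gamma=1.0))             # 0.0 (one domain)

# Three hypothetical trials of a stochastic optimizer on the same six bins.
trials = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]]
print(consensus_boundaries(trials, threshold=0.6))      # [2]: between bins 2 and 3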
Fig 2. Identification of TADs in multiple resolutions.
A) A portion of the contact map of chromosome 10 in hES cells. The green triangles below represent TADs called by MrTADFinder at three different resolutions; they agree well visually with the contact map. The blue and red triangles represent TADs called in human ES cells and human IMR90 cells, respectively, as reported in [8]. B) Sizes of TADs called at different resolutions. The median TAD size decreases from 3 Mb to 300 kb as the resolution increases from 0.75 to 3.5. C) The number of TADs increases as the resolution increases. At γ = 2.25, there are about 2600 TADs in hES cells, with a median size of roughly 1 Mb; the median size falls to 300 kb when the resolution increases to 3.5. The number of TADs identified in [8] is marked by the arrow. Comparison of TADs called by MrTADFinder with those called in [8]: the two algorithms agree most closely at a particular resolution (γ ≈ 2.875).
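To see why raising γ yields more, smaller domains (panels B and C), the sketch below finds the best partition of a chromosome into contiguous domains for a given γ by exact dynamic programming over segments, scoring each domain by Σ_{i,j in domain} (W_ij − γ E_ij). This is a swapped-in technique, not the modified Louvain algorithm MrTADFinder uses, and the objective is an assumed, unnormalized generic resolution-modularity form. The function name `best_contiguous_domains`, the nested toy map, and the Newman-Girvan null standing in for E are all hypothetical.

```python
import numpy as np


def best_contiguous_domains(W, E, gamma):
    """Start indices of the contiguous domains maximizing
    sum over domains D of sum_{i,j in D} (W_ij - gamma * E_ij).
    Exact O(n^2) dynamic programming; a stand-in for MrTADFinder's Louvain step."""
    B = np.asarray(W, dtype=float) - gamma * np.asarray(E, dtype=float)
    n = B.shape[0]
    S = np.zeros((n + 1, n + 1))
    S[1:, 1:] = B.cumsum(axis=0).cumsum(axis=1)     # 2D prefix sums of B

    def block(a, b):
        """Sum of B over the square block of bins a..b-1."""
        return S[b, b] - S[a, b] - S[b, a] + S[a, a]

    best = np.full(n + 1, -np.inf)                  # best[b]: optimum for bins 0..b-1
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)               # start of the last domain
    for b in range(1, n + 1):
        for a in range(b):
            score = best[a] + block(a, b)
            if score > best[b]:
                best[b], prev[b] = score, a
    starts, b = [], n
    while b > 0:                                    # walk back through domain starts
        starts.append(int(prev[b]))
        b = prev[b]
    return starts[::-1]


# Toy nested map: four 2-bin sub-domains grouped into two 4-bin domains.
n = 8
W = np.ones((n, n))
W[:4, :4] += 2.0
W[4:, 4:] += 2.0
for s in range(0, n, 2):
    W[s:s + 2, s:s + 2] += 4.0
k = W.sum(axis=1)
E = np.outer(k, k) / W.sum()                        # Newman-Girvan null as a stand-in
for gamma in (0.2, 0.5, 1.5, 3.0):
    print(gamma, best_contiguous_domains(W, E, gamma))
```

On this toy map the optimum moves from a single domain at γ = 0.2 to two domains ([0, 4]) at γ = 0.5, the four sub-domains ([0, 2, 4, 6]) at γ = 1.5, and one domain per bin at γ = 3.0, mirroring the trend in panel C. Where the transitions occur depends entirely on the assumed null model.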
Fig 3. Boundary signatures of histone modifications in different resolutions.
A) Histone-modification peaks near TAD boundaries identified at various resolutions. The peak density is obtained by counting the peaks in each 40 kb bin and normalizing by a null model in which peaks are randomly distributed. B) Different histone marks show different levels of enrichment near TAD boundaries at different resolutions. Despite a general decreasing trend, the signal of certain marks, such as H3K27me3, remains flat up to a very high resolution.
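The peak-density normalization described in this caption can be sketched as follows: count peaks in 40 kb bins flanking each boundary, average over boundaries, and divide by the count expected in a 40 kb bin if the same number of peaks were placed uniformly at random along the chromosome. Using this analytic uniform expectation instead of explicit random shuffles, the function name `boundary_peak_enrichment`, and the number of flanking bins are assumptions for illustration.

```python
import numpy as np


def boundary_peak_enrichment(boundaries, peaks, chrom_length, bin_size=40_000, n_flank=5):
    """Mean peak count per bin around boundaries, relative to a uniform random null.

    Bin k (k = -n_flank .. n_flank-1) covers
    [boundary + k*bin_size, boundary + (k+1)*bin_size).
    Assumption: the uniform expectation n_peaks * bin_size / chrom_length replaces
    an explicit randomized null; bins falling off the chromosome ends are not corrected.
    """
    peaks = np.sort(np.asarray(peaks))
    offsets = np.arange(-n_flank, n_flank)
    counts = np.zeros(len(offsets))
    for b in boundaries:
        edges = b + np.arange(-n_flank, n_flank + 1) * bin_size
        # searchsorted counts the peaks left of each edge; diff gives per-bin counts.
        counts += np.diff(np.searchsorted(peaks, edges, side="left"))
    observed = counts / len(boundaries)
    expected = len(peaks) * bin_size / chrom_length
    return offsets, observed / expected


# Evenly spaced peaks should give a flat profile (ratio 1.0, no enrichment).
peaks = np.arange(5_000, 1_000_000, 10_000)
offsets, ratio = boundary_peak_enrichment([500_000], peaks, 1_000_000, n_flank=2)
print(offsets, ratio)
```

A ratio above 1 in the bins next to offset 0 would indicate that peaks are concentrated at boundaries more than random placement predicts.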
Fig 4
A) Distribution of housekeeping genes and tissue-specific genes near TAD boundaries at different resolutions. Housekeeping genes are more enriched near TAD boundaries than tissue-specific genes. B) Housekeeping and tissue-specific genes show different levels of enrichment near TAD boundaries at different resolutions. Tissue-specific genes show a general decreasing trend, whereas the number of housekeeping genes remains flat up to a high resolution.
Fig 5. Transcription factors binding in different resolutions.
A) Enrichment of HOT (high-occupancy target) and XOT (extreme-occupancy target) regions near TAD boundaries in hES cells. Boundaries are identified by MrTADFinder at resolution γ = 2.75. The y-axis is normalized by a null model in which peaks are randomly distributed along the chromosome. B) A logistic regression model classifying real TAD boundaries versus random boundaries from the binding patterns of 60 TFs. The most influential factors in TAD boundary formation at different resolutions are listed. Factors with a positive coefficient have a direct effect on boundary establishment or maintenance, whereas factors such as MYC have a negative effect. Factors are sorted by P-value, and only significant factors are displayed.
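The classifier in panel B can be illustrated with a minimal logistic regression fitted by batch gradient descent in NumPy, using one binary binding feature per TF and label 1 for real boundaries and 0 for random ones. The solver, learning rate, L2 penalty, function name `fit_logistic`, and the synthetic data with three hypothetical factors are illustrative assumptions; the sketch does not compute the P-values used to rank factors in the figure.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=3000, l2=1e-3):
    """L2-penalized logistic regression by batch gradient descent.

    X: (n_regions, n_factors) binding indicators; y: 1 = TAD boundary, 0 = random region.
    Returns (intercept, coefficients); positive coefficients favor boundaries.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))          # predicted probability of "boundary"
        grad = Xb.T @ (p - y) / len(y)               # gradient of the mean log-loss
        grad[1:] += l2 * w[1:]                       # penalize coefficients, not the intercept
        w -= lr * grad
    return w[0], w[1:]


# Synthetic data with three hypothetical factors: the first promotes boundaries,
# the second opposes them (as MYC does in the figure), and the third is irrelevant.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 3))
logit = -0.5 + 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
intercept, coef = fit_logistic(X, y)
print(intercept, coef)
```

Plain gradient descent keeps the sketch dependency-free; in practice a library solver that also reports standard errors would be needed to rank factors by P-value.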
Fig 6. The number of promoter-enhancer linkages connecting the endpoints of domains in different resolutions.
As the resolution increases, the growing number of boundaries captures more potential interactions. The blue curve shows the corresponding increase for an ensemble of randomly reshuffled TADs. More promoter-enhancer linkages connect the endpoints of real domains than those of their reshuffled counterparts.
Fig 7. Mutational burdens across TAD boundaries.
The three clusters of boundary regions exhibit distinct patterns of mutational burden. For the blue and red clusters, the area spans the first to the third quartile. For the green cluster, only the mean values at each position are shown, for clarity. The inset shows the average Repli-seq signal for the red and blue clusters.
Fig 8. Enrichment of CTCF peaks near TAD boundaries at two different resolutions.
The blue line shows the same analysis using TADs reported in [8].


References

    1. Dekker J, Marti-Renom MA, Mirny LA. Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat Rev Genet. 2013;14: 390–403. doi: 10.1038/nrg3454
    2. Risca VI, Greenleaf WJ. Unraveling the 3D genome: genomics tools for multiscale exploration. Trends Genet. 2015;31: 357–372. doi: 10.1016/j.tig.2015.03.010
    3. Rowley MJ, Corces VG. The three-dimensional genome: principles and roles of long-distance interactions. Curr Opin Cell Biol. 2016;40: 8–14. doi: 10.1016/j.ceb.2016.01.009
    4. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet. 2016;17: 661–678. doi: 10.1038/nrg.2016.112
    5. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science. 2009;326: 289–293. doi: 10.1126/science.1181369
