PLoS Comput Biol. 2021 Mar 16;17(3):e1008834.
doi: 10.1371/journal.pcbi.1008834. eCollection 2021 Mar.

A unified framework for inferring the multi-scale organization of chromatin domains from Hi-C

Ji Hyun Bak et al. PLoS Comput Biol. 2021.

Abstract

Chromosomes are giant chain molecules organized into an ensemble of three-dimensional structures characterized by its genomic state and the corresponding biological functions. Despite the strong cell-to-cell heterogeneity, the cell-type specific pattern demonstrated in high-throughput chromosome conformation capture (Hi-C) data hints at a valuable link between structure and function, which makes inference of chromatin domains (CDs) from the pattern of Hi-C a central problem in genome research. Here we present a unified method for analyzing Hi-C data to determine the spatial organization of CDs over multiple genomic scales. By applying statistical physics-based clustering analysis to a polymer physics model of the chromosome, our method identifies the CDs that best represent the global pattern of correlation manifested in Hi-C. The multi-scale intra-chromosomal structures compared across different cell types uncover the principles underlying the multi-scale organization of the chromatin chain: (i) Sub-TADs, TADs, and meta-TADs constitute a robust hierarchical structure. (ii) The assemblies of compartments and TAD-based domains are governed by different organizational principles. (iii) Sub-TADs are the common building blocks of chromosome architecture. Our physically principled interpretation and analysis of Hi-C not only offer an accurate and quantitative view of multi-scale chromatin organization but also help decipher its connections with genome function.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The hierarchical organization of interphase chromosome and Hi-C map.
(A) Chromosome territories in the cell nucleus, which are manifested as the higher intra-chromosomal counts in the Hi-C map. (B) Alternating blocks of active and inactive chromatin, segregated into A- and B-compartments, give rise to the checkerboard pattern on Hi-C. (C) Sub-megabase to megabase sized chromatin folds into TADs. Adjacent TADs merge into meta-TADs [8], and each individual TAD is further decomposed into sub-TADs [, –31].
Fig 2. An overview of the Multi-CD method.
(A) We first pre-process Hi-C data to extract a correlation matrix C. Given C, we infer the chromatin domain (CD) solutions s at multiple scales by varying a single parameter λ. At each λ, the best domain solution is found through simulated annealing, in which the effective temperature T is gradually decreased (inner blue box). A complete Multi-CD algorithm involves repeating the process for different values of λ (outer red box), to obtain a family of solutions. (B) An example of the normalized Hi-C matrix and (C) the correlation matrix that results from pre-processing. Shown is the full chromosome 10 in GM12878 (50-kb Hi-C), which is used as an example dataset throughout the paper. (D) Simplified schematic for the resulting family of domain solutions, {s_λ}, at varying parameter λ. Each s is a vector of domain indices; line breaks illustrate domain boundaries. These solutions are not meant to be the optimal solutions for the shown data, but they illustrate how the typical domain scale increases with λ.
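
As a rough illustration of the two nested loops in panel A, the following Python sketch anneals a domain-index vector s at a fixed λ and then repeats the search over several λ values. The cost function used here (negative sum of intra-domain correlations plus λ per domain) is a placeholder assumption, not the objective used by Multi-CD. The names `cost`, `anneal`, and `multi_scale`, the single-locus relabeling move, and the geometric cooling schedule are likewise hypothetical.

```python
import numpy as np


def cost(s, C, lam):
    """Toy cost (assumption): reward correlated loci that share a domain,
    and charge lam for every distinct domain in s."""
    same = s[:, None] == s[None, :]
    intra = np.sum(np.triu(C * same, k=1))  # each pair i < j counted once
    return -intra + lam * len(np.unique(s))


def anneal(C, lam, n_steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over domain-index vectors at a fixed lam.
    Returns the lowest-cost solution seen and its cost."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    s = np.arange(n)  # start with every locus in its own domain
    current = cost(s, C, lam)
    best_s, best = s.copy(), current
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        T = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        proposal = s.copy()
        proposal[rng.integers(n)] = rng.integers(n)  # relabel one locus
        new = cost(proposal, C, lam)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new <= current or rng.random() < np.exp(-(new - current) / T):
            s, current = proposal, new
            if current < best:
                best_s, best = s.copy(), current
    return best_s, best


def multi_scale(C, lambdas, **kwargs):
    """Outer loop: one annealed solution per value of lam."""
    return {lam: anneal(C, lam, **kwargs)[0] for lam in lambdas}
```

With this toy cost, a larger λ makes each extra domain more expensive, so the solutions tend toward fewer, larger domains, which mirrors the trend sketched in panel D.
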
Fig 3. Multi-scale chromatin domain solutions for various cell types.
(A) A subset of 50-kb resolution Hi-C data, covering a 10-Mb genomic region of chr10 in GM12878. (B) The cross-correlation matrix C_ij for the corresponding subset. (C) Multi-CD applied to the correlation matrix in B. Domain solutions determined at 4 different values of λ = 0, 10, 30, 50. (D) Hi-C data from the same chromosome (chr10) in four other cell lines: HUVEC, NHEK, K562, and KBM7. Same subset as in A. (E-G) Characteristics of the domain solutions determined for all five cell lines in A and D: (E) the average domain size, 〈n〉; (F) the index of dispersion in the domain size, D (= σ_n²/〈n〉); (G) the normalized mutual information, nMI. (H-I) Comparison of domain solutions across cell types. (H) Average cell-to-cell similarity of domain solutions, in terms of Pearson correlations, at varying λ. (I) Domain solutions obtained at λ = 10 for 5 different cell types. See S3 Fig for solutions at λ = 0 and λ = 40. (J) Similarity between domain solutions at different λ’s, shown for GM12878. See S4 Fig for corresponding results for the other four cell lines. (K) RNA-seq signals from the five cell lines (colored hairy lines), on top of the TAD solutions (filled boxes), in a genomic interval that contains the regulatory elements associated with the gene APBB1IP. APBB1IP is transcriptionally active only in two cell lines, GM12878 and KBM7, where the regulatory elements are fully enclosed in the same TAD. See S5 Fig for additional examples.
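
The two size statistics in panels E and F can be computed directly from a domain-index vector. The sketch below is a minimal example under two assumptions: the function name `domain_size_stats` is hypothetical, and the variance is taken as the population variance, which the caption does not specify.

```python
import numpy as np


def domain_size_stats(s):
    """Return (mean domain size, index of dispersion) for a domain-index
    vector s, where s[i] is the domain label of genomic bin i."""
    _, sizes = np.unique(np.asarray(s), return_counts=True)
    mean = sizes.mean()
    var = sizes.var()  # population variance (ddof=0); an assumption
    return mean, var / mean


# Toy vector with three domains of sizes 3, 2 and 1 (illustrative data only).
mean_n, dispersion = domain_size_stats([0, 0, 0, 1, 1, 2])
```

For the toy vector the sizes are 3, 2 and 1, so the mean is 2 and the index of dispersion is (2/3)/2 = 1/3. Sizes here are in bins; multiplying by the Hi-C resolution would convert them to base pairs.
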
Fig 4. Domain solutions for compartments.
(A) Input correlation data for compartment identification. The 2-Mb diagonal band was removed. (B) Lower triangle: CDs obtained at λ = 90, based on the diagonal-band-removed data, which we identify as the compartments. Upper triangle: the CO/E matrix shown for comparison. (C) Same pair of data, after re-ordering to collect the two largest CDs in our solution, k = 1 and k = 2 (lower triangle). The CO/E is simultaneously reordered to show a clear separation of correlation patterns (upper triangle). (D) Intra-domain contact profiles for the two CDs k = 1 and k = 2. The domain solution k = 1 is locally more compact, with more contacts at short genomic distances. We therefore identify k = 1 as the B-compartment, and k = 2 as the A-compartment. (E) nMI between CDs at varying λ and CO/E, showing a plateau in the range 70 ≤ λ ≤ 100. CDs inferred by Multi-CD show consistently higher nMI, compared to sub-compartments (dashed line) and compartments (dotted line) from a previous method [19].
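
Removing the near-diagonal band in panel A can be sketched as a simple mask on the correlation matrix. Several details below are assumptions, since the caption does not state them: the band is treated as one-sided (all pairs within 2 Mb of each other), the boundary is inclusive, removed entries are set to zero rather than excluded, and the function name `remove_diagonal_band` is hypothetical. The 50-kb default matches the Hi-C resolution quoted for the example dataset.

```python
import numpy as np


def remove_diagonal_band(C, band_bp=2_000_000, resolution_bp=50_000, fill=0.0):
    """Return a copy of C with all entries within band_bp of the diagonal
    replaced by fill (assumed convention: |i - j| <= band width in bins)."""
    C = np.asarray(C, dtype=float)
    band_bins = band_bp // resolution_bp
    i, j = np.indices(C.shape)
    out = C.copy()
    out[np.abs(i - j) <= band_bins] = fill
    return out
```

Masking the short-range band leaves only long-range correlations as input, which is why the solutions obtained from this data pick up compartment-like, non-contiguous domains.
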
Fig 5. Hierarchical organization of CD families.
(A) The hierarchical structure of CDs is highlighted with the domain solutions for sub-TADs (red), TADs (green), meta-TADs (blue) and compartments (black). Shown for chr10 of GM12878. Each square panel overlays a pair of CD solutions; the number above each panel reports the nestedness score. Inset: a reprint of the nestedness scores in a tetrahedral visualization with the four representative CD solutions. (B) A schematic diagram of inferred hierarchical relations between sub-TADs, TADs, meta-TADs and compartments, based on our calculation of nestedness scores.
Fig 6. Validation of CD solutions from Multi-CD.
(A) In terms of the normalized mutual information between the CD solutions and the input data, Multi-CD outperforms ArrowHead, DomainCaller and GaussianHMM at the corresponding scales (sub-TAD, TAD and compartment). (B) The correlation function χ(d) between CTCF signals and the domain boundaries. Shown for sub-TADs and TADs, obtained from Multi-CD (left); from ArrowHead and DomainCaller (right). (C) Genome-wide, locus-dependent replication signal. Top panel shows the A- (blue) and B- (red) compartments inferred by Multi-CD. Bottom panels show the replication signals in six different phases in the cell cycle, shaded in matching colors for the two compartments. (D) Pearson correlation between the replication signals and the two compartments A (filled blue) and B (open red).
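
One plausible way to compute a correlation like the one in panel D is to correlate a binary compartment indicator with a per-locus signal. This is only a sketch: the caption does not say how the compartments were encoded or how the replication signal was binned, and the function name `compartment_signal_correlation` and the toy data are hypothetical.

```python
import numpy as np


def compartment_signal_correlation(labels, signal, compartment):
    """Pearson correlation between membership in one compartment
    (1 if the locus carries that label, else 0) and a per-locus signal."""
    indicator = (np.asarray(labels) == compartment).astype(float)
    return np.corrcoef(indicator, np.asarray(signal, dtype=float))[0, 1]


# Toy data (illustrative only): the signal is higher in A than in B.
labels = ["A", "A", "B", "B"]
signal = [2.0, 2.0, 1.0, 1.0]
r_a = compartment_signal_correlation(labels, signal, "A")
r_b = compartment_signal_correlation(labels, signal, "B")
```

With only two compartments the two indicators are complementary, so the correlations for A and B are equal in magnitude and opposite in sign.
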

References

    1. Misteli T. Beyond the sequence: cellular organization of genome function. Cell. 2007;128(4):787–800. doi: 10.1016/j.cell.2007.01.028
    2. Davies JO, Oudelaar AM, Higgs DR, Hughes JR. How best to identify chromosomal interactions: a comparison of approaches. Nat Methods. 2017;14(2):125. doi: 10.1038/nmeth.4146
    3. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295(5558):1306–1311. doi: 10.1126/science.1067799
    4. Bickmore WA, van Steensel B. Genome architecture: domain organization of interphase chromosomes. Cell. 2013;152(6):1270–1284. doi: 10.1016/j.cell.2013.02.001
    5. Franke M, Ibrahim DM, Andrey G, Schwarzer W, Heinrich V, Schöpflin R, et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature. 2016;538(7624):265–269. doi: 10.1038/nature19800

Grants and funding

This work was supported in part by a KIAS Individual Grant at Korea Institute for Advanced Study (No. CG035003 to C.H.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
