Thursday, August 15, 2013

Alternative domains highlighted in a review

Spatial proximity between distant parts of DNA has implications for DNA accessibility, gene expression, and gene regulation. Which parts of DNA are physically co-located in the nucleus can be measured with chromosome conformation capture experiments and their descendants: 3C (chromosome conformation capture), 4C (circularized 3C), 5C (3C carbon copy), Hi-C, and others. Our research group has been working with Hi-C data a lot lately, so I have made a habit of scanning the journal titles in my Feedly collection for new Hi-C material. While browsing, I came across this article in Nature Reviews about Hi-C data. Its key points section caught my attention because our group has developed methods that address some of the outlined research problems almost directly! In particular, I want to highlight these two points:
  • Mining increasingly comprehensive chromatin interaction maps for chromosomal domains and complete genomes requires novel computational methods and modeling tools.
  • Statistical analysis of Hi-C data identifies multiple scales of domain organization: larger (1–10 Mb) chromosomal compartments and smaller (less than 1 Mb) topologically associating domains.
Topological domains were first described in a paper by Dixon et al. published in Nature last year. Domains are contiguous segments of DNA that interact with themselves much more frequently than with the rest of the chromosome. The original paper proposed a hidden Markov model (HMM) method for finding domains and reported the set of domains found by that method. The article has been cited over a hundred times since then, and these domains have been used in multiple analyses.
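The definition above can be turned into a quick check. The sketch below compares the mean contact count inside a candidate domain with the mean count between its bins and the rest of the chromosome; a real domain should show a large gap. The helper names and the toy matrix are hypothetical, and this only illustrates the definition. It is not the HMM method from Dixon et al.

```python
def mean(values):
    """Arithmetic mean; returns 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def domain_contrast(matrix, start, end):
    """Mean contact count inside bins start..end versus between them and all other bins.

    matrix: symmetric list of lists, matrix[i][j] = contacts between bins i and j.
    start, end: inclusive bin indices of the candidate domain (hypothetical helper).
    """
    n = len(matrix)
    inside = range(start, end + 1)
    outside = [j for j in range(n) if j < start or j > end]
    # Upper triangle only, so each symmetric pair inside the domain is counted once.
    intra = [matrix[i][j] for i in inside for j in inside if j >= i]
    inter = [matrix[i][j] for i in inside for j in outside]
    return mean(intra), mean(inter)


# Toy 6-bin matrix with two blocks of three self-interacting bins.
TOY = [
    [10, 8, 7, 1, 1, 0],
    [8, 10, 8, 1, 0, 1],
    [7, 8, 10, 1, 1, 1],
    [1, 1, 1, 10, 9, 8],
    [1, 0, 1, 9, 10, 9],
    [0, 1, 1, 8, 9, 10],
]

print(domain_contrast(TOY, 0, 2))  # a true block
print(domain_contrast(TOY, 1, 3))  # a window straddling the block boundary
```

On this toy matrix the block of bins 0 to 2 gives about (8.83, 0.78), while the window 1 to 3, which straddles the block boundary, gives about (6.67, 4.0): the inside/outside gap nearly vanishes.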

However, as the review above suggests, there may be other domains that overlap with, nest within, or completely contain the domains reported by Dixon et al. One look at the interaction matrix is enough to convince oneself that they exist:
Here I plotted a submatrix of human chromosome 22 from human embryonic stem cells (hESC), where each row and column of the matrix corresponds to a 40 kb bin of sequence. I highlighted several potential domains in purple, but there are many more in this one snapshot alone.
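To make the matrix concrete, here is a minimal sketch of how read-pair contacts could be binned into a symmetric 40 kb matrix and how a zoomed-in block like the plotted one could be cut out. The input format (a list of `(pos1, pos2)` base-pair positions on one chromosome) and the function names are hypothetical assumptions; real Hi-C pipelines also filter and normalize the counts, which this sketch skips.

```python
BIN_SIZE = 40_000  # 40 kb bins, the resolution of the plotted matrix


def build_contact_matrix(contacts, chrom_length, bin_size=BIN_SIZE):
    """Count contacts between every pair of bins on one chromosome.

    contacts: iterable of (pos1, pos2) base-pair positions, one per read pair,
    with 0 <= pos < chrom_length (a hypothetical input format).
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in contacts:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # contacts are unordered, so keep the matrix symmetric
    return matrix


def submatrix(matrix, start_bp, end_bp, bin_size=BIN_SIZE):
    """Square block of bins overlapping [start_bp, end_bp), like a zoomed-in plot."""
    lo = start_bp // bin_size
    hi = (end_bp + bin_size - 1) // bin_size
    return [row[lo:hi] for row in matrix[lo:hi]]


contacts = [(5_000, 30_000), (10_000, 50_000), (45_000, 60_000), (100_000, 130_000)]
m = build_contact_matrix(contacts, chrom_length=160_000)
print(submatrix(m, 0, 80_000))
```

Here the four read pairs fill a 4 × 4 matrix, and the printed block covering the first 80 kb is `[[1, 1], [1, 1]]`.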

Guided by what we saw in the matrices, we developed a simple dynamic programming approach that finds alternative domains of various sizes. When we applied it to the same Hi-C interaction matrices used by Dixon et al., we were able to identify domains that differ significantly from Dixon's domains yet are, in some cases, more enriched for certain chromatin marks. We are working on an extension that will find other optimal solutions to our formulation, and we plan to release the code soon. Our preprint is already posted on arXiv and will appear at this year's WABI, and now there is a review that calls for methods like ours to search for alternative domains. What a boost of confidence!
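The post does not give the recurrence, so what follows is only a generic sketch of how a dynamic program can select non-overlapping domains along the diagonal of an interaction matrix. The scoring function (summed contacts in excess of a constant `background`) and every name here are hypothetical stand-ins, not the formulation from our preprint. Each candidate domain gets a score, and the program keeps, for every prefix of bins, the best total achievable with non-overlapping domains.

```python
def domain_scores(matrix, background):
    """score[a][b] = sum over a <= i <= j <= b of (matrix[i][j] - background).

    A hypothetical stand-in score: contacts in excess of a constant background.
    Built column by column so the whole table costs O(n^2).
    """
    n = len(matrix)
    score = [[0.0] * n for _ in range(n)]
    for b in range(n):
        column = 0.0  # running sum of (matrix[i][b] - background) for i = a..b
        for a in range(b, -1, -1):
            column += matrix[a][b] - background
            score[a][b] = column + (score[a][b - 1] if a < b else 0.0)
    return score


def best_domains(matrix, background):
    """Choose non-overlapping domains [start, end] maximizing the total score.

    best[k] is the best total for bins 0..k-1; bin k-1 either lies in no domain
    or ends a domain that starts at some bin `start`.
    """
    n = len(matrix)
    score = domain_scores(matrix, background)
    best = [0.0] * (n + 1)
    start_of = [None] * (n + 1)  # start of the domain ending at bin k-1, or None
    for k in range(1, n + 1):
        end = k - 1
        best[k] = best[k - 1]  # option 1: leave bin `end` outside every domain
        for start in range(end + 1):
            candidate = best[start] + score[start][end]
            if candidate > best[k]:
                best[k], start_of[k] = candidate, start
    # Walk the recorded choices backwards to recover the domains.
    domains, k = [], n
    while k > 0:
        if start_of[k] is None:
            k -= 1
        else:
            domains.append((start_of[k], k - 1))
            k = start_of[k]
    return best[n], domains[::-1]


# Toy 6-bin matrix with two blocks of three self-interacting bins.
TOY = [
    [10, 8, 7, 1, 1, 0],
    [8, 10, 8, 1, 0, 1],
    [7, 8, 10, 1, 1, 1],
    [1, 1, 1, 10, 9, 8],
    [1, 0, 1, 9, 10, 9],
    [0, 1, 1, 8, 9, 10],
]

print(best_domains(TOY, background=5))
print(best_domains(TOY, background=0))
```

On this toy matrix, `background=5` yields `(49.0, [(0, 2), (3, 5)])`, the two blocks. In this toy score, `background` acts as a size knob: with `background=0` every term is non-negative, so one domain spanning all bins scores at least as high as any partition, and the call returns `(116.0, [(0, 5)])`.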