Thursday, December 12, 2013

Fun with custom tracks in WashU EpiGenome Browser

In the course of working on a multiscale domains project (see [2]), I came to a point where I wanted to plot domains against epigenetic tracks and associated 3C data. WashU EpiGenome Browser contains a lot of such data and also supports custom tracks -- perfect! Below I describe how to prepare custom tracks for non-overlapping and overlapping domains.

Dixon et al [1] domains

Dixon et al. domains are non-overlapping, so I plotted them as a categorical track in WashU EpiGenome Browser.

The initial domain file looks like this:

chr1 760000 1280000
chr1 1280000 1840000
chr1 1840000 2320000
...
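Since the categorical track relies on the domains not overlapping, it can be worth confirming that before going further. This is only a minimal sketch, assuming a whitespace-separated file with chromosome, start and end columns; the function name check_non_overlapping is hypothetical and not part of tabix or the browser:

```shell
# Hypothetical helper: exit non-zero if any domain overlaps another one.
# Assumes columns are chrom, start, end. The file is sorted by chromosome
# and start first, so it is enough to compare each domain with the previous one.
check_non_overlapping() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" | awk '
        $1 == chrom && $2 < prev_end { print "overlap: " $0; bad = 1 }
        { chrom = $1; prev_end = $3 }
        END { exit (bad ? 1 : 0) }
    '
}
```

Domains that merely touch (one ends where the next starts, as in the file above) are not reported as overlapping.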

First, I added a column at the end to represent a category, alternating between categories 1 and 2 from one domain to the next:

chr1    760000  1280000 1
chr1    1280000 1840000 2
chr1    1840000 2320000 1
...
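One way to add the alternating column is a short awk call. This is a sketch rather than the exact command I ran; the function name add_alternating_category and the input file name in the usage comment are assumptions:

```shell
# Hypothetical helper: append a fourth column that alternates 1, 2, 1, 2, ...
# Odd-numbered lines get category 1, even-numbered lines get category 2.
# Output is tab-separated.
add_alternating_category() {
    awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, (NR % 2 ? 1 : 2) }' "$1"
}

# Example usage (the input file name is assumed):
# add_alternating_category dixon.domains.txt > dixon.domains.bedgraph
```

The alternation is by line number, so neighbouring domains always get different categories as long as the file is in genomic order.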

Now the file looks like a bedGraph file, and we can compress and index it with bgzip and tabix:

bgzip -c dixon.domains.bedgraph > dixon.domains.bedgraph.gz
tabix -p bed dixon.domains.bedgraph.gz

The first command compresses the bedGraph file with bgzip (block gzip), and the second generates a *.tbi index file. Now copy both the *.gz and *.tbi files to your webserver, and you can load the track in the EpiGenome Browser through the "CustomTK" menu.

Dixon domains URL: http://www.cs.cmu.edu/~dfilippo/epitracks/dixon.domains.gz
Domains appear as alternating yellow and orange blocks (each block is a domain); the 3C matrix for the same cell line (IMR90, human fibroblast) is shown in pink:



Multiscale domains [2]

Multiscale domains generated by Armatus change from one gamma setting to another, and so it was important that the track supports overlapping items. Multiscale domains are similar to a collection of genes (if you consider their various isoforms) and are best represented by a bed annotation track.

As before, the initial domain file looked like this:

chr20   61640000        62440000
chr20   60760000        61520000
chr20   60600000        60640000
...

Now, tabix expects the domains to be sorted by start position and will throw an error otherwise, so we need to sort the file. After sorting, several domains may start at the same position:

chr20   40000   26280000
chr20   40000   80000
chr20   40000   120000
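The sorting itself can be done with the standard sort utility. A minimal sketch, with a hypothetical function name; it sorts by chromosome name and then numerically by start, and LC_ALL=C is set only to make the ordering independent of the locale:

```shell
# Hypothetical helper: sort a domain file by chromosome (column 1),
# then numerically by start position (column 2).
sort_domains() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Example usage (file names are assumed):
# sort_domains chr20-all-gamma.unsorted.bed > chr20-all-gamma.bed
```

Domains that share a chromosome and a start position may come out in a different relative order than in the listing above.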

Next, to fit the format of a bed file, we need to add 3 more fields to every line: a label assigned to this domain (I simply set it to indicate which gamma the domain comes from), a category (in the case of domains I assigned a unique category to each gamma setting), and an indicator of the forward/reverse strand (we don't really need the strand information, so it could be replaced with a "."):

chr20   40000   26280000        0.0     1       +
chr20   40000   80000   0.0     1       +
chr20   40000   120000  0.0     1       +
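The three extra fields can be appended with awk. This is a sketch under assumptions: the function name add_bed_fields is hypothetical, and it expects one three-column domain file per gamma setting, with the gamma label and category passed as arguments:

```shell
# Hypothetical helper: append label, category and strand columns to a
# three-column domain file.
# Arguments: domain file, gamma label (e.g. 0.0), category number (e.g. 1).
# The strand is fixed to "+" here; "." could be used instead.
add_bed_fields() {
    awk -v label="$2" -v category="$3" \
        'BEGIN { OFS = "\t" } { print $1, $2, $3, label, category, "+" }' "$1"
}

# Example usage (the input file name is assumed):
# add_bed_fields chr20-gamma-0.0.txt 0.0 1 >> chr20-all-gamma.bed
```

If the per-gamma files are concatenated this way, the combined file has to be sorted by start position again before running tabix.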

Now that the bed file is ready, I use tabix tools again (same as above):

bgzip -c chr20-all-gamma.bed > chr20-all-gamma.bed.gz
tabix -p bed chr20-all-gamma.bed.gz

Upload the newly created *.gz and *.tbi files to the webserver and they are ready to be used within the WashU Epigenome Browser. Below, Armatus' domains are shown in green and vary between different scales capturing many alternative domains:


The labels on the domains correspond to the length scale at which they were observed (gamma).

My multiscale domains URL: http://www.cs.cmu.edu/~dfilippo/epitracks/chr20-all-gamma.bed.gz (chromosome 20 only).

References


[1] Dixon et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 485 (376-380). 2012.
[2] Filippova et al. Multiscale identification of topological domains in chromatin. WABI 2013.
Preparing custom track (WashU Epigenome Browser help)
Making a custom gene track (Epi Browser author writes out some commands to make a gene track)
Browser file formats (UCSC's FAQ on file formats for the UCSC and Epigenome Browsers)


UPD: one more reference: custom gene tracks (WashU Epigenome Browser help)
