Thursday, December 12, 2013

Fun with custom tracks in WashU EpiGenome Browser

In the course of working on a multiscale domains project (see [2]), I came to a point where I wanted to plot domains against epigenetic tracks and associated 3C data. The WashU EpiGenome Browser contains a lot of such data and also supports custom tracks -- perfect! Below I describe how to prepare a custom track for non-overlapping and for overlapping domains.

Dixon et al [1] domains

Dixon et al. domains are non-overlapping, so I plotted them as a categorical track in the WashU EpiGenome Browser.

Initial domain file looks like this:

chr1 760000 1280000
chr1 1280000 1840000
chr1 1840000 2320000
...
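
Since the categorical track relies on the domains not overlapping, a quick check does not hurt. This is a minimal sketch, assuming the file is whitespace-separated and already sorted by chromosome and start; the file names and the `check_overlaps` helper are hypothetical and not part of any tool:

```shell
# Sample input (hypothetical file name), same three columns as above.
printf 'chr1\t760000\t1280000\nchr1\t1280000\t1840000\nchr1\t1840000\t2320000\n' > dixon.domains.txt

# Print every domain that starts before an earlier domain on the same
# chromosome has ended; exit with status 1 if any such domain is found.
# Assumes the file is sorted by chromosome, then by start.
check_overlaps() {
  awk '$1 == chrom && $2 < max_end { print "overlap at line " NR ": " $0; bad = 1 }
       $1 != chrom { chrom = $1; max_end = 0 }
       $3 > max_end { max_end = $3 }
       END { exit (bad ? 1 : 0) }' "$1"
}

check_overlaps dixon.domains.txt > overlap-report.txt && echo "no overlaps found"
```

Domains that merely touch (one ends where the next starts, as in the sample) are not reported.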

First, I added a column at the end to represent a category, alternating between categories 1 and 2 from one domain to the next:

chr1    760000  1280000 1
chr1    1280000 1840000 2
chr1    1840000 2320000 1
...
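
One way to add the alternating column is a short awk command. This is a minimal sketch with hypothetical file names; it alternates by line number, so the 1/2 pattern simply continues across chromosome boundaries:

```shell
# Sample input (hypothetical file name), the three-column domain file.
printf 'chr1\t760000\t1280000\nchr1\t1280000\t1840000\nchr1\t1840000\t2320000\n' > dixon.domains.txt

# Append a fourth, tab-separated column: 1 on odd lines, 2 on even lines.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, ((NR % 2 == 1) ? 1 : 2) }' \
  dixon.domains.txt > dixon.domains.bedgraph

cat dixon.domains.bedgraph
```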

Now the file is in bedGraph format, and we can compress and index it with bgzip and tabix:

bgzip -c dixon.domains.bedgraph > dixon.domains.bedgraph.gz
tabix -p bed dixon.domains.bedgraph.gz

The first command compresses the bedGraph file with bgzip (a block-compressed variant of gzip that tabix can index); the second one generates the index file (*.tbi). Now copy both the *.gz and *.tbi files to your webserver and you can load the track from the EpiGenome Browser using the "CustomTK" menu.

Dixon domains URL: http://www.cs.cmu.edu/~dfilippo/epitracks/dixon.domains.gz
Domains are shown as alternating yellow and orange blocks (each block is a domain); the 3C matrix for the same cell line (IMR90, human fibroblast) is shown in pink:



Multiscale domains [2]

Multiscale domains generated by Armatus change from one gamma setting to another, so it was important that the track support overlapping items. Multiscale domains are similar to a collection of genes (if you consider their various isoforms) and are best represented by a bed annotation track.

As before, the initial domain file looked like this:

chr20   61640000        62440000
chr20   60760000        61520000
chr20   60600000        60640000
...

Now, tabix requires its input to be sorted by chromosome and then by start position, so we need to sort the domains first, or otherwise tabix will throw an error. Several domains may start at the same position, which is fine:

chr20   40000   26280000
chr20   40000   80000
chr20   40000   120000
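
The sort itself can be done with the standard `sort` utility. This is a minimal sketch, assuming whitespace-separated columns; the file names are hypothetical:

```shell
# Sample unsorted input (hypothetical file name).
printf 'chr20\t61640000\t62440000\nchr20\t60760000\t61520000\nchr20\t60600000\t60640000\nchr20\t40000\t80000\n' > chr20-domains.txt

# Sort by chromosome name (column 1), then numerically by start (column 2).
sort -k1,1 -k2,2n chr20-domains.txt > chr20-domains.sorted.txt

# sort -c only checks the order and exits with a non-zero status
# if the file is not sorted on the given keys.
sort -c -k1,1 -k2,2n chr20-domains.sorted.txt && echo "sorted"
```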

Next, to fit the format of a bed file, we need to add 3 more fields to every line: a label assigned to this domain (I simply set it to indicate what gamma the domain comes from), a category (in the case of domains I assigned unique categories to individual gamma settings), and an indicator of the forward/reverse strand (we don't really need the forward/reverse information, so it could be replaced with a "."):

chr20   40000   26280000        0.0     1       +
chr20   40000   80000   0.0     1       +
chr20   40000   120000  0.0     1       +
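
The three extra fields can be appended with awk. This is a minimal sketch, assuming one sorted three-column file per gamma setting; the file names and the `gamma` and `category` variables are hypothetical, and how gamma values map to files depends on how the Armatus output was stored:

```shell
# Sample sorted input for a single gamma setting (hypothetical file name).
printf 'chr20\t40000\t26280000\nchr20\t40000\t80000\nchr20\t40000\t120000\n' > chr20-gamma0.0.sorted.txt

# Append label (the gamma value), category (one per gamma setting) and strand.
awk -v gamma="0.0" -v category="1" \
  'BEGIN { OFS = "\t" } { print $1, $2, $3, gamma, category, "+" }' \
  chr20-gamma0.0.sorted.txt > chr20-gamma0.0.bed

cat chr20-gamma0.0.bed
```

With several gamma settings, the per-gamma files would then need to be concatenated and sorted again before running bgzip and tabix.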

Now that the bed file is ready, I use bgzip and tabix again (same as above):

bgzip -c chr20-all-gamma.bed > chr20-all-gamma.bed.gz
tabix -p bed chr20-all-gamma.bed.gz

Upload the newly created *.gz and *.tbi files to the webserver and they are ready to be used within the WashU Epigenome Browser. Below, Armatus' domains are shown in green and vary between different scales capturing many alternative domains:


The labels on the domains correspond to the length scale at which they were observed (gamma).

My multiscale domains URL: http://www.cs.cmu.edu/~dfilippo/epitracks/chr20-all-gamma.bed.gz (chromosome 20 only).

References


[1] Dixon et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 485 (376-380). 2012.
[2] Filippova et al. Multiscale identification of topological domains in chromatin. WABI 2013.
Preparing custom track (WashU Epigenome Browser help)
Making a custom gene track (Epi Browser author writes out some commands to make a gene track)
Browser file formats (UCSC's FAQ on file formats for the UCSC and Epigenome Browsers)


UPD: one more reference: custom gene tracks (WashU Epigenome Browser help)

Friday, December 06, 2013

Twinlist: When Simple is Powerful

I do not take pills very often: mostly it's something for a headache once every couple of months, or for allergies whenever the flowers are in bloom. However, there are people around us who take many medications daily, especially people with chronic conditions and the elderly. Keeping track of medications within an electronic medical record (EMR) for such patients is, not surprisingly, a challenge: the patient may switch to a generic drug while the EMR lists it under a brand name, the dosage may be off, or the patient now takes the drug intravenously instead of orally. Reconciling what the patient is actually taking with what is recorded in the system is a boring and daunting task: one has to pay attention to details like "4mg" vs "7 mg" and meticulously compare drug names -- not something a busy physician can spend a lot of time on. However, this problem shows up every time the provider's data is out of date. This could be during an emergency visit to the hospital or during a routine visit to a physician or a specialist. Keeping track of current medications is important when adjusting treatment or prescribing a new one, so as to avoid drugs and procedures that may be incompatible with what the patient already takes.

The Human-Computer Interaction Lab (HCIL) at the University of Maryland took on this challenge and produced a simple, yet powerful way to automate medication list reconciliation (the medical term for the problem described above). Computers are good at exact comparisons -- and that's what researchers at HCIL used to find discrepancies in dosage, route of administration, and frequency for the same drugs coming from two different lists. Inexact and ambiguous matches are then presented to the user, who can decide how to proceed. Through intuitive highlighting and animation that make the process easier to follow, they have made reconciliation quick and managed to reduce human error.


What I appreciate the most about this interface is its simplicity (it is a list after all). It is easy to identify the drugs that match and the drugs that differ, drugs that are unique to the hospital and drugs that are unique to a patient. Similar drugs are aligned on the same line, visually encoding the relationship. The details that differ between similar drugs are subtly highlighted in yellow so as not to crowd the overall picture with too much information. The careful use of animation makes a seemingly boring task more fun. Power users may skip the animation and use the overhead menu instead.

I am very impressed with HCIL's continued success, and if I ever find myself having to manage 20 different drugs, I would want my physician to use Twinlist for reconciliation! They took a mundane task that no one likes and turned it into an easy exercise for the eyes, improving medical care in the end. Having seen some EMRs and what a tangled mess they are, I believe it is solutions like this -- making mundane data entry/cleaning/updating easier -- that can significantly improve care and save doctors some precious time.

Code and more details about this award-winning design are available on Twinlist's page.