Monday, November 11, 2013

On war and peace

A while back Google released the Ngram Viewer, a tool that lets you query the texts of many Google Books all at once. One can spend hours trying out words and word combinations and watching how their frequency of use has changed over the centuries. It tells you about popular trends and dying ideas. Today, through a series of random steps, I found myself querying the Ngram Viewer for "inference", then "war". Here is a plot of the occurrences of "war" in books from 1800 to the present day. The two bumps, around 1920 and 1940, are the two world wars. No other time points stand out: sorry, Iraq, Afghanistan, Sri Lanka, East Timor, and many others.
Well, are we doing better in terms of promoting peace? Apparently not:

We use "peace" less and less. Sure, we spoke about peace a little more often during the First and Second World Wars, but the general decline in its use continues. I wonder whether "peace" is simply being replaced by another word, or whether it means we are all doomed. A quick search through synonyms of "peace" offers a glimpse of hope in "nonviolence":

and "ceasefire":

although neither of these words is being used quite as often as "peace".

Well, be it nonviolence or ceasefire, there is still hope that we will survive as the human race!

Thursday, August 15, 2013

Alternative domains highlighted in a review

Spatial proximity between distant parts of DNA has implications for DNA accessibility, gene expression, and regulation. Information about which parts of DNA are physically co-located in the nucleus can be obtained through various chromosome conformation capture experiments: 3C (chromosome conformation capture), 4C (circularized 3C), 5C (3C carbon copy), Hi-C, and others. Our research group has been working a lot with Hi-C lately, so I have made a habit of scanning the journal titles in my Feedly collection for new Hi-C material. While browsing, I came across this article in Nature Reviews discussing Hi-C data. Its Key Points section caught my attention because our research group has developed methods that address some of the outlined research problems almost directly! In particular, I want to highlight these two points:
  • Mining increasingly comprehensive chromatin interaction maps for chromosomal domains and complete genomes requires novel computational methods and modeling tools.
  • Statistical analysis of Hi-C data identifies multiple scales of domain organization: larger (1–10 Mb) chromosomal compartments and smaller (less than 1 Mb) topologically associating domains
Topological domains were first described in a paper by Dixon et al. that appeared in Nature last year. Domains are contiguous segments of DNA that interact with themselves much more frequently than with the rest of the chromosome. The original paper proposed a hidden Markov model (HMM) method for finding domains and reported the collection of domains found by its algorithm. The article has been cited over a hundred times since then, and these domains have appeared in multiple analyses.
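To make "interact with themselves much more frequently" concrete, here is a minimal sketch that compares the mean contact count inside a candidate domain with the mean contact count between that domain and the rest of the chromosome. The function name `domain_enrichment`, the use of NumPy, and the toy matrix are illustrative assumptions; this is not the HMM method of Dixon et al.

```python
import numpy as np

def domain_enrichment(contacts: np.ndarray, start: int, end: int) -> float:
    """Ratio of mean within-domain contacts to mean domain-vs-rest contacts.

    Assumes `contacts` is a symmetric bin-by-bin interaction matrix and the
    hypothetical candidate domain covers bins start..end-1.
    """
    n = contacts.shape[0]
    inside = np.zeros(n, dtype=bool)
    inside[start:end] = True
    if not (~inside).any():
        # The "domain" is the whole chromosome: there is nothing to compare to.
        return float("inf")
    within = contacts[np.ix_(inside, inside)].mean()    # domain x domain block
    between = contacts[np.ix_(inside, ~inside)].mean()  # domain x rest block
    return within / between if between > 0 else float("inf")

# Toy 6-bin matrix with a dense block over bins 0..2.
toy = np.ones((6, 6))
toy[:3, :3] = 10.0
print(domain_enrichment(toy, 0, 3))  # 10.0
```

A ratio well above 1 indicates a self-interacting segment; for simplicity this sketch includes the diagonal in the within-domain mean.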

However, as the review above suggests, there may be other domains that overlap, nest within, or completely contain the domains reported by Dixon et al. One look at the interaction matrix is enough to convince oneself of their existence:
Here I plotted a submatrix of the interaction matrix for human chromosome 22 (hESCs), where each row and column corresponds to a 40 kb bin of sequence. I highlighted several potential domains in purple, but there are many more in this one snapshot alone.

Guided by what we saw in the matrices, we developed a simple dynamic programming approach that finds alternative domains of various sizes. When applied to the same Hi-C interaction matrices that Dixon et al. used, it identified domains that are significantly different from Dixon's domains yet, in some cases, more enriched for certain chromatin marks. We are working on an extension that will find other optimal solutions to our formulation, and we plan to release the code soon. Our preprint is already posted on arXiv and will appear at this year's WABI, and now there is a review that calls for methods like ours to search for alternative domains. What a boost of confidence!
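For readers curious how dynamic programming applies to this problem, here is a generic sketch that partitions the bins into contiguous, non-overlapping domains. It is not the method from the preprint: the scoring function (the total excess of within-domain contacts over the matrix-wide mean) and the name `segment_domains` are hypothetical choices made only for illustration.

```python
import numpy as np

def segment_domains(contacts: np.ndarray) -> list[tuple[int, int]]:
    """Split bins 0..n-1 into contiguous domains [i, j) via dynamic programming.

    Hypothetical score: a domain [i, j) earns the sum of (contact - global mean)
    over its within-domain block, so dense diagonal blocks score high.
    """
    n = contacts.shape[0]
    excess = contacts - contacts.mean()
    # 2D prefix sums: prefix[a, b] == excess[:a, :b].sum()
    prefix = np.zeros((n + 1, n + 1))
    prefix[1:, 1:] = excess.cumsum(axis=0).cumsum(axis=1)

    def score(i: int, j: int) -> float:
        # Sum of excess over the block [i, j) x [i, j) in O(1).
        return prefix[j, j] - prefix[i, j] - prefix[j, i] + prefix[i, i]

    best = [0.0] + [-np.inf] * n  # best[j]: optimal total score for bins 0..j-1
    back = [0] * (n + 1)          # back[j]: start bin of the last domain
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + score(i, j)
            if candidate > best[j]:
                best[j], back[j] = candidate, i

    # Follow the back-pointers from the end to recover the domains.
    domains, j = [], n
    while j > 0:
        domains.append((back[j], j))
        j = back[j]
    return domains[::-1]

# Toy matrix with two dense blocks: bins 0..2 and bins 3..5.
toy = np.ones((6, 6))
toy[:3, :3] = 10.0
toy[3:, 3:] = 10.0
print(segment_domains(toy))  # [(0, 3), (3, 6)]
```

The recurrence best[j] = max over i of best[i] + score(i, j) considers all O(n^2) intervals; the prefix sums keep each interval score O(1).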

Sunday, November 18, 2012

Dental health insurance

Going to the dentist is a stressful experience for most of us. However, trying to understand your dental insurance may stress you out even more, especially if your insurance is anything like mine. Here is a piece of paper that claims to "explain" the charges from my recent visit to the dentist to get a filling:

I studied this piece of paper in detail for some time, and, after trudging past the quirky vocabulary, I remain puzzled:
Problem 1: What? Does $200 + $200 + $200 + $200 equal $400?
Problem 2: What is "Allowance"? What determines how much the insurance company will pay for a procedure?
Problem 3: How is the "amount paid" determined? Why is the "allowed amount" on line 4 $60, while the insurance company decides to pay only $16? How is line 4 different from line 2? Same procedure, same charge and allowance, yet the insurance company made a different call. Why?
Problem 4: What are these codes in the "Remarks" section? Aaaah, there they are on the next page. At least an indication of what these codes are would be helpful. Or, you could go crazy and explain them right on this page, in the whitespace below the table!

The company's website, where I went to check what I am covered for, is also... wanting. Here is an example of what one sees when checking coverage:
Looks reasonable, doesn't it? But hold on, let's check out the coverage details for the fillings:
Whoa, how do I navigate through this mess? Where are the fillings?
Problem 1: Is this a list? Is this free text? Are "Post removal" and "Enamel Microabrasion" the only two procedures (as the visual cues, the dashes, suggest), or is every line a separate procedure?
Problem 2: Are the procedures ordered in any reasonable way, say, alphabetically, to make navigation easier? How can one search for a specific procedure other than by reading the whole list?
Problem 3: What are these codes in parentheses? I guess these are procedure codes, but how do I know if my dentist is about to install D2391 and not D2394 into my mouth? And since there are codes for everything, why isn't there a code for the "comparable amalgam filling"?
Problem 4: Why doesn't the text span the whole line?

I think you get the point: either I'm a high school dropout, or this is really hard to read for an ordinary person. A couple of gentle touches would make the "EOB" (explanation of benefits) better, and the same goes for the coverage page: just adding some organization to the text improves its readability dramatically:
(The sad truth: nothing is covered.)
After trying to understand my dental benefits and trying to read the "explanation" of them, I can only agree with our president: yes, we need health care reform. But this reform has to start at the very beginning, with people being able to understand their insurance. Improving the communication and readability of benefits and coverage is the easiest step to make and, possibly, one of the most important.

Sunday, March 18, 2012

Clustering a graph using Python

For those of you who are interested in graph clustering and use Python, here is a Python implementation of a clustering algorithm that finds non-overlapping modules that maximize the graph's modularity.
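As a reminder of what is being maximized, here is a minimal sketch that computes the modularity Q of a given partition of an undirected, unweighted graph without self-loops, using Q = sum over communities c of (L_c/m - (d_c/2m)^2), where L_c is the number of edges inside c, d_c is the total degree of the nodes in c, and m is the number of edges. It only scores a partition and does not search for the best one; the function name and the toy graph are hypothetical, and this is not the linked implementation.

```python
from collections import defaultdict

def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman-Girvan modularity of a partition (undirected, no self-loops)."""
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, split), 4))  # 0.3571
```

Putting every node in one community gives Q = 0, while splitting the two triangles gives Q = 5/14; a modularity-maximizing algorithm searches for the partition with the largest Q.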

Maximal vs maximum



Here I am again: which do I use in a sentence, "maximal" or "maximum"? After some digging around, I discovered that Wikipedia has a whole page explaining what a maximal element is, but in short:

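  • "maximal" - an element with nothing greater than it, somewhat like a local maximum; there can be several;

  • "maximum" - an element greater than or equal to every other element, like an absolute maximum; there is at most one.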
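The difference only shows up when some elements cannot be compared. A minimal sketch, assuming sets ordered by inclusion (the helper names are hypothetical, for illustration only):

```python
from typing import Optional

def maximal_elements(items: list[frozenset]) -> list[frozenset]:
    """Elements that no other element strictly contains."""
    return [x for x in items if not any(x < y for y in items)]

def maximum_element(items: list[frozenset]) -> Optional[frozenset]:
    """The element that contains every other element, or None if none exists."""
    for x in items:
        if all(y <= x for y in items):
            return x
    return None

family = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1})]
print(maximal_elements(family))  # [frozenset({1, 2}), frozenset({2, 3})]
print(maximum_element(family))   # None
```

Both {1, 2} and {2, 3} are maximal, yet neither contains the other, so there is no maximum; adding {1, 2, 3} to the family would make it the maximum.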

Wednesday, December 07, 2011

About a year ago I switched from JabRef to Mendeley, and after this unusually long test run I am quite happy with it:

  • it keeps my references in the cloud, so I can easily sync them between my laptop and desktop machines;
  • I have sorted my references into folders by topic and by project;
  • my colleagues and I have created thematic groups through which we share relevant publications;
  • I can search for papers at mendeley.com, which comes in handy when I am outside the university's network and do not have access to some journals;
  • I can add papers to Mendeley through its web importer by clicking a bookmark while viewing the paper on the publisher's website.

Wednesday, September 07, 2011

Despite all the pluses and minuses of the anonymous peer-review system, there is one major issue I have with it: bad reviewers. I'm not talking about reviewers who criticized my work; I mean those who are lazy, incompetent, or simply angry at the world, those who have no idea what my work is about yet talk about it with such authority!
I'd like to meet:

  • the reviewer who commented on the great video I submitted along with a paper that had no video

  • the reviewer who claimed I had no video, while the other two reviewers found the video helpful (there was indeed a video)

  • the reviewer who claimed that TF-IDF scores were not appropriate for text mining

  • the reviewer who claimed "no related work as cited or discussed" for a paper with a page of related work and ~40 references. Granted, it was not in a separate section titled "Related work", which apparently made it impossible to notice

  • the reviewer who noted that my draft needed "some wordsmithing" and then went on, in broken English, with: "I belief...", "no future more section", "in the introduction was stated...", "explore and asses similarity", "an consensus", "requirements were not disused", "identified cores seems to be...". Really?



Is this the best we can be?

Tuesday, August 16, 2011

Tuesday, August 09, 2011

Graduate school is not easy, but it's definitely fun; otherwise, how could you survive 5 years of hard work? An important part of grad student life is choosing a topic that will treat you well: one you will enjoy working on and that will let you generate new knowledge (in units of posters, presentations, papers, talks, even books). Uri Alon gives tips on how to choose a good scientific problem, a short read that can save you some pain later in your career!