Welcome to Thinking About Bioinformatics! In this post, I’ll try to explain my goals for this blog and the kinds of topics I plan to write about. For some time I’ve been very interested in the nature of information, and in the processes that produce it. Meanwhile, my work in bioinformatics has paid the bills. I’m going to use this blog as a place to start a discussion of information-producing processes with other people who are interested in these questions. So please feel free to post a comment, send me email, or link your own blog material to these posts. Here are some of the kinds of topics I hope to write about:
- the general information metric hypothesis: the notion that there is a general measure of information that in some sense is the “answer to all questions” in science, i.e. the best experiment to do is the one that maximizes the information yield and rate of production. There are many interesting arguments both for and against this idea, so in my view this is a great place to put everything we think we know about “information” to a rather searching test. Much of what I’m going to post here focuses on the idea that we can only understand information in Bayesian terms, i.e. as a hidden property of observable variables.
- biology as information and information as biology: there are striking parallels between the population genetic theory of evolution and the theory of statistical inference. For example, the equation for the frequency `p_i(t)` of genotype `i` at time `t` in an asexual haploid population under natural selection, where `W_i` is that genotype’s fitness, is identical in form to Bayes’ Law:
`p_i(t)=\frac{p_i(0)W_i^t}{\sum_j p_j(0)W_j^t}`
Here the initial population frequency of a given genotype `p_i(0)` is exactly equivalent to the Bayesian prior probability, the genotype’s fitness `W_i` is analogous to the likelihood of a single observation in Bayes’ Law (so `W_i^t` plays the role of the likelihood of `t` independent observations), and the population frequencies `p_i(t)` after time `t` are equivalent to the Bayesian posterior probability. So is evolution a big computer doing Bayesian inference on fitness? This is but a small example of a general point: most of the interesting questions about genomes and evolution are most productively understood as questions about information: about statistical inference, about algorithmic complexity, etc. And biology, as the information-producing process par excellence, probably has things to teach us about statistical inference.
- the theory of powertools: a powertool takes an extraordinary ability and converts it into a rote cycle of steps, rendering it scalable (which to me generally means info-linear, i.e. a simple linear relationship between the amount of information produced and the amount of work required to produce it). For example, the calculus is a powertool. The most brilliant Greek mathematicians struggled to derive and prove volume equations for various curved solids, which now “any schoolboy” instructed in the calculus can obtain with ease. A powertool is the basis set for a space that “diagonalizes” an entire class of problems into an info-linear representation. Thus the theory of powertools is closely related to the theory of representation.
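To make the selection equation from the second topic concrete, here is a minimal Python sketch. It assumes constant fitnesses, discrete generations, and no mutation or drift; the function names and the numbers are purely illustrative and not taken from any library. One generation of selection is written as a single Bayes update (prior times likelihood, then renormalize), and repeating it `t` times reproduces the closed-form equation.

```python
def select_one_generation(freqs, fitness):
    """One round of selection: weight each frequency by its fitness, then renormalize.

    This is the same arithmetic as one Bayes update, with `freqs` as the prior
    and `fitness` as the likelihood of a single observation.
    """
    weighted = [p * w for p, w in zip(freqs, fitness)]
    total = sum(weighted)
    return [x / total for x in weighted]


def closed_form(p0, fitness, t):
    """Closed form: p_i(t) = p_i(0) W_i^t / sum_j p_j(0) W_j^t."""
    weighted = [p * w ** t for p, w in zip(p0, fitness)]
    total = sum(weighted)
    return [x / total for x in weighted]


# Illustrative values only (assumptions, not data).
p0 = [0.5, 0.3, 0.2]   # initial genotype frequencies, i.e. the prior
W = [1.0, 1.1, 1.2]    # fitness of each genotype per generation
t = 10

p = p0
for _ in range(t):
    p = select_one_generation(p, W)   # t sequential "Bayes updates"

p_closed = closed_form(p0, W, t)      # one update with likelihood W_i^t

print(p)
print(p_closed)
```

The two printed lists should agree up to floating-point rounding, which is the sense in which `t` generations of selection act like `t` successive Bayesian updates.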
Anyway, these are my interests, and I’m eager to talk with others who share these interests.
Categorization: for better or worse, my interests and work range from pretty abstract to very concrete / practical (e.g. Pygr), so I’m going to try to assign each post to a category so that readers can pick out the parts that interest them.