Chapter 1 on probabilistic inference
February 7th, 2008
Here are a couple of items relevant to my Feb. 7 intro session at IMA:
- Chapter 1 draft introduction to probabilistic inference.
- Paper on identification of SNPs from EST chromatogram data.
I’ve been hacking a bit with Silverlight, Microsoft’s environment for running dynamic languages like Python and Ruby directly within a web browser like Firefox or IE. It seems to work well, and it’s easy to get Python code up and running in Silverlight. Indeed, it’s been surprisingly easy to get significant chunks of Python running in Silverlight: specifically, pygr, including large portions of code originally written for Pyrex (a mixed C / Python language environment). First I’ll describe my impressions of Silverlight and its implications for Python, then what I’ve accomplished with it.
Let’s examine the principles for a general process of learning, via a scientific example: could we program a robot to make scientific discoveries directly from raw observations? As an example problem, let’s take the discovery of the basic principles of genetics by Gregor Mendel and subsequent researchers.
I’m writing a textbook on Information Evolution. Or at least I thought I was; so far it mainly seems to be about the statistical inference side of “information”, as opposed to the “evolution” side. I suspect it will make more sense to focus this book on inference and methodology, and leave the science of how physical systems produce information for a later effort. If you have an interest in the basic issues I’m raising in the posts here, you may want to take a look at the first five draft chapters. That’s where the real meat is.
Does there exist an information metric with truly general utility? If so, a scientist could use it to choose which experiment to do: the best experiment is the one that yields the largest amount of information about the scientist’s question of interest (or, over the long term, the highest information rate per unit time or expense). Indeed, if the metric were truly general, the scientist could use it to decide which research question is “most interesting” (again, by computing the expected information yield for the different research directions). Actually, if such an information metric existed, the “scientist” could just be a robot, because all that is required is the ability to calculate this metric for different possible experiments (observations). This wouldn’t be artificial intelligence in the traditional sense of that field, but instead just a big statistical number-crunching computation. In a way, scientific computing at its dullest.
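To make "compute the expected information yield" concrete, here is a minimal sketch of how a robot might rank experiments. It assumes one particular candidate metric, the expected reduction in Shannon entropy over a set of hypotheses (the mutual information between hypothesis and outcome); the post does not commit to this metric. The function names, the two-hypothesis prior, and the three toy experiments are all hypothetical and chosen only for illustration.

```python
from math import log2


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * log2(x) for x in p if x > 0)


def expected_information(prior, likelihood):
    """Expected entropy reduction (bits) from running one experiment.

    likelihood[h][o] is P(outcome o | hypothesis h). This is an assumed
    stand-in for the "information metric", not a metric defined in the post.
    """
    n_hyp = len(prior)
    n_outcomes = len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_outcomes):
        # Probability of seeing this outcome, averaged over hypotheses.
        p_o = sum(prior[h] * likelihood[h][o] for h in range(n_hyp))
        if p_o == 0:
            continue
        # Posterior over hypotheses if this outcome is observed.
        posterior = [prior[h] * likelihood[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain


# Hypothetical example: two equally likely hypotheses, three candidate experiments.
prior = [0.5, 0.5]
experiments = {
    "uninformative": [[0.5, 0.5], [0.5, 0.5]],
    "noisy": [[0.8, 0.2], [0.3, 0.7]],
    "decisive": [[1.0, 0.0], [0.0, 1.0]],
}
scores = {name: expected_information(prior, lik) for name, lik in experiments.items()}
best = max(scores, key=scores.get)
```

Under these assumptions the robot would pick the experiment stored in `best`. Dividing each score by a cost estimate would give the information rate per unit time or expense.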
Welcome to Thinking About Bioinformatics! In this post, I’ll try to explain my goals for this blog, and the kind of topics I plan on writing about. For some time I’ve been very interested in the nature of information, and processes that produce information. Meanwhile, my work in bioinformatics has paid the bills. I’m going to use this blog as a place to try to start discussion of information-producing processes with other people who are interested in these questions. So please feel free to post a comment, send email to me, or link your own blog material to these posts. Here are some of the kinds of topics I hope to write about:
`p_i(t)=\frac{p_i(0)W_i^t}{\sum_j p_j(0)W_j^t}`
Here the initial population frequency of a given genotype `p_i(0)` plays the role of the Bayesian prior probability, the genotype’s fitness `W_i` is analogous to the likelihood of the observations in Bayes’ Law (with `W_i^t` corresponding to the likelihood accumulated over `t` generations), and the population frequencies `p_i(t)` after time `t` correspond to the Bayesian posterior probability. So is evolution a big computer doing Bayesian inference on fitness? This is but a small example of a general point: most of the interesting questions about genomes and evolution are most productively understood as questions about information: about statistical inference, about algorithmic complexity, etc. And biology probably has things to teach us about statistical inference, as the information-producing process par excellence.
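The correspondence can be checked numerically. The sketch below assumes constant fitness values and discrete generations, and uses made-up frequencies and fitnesses; the function names are hypothetical. It compares the closed-form selection equation with `t` repeated Bayesian updates in which `W_i` is used as the likelihood each generation.

```python
def select(p0, W, t):
    """Genotype frequencies after t generations: p_i(0) W_i^t, normalized."""
    weights = [p * w ** t for p, w in zip(p0, W)]
    total = sum(weights)
    return [x / total for x in weights]


def bayes_update(prior, likelihood):
    """One application of Bayes' Law: posterior is prior times likelihood, normalized."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [x / total for x in joint]


# Hypothetical initial frequencies and fitnesses for three genotypes.
p0 = [0.5, 0.3, 0.2]
W = [1.0, 1.1, 1.2]
t = 10

# Closed-form selection equation.
closed_form = select(p0, W, t)

# The same result via t successive Bayesian updates, one per generation.
posterior = p0
for _ in range(t):
    posterior = bayes_update(posterior, W)
```

Up to floating-point rounding, `closed_form` and `posterior` agree, and the fittest genotype's frequency rises above its starting value, just as a posterior concentrates on the hypothesis with the highest likelihood.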
Anyway, these are my interests, and I’m eager to talk with others who share these interests.
Categorization: for better or worse, my interests and work range from pretty abstract to very concrete / practical (e.g. Pygr), so I’m going to try to assign each post to a category so people can pick out the parts they’re interested in from the rest.