## Empirical information metrics paper published

It is common to measure the information value of a model as its average prediction power for some observable variable of interest. On that basis, the absolute goodness-of-fit of a statistical model to a set of observations can be formulated as the total remaining information obtainable from the set of all models that we have not yet computed (one of which might fit the observations much better than our current model). In a paper just published in Information, I define this metric as the potential information, and show that it can be estimated directly from the observations, without actually computing any of the remaining models.

This addresses a simple question in Bayesian inference: how do we know when we're done? Bayesian inference is widely used in many disciplines because it provides a general framework for evaluating the strength of evidence for a list of competing theories \Psi_i, given a set of experimental observations obs. If all problems could be solved by computing a short list of possible models (theories), this would be a good general strategy. In real-world scientific inference, however, we cannot assume a priori that the possibilities can be limited to a fixed list of models. So in practice we face a set of all possible models that is effectively infinite (or at least unmanageably large), of which we can only calculate a small subset of terms. This raises the unsettling possibility that the correct model \Omega may not even be included in the subset of terms that we calculated.

Specifically, the probability of a model \Psi_i given a set of observations obs is calculated via Bayes’ Law:

p(\Psi_i|obs) = \frac{p(obs|\Psi_i)p(\Psi_i)}{p(obs)}

where the denominator p(obs) is calculated via the expansion p(obs)=\sum_i{p(obs|\Psi_i)p(\Psi_i)}. If the set of all possible models \Psi_i is infinite, we will only be able to calculate this sum for a subset of terms \Psi_1 … \Psi_n. This underestimates the total sum p(obs) and therefore overestimates the probability of every model p(\Psi_i|obs), perhaps grossly. The real question is whether the correct model \Omega was included in the calculated terms \Psi_1 … \Psi_n or not. Since by definition \Omega maximizes p(obs|\Psi), if it is included it may dominate the sum, and our calculated probabilities may be reasonably accurate (in which case we are "done"). But if it is not, then they may be very inaccurate, and we would need to calculate more terms of the model series in the hopes of finding \Omega. So how do we know whether we're done?
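As a toy illustration of this effect (my own hypothetical example, not taken from the paper), consider a set of coin-bias models with a uniform prior. Dropping the best-fitting bias from the calculated subset shrinks the evidence sum p(obs) and inflates the posterior of every remaining model:

```python
def likelihood(bias, heads, tails):
    """p(obs | bias) for a fixed sequence of coin flips."""
    return bias**heads * (1 - bias)**tails

heads, tails = 70, 30  # hypothetical observations

# "Full" model set: coin biases 0.1 ... 0.9, uniform prior.
full = [i / 10 for i in range(1, 10)]
# Truncated subset that happens to omit the best-fitting model (bias 0.7).
subset = [b for b in full if b != 0.7]

def posterior(model, models):
    """Bayes' Law with p(obs) expanded as a sum over the given models."""
    prior = 1 / len(models)
    p_obs = sum(likelihood(b, heads, tails) * prior for b in models)
    return likelihood(model, heads, tails) * prior / p_obs

# The same model's posterior is overestimated when Omega (0.7) is missing
# from the calculated terms, because the denominator p(obs) is too small.
p_full = posterior(0.6, full)
p_trunc = posterior(0.6, subset)
```

Here `p_trunc` exceeds `p_full`: the truncated sum underestimates p(obs), so every calculated posterior is inflated, exactly as described above.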

Unfortunately, Bayes’ Law does not answer this question. Intuitively, if the calculated subset of models is a poor fit to the observations, this will be reflected in a very low value for the calculated probability of the observations p(obs) — much smaller than “it should be”. So how large should p(obs) be? Again, Bayes’ Law does not answer this.
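A small sketch of why the normalization hides this (again a hypothetical coin-bias example of mine, not from the paper): even when every calculated model fits the data badly, the posteriors still sum to 1, while the absolute magnitude of p(obs) comes with no built-in scale that would tell us it is "too small":

```python
import math

def log_likelihood(bias, flips):
    """Log of p(obs | bias) for a fixed sequence of coin flips."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(bias) + tails * math.log(1 - bias)

# Hypothetical data: 100 flips, 70 heads.
flips = [1] * 70 + [0] * 30

# Two deliberately poor models (biases far from the empirical 0.7).
models = [0.05, 0.10]
logL = [log_likelihood(b, flips) for b in models]

# p(obs) under this subset is astronomically small ...
log_p_obs = math.log(sum(math.exp(l) for l in logL) / len(models))

# ... yet the posteriors still normalize to 1, concealing the poor fit.
total = sum(math.exp(l) for l in logL)
post = [math.exp(l) / total for l in logL]
```

Bayes' Law alone supplies no reference value for `log_p_obs`, which is the gap the potential information metric is designed to fill.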

This question is directly relevant to understanding the scientific method mathematically, because it is related to Popper’s criterion of falsifiability, namely that a scientific theory is only useful if it makes predictions that could be shown to be wrong by experiments. Translated into Bayesian terms, this means showing that the fit of the calculated model terms to the experimental observations is not “good enough” — precisely the capability that Bayes’ Law lacks.

The potential information metric solves this problem by measuring the maximum amount of new information obtainable by computing all the remaining terms of the infinite model set. Its most interesting property is that we can measure it without actually computing any more terms of that set. For details on the metric and its relation to traditional information theory, see the paper.