Information Maximizing Neural Networks
We would like to learn the probability of some model conditioned on the data, p(model|data). In a typical Bayesian setup one would estimate this with a likelihood function p(data|model) and a prior probability p(model). This requires making two assumptions about functional forms: one for the likelihood and one for the prior. In cosmological experiments the likelihood function is assumed to be a multivariate Gaussian whose mean and covariance are given by the model prediction and the data covariance matrix, respectively. There are a few issues here:
- The covariance matrix is an N×N matrix, where N can be as large as 100,000, which makes it difficult to estimate.
- The functional form of the likelihood might deviate from Gaussianity, leading to biases and/or misestimated uncertainties.
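For concreteness, the Gaussian likelihood assumed above can be written as follows (a standard form, not tied to any particular experiment), where d is the data vector of length N, μ(θ) is the model prediction for parameters θ, and C is the N×N data covariance matrix:

$$
p(\mathrm{data} \mid \mathrm{model}) \;\propto\; \frac{1}{\sqrt{\det C}} \exp\!\left[ -\frac{1}{2} \bigl(d - \mu(\theta)\bigr)^{T} C^{-1} \bigl(d - \mu(\theta)\bigr) \right]
$$

Both issues above enter through this expression: C has to be estimated (and inverted), and the exponent is only correct if the data really are Gaussian-distributed.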
Approximate Bayesian Computation
Approximate Bayesian Computation (ABC) alleviates this problem by doing away with an explicit likelihood function and replacing the Bayesian setup with the following steps (a code sketch follows the list):
- sample from the model prior
- simulate the data from the sampled model
- compute the distance between the simulated data and the observed data
- if distance(simulated data, observed data) < threshold, accept the model; otherwise, reject the model
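Here is a minimal sketch of this rejection-ABC loop in Python. The `prior_sample`, `simulate`, and `distance` functions are placeholders that would be specific to the problem at hand:

```python
import numpy as np

def rejection_abc(observed, prior_sample, simulate, distance, threshold, n_draws):
    """Rejection ABC: keep parameter draws whose simulated data lie close to the observation."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()           # sample from the model prior
        simulated = simulate(theta)      # simulate data from the sampled model
        if distance(simulated, observed) < threshold:
            accepted.append(theta)       # accept: simulation is close enough to the observed data
    return np.array(accepted)            # accepted draws approximate p(model | data)

# Illustrative toy usage (one-parameter Gaussian model):
# prior_sample = lambda: np.random.uniform(-5, 5)
# simulate     = lambda theta: np.random.normal(theta, 1.0, size=100)
# distance     = lambda x, y: abs(x.mean() - y.mean())
# samples = rejection_abc(observed_data, prior_sample, simulate, distance, 0.05, 100_000)
```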
In a high-dimensional space the condition distance(simulated data, observed data) < threshold is almost impossible to meet, so nearly every proposed model ends up being rejected.
Information Maximizing Neural Networks (IMNN)
Information
IMNN tries to overcome the curse of dimensionality in ABC by finding a smart summarization of the data. That is, the model learns a mapping from the data to a low-dimensional space (with dimension equal to the number of model parameters, for example 6) such that the amount of information about the model parameters is maximized. The amount of information is measured by the Fisher information matrix, which here assumes a Gaussian likelihood in the low-dimensional summary space.
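Concretely, under that Gaussian assumption, and assuming the covariance of the summaries does not depend on the parameters, the Fisher information matrix takes the familiar form below, where μ and C are the mean and covariance of the low-dimensional summaries (estimated from simulations) and θ are the model parameters:

$$
F_{\alpha\beta} \;=\; \frac{\partial \mu^{T}}{\partial \theta_{\alpha}} \, C^{-1} \, \frac{\partial \mu}{\partial \theta_{\beta}}
$$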
Neural Nets
IMNN then parameterizes the mapping between the data space and the low-dimensional manifold with a neural network. The weights and biases of the network are found by maximizing the determinant of the Fisher information matrix in the low-dimensional manifold. Like any other neural network, it needs a set of training examples; in this case, each training example is a triplet of high-dimensional simulations of the data. Say the model is some function of a parameter (a) that we want to estimate, model = f(a); then the triplet of simulations is (f(a-d), f(a), f(a+d)), where d is some sufficiently small number. The reason we use triplets is that computing the Fisher information matrix requires the derivative of the network output with respect to the parameter (a), which is approximated by a finite difference between the simulations at a-d and a+d.
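The sketch below shows how such a triplet of simulations can be turned into a (single-parameter) Fisher information value, assuming the network has already mapped each simulation to a vector of summaries; the function name and array shapes are illustrative, not the actual IMNN implementation:

```python
import numpy as np

def fisher_from_triplets(s_fid, s_minus, s_plus, d):
    """Estimate the Fisher information of network summaries for a single parameter a.

    s_fid, s_minus, s_plus: arrays of shape (n_sims, n_summaries) holding the network
    outputs for simulations at a, a - d and a + d respectively.
    """
    C = np.atleast_2d(np.cov(s_fid, rowvar=False))                   # covariance of summaries at the fiducial point
    dmu_da = (s_plus.mean(axis=0) - s_minus.mean(axis=0)) / (2 * d)  # finite-difference derivative of the mean
    return dmu_da @ np.linalg.solve(C, dmu_da)                       # F = (dmu/da)^T C^{-1} (dmu/da)

# Training the network then amounts to adjusting its weights and biases so that
# the (log-)determinant of F is maximized; for a single parameter this is just log(F).
```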
ABC with a low-dimensional summary of the data
Once we have learned a mapping of the data to a low-dimensional space, we can proceed with standard ABC: we iteratively sample from the prior, generate a simulation, compress it to its low-dimensional representation, and compare that with the low-dimensional representation of the observed data.
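Reusing the `rejection_abc` sketch from the ABC section, the only change is that both the observation and each simulation are passed through the trained compressor before the distance is computed (here `compress` is a placeholder for the trained network, and the threshold is illustrative):

```python
# Assumes rejection_abc, prior_sample, simulate and numpy (np) from the sketches above,
# plus a trained network `compress` mapping high-dimensional data to summaries.
samples = rejection_abc(
    observed=compress(observed_data),
    prior_sample=prior_sample,
    simulate=lambda theta: compress(simulate(theta)),
    distance=lambda x, y: np.linalg.norm(x - y),   # Euclidean distance in summary space
    threshold=0.1,                                 # illustrative tolerance
    n_draws=100_000,
)
```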
Pros and Cons
IMNN provides a smart way of finding an optimal summary of the data. However, training the model requires a large number of triplet simulations, which can be a bottleneck when the simulations are expensive.