GoGP

GoGP is a library for probabilistic programming around Gaussian processes. It uses Infergo for automatic differentiation and inference.

GoGP is built around a dual view on the Gaussian process

  • as a stochastic process,
  • as a probabilistic model over kernel hyperparameters.

Gaussian process instance

GP, the Gaussian process type, encapsulates similarity and noise kernels, their parameters, and observation inputs and outputs:

// Type GP is the barebone implementation of GP.
type GP struct {
	NDim                   int       // number of dimensions
	Simil, Noise           Kernel    // kernels
	ThetaSimil, ThetaNoise []float64 // kernel parameters

	X [][]float64 // inputs
	Y []float64   // outputs
}

Public methods defined on GP fall into two groups: Gaussian process fitting and prediction on one hand, and the probabilistic model interface on the other.

Gaussian process methods

Absorb updates the GP with observations; Produce returns predicted outputs for unseen inputs.

func (*GP) Absorb(x [][]float64, y []float64)
func (*GP) Produce(x [][]float64) (mu, sigma []float64)

When kernel parameters are fixed (known or learned), Absorb and Produce are all that is needed for posterior GP inference.

Probabilistic model methods

Observe and Gradient turn a GP instance into an elemental Infergo model (that is, a model with supplied gradient). The model can be passed to inference algorithms or used within another model.

func (*GP) Observe(x []float64) float64
func (*GP) Gradient() []float64 // `elemental' model

If the length of x equals the number of kernel hyperparameters (Simil.NTheta() + Noise.NTheta()), the gradient of the marginal likelihood is computed with respect to the kernel hyperparameters only; in this case, the observations must be provided in the fields of the GP instance. Otherwise, if the length of x is greater than the number of hyperparameters, the rest of x is interpreted as the observations, and the gradient is computed with respect to both the kernel hyperparameters and the observations.
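For illustration, a hypothetical helper (not part of GoGP) can build the argument vector under the layout just described: hyperparameters first, optionally followed by the observations. The concrete flattening of inputs and outputs below is an assumption for illustration and should be checked against the GoGP documentation:

```go
package main

import "fmt"

// pack builds an argument vector: kernel hyperparameters first,
// optionally followed by flattened inputs and then outputs.
// The concrete layout is an illustrative assumption, not GoGP API.
func pack(theta []float64, x [][]float64, y []float64) []float64 {
	args := append([]float64{}, theta...)
	for _, xi := range x {
		args = append(args, xi...)
	}
	return append(args, y...)
}

func main() {
	// hyperparameters only: gradient w.r.t. hyperparameters
	fmt.Println(pack([]float64{0.5, 0.1}, nil, nil))
	// hyperparameters and observations: gradient w.r.t. both
	fmt.Println(pack([]float64{0.5, 0.1},
		[][]float64{{0}, {1}}, []float64{0, 1}))
}
```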

Kernels

There are two kernel kinds:

  • similarity kernel;
  • noise kernel.

Both kinds must satisfy the Kernel interface:

type Kernel interface {
	Observe([]float64) float64
	NTheta() int
}

The Observe method computes the variance or covariance, and the NTheta method returns the number of kernel parameters.

A similarity (or covariance) kernel receives the concatenation of kernel parameters and the coordinates of two points. A noise kernel receives the concatenation of kernel parameters and the coordinates of a single point. Here is an example implementation of the RBF (or normal) kernel:

type RBF struct{}

func (RBF) Observe(x []float64) float64 {
	l, xa, xb := x[0], x[1], x[2]
	d := (xa - xb) / l
	return math.Exp(-d * d / 2)
}

func (RBF) NTheta() int { return 1 }

Case studies

GoGP includes case studies that illustrate, on simple examples, common patterns of GoGP use. We briefly summarize some of them here.

Basic usage

In the basic case, similar to that supported by many Gaussian process libraries, a GP directly serves as the model for inference on hyperparameters (or the hyperparameters can simply be fixed).

The library user specifies the kernel:

type Basic struct{}

func (Basic) Observe(x []float64) float64 {
	return x[0] * kernel.Normal.Observe(x[1:])
}

func (Basic) NTheta() int { return 2 }

and initializes GP with a kernel instance:

gp := &gp.GP{
	NDim:  1,
	Simil: Basic{},
}

MLE inference on hyperparameters and prediction can then be performed through library functions.
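What MLE over hyperparameters amounts to can be sketched without the library: maximizing the log marginal likelihood of the observations. The standalone toy below selects the RBF length scale on a grid; GoGP and Infergo instead use gradient-based optimization, and the data here are made up for illustration:

```go
package main

import (
	"fmt"
	"math"
)

// logML computes the Gaussian process log marginal likelihood of
// observations (X, Y) under a 1D RBF kernel with length scale l
// and noise variance s2.
func logML(X, Y []float64, l, s2 float64) float64 {
	n := len(X)
	// build K = k(X, X) + s2*I
	K := make([][]float64, n)
	for i := range K {
		K[i] = make([]float64, n)
		for j := range K {
			d := (X[i] - X[j]) / l
			K[i][j] = math.Exp(-d * d / 2)
		}
		K[i][i] += s2
	}
	// Gaussian elimination on K and y together; the log-determinant
	// is the sum of the logs of the pivots
	y := append([]float64{}, Y...)
	logDet := 0.0
	for i := 0; i < n; i++ {
		logDet += math.Log(K[i][i])
		for j := i + 1; j < n; j++ {
			f := K[j][i] / K[i][i]
			for k := i; k < n; k++ {
				K[j][k] -= f * K[i][k]
			}
			y[j] -= f * y[i]
		}
	}
	// back substitution for alpha = K^-1 Y
	alpha := make([]float64, n)
	for i := n - 1; i >= 0; i-- {
		s := y[i]
		for k := i + 1; k < n; k++ {
			s -= K[i][k] * alpha[k]
		}
		alpha[i] = s / K[i][i]
	}
	q := 0.0
	for i := range Y {
		q += Y[i] * alpha[i]
	}
	return -0.5*q - 0.5*logDet - 0.5*float64(n)*math.Log(2*math.Pi)
}

func main() {
	// smooth toy data, roughly sin(x)
	X := []float64{0, 0.5, 1, 1.5, 2}
	Y := []float64{0, 0.48, 0.84, 1, 0.91}
	bestL, bestLL := 0.0, math.Inf(-1)
	for l := 0.1; l <= 3.0; l += 0.1 {
		if ll := logML(X, Y, l, 0.01); ll > bestLL {
			bestL, bestLL = l, ll
		}
	}
	fmt.Printf("best length scale: %.1f\n", bestL)
}
```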

Priors on hyperparameters

If priors on hyperparameters are to be specified, the library user provides both the kernel and the model. GP is initialized with the kernel, and then GP and the model are combined for inference in a derived model:

type Model struct {
	gp           *gp.GP
	priors       *Priors
	gGrad, pGrad []float64
}

func (m *Model) Observe(x []float64) float64 {
	var gll, pll float64
	gll, m.gGrad = m.gp.Observe(x), model.Gradient(m.gp)
	pll, m.pGrad = m.priors.Observe(x), model.Gradient(m.priors)
	return gll + pll
}

func (m *Model) Gradient() []float64 {
	for i := range m.pGrad {
		m.gGrad[i] += m.pGrad[i]
	}
	return m.gGrad
}
In Model, gp holds a GP instance, and priors holds an instance of the model expressing beliefs about hyperparameters. A Model instance is used for inference on hyperparameters, and a GP instance for prediction.

Uncertain observation inputs

When observation inputs are uncertain, beliefs about inputs can be specified, and the log-likelihood gradient can be computed with respect to both hyperparameters and observation inputs. For example, in a time series, one can assume that observation inputs come from a renewal process and let the inputs move relative to each other. Then, forecasting can be performed relative to posterior observation inputs.

Non-Gaussian noise

In basic usage, the observation noise is assumed to be Gaussian. This usage is supported by initializing GP with a noise kernel, along with a similarity kernel. When the noise is not Gaussian, an analytical solution for posterior Gaussian process inference does not always exist. However, non-Gaussian noise is straightforwardly supported by GoGP through supplying a model for beliefs on observation outputs:

type Noise struct {
	Y []float64 // noisy outputs
}
func (m *Noise) Observe(x []float64) (ll float64) {
	// Laplacian noise
	for i := range m.Y {
		ll += Expon.Logp(1/math.Exp(x[0]),
			math.Abs(m.Y[i]-x[1+i]))
	}
	return ll
}