Example 3: Writing your own EstimationModel or RegressionModel#

The package provides two abstract base classes that can be subclassed to define custom statistical models:

regmmd.models.base_model.EstimationModel — for unconditional parameter
estimation (used with regmmd.MMDEstimator).
regmmd.models.base_model.RegressionModel — for conditional regression (used with regmmd.MMDRegressor). Extends EstimationModel with prediction support.

Both classes follow a par_v / par_c convention:

par_v — variable parameters that are optimized by the solver.
par_c — constant parameters that are held fixed during optimization.

Required Methods#

EstimationModel#

Subclasses of EstimationModel must implement the following methods:

sample_n(self, n): Draw n i.i.d. samples from the distribution under the current parameters. Used internally by the MMD objective to simulate from the model.
log_prob(self, x): Return the log-likelihood evaluated at each point in x, an array of shape (n_samples,) or (n_samples, n_features).
score(self, x): Return the gradient of the log-likelihood with respect to par_v at each point in x. The output shape must be compatible with par_v: a scalar gradient per sample when par_v is a scalar, or an array of shape (n_samples, len(par_v)) when par_v is a vector.
_init_params(self, X): Initialize the model parameters from the data X when par_v or par_c are None. Should call and return _get_params() at the end.
_get_params(self): Return the current (par_v, par_c) tuple so that the optimizer can read the parameter state.
_project_params(self, par_v): Project par_v onto the feasible set after each optimizer step. Use this to enforce constraints (e.g. positivity of a scale parameter). Return par_v unchanged if no constraints are required.
update(self, par_v): Write a new value of par_v back into the model’s attributes (e.g. self.loc = par_v). Called by the optimizer after each gradient step.

RegressionModel#

RegressionModel extends EstimationModel. In addition to all methods listed above (with signatures adapted for the joint input \((X, y)\)), the following extra method is required:

sample_n(self, n, mu_given_x): Draw n samples from the conditional distribution \(p(y \mid X, \text{par\_v}, \text{par\_c})\), given the predicted mean mu_given_x.
predict(self, X): Return the predicted conditional mean \(\mathbb{E}[Y | X]\) under the current parameters.
_init_params(self, X, y): Same role as in EstimationModel, but receives both X and y.

Implementing a Custom EstimationModel#

The example below implements a simple Poisson estimation model where the rate parameter lam is the only variable parameter.

import numpy as np
from regmmd.models.base_model import EstimationModel


class PoissonEstimation(EstimationModel):
    def __init__(self, par_v=None, par_c=None, random_state=None):
        # par_v: lambda (rate), par_c: unused
        self.lam = par_v
        self.rng = np.random.default_rng(seed=random_state)

    def sample_n(self, n):
        return self.rng.poisson(lam=self.lam, size=(n,))

    def log_prob(self, x):
        return x * np.log(self.lam) - self.lam - np.log(np.math.factorial(x))

    def score(self, x):
        # d/d(lam) log p(x | lam) = x / lam - 1
        return x / self.lam - 1

    def _init_params(self, X):
        if self.lam is None:
            self.lam = np.mean(X)
        return self._get_params()

    def _get_params(self):
        return self.lam, None

    def _project_params(self, par_v):
        # Rate must be strictly positive
        return max(1e-6, par_v)

    def update(self, par_v):
        self.lam = par_v

This model can then be passed directly to regmmd.MMDEstimator:

from regmmd import MMDEstimator

mmd_estim = MMDEstimator(
    model=PoissonEstimation(),
    par_v=None,
    par_c=None,
    kernel="Gaussian",
    solver={"type": "GD", "burnin": 200, "n_step": 500,
            "stepsize": 0.1, "epsilon": 1e-4},
    random_state=42,
)
res = mmd_estim.fit(X=x)

Implementing a Custom RegressionModel#

The example below implements a Poisson regression model where \(\mathrm{log}(\mathbb{E}[Y \mid X]) = X^T \beta\), with \(\beta\) as the only variable parameter.

import numpy as np
from regmmd.models.base_model import RegressionModel


class PoissonRegression(RegressionModel):
    def __init__(self, par_v=None, par_c=None, random_state=None):
        # par_v: beta coefficients, par_c: unused
        self.beta = par_v
        self.rng = np.random.default_rng(seed=random_state)

    def predict(self, X):
        # Conditional mean: E[Y | X] = exp(X @ beta)
        return np.exp(X @ self.beta)

    def sample_n(self, n, mu_given_x):
        return self.rng.poisson(lam=mu_given_x, size=(n,))

    def log_prob(self, X, y):
        mu = self.predict(X)
        return y * np.log(mu) - mu  # ignoring log(y!) constant

    def score(self, X, y):
        # d/d(beta) log p(y | X, beta) = X^T (y - mu)
        mu = self.predict(X)
        return X * (y - mu)[:, np.newaxis]

    def _init_params(self, X, y):
        if self.beta is None:
            self.beta = np.zeros(X.shape[1])
        return self._get_params()

    def _get_params(self):
        return self.beta, None

    def _project_params(self, par_v):
        return par_v  # no constraints on beta

    def update(self, par_v):
        self.beta = par_v

This model can then be passed directly to regmmd.MMDRegressor:

from regmmd import MMDRegressor

mmd_reg = MMDRegressor(
    model=PoissonRegression(),
    par_v=None,
    par_c=None,
    fit_intercept=False,
    bandwidth_X=0,
    bandwidth_y=1,
    kernel_y="Gaussian",
    solver={"burnin": 200, "n_step": 2000,
            "stepsize": 0.1, "epsilon": 1e-6},
    random_state=42,
)
res = mmd_reg.fit(X, y)