.. _example3: Example 3: Writing your own EstimationModel or RegressionModel ============================================================== The package provides two abstract base classes that can be subclassed to define custom statistical models: - :py:class:`regmmd.models.base_model.EstimationModel` — for unconditional parameter estimation (used with :py:class:`regmmd.MMDEstimator`). - :py:class:`regmmd.models.base_model.RegressionModel` — for conditional regression (used with :py:class:`regmmd.MMDRegressor`). Extends ``EstimationModel`` with prediction support. Both classes follow a ``par_v`` / ``par_c`` convention: - ``par_v`` — *variable* parameters that are optimized by the solver. - ``par_c`` — *constant* parameters that are held fixed during optimization. Required Methods ---------------- EstimationModel +++++++++++++++ Subclasses of ``EstimationModel`` must implement the following methods: ``sample_n(self, n)`` Draw ``n`` i.i.d. samples from the distribution under the current parameters. Used internally by the MMD objective to simulate from the model. ``log_prob(self, x)`` Return the log-likelihood evaluated at each point in ``x``, an array of shape ``(n_samples,)`` or ``(n_samples, n_features)``. ``score(self, x)`` Return the gradient of the log-likelihood with respect to ``par_v`` at each point in ``x``. The output shape must be compatible with ``par_v``: a scalar gradient per sample when ``par_v`` is a scalar, or an array of shape ``(n_samples, len(par_v))`` when ``par_v`` is a vector. ``_init_params(self, X)`` Initialize the model parameters from the data ``X`` when ``par_v`` or ``par_c`` are ``None``. Should call and return ``_get_params()`` at the end. ``_get_params(self)`` Return the current ``(par_v, par_c)`` tuple so that the optimizer can read the parameter state. ``_project_params(self, par_v)`` Project ``par_v`` onto the feasible set after each optimizer step. Use this to enforce constraints (e.g. positivity of a scale parameter). Return ``par_v`` unchanged if no constraints are required. ``update(self, par_v)`` Write a new value of ``par_v`` back into the model's attributes (e.g. ``self.loc = par_v``). Called by the optimizer after each gradient step. RegressionModel +++++++++++++++ ``RegressionModel`` extends ``EstimationModel``. In addition to all methods listed above (with signatures adapted for the joint input :math:`(X, y)`), the following extra method is required: ``sample_n(self, n, mu_given_x)`` Draw ``n`` samples from the conditional distribution :math:`p(y \mid X, \text{par\_v}, \text{par\_c})`, given the predicted mean ``mu_given_x``. ``predict(self, X)`` Return the predicted conditional mean :math:`\mathbb{E}[Y | X]` under the current parameters. ``_init_params(self, X, y)`` Same role as in ``EstimationModel``, but receives both ``X`` and ``y``. Implementing a Custom EstimationModel -------------------------------------- The example below implements a simple Poisson estimation model where the rate parameter ``lam`` is the only variable parameter. :: import numpy as np from regmmd.models.base_model import EstimationModel class PoissonEstimation(EstimationModel): def __init__(self, par_v=None, par_c=None, random_state=None): # par_v: lambda (rate), par_c: unused self.lam = par_v self.rng = np.random.default_rng(seed=random_state) def sample_n(self, n): return self.rng.poisson(lam=self.lam, size=(n,)) def log_prob(self, x): return x * np.log(self.lam) - self.lam - np.log(np.math.factorial(x)) def score(self, x): # d/d(lam) log p(x | lam) = x / lam - 1 return x / self.lam - 1 def _init_params(self, X): if self.lam is None: self.lam = np.mean(X) return self._get_params() def _get_params(self): return self.lam, None def _project_params(self, par_v): # Rate must be strictly positive return max(1e-6, par_v) def update(self, par_v): self.lam = par_v This model can then be passed directly to :py:class:`regmmd.MMDEstimator`:: from regmmd import MMDEstimator mmd_estim = MMDEstimator( model=PoissonEstimation(), par_v=None, par_c=None, kernel="Gaussian", solver={"type": "GD", "burnin": 200, "n_step": 500, "stepsize": 0.1, "epsilon": 1e-4}, random_state=42, ) res = mmd_estim.fit(X=x) Implementing a Custom RegressionModel --------------------------------------- The example below implements a Poisson regression model where :math:`\mathrm{log}(\mathbb{E}[Y \mid X]) = X^T \beta`, with :math:`\beta` as the only variable parameter. :: import numpy as np from regmmd.models.base_model import RegressionModel class PoissonRegression(RegressionModel): def __init__(self, par_v=None, par_c=None, random_state=None): # par_v: beta coefficients, par_c: unused self.beta = par_v self.rng = np.random.default_rng(seed=random_state) def predict(self, X): # Conditional mean: E[Y | X] = exp(X @ beta) return np.exp(X @ self.beta) def sample_n(self, n, mu_given_x): return self.rng.poisson(lam=mu_given_x, size=(n,)) def log_prob(self, X, y): mu = self.predict(X) return y * np.log(mu) - mu # ignoring log(y!) constant def score(self, X, y): # d/d(beta) log p(y | X, beta) = X^T (y - mu) mu = self.predict(X) return X * (y - mu)[:, np.newaxis] def _init_params(self, X, y): if self.beta is None: self.beta = np.zeros(X.shape[1]) return self._get_params() def _get_params(self): return self.beta, None def _project_params(self, par_v): return par_v # no constraints on beta def update(self, par_v): self.beta = par_v This model can then be passed directly to :py:class:`regmmd.MMDRegressor`:: from regmmd import MMDRegressor mmd_reg = MMDRegressor( model=PoissonRegression(), par_v=None, par_c=None, fit_intercept=False, bandwidth_X=0, bandwidth_y=1, kernel_y="Gaussian", solver={"burnin": 200, "n_step": 2000, "stepsize": 0.1, "epsilon": 1e-6}, random_state=42, ) res = mmd_reg.fit(X, y)