RegMMD Scikit-Learn Implementation documentation#

This is a scikit-learn style implementation of estimation and regression procedures based on the Maximum Mean Discrepancy (MMD) between the sample and the statistical model. These procedures are provably universally consistent and highly robust to outliers. For more information on the theoretical side, the relevant papers are:

  1. Universal robust regression via maximum mean discrepancy,

    Pierre Alquier, Mathieu Gerber, 2024

  2. Finite sample properties of parametric MMD estimation: Robustness to misspecification and dependence,

    Badr-Eddine Chérief-Abdellatif, Pierre Alquier, 2022

Existing R package#

On top of this implementation, there exists an implementation in the R language by the original authors of the papers, R package link. Most functions in this package are derived from the R implementation. However, the statistical models are implemented differently in this version, to allow users to quickly implement their own custom models.

Mathematical introduction#

In short, the estimator is based on the minimum distance estimator \(\hat{\theta}\), which is defined as

\[\hat{\theta} \in \underset{\theta \in \Theta}{\text{argmin}}\; d(P_\theta, \hat{P}_n),\]

for some statistical model \(\left\{P_\theta \mid \theta \in \Theta\right\}\) and the empirical distribution \(\hat{P}_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}\).

The MMD estimator and regression procedure are obtained by choosing a specific distance in the above definition. This distance comes from embedding the distributions in an RKHS \(\mathcal{H}\) using the feature map \(\varphi \colon \mathcal{X} \to \mathcal{H}\):

\[\begin{split}\mu(P) &:= \mathbb{E}_{X \sim P}\left[\varphi(X)\right],\\ D(P, Q) &:= \|\mu(P) - \mu(Q)\|_\mathcal{H}.\end{split}\]

Letting \(k(x, y) = \langle \varphi(x), \varphi(y) \rangle_\mathcal{H}\) denote the associated kernel, the squared distance can be computed as:

\[D^2(P, Q) = \mathbb{E}_{X, X' \sim P}\left[k(X, X')\right] - 2\mathbb{E}_{X \sim P, X' \sim Q}\left[k(X, X')\right] + \mathbb{E}_{X, X' \sim Q}\left[k(X, X')\right].\]
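Given two samples, this expression can be estimated by replacing each expectation with an empirical average over the samples. A minimal standalone NumPy sketch (not part of the package's API) using a Gaussian kernel:

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    diff = x[:, None, :] - y[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Plug-in (V-statistic) estimate of D^2(P, Q) from samples X ~ P, Y ~ Q."""
    k_xx = gaussian_kernel(X, X, bandwidth)
    k_xy = gaussian_kernel(X, Y, bandwidth)
    k_yy = gaussian_kernel(Y, Y, bandwidth)
    return k_xx.mean() - 2 * k_xy.mean() + k_yy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 1))
Y = rng.normal(0.0, 1.0, size=(500, 1))  # same distribution: MMD^2 near 0
Z = rng.normal(3.0, 1.0, size=(500, 1))  # shifted distribution: MMD^2 clearly larger
print(mmd_squared(X, Y), mmd_squared(X, Z))
```

Note that this plug-in estimator is slightly biased; an unbiased version drops the diagonal terms of `k_xx` and `k_yy`.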

This expression is used to compute gradients or stochastic gradients for the optimisation, which the package performs automatically.
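As a sketch of this idea (not the package's implementation), the following fits a one-parameter Gaussian location model \(P_\theta = \mathcal{N}(\theta, 1)\) by stochastic gradient descent on the squared MMD, using the reparameterisation \(Y = \theta + \varepsilon\); the bandwidth, step size, and iteration count are illustrative choices. The \(\mathbb{E}_{X, X' \sim \hat{P}_n}\) term does not depend on \(\theta\) and is dropped from the gradient:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data: standard normal inliers contaminated with 10% gross outliers at 50.
x = np.concatenate([rng.normal(0.0, 1.0, 900), np.full(100, 50.0)])

def rbf_and_grad(u, v, bw):
    """Gaussian kernel matrix k(u_i, v_j) and its derivative in the first argument."""
    d = u[:, None] - v[None, :]
    kv = np.exp(-d ** 2 / (2 * bw ** 2))
    return kv, -d / bw ** 2 * kv

bw = 5.0                    # illustrative bandwidth
theta = float(np.mean(x))   # init at the sample mean, dragged to ~5 by the outliers
m, lr = 200, 0.5
for _ in range(200):
    y = theta + rng.normal(0.0, 1.0, m)  # reparameterised sample from P_theta
    _, dk_yy = rbf_and_grad(y, y, bw)
    _, dk_yx = rbf_and_grad(y, x, bw)
    # d/d theta of [mean k(y, y) - 2 mean k(y, x)], using dy_i/d theta = 1.
    grad = 2.0 * dk_yy.mean() - 2.0 * dk_yx.mean()
    theta -= lr * grad

print(theta)  # pulled back near 0, unlike the contaminated sample mean
```

The contrast between the final `theta` and the starting sample mean illustrates the robustness claim: the outliers barely influence the MMD objective because the kernel is bounded.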

Indices and tables#