Regression Class

class regmmd.regression.MMDRegressor(model, fit_intercept=True, par_v=None, par_c=None, kernel_y='Gaussian', kernel_X='Laplace', bandwidth_y='auto', bandwidth_X='auto', solver=None, random_state=None)

Bases: RegressorMixin, BaseEstimator

Regression using the Maximum Mean Discrepancy (MMD) criterion.

This class implements regression using the MMD criterion, which is a kernel-based method to compare distributions by measuring the distance between mean embeddings in a Reproducing Kernel Hilbert Space (RKHS).

MMDRegressor fits a regression model by minimizing the MMD between the distributions of the observed data and the model’s predictions. It supports various kernel types and bandwidth selection methods for both the input features and the target variables.
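As a rough illustration of the criterion itself (not the package's internal code), the squared MMD between two samples can be estimated from averages of kernel evaluations. The sketch below assumes NumPy, a Gaussian kernel, and the biased V-statistic estimator; the helper names gaussian_kernel and mmd_squared are hypothetical and are not part of regmmd.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    # Treat 1-D input as n samples with a single feature.
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


rng = np.random.default_rng(0)
same = rng.normal(size=200)
shifted = rng.normal(loc=3.0, size=200)

print(mmd_squared(same, same))     # a sample compared with itself: essentially zero
print(mmd_squared(same, shifted))  # samples from different distributions: clearly positive
```

An estimator of this kind is what a fitting routine would drive toward zero by adjusting the model parameters, so that samples drawn from the model become hard to distinguish from the observed data.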

Parameters:
  • model (RegressionModel) – The statistical model used for regression, provided as an instance of a RegressionModel class with initialized parameters. This model defines the relationship between the input features and the target variable.

  • fit_intercept (bool, default=True) – Specifies whether to calculate the intercept for the model. If set to False, the model assumes that the data is already centered, and no intercept will be fitted.

  • par_v (np.array, optional) – Initial values for the variable parameters of the model. If None, the model will use default initial values.

  • par_c (np.array, optional) – Initial values for the constant parameters of the model. If None, the model will use default initial values.

  • kernel_y (str, default="Gaussian") – The kernel type used for the target variable y. Supported options are “Gaussian”, “Laplace”, and “Cauchy”.

  • kernel_X (str, default="Laplace") – The kernel type used for the input features X. Supported options are “Gaussian”, “Laplace”, and “Cauchy”.

  • bandwidth_y (Union[str, float], default="auto") – The bandwidth parameter for the kernel applied to the target variable y. If set to “auto”, the bandwidth is determined using a heuristic method, such as the median heuristic.

  • bandwidth_X (Union[str, float], default="auto") – The bandwidth parameter for the kernel applied to the input features X. If set to “auto”, the bandwidth is determined using a heuristic method, such as the median heuristic.

  • solver (dict, optional) – A dictionary specifying the solver parameters for the optimization process. It should include keys such as “burnin” (number of burn-in iterations), “n_step” (number of optimization steps), and “stepsize” (learning rate for the optimizer). If None, default solver settings are used.

  • random_state (int, optional) – Random seed passed to the model and to any sampler used by the SGD optimizers.
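To make the kernel and bandwidth options above more concrete, here is a minimal NumPy sketch of the three kernel families and of a median heuristic. The exact parameterizations used by regmmd are not stated here, so the formulas below follow one common convention and should be read as assumptions; the names pairwise_distances, median_heuristic, KERNELS, and kernel_matrix are hypothetical.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distances between all pairs of rows of x."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    diffs = x[:, None, :] - x[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def median_heuristic(x):
    """Median of the distances between distinct pairs (one common convention)."""
    d = pairwise_distances(x)
    upper = d[np.triu_indices_from(d, k=1)]  # each unordered pair once
    return float(np.median(upper))


# Assumed kernel forms as functions of distance d and bandwidth h.
KERNELS = {
    "Gaussian": lambda d, h: np.exp(-(d ** 2) / (2.0 * h ** 2)),
    "Laplace": lambda d, h: np.exp(-d / h),
    "Cauchy": lambda d, h: 1.0 / (1.0 + (d / h) ** 2),
}


def kernel_matrix(x, kernel="Gaussian", bandwidth="auto"):
    """Kernel matrix of x with itself; "auto" selects the median heuristic."""
    if isinstance(bandwidth, str):
        h = median_heuristic(x)
    else:
        h = float(bandwidth)
    return KERNELS[kernel](pairwise_distances(x), h)


points = [0.0, 1.0, 3.0]
print(median_heuristic(points))
print(kernel_matrix(points, kernel="Laplace"))
```

A data-driven bandwidth of this kind keeps the kernel sensitive at the typical scale of the data; passing a float instead fixes the scale by hand.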

X_offset

The offset applied to the input features X during preprocessing. This is used when fit_intercept is True.

Type:

np.array or None

y_offset

The offset applied to the target variable y during preprocessing.

Type:

np.array or None

X_scale

The scale factor applied to the input features X during preprocessing.

Type:

np.array or None

par_v

The estimated variable parameters of the model after fitting.

Type:

np.array

Notes

  • The fit method preprocesses the data, fits the model using the specified solver, and updates the model parameters.

  • The predict method uses the fitted model to make predictions on new data.

fit(X, y, use_exact=True)

Fit the MMD regression model according to the given training data.

Parameters:
  • X (np.ndarray, shape (n_samples, n_features)) – Training input samples.

  • y (np.ndarray, shape (n_samples,)) – Target values.

  • use_exact (bool, default=True) – Whether to use the model._exact_fit() method when it is available. If the model does not provide it, fitting falls back to SGD. Mainly used for performance comparisons.

Returns:

res – A dictionary containing the results of the optimization process, including the estimated parameters and the optimization trajectory.

Return type:

MMDResult
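The use_exact fallback described for fit can be pictured with a small dispatch sketch. This is only an illustration of the documented behaviour, not the package's code: ExactModel, SampledModel, and pick_fit_routine are hypothetical names, and only the _exact_fit attribute name comes from the description above.

```python
class ExactModel:
    """Hypothetical model that offers a closed-form fitting routine."""

    def _exact_fit(self, X, y):
        return "exact"


class SampledModel:
    """Hypothetical model with no _exact_fit method."""


def pick_fit_routine(model, use_exact=True):
    """Return which routine would run: 'exact' or 'sgd'."""
    has_exact = callable(getattr(model, "_exact_fit", None))
    # The exact routine is used only when requested AND available.
    return "exact" if (use_exact and has_exact) else "sgd"


print(pick_fit_routine(ExactModel()))                   # exact routine requested and available
print(pick_fit_routine(ExactModel(), use_exact=False))  # SGD forced, e.g. for a timing comparison
print(pick_fit_routine(SampledModel()))                 # no exact routine, so SGD is used
```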

predict(X)

Predict using the MMD regression model.

Parameters:

X (np.ndarray, shape (n_samples, n_features)) – Input samples for which to compute the predictions.

Returns:

y_pred – The predicted target values.

Return type:

np.ndarray, shape (n_samples,)

set_fit_request(*, use_exact='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • use_exact (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for use_exact parameter in fit.

  • self (MMDRegressor)

Returns:

self – The updated object.

Return type:

object