leaspy.api.Leaspy

class Leaspy(model_name: str, **kwargs)

Bases: object

Main API used to fit models, run algorithms and simulations. This is the main class of the Leaspy package.

Parameters:
model_name : str

The name of the model that will be used for the computations. The available models are:

  • 'logistic' - assumes that every modality follows a logistic curve across time.

  • 'logistic_parallel' - same as 'logistic', and additionally assumes that every modality has the same slope at its inflection point.

  • 'linear' - assumes that every modality follows a linear curve across time.

  • 'univariate_logistic' - a ‘logistic’ model for a single modality.

  • 'univariate_linear' - a ‘linear’ model for a single modality.

  • 'constant' - benchmark model for constant predictions.

  • 'lme' - benchmark model for classical linear mixed-effects model.

**kwargs

Keyword arguments directly passed to the model for its initialization (through ModelFactory.model()). Refer to the corresponding model to know possible arguments.

noise_model : str

For manifold-like models. Defines the noise structure of the model. Can be either:

  • 'gaussian_scalar': Gaussian error, with the same standard deviation for all features

  • 'gaussian_diagonal': Gaussian error, with one standard deviation parameter per feature (default)

  • 'bernoulli': for binary data (Bernoulli realization)

  • 'ordinal' or 'ordinal_ranking': for ordinal data. WARNING: make sure your dataset only contains positive integers.
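The difference between the two Gaussian noise structures can be illustrated with a minimal sketch. It is not leaspy's implementation: the helper names and the residual values are hypothetical, and the standard deviation is assumed to be the root-mean-square of zero-mean residuals.

```python
import math


def gaussian_scalar_std(residuals):
    """One standard deviation shared by all features (RMS over every entry)."""
    flat = [r for row in residuals for r in row]
    return math.sqrt(sum(r * r for r in flat) / len(flat))


def gaussian_diagonal_std(residuals):
    """One standard deviation per feature (RMS over each column)."""
    n_rows = len(residuals)
    n_features = len(residuals[0])
    return [
        math.sqrt(sum(row[k] ** 2 for row in residuals) / n_rows)
        for k in range(n_features)
    ]


# Hypothetical residuals (observation minus model value): 4 visits x 2 features.
residuals = [[0.1, 0.3], [-0.1, -0.3], [0.1, 0.3], [-0.1, -0.3]]

scalar_std = gaussian_scalar_std(residuals)      # a single float
diagonal_std = gaussian_diagonal_std(residuals)  # a list with one float per feature
```

With 'gaussian_scalar' a noisy feature and a clean feature share one parameter, whereas 'gaussian_diagonal' keeps them apart.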

source_dimension : int, optional

For multivariate models only. Sets the degrees of freedom for _spatial_ variability. This number MUST BE strictly lower than the number of features. By default, this number is equal to the square root of the number of features. One can interpret this hyperparameter as a way to reduce the dimension of inter-individual _spatial_ variability between progressions.
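The default and the constraint described above can be sketched as follows. Both helpers are hypothetical, and rounding the square root down to an integer is an assumption, since the text only says "square root of the number of features".

```python
import math


def default_source_dimension(n_features):
    """Default described above; rounding down is an assumption of this sketch."""
    return math.isqrt(n_features)


def check_source_dimension(source_dimension, n_features):
    """Enforce the documented constraint: strictly lower than the number of features."""
    if source_dimension >= n_features:
        raise ValueError(
            f"source_dimension ({source_dimension}) must be strictly lower "
            f"than the number of features ({n_features})"
        )
    return source_dimension


dim_for_4_features = default_source_dimension(4)
dim_for_10_features = default_source_dimension(10)
```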

batch_deltas_ordinal : bool, optional

For logistic models with ordinal noise model only. If True, concatenates the deltas for each feature into a 2-dimensional Tensor “deltas” model parameter, which essentially allows faster sampling with new samplers. If False, each feature will induce a new model parameter “deltas_<feature_name>”. The default is False, but it is preferable to switch to True when ordinal items have many levels or when there are many items (basically, when the fit takes too long). Batching deltas speeds up the sampling part of the MCMC SAEM at the cost of less accurate estimation of the deltas.

Attributes:
model : AbstractModel

Model used for computations, is an instance of AbstractModel.

type : str (read-only)

Name of the model - will be one of the names listed above.

Methods

calibrate(data, settings)

Duplicate of the fit() method.

check_if_initialized()

Check if model is initialized.

estimate(timepoints, individual_parameters, *)

Return the model values for individuals characterized by their individual parameters z_i at time-points (t_{i,j})_j.

estimate_ages_from_biomarker_values(...[, ...])

For individuals characterized by their individual parameters z_{i}, returns the age t_{i,j} at which a given feature value y_{i,j,k} is reached.

fit(data, settings)

Estimate the model's parameters \theta for a given dataset and a given algorithm.

load(path_to_model_settings)

Instantiate a Leaspy object from a JSON model parameter file or the corresponding dictionary.

personalize(data, settings, *[, return_loss])

From a model, estimate individual parameters for each ID of a given dataset.

save(path, **kwargs)

Save the Leaspy object as a JSON model parameter file.

simulate(individual_parameters, data, settings)

Generate longitudinal synthetic patients data from a given model, a given collection of individual parameters and some given settings.

calibrate(data: Data, settings: AlgorithmSettings) → None

Duplicate of the fit() method.

check_if_initialized() → None

Check if model is initialized.

Raises:
LeaspyInputError

Raise an error if the model has not been initialized.

estimate(timepoints: pd.MultiIndex | Dict[IDType, List[float]], individual_parameters: IndividualParameters, *, to_dataframe: bool = None, ordinal_method: str = 'MLE') → pd.DataFrame | Dict[IDType, np.ndarray]

Return the model values for individuals characterized by their individual parameters z_i at time-points (t_{i,j})_j.

Parameters:
timepoints : dictionary {str/int: array_like[numeric]} or pandas.MultiIndex

Contains, for each individual, the time-points to estimate. It can be a single time-point or a list of time-points.

individual_parameters : IndividualParameters

The individual parameters of the individuals.

to_dataframe : bool or None (default)

Whether to output a dataframe of estimations. If None, defaults to True if and only if timepoints is a pandas.MultiIndex.

ordinal_method : str

<!> Only used for ordinal models.

  • ‘MLE’ or ‘maximum_likelihood’ returns the maximum likelihood estimator for each point (int)

  • ‘E’ or ‘expectation’ returns the expectation (float)

  • ‘P’ or ‘probabilities’ returns the probabilities of all levels (array[float])
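The three ordinal summaries can be sketched for a single point as follows. This is an illustration rather than leaspy's code: the helper is hypothetical, and levels are assumed to be indexed 0..K-1.

```python
def summarize_ordinal(probabilities, method="MLE"):
    """Summarize the level probabilities of one point (levels assumed 0..K-1)."""
    if method in ("MLE", "maximum_likelihood"):
        # Most probable level (int).
        return max(range(len(probabilities)), key=probabilities.__getitem__)
    if method in ("E", "expectation"):
        # Probability-weighted mean level (float).
        return float(sum(level * p for level, p in enumerate(probabilities)))
    if method in ("P", "probabilities"):
        # All level probabilities, unchanged.
        return list(probabilities)
    raise ValueError(f"Unknown ordinal method: {method!r}")


# Hypothetical probabilities of the 4 levels of one ordinal item at one visit.
probs = [0.1, 0.2, 0.6, 0.1]

most_likely_level = summarize_ordinal(probs, "MLE")
expected_level = summarize_ordinal(probs, "E")
all_levels = summarize_ordinal(probs, "P")
```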

Returns:
individual_trajectory : pandas.DataFrame or dict (depending on to_dataframe flag)

Key: patient indices. Value: numpy.ndarray of the estimated values, of shape (number of timepoints, number of features).

Examples

Given the individual parameters of two subjects, estimate the features of the first at 70, 74 and 80 years old, and of the second at 71 and 72 years old.

>>> from leaspy.datasets import Loader
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> individual_parameters = Loader.load_individual_parameters('parkinson-putamen-train')
>>> df_train = Loader.load_dataset('parkinson-putamen-train_and_test').xs('train', level='SPLIT')
>>> timepoints = {'GS-001': (70, 74, 80), 'GS-002': (71, 72)}  # as dict
>>> timepoints = df_train.sort_index().groupby('ID').tail(2).index  # as pandas (ID, TIME) MultiIndex
>>> estimations = leaspy_logistic.estimate(timepoints, individual_parameters)

estimate_ages_from_biomarker_values(individual_parameters: IndividualParameters, biomarker_values: Dict[str, List[float] | float], feature: str | None = None) → Dict[str, List[float] | float]

For individuals characterized by their individual parameters z_{i}, returns the age t_{i,j} at which a given feature value y_{i,j,k} is reached.

Parameters:
individual_parameters : IndividualParameters

The individual parameters of the individuals.

biomarker_values : Dict[Union[str, int], Union[List, float]]

Dictionary that associates to each patient (being a key of the dictionary) a value (float between 0 and 1, or a list of such floats) from which leaspy will estimate the age at which the value is reached. TODO? shouldn’t we allow pandas.Series / pandas.DataFrame

feature : str

For multivariate models only: feature name (indicates to which model feature the biomarker values belong).

Returns:
biomarker_ages

Dictionary that associates to each patient (being a key of the dictionary) the corresponding age (or ages) at which the value(s) from biomarker_values are reached. Same format as biomarker_values.

Raises:
LeaspyTypeError

If the inputs have bad types.

LeaspyInputError

If the inputs are inconsistent.
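For a logistic curve, finding the age at which a value is reached amounts to inverting the curve. The sketch below shows this idea on a generic logistic function. The `midpoint` and `rate` parameters and the per-subject values are hypothetical, and this is not leaspy's parametrization.

```python
import math


def logistic(t, midpoint, rate):
    """Generic logistic curve with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))


def age_at_value(y, midpoint, rate):
    """Invert the logistic curve: age at which the value y (0 < y < 1) is reached."""
    if not 0.0 < y < 1.0:
        raise ValueError("value must lie strictly between 0 and 1")
    return midpoint + math.log(y / (1.0 - y)) / rate


# Hypothetical per-subject curve parameters: (midpoint age, rate).
curves = {"GS-001": (70.0, 0.1), "GS-002": (65.0, 0.2)}
biomarker_values = {"GS-001": [0.2], "GS-002": [0.5, 0.6]}

# Same dictionary layout as the input: one list of ages per subject.
ages = {
    subject: [age_at_value(y, *curves[subject]) for y in values]
    for subject, values in biomarker_values.items()
}
```

A value of 0.5 is reached exactly at the midpoint age, which gives a quick sanity check of the inversion.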

Examples

Given the individual parameters of two subjects, and the feature value of 0.2 for the first and 0.5 and 0.6 for the second, get the corresponding estimated ages at which these values will be reached.

>>> from leaspy.datasets import Loader
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> individual_parameters = Loader.load_individual_parameters('parkinson-putamen-train')
>>> biomarker_values = {'GS-001': [0.2], 'GS-002': [0.5, 0.6]}
>>> # Here the 'feature' argument is optional, as the model is univariate
>>> estimated_ages = leaspy_logistic.estimate_ages_from_biomarker_values(individual_parameters, biomarker_values,
...                                                                      feature='PUTAMEN')

fit(data: Data, settings: AlgorithmSettings) → None

Estimate the model’s parameters \theta for a given dataset and a given algorithm.

These model parameters correspond to the fixed effects of the mixed-effects model.

Parameters:
data : Data

Contains the information of the individuals, in particular the time-points (t_{i,j}) and the observations (y_{i,j}).

settings : AlgorithmSettings

Contains the algorithm’s settings.

See also

leaspy.algo.fit

Examples

Fit a logistic model on a longitudinal dataset, then display the group parameters:

>>> from leaspy import AlgorithmSettings, Data, Leaspy
>>> from leaspy.datasets import Loader
>>> putamen_df = Loader.load_dataset('parkinson-putamen')
>>> data = Data.from_dataframe(putamen_df)
>>> leaspy_logistic = Leaspy('univariate_logistic')
>>> settings = AlgorithmSettings('mcmc_saem', seed=0)
>>> settings.set_logs('path/to/logs', console_print_periodicity=50)
>>> leaspy_logistic.fit(data, settings)
 ==> Setting seed to 0
|##################################################|   10000/10000 iterations
The standard deviation of the noise at the end of the calibration is:
0.0213
Calibration took: 30s
>>> print(str(leaspy_logistic.model))
=== MODEL ===
g : tensor([-1.1744])
tau_mean : 68.56787872314453
tau_std : 10.12782096862793
xi_mean : -2.3396952152252197
xi_std : 0.5421289801597595
noise_std : 0.021265486255288124

classmethod load(path_to_model_settings: str) → Leaspy

Instantiate a Leaspy object from a JSON model parameter file or the corresponding dictionary.

This function can be used to load a pre-trained model.

Parameters:
path_to_model_settings : str or dict

Path to the model’s settings JSON file, or dictionary of model parameters.

Returns:
Leaspy

An instantiated Leaspy object with the given population parameters \theta.
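Accepting either a path or a dictionary, as described above, can be sketched with the standard library. The helper `read_model_settings` and the keys used in the example dictionary are hypothetical and do not reflect leaspy's actual file layout.

```python
import json
import os
import tempfile


def read_model_settings(path_or_dict):
    """Hypothetical helper: return model settings from a dict or a JSON file path."""
    if isinstance(path_or_dict, dict):
        # Already a dictionary: return a shallow copy.
        return dict(path_or_dict)
    with open(path_or_dict, "r", encoding="utf-8") as fp:
        return json.load(fp)


# Illustrative settings only; the real file layout is not shown in this page.
settings = {"name": "univariate_logistic", "parameters": {"tau_mean": 64.18}}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "model.json")
    with open(json_path, "w", encoding="utf-8") as fp:
        json.dump(settings, fp)
    from_file = read_model_settings(json_path)

from_dict = read_model_settings(settings)
```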

Examples

Load a univariate logistic pre-trained model.

>>> from leaspy import Leaspy
>>> from leaspy.datasets.loader import model_paths
>>> leaspy_logistic = Leaspy.load(model_paths['parkinson-putamen-train'])
>>> print(str(leaspy_logistic.model))
=== MODEL ===
g : tensor([-0.7901])
tau_mean : 64.18125915527344
tau_std : 10.199116706848145
xi_mean : -2.346343994140625
xi_std : 0.5663877129554749
noise_std : 0.021229960024356842

personalize(data: Data, settings: AlgorithmSettings, *, return_loss: bool = False)

From a model, estimate individual parameters for each ID of a given dataset. These individual parameters correspond to the random effects (z_i) of the mixed-effects model.

Parameters:
data : Data

Contains the information of the individuals, in particular the time-points (t_{i,j}) and the observations (y_{i,j}).

settings : AlgorithmSettings

Contains the algorithm’s settings.

return_loss : bool (default False)

If True, returns a tuple (individual_parameters, loss).

Returns:
ips : IndividualParameters

Contains individual parameters

if return_loss is True : tuple

The tuple (individual_parameters, loss).

Raises:
LeaspyInputError

if model is not initialized.
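The idea of personalization, fitting individual parameters while the population curve stays fixed, can be sketched with a toy example. This does not reproduce leaspy's algorithms (such as scipy_minimize): the generic logistic curve, the single parameter `tau` and the grid search are assumptions made for illustration. The `return_loss` flag mirrors the behaviour documented above.

```python
import math


def logistic(t, tau, rate=0.2):
    """Generic logistic curve shifted in time by tau (not leaspy's parametrization)."""
    return 1.0 / (1.0 + math.exp(-rate * (t - tau)))


def personalize_tau(times, observations, candidates, return_loss=False):
    """Pick the time shift tau that minimizes the mean squared error for one subject."""

    def loss(tau):
        errors = [(logistic(t, tau) - y) ** 2 for t, y in zip(times, observations)]
        return sum(errors) / len(errors)

    best_tau = min(candidates, key=loss)
    return (best_tau, loss(best_tau)) if return_loss else best_tau


# Hypothetical subject whose observations follow the curve with tau = 68.
times = [60.0, 65.0, 70.0]
observations = [logistic(t, 68.0) for t in times]
candidates = [60.0 + 0.5 * i for i in range(41)]  # grid from 60 to 80 years

tau_only = personalize_tau(times, observations, candidates)
tau, final_loss = personalize_tau(times, observations, candidates, return_loss=True)
```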

Examples

Compute the individual parameters for a given longitudinal dataset and calibrated model, then display the histogram of the log-acceleration:

>>> from leaspy import AlgorithmSettings, Data
>>> from leaspy.datasets import Loader
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> putamen_df = Loader.load_dataset('parkinson-putamen')
>>> data = Data.from_dataframe(putamen_df)
>>> personalize_settings = AlgorithmSettings('scipy_minimize', seed=0)
>>> individual_parameters = leaspy_logistic.personalize(data, personalize_settings)
 ==> Setting seed to 0
|##################################################|   200/200 subjects
The standard deviation of the noise at the end of the personalization is:
0.0191
Personalization scipy_minimize took: 5s
>>> ip_df = individual_parameters.to_dataframe()
>>> ip_df[['xi']].hist()

save(path: str, **kwargs) → None

Save the Leaspy object as a JSON model parameter file.

Parameters:
path : str

Path to store the model’s parameters.

**kwargs

Keyword arguments for save() (including those sent to json.dump() function).
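Forwarding keyword arguments to json.dump() typically means that formatting options such as `indent` or `sort_keys` can be set by the caller. The helper below is a hypothetical stand-in that only illustrates this forwarding pattern. It is not the code of save().

```python
import json
import os
import tempfile


def save_parameters(path, parameters, **kwargs):
    """Hypothetical helper: write parameters as JSON, forwarding kwargs to json.dump."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(parameters, fp, **kwargs)


# Illustrative parameter values only.
parameters = {"tau_mean": 68.57, "xi_mean": -2.34}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_parameters.json")
    # indent and sort_keys are regular json.dump options.
    save_parameters(path, parameters, indent=2, sort_keys=True)
    with open(path, encoding="utf-8") as fp:
        text = fp.read()
```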

Examples

Load the univariate dataset 'parkinson-putamen', calibrate the model & save it:

>>> from leaspy import AlgorithmSettings, Data, Leaspy
>>> from leaspy.datasets import Loader
>>> putamen_df = Loader.load_dataset('parkinson-putamen')
>>> data = Data.from_dataframe(putamen_df)
>>> leaspy_logistic = Leaspy('univariate_logistic')
>>> settings = AlgorithmSettings('mcmc_saem', seed=0)
>>> leaspy_logistic.fit(data, settings)
 ==> Setting seed to 0
|##################################################|   10000/10000 iterations
The standard deviation of the noise at the end of the calibration is:
0.0213
Calibration took: 30s
>>> leaspy_logistic.save('leaspy-logistic-model_parameters-seed0.json')

simulate(individual_parameters: IndividualParameters, data: Data, settings: AlgorithmSettings)

Generate longitudinal synthetic patients data from a given model, a given collection of individual parameters and some given settings.

This procedure learns the joint distribution of the individual parameters and baseline ages of the subjects present in individual_parameters and data, respectively, in order to sample new patients from this joint distribution. The model is then used to compute each patient’s scores from the individual parameters. The number of visits per patient is controlled by settings['parameters']['mean_number_of_visits'] and settings['parameters']['std_number_of_visits'], which default to 6 and 3 respectively.

Parameters:
individual_parameters : IndividualParameters

Contains the individual parameters.

data : Data

Data object

settings : AlgorithmSettings

Contains the algorithm’s settings.

Returns:
simulated_data : Result

Contains the generated individual parameters & the corresponding generated scores.

Notes

To generate a new subject, we first estimate the joint distribution of the individual parameters and the reparametrized baseline ages. We then randomly pick a new point from this distribution, which defines the individual parameters and baseline age of the new subject. Next, we generate the timepoints following the baseline age. From the model, the generated timepoints and the individual parameters, we then compute the corresponding value estimations. Finally, we add some noise to these estimations; by default it follows the same noise model as your model, but you may customize it by setting the noise keyword.
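The generation steps described in these notes can be sketched with the standard library only. Several choices are assumptions of this sketch and are not taken from leaspy: the joint distribution is approximated by a Gaussian kernel density estimate with one shared bandwidth, raw baseline ages replace the reparametrized ones, visits are spaced one year apart, and the curve is a generic logistic function.

```python
import math
import random


def logistic(t, tau, xi):
    """Generic logistic curve; NOT leaspy's parametrization."""
    return 1.0 / (1.0 + math.exp(-math.exp(xi) * (t - tau)))


def simulate_subjects(observed, n_subjects, rng, bandwidth=0.1,
                      mean_visits=6, std_visits=3, noise_std=0.02):
    """observed: list of (tau, xi, baseline_age) tuples for the real subjects."""
    subjects = []
    for _ in range(n_subjects):
        # Sample from a Gaussian kernel density estimate of the joint
        # distribution: pick one observed point, then jitter each coordinate.
        tau, xi, baseline = (v + rng.gauss(0.0, bandwidth) for v in rng.choice(observed))
        # Draw the number of visits (at least one), then yearly timepoints
        # starting at the baseline age.
        n_visits = max(1, round(rng.gauss(mean_visits, std_visits)))
        timepoints = [baseline + k for k in range(n_visits)]
        # Noiseless model values plus Gaussian observation noise.
        values = [logistic(t, tau, xi) + rng.gauss(0.0, noise_std) for t in timepoints]
        subjects.append({"tau": tau, "xi": xi, "timepoints": timepoints, "values": values})
    return subjects


# Hypothetical individual parameters and baseline ages of three real subjects.
observed = [(68.0, -2.3, 62.0), (72.0, -2.5, 66.0), (60.0, -2.1, 55.0)]
simulated = simulate_subjects(observed, n_subjects=5, rng=random.Random(0))
```

Sampling the parameters and the baseline age together, rather than independently, preserves their dependence: each generated subject stays close to one real (tau, xi, baseline age) combination.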

Examples

Use a calibrated model & individual parameters to simulate new subjects similar to the ones you have:

>>> from leaspy import AlgorithmSettings, Data
>>> from leaspy.datasets import Loader
>>> putamen_df = Loader.load_dataset('parkinson-putamen-train_and_test')
>>> data = Data.from_dataframe(putamen_df.xs('train', level='SPLIT'))
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> individual_parameters = Loader.load_individual_parameters('parkinson-putamen-train')
>>> simulation_settings = AlgorithmSettings('simulation', seed=0, noise='bernoulli')
>>> simulated_data = leaspy_logistic.simulate(individual_parameters, data, simulation_settings)
 ==> Setting seed to 0
>>> print(simulated_data.data.to_dataframe().set_index(['ID', 'TIME']).head())
                                  PUTAMEN
ID                    TIME
Generated_subject_001 63.611107  0.556399
                      64.111107  0.571381
                      64.611107  0.586279
                      65.611107  0.615718
                      66.611107  0.644518
>>> print(simulated_data.get_dataframe_individual_parameters().tail())
                             tau        xi
ID
Generated_subject_096  46.771028 -2.483644
Generated_subject_097  73.189964 -2.513465
Generated_subject_098  57.874967 -2.175362
Generated_subject_099  54.889400 -2.069300
Generated_subject_100  50.046972 -2.259841

By default, you simulate 100 subjects, with a mean number of visits of 6 and a standard deviation of the number of visits of 3. Let’s say you want to simulate 200 subjects, each of them having exactly ten visits:

>>> simulation_settings = AlgorithmSettings('simulation', seed=0, number_of_subjects=200,
...                                         mean_number_of_visits=10, std_number_of_visits=0)
>>> simulated_data = leaspy_logistic.simulate(individual_parameters, data, simulation_settings)
 ==> Setting seed to 0
>>> print(simulated_data.data.to_dataframe().set_index(['ID', 'TIME']).tail())
                                  PUTAMEN
ID                    TIME
Generated_subject_200 72.119949  0.829185
                      73.119949  0.842113
                      74.119949  0.854271
                      75.119949  0.865680
                      76.119949  0.876363

By default, the generated subjects are named ‘Generated_subject_001’, ‘Generated_subject_002’ and so on. Let’s say you want a shorter name, for example ‘GS-001’. Furthermore, you want to set the level of noise around the subject trajectory when generating the observations:

>>> simulation_settings = AlgorithmSettings('simulation', seed=0, prefix='GS-', noise=.2)
>>> simulated_data = leaspy_logistic.simulate(individual_parameters, data, simulation_settings)
 ==> Setting seed to 0
>>> print(simulated_data.get_dataframe_individual_parameters().tail())
              tau        xi
ID
GS-096  46.771028 -2.483644
GS-097  73.189964 -2.513465
GS-098  57.874967 -2.175362
GS-099  54.889400 -2.069300
GS-100  50.046972 -2.259841