leaspy.api module
- class Leaspy(model_name: str, **kwargs)
Main API used to fit models, run algorithms and simulations. This is the main class of the Leaspy package.
- Parameters:
- model_name : str
The name of the model that will be used for the computations. The available models are:
- 'logistic': supposes that every modality follows a logistic curve across time.
- 'logistic_parallel': idem, and also supposes that every modality has the same slope at the inflexion point.
- 'linear': supposes that every modality follows a linear curve across time.
- 'univariate_logistic': a ‘logistic’ model for a single modality.
- 'univariate_linear': idem with a ‘linear’ model.
- 'constant': benchmark model for constant predictions.
- 'lme': benchmark model for classical linear mixed-effects model.
- **kwargs
Keyword arguments directly passed to the model for its initialization (through ModelFactory.model()). Refer to the corresponding model to know possible arguments.
- noise_model : str
For manifold-like models. Defines the noise structure of the model, which can be either:
- 'gaussian_scalar': Gaussian error, with the same standard deviation for all features
- 'gaussian_diagonal': Gaussian error, with one standard deviation parameter per feature (default)
- 'bernoulli': for binary data (Bernoulli realization)
- 'ordinal' or 'ordinal_ranking': for ordinal data. WARNING: make sure your dataset only contains positive integers.
- source_dimension : int, optional
For multivariate models only. Sets the degrees of freedom for _spatial_ variability. This number MUST BE strictly lower than the number of features. By default, this number is equal to the square root of the number of features. One can interpret this hyperparameter as a way to reduce the dimension of inter-individual _spatial_ variability between progressions.
- batch_deltas_ordinal : bool, optional
For logistic models with ordinal noise model only. If True, concatenates the deltas for each feature into a 2-dimensional Tensor “deltas” model parameter, which essentially allows faster sampling with new samplers. If False, each feature will induce a new model parameter “deltas_<feature_name>”. The default is False, but it is preferable to switch to True when ordinal items have many levels or when there are many items (basically, when the fit takes too long). Batching deltas speeds up the sampling part of the MCMC SAEM at the cost of less accuracy in the estimation of deltas.
- Attributes:
- model : AbstractModel
Model used for computations, is an instance of AbstractModel.
- type : str (read-only)
Name of the model - will be one of the names listed above.
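The source_dimension rule stated in the parameters above (default equal to the square root of the number of features, and strictly lower than the number of features) can be sketched as follows. This is only an illustration: the helper names are hypothetical and not part of Leaspy, and rounding the square root down to an integer is an assumption, since the text does not say how a non-integer square root is handled.

```python
import math


def default_source_dimension(n_features: int) -> int:
    """Hypothetical helper: square root of the number of features.

    Assumption: the square root is rounded down to the nearest integer.
    """
    return math.isqrt(n_features)


def check_source_dimension(source_dimension: int, n_features: int) -> None:
    """Hypothetical helper enforcing the documented constraint:
    source_dimension must be strictly lower than the number of features."""
    if source_dimension >= n_features:
        raise ValueError(
            f"source_dimension ({source_dimension}) must be strictly lower "
            f"than the number of features ({n_features})"
        )


# A model with 10 features would get 3 sources under this assumption.
n_features = 10
dimension = default_source_dimension(n_features)
check_source_dimension(dimension, n_features)
print(dimension)
```

The constraint only makes sense for multivariate models: with a single feature, no value can be both a valid square root and strictly lower than the number of features.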
Methods
- calibrate(data, settings): Duplicate of the fit() method.
- check_if_initialized(): Check if model is initialized.
- estimate(timepoints, individual_parameters, *): Return the model values for individuals characterized by their individual parameters at given time-points.
- estimate_ages_from_biomarker_values(...[, ...]): For individuals characterized by their individual parameters, returns the age at which a given feature value is reached.
- fit(data, settings): Estimate the model's parameters for a given dataset and a given algorithm.
- load(path_to_model_settings): Instantiate a Leaspy object from json model parameter file or the corresponding dictionary.
- personalize(data, settings, *[, return_loss]): From a model, estimate individual parameters for each ID of a given dataset.
- save(path, **kwargs): Save Leaspy object as json model parameter file.
- simulate(individual_parameters, data, settings): Generate longitudinal synthetic patients data from a given model, a given collection of individual parameters and some given settings.
- fit(data: Data, settings: AlgorithmSettings) -> None
Estimate the model’s parameters for a given dataset and a given algorithm.
These model parameters correspond to the fixed effects of the mixed-effects model.
- Parameters:
- data : Data
Contains the information of the individuals, in particular the time-points and the observations.
- settings : AlgorithmSettings
Contains the algorithm’s settings.
Examples
Fit a logistic model on a longitudinal dataset, display the group parameters
>>> from leaspy import AlgorithmSettings, Data, Leaspy
>>> from leaspy.datasets import Loader
>>> putamen_df = Loader.load_dataset('parkinson-putamen')
>>> data = Data.from_dataframe(putamen_df)
>>> leaspy_logistic = Leaspy('univariate_logistic')
>>> settings = AlgorithmSettings('mcmc_saem', seed=0)
>>> settings.set_logs('path/to/logs', console_print_periodicity=50)
>>> leaspy_logistic.fit(data, settings)
 ==> Setting seed to 0
|##################################################|   10000/10000 iterations
The standard deviation of the noise at the end of the calibration is: 0.0213
Calibration took: 30s
>>> print(str(leaspy_logistic.model))
=== MODEL ===
g : tensor([-1.1744])
tau_mean : 68.56787872314453
tau_std : 10.12782096862793
xi_mean : -2.3396952152252197
xi_std : 0.5421289801597595
noise_std : 0.021265486255288124
- calibrate(data: Data, settings: AlgorithmSettings) -> None
Duplicate of the fit() method.
- personalize(data: Data, settings: AlgorithmSettings, *, return_loss: bool = False)
From a model, estimate individual parameters for each ID of a given dataset. These individual parameters correspond to the random-effects of the mixed-effects model.
- Parameters:
- data : Data
Contains the information of the individuals, in particular the time-points and the observations.
- settings : AlgorithmSettings
Contains the algorithm’s settings.
- return_loss : bool (default False)
Returns a tuple (individual_parameters, loss) if True.
- Returns:
- ips : IndividualParameters
Contains individual parameters.
- if return_loss is True : tuple
ips : IndividualParameters
loss : torch.Tensor
- Raises:
LeaspyInputError
if model is not initialized.
Examples
Compute the individual parameters for a given longitudinal dataset and calibrated model, then display the histogram of the log-acceleration:
>>> from leaspy import AlgorithmSettings, Data
>>> from leaspy.datasets import Loader
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> putamen_df = Loader.load_dataset('parkinson-putamen')
>>> data = Data.from_dataframe(putamen_df)
>>> personalize_settings = AlgorithmSettings('scipy_minimize', seed=0)
>>> individual_parameters = leaspy_logistic.personalize(data, personalize_settings)
 ==> Setting seed to 0
|##################################################|   200/200 subjects
The standard deviation of the noise at the end of the personalization is: 0.0191
Personalization scipy_minimize took: 5s
>>> ip_df = individual_parameters.to_dataframe()
>>> ip_df[['xi']].hist()
- estimate(timepoints: pd.MultiIndex | Dict[IDType, List[float]], individual_parameters: IndividualParameters, *, to_dataframe: bool = None, ordinal_method: str = 'MLE') -> pd.DataFrame | Dict[IDType, np.ndarray]
Return the model values for individuals characterized by their individual parameters at given time-points.
- Parameters:
- timepoints : dictionary {str/int: array_like[numeric]} or pandas.MultiIndex
Contains, for each individual, the time-points to estimate. It can be a unique time-point or a list of time-points.
- individual_parameters : IndividualParameters
Corresponds to the individual parameters of individuals.
- to_dataframe : bool or None (default)
Whether to output a dataframe of estimations. If None, the default is True if and only if timepoints is a pandas.MultiIndex.
- ordinal_method : str
<!> Only used for ordinal models.
- ‘MLE’ or ‘maximum_likelihood’ returns maximum likelihood estimator for each point (int)
- ‘E’ or ‘expectation’ returns expectation (float)
- ‘P’ or ‘probabilities’ returns probabilities of all levels (array[float])
- Returns:
- individual_trajectory : pandas.DataFrame or dict (depending on to_dataframe flag)
Key: patient indices. Value: numpy.ndarray of the estimated value, in the shape (number of timepoints, number of features)
Examples
Given the individual parameters of two subjects, estimate the features of the first at 70, 74 and 80 years old and at 71 and 72 years old for the second.
>>> from leaspy.datasets import Loader
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> individual_parameters = Loader.load_individual_parameters('parkinson-putamen-train')
>>> df_train = Loader.load_dataset('parkinson-putamen-train_and_test').xs('train', level='SPLIT')
>>> timepoints = {'GS-001': (70, 74, 80), 'GS-002': (71, 72)}  # as dict
>>> timepoints = df_train.sort_index().groupby('ID').tail(2).index  # as pandas (ID, TIME) MultiIndex
>>> estimations = leaspy_logistic.estimate(timepoints, individual_parameters)
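The three ordinal_method options of estimate() differ only in how a vector of level probabilities is summarized. The sketch below illustrates that difference on a plain list of probabilities. It is not Leaspy code: the function name is hypothetical, and numbering the levels from 0 is an assumption.

```python
from typing import List, Union


def summarize_ordinal(
    probabilities: List[float], method: str = "MLE"
) -> Union[int, float, List[float]]:
    """Hypothetical helper: summarize P(level = k) for k = 0..K.

    Assumption: levels are numbered from 0 in the order of the list.
    """
    if method in ("MLE", "maximum_likelihood"):
        # Most probable level (int)
        return max(range(len(probabilities)), key=probabilities.__getitem__)
    if method in ("E", "expectation"):
        # Probability-weighted mean level (float)
        return sum(level * p for level, p in enumerate(probabilities))
    if method in ("P", "probabilities"):
        # Probabilities of all levels, unchanged
        return list(probabilities)
    raise ValueError(f"Unknown ordinal method: {method!r}")


probs = [0.1, 0.2, 0.5, 0.2]
print(summarize_ordinal(probs, "MLE"))  # most probable level
print(summarize_ordinal(probs, "E"))    # expected level
print(summarize_ordinal(probs, "P"))    # full distribution
```

‘MLE’ gives an integer on the original scale, ‘E’ gives a smoother value that can fall between levels, and ‘P’ keeps the full uncertainty.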
- estimate_ages_from_biomarker_values(individual_parameters: IndividualParameters, biomarker_values: Dict[str, List[float] | float], feature: str | None = None) -> Dict[str, List[float] | float]
For individuals characterized by their individual parameters, returns the age at which a given feature value is reached.
- Parameters:
- individual_parameters : IndividualParameters
Corresponds to the individual parameters of individuals.
- biomarker_values : Dict[Union[str, int], Union[List, float]]
Dictionary that associates to each patient (being a key of the dictionary) a value (float between 0 and 1, or a list of such floats) from which leaspy will estimate the age at which the value is reached. TODO? shouldn’t we allow pandas.Series / pandas.DataFrame
- feature : str
For multivariate models only: feature name (indicates to which model feature the biomarker values belong)
- Returns:
- biomarker_ages
Dictionary that associates to each patient (being a key of the dictionary) the corresponding age (or ages) for which the value(s) from biomarker_values have been reached. Same format as biomarker values.
- Raises:
LeaspyTypeError
bad types for input
LeaspyInputError
inconsistent inputs
Examples
Given the individual parameters of two subjects, and the feature value of 0.2 for the first and 0.5 and 0.6 for the second, get the corresponding estimated ages at which these values will be reached.
>>> from leaspy.datasets import Loader
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> individual_parameters = Loader.load_individual_parameters('parkinson-putamen-train')
>>> biomarker_values = {'GS-001': [0.2], 'GS-002': [0.5, 0.6]}
# Here the 'feature' argument is optional, as the model is univariate
>>> estimated_ages = leaspy_logistic.estimate_ages_from_biomarker_values(individual_parameters, biomarker_values,
...                                                                      feature='PUTAMEN')
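To see what "the age at which a given feature value is reached" means, it can help to invert a logistic curve by hand. The sketch below assumes a simplified individual curve y(t) = 1 / (1 + g * exp(-exp(xi) * (t - tau))) with g > 0. This parametrization, the function names and the numeric values are assumptions made for illustration and may differ from the model Leaspy actually uses.

```python
import math
from typing import Dict, List, Union


def logistic_value(age: float, tau: float, xi: float, g: float = 1.0) -> float:
    """Assumed curve: y(t) = 1 / (1 + g * exp(-exp(xi) * (t - tau)))."""
    return 1.0 / (1.0 + g * math.exp(-math.exp(xi) * (age - tau)))


def age_at_value(value: float, tau: float, xi: float, g: float = 1.0) -> float:
    """Invert the assumed curve: t = tau - exp(-xi) * ln((1/y - 1) / g)."""
    if not 0.0 < value < 1.0:
        raise ValueError("value must lie strictly between 0 and 1")
    return tau - math.exp(-xi) * math.log((1.0 / value - 1.0) / g)


def ages_from_biomarker_values(
    individual_parameters: Dict[str, Dict[str, float]],
    biomarker_values: Dict[str, Union[List[float], float]],
    g: float = 1.0,
) -> Dict[str, Union[List[float], float]]:
    """Return ages in the same format as biomarker_values (list or float)."""
    ages = {}
    for subject_id, values in biomarker_values.items():
        tau = individual_parameters[subject_id]["tau"]
        xi = individual_parameters[subject_id]["xi"]
        if isinstance(values, list):
            ages[subject_id] = [age_at_value(v, tau, xi, g) for v in values]
        else:
            ages[subject_id] = age_at_value(values, tau, xi, g)
    return ages


# Illustrative individual parameters (made up for this sketch)
ips = {"GS-001": {"tau": 70.0, "xi": -2.0}, "GS-002": {"tau": 65.0, "xi": -2.5}}
print(ages_from_biomarker_values(ips, {"GS-001": [0.2], "GS-002": [0.5, 0.6]}))
```

Under this assumed curve with g = 1, the value 0.5 is reached exactly at age tau, which gives a quick sanity check on any result.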
- simulate(individual_parameters: IndividualParameters, data: Data, settings: AlgorithmSettings)
Generate longitudinal synthetic patients data from a given model, a given collection of individual parameters and some given settings.
This procedure learns the joint distribution of the individual parameters and baseline age of the subjects present in individual_parameters and data respectively, in order to sample new patients from this joint distribution. The model is used to compute for each patient their scores from the individual parameters. The number of visits per patient is set in settings['parameters']['mean_number_of_visits'] and settings['parameters']['std_number_of_visits'], which are set by default to 6 and 3 respectively.
- Parameters:
- individual_parameters : IndividualParameters
Contains the individual parameters.
- data : Data
Data object
- settings : AlgorithmSettings
Contains the algorithm’s settings.
- Returns:
- simulated_data : Result
Contains the generated individual parameters & the corresponding generated scores.
Notes
To generate a new subject, we first estimate the joint distribution of the individual parameters and the reparametrized baseline ages. Then, we randomly pick a new point from this distribution, which defines the individual parameters & baseline age of our new subject. Then, we generate the timepoints following the baseline age. Then, from the model and the generated timepoints and individual parameters, we compute the corresponding estimated values. Finally, we add some noise to these estimations; by default it follows the same noise model as your model, but you may customize it by setting the noise keyword.
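The steps in these notes can be sketched with the standard library alone. This is a rough illustration under several stated assumptions, not the library's implementation: the joint distribution is approximated by a simple Gaussian kernel density (pick an observed subject, then jitter it), plain baseline ages are used instead of reparametrized ones, visits are evenly spaced, the number of visits is a rounded Gaussian draw clipped to at least 1, the curve is a simplified logistic, and the noise is Gaussian. All function and parameter names are hypothetical.

```python
import math
import random
import statistics
from typing import Dict, List, Tuple


def simulate_subjects(
    observed: List[Tuple[float, float, float]],  # (tau, xi, baseline_age) per subject
    n_subjects: int = 100,
    mean_visits: float = 6.0,
    std_visits: float = 3.0,
    noise_std: float = 0.02,      # assumed Gaussian noise level
    bandwidth: float = 0.1,       # assumed kernel width, in units of each std
    visit_interval: float = 1.0,  # assumed spacing between visits, in years
    seed: int = 0,
) -> Dict[str, List[Tuple[float, float]]]:
    rng = random.Random(seed)
    # Per-dimension spread, used to scale the Gaussian kernel
    stds = [statistics.pstdev(dim) for dim in zip(*observed)]
    subjects = {}
    for i in range(1, n_subjects + 1):
        # Steps 1-2: sample from a Gaussian kernel density estimate of the
        # joint distribution (random observed point + Gaussian jitter)
        center = rng.choice(observed)
        tau, xi, baseline = (
            c + rng.gauss(0.0, bandwidth * s) for c, s in zip(center, stds)
        )
        # Step 3: timepoints following the baseline age
        n_visits = max(1, round(rng.gauss(mean_visits, std_visits)))
        ages = [baseline + k * visit_interval for k in range(n_visits)]
        # Steps 4-5: model values (assumed logistic curve) plus Gaussian noise
        values = [
            1.0 / (1.0 + math.exp(-math.exp(xi) * (age - tau)))
            + rng.gauss(0.0, noise_std)
            for age in ages
        ]
        subjects[f"Generated_subject_{i:03d}"] = list(zip(ages, values))
    return subjects


observed = [(65.0, -2.3, 60.0), (70.0, -2.1, 66.0), (60.0, -2.5, 58.0)]
simulated = simulate_subjects(observed, n_subjects=5)
print({subject: len(visits) for subject, visits in simulated.items()})
```

Setting std_visits to 0 in this sketch gives every subject exactly mean_visits visits, which mirrors the role of the two visit settings described above.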
Examples
Use a calibrated model & individual parameters to simulate new subjects similar to the ones you have:
>>> from leaspy import AlgorithmSettings, Data
>>> from leaspy.datasets import Loader
>>> putamen_df = Loader.load_dataset('parkinson-putamen-train_and_test')
>>> data = Data.from_dataframe(putamen_df.xs('train', level='SPLIT'))
>>> leaspy_logistic = Loader.load_leaspy_instance('parkinson-putamen-train')
>>> individual_parameters = Loader.load_individual_parameters('parkinson-putamen-train')
>>> simulation_settings = AlgorithmSettings('simulation', seed=0, noise='bernoulli')
>>> simulated_data = leaspy_logistic.simulate(individual_parameters, data, simulation_settings)
 ==> Setting seed to 0
>>> print(simulated_data.data.to_dataframe().set_index(['ID', 'TIME']).head())
                                  PUTAMEN
ID                    TIME
Generated_subject_001 63.611107  0.556399
                      64.111107  0.571381
                      64.611107  0.586279
                      65.611107  0.615718
                      66.611107  0.644518
>>> print(simulated_data.get_dataframe_individual_parameters().tail())
                             tau        xi
ID
Generated_subject_096  46.771028 -2.483644
Generated_subject_097  73.189964 -2.513465
Generated_subject_098  57.874967 -2.175362
Generated_subject_099  54.889400 -2.069300
Generated_subject_100  50.046972 -2.259841
By default, you have simulated 100 subjects, with an average number of visits of 6 and a standard deviation of the number of visits equal to 3. Let’s say you want to simulate 200 subjects, every one of them having exactly ten visits:
>>> simulation_settings = AlgorithmSettings('simulation', seed=0, number_of_subjects=200, \
...                                         mean_number_of_visits=10, std_number_of_visits=0)
 ==> Setting seed to 0
>>> simulated_data = leaspy_logistic.simulate(individual_parameters, data, simulation_settings)
>>> print(simulated_data.data.to_dataframe().set_index(['ID', 'TIME']).tail())
                                  PUTAMEN
ID                    TIME
Generated_subject_200 72.119949  0.829185
                      73.119949  0.842113
                      74.119949  0.854271
                      75.119949  0.865680
                      76.119949  0.876363
By default, the generated subjects are named ‘Generated_subject_001’, ‘Generated_subject_002’ and so on. Let’s say you want a shorter name, for example ‘GS-001’. Furthermore, you want to set the level of noise around the subject trajectory when generating the observations:
>>> simulation_settings = AlgorithmSettings('simulation', seed=0, prefix='GS-', noise=.2)
>>> simulated_data = leaspy_logistic.simulate(individual_parameters, data, simulation_settings)
 ==> Setting seed to 0
>>> print(simulated_data.get_dataframe_individual_parameters().tail())
              tau        xi
ID
GS-096  46.771028 -2.483644
GS-097  73.189964 -2.513465
GS-098  57.874967 -2.175362
GS-099  54.889400 -2.069300
GS-100  50.046972 -2.259841
- classmethod load(path_to_model_settings: str) -> Leaspy
Instantiate a Leaspy object from json model parameter file or the corresponding dictionary.
This function can be used to load a pre-trained model.
- Parameters:
- path_to_model_settings : str or dict
Path to the model’s settings json file or dictionary of model parameters
- Returns:
Leaspy
An instantiated Leaspy object with the given population parameters.
Examples
Load a univariate logistic pre-trained model.
>>> from leaspy import Leaspy
>>> from leaspy.datasets.loader import model_paths
>>> leaspy_logistic = Leaspy.load(model_paths['parkinson-putamen-train'])
>>> print(str(leaspy_logistic.model))
=== MODEL ===
g : tensor([-0.7901])
tau_mean : 64.18125915527344
tau_std : 10.199116706848145
xi_mean : -2.346343994140625
xi_std : 0.5663877129554749
noise_std : 0.021229960024356842
- save(path: str, **kwargs) -> None
Save Leaspy object as json model parameter file.
- Parameters:
- path : str
Path to store the model’s parameters.
- **kwargs
Keyword arguments for save() (including those sent to the json.dump() function).
Examples
Load the univariate dataset 'parkinson-putamen', calibrate the model & save it:
>>> from leaspy import AlgorithmSettings, Data, Leaspy
>>> from leaspy.datasets import Loader
>>> putamen_df = Loader.load_dataset('parkinson-putamen')
>>> data = Data.from_dataframe(putamen_df)
>>> leaspy_logistic = Leaspy('univariate_logistic')
>>> settings = AlgorithmSettings('mcmc_saem', seed=0)
>>> leaspy_logistic.fit(data, settings)
 ==> Setting seed to 0
|##################################################|   10000/10000 iterations
The standard deviation of the noise at the end of the calibration is: 0.0213
Calibration took: 30s
>>> leaspy_logistic.save('leaspy-logistic-model_parameters-seed0.json')
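The way save() forwards keyword arguments to json.dump(), and the way load() accepts either a path or a dictionary, can be mimicked with the standard library. The sketch below is only an analogy: the helper names, the dictionary layout and the rounded values are hypothetical, and the real Leaspy file format is not reproduced here.

```python
import json
import os
import tempfile
from typing import Union


def save_parameters(parameters: dict, path: str, **kwargs) -> None:
    """Hypothetical helper: write parameters as JSON, forwarding kwargs to json.dump()."""
    with open(path, "w") as file:
        json.dump(parameters, file, **kwargs)


def load_parameters(path_or_dict: Union[str, dict]) -> dict:
    """Hypothetical helper: accept a JSON file path or an already-loaded dictionary."""
    if isinstance(path_or_dict, dict):
        return dict(path_or_dict)
    with open(path_or_dict) as file:
        return json.load(file)


# Hypothetical layout and values, for illustration only
params = {
    "name": "univariate_logistic",
    "parameters": {"tau_mean": 68.57, "xi_mean": -2.34},
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model_parameters.json")
    save_parameters(params, path, indent=2)  # indent is passed through to json.dump()
    restored = load_parameters(path)

print(restored == params)
```

Forwarding keyword arguments this way lets the caller control formatting options such as indent without the saving function having to list them.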
- check_if_initialized() -> None
Check if model is initialized.
- Raises:
LeaspyInputError
Raise an error if the model has not been initialized.