Quickstart with Leaspy
This example demonstrates how to quickly use Leaspy with properly formatted data.
Leaspy uses its own data container. To use it correctly, you need to provide either a CSV file or a pandas.DataFrame in long format.
Below is an example of synthetic longitudinal data illustrating how to use Leaspy:
from leaspy.datasets import load_dataset
alzheimer_df = load_dataset("alzheimer")
print(alzheimer_df.columns)
alzheimer_df = alzheimer_df[["MMSE", "RAVLT", "FAQ", "FDG PET"]]
print(alzheimer_df.head())
Index(['E-Cog Subject', 'E-Cog Study-partner', 'MMSE', 'RAVLT', 'FAQ',
'FDG PET', 'Hippocampus volume ratio'],
dtype='object')
MMSE RAVLT FAQ FDG PET
ID TIME
GS-001 73.973183 0.111998 0.510524 0.178827 0.454605
74.573181 0.029991 0.749223 0.181327 0.450064
75.173180 0.121922 0.779680 0.026179 0.662006
75.773186 0.092102 0.649391 0.156153 0.585949
75.973183 0.203874 0.612311 0.320484 0.634809
The data correspond to repeated visits (TIME index) of different participants (ID index). Each visit records four outcomes: MMSE, RAVLT, FAQ, and FDG PET.
Warning
You MUST include both ID and TIME, either as indices or as columns.
The remaining columns should correspond to the observed variables
(also called features or endpoints).
Each feature should have its own column, and each visit should occupy one row.
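As a minimal sketch of the layout described above, the following builds a small long-format table with pandas. The participant IDs, ages, and values are made up for illustration, and only standard pandas calls are used; nothing here is specific to Leaspy.

```python
import pandas as pd

# Hypothetical toy data: two participants, one row per visit.
visits = pd.DataFrame(
    {
        "ID": ["P-01", "P-01", "P-01", "P-02", "P-02"],
        "TIME": [70.2, 71.0, 72.1, 65.4, 66.5],  # age at visit, in years
        "MMSE": [0.10, 0.15, 0.22, 0.05, 0.09],  # one column per feature
        "FAQ": [0.20, 0.24, 0.31, 0.02, 0.06],
    }
)

# ID and TIME can stay as columns, or be moved to a two-level index
# like the one shown in the example dataset.
long_df = visits.set_index(["ID", "TIME"]).sort_index()

# Sanity check: each (ID, TIME) pair should identify exactly one visit.
assert not long_df.index.duplicated().any()

# Every remaining column is treated as an observed feature.
feature_columns = list(long_df.columns)
print(long_df)
```

Either form (columns or index) matches the requirement stated in the warning; the duplicate check is an extra precaution, not something the text above requires.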
Warning
Leaspy supports linear and logistic models.
Features should follow an overall increasing trend over time. Individual observations may decrease due to noise or measurement variability — what matters is that the general progression goes upward.
For logistic models, data must be rescaled between 0 and 1.
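One possible way to meet both constraints is a min-max rescaling that also flips scores whose raw values go down as the disease progresses. The sketch below is an illustration only: the helper rescale_to_unit_interval, the column names, and the score bounds are hypothetical and not part of Leaspy.

```python
import pandas as pd


def rescale_to_unit_interval(df, bounds, decreasing=()):
    """Min-max rescale each listed column to [0, 1] using known score bounds.

    bounds: dict mapping column name -> (lowest, highest) possible value.
    decreasing: columns whose raw values go down as the disease progresses;
        they are flipped so that the rescaled values go up instead.
    """
    out = df.copy()
    for col, (low, high) in bounds.items():
        scaled = (df[col] - low) / (high - low)
        out[col] = 1.0 - scaled if col in decreasing else scaled
    return out


# Hypothetical raw scores: one that increases and one that decreases over time.
raw = pd.DataFrame(
    {"score_up": [0.0, 5.0, 10.0], "score_down": [30.0, 24.0, 15.0]}
)
scaled = rescale_to_unit_interval(
    raw,
    bounds={"score_up": (0, 10), "score_down": (0, 30)},
    decreasing=("score_down",),
)
print(scaled)
```

Using the theoretical bounds of each score, rather than the minimum and maximum observed in the sample, keeps the rescaling independent of which participants happen to be in the dataset.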
from leaspy.io.data import Data
data = Data.from_dataframe(alzheimer_df)
See also
For a deeper understanding of the Data and Dataset classes, including
iteration, cofactors, and best practices, refer to the Data Containers Guide
in the documentation.
The core functionality of Leaspy is to estimate the group-average trajectory of the variables measured in a population. To do this, you need to choose a model. For example, a logistic model can be initialized and fitted as follows:
from leaspy.models import LogisticModel
model = LogisticModel(name="test-model", source_dimension=2)
model.fit(
    data,
    "mcmc_saem",
    seed=42,
    n_iter=100,
    progress_bar=False,
    path="_outputs",
    overwrite_logs_folder=True,
    save_periodicity=10,
    plot_periodicity=10,
)
/home/docs/checkouts/readthedocs.org/user_builds/leaspy/checkouts/stable/src/leaspy/algo/settings.py:86: UserWarning: The logs path you provided (/home/docs/checkouts/readthedocs.org/user_builds/leaspy/checkouts/stable/examples/_outputs) does not exist. Needed paths will be created (and their parents if needed).
self._create_root_folder(settings)
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 23.14s
The save_periodicity and plot_periodicity arguments are optional and control how often the model parameters are saved and plotted during fitting. When they are set to an integer value, an output folder named _outputs is created in the working directory, where the convergence plots and CSV files are saved. You can change the target folder by passing a string to the path argument.
model.summary()
================================================================================
Model Summary
================================================================================
Model Name: logistic
Model Type: LogisticModel
Features (4): MMSE, RAVLT, FAQ, FDG PET
Sources (2): Source 0 (s0), Source 1 (s1)
Observation Models: gaussian-diagonal
Neg. Log-Likelihood: -8041.9810
Parameters: 24
BIC: -18742.38
AIC: -18821.54
Training Metadata
--------------------------------------------------------------------------------
Algorithm: mcmc_saem
Seed: 42
Iterations: 100
Data Context
--------------------------------------------------------------------------------
Subjects: 200
Visits: 1975
Total Observations: 7900
Leaspy Version: 2.1.0
================================================================================
Population Parameters
--------------------------------------------------------------------------------
betas_mean:
s0 s1
b0 0.0003 0.0745
b1 -0.0571 -0.0578
b2 0.0866 -0.0087
MMSE RAVLT FAQ FDG PET
log_g_mean 1.5137 -0.8647 0.5388 -0.3684
MMSE RAVLT FAQ FDG PET
log_v0_mean -3.3783 -3.5659 -2.2573 -3.7160
Individual Parameters
--------------------------------------------------------------------------------
tau_mean [78.5057]
tau_std [8.4135]
xi_std [0.5298]
Noise Model
--------------------------------------------------------------------------------
MMSE RAVLT FAQ FDG PET
noise_std 0.0690 0.0762 0.0688 0.0797
Derived Parameters (interpretable scale)
--------------------------------------------------------------------------------
MMSE RAVLT FAQ FDG PET
v0 0.0341 0.0283 0.1046 0.0243
MMSE RAVLT FAQ FDG PET
p0 0.1804 0.7036 0.3685 0.5911
================================================================================
Interpreting the population parameters. The summary above describes the average disease trajectory through three population-level parameters:
tau_mean: the reference age (in years) at which patients, on average, reach the inflection point. It anchors the shared disease timeline.
v0: the per-feature velocity at tau_mean. Features with larger v0 change faster around the reference age.
p0: the per-feature value at tau_mean, on [0, 1]. Features with larger p0 are more advanced at tau_mean.
v0 and p0 appear under “Derived Parameters” in the summary. They are
returned in interpretable scale by model.compute_derived_parameters() —
the raw fitted values log_v0_mean and log_g_mean live in log / logit
space and are not meant to be read directly.
MMSE v0 = 0.0341 / yr p0 = 0.180
RAVLT v0 = 0.0283 / yr p0 = 0.704
FAQ v0 = 0.1046 / yr p0 = 0.368
FDG PET v0 = 0.0243 / yr p0 = 0.591
tau_mean = 78.51 yr
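The link between the raw and derived values can be checked by hand. The sketch below assumes the mapping v0 = exp(log_v0_mean) and p0 = 1 / (1 + exp(log_g_mean)); this mapping is an assumption chosen because it reproduces the numbers printed in the summary above, not a statement of how Leaspy computes them internally. The input values are copied from the summary.

```python
import math

features = ["MMSE", "RAVLT", "FAQ", "FDG PET"]

# Raw population parameters, copied from the model summary above.
log_g_mean = [1.5137, -0.8647, 0.5388, -0.3684]
log_v0_mean = [-3.3783, -3.5659, -2.2573, -3.7160]

# Assumed mapping to the interpretable scale (it matches the reported values).
v0 = [math.exp(x) for x in log_v0_mean]
p0 = [1.0 / (1.0 + math.exp(x)) for x in log_g_mean]

for name, v, p in zip(features, v0, p0):
    print(f"{name:8s} v0 = {v:.4f} / yr   p0 = {p:.3f}")
```

The printed values agree with the "Derived Parameters" block of the summary to the displayed precision.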
The fit method estimates the parameters of the model, which are then accessible
through the summary method. The parameters are also stored in the parameters attribute of the model.
model.info()
================================================================================
Model Information
================================================================================
Statistical Model
Type: LogisticModel
Name: logistic
Dimension: 4
Source Dimension: 2
Observation Models: gaussian-diagonal
Parameters: 24
Latent Variables
--------------------------------------------------------------------------------
Population:
betas Normal(betas_mean, betas_std)
log_g Normal(log_g_mean, log_g_std)
log_v0 Normal(log_v0_mean, log_v0_std)
Individual:
sources Normal(sources_mean, sources_std)
tau Normal(tau_mean, tau_std)
xi Normal(xi_mean, xi_std)
--------------------------------------------------------------------------------
Training Dataset
--------------------------------------------------------------------------------
Subjects: 200
Visits: 1975
Scores (Features): 4
Total Observations: 7900
Visits per Subject: Median 10.0 [Min 1, Max 22, IQR 5.0]
Training Details
--------------------------------------------------------------------------------
Algorithm: mcmc_saem
Seed: 42
Iterations: 100
Burn-in: 90/100 (90%)
Burn-out: 10
Duration: 23.140s
Hyperparameters (fixed values from the source code)
--------------------------------------------------------------------------------
betas_std: 0.01
log_g_std: 0.01
log_v0_std: 0.01
sources_mean: [0.0, 0.0]
sources_std: 1.0
xi_mean: 0.0
Leaspy Version: 2.1.0
================================================================================
The info method reports the model configuration, the dataset statistics,
and the training settings used for the fit.
Leaspy can also estimate the individual trajectories of each participant. This is done using a personalization algorithm, here scipy_minimize:
individual_parameters = model.personalize(
    data, "scipy_minimize", seed=0, progress_bar=False
)
print(individual_parameters.to_dataframe())
Personalize with `AlgorithmName.PERSONALIZE_SCIPY_MINIMIZE` took: 1m 5.47s
sources_0 sources_1 tau xi
ID
GS-001 0.223385 0.512438 78.324982 -0.353727
GS-002 -0.506300 -0.748022 77.214973 -0.590474
GS-003 0.318169 -0.880042 77.221268 0.072147
GS-004 0.173873 -0.112541 78.973320 0.442837
GS-005 1.193803 -1.354746 85.793999 -0.043807
... ... ... ... ...
GS-196 0.942744 -0.561715 73.683899 0.321979
GS-197 -0.149667 1.023047 81.432175 -0.567788
GS-198 -0.022196 -0.172132 84.719788 0.183573
GS-199 1.623866 -2.198954 94.264305 -0.104986
GS-200 1.258483 0.027850 77.085251 0.819960
[200 rows x 4 columns]
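To read the tau and xi columns, one common interpretation can be sketched as follows. It assumes that exp(xi) acts as an individual pace multiplier (above 1 means faster than average, below 1 slower) and that tau minus the population tau_mean is an individual time shift in years. This interpretation is an assumption about the model's time reparametrization and is not stated on this page; the column names pace and shift_years are hypothetical. The rows are copied from the output above.

```python
import math

import pandas as pd

TAU_MEAN = 78.5057  # population tau_mean, copied from the model summary

# First rows of the personalization output above.
ip = pd.DataFrame(
    {
        "tau": [78.324982, 77.214973, 77.221268],
        "xi": [-0.353727, -0.590474, 0.072147],
    },
    index=pd.Index(["GS-001", "GS-002", "GS-003"], name="ID"),
)

# Assumed interpretation: exp(xi) is the pace relative to the average
# trajectory, and tau - tau_mean is the shift of the individual timeline.
ip["pace"] = ip["xi"].apply(math.exp)
ip["shift_years"] = ip["tau"] - TAU_MEAN

print(ip.round(3))
```

Under this reading, GS-001 and GS-002 progress more slowly than average, while GS-003 progresses slightly faster, and all three have a reference age a little earlier than the population mean.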
We have seen how to fit a model and personalize it to individuals. Leaspy also provides various plotting functions to visualize the results. Let’s go to the next section to see how to plot the group-average trajectory and the individual trajectories using the Parkinson’s disease dataset.
To go further:
See the User Guide and full API documentation.
Explore additional examples.