Quickstart with Leaspy

This example demonstrates how to quickly use Leaspy with properly formatted data.

Leaspy uses its own data container. To use it correctly, you need to provide either a CSV file or a pandas.DataFrame in long format.

Below is an example of synthetic longitudinal data illustrating how to use Leaspy:

from leaspy.datasets import load_dataset

alzheimer_df = load_dataset("alzheimer")
print(alzheimer_df.columns)
alzheimer_df = alzheimer_df[["MMSE", "RAVLT", "FAQ", "FDG PET"]]
print(alzheimer_df.head())
Index(['E-Cog Subject', 'E-Cog Study-partner', 'MMSE', 'RAVLT', 'FAQ',
       'FDG PET', 'Hippocampus volume ratio'],
      dtype='object')
                      MMSE     RAVLT       FAQ   FDG PET
ID     TIME
GS-001 73.973183  0.111998  0.510524  0.178827  0.454605
       74.573181  0.029991  0.749223  0.181327  0.450064
       75.173180  0.121922  0.779680  0.026179  0.662006
       75.773186  0.092102  0.649391  0.156153  0.585949
       75.973183  0.203874  0.612311  0.320484  0.634809

The data correspond to repeated visits (TIME index) of different participants (ID index). Each visit records four outcomes: the MMSE, the RAVLT, the FAQ and the FDG PET.

Warning

You MUST include both ID and TIME, either as indices or as columns. The remaining columns should correspond to the observed variables (also called features or endpoints). Each feature should have its own column, and each visit should occupy one row.
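The layout described above can be sketched with the standard library alone. The participant IDs, ages, and scores below are hypothetical, and the nested dictionary is only an assumed starting point. The sketch flattens it into one row per visit with ID and TIME columns, then writes the result as CSV text:

```python
import csv
import io

# Hypothetical input: participant -> visit age -> feature scores.
wide = {
    "P-001": {70.0: {"MMSE": 0.10, "FAQ": 0.15}, 71.0: {"MMSE": 0.14, "FAQ": 0.20}},
    "P-002": {65.5: {"MMSE": 0.05, "FAQ": 0.02}},
}
features = ["MMSE", "FAQ"]

# Long format: one row per visit, one column per feature, plus ID and TIME.
rows = [
    {"ID": pid, "TIME": age, **{name: scores[name] for name in features}}
    for pid, visits in wide.items()
    for age, scores in sorted(visits.items())
]

# Write the rows as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ID", "TIME", *features])
writer.writeheader()
writer.writerows(rows)
long_csv = buffer.getvalue()
print(long_csv)
```

Writing the same rows to a file on disk instead of the buffer would give a CSV with the column layout described in the warning.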

Warning

  • Leaspy supports linear and logistic models.

  • Features should follow an overall increasing trend over time. Individual observations may decrease due to noise or measurement variability — what matters is that the general progression goes upward.

  • For logistic models, data must be rescaled between 0 and 1.
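As an illustration of the last two points, here is a minimal sketch of one possible preprocessing step. The helper rescale_to_unit is hypothetical and not part of Leaspy. It maps a raw score onto [0, 1] using the score's theoretical range and flips scores that decrease as the disease progresses, so that the rescaled value increases over time. The raw MMSE values are made up; the 0 to 30 range is the usual MMSE scale, on which lower is worse.

```python
def rescale_to_unit(value, lowest, highest, higher_is_worse=True):
    """Map a raw score onto [0, 1] so that larger values mean more impairment.

    Hypothetical helper for illustration; not a Leaspy function.
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    scaled = (value - lowest) / (highest - lowest)
    scaled = min(max(scaled, 0.0), 1.0)  # clip values outside the stated range
    return scaled if higher_is_worse else 1.0 - scaled


# Made-up raw MMSE scores over four visits (0 to 30, lower is worse).
mmse_raw = [29, 27, 24, 20]
mmse_scaled = [rescale_to_unit(v, 0, 30, higher_is_worse=False) for v in mmse_raw]
print(mmse_scaled)
```

Other normalizations may suit a given study better, for example rescaling by the observed cohort range. The point is only that the values passed to a logistic model lie in [0, 1] and trend upward.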

from leaspy.io.data import Data

data = Data.from_dataframe(alzheimer_df)

See also

For a deeper understanding of the Data and Dataset classes, including iteration, cofactors, and best practices, refer to the Data Containers Guide in the documentation.

The core functionality of Leaspy is to estimate the group-average trajectory of the variables measured in a population. To do this, you need to choose a model. For example, a logistic model can be initialized and fitted as follows:

from leaspy.models import LogisticModel

model = LogisticModel(name="test-model", source_dimension=2)
model.fit(
    data,
    "mcmc_saem",
    seed=42,
    n_iter=100,
    progress_bar=False,
    path="_outputs",
    overwrite_logs_folder=True,
    save_periodicity=10,
    plot_periodicity=10,
)
/home/docs/checkouts/readthedocs.org/user_builds/leaspy/checkouts/v2.1.0/src/leaspy/algo/settings.py:86: UserWarning: The logs path you provided (/home/docs/checkouts/readthedocs.org/user_builds/leaspy/checkouts/v2.1.0/examples/_outputs) does not exist. Needed paths will be created (and their parents if needed).
  self._create_root_folder(settings)

Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 23.26s

The save_periodicity and plot_periodicity arguments are optional and control how often the model parameters are saved and plotted during fitting. Setting them to an integer value creates an output folder, here named _outputs, in the working directory, where the convergence plots and CSV files are saved. You can choose the target folder by passing a string to the path argument.

model.summary()
================================================================================
                                 Model Summary
================================================================================
Model Name: logistic
Model Type: LogisticModel
Features (4): MMSE, RAVLT, FAQ, FDG PET
Sources (2): Source 0 (s0), Source 1 (s1)
Observation Models: gaussian-diagonal
Neg. Log-Likelihood: -8041.9810
Parameters: 24
BIC: -18742.38
AIC: -18821.54

Training Metadata
--------------------------------------------------------------------------------
Algorithm: mcmc_saem
Seed: 42
Iterations: 100

Data Context
--------------------------------------------------------------------------------
Subjects: 200
Visits: 1975
Total Observations: 7900
Leaspy Version: 2.1.0
================================================================================

Population Parameters
--------------------------------------------------------------------------------
  betas_mean:
                          s0        s1
            b0        0.0003    0.0745
            b1       -0.0571   -0.0578
            b2        0.0866   -0.0087
                        MMSE     RAVLT       FAQ   FDG PET
  log_g_mean          1.5137   -0.8647    0.5388   -0.3684
                        MMSE     RAVLT       FAQ   FDG PET
  log_v0_mean        -3.3783   -3.5659   -2.2573   -3.7160

Individual Parameters
--------------------------------------------------------------------------------
  tau_mean           [78.5057]
  tau_std            [8.4135]
  xi_std             [0.5298]

Noise Model
--------------------------------------------------------------------------------
                        MMSE     RAVLT       FAQ   FDG PET
  noise_std           0.0690    0.0762    0.0688    0.0797

Derived Parameters (interpretable scale)
--------------------------------------------------------------------------------
                        MMSE     RAVLT       FAQ   FDG PET
  v0                  0.0341    0.0283    0.1046    0.0243
                        MMSE     RAVLT       FAQ   FDG PET
  p0                  0.1804    0.7036    0.3685    0.5911
================================================================================

Interpreting the population parameters. The summary above describes the average disease trajectory through three population-level parameters:

  • tau_mean: the reference age (in years) at which patients, on average, reach the inflection point. It anchors the shared disease timeline.

  • v0: the per-feature velocity at tau_mean. Features with larger v0 change faster around the reference age.

  • p0: the per-feature value at tau_mean, on [0, 1]. Features with larger p0 are more advanced at tau_mean.
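To make these three quantities concrete, the sketch below draws a plain logistic curve constrained to take the value p0 with slope v0 at tau_mean. The function average_trajectory is a hypothetical helper written for this illustration, not Leaspy's internal code. The numbers are the FAQ values printed in the summary above.

```python
import math


def average_trajectory(age, p0, v0, tau_mean):
    """Logistic curve with value p0 and slope v0 at age == tau_mean.

    Illustrative parametrization; assumed here, not taken from Leaspy's source.
    """
    g = 1.0 / p0 - 1.0             # so that 1 / (1 + g) == p0
    rate = v0 / (p0 * (1.0 - p0))  # so that the derivative at tau_mean is v0
    return 1.0 / (1.0 + g * math.exp(-rate * (age - tau_mean)))


tau_mean = 78.5057
faq = {"p0": 0.3685, "v0": 0.1046}

for age in (70.0, 75.0, tau_mean, 80.0, 85.0):
    value = average_trajectory(age, faq["p0"], faq["v0"], tau_mean)
    print(f"age {age:6.2f} yr -> FAQ {value:.3f}")
```

A feature with a larger v0 gives a steeper curve around tau_mean. A larger p0 moves the whole curve up at that age.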

v0 and p0 appear under “Derived Parameters” in the summary. They are returned in interpretable scale by model.compute_derived_parameters() — the raw fitted values log_v0_mean and log_g_mean live in log / logit space and are not meant to be read directly.

derived = model.compute_derived_parameters()
for k, name in enumerate(model.features):
    v0_k = derived["v0"][k].item()
    p0_k = derived["p0"][k].item()
    print(f"  {name:<8}  v0 = {v0_k: .4f} / yr     p0 = {p0_k:.3f}")
print(f"  tau_mean = {float(model.parameters['tau_mean']):.2f} yr")
MMSE      v0 =  0.0341 / yr     p0 = 0.180
RAVLT     v0 =  0.0283 / yr     p0 = 0.704
FAQ       v0 =  0.1046 / yr     p0 = 0.368
FDG PET   v0 =  0.0243 / yr     p0 = 0.591
tau_mean = 78.51 yr
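The derived values can be checked against the raw parameters by hand. The sketch below assumes v0 = exp(log_v0_mean) and p0 = 1 / (1 + exp(log_g_mean)). These relations are inferred from the numbers printed in the summary, not taken from Leaspy's source. It uses only the standard library and the values copied from the summary above.

```python
import math

features = ["MMSE", "RAVLT", "FAQ", "FDG PET"]

# Raw population parameters copied from the model summary above.
log_g_mean = [1.5137, -0.8647, 0.5388, -0.3684]
log_v0_mean = [-3.3783, -3.5659, -2.2573, -3.7160]

# Assumed mappings to the interpretable scale (inferred from the printed output).
v0 = [math.exp(x) for x in log_v0_mean]
p0 = [1.0 / (1.0 + math.exp(x)) for x in log_g_mean]

for name, v0_k, p0_k in zip(features, v0, p0):
    print(f"{name:<8}  v0 = {v0_k:.4f} / yr     p0 = {p0_k:.3f}")
```

Under these assumptions the recomputed values agree with the "Derived Parameters" block to the printed precision.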

The fit method estimates the parameters of the model, which are then accessible through the summary method. The parameters are also stored in the parameters attribute of the model.

model.info()
================================================================================
                               Model Information
================================================================================
Statistical Model
Type: LogisticModel
Name: logistic
Dimension: 4
Source Dimension: 2
Observation Models: gaussian-diagonal
Parameters: 24

Latent Variables
--------------------------------------------------------------------------------
  Population:
    betas                Normal(betas_mean, betas_std)
    log_g                Normal(log_g_mean, log_g_std)
    log_v0               Normal(log_v0_mean, log_v0_std)
  Individual:
    sources              Normal(sources_mean, sources_std)
    tau                  Normal(tau_mean, tau_std)
    xi                   Normal(xi_mean, xi_std)
--------------------------------------------------------------------------------

Training Dataset
--------------------------------------------------------------------------------
Subjects: 200
Visits: 1975
Scores (Features): 4
Total Observations: 7900
Visits per Subject: Median 10.0 [Min 1, Max 22, IQR 5.0]

Training Details
--------------------------------------------------------------------------------
Algorithm: mcmc_saem
Seed: 42
Iterations: 100
  Burn-in: 90/100 (90%)
  Burn-out: 10
Duration: 23.265s

Hyperparameters (fixed values from the source code)
--------------------------------------------------------------------------------
  betas_std: 0.01
  log_g_std: 0.01
  log_v0_std: 0.01
  sources_mean: [0.0, 0.0]
  sources_std: 1.0
  xi_mean: 0.0

Leaspy Version: 2.1.0
================================================================================

The info method reports the model configuration, the settings used for the fit, the dataset information, and the training details.

Leaspy can also estimate the individual trajectories of each participant. This is done using a personalization algorithm, here scipy_minimize:

individual_parameters = model.personalize(
    data, "scipy_minimize", seed=0, progress_bar=False)
print(individual_parameters.to_dataframe())
Personalize with `AlgorithmName.PERSONALIZE_SCIPY_MINIMIZE` took: 1m 5.67s
        sources_0  sources_1        tau        xi
ID
GS-001   0.223385   0.512438  78.324982 -0.353727
GS-002  -0.506300  -0.748022  77.214973 -0.590474
GS-003   0.317889  -0.879936  77.222809  0.071537
GS-004   0.174103  -0.112546  78.973427  0.442772
GS-005   1.193052  -1.354831  85.799477 -0.044721
...           ...        ...        ...       ...
GS-196   0.942744  -0.561715  73.683899  0.321979
GS-197  -0.149789   1.022556  81.430946 -0.567142
GS-198  -0.022226  -0.173084  84.721840  0.183258
GS-199   1.624825  -2.199798  94.262115 -0.104598
GS-200   1.258487   0.027141  77.085251  0.819791

[200 rows x 4 columns]
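One way to read the tau and xi columns is as an individual time shift and a log-acceleration factor. The sketch below assumes the reparametrization exp(xi) * (age - tau) + tau_mean, which maps a participant's age onto the average timeline. This convention is an assumption of the example, and disease_age is a hypothetical helper rather than a Leaspy function. The two rows are copied from the output above.

```python
import math

tau_mean = 78.5057  # population reference age from the model summary

# Two rows copied from the personalization output above.
individual = {
    "GS-001": {"tau": 78.324982, "xi": -0.353727},
    "GS-200": {"tau": 77.085251, "xi": 0.819791},
}


def disease_age(age, tau, xi, reference_age=tau_mean):
    """Assumed time reparametrization: exp(xi) * (age - tau) + reference_age."""
    return math.exp(xi) * (age - tau) + reference_age


for pid, params in individual.items():
    pace = math.exp(params["xi"])  # above 1: faster than average; below 1: slower
    mapped = disease_age(80.0, params["tau"], params["xi"])
    print(f"{pid}: pace x{pace:.2f}, age 80 maps to {mapped:.2f} on the average timeline")
```

Under this reading, a negative xi (GS-001) means slower-than-average progression and a positive xi (GS-200) means faster progression. A tau above tau_mean means a later onset.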

We have seen how to fit a model and personalize it to individuals. Leaspy also provides various plotting functions to visualize the results. Let’s go to the next section to see how to plot the group-average trajectory and the individual trajectories using the Parkinson’s disease dataset.

To go further:

  1. See the User Guide and full API documentation.

  2. Explore additional examples.

Total running time of the script: (1 minutes 39.112 seconds)
