Simulating Data with Leaspy

This example demonstrates how to use Leaspy to simulate longitudinal data based on a fitted model.

The code below imports the required modules and loads the synthetic Parkinson's disease dataset shipped with Leaspy. A logistic model will be fitted on this dataset and then used to simulate new longitudinal data.

from leaspy.datasets import load_dataset
from leaspy.io.data import Data

df = load_dataset("parkinson")

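The clinical and imaging features of interest are selected, and the resulting DataFrame is converted into a Leaspy Data object for model fitting. Only feature columns need to be selected because the patient ID and the visit time are stored in the DataFrame index.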

data = Data.from_dataframe(
    df[
        [
            "MDS1_total",
            "MDS2_total",
            "MDS3_off_total",
            "SCOPA_total",
            "MOCA_total",
            "REM_total",
            "PUTAMEN_R",
            "PUTAMEN_L",
            "CAUDATE_R",
            "CAUDATE_L",
        ]
    ]
)
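Conceptually, the conversion turns a long-format table, where each row is one visit identified by a patient ID and a visit time, into one chronological timeline per patient. The stdlib sketch below illustrates only that grouping idea. The function group_visits_by_patient and the toy rows are hypothetical and do not reproduce how Data.from_dataframe is implemented.

```python
from collections import defaultdict

# Hypothetical toy rows in long format: (patient ID, visit time, feature values).
# The values are made up for illustration; they are not from the Parkinson dataset.
rows = [
    ("p1", 61.0, [0.12, 0.30]),
    ("p2", 70.5, [0.40, 0.55]),
    ("p1", 60.0, [0.10, 0.25]),
    ("p2", 71.5, [0.42, 0.60]),
    ("p1", 62.0, [0.15, 0.33]),
]


def group_visits_by_patient(rows):
    """Group long-format rows into per-patient timelines sorted by visit time."""
    timelines = defaultdict(list)
    for patient_id, time, values in rows:
        timelines[patient_id].append((time, values))
    for visits in timelines.values():
        # Chronological order within each patient.
        visits.sort(key=lambda visit: visit[0])
    return dict(timelines)


timelines = group_visits_by_patient(rows)
for patient_id, visits in timelines.items():
    print(patient_id, [time for time, _ in visits])
```

Sorting inside each patient matters because a longitudinal model compares successive visits of the same individual, regardless of the row order in the source table.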

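A logistic model with two sources (source_dimension=2) is initialized. The sources capture differences between individuals in the relative timing of the features, on top of each patient's overall time shift and speed of progression.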

from leaspy.models import LogisticModel

model = LogisticModel(name="test-model", source_dimension=2)
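To build intuition for what a logistic model describes, the sketch below evaluates a simplified logistic trajectory for one feature and one patient. The patient's time shift tau and log-acceleration xi reparametrize time, and an additive shift w stands in for the effect of the sources. This parametrization (t0, p0, v0, tau, xi, w) is an assumption made for illustration and is not guaranteed to match Leaspy's exact formulation.

```python
import math


def logistic_trajectory(t, t0, p0, v0, tau, xi, w=0.0):
    """Simplified logistic progression curve for one feature and one patient.

    Assumed parametrization (illustrative only):
    t0: reference time, p0: value at the reference time (0 < p0 < 1),
    v0: population speed, tau: individual time shift,
    xi: individual log-acceleration, w: individual feature-level shift.
    """
    # Individual time: progression is exp(xi) times faster and delayed by tau.
    reparam_time = math.exp(xi) * (t - t0 - tau)
    # Logistic curve in (0, 1) that equals p0 at reparam_time = 0 when w = 0.
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * math.exp(-v0 * reparam_time - w))


ages = [60.0, 65.0, 70.0, 75.0]
reference = [logistic_trajectory(a, t0=65.0, p0=0.3, v0=0.2, tau=0.0, xi=0.0) for a in ages]
# A patient whose progression starts 5 years later (tau = 5) lags behind.
late_onset = [logistic_trajectory(a, t0=65.0, p0=0.3, v0=0.2, tau=5.0, xi=0.0) for a in ages]
print([round(v, 3) for v in reference])
print([round(v, 3) for v in late_onset])
```

The delayed patient reaches the reference value p0 exactly tau years after t0, which is the sense in which tau shifts a patient along the shared disease timeline.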

The model is fitted to the data with the MCMC-SAEM algorithm. A fixed seed makes the run reproducible, and only 100 iterations are performed so that the example runs quickly.

model.fit(
    data,
    "mcmc_saem",
    seed=42,  # fixed random seed for reproducibility
    n_iter=100,
    progress_bar=False,
)
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 4s
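MCMC-SAEM alternates a simulation step, where individual parameters are sampled by MCMC, with a stochastic approximation step that updates running estimates as S_k = S_(k-1) + eps_k * (s_k - S_(k-1)), followed by a maximization step. The toy sketch below shows only the stochastic approximation step: it averages noisy draws with a step size of 1 during a burn-in phase and a decreasing step size afterwards. The Gaussian draws stand in for MCMC samples, and the schedule (n_burn_in, exponent=0.75) is an assumed example, not Leaspy's configuration.

```python
import random


def step_sizes(n_iter, n_burn_in, exponent=0.75):
    """Step size 1 during burn-in, then decreasing as (k - n_burn_in) ** -exponent."""
    return [
        1.0 if k <= n_burn_in else (k - n_burn_in) ** (-exponent)
        for k in range(1, n_iter + 1)
    ]


def stochastic_approximation(draws, steps):
    """Running estimate S_k = S_(k-1) + eps_k * (draw_k - S_(k-1))."""
    estimate = 0.0
    for draw, eps in zip(draws, steps):
        estimate += eps * (draw - estimate)
    return estimate


rng = random.Random(0)
n_iter, n_burn_in = 2000, 100
# Noisy draws around a true value of 2.0 stand in for MCMC samples.
draws = [rng.gauss(2.0, 1.0) for _ in range(n_iter)]
estimate = stochastic_approximation(draws, step_sizes(n_iter, n_burn_in))
print(round(estimate, 2))
```

During burn-in the estimate simply tracks the latest draw; once the step size decreases, successive draws are averaged and the estimate settles near the true value of 2.0.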

The parameters for simulating patient visits are defined next. They set the number of simulated patients, the distribution of the first-visit time, the length of follow-up, and the spacing between consecutive visits.

visit_params = {
    "patient_number": 5,  # Number of patients to simulate.
    "visit_type": "random",  # Could also be "dataframe", together with a "df_visits" entry.
    # "df_visits": df_test,  # Custom visit schedule (df_test is a user-provided DataFrame).
    "first_visit_mean": 0.0,  # Mean of the first visit age/time.
    "first_visit_std": 0.4,  # Standard deviation of the first visit age/time.
    "time_follow_up_mean": 11,  # Mean follow-up duration.
    "time_follow_up_std": 0.5,  # Standard deviation of the follow-up duration.
    # Mean and standard deviation of the spacing between visits, in years.
    "distance_visit_mean": 2 / 12,
    "distance_visit_std": 0.75 / 12,
    "min_spacing_between_visits": 1,  # Minimum allowed spacing between visits.
}
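One plausible way to read these parameters is sketched below: draw a first-visit time and a follow-up duration, then add visits separated by random gaps that never fall below the minimum spacing. The function draw_visit_times is hypothetical and only illustrates the meaning of the parameters; it is not Leaspy's implementation and does not reproduce how Leaspy places visits on the age scale of the TIME column.

```python
import random

# Numeric visit parameters copied from the visit_params dictionary above.
params = {
    "first_visit_mean": 0.0,
    "first_visit_std": 0.4,
    "time_follow_up_mean": 11,
    "time_follow_up_std": 0.5,
    "distance_visit_mean": 2 / 12,
    "distance_visit_std": 0.75 / 12,
    "min_spacing_between_visits": 1,
}


def draw_visit_times(params, rng):
    """Hypothetical sketch: draw one patient's visit times from the parameters."""
    first = rng.gauss(params["first_visit_mean"], params["first_visit_std"])
    follow_up = rng.gauss(params["time_follow_up_mean"], params["time_follow_up_std"])
    times = [first]
    while True:
        # Random gap between visits, clamped to the minimum allowed spacing.
        gap = max(
            rng.gauss(params["distance_visit_mean"], params["distance_visit_std"]),
            params["min_spacing_between_visits"],
        )
        next_time = times[-1] + gap
        if next_time > first + follow_up:  # stop once follow-up is over
            return times
        times.append(next_time)


rng = random.Random(42)
schedules = [draw_visit_times(params, rng) for _ in range(5)]  # 5 = patient_number
for patient_id, times in enumerate(schedules):
    print(patient_id, len(times), [round(t, 2) for t in times[:3]])
```

Because distance_visit_mean (2/12) is far below min_spacing_between_visits (1), every gap in this sketch is clamped to exactly 1, which is consistent with the one-unit spacing of the TIME values in the simulated output.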

A new longitudinal dataset is simulated from the fitted model using the specified parameters.

df_sim = model.simulate(
    algorithm="simulate",
    features=[
        "MDS1_total",
        "MDS2_total",
        "MDS3_off_total",
        "SCOPA_total",
        "MOCA_total",
        "REM_total",
        "PUTAMEN_R",
        "PUTAMEN_L",
        "CAUDATE_R",
        "CAUDATE_L",
    ],
    visit_parameters=visit_params,
)
Simulate with `simulate` took: 0s
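The simulated values printed at the end of this example fluctuate from visit to visit instead of rising smoothly, which suggests a recipe along these lines: sample individual parameters for each patient, evaluate the model curves at that patient's visit times, and add observation noise. The sketch below illustrates that general recipe for a single feature with a simplified logistic curve. The population values (T0, P0, V0, TAU_MEAN, TAU_STD, XI_STD, NOISE_STD), the clipping to [0, 1], and the function names are hypothetical; they are not Leaspy's fitted parameters or its implementation.

```python
import math
import random

# Hypothetical population-level values, for illustration only.
T0, P0, V0 = 65.0, 0.3, 0.2  # reference time, value at T0, population speed
TAU_MEAN, TAU_STD = 0.0, 5.0  # distribution of individual time shifts
XI_STD = 0.3  # spread of individual log-accelerations
NOISE_STD = 0.05  # Gaussian observation noise


def logistic(t, tau, xi):
    """Simplified logistic curve evaluated on the patient's reparametrized time."""
    reparam_time = math.exp(xi) * (t - T0 - tau)
    return 1.0 / (1.0 + (1.0 / P0 - 1.0) * math.exp(-V0 * reparam_time))


def simulate_patient(visit_times, rng):
    """Sample individual parameters, evaluate the curve, then add noise."""
    tau = rng.gauss(TAU_MEAN, TAU_STD)
    xi = rng.gauss(0.0, XI_STD)
    rows = []
    for t in visit_times:
        noisy = logistic(t, tau, xi) + rng.gauss(0.0, NOISE_STD)
        rows.append((t, min(max(noisy, 0.0), 1.0)))  # keep values in [0, 1]
    return rows


rng = random.Random(7)
visit_times = [66.0 + k for k in range(12)]  # yearly visits
simulated = {patient_id: simulate_patient(visit_times, rng) for patient_id in range(5)}
for patient_id, rows in simulated.items():
    print(patient_id, [round(value, 3) for _, value in rows[:4]])
```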

The simulated longitudinal dataset is displayed below. Each row is one simulated visit: ID identifies the patient, TIME gives the visit time, and the remaining columns hold the simulated feature values.

print(df_sim)
   ID  TIME  MDS1_total  MDS2_total  MDS3_off_total  SCOPA_total  MOCA_total  \
0   0  66.0    0.145819    0.379388        0.131221     0.257127    0.036604   
1   0  67.0    0.115448    0.200207        0.313423     0.263239    0.056684   
2   0  68.0    0.206236    0.055621        0.275031     0.116821    0.098001   
3   0  69.0    0.213661    0.261213        0.237138     0.419975    0.097544   
4   0  70.0    0.145637    0.095558        0.160107     0.213323    0.224746   
.. ..   ...         ...         ...             ...          ...         ...   
57  4  89.0    0.228174    0.180634        0.504522     0.334634    0.147563   
58  4  90.0    0.184453    0.270096        0.354750     0.315006    0.135524   
59  4  91.0    0.106646    0.255901        0.278132     0.437487    0.109874   
60  4  92.0    0.152322    0.243124        0.151022     0.298422    0.190189   
61  4  93.0    0.441469    0.166017        0.309512     0.304210    0.136597   

    REM_total  PUTAMEN_R  PUTAMEN_L  CAUDATE_R  CAUDATE_L  
0    0.185669   0.742450   0.722027   0.553364   0.354767  
1    0.232306   0.835375   0.702342   0.486042   0.526465  
2    0.190168   0.815491   0.739033   0.593082   0.614752  
3    0.211290   0.864575   0.837883   0.734921   0.638288  
4    0.328726   0.562916   0.873017   0.484088   0.692437  
..        ...        ...        ...        ...        ...  
57   0.219133   0.862596   0.829867   0.594593   0.364344  
58   0.347209   0.870107   0.714928   0.505635   0.529377  
59   0.417505   0.859573   0.802358   0.580982   0.546227  
60   0.414299   0.908853   0.813048   0.661989   0.625155  
61   0.472368   0.714406   0.821625   0.670936   0.739525  

[62 rows x 12 columns]