# Leaspy in a nutshell

## Comprehensive example

We first load synthetic data from the leaspy.datasets to get of a grasp of longitudinal data.

```
>>> from leaspy import AlgorithmSettings, Data, Leaspy
>>> from leaspy.datasets import Loader
>>> alzheimer_df = Loader.load_dataset('alzheimer-multivariate')
>>> print(alzheimer_df.columns)
Index(['E-Cog Subject', 'E-Cog Study-partner', 'MMSE', 'RAVLT', 'FAQ',
'FDG PET', 'Hippocampus volume ratio'], dtype='object')
>>> alzheimer_df = alzheimer_df[['MMSE', 'RAVLT', 'FAQ', 'FDG PET']]
>>> print(alzheimer_df.head())
MMSE RAVLT FAQ FDG PET
ID TIME
GS-001 73.973183 0.111998 0.510524 0.178827 0.454605
74.573181 0.029991 0.749223 0.181327 0.450064
75.173180 0.121922 0.779680 0.026179 0.662006
75.773186 0.092102 0.649391 0.156153 0.585949
75.973183 0.203874 0.612311 0.320484 0.634809
```

The data correspond to repeated visits (TIME index) of different participants (ID index). Each visit corresponds to the measurement of 4 different variables : the MMSE, the RAVLT, the FAQ and the FDG PET.

If plotted, the data would look like the following:

where each color corresponds to a variable, and the connected dots corresponds to the repeated visits of a single participant.

Not very engaging, right ? To go a step further, let’s first encapsulate the data into the main leaspy Data container.

```
>>> data = Data.from_dataframe(alzheimer_df)
```

Leaspy core functionality is to estimate the group-average trajectory of the different variables that are measured in a population. Let’s initialize the leaspy object

```
>>> leaspy_logistic = Leaspy('logistic', source_dimension=2)
```

as well as the algorithm needed to estimate the group-average trajectory:

```
>>> fit_settings = AlgorithmSettings('mcmc_saem', seed=0, n_iter=8000)
```

We then use the Leaspy.fit method to estimate the group average trajectory:

```
>>> leaspy_logistic.fit(data, fit_settings)
==> Setting seed to 0
|##################################################| 8000/8000 iterations
Fit with `mcmc_saem` took: 6m 57s
The standard deviation of the noise at the end of the fit is:
MMSE: 6.50%
RAVLT: 7.63%
FAQ: 6.67%
FDG PET: 7.87%
```

If we were to plot the measured average progression of the variables - see started example notebook for details - it would look like the following

We can also derive the individual trajectory of each subject. To do this, we use the Leaspy.personalize method, again by providing the proper settings.

```
>>> personalize_settings = AlgorithmSettings('scipy_minimize', seed=0)
>>> individual_parameters = leaspy_logistic.personalize(data, personalize_settings)
==> Setting seed to 0
|##################################################| 200/200 subjects
Personalize with `scipy_minimize` took: 9s
The standard deviation of the noise at the end of the personalize is:
MMSE: 6.32%
RAVLT: 7.27%
FAQ: 6.29%
FDG PET: 7.49%
```

Plotting the input participant data against its personalization would give the following - see started example notebook for details.

## Using my own data

### Data format

Leaspy uses its own data container. To use it properly, you need to provide a csv file or a pandas.DataFrame in the right format. Let’s have a look at the data used in the previous example:

```
>>> print(alzheimer_df.head())
MMSE RAVLT FAQ FDG PET
ID TIME
GS-001 73.973183 0.111998 0.510524 0.178827 0.454605
74.573181 0.029991 0.749223 0.181327 0.450064
75.173180 0.121922 0.779680 0.026179 0.662006
75.773186 0.092102 0.649391 0.156153 0.585949
75.973183 0.203874 0.612311 0.320484 0.634809
```

You **MUST** have ID and TIME, either in index or in the columns. The other
columns must be the observed variables (also named *features* or *endpoints*).
In this fashion, you have one column per *feature* and one line per *visit*.

### Data scale & constraints

Leaspy uses *linear* and *logistic* models. The features **MUST** be increasing
with time. For the *logistic* model, you need to rescale your data between 0 and 1.

### Missing data

Leaspy automatically handles missing data as long as they are encoded as `nan`

in your pandas.DataFrame, or as empty values in your csv file.

## Going further

You can check the User guide and the full API documentation. You can also dive into the started example of the Leaspy repository. The Disease Progression Modelling website also hosts a mathematical introduction and tutorials for Leaspy.