Variable Types

Module: leaspy.variables.specs

Every node in the Variables DAG is an instance of one of six Python classes from leaspy.variables.specs. Understanding what each class means is the core skill for declaring variables in a new model via get_variables_specs().


Class Hierarchy

VariableInterface          (abstract base)
├── IndepVariable          (no dependencies on other variables — root nodes in the DAG)
│   ├── Hyperparameter     — fixed constant, never learned
│   ├── DataVariable       — observed input data (t, y, ...)
│   ├── ModelParameter     — M-step: updated by SAEM maximization
│   └── LatentVariable     — E-step: sampled by MCMC
│       ├── PopulationLatentVariable  — one value shared across all patients
│       └── IndividualLatentVariable  — one value per patient
└── LinkedVariable         — deterministically computed from parents

All types share two boolean class attributes that the DAG and the State use:

| Attribute | Meaning | Practical effect |
|---|---|---|
| is_settable | Can the State accept a direct assignment (state[name] = value) for this variable? | If True, state[name] = value works normally. If False, writing raises LeaspyInputError, which prevents accidental modification of constants and computed values. |
| fixed_shape | Is the tensor shape determined by the model alone, without needing the dataset? | If True, the variable can be referenced inside a Normal(...) prior (e.g. Normal("xi_mean", "xi_std") works because both have known shapes). If False (shape depends on cohort size), it cannot be used as a prior parameter (e.g. a DataVariable or IndividualLatentVariable cannot appear inside Normal(...)). |
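The effect of is_settable can be illustrated with a toy container. This is a minimal sketch, not leaspy's State: ToyState, its constructor argument, and the use of ValueError in place of LeaspyInputError are all assumptions made for illustration.

```python
class ToyState:
    """Toy stand-in for the State: stores values by name and guards writes."""

    def __init__(self, settable):
        # `settable` maps each variable name to its is_settable flag
        # (hypothetical layout, chosen only for this sketch).
        self._settable = dict(settable)
        self._values = {}

    def __setitem__(self, name, value):
        # Refuse direct assignment to constants and computed values.
        if not self._settable[name]:
            raise ValueError(f"'{name}' is not settable")
        self._values[name] = value

    def __getitem__(self, name):
        return self._values[name]


state = ToyState({"xi_std": True, "xi_mean": False})
state["xi_std"] = 0.5          # ModelParameter-like variable: accepted

try:
    state["xi_mean"] = 1.0     # Hyperparameter-like variable: rejected
    rejected = False
except ValueError:
    rejected = True
```

The guard lives in the container rather than in each variable, so a single flag per class is enough to protect every non-settable node.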

| Class | is_settable | fixed_shape |
|---|---|---|
| Hyperparameter | False | True |
| DataVariable | True | False |
| ModelParameter | True | True |
| PopulationLatentVariable | True | True |
| IndividualLatentVariable | True | False |
| LinkedVariable | False | False |
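The flag table can be mirrored with plain class attributes. The sketch below uses stand-in classes that only share their names with the rows above; it is not the code in leaspy.variables.specs, and can_parametrize_prior is a hypothetical helper that simply restates the fixed_shape rule.

```python
from typing import ClassVar


class VariableInterface:
    # Each concrete subclass fixes both flags at class level.
    is_settable: ClassVar[bool]
    fixed_shape: ClassVar[bool]


class Hyperparameter(VariableInterface):
    is_settable = False
    fixed_shape = True


class DataVariable(VariableInterface):
    is_settable = True
    fixed_shape = False


class ModelParameter(VariableInterface):
    is_settable = True
    fixed_shape = True


class PopulationLatentVariable(VariableInterface):
    is_settable = True
    fixed_shape = True


class IndividualLatentVariable(VariableInterface):
    is_settable = True
    fixed_shape = False


class LinkedVariable(VariableInterface):
    is_settable = False
    fixed_shape = False


def can_parametrize_prior(var_cls):
    """Hypothetical helper: only fixed-shape variables may appear inside a prior."""
    return var_cls.fixed_shape


CONCRETE = (Hyperparameter, DataVariable, ModelParameter,
            PopulationLatentVariable, IndividualLatentVariable, LinkedVariable)
FLAGS = {cls.__name__: (cls.is_settable, cls.fixed_shape) for cls in CONCRETE}
```

Reading FLAGS row by row reproduces the table, which makes it easy to check a new variable's class against both rules at once.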


Color Legend

The colors in the DAG diagrams map directly to Python classes:

| Color | Class | Role |
|---|---|---|
| pink | Hyperparameter | Fixed constant: set at definition time, never modified |
| orange | ModelParameter | Estimated quantity: updated by the M-step each iteration |
| plum | PopulationLatentVariable | Population-level random effect: sampled by MCMC in the E-step |
| blue | IndividualLatentVariable | Per-patient random effect: sampled per individual in the E-step |
| green | LinkedVariable | Deterministic function of other variables: no independent value |
| white | DataVariable | Observed input: injected from the dataset at runtime |
| wheat | (visual convention) | Observation model (likelihood / NLL): built by ObservationModel, not a variable type |


An Illustrated Example: Temporal Variability

This sub-graph governs where each patient is positioned on the disease timeline. It is the smallest self-contained sub-graph of the Logistic model, yet it already contains five of the six variable types (only PopulationLatentVariable is absent).

[Figure: DAG of temporal variability]

Reading this diagram:

  • Pink root (xi_mean): a constant baked into the model definition. xi_mean = 0 centers the prior of xi at 0, so the typical acceleration factor is \(e^0 = 1\). Declared as Hyperparameter(0.0).

  • Orange roots (xi_std, tau_mean, tau_std): estimated by the M-step. Declared with factories like ModelParameter.for_ind_std("xi", shape=(1,)).

  • Blue intermediate nodes (xi, tau): per-patient random effects. Declared as IndividualLatentVariable(Normal("xi_mean", "xi_std")). The prior is symbolic — it reads current values from the State at each E-step rather than fixing them at construction.

  • White root (t): observed visit ages. Declared as DataVariable().

  • Green leaves (alpha, rt): pure deterministic transforms. alpha = exp(xi) is declared as LinkedVariable(Exp("xi")): the variable name "xi" passed to Exp wires the edge from xi automatically. rt uses time_reparametrization(*, t, alpha, tau), whose keyword-only argument names wire all three edges.
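The name-based wiring described in the last bullet can be sketched with the standard inspect module. This is an illustration under assumptions, not leaspy's mechanism: parent_names and exp_of_xi are hypothetical helpers, and the body alpha * (t - tau) is an assumed form of the reparametrized time, since the text above only gives the function's signature.

```python
import inspect
import math


def time_reparametrization(*, t, alpha, tau):
    # Assumed form of the reparametrized time (not stated in the text above).
    return alpha * (t - tau)


def exp_of_xi(*, xi):
    # Stand-in for Exp("xi"): alpha = exp(xi).
    return math.exp(xi)


def parent_names(func):
    """Return the keyword-only parameter names of `func`, read as parent variables."""
    params = inspect.signature(func).parameters.values()
    return tuple(p.name for p in params if p.kind is inspect.Parameter.KEYWORD_ONLY)


# Linked variables of the sub-graph, listed in dependency order.
linked = {"alpha": exp_of_xi, "rt": time_reparametrization}
edges = {name: parent_names(func) for name, func in linked.items()}

# Evaluate the two linked variables for one patient at one visit.
values = {"t": 72.0, "xi": 0.0, "tau": 70.0}
for name, func in linked.items():
    values[name] = func(**{parent: values[parent] for parent in edges[name]})
```

Because the edges are derived from the parameter names, renaming an argument of the function changes which node it depends on; no separate edge list has to be kept in sync.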


Variable Types

DataVariable()    # no arguments

A root node holding observed input data injected from the dataset before each E-step. Shape is unknown at definition time (depends on cohort size), so fixed_shape = False.

from leaspy.variables.specs import DataVariable

# In McmcSaemCompatibleModel.get_variables_specs()
t = DataVariable()   # observed visit ages: shape (Ni, Nt) at runtime

# In ObservationModel.get_variables_specs()
y = DataVariable()   # observed biomarkers: shape (Ni, Nt, Nfts) at runtime
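Why fixed_shape is False for data can be seen by building the visit-age array for two different cohorts. The sketch assumes a padded (Ni, Nt) layout in which missing visits are filled with NaN; pad_visits, shape, and the sample ages are hypothetical and not taken from leaspy.

```python
import math


def pad_visits(ages_per_patient):
    """Pad ragged per-patient visit ages into a rectangular (Ni, Nt) list of lists."""
    n_visits_max = max(len(ages) for ages in ages_per_patient)
    return [
        list(ages) + [math.nan] * (n_visits_max - len(ages))
        for ages in ages_per_patient
    ]


def shape(rows):
    # (number of patients, maximum number of visits)
    return (len(rows), len(rows[0]))


# Two cohorts processed by the same model definition.
cohort_a = pad_visits([[70.1, 71.0, 72.3], [65.4, 66.0]])
cohort_b = pad_visits([[80.0], [81.5], [79.2], [83.0]])

shape_a = shape(cohort_a)   # 2 patients, up to 3 visits
shape_b = shape(cohort_b)   # 4 patients, 1 visit each
```

The model definition is identical in both cases, yet the shape of t differs, which is exactly what fixed_shape = False records.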

Decision at a Glance

Is the value OBSERVED (comes from the dataset)?
  └─ YES → DataVariable()

Is the value FIXED (never updated during fit)?
  └─ YES → Hyperparameter(value)

Is the value COMPUTED from other variables (no own tensor)?
  └─ YES → LinkedVariable(f)   ← f takes keyword-only arguments; argument names = parent variable names

Does the variable need to be SAMPLED (random effect)?
  ├─ shared across ALL patients → PopulationLatentVariable(prior)
  └─ one value PER patient     → IndividualLatentVariable(prior)

Otherwise (optimized by the M-step):
  └─ ModelParameter(shape, suff_stats=Collect(...), update_rule=...)
     or a factory shortcut:
       ModelParameter.for_ind_mean(var, shape)
       ModelParameter.for_ind_std(var, shape)
       ModelParameter.for_pop_mean(var, shape)
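The questions above can be written as a small function that walks them in the same order. This is only a reading aid: choose_variable_type and its boolean flags are hypothetical, and the function returns class names as strings rather than constructing any leaspy object.

```python
def choose_variable_type(*, observed=False, fixed=False, computed=False,
                         sampled=False, per_patient=False):
    """Walk the decision questions in order and return the matching class name."""
    if observed:
        return "DataVariable"
    if fixed:
        return "Hyperparameter"
    if computed:
        return "LinkedVariable"
    if sampled:
        # Random effect: one value per patient, or one shared by all patients.
        return "IndividualLatentVariable" if per_patient else "PopulationLatentVariable"
    # Everything else is optimized by the M-step.
    return "ModelParameter"


# Applied to the temporal-variability sub-graph discussed earlier.
choices = {
    "t": choose_variable_type(observed=True),
    "xi_mean": choose_variable_type(fixed=True),
    "alpha": choose_variable_type(computed=True),
    "xi": choose_variable_type(sampled=True, per_patient=True),
    "xi_std": choose_variable_type(),
}
```

The order of the checks matters: a value that is both observed and fixed for the duration of the fit is still a DataVariable, because the first question wins.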

For the complete dependency graph of the Logistic model see The Variables DAG.