leaspy.io.data.dataset module

class Dataset(data: Data, model: AbstractModel = None, algo: AbstractAlgo = None)

Bases: object

Data container based on torch.Tensor, used to run algorithms.

Parameters
data : Data

The Data object from which the Dataset is built.

model : AbstractModel (optional)

If not None, the compatibility of the model with the data is checked.

algo : AbstractAlgo (optional)

If not None, the compatibility of the algorithm with the data is checked.

Raises
LeaspyInputError

If data, model or algo are not compatible with one another.

Attributes
headers : list[str]

Feature names

dimension : int

Number of features

n_individuals : int

Number of individuals

indices : list[ID]

Order of patients

n_visits_per_individual : list[int]

Number of visits per individual

n_visits_max : int

Maximum number of visits for one individual

n_visits : int

Total number of visits

n_observations_per_ind_per_ft : torch.LongTensor, shape (n_individuals, dimension)

Number of observations (not taking into account missing values) per individual per feature

n_observations_per_ft : torch.LongTensor, shape (dimension,)

Total number of observations per feature

n_observations : int

Total number of observations

timepoints : torch.FloatTensor, shape (n_individuals, n_visits_max)

Ages of patients at their different visits

values : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)

Values of patients for each visit for each feature

mask : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)

Binary mask associated with values: 1 if the value is meaningful, 0 if it is meaningless (it either was nan or does not correspond to a real visit and is only there for padding).

L2_norm_per_ft : torch.FloatTensor, shape (dimension,)

Sum of all non-nan squared values, feature per feature

L2_norm : scalar torch.FloatTensor

Sum of all non-nan squared values

_one_hot_encoding : Dict[sf: bool, torch.LongTensor]

Values of patients for each visit for each feature, tensorized into a one-hot encoding (pdf or sf). The tensors have shape (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True]).
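The relationship between the padded tensors, the mask and the derived counts can be illustrated with a minimal sketch. It uses plain Python lists in place of torch tensors, and it assumes that missing and padded entries are stored as 0 (an assumption made for this sketch, not a statement about the Leaspy implementation). The toy data and the `raw` dictionary are hypothetical.

```python
import math

# Hypothetical toy data: two individuals, two features, unequal visit counts.
# nan marks a missing measurement.
nan = math.nan
raw = {
    "A": {"ages": [70.0, 71.0, 72.5], "obs": [[0.1, 0.2], [0.3, nan], [0.5, 0.6]]},
    "B": {"ages": [65.0], "obs": [[nan, 0.4]]},
}

indices = list(raw)  # order of patients
dimension = 2
n_individuals = len(indices)
n_visits_per_individual = [len(raw[ind]["ages"]) for ind in indices]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

timepoints, values, mask = [], [], []
for ind in indices:
    ages, obs = raw[ind]["ages"], raw[ind]["obs"]
    pad = n_visits_max - len(ages)
    # Assumption of this sketch: padding and missing values are stored as 0.
    timepoints.append(ages + [0.0] * pad)
    values.append(
        [[0.0 if math.isnan(v) else v for v in row] for row in obs]
        + [[0.0] * dimension for _ in range(pad)]
    )
    # mask is 1 only for real, non-missing observations.
    mask.append(
        [[0.0 if math.isnan(v) else 1.0 for v in row] for row in obs]
        + [[0.0] * dimension for _ in range(pad)]
    )

# Counts and norms only take into account entries where mask == 1.
n_observations_per_ft = [
    int(sum(mask[i][j][k] for i in range(n_individuals) for j in range(n_visits_max)))
    for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)
L2_norm_per_ft = [
    sum(
        mask[i][j][k] * values[i][j][k] ** 2
        for i in range(n_individuals)
        for j in range(n_visits_max)
    )
    for k in range(dimension)
]
L2_norm = sum(L2_norm_per_ft)
```

In this sketch individual "B" has a single real visit, so its two extra rows in `values` and `mask` exist only to reach `n_visits_max`.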

Methods

get_one_hot_encoding(*, sf, ordinal_infos)

Builds the one-hot encoding of ordinal data once and for all and returns it.

get_times_patient(i)

Get ages for patient number i

get_values_patient(i, *[, adapt_for_model])

Get values for patient number i, with nans.

move_to_device(device)

Moves the dataset to the specified device.

to_pandas()

Convert dataset to a DataFrame.

get_one_hot_encoding(*, sf: bool, ordinal_infos: Dict[str, Any])

Builds the one-hot encoding of ordinal data once and for all and returns it.

Parameters
sf : bool

Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]

ordinal_infos : dict[str, Any]

All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)

Returns
One-hot encoding of data values.
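
The two encodings selected by `sf` can be illustrated for a single ordinal observation. This is a minimal sketch that follows the bracketed definitions given for the `sf` parameter; the helper names `one_hot_pdf` and `one_hot_sf` are hypothetical and are not part of Leaspy.

```python
from typing import List


def one_hot_pdf(level: int, max_level: int) -> List[int]:
    """[1(X = l) for l = 0..max_level]: exactly one entry is 1."""
    return [int(level == l) for l in range(max_level + 1)]


def one_hot_sf(level: int, max_level: int) -> List[int]:
    """[1(X > l) for l = 0..max_level-1]: leading ones, then zeros."""
    return [int(level > l) for l in range(max_level)]


# An observed level of 2 on a scale whose maximum level is 3.
pdf = one_hot_pdf(2, 3)
sf = one_hot_sf(2, 3)
print(pdf)  # [0, 0, 1, 0]
print(sf)   # [1, 1, 0]
```

Each entry of the sf vector equals the sum of the pdf entries at strictly higher levels, which is why the sf vector is one element shorter.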
get_times_patient(i: int) → FloatTensor

Get ages for patient number i

Parameters
i : int

The index of the patient (warning: this is its position in the dataset, not its identifier)

Returns
torch.Tensor, shape (n_obs_of_patient,)

Contains float
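
Because `i` is a position rather than an identifier, a lookup through the `indices` attribute is typically needed first. The sketch below shows that conversion with a plain list standing in for `indices`; the identifiers are hypothetical.

```python
# Hypothetical patient ordering, mirroring the `indices` attribute.
indices = ["sub-003", "sub-001", "sub-007"]

# Build the identifier -> position lookup once, then reuse it.
position_of = {identifier: pos for pos, identifier in enumerate(indices)}

# Position to pass as `i` for the patient whose identifier is "sub-001".
i = position_of["sub-001"]
print(i)  # 1
```

The resulting `i` is the value that would be passed to `get_times_patient` or `get_values_patient`.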

get_values_patient(i: int, *, adapt_for_model=None) → FloatTensor

Get values for patient number i, with nans.

Parameters
i : int

The index of the patient (warning: this is its position in the dataset, not its identifier)

adapt_for_model : None (default) or AbstractModel

The values returned are suited for this model. In particular:

  • For a model with noise_model='ordinal', returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]

  • For a model with noise_model='ordinal_ranking', returns survival function values [P(X > l), l=0..ordinal_max_level-1]

If None, the raw values are returned, whatever the model is.

Returns
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])

Contains float or nans
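
How values "with nans" relate to the padded `values` and `mask` attributes can be sketched as follows. This is a plain-Python illustration for the raw case (`adapt_for_model=None`), assuming that missing entries are stored as 0 with a mask of 0; the helper `values_with_nans` is hypothetical and is not Leaspy's implementation.

```python
import math


def values_with_nans(values_i, mask_i, n_visits_i):
    """Drop padded visits and put nan back wherever the mask is 0."""
    return [
        [v if m == 1 else math.nan for v, m in zip(row_v, row_m)]
        for row_v, row_m in zip(values_i[:n_visits_i], mask_i[:n_visits_i])
    ]


# One patient with two real visits, padded to three rows.
values_i = [[0.3, 0.0], [0.5, 0.6], [0.0, 0.0]]  # last row is padding
mask_i = [[1, 0], [1, 1], [0, 0]]

out = values_with_nans(values_i, mask_i, n_visits_i=2)
# out has shape (n_obs_of_patient, dimension) = (2, 2);
# the second feature of the first visit is nan.
```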

move_to_device(device: device) → None

Moves the dataset to the specified device.

Parameters
device : torch.device

to_pandas() → DataFrame

Convert dataset to a DataFrame.

Returns
pandas.DataFrame