leaspy.io.data.dataset.Dataset

class Dataset(data: Data, *, no_warning: bool = False)

Bases: object

Data container based on torch.Tensor, used to run algorithms.

Parameters:
data : Data

Create Dataset from Data object

no_warning : bool (default False)

Whether to deactivate warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt for each individual during scipy's minimize; all relevant warnings will already have been emitted for the overall dataset.

Raises:
LeaspyInputError

If data, model, or algorithm are not compatible with each other.

Attributes:
headers : list[str]

Features names

dimension : int

Number of features

n_individuals : int

Number of individuals

indices : list[ID]

Patient identifiers, in the order used along the first axis of the tensors

n_visits_per_individual : list[int]

Number of visits per individual

n_visits_max : int

Maximum number of visits for one individual

n_visits : int

Total number of visits

n_observations_per_ind_per_ft : torch.LongTensor, shape (n_individuals, dimension)

Number of observations (excluding missing values) per individual per feature

n_observations_per_ft : torch.LongTensor, shape (dimension,)

Total number of observations per feature

n_observations : int

Total number of observations

timepoints : torch.FloatTensor, shape (n_individuals, n_visits_max)

Ages of patients at their different visits

values : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)

Values of patients for each visit for each feature

mask : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)

Binary mask associated with values. 1: the value is meaningful; 0: the value is meaningless (either it was NaN, or it does not correspond to a real visit and is only there for padding).

L2_norm_per_ft : torch.FloatTensor, shape (dimension,)

Sum of all non-NaN squared values, computed per feature

L2_norm : scalar torch.FloatTensor

Sum of all non-nan squared values

no_warning : bool (default False)

Whether to deactivate warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt for each individual during scipy's minimize; all relevant warnings will already have been emitted for the overall dataset.

_one_hot_encoding : Dict[sf: bool, torch.LongTensor]

Values of patients for each visit for each feature, tensorized into a one-hot encoding (PDF or SF). Tensor shapes are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True]).
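To illustrate how the padded attributes relate to each other, the sketch below rebuilds them from a small hypothetical input. NumPy arrays stand in for torch tensors; the input dictionary, the patient IDs and the choice to fill masked entries of `values` with 0 are assumptions for illustration, not leaspy's actual implementation.

```python
import numpy as np

# Hypothetical raw input: visits per individual as (age, [feature values]),
# with NaN marking a missing observation. Two features, two individuals.
visits = {
    "patient_a": [(70.0, [0.1, 0.2]), (71.0, [0.3, np.nan]), (72.5, [0.4, 0.5])],
    "patient_b": [(65.0, [np.nan, 0.6])],
}

indices = list(visits)  # order of individuals along the first axis
dimension = 2
n_individuals = len(indices)
n_visits_per_individual = [len(v) for v in visits.values()]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

# Padded arrays: individuals with fewer visits are padded up to n_visits_max.
timepoints = np.zeros((n_individuals, n_visits_max))
values = np.zeros((n_individuals, n_visits_max, dimension))
mask = np.zeros((n_individuals, n_visits_max, dimension))

for i, pid in enumerate(indices):
    for j, (age, vals) in enumerate(visits[pid]):
        timepoints[i, j] = age
        row = np.asarray(vals, dtype=float)
        observed = ~np.isnan(row)
        values[i, j] = np.where(observed, row, 0.0)  # assumption: NaN -> 0, masked out
        mask[i, j] = observed

# Counts ignore both NaNs and padding, since both have mask == 0.
n_observations_per_ind_per_ft = mask.sum(axis=1).astype(int)  # (n_individuals, dimension)
n_observations_per_ft = n_observations_per_ind_per_ft.sum(axis=0)  # (dimension,)
n_observations = int(n_observations_per_ft.sum())

# Squared values summed over observed entries only.
L2_norm_per_ft = (mask * values**2).sum(axis=(0, 1))  # (dimension,)
L2_norm = L2_norm_per_ft.sum()
```

Because padding and NaNs share the same mask value, every count and norm can be computed with a single masked reduction over the padded arrays.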

Methods

get_one_hot_encoding(*, sf, ordinal_infos)

Builds the one-hot encoding of ordinal data once and for all and returns it.

get_times_patient(i)

Get ages for patient number i

get_values_patient(i, *[, adapt_for_model])

Get values for patient number i, with nans.

move_to_device(device)

Moves the dataset to the specified device.

to_pandas()

Convert dataset to a DataFrame with ['ID', 'TIME'] index.

get_one_hot_encoding(*, sf: bool, ordinal_infos: Dict[str, Any])

Builds the one-hot encoding of ordinal data once and for all and returns it.

Parameters:
sf : bool

Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]

ordinal_infos : dict[str, Any]

All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)

Returns:
One-hot encoding of data values.
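The difference between the two encodings described for `sf` can be shown on integer levels. This is a minimal sketch using NumPy, assuming a single `max_level` shared by all values and no missing data; the function name `one_hot_encoding` is hypothetical, and the real method reads per-feature maximum levels from `ordinal_infos`.

```python
import numpy as np

def one_hot_encoding(x: np.ndarray, max_level: int, *, sf: bool) -> np.ndarray:
    """Encode integer ordinal levels in [0, max_level] along a new last axis.

    sf=False (pdf): [1(X == l) for l = 0..max_level]   -> max_level + 1 entries
    sf=True  (sf):  [1(X > l)  for l = 0..max_level-1] -> max_level entries
    """
    x = np.asarray(x)[..., None]  # trailing axis to broadcast against levels
    if sf:
        levels = np.arange(max_level)
        return (x > levels).astype(np.int64)
    levels = np.arange(max_level + 1)
    return (x == levels).astype(np.int64)

x = np.array([0, 2, 3])
pdf = one_hot_encoding(x, max_level=3, sf=False)
sf = one_hot_encoding(x, max_level=3, sf=True)
```

Each SF entry is the sum of the PDF entries strictly above level l, which is why the SF encoding has one fewer entry than the PDF encoding.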

get_times_patient(i: int) → FloatTensor

Get ages for patient number i

Parameters:
i : int

The index of the patient (warning: the positional index, not the patient identifier)

Returns:
torch.Tensor, shape (n_obs_of_patient,)

Contains floats.
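Given the padded `timepoints` layout, returning shape (n_obs_of_patient,) amounts to dropping the padding at the end of row i. The sketch below assumes this slicing by `n_visits_per_individual[i]` and uses NumPy and hypothetical data; it is not necessarily how leaspy implements the method.

```python
import numpy as np

# Hypothetical padded layout: row i holds individual i's ages, padded with zeros.
timepoints = np.array([[70.0, 71.0, 72.5],
                       [65.0, 0.0, 0.0]])
n_visits_per_individual = [3, 1]

def get_times_patient(i: int) -> np.ndarray:
    # i is a positional index into the padded arrays, not a patient identifier.
    return timepoints[i, :n_visits_per_individual[i]]
```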

get_values_patient(i: int, *, adapt_for_model=None) → FloatTensor

Get values for patient number i, with nans.

Parameters:
i : int

The index of the patient (warning: the positional index, not the patient identifier)

adapt_for_model : None (default) or AbstractModel

The values returned are suited to this model. In particular:

  • For a model with noise_model='ordinal', returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]

  • For a model with noise_model='ordinal_ranking', returns survival function values [P(X > l), l=0..ordinal_max_level-1]

If None, we return the raw values, whatever the model is.

Returns:
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])

Contains floats or NaNs.
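For the default `adapt_for_model=None`, the raw values "with nans" can be recovered from the padded `values` and `mask`: keep the real visits of individual i and put NaN back wherever the mask is 0. This NumPy sketch uses hypothetical data, assumes masked entries were stored as 0, and does not cover the ordinal encodings.

```python
import numpy as np

# Hypothetical padded data: 2 individuals, up to 3 visits, 2 features.
values = np.array([[[0.1, 0.2], [0.3, 0.0], [0.4, 0.5]],
                   [[0.0, 0.6], [0.0, 0.0], [0.0, 0.0]]])
mask = np.array([[[1, 1], [1, 0], [1, 1]],
                 [[0, 1], [0, 0], [0, 0]]], dtype=float)
n_visits_per_individual = [3, 1]

def get_values_patient(i: int) -> np.ndarray:
    n = n_visits_per_individual[i]
    vals = values[i, :n].copy()       # drop padded visits; copy to keep `values` intact
    vals[mask[i, :n] == 0] = np.nan   # re-insert NaN for missing observations
    return vals
```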

move_to_device(device: device) → None

Moves the dataset to the specified device.

Parameters:
device : torch.device

The device to move the dataset's tensors to.

to_pandas() → DataFrame

Convert dataset to a DataFrame with ['ID', 'TIME'] index.

Returns:
pandas.DataFrame
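A DataFrame indexed by ['ID', 'TIME'] can be built from the padded arrays by emitting one row per real visit and restoring NaN where the mask is 0. The sketch below uses pandas and NumPy with hypothetical IDs, feature names and data; it illustrates the shape of the output rather than leaspy's own conversion code.

```python
import numpy as np
import pandas as pd

# Hypothetical padded data: 2 individuals, up to 3 visits, 2 features.
indices = ["patient_a", "patient_b"]
headers = ["feat_1", "feat_2"]
timepoints = np.array([[70.0, 71.0, 72.5], [65.0, 0.0, 0.0]])
values = np.array([[[0.1, 0.2], [0.3, 0.0], [0.4, 0.5]],
                   [[0.0, 0.6], [0.0, 0.0], [0.0, 0.0]]])
mask = np.array([[[1, 1], [1, 0], [1, 1]],
                 [[0, 1], [0, 0], [0, 0]]], dtype=float)
n_visits_per_individual = [3, 1]

rows, index = [], []
for i, pid in enumerate(indices):
    for j in range(n_visits_per_individual[i]):  # skip padded visits
        index.append((pid, timepoints[i, j]))
        rows.append(np.where(mask[i, j] == 1, values[i, j], np.nan))

df = pd.DataFrame(
    rows,
    index=pd.MultiIndex.from_tuples(index, names=["ID", "TIME"]),
    columns=headers,
)
```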