leaspy.io.data.dataset module

class Dataset(data: Data, model: AbstractModel = None, algo: AbstractAlgo = None)

Bases: object

Data container based on torch.Tensor, used to run algorithms.

Parameters

data (Data): Create Dataset from Data object
model (AbstractModel, optional): If not None, will check compatibility of model and data
algo (AbstractAlgo, optional): If not None, will check compatibility of algo and data

Raises

LeaspyInputError: If data, model, or algo are not compatible with one another.

Attributes

headers (list[str]): Feature names
dimension (int): Number of features
n_individuals (int): Number of individuals
indices (list[ID]): Order of patients
n_visits_per_individual (list[int]): Number of visits per individual
n_visits_max (int): Maximum number of visits for one individual
n_visits (int): Total number of visits
n_observations_per_ind_per_ft (torch.LongTensor, shape (n_individuals, dimension)): Number of observations (not counting missing values) per individual per feature
n_observations_per_ft (torch.LongTensor, shape (dimension,)): Total number of observations per feature
n_observations (int): Total number of observations
timepoints (torch.FloatTensor, shape (n_individuals, n_visits_max)): Ages of patients at their different visits
values (torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)): Values of patients for each visit and each feature
mask (torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)): Binary mask associated with values. If 1, the value is meaningful. If 0, the value is meaningless (it was either nan or does not correspond to a real visit and is only there for padding).
L2_norm_per_ft (torch.FloatTensor, shape (dimension,)): Sum of all non-nan squared values, feature by feature
L2_norm (scalar torch.FloatTensor): Sum of all non-nan squared values
_one_hot_encoding (Dict[sf: bool, torch.LongTensor]): Values of patients for each visit and each feature, tensorized into a one-hot encoding (pdf or sf). Tensor shapes are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True]).
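
The relationship between the padded tensors, the mask, and the derived counts can be illustrated with a small dependency-free sketch. This is not leaspy code: plain nested lists stand in for torch tensors, the toy data is made up, and filling padded or missing positions with 0 is an assumption made here for illustration.

```python
import math

# Hypothetical toy data: 2 individuals, 2 features; nan marks a missing value.
visits = [
    [[0.1, 0.2], [0.3, math.nan], [0.5, 0.6]],  # individual 0: 3 visits
    [[0.7, 0.8]],                               # individual 1: 1 visit
]

dimension = 2
n_individuals = len(visits)
n_visits_per_individual = [len(v) for v in visits]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

# values and mask both have shape (n_individuals, n_visits_max, dimension).
# Assumption: padded and missing positions are filled with 0 in values.
values = [[[0.0] * dimension for _ in range(n_visits_max)]
          for _ in range(n_individuals)]
mask = [[[0.0] * dimension for _ in range(n_visits_max)]
        for _ in range(n_individuals)]
for i, individual in enumerate(visits):
    for j, visit in enumerate(individual):
        for k, x in enumerate(visit):
            if not math.isnan(x):
                values[i][j][k] = x
                mask[i][j][k] = 1.0  # meaningful value

# Observation counts are sums of the mask over the visit axis.
n_observations_per_ind_per_ft = [
    [int(sum(mask[i][j][k] for j in range(n_visits_max)))
     for k in range(dimension)]
    for i in range(n_individuals)
]
n_observations_per_ft = [
    sum(row[k] for row in n_observations_per_ind_per_ft)
    for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)

# Sum of squared values over meaningful entries only, feature by feature.
L2_norm_per_ft = [
    sum(mask[i][j][k] * values[i][j][k] ** 2
        for i in range(n_individuals) for j in range(n_visits_max))
    for k in range(dimension)
]
L2_norm = sum(L2_norm_per_ft)
```

In this toy case individual 1 has a single real visit, so its last two rows in `mask` are all zeros, and the missing value of individual 0 lowers the observation count of the second feature without changing the number of visits.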

Methods

`get_one_hot_encoding`(*, sf, ordinal_infos)	Builds the one-hot encoding of ordinal data once and for all and returns it.
`get_times_patient`(i)	Get ages for patient number `i`
`get_values_patient`(i, *[, adapt_for_model])	Get values for patient number `i`, with nans.
`move_to_device`(device)	Moves the dataset to the specified device.
`to_pandas`()	Convert dataset to a DataFrame.

get_one_hot_encoding(*, sf: bool, ordinal_infos: Dict[str, Any])

Builds the one-hot encoding of ordinal data once and for all and returns it.

Parameters

sf (bool): Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]
ordinal_infos (dict[str, Any]): All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)

Returns

One-hot encoding of data values.
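
The two encodings selected by `sf` can be sketched for a single ordinal value as follows. The helper names `one_hot_pdf` and `one_hot_sf` are hypothetical and not part of leaspy; they only follow the index ranges given in the parameter description above.

```python
def one_hot_pdf(x: int, max_level: int) -> list:
    """Indicator vector [1(X = l) for l = 0..max_level]."""
    return [int(x == l) for l in range(max_level + 1)]


def one_hot_sf(x: int, max_level: int) -> list:
    """Survival-function vector [1(X > l) for l = 0..max_level-1]."""
    return [int(x > l) for l in range(max_level)]


# An observed level of 2 on a scale whose maximum level is 4.
pdf = one_hot_pdf(2, 4)
sf = one_hot_sf(2, 4)
```

With these definitions the sf vector has one fewer entry than the pdf vector, and the entries of the sf vector sum to the observed level.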

get_times_patient(i: int) → FloatTensor

Get ages for patient number i

Parameters

i (int): The index of the patient (caution: not the patient's identifier)

Returns

torch.Tensor, shape (n_obs_of_patient,): Contains floats

get_values_patient(i: int, *, adapt_for_model=None) → FloatTensor

Get values for patient number i, with nans.

Parameters

i (int)

The index of the patient (caution: not the patient's identifier)

adapt_for_model (None or AbstractModel, default None)

The values returned are suited for this model. In particular:

For a model with noise_model='ordinal', returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]

For a model with noise_model='ordinal_ranking', returns survival function values [P(X > l), l=0..ordinal_max_level-1]

If None, the raw values are returned, whatever the model is.

Returns

torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models]): Contains float or nans
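
A minimal sketch of what the per-patient accessors return when `adapt_for_model` is None, given the documented shapes: the padded rows are dropped, and positions where the mask is 0 are reported as nan. This is not leaspy's implementation. The helper names `patient_times` and `patient_values` are hypothetical, nested lists stand in for tensors, and storing 0 at masked positions is an assumption.

```python
import math

# Hypothetical padded data for 2 patients, up to 3 visits, 2 features.
n_visits_per_individual = [3, 1]
timepoints = [[70.0, 71.5, 73.0], [65.0, 0.0, 0.0]]
values = [
    [[0.1, 0.2], [0.3, 0.0], [0.5, 0.6]],
    [[0.7, 0.8], [0.0, 0.0], [0.0, 0.0]],
]
mask = [
    [[1, 1], [1, 0], [1, 1]],
    [[1, 1], [0, 0], [0, 0]],
]


def patient_times(i: int) -> list:
    """Ages at the real visits of patient i (padding removed)."""
    return timepoints[i][: n_visits_per_individual[i]]


def patient_values(i: int) -> list:
    """Values at the real visits of patient i, with nan where mask is 0."""
    n = n_visits_per_individual[i]
    return [
        [x if m == 1 else math.nan for x, m in zip(row_values, row_mask)]
        for row_values, row_mask in zip(values[i][:n], mask[i][:n])
    ]


times_0 = patient_times(0)
values_0 = patient_values(0)
values_1 = patient_values(1)
```

Both helpers take the positional index of the patient in the dataset, not the patient's identifier, which matches the description of `i` above.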

move_to_device(device: torch.device) → None

Moves the dataset to the specified device.

Parameters

device (torch.device)

to_pandas() → DataFrame

Convert dataset to a DataFrame.

Returns

pandas.DataFrame