leaspy.io.data.dataset.Dataset
- class Dataset(data: Data, *, no_warning: bool = False)
Bases: object
Data container based on torch.Tensor, used to run algorithms.
- Parameters:
- data : Data
The Data object from which the Dataset is created.
- no_warning : bool (default False)
Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt per individual in scipy minimize, and all relevant warnings should already have been emitted for the overall dataset.
- Raises:
LeaspyInputError
If data, model, or algo are not compatible with one another.
- Attributes:
- headers : list[str]
Feature names
- dimension : int
Number of features
- n_individuals : int
Number of individuals
- indices : list[ID]
Order of patients
- n_visits_per_individual : list[int]
Number of visits per individual
- n_visits_max : int
Maximum number of visits for one individual
- n_visits : int
Total number of visits
- n_observations_per_ind_per_ft : torch.LongTensor, shape (n_individuals, dimension)
Number of observations (not taking into account missing values) per individual per feature
- n_observations_per_ft : torch.LongTensor, shape (dimension,)
Total number of observations per feature
- n_observations : int
Total number of observations
- timepoints : torch.FloatTensor, shape (n_individuals, n_visits_max)
Ages of patients at their different visits
- values : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)
Values of patients for each visit for each feature
- mask : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)
Binary mask associated with values. If 1, the value is meaningful. If 0, the value is meaningless (it either was nan or does not correspond to a real visit and is only there for padding).
- L2_norm_per_ft : torch.FloatTensor, shape (dimension,)
Sum of all non-nan squared values, feature per feature
- L2_norm : scalar torch.FloatTensor
Sum of all non-nan squared values
- no_warning : bool (default False)
Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt per individual in scipy minimize, and all relevant warnings should already have been emitted for the overall dataset.
- _one_hot_encoding : Dict[sf: bool, torch.LongTensor]
Values of patients for each visit for each feature, but tensorized into a one-hot encoding (pdf or sf). Shapes of tensors are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True]).
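The padded layout described by the attributes above can be illustrated with a small dependency-free sketch. It is not leaspy's implementation: nested Python lists stand in for the tensors, the toy `visits` dictionary is hypothetical, and filling padded or missing entries with 0 is an assumption made for the illustration.

```python
import math

# Hypothetical toy data: two individuals, two features, unequal visit counts.
# nan marks a missing measurement at a real visit.
visits = {
    "A": [(70.0, [0.1, 0.2]), (71.0, [0.3, math.nan])],
    "B": [(65.0, [0.5, 0.6])],
}

indices = list(visits)  # order of patients
dimension = 2
n_visits_per_individual = [len(visits[idx]) for idx in indices]
n_visits_max = max(n_visits_per_individual)

timepoints, values, mask = [], [], []
for idx in indices:
    rows = visits[idx]
    pad = n_visits_max - len(rows)
    # Assumption: padded timepoints and padded/missing values are stored as 0.
    timepoints.append([t for t, _ in rows] + [0.0] * pad)
    values.append(
        [[0.0 if math.isnan(v) else v for v in obs] for _, obs in rows]
        + [[0.0] * dimension for _ in range(pad)]
    )
    # mask is 1 for a real, non-nan value and 0 for nan or padding.
    mask.append(
        [[0.0 if math.isnan(v) else 1.0 for v in obs] for _, obs in rows]
        + [[0.0] * dimension for _ in range(pad)]
    )

# Counts and norms only use entries where mask == 1.
cells = [(i, j) for i in range(len(indices)) for j in range(n_visits_max)]
n_observations_per_ft = [
    int(sum(mask[i][j][k] for i, j in cells)) for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)
L2_norm_per_ft = [
    sum(mask[i][j][k] * values[i][j][k] ** 2 for i, j in cells)
    for k in range(dimension)
]
L2_norm = sum(L2_norm_per_ft)
```

Here individual "B" has a single visit, so its second row is pure padding and its mask row is all zeros, while the nan of individual "A" is masked out of both the observation counts and the norms.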
Methods
get_one_hot_encoding(*, sf, ordinal_infos)
Builds the one-hot encoding of ordinal data once and for all and returns it.
get_times_patient(i)
Get ages for patient number i.
get_values_patient(i, *[, adapt_for_model])
Get values for patient number i, with nans.
move_to_device(device)
Moves the dataset to the specified device.
Convert dataset to a DataFrame with ['ID', 'TIME'] index.
- get_one_hot_encoding(*, sf: bool, ordinal_infos: Dict[str, Any])
Builds the one-hot encoding of ordinal data once and for all and returns it.
- Parameters:
- sf : bool
Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]
- ordinal_infos : dict[str, Any]
All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)
- Returns:
- One-hot encoding of data values.
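To make the two encodings concrete, here is a minimal sketch for a single ordinal value. The helper names `one_hot_pdf` and `one_hot_sf` are hypothetical and plain lists replace tensors; the sketch only follows the two formulas given for the sf parameter and is not the library's code.

```python
from typing import List


def one_hot_pdf(x: int, max_level: int) -> List[int]:
    """Encode x as [1(X = l), l = 0..max_level] (length max_level + 1)."""
    return [int(x == level) for level in range(max_level + 1)]


def one_hot_sf(x: int, max_level: int) -> List[int]:
    """Encode x as [1(X > l), l = 0..max_level-1] (length max_level)."""
    return [int(x > level) for level in range(max_level)]


max_level = 3
pdf = one_hot_pdf(2, max_level)  # [0, 0, 1, 0]
sf = one_hot_sf(2, max_level)    # [1, 1, 0]
```

The sf vector has one entry fewer than the pdf vector, which matches the "-1 when sf=True" remark on the last tensor dimension, and its entries sum to the encoded level.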
- get_times_patient(i: int) -> FloatTensor
Get ages for patient number i.
- Parameters:
- i : int
The index of the patient (not its identifier)
- Returns:
torch.Tensor, shape (n_obs_of_patient,)
Contains float
- get_values_patient(i: int, *, adapt_for_model=None) -> FloatTensor
Get values for patient number i, with nans.
- Parameters:
- i : int
The index of the patient (not its identifier)
- adapt_for_model : None (default) or AbstractModel
The values returned are suited for this model. In particular:
For a model with noise_model='ordinal', returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]
For a model with noise_model='ordinal_ranking', returns survival function values [P(X > l), l=0..ordinal_max_level-1]
If None, the raw values are returned, whatever the model is.
- Returns:
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])
Contains float or nans
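The return shapes of the two per-patient getters suggest that padding is dropped and missing values come back as nan. The sketch below shows one way this could work on padded storage. It is an assumption for illustration, not the library's implementation: `times_of` and `values_of` are hypothetical helpers, the toy arrays are made up, and nested lists stand in for tensors.

```python
import math

# Hypothetical padded storage: 2 individuals, n_visits_max = 2, dimension = 2.
n_visits_per_individual = [2, 1]
timepoints = [[70.0, 71.0], [65.0, 0.0]]
values = [[[0.1, 0.2], [0.3, 0.0]], [[0.5, 0.6], [0.0, 0.0]]]
mask = [[[1, 1], [1, 0]], [[1, 1], [0, 0]]]


def times_of(i):
    """Ages of patient i at its real visits only (padding removed)."""
    return timepoints[i][: n_visits_per_individual[i]]


def values_of(i):
    """Values of patient i at its real visits, with nan where mask is 0."""
    n = n_visits_per_individual[i]
    return [
        [v if m else math.nan for v, m in zip(row_values, row_mask)]
        for row_values, row_mask in zip(values[i][:n], mask[i][:n])
    ]


times_b = times_of(1)    # one real visit for the second patient
values_a = values_of(0)  # second feature is nan at the second visit
```

Note that `i` is a positional index into the padded arrays, which is why both getters take the index of the patient and not its identifier.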
- move_to_device(device: device) -> None
Moves the dataset to the specified device.
- Parameters:
- device : torch.device