leaspy.io.data.dataset module
- class Dataset(data: Data, model: AbstractModel = None, algo: AbstractAlgo = None)
Bases: object
Data container based on torch.Tensor, used to run algorithms.
- Parameters
- data : Data
Create Dataset from Data object
- model : AbstractModel
(optional) If not None, will check compatibility of model and data
- algo : AbstractAlgo
(optional) If not None, will check compatibility of algo and data
- Raises
LeaspyInputError
If data, model or algo are not compatible with one another.
- Attributes
- headers : list[str]
Feature names
- dimension : int
Number of features
- n_individuals : int
Number of individuals
- indices : list[ID]
Order of patients
- n_visits_per_individual : list[int]
Number of visits per individual
- n_visits_max : int
Maximum number of visits for one individual
- n_visits : int
Total number of visits
- n_observations_per_ind_per_ft : torch.LongTensor, shape (n_individuals, dimension)
Number of observations (not taking into account missing values) per individual per feature
- n_observations_per_ft : torch.LongTensor, shape (dimension,)
Total number of observations per feature
- n_observations : int
Total number of observations
- timepoints : torch.FloatTensor, shape (n_individuals, n_visits_max)
Ages of patients at their different visits
- values : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)
Values of patients for each visit for each feature
- mask : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)
Binary mask associated to values.
If 1: value is meaningful.
If 0: value is meaningless (either it was nan, or it does not correspond to a real visit and is only there for padding).
- L2_norm_per_ft : torch.FloatTensor, shape (dimension,)
Sum of all non-nan squared values, feature per feature
- L2_norm : scalar torch.FloatTensor
Sum of all non-nan squared values
- _one_hot_encoding : Dict[sf: bool, torch.LongTensor]
Values of patients for each visit for each feature, but tensorized into a one-hot encoding (pdf or sf).
Shapes of tensors are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True])
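To make the padded layout concrete, the following is a minimal plain-Python sketch of how the attributes above relate to one another. It uses nested lists in place of torch tensors, a hypothetical `visits` input, and stores padded or missing entries as 0.0 (an assumption of this sketch). It illustrates the documented shapes and definitions and is not leaspy's actual implementation.

```python
import math

# Hypothetical toy input: per patient, a list of (age, [value per feature]).
# NaN marks a missing measurement. All names here are illustrative only.
visits = {
    "P1": [(60.0, [0.1, 0.2]), (61.0, [0.3, math.nan])],
    "P2": [(70.0, [0.5, 0.6])],
}

indices = list(visits)      # order of patients
dimension = 2               # number of features
n_individuals = len(indices)
n_visits_per_individual = [len(visits[p]) for p in indices]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

# Padded containers, shapes (n_individuals, n_visits_max[, dimension]).
# Assumption of this sketch: padding and missing values are stored as 0.0.
timepoints = [[0.0] * n_visits_max for _ in indices]
values = [[[0.0] * dimension for _ in range(n_visits_max)] for _ in indices]
mask = [[[0.0] * dimension for _ in range(n_visits_max)] for _ in indices]

for i, p in enumerate(indices):
    for j, (age, obs) in enumerate(visits[p]):
        timepoints[i][j] = age
        for k, x in enumerate(obs):
            if not math.isnan(x):
                values[i][j][k] = x
                mask[i][j][k] = 1.0  # real visit and non-missing value

# Counts and norms only take entries with mask == 1 into account.
n_observations_per_ind_per_ft = [
    [int(sum(mask[i][j][k] for j in range(n_visits_max))) for k in range(dimension)]
    for i in range(n_individuals)
]
n_observations_per_ft = [
    sum(row[k] for row in n_observations_per_ind_per_ft) for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)
L2_norm_per_ft = [
    sum(
        mask[i][j][k] * values[i][j][k] ** 2
        for i in range(n_individuals)
        for j in range(n_visits_max)
    )
    for k in range(dimension)
]
L2_norm = sum(L2_norm_per_ft)
```

In this toy example the second visit of "P2" exists only as padding, so its mask row is all zeros and it contributes neither to the observation counts nor to the norms.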
Methods
get_one_hot_encoding(*, sf, ordinal_infos): Builds the one-hot encoding of ordinal data once and for all and returns it.
get_times_patient(i): Get ages for patient number i.
get_values_patient(i, *[, adapt_for_model]): Get values for patient number i, with nans.
move_to_device(device): Moves the dataset to the specified device.
Convert dataset to a DataFrame.
- get_one_hot_encoding(*, sf: bool, ordinal_infos: Dict[str, Any])
Builds the one-hot encoding of ordinal data once and for all and returns it.
- Parameters
- sf : bool
Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]
- ordinal_infos : dict[str, Any]
All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)
- Returns
- One-hot encoding of data values.
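The two encodings described by the `sf` flag can be sketched in plain Python for a single ordinal value. The helper names `one_hot_ordinal` and `get_encoding` and the module-level cache are hypothetical and only mirror the documented behavior (one vector layout per value of `sf`, built once and then reused). The real method works on tensors and takes its levels from `ordinal_infos`.

```python
def one_hot_ordinal(x, max_level, *, sf):
    """Encode one integer ordinal level x in 0..max_level.

    sf=False: pdf-style vector [1(x == l) for l = 0..max_level]
    sf=True:  survival-style vector [1(x > l) for l = 0..max_level-1]
    """
    if not 0 <= x <= max_level:
        raise ValueError(f"level {x} outside 0..{max_level}")
    if sf:
        return [int(x > l) for l in range(max_level)]
    return [int(x == l) for l in range(max_level + 1)]


# Hypothetical cache keyed by `sf`, so each encoding is built only once.
_cache = {}


def get_encoding(levels, max_level, *, sf):
    """Encode a list of levels, reusing the cached result on later calls."""
    if sf not in _cache:
        _cache[sf] = [one_hot_ordinal(x, max_level, sf=sf) for x in levels]
    return _cache[sf]


pdf_vec = one_hot_ordinal(2, 3, sf=False)  # [0, 0, 1, 0]
sf_vec = one_hot_ordinal(2, 3, sf=True)    # [1, 1, 0]
```

Note that the survival-function vector has one entry fewer than the pdf vector, which matches the "-1 when sf=True" remark on the last tensor dimension.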
- get_times_patient(i: int) -> FloatTensor
Get ages for patient number i.
- Parameters
- i : int
The index of the patient (<!> not its identifier)
- Returns
torch.Tensor, shape (n_obs_of_patient,)
Contains float
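Because `timepoints` is padded up to `n_visits_max`, returning a vector of shape (n_obs_of_patient,) amounts to dropping the padding for that patient. A minimal sketch with nested lists instead of tensors follows. The free function `times_of_patient` is a hypothetical stand-in for the method, and slicing by `n_visits_per_individual` is an assumption about how the padding is removed.

```python
def times_of_patient(timepoints, n_visits_per_individual, i):
    """Return the unpadded ages of the patient at positional index i."""
    return timepoints[i][: n_visits_per_individual[i]]


# Two patients, padded to n_visits_max = 3 (padding stored as 0.0 here).
timepoints = [[60.0, 61.0, 62.5], [70.0, 0.0, 0.0]]
n_visits_per_individual = [3, 1]

times_second = times_of_patient(timepoints, n_visits_per_individual, 1)
```

Here `i` is a position in the dataset's patient ordering (`indices`), not the patient's identifier.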
- get_values_patient(i: int, *, adapt_for_model=None) -> FloatTensor
Get values for patient number i, with nans.
- Parameters
- i : int
The index of the patient (<!> not its identifier)
- adapt_for_model : None (default) or AbstractModel
The values returned are suited for this model. In particular:
- For a model with noise_model='ordinal', returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]
- For a model with noise_model='ordinal_ranking', returns survival function values [P(X > l), l=0..ordinal_max_level-1]
If None, the raw values are returned, whatever the model is.
- Returns
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])
Contains float or nans
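For the default case (`adapt_for_model=None`), "with nans" can be read as: keep only the real visits of the patient and put NaN back wherever the mask is 0. The sketch below shows that reading with nested lists. The function `values_of_patient` is hypothetical, and restoring NaNs from `mask` is an assumption about the mechanism, not a statement of how leaspy implements it. The ordinal variants are not covered.

```python
import math


def values_of_patient(values, mask, n_visits_per_individual, i):
    """Return the unpadded values of patient i, with NaN where mask == 0."""
    n = n_visits_per_individual[i]
    return [
        [x if m == 1 else math.nan for x, m in zip(row_values, row_mask)]
        for row_values, row_mask in zip(values[i][:n], mask[i][:n])
    ]


# Two patients, two features, padded to n_visits_max = 2.
values = [[[0.1, 0.0], [0.3, 0.4]], [[0.5, 0.6], [0.0, 0.0]]]
mask = [[[1, 0], [1, 1]], [[1, 1], [0, 0]]]
n_visits_per_individual = [2, 1]

first_patient = values_of_patient(values, mask, n_visits_per_individual, 0)
second_patient = values_of_patient(values, mask, n_visits_per_individual, 1)
```

The result has one row per real visit, so the padded second row of the second patient is dropped entirely rather than returned as NaNs.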
- move_to_device(device: device) -> None
Moves the dataset to the specified device.
- Parameters
- device : torch.device