leaspy.io.data
Submodules
Attributes
Classes
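AbstractDataframeDataReader | Methods to convert pandas.DataFrame to Leaspy-compliant data containers.
Data | Main data container for a collection of individuals
Dataset | Data container based on torch.Tensor, used to run algorithms.
EventDataframeDataReader | Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data only.
DataframeDataReaderNames | Enumeration defining the possible names for dataframe data readers.
IndividualData | Container for an individual’s data
JointDataframeDataReader | Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data and longitudinal data.
VisitDataframeDataReader | Methods to convert pandas.DataFrame to Leaspy-compliant data containers for longitudinal data only.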
Functions
dataframe_data_reader_factory | Factory for dataframe data readers.
Package Contents
- class AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers.
- Raises:
- time_rounding_digits = 6
- individuals: dict[leaspy.utils.typing.IDType, IndividualData]
- read(df, *, drop_full_nan=True, sort_index=False, warn_empty_column=True)
The method that effectively reads the input dataframe (automatically called in __init__).
- Parameters:
- df
pandas.DataFrame The dataframe to read.
- drop_full_nan: bool
Whether to drop rows that are entirely NaN (the index is not considered).
- sort_index: bool
Whether to lexsort the index. (Kept False by default so as not to break the many downstream tests that check the order.)
- warn_empty_column: bool
Whether to warn when some columns are empty.
- Return type:
None
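The effect of drop_full_nan and sort_index, together with the time_rounding_digits attribute, can be approximated with plain pandas. This is a hedged sketch, not the reader's implementation: preprocess_visits is a hypothetical helper, the input is assumed to be indexed by (ID, TIME) with only feature columns, and applying time_rounding_digits to the TIME level is our interpretation of that attribute.

```python
import numpy as np
import pandas as pd


def preprocess_visits(
    df: pd.DataFrame,
    *,
    drop_full_nan: bool = True,
    sort_index: bool = False,
    time_rounding_digits: int = 6,
) -> pd.DataFrame:
    """Hypothetical sketch of the documented `read` options (not leaspy's code).

    Assumes `df` is indexed by ("ID", "TIME") and only holds feature columns.
    """
    out = df.copy()
    # Assumption: TIME values are rounded to `time_rounding_digits` decimals.
    ids = out.index.get_level_values("ID")
    times = np.round(
        out.index.get_level_values("TIME").to_numpy(dtype=float), time_rounding_digits
    )
    out.index = pd.MultiIndex.from_arrays([ids, times], names=["ID", "TIME"])
    if drop_full_nan:
        # Drop rows where every column is NaN; index levels are never inspected.
        out = out.dropna(how="all")
    if sort_index:
        # Lexicographic sort on (ID, TIME).
        out = out.sort_index()
    return out


visits = pd.DataFrame(
    {
        "ID": ["b", "a", "a"],
        "TIME": [65.0, 70.12345678, 71.0],
        "Y1": [0.3, 0.1, np.nan],
        "Y2": [0.4, np.nan, np.nan],
    }
).set_index(["ID", "TIME"])

clean = preprocess_visits(visits, sort_index=True)
print(clean)
```

Here the visit ("a", 71.0) is dropped because both features are NaN, while ("a", 70.12...) survives since only one of its features is missing.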
- class Data
Bases: collections.abc.Iterable
Main data container for a collection of individuals.
It can be iterated over and sliced, both of these operations being applied to the underlying individuals attribute.
- Attributes:
- individuals
Dict[IDType,IndividualData] Included individuals and their associated data
- iter_to_idx
Dict[int,IDType] Maps an integer index to the associated individual ID
- headers
List[FeatureType] Feature names
- dimension
int Number of features
- n_individuals
int Number of individuals
- n_visits
int Total number of visits
- cofactors
List[FeatureType] Feature names corresponding to cofactors
- event_time_name
str Name of the header that stores the event time in the original dataframe
- event_bool_name
str Name of the header that stores the event indicator (observed or censored) in the original dataframe
- individuals: dict[leaspy.utils.typing.IDType, IndividualData]
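The aggregate attributes of Data follow from the individuals mapping. As a rough sketch, the hypothetical helper below recomputes iter_to_idx, n_individuals, n_visits and dimension from a plain {ID: array} dictionary standing in for dict[IDType, IndividualData]; assigning integer positions in insertion order is an assumption.

```python
from typing import Dict

import numpy as np


def summarize(observations_by_id: Dict[str, np.ndarray]) -> dict:
    """Hypothetical: Data-like aggregates from {ID: array (n_visits_i, n_features)}."""
    # Integer position -> individual ID, in insertion order (assumed iter_to_idx).
    iter_to_idx = dict(enumerate(observations_by_id))
    n_features = {obs.shape[1] for obs in observations_by_id.values()}
    if len(n_features) != 1:
        # Every individual must share the same feature set.
        raise ValueError("inconsistent number of features across individuals")
    return {
        "iter_to_idx": iter_to_idx,
        "n_individuals": len(observations_by_id),
        "n_visits": sum(obs.shape[0] for obs in observations_by_id.values()),
        "dimension": n_features.pop(),
    }


stats = summarize({
    "sub-01": np.array([[0.1, 0.2], [0.3, 0.4]]),
    "sub-02": np.array([[0.5, np.nan]]),
})
print(stats)
```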
- property cofactors: list[leaspy.utils.typing.FeatureType]
Feature names corresponding to cofactors
- Returns:
List[FeatureType]: List of feature names corresponding to cofactors.
- Return type:
list[leaspy.utils.typing.FeatureType]
- load_cofactors(df, *, cofactors=None)
Load cofactors from a pandas.DataFrame into the Data object.
- Parameters:
- df
pandas.DataFrame The dataframe where the cofactors are stored. Its index should be ID, the identifier of subjects, and it should uniquely index the dataframe (i.e. one row per individual).
- cofactors
List[FeatureType], optional Names of the column(s) of the dataframe to load as cofactors. If None, all the columns of the input dataframe are loaded as cofactors. Default: None
- Return type:
None
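load_cofactors expects a dataframe indexed by ID with exactly one row per individual, and each individual ends up with cofactors of the form {cofactor_name: cofactor_value} (see IndividualData). A minimal pandas sketch of that validation and reshaping follows; cofactors_by_id is hypothetical and raises a plain ValueError where leaspy would raise its own exception type.

```python
from typing import List, Optional

import pandas as pd


def cofactors_by_id(df: pd.DataFrame, cofactors: Optional[List[str]] = None) -> dict:
    """Hypothetical helper: {ID: {cofactor_name: value}} from a one-row-per-ID frame."""
    if not df.index.is_unique:
        # Stand-in for leaspy's own input error.
        raise ValueError("The cofactors dataframe must have exactly one row per ID.")
    # None means "load every column as a cofactor", as documented above.
    selected = df if cofactors is None else df[cofactors]
    return selected.to_dict(orient="index")


cofactors_df = pd.DataFrame(
    {"SEX": ["F", "M"], "APOE4": [1, 0]},
    index=pd.Index(["sub-01", "sub-02"], name="ID"),
)
print(cofactors_by_id(cofactors_df, ["APOE4"]))
```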
- static from_csv_file(path, data_type='visit', *, pd_read_csv_kws={}, facto_kws={}, **df_reader_kws)
Create a Data object from a CSV file.
- Parameters:
- path
str Path to the CSV file to load (with extension)
- data_type
str Type of data to read. Can be ‘visit’ or ‘event’.
- pd_read_csv_kws
dict Keyword arguments that are sent to pandas.read_csv()
- facto_kws
dict Keyword arguments
- **df_reader_kws
Keyword arguments that are sent to the AbstractDataframeDataReader built by dataframe_data_reader_factory()
- Returns:
Data: A Data object containing the data from the CSV file.
- Return type:
Data
- to_dataframe(*, cofactors=None, reset_index=True)
Convert the Data object to a pandas.DataFrame.
- Parameters:
- cofactors
List[FeatureType] or str, optional Cofactors to include in the DataFrame. If None (default), no cofactors are included. If “all”, all the available cofactors are included. Default: None
- reset_index
bool, optional Whether to reset index levels in output. Default: True
- Returns:
pandas.DataFrame: A DataFrame containing the individuals’ ID, timepoints and associated observations (and, optionally, cofactors).
- Raises:
LeaspyDataInputError: If the Data object does not contain any cofactors.
LeaspyTypeError: If the cofactors argument is not of a valid type.
- Return type:
pandas.DataFrame
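The accepted values of the cofactors argument (None, "all", or a list of names) and the two documented failure modes can be sketched as follows. This is an illustration only: resolve_cofactors is a hypothetical helper, TypeError and ValueError stand in for LeaspyTypeError and LeaspyDataInputError, and rejecting unknown names is our own addition.

```python
from typing import List, Optional, Union


def resolve_cofactors(
    requested: Optional[Union[str, List[str]]], available: List[str]
) -> List[str]:
    """Hypothetical sketch: normalize the `cofactors` argument into column names."""
    if requested is None:
        return []  # default: no cofactors in the output
    if not (requested == "all" or isinstance(requested, list)):
        # Stand-in for LeaspyTypeError.
        raise TypeError("cofactors must be None, 'all' or a list of names")
    if not available:
        # Stand-in for LeaspyDataInputError.
        raise ValueError("The Data object does not contain any cofactors.")
    if requested == "all":
        return list(available)
    # Assumption: unknown cofactor names are rejected.
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(f"Unknown cofactors: {unknown}")
    return list(requested)


print(resolve_cofactors("all", ["SEX", "APOE4"]))
```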
- static from_dataframe(df, data_type='visit', factory_kws={}, **kws)
Create a Data object from a pandas.DataFrame.
- Parameters:
- df
pandas.DataFrame Dataframe containing ID, TIME and features.
- data_type
str Type of data to read. Can be ‘visit’, ‘event’ or ‘joint’.
- factory_kws
Dict Keyword arguments that are sent to dataframe_data_reader_factory()
- **kws
Keyword arguments that are sent to DataframeDataReader
- Returns:
Data
- Return type:
Data
- static from_individual_values(indices, timepoints=None, values=None, headers=None, event_time_name=None, event_bool_name=None, event_time=None, event_bool=None)
Construct Data from a collection of individual data points
- Parameters:
- indices
List[IDType] List of the individuals’ unique IDs
- timepoints
List[List[float]] For each individual i, list of timepoints associated with the observations. The number of such timepoints is noted n_timepoints_i
- values
List[array-like[float, 2D]] For each individual i, two-dimensional array-like object containing observed data points. Its expected shape is (n_timepoints_i, n_features)
- headers
List[FeatureType] Feature names. The number of features is noted n_features
- Returns:
Data: A Data object containing the individuals and their data.
- Return type:
Data
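The shape contract stated above (one timepoint list and one (n_timepoints_i, n_features) array per individual, with n_features = len(headers)) can be checked before calling from_individual_values. check_individual_values is a hypothetical helper written for illustration; it is not part of leaspy.

```python
from typing import List, Sequence

import numpy as np


def check_individual_values(
    indices: Sequence[str],
    timepoints: Sequence[Sequence[float]],
    values: Sequence,
    headers: List[str],
) -> None:
    """Hypothetical: raise ValueError if inputs break the documented shape contract."""
    if not (len(indices) == len(timepoints) == len(values)):
        raise ValueError("indices, timepoints and values need one entry per individual")
    n_features = len(headers)
    for idx, times, vals in zip(indices, timepoints, values):
        vals = np.asarray(vals, dtype=float)
        expected = (len(times), n_features)  # (n_timepoints_i, n_features)
        if vals.shape != expected:
            raise ValueError(f"{idx}: values shape {vals.shape}, expected {expected}")


# Passes silently: 2 visits x 2 features, then 1 visit x 2 features.
check_individual_values(
    indices=["sub-01", "sub-02"],
    timepoints=[[70.0, 71.5], [65.2]],
    values=[[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]]],
    headers=["Y1", "Y2"],
)
```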
- static from_individuals(individuals, headers=None, event_time_name=None, event_bool_name=None)
Construct Data from a list of individuals
- Parameters:
- individuals
List[IndividualData] List of individuals
- headers
List[FeatureType] List of feature names
- Returns:
Data: A Data object containing the individuals and their data.
- Parameters:
individuals (list[IndividualData])
headers (Optional[list[leaspy.utils.typing.FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
- Return type:
Data
- extract_longitudinal_only()
Extract longitudinal data from the Data object
- Returns:
Data: A Data object containing only longitudinal data.
- Raises:
LeaspyDataInputError: If the Data object does not contain any longitudinal data.
- Return type:
Data
- class Dataset(data, *, no_warning=False)
Data container based on torch.Tensor, used to run algorithms.
- Parameters:
- data
Data The Data object from which the Dataset is created
- no_warning: bool (default False)
Whether to deactivate warnings that are emitted by methods of this dataset instance. We may want to deactivate them because we rebuild a dataset per individual in scipy minimize; all relevant warnings will already have been emitted for the overall dataset.
- Attributes:
- headers: list[str]
Feature names
- dimension: int
Number of features
- n_individuals: int
Number of individuals
- indices: list[ID]
Order of patients
- event_time: torch.FloatTensor
Time of the event; if the event is censored, this is the time of the patient’s last observation
- event_bool: torch.BoolTensor
Whether the event was observed (1) or censored (0)
- n_visits_per_individual: list[int]
Number of visits per individual
- n_visits_max: int
Maximum number of visits for one individual
- n_visits: int
Total number of visits
- n_observations_per_ind_per_ft
torch.LongTensor, shape (n_individuals, dimension) Number of observations (not counting missing values) per individual per feature
- n_observations_per_ft
torch.LongTensor, shape (dimension,) Total number of observations per feature
- n_observations: int
Total number of observations
- timepoints
torch.FloatTensor, shape (n_individuals, n_visits_max) Ages of patients at their different visits
- values
torch.FloatTensor, shape (n_individuals, n_visits_max, dimension) Values of patients for each visit for each feature
- mask
torch.FloatTensor, shape (n_individuals, n_visits_max, dimension) Binary mask associated with values. If 1, the value is meaningful; if 0, the value is meaningless (either it was NaN or it does not correspond to a real visit and is only there for padding)
- L2_norm_per_ft
torch.FloatTensor, shape (dimension,) Sum of all non-NaN squared values, feature per feature
- L2_norm: scalar torch.FloatTensor
Sum of all non-NaN squared values
- no_warning: bool (default False)
Whether to deactivate warnings that are emitted by methods of this dataset instance (see the no_warning parameter).
- _one_hot_encoding: Dict[sf: bool, torch.LongTensor]
Values of patients for each visit for each feature, but tensorized into a one-hot encoding (pdf or sf). Shapes of tensors are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True])
- Raises:
LeaspyInputError: If data, model or algo are not compatible together.
- n_individuals
- indices
- no_warning = False
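The padded layout described above can be reproduced with NumPy: visits are right-padded to n_visits_max, and mask is 1 only where a real, non-NaN value exists. The sketch below is a NumPy stand-in for the torch tensors that Dataset builds; pad_individuals is hypothetical, and zero-filling the padded and NaN entries of timepoints and values is an assumption.

```python
import numpy as np


def pad_individuals(timepoints, values):
    """Hypothetical: padded arrays and Dataset-like statistics from ragged data.

    timepoints: list of 1D sequences (n_visits_i,)
    values: list of 2D sequences (n_visits_i, dimension), NaN for missing values
    """
    n_visits_per_individual = [len(t) for t in timepoints]
    n_individuals = len(timepoints)
    n_visits_max = max(n_visits_per_individual)
    dimension = np.asarray(values[0], dtype=float).shape[1]

    # Assumption: padding (and missing values) are stored as 0.0.
    padded_times = np.zeros((n_individuals, n_visits_max))
    padded_values = np.zeros((n_individuals, n_visits_max, dimension))
    mask = np.zeros((n_individuals, n_visits_max, dimension))
    for i, (t, v) in enumerate(zip(timepoints, values)):
        v = np.asarray(v, dtype=float)
        n = len(t)
        padded_times[i, :n] = t
        observed = ~np.isnan(v)
        padded_values[i, :n] = np.where(observed, v, 0.0)
        mask[i, :n] = observed  # 1 = meaningful value, 0 = NaN or padding

    n_observations_per_ind_per_ft = mask.sum(axis=1).astype(int)  # (n_individuals, dimension)
    n_observations_per_ft = n_observations_per_ind_per_ft.sum(axis=0)  # (dimension,)
    L2_norm_per_ft = (mask * padded_values**2).sum(axis=(0, 1))  # (dimension,)
    return {
        "timepoints": padded_times,
        "values": padded_values,
        "mask": mask,
        "n_visits_per_individual": n_visits_per_individual,
        "n_visits_max": n_visits_max,
        "n_visits": sum(n_visits_per_individual),
        "n_observations_per_ind_per_ft": n_observations_per_ind_per_ft,
        "n_observations_per_ft": n_observations_per_ft,
        "n_observations": int(n_observations_per_ft.sum()),
        "L2_norm_per_ft": L2_norm_per_ft,
        "L2_norm": float(L2_norm_per_ft.sum()),
    }


ds = pad_individuals(
    timepoints=[[70.0, 71.0, 72.0], [65.0]],
    values=[[[1.0, 2.0], [np.nan, 1.0], [3.0, np.nan]], [[2.0, 2.0]]],
)
print(ds["mask"][1])  # second patient: one real visit, then padding rows of zeros
```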
- get_times_patient(i)
Get ages for patient number i.
- Parameters:
- i: int
The index of the patient (<!> not its identifier)
- Returns:
torch.Tensor, shape (n_obs_of_patient,): contains floats
- Parameters:
i (int)
- Return type:
torch.FloatTensor
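Because get_times_patient takes a position in the dataset rather than an identifier, a caller holding an ID first looks up its position in indices, then drops the padding. A NumPy sketch with a hypothetical helper, assuming padded timepoints shaped (n_individuals, n_visits_max) as described for Dataset:

```python
import numpy as np


def times_for_id(patient_id, indices, timepoints, n_visits_per_individual):
    """Hypothetical: ID -> position -> unpadded visit ages (mimics get_times_patient)."""
    i = indices.index(patient_id)  # position of the patient, not its identifier
    return timepoints[i, : n_visits_per_individual[i]]


indices = ["sub-01", "sub-02"]
timepoints = np.array([[70.0, 71.0, 72.0], [65.0, 0.0, 0.0]])  # row 2 is padded
print(times_for_id("sub-02", indices, timepoints, [3, 1]))  # [65.]
```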
- get_event_patient(idx_patient)
Get ages at event for patient number idx_patient.
- Parameters:
- idx_patient: int
The index of the patient (<!> not its identifier)
- Returns:
torch.Tensor, shape (n_obs_of_patient,): contains floats
- Parameters:
idx_patient (int)
- Return type:
- get_values_patient(i, *, adapt_for_model=None)
Get values for patient number i, with NaNs.
- Parameters:
- i: int
The index of the patient (<!> not its identifier)
- adapt_for_model: None (default) or AbstractModel
The values returned are suited for this model. In particular:
For a model with noise_model=‘ordinal’, returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]
For a model with noise_model=‘ordinal_ranking’, returns survival function values [P(X > l), l=0..ordinal_max_level-1]
If None, the raw values are returned, whatever the model.
- Returns:
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models]): contains floats or NaNs
- Parameters:
i (int)
- Return type:
torch.FloatTensor
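With adapt_for_model=None the raw values come back with NaN for missing entries. Given padded values and the binary mask described for Dataset, this amounts to slicing off the padding and putting NaN back wherever the mask is 0. A NumPy sketch; raw_values_patient is a hypothetical name and the approach is an assumption, not leaspy's code.

```python
import numpy as np


def raw_values_patient(i, values, mask, n_visits_per_individual):
    """Hypothetical: unpadded values of patient position `i`, NaN where mask == 0."""
    n = n_visits_per_individual[i]
    return np.where(mask[i, :n] == 1, values[i, :n], np.nan)


values = np.array([[[1.0, 2.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]]])
mask = np.array([[[1, 1], [0, 1]], [[1, 1], [0, 0]]])
print(raw_values_patient(0, values, mask, [2, 1]))
```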
- to_pandas(apply_headers=False)
Convert dataset to a DataFrame with [‘ID’, ‘TIME’] index, with all covariates, events and repeated measures if apply_headers is False, and only the repeated measures otherwise.
- Parameters:
- apply_headers: bool
If True, keep only the columns needed for a Leaspy fit (the headers attribute).
- Returns:
pandas.DataFrame
- Parameters:
apply_headers (bool)
- Return type:
pandas.DataFrame
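to_pandas goes the other way, flattening the padded representation into one row per real visit indexed by ['ID', 'TIME']. A pandas sketch restricted to the repeated measures (roughly what apply_headers=True keeps); padded_to_frame is hypothetical, and restoring NaN through mask is an assumption.

```python
import numpy as np
import pandas as pd


def padded_to_frame(indices, headers, timepoints, values, mask, n_visits_per_individual):
    """Hypothetical: padded arrays -> long DataFrame indexed by (ID, TIME)."""
    frames = []
    for i, patient_id in enumerate(indices):
        n = n_visits_per_individual[i]  # drop padded visits
        vals = np.where(mask[i, :n] == 1, values[i, :n], np.nan)
        index = pd.MultiIndex.from_arrays(
            [[patient_id] * n, timepoints[i, :n]], names=["ID", "TIME"]
        )
        frames.append(pd.DataFrame(vals, index=index, columns=headers))
    return pd.concat(frames)


frame = padded_to_frame(
    indices=["sub-01", "sub-02"],
    headers=["Y1", "Y2"],
    timepoints=np.array([[70.0, 71.0], [65.0, 0.0]]),
    values=np.array([[[1.0, 2.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]]]),
    mask=np.array([[[1, 1], [0, 1]], [[1, 1], [0, 0]]]),
    n_visits_per_individual=[2, 1],
)
print(frame)
```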
- move_to_device(device)
Moves the dataset to the specified device.
- Parameters:
- device: torch.device
- Parameters:
device (device)
- Return type:
None
- get_one_hot_encoding(*, sf, ordinal_infos)
Builds the one-hot encoding of ordinal data once and for all and returns it.
- Parameters:
- sf: bool
Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]
- ordinal_infos: dict[str, Any]
All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)
- Returns:
- One-hot encoding of data values.
- Parameters:
sf (bool)
ordinal_infos (leaspy.utils.typing.KwargsType)
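The two encodings selected by sf can be written out directly: the pdf form marks the observed level, [1(X = l), l = 0..max_level], while the sf form marks every level strictly below it, [1(X > l), l = 0..max_level-1]. A NumPy sketch for a single feature with integer levels; ordinal_encodings is hypothetical and ignores missing values.

```python
import numpy as np


def ordinal_encodings(x, max_level):
    """Hypothetical: (pdf, sf) indicator arrays for integer levels x in [0, max_level]."""
    x = np.asarray(x, dtype=int)[..., None]  # trailing axis over levels
    pdf = (x == np.arange(max_level + 1)).astype(int)  # 1(X = l), l = 0..max_level
    sf = (x > np.arange(max_level)).astype(int)  # 1(X > l), l = 0..max_level-1
    return pdf, sf


pdf, sf = ordinal_encodings([0, 2, 3], max_level=3)
print(pdf)
print(sf)
```

The sf encoding has one column fewer than the pdf encoding, consistent with the "[-1 when sf=True]" note on the _one_hot_encoding attribute, and each sf column is the sum of the pdf columns above it.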
- class EventDataframeDataReader(*, event_time_name='EVENT_TIME', event_bool_name='EVENT_BOOL', nb_events=None)
Bases: leaspy.io.data.abstract_dataframe_data_reader.AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data only.
- Parameters:
- event_time_name: str
Name of the column in the dataframe that contains the event time
- event_bool_name: str
Name of the column in the dataframe that indicates whether the event is censored or not
- Raises:
- event_time_name = 'EVENT_TIME'
- event_bool_name = 'EVENT_BOOL'
- nb_events = None
- DataframeDataReaderFactoryInput
- class DataframeDataReaderNames(*args, **kwds)
Bases: enum.Enum
Enumeration defining the possible names for dataframe data readers.
- EVENT = 'event'
- VISIT = 'visit'
- JOINT = 'joint'
- dataframe_data_reader_factory(reader, **kwargs)
Factory for dataframe data readers.
- Parameters:
- reader
DataframeDataReaderFactoryInput If an instance of a subclass of AbstractDataframeDataReader, returns the instance. If a string, returns a new instance of the appropriate reader class (with optional parameters kwargs).
- **kwargs
Optional parameters for initializing the requested reader when reader is a string.
- Returns:
AbstractDataframeDataReader The desired dataframe data reader.
- Raises:
LeaspyModelInputError If reader is not supported.
- Parameters:
reader (DataframeDataReaderFactoryInput)
- Return type:
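The dispatch performed by dataframe_data_reader_factory can be pictured as: pass reader instances through unchanged, and map a name ('visit', 'event', 'joint') through the names enumeration to the matching reader class, forwarding **kwargs. The sketch below uses hypothetical stand-in classes (ReaderNames, VisitReader, and so on) and a plain ValueError; it shows the factory pattern, not leaspy's source.

```python
from enum import Enum


class ReaderNames(str, Enum):
    """Hypothetical stand-in for DataframeDataReaderNames."""

    EVENT = "event"
    VISIT = "visit"
    JOINT = "joint"


class BaseReader:
    """Hypothetical stand-in for AbstractDataframeDataReader."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class VisitReader(BaseReader):
    pass


class EventReader(BaseReader):
    pass


class JointReader(BaseReader):
    pass


READERS = {
    ReaderNames.VISIT: VisitReader,
    ReaderNames.EVENT: EventReader,
    ReaderNames.JOINT: JointReader,
}


def reader_factory(reader, **kwargs):
    """Hypothetical sketch of the factory pattern described above."""
    if isinstance(reader, BaseReader):
        return reader  # already a reader instance: returned unchanged
    try:
        name = ReaderNames(reader)  # accepts "visit" or ReaderNames.VISIT
    except ValueError:
        raise ValueError(f"Unsupported reader: {reader!r}") from None
    return READERS[name](**kwargs)


print(type(reader_factory("event", event_time_name="EVENT_TIME")).__name__)
```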
- class IndividualData(idx)
Container for an individual’s data
- Parameters:
- idx: IDType
Unique ID
- Attributes:
- idx: IDType
Unique ID
- timepoints: np.ndarray[float, 1D]
Timepoints associated with the observations
- observations: np.ndarray[float, 2D]
Observed data points. Shape is (n_timepoints, n_features)
- cofactors: Dict[FeatureType, Any]
Cofactors in the form {cofactor_name: cofactor_value}
- event_time: float
Time of the event; if the event is censored, this is the time of the patient’s last observation
- event_bool: bool
Whether the event was observed (1) or censored (0)
- Parameters:
idx (leaspy.utils.typing.IDType)
- idx: leaspy.utils.typing.IDType
- add_observations(timepoints, observations)
Include new observations and associated timepoints
- add_event(event_time, event_bool)
Include event time and associated censoring bool
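A minimal stand-in shows how the documented attributes fit together: timepoints is 1D, observations is (n_timepoints, n_features), and add_event stores the event time with its observed/censored flag. IndividualDataSketch is hypothetical; in particular, appending new visits after existing ones (without re-sorting) is our assumption, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

import numpy as np


@dataclass
class IndividualDataSketch:
    """Hypothetical stand-in mirroring IndividualData's documented attributes."""

    idx: str
    timepoints: Optional[np.ndarray] = None  # shape (n_timepoints,)
    observations: Optional[np.ndarray] = None  # shape (n_timepoints, n_features)
    cofactors: Dict[str, Any] = field(default_factory=dict)
    event_time: Optional[float] = None
    event_bool: Optional[bool] = None

    def add_observations(self, timepoints, observations) -> None:
        t = np.asarray(timepoints, dtype=float)
        obs = np.asarray(observations, dtype=float)
        if t.ndim != 1 or obs.ndim != 2 or obs.shape[0] != t.shape[0]:
            raise ValueError("observations must have shape (n_timepoints, n_features)")
        if self.timepoints is None:
            self.timepoints, self.observations = t, obs
        else:
            # Assumption: new visits are appended after the existing ones.
            self.timepoints = np.concatenate([self.timepoints, t])
            self.observations = np.concatenate([self.observations, obs], axis=0)

    def add_event(self, event_time: float, event_bool: bool) -> None:
        # event_bool: True if the event was observed, False if censored.
        self.event_time = float(event_time)
        self.event_bool = bool(event_bool)


ind = IndividualDataSketch("sub-01")
ind.add_observations([70.0, 71.0], [[0.1, 0.2], [0.3, 0.4]])
ind.add_observations([72.5], [[0.5, 0.6]])
ind.add_event(72.5, False)  # censored: event time is the last observation
```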
- class JointDataframeDataReader(*, event_time_name='EVENT_TIME', event_bool_name='EVENT_BOOL', nb_events=None)
Bases: leaspy.io.data.abstract_dataframe_data_reader.AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data and longitudinal data.
- Parameters:
- event_time_name: str
Name of the column in the dataframe that contains the event time
- event_bool_name: str
Name of the column in the dataframe that indicates whether the event is censored or not
- Raises:
- tol_diff = 0.001
- visit_reader
- event_reader
- property dimension: int | None
Number of longitudinal outcomes in dataset.
- Return type:
Optional[int]
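Joint data carries both kinds of information in one dataframe: repeated measures per visit, plus event columns that repeat on every row of an individual. A pandas sketch of separating them, in the spirit of the visit_reader / event_reader pair above; split_joint is hypothetical, and requiring the event columns to be constant within each ID is our assumption about well-formed joint input.

```python
import pandas as pd


def split_joint(df, event_time_name="EVENT_TIME", event_bool_name="EVENT_BOOL"):
    """Hypothetical: split a (ID, TIME)-indexed joint frame into visit and event parts."""
    event_cols = [event_time_name, event_bool_name]
    per_id = df[event_cols].groupby(level="ID")
    # Assumption: event columns must not vary across the visits of one individual.
    if (per_id.nunique() > 1).any().any():
        raise ValueError("event columns must be constant within each ID")
    events = per_id.first()  # one row per ID
    visits = df.drop(columns=event_cols)  # longitudinal part only
    return visits, events


joint = pd.DataFrame(
    {
        "ID": ["sub-01", "sub-01", "sub-02"],
        "TIME": [70.0, 71.0, 65.0],
        "Y1": [0.1, 0.2, 0.3],
        "EVENT_TIME": [74.0, 74.0, 65.0],
        "EVENT_BOOL": [1, 1, 0],
    }
).set_index(["ID", "TIME"])

visits, events = split_joint(joint)
print(events)
```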
- class VisitDataframeDataReader
Bases: leaspy.io.data.abstract_dataframe_data_reader.AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers for longitudinal data only.
- Raises:
LeaspyDataInputError