leaspy.io.data.data#
Classes#
Data | Main data container for a collection of individuals
Module Contents#
- class Data#
Bases: collections.abc.Iterable
Main data container for a collection of individuals.
It can be iterated over and sliced, both of these operations being applied to the underlying individuals attribute.
- Attributes:
- individuals
Dict[IDType,IndividualData] Included individuals and their associated data
- iter_to_idx
Dict[int,IDType] Maps an integer index to the associated individual ID
- headers
List[FeatureType] Feature names
- dimension
int Number of features
- n_individuals
int Number of individuals
- n_visits
int Total number of visits
- cofactors
List[FeatureType] Feature names corresponding to cofactors
- event_time_name
str Name of the header that stores the time at event in the original dataframe
- event_bool_name
str Name of the header that stores the boolean event indicator (censored or observed) in the original dataframe
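The iteration and slicing behaviour described above can be pictured with a small stand-in container. This is only a sketch: `MiniData` is a hypothetical class written for illustration, not leaspy's implementation, and the assumption that a slice returns a new container (rather than a list) is mine.

```python
from collections.abc import Iterable


class MiniData(Iterable):
    """Hypothetical stand-in for Data (illustration only, not leaspy code)."""

    def __init__(self, individuals):
        # individuals: dict mapping an individual ID to that individual's data
        self.individuals = dict(individuals)
        # iter_to_idx: integer position -> individual ID, in insertion order
        self.iter_to_idx = dict(enumerate(self.individuals))

    @property
    def n_individuals(self):
        return len(self.individuals)

    def __iter__(self):
        # Iterating yields the per-individual data, not the IDs
        return iter(self.individuals.values())

    def __getitem__(self, key):
        if isinstance(key, int):
            # Integer access goes through iter_to_idx to find the ID
            return self.individuals[self.iter_to_idx[key]]
        if isinstance(key, slice):
            # Assumption: slicing returns a new container over the selected IDs
            ids = [self.iter_to_idx[i] for i in range(self.n_individuals)[key]]
            return MiniData({i: self.individuals[i] for i in ids})
        raise TypeError(f"Unsupported index type: {type(key).__name__}")


data = MiniData({"subj-1": [0.1, 0.2], "subj-2": [0.3], "subj-3": [0.5, 0.6]})
first_two = data[:2]
```

Both operations act on the `individuals` mapping, with `iter_to_idx` translating integer positions into IDs.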
- individuals: dict[IDType, IndividualData]#
- headers: list[FeatureType] | None = None#
- property cofactors: list[FeatureType]#
Feature names corresponding to cofactors
- Returns:
List[FeatureType]: List of feature names corresponding to cofactors.
- Return type:
list[FeatureType]
- load_cofactors(df, *, cofactors=None)#
Load cofactors from a pandas.DataFrame to the Data object
- Parameters:
- df
pandas.DataFrame The dataframe where the cofactors are stored. Its index should be ID (the subject identifier) and must be unique, i.e. one row per individual.
- cofactors
List[FeatureType], optional Names of the columns of the dataframe to load as cofactors. If None, all the columns of the input dataframe are loaded as cofactors. Default: None
- Parameters:
df (DataFrame)
cofactors (Optional[list[FeatureType]])
- Return type:
None
- static from_csv_file(path, data_type='visit', *, pd_read_csv_kws={}, facto_kws={}, **df_reader_kws)#
Create a Data object from a CSV file.
- Parameters:
- path
str Path to the CSV file to load (with extension)
- data_type
str Type of data to read. Can be ‘visit’ or ‘event’.
- pd_read_csv_kws
dict Keyword arguments that are sent to pandas.read_csv()
- facto_kws
dict Keyword arguments
- **df_reader_kws
Keyword arguments that are sent to AbstractDataframeDataReader to dataframe_data_reader_factory()
- Returns:
Data: A Data object containing the data from the CSV file.
- Return type:
Data
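A minimal sketch of preparing a CSV file and loading it with from_csv_file. The column layout (ID, TIME, then one column per feature) is assumed to be the same as the one documented for from_dataframe; the feature names `memory` and `attention` and the file name are made up for illustration. The leaspy call is guarded so that the snippet still runs when leaspy is not installed.

```python
import csv
import os
import tempfile

# Long format: one row per visit (hypothetical feature names)
rows = [
    {"ID": "subj-1", "TIME": 70.0, "memory": 0.10, "attention": 0.20},
    {"ID": "subj-1", "TIME": 71.5, "memory": 0.15, "attention": 0.25},
    {"ID": "subj-2", "TIME": 65.0, "memory": 0.05, "attention": 0.10},
    {"ID": "subj-2", "TIME": 66.0, "memory": 0.10, "attention": 0.20},
]

csv_path = os.path.join(tempfile.mkdtemp(), "visits.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ID", "TIME", "memory", "attention"])
    writer.writeheader()
    writer.writerows(rows)

try:
    # Import path taken from the module name at the top of this page
    from leaspy.io.data.data import Data
except ImportError:
    Data = None  # leaspy is not installed: only the CSV file is written

if Data is not None:
    data = Data.from_csv_file(csv_path, data_type="visit")
    print(data.n_individuals, data.headers)
```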
- to_dataframe(*, cofactors=None, reset_index=True)#
Convert the Data object to a pandas.DataFrame
- Parameters:
- cofactors
List[FeatureType] or str, optional Cofactors to include in the DataFrame. If None (default), no cofactors are included. If “all”, all the available cofactors are included. Default: None
- reset_index
bool, optional Whether to reset index levels in output. Default: True
- Returns:
pandas.DataFrame: A DataFrame containing the individuals’ IDs, timepoints and associated observations (and, optionally, cofactors).
- Raises:
LeaspyDataInputError If the Data object does not contain any cofactors.
LeaspyTypeError If the cofactors argument is not of a valid type.
- Parameters:
cofactors (Optional[Union[list[FeatureType], str]])
reset_index (bool)
- Return type:
pandas.DataFrame
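The documented handling of the `cofactors` argument can be sketched as a small resolution function. `resolve_cofactors` is a hypothetical helper, not part of leaspy; the order of the checks is an assumption, and the built-in `TypeError` and `ValueError` stand in for LeaspyTypeError and LeaspyDataInputError.

```python
def resolve_cofactors(requested, available):
    """Hypothetical helper mirroring the documented `cofactors` semantics."""
    if requested is None:
        # Default: no cofactors are included
        return []

    is_name_list = isinstance(requested, list) and all(
        isinstance(name, str) for name in requested
    )
    if requested != "all" and not is_name_list:
        # Stand-in for LeaspyTypeError
        raise TypeError("cofactors must be None, 'all' or a list of names")

    if not available:
        # Stand-in for LeaspyDataInputError
        raise ValueError("no cofactors are available")

    # "all" selects every available cofactor; a list is used as given
    return list(available) if requested == "all" else requested


print(resolve_cofactors("all", ["sex", "apoe"]))
```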
- static from_dataframe(df, data_type='visit', factory_kws={}, **kws)#
Create a Data object from a DataFrame.
- Parameters:
- df
pandas.DataFrame Dataframe containing ID, TIME and features.
- data_type
str Type of data to read. Can be ‘visit’, ‘event’ or ‘joint’.
- factory_kws
Dict Keyword arguments that are sent to dataframe_data_reader_factory()
- **kws
Keyword arguments that are sent to DataframeDataReader
- Returns:
Data
- Return type:
Data
- static from_individual_values(indices, timepoints=None, values=None, headers=None, event_time_name=None, event_bool_name=None, event_time=None, event_bool=None, covariate_names=None, covariates=None)#
Construct Data from a collection of individual data points
- Parameters:
- indices
List[IDType] List of the individuals’ unique ID
- timepoints
List[List[float]] For each individual i, list of timepoints associated with the observations. The number of such timepoints is denoted n_timepoints_i
- values
List[array-like[float,2D]] For each individual i, two-dimensional array-like object containing observed data points. Its expected shape is (n_timepoints_i, n_features)
- headers
List[FeatureType] Feature names. The number of features is denoted n_features
- Returns:
Data: A Data object containing the individuals and their data.
- Parameters:
headers (Optional[list[FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
- Return type:
Data
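The shape constraints on `indices`, `timepoints`, `values` and `headers` can be made concrete with a short example. The subject IDs and feature names are made up, `check_shapes` is a hypothetical helper (not part of leaspy), and nested lists are assumed to be acceptable as the array-like `values`. The leaspy call is guarded so that the snippet still runs when leaspy is not installed.

```python
indices = ["subj-1", "subj-2"]
timepoints = [[70.0, 71.5, 73.0], [65.0, 66.0]]
headers = ["memory", "attention"]  # n_features = 2
values = [
    [[0.10, 0.20], [0.15, 0.25], [0.30, 0.40]],  # shape (3, 2) for subj-1
    [[0.05, 0.10], [0.10, 0.20]],                # shape (2, 2) for subj-2
]


def check_shapes(indices, timepoints, values, headers):
    """Hypothetical helper: verify the documented shape constraints."""
    if not (len(indices) == len(timepoints) == len(values)):
        raise ValueError("indices, timepoints and values must have equal length")
    n_features = len(headers)
    for idx, t_i, v_i in zip(indices, timepoints, values):
        if len(v_i) != len(t_i):
            raise ValueError(f"{idx}: expected one row of values per timepoint")
        if any(len(row) != n_features for row in v_i):
            raise ValueError(f"{idx}: expected {n_features} values per row")
    return True


shapes_ok = check_shapes(indices, timepoints, values, headers)

try:
    # Import path taken from the module name at the top of this page
    from leaspy.io.data.data import Data
except ImportError:
    Data = None  # leaspy is not installed: only the shape check runs

if Data is not None:
    data = Data.from_individual_values(
        indices, timepoints=timepoints, values=values, headers=headers
    )
    print(data.n_individuals, data.n_visits)
```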
- static from_individuals(individuals, headers=None, event_time_name=None, event_bool_name=None, covariate_names=None)#
Construct Data from a list of individuals
- Parameters:
- individuals
List[IndividualData] List of individuals
- headers
List[FeatureType] List of feature names
- Returns:
Data: A Data object containing the individuals and their data.
- Parameters:
individuals (list[IndividualData])
headers (Optional[list[FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
- Return type:
Data
- extract_longitudinal_only()#
Extract longitudinal data from the Data object
- Returns:
Data: A Data object containing only longitudinal data.
- Raises:
LeaspyDataInputError If the Data object does not contain any longitudinal data.
- Return type:
Data