leaspy.io.data.data#

Classes#

Data

Main data container for a collection of individuals

Module Contents#

class Data#

Bases: collections.abc.Iterable

Main data container for a collection of individuals

It can be iterated over and sliced, both of these operations being applied to the underlying individuals attribute.

Attributes:

individuals (Dict[IDType, IndividualData]): Included individuals and their associated data
iter_to_idx (Dict[int, IDType]): Maps an integer index to the associated individual ID
headers (List[FeatureType]): Feature names
dimension (int): Number of features
n_individuals (int): Number of individuals
n_visits (int): Total number of visits
cofactors (List[FeatureType]): Feature names corresponding to cofactors
event_time_name (str): Name of the header that stores the time at event in the original dataframe
event_bool_name (str): Name of the header that stores the boolean event indicator (censored or observed) in the original dataframe

individuals: dict[IDType, IndividualData]#

iter_to_idx: dict[int, IDType]#

headers: list[FeatureType] | None = None#

event_time_name: str | None = None#

event_bool_name: str | None = None#

covariate_names: list[str] | None = None#

property dimension: int | None#

Number of features

Returns:

int or None: Number of features in the dataset. If no features are present, returns None.

Return type:

Optional[int]

property n_individuals: int#

Number of individuals

Returns:

int: Number of individuals in the dataset.

Return type:

int

property n_visits: int#

Total number of visits

Returns:

int: Total number of visits in the dataset.

Return type:

int

property cofactors: list[FeatureType]#

Feature names corresponding to cofactors

Returns:

List[FeatureType]: List of feature names corresponding to cofactors.

Return type:

list[FeatureType]

load_cofactors(df, *, cofactors=None)#

Load cofactors from a pandas.DataFrame to the Data object

Parameters:

df (pandas.DataFrame): The dataframe where the cofactors are stored. Its index should be ID, the subject identifier, and it should uniquely index the dataframe (i.e. one row per individual).
cofactors (List[FeatureType], optional): Names of the column(s) of the dataframe to load as cofactors. If None, all the columns of the input dataframe are loaded as cofactors. Default: None

Parameters:

df (DataFrame)
cofactors (Optional[list[FeatureType]])

Return type:

None
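The index requirements above can be checked before calling load_cofactors. The following is a minimal sketch, not the library's own validation: the helper is_valid_cofactor_frame, the column names and the subject IDs are hypothetical, and the leaspy calls only run when the package is installed.

```python
import pandas as pd

# Hypothetical longitudinal data: one row per visit.
visits_df = pd.DataFrame(
    {
        "ID": ["subj-1", "subj-1", "subj-2", "subj-2"],
        "TIME": [70.0, 71.5, 65.2, 66.0],
        "feature_a": [0.10, 0.15, 0.05, 0.08],
    }
)

# Hypothetical cofactors: one row per individual, indexed by ID.
cofactors_df = pd.DataFrame(
    {"ID": ["subj-1", "subj-2"], "sex": ["F", "M"], "site": ["A", "B"]}
).set_index("ID")


def is_valid_cofactor_frame(df: pd.DataFrame) -> bool:
    """Illustrative pre-check: index named ID, one row per individual."""
    return df.index.name == "ID" and bool(df.index.is_unique)


try:
    from leaspy.io.data.data import Data
except ImportError:
    Data = None  # leaspy is not installed; only the pre-check above runs

if Data is not None and is_valid_cofactor_frame(cofactors_df):
    data = Data.from_dataframe(visits_df)
    # Load a single column; omit `cofactors` to load every column.
    data.load_cofactors(cofactors_df, cofactors=["sex"])
    print(data.cofactors)
```

Passing the check here does not guarantee that load_cofactors accepts the frame; it only mirrors the two conditions stated in the parameter description.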

static from_csv_file(path, data_type='visit', *, pd_read_csv_kws={}, facto_kws={}, **df_reader_kws)#

Create a Data object from a CSV file.

Parameters:

path (str): Path to the CSV file to load (with extension)
data_type (str): Type of data to read. Can be ‘visit’ or ‘event’.
pd_read_csv_kws (dict): Keyword arguments that are sent to pandas.read_csv()
facto_kws (dict): Keyword arguments that are sent to dataframe_data_reader_factory()
**df_reader_kws: Keyword arguments that are sent to the AbstractDataframeDataReader created by dataframe_data_reader_factory()

Returns:

Data: A Data object containing the data from the CSV file.

Parameters:

path (str)
data_type (str)
pd_read_csv_kws (dict)
facto_kws (dict)

Return type:

Data
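As an illustration of the call, the sketch below writes a small CSV file with the standard library and loads it. The file name, the feature name and the subject IDs are hypothetical, the ID and TIME columns are assumed to follow the layout described for from_dataframe, and the leaspy call only runs when the package is installed.

```python
import csv
import os
import tempfile

# Hypothetical visit-level records: ID, TIME, then one feature column.
rows = [
    {"ID": "subj-1", "TIME": 70.0, "feature_a": 0.10},
    {"ID": "subj-1", "TIME": 71.5, "feature_a": 0.15},
    {"ID": "subj-2", "TIME": 65.2, "feature_a": 0.05},
]

csv_path = os.path.join(tempfile.mkdtemp(), "visits.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["ID", "TIME", "feature_a"])
    writer.writeheader()
    writer.writerows(rows)

# Read the header back as a sanity check on the file layout.
with open(csv_path, newline="") as handle:
    header = next(csv.reader(handle))

try:
    from leaspy.io.data.data import Data
except ImportError:
    Data = None  # leaspy is not installed; only the file preparation runs

if Data is not None:
    # `sep` is a regular pandas.read_csv option, forwarded via pd_read_csv_kws.
    data = Data.from_csv_file(
        csv_path, data_type="visit", pd_read_csv_kws={"sep": ","}
    )
    print(data.n_individuals, data.n_visits, data.headers)
```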

to_dataframe(*, cofactors=None, reset_index=True)#

Convert the Data object to a pandas.DataFrame

Parameters:

cofactors (List[FeatureType] or str, optional): Cofactors to include in the DataFrame. If None, no cofactors are included. If “all”, all the available cofactors are included. Default: None
reset_index (bool, optional): Whether to reset index levels in output. Default: True

Returns:

pandas.DataFrame: A DataFrame containing the individuals’ IDs, timepoints and associated observations (and, optionally, cofactors).

Raises:

LeaspyDataInputError: If the Data object does not contain any cofactors.
LeaspyTypeError: If the cofactors argument is not of a valid type.

Parameters:

cofactors (Optional[Union[list[FeatureType], str]])
reset_index (bool)

Return type:

DataFrame

static from_dataframe(df, data_type='visit', factory_kws={}, **kws)#

Create a Data object from a DataFrame.

Parameters:

df (pandas.DataFrame): Dataframe containing ID, TIME and features.
data_type (str): Type of data to read. Can be ‘visit’, ‘event’ or ‘joint’.
factory_kws (Dict): Keyword arguments that are sent to dataframe_data_reader_factory()
**kws: Keyword arguments that are sent to DataframeDataReader

Returns:

Data

Parameters:

df (DataFrame)
data_type (str)
factory_kws (dict)

Return type:

Data
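A minimal sketch of building a Data object from a dataframe, slicing it and converting it back. The column values and feature names are hypothetical, and the leaspy calls only run when the package is installed.

```python
import pandas as pd

# Hypothetical input: one row per visit, with ID, TIME and feature columns.
df = pd.DataFrame(
    {
        "ID": ["subj-1", "subj-1", "subj-2", "subj-2", "subj-2"],
        "TIME": [70.0, 71.5, 65.2, 66.0, 67.1],
        "feature_a": [0.10, 0.15, 0.05, 0.08, 0.11],
        "feature_b": [0.20, 0.25, 0.10, 0.12, 0.18],
    }
)

# Everything except ID and TIME is treated as a feature in this sketch.
feature_columns = [c for c in df.columns if c not in ("ID", "TIME")]
visits_per_individual = df.groupby("ID")["TIME"].count().to_dict()

try:
    from leaspy.io.data.data import Data
except ImportError:
    Data = None  # leaspy is not installed; only the input preparation runs

if Data is not None:
    data = Data.from_dataframe(df)  # data_type defaults to 'visit'
    print(data.n_individuals, data.n_visits, data.headers)

    # Slicing applies to the underlying individuals.
    first_only = data[:1]
    print(first_only.n_individuals)

    # Convert back to a dataframe of IDs, timepoints and observations.
    round_trip = data.to_dataframe()
    print(round_trip.head())
```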

static from_individual_values(indices, timepoints=None, values=None, headers=None, event_time_name=None, event_bool_name=None, event_time=None, event_bool=None, covariate_names=None, covariates=None)#

Construct Data from a collection of individual data points

Parameters:

indices (List[IDType]): List of the individuals’ unique IDs
timepoints (List[List[float]]): For each individual i, list of timepoints associated with the observations. The number of such timepoints is noted n_timepoints_i
values (List[array-like[float, 2D]]): For each individual i, two-dimensional array-like object containing observed data points. Its expected shape is (n_timepoints_i, n_features)
headers (List[FeatureType]): Feature names. The number of features is noted n_features

Returns:

Data: A Data object containing the individuals and their data.

Parameters:

indices (list[IDType])
timepoints (Optional[list[list[float]]])
values (Optional[list[list[list[float]]]])
headers (Optional[list[FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
event_time (Optional[list[list[float]]])
event_bool (Optional[list[list[int]]])
covariate_names (Optional[list[str]])
covariates (Optional[list[list[int]]])

Return type:

Data
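The shape contract between indices, timepoints, values and headers can be sketched as follows. The helper validate_inputs is hypothetical and is not part of leaspy; the IDs, feature names and values are made up, and the leaspy call only runs when the package is installed.

```python
# Two hypothetical individuals observed on two features.
indices = ["subj-1", "subj-2"]
headers = ["feature_a", "feature_b"]
timepoints = [[70.0, 71.5, 73.0], [65.2, 66.0]]
values = [
    [[0.10, 0.20], [0.15, 0.25], [0.30, 0.40]],  # shape (3, 2)
    [[0.05, 0.10], [0.08, 0.12]],                # shape (2, 2)
]


def validate_inputs(indices, timepoints, values, headers):
    """Check the documented shapes; return (n_individuals, n_visits, n_features)."""
    if not (len(indices) == len(timepoints) == len(values)):
        raise ValueError("indices, timepoints and values must have equal length")
    n_features = len(headers)
    n_visits = 0
    for idx, times, rows in zip(indices, timepoints, values):
        if len(rows) != len(times):
            raise ValueError(f"{idx}: expected one row of values per timepoint")
        if any(len(row) != n_features for row in rows):
            raise ValueError(f"{idx}: expected {n_features} values per row")
        n_visits += len(times)
    return len(indices), n_visits, n_features


summary = validate_inputs(indices, timepoints, values, headers)

try:
    from leaspy.io.data.data import Data
except ImportError:
    Data = None  # leaspy is not installed; only the shape check runs

if Data is not None:
    data = Data.from_individual_values(
        indices=indices, timepoints=timepoints, values=values, headers=headers
    )
    print(data.n_individuals, data.n_visits, data.dimension)
```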

static from_individuals(individuals, headers=None, event_time_name=None, event_bool_name=None, covariate_names=None)#

Construct Data from a list of individuals

Parameters:

individuals (List[IndividualData]): List of individuals
headers (List[FeatureType]): List of feature names

Returns:

Data: A Data object containing the individuals and their data.

Parameters:

individuals (list[IndividualData])
headers (Optional[list[FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
covariate_names (Optional[list[str]])

Return type:

Data

extract_longitudinal_only()#

Extract longitudinal data from the Data object

Returns:

Data: A Data object containing only longitudinal data.

Raises:

LeaspyDataInputError: If the Data object does not contain any longitudinal data.

Return type:

Data