leaspy.io.data.data#

Classes#

Data

Main data container for a collection of individuals

Module Contents#

class Data#

Bases: collections.abc.Iterable

Main data container for a collection of individuals

It can be iterated over and sliced, both of these operations being applied to the underlying individuals attribute.

Attributes:
individuals : Dict[IDType, IndividualData]

Included individuals and their associated data

iter_to_idx : Dict[int, IDType]

Maps an integer index to the associated individual ID

headers : List[FeatureType]

Feature names

dimension : int

Number of features

n_individuals : int

Number of individuals

n_visits : int

Total number of visits

cofactors : List[FeatureType]

Feature names corresponding to cofactors

event_time_name : str

Name of the header that stores the event time in the original dataframe

event_bool_name : str

Name of the header that stores the event indicator (censored or observed) in the original dataframe
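
The relationship between iteration, slicing, individuals and iter_to_idx can be illustrated without leaspy itself. The sketch below is a hypothetical stand-in, not leaspy's implementation: MiniData is an invented name, plain dicts replace IndividualData, and returning a new container from a slice is an assumption.

```python
from typing import Dict, Iterator, Union


class MiniData:
    """Hypothetical stand-in for Data; not leaspy's implementation."""

    def __init__(self, individuals: Dict[str, dict]) -> None:
        # ID -> per-individual record (a plain dict replaces IndividualData)
        self.individuals = dict(individuals)
        # Position -> ID, following insertion order
        self.iter_to_idx = dict(enumerate(self.individuals))

    def __iter__(self) -> Iterator[dict]:
        # Iterating yields the individuals themselves, not their IDs
        return iter(self.individuals.values())

    def __len__(self) -> int:
        return len(self.individuals)

    def __getitem__(self, key: Union[int, slice]) -> Union[dict, "MiniData"]:
        if isinstance(key, int):
            # Integer access goes through the position -> ID map
            return self.individuals[self.iter_to_idx[key]]
        if isinstance(key, slice):
            # Assumption: a slice produces a new container with re-numbered positions
            ids = [self.iter_to_idx[pos] for pos in range(len(self))[key]]
            return MiniData({id_: self.individuals[id_] for id_ in ids})
        raise TypeError(f"Unsupported key type: {type(key).__name__}")


data = MiniData({
    "A": {"timepoints": [70.0, 71.5]},
    "B": {"timepoints": [65.2]},
    "C": {"timepoints": [80.1, 81.0, 82.3]},
})
first = data[0]                                          # record of individual "A"
subset = data[1:]                                        # individuals "B" and "C"
visit_counts = [len(ind["timepoints"]) for ind in data]  # one count per individual
```

Both operations only ever touch the individuals mapping, which is why the order of IDs in iter_to_idx determines what an integer index or a slice refers to.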

individuals: dict[IDType, IndividualData]#
iter_to_idx: dict[int, IDType]#
headers: list[FeatureType] | None = None#
event_time_name: str | None = None#
event_bool_name: str | None = None#
covariate_names: list[str] | None = None#
property dimension: int | None#

Number of features

Returns:
int or None:

Number of features in the dataset. If no features are present, returns None.

Return type:

Optional[int]

property n_individuals: int#

Number of individuals

Returns:
int:

Number of individuals in the dataset.

Return type:

int

property n_visits: int#

Total number of visits

Returns:
int:

Total number of visits in the dataset.

Return type:

int

property cofactors: list[FeatureType]#

Feature names corresponding to cofactors

Returns:
List [FeatureType]:

List of feature names corresponding to cofactors.

Return type:

list[FeatureType]
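
As a rough illustration of what the counting properties above report, the following sketch derives them from plain Python containers. The variable names and sample values are made up, and treating "no features" as headers being None is an assumption about how dimension behaves.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in data: ID -> list of visit timepoints
timepoints_by_id: Dict[str, List[float]] = {
    "A": [70.0, 71.5],
    "B": [65.2],
}
headers: Optional[List[str]] = ["memory", "attention"]


def dimension(headers: Optional[List[str]]) -> Optional[int]:
    # Assumption: "no features" corresponds to headers being None
    return None if headers is None else len(headers)


# One entry per individual
n_individuals = len(timepoints_by_id)
# Visits are summed over all individuals
n_visits = sum(len(times) for times in timepoints_by_id.values())
```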

load_cofactors(df, *, cofactors=None)#

Load cofactors from a pandas.DataFrame to the Data object

Parameters:
df : pandas.DataFrame

The dataframe where the cofactors are stored. Its index should be ID (the subject identifier) and must be unique, i.e. one row per individual.

cofactors : List[FeatureType], optional

Names of the column(s) of the dataframe to load as cofactors. If None, all columns of the input dataframe are loaded as cofactors. Default: None

Return type:

None

static from_csv_file(path, data_type='visit', *, pd_read_csv_kws={}, facto_kws={}, **df_reader_kws)#

Create a Data object from a CSV file.

Parameters:
path : str

Path to the CSV file to load (with extension)

data_type : str

Type of data to read. Can be ‘visit’ or ‘event’.

pd_read_csv_kws : dict

Keyword arguments that are sent to pandas.read_csv()

facto_kws : dict

Keyword arguments that are sent to dataframe_data_reader_factory()

**df_reader_kws

Keyword arguments that are sent to the AbstractDataframeDataReader obtained from dataframe_data_reader_factory()

Returns:
Data:

A Data object containing the data from the CSV file.

Return type:

Data

to_dataframe(*, cofactors=None, reset_index=True)#

Convert the Data object to a pandas.DataFrame

Parameters:
cofactors : List[FeatureType] or “all”, optional

Cofactors to include in the DataFrame. If None (default), no cofactors are included. If “all”, all the available cofactors are included.

reset_index : bool, optional

Whether to reset index levels in output. Default: True

Returns:
pandas.DataFrame:

A DataFrame containing the individuals’ IDs, timepoints and associated observations and, optionally, cofactors.

Raises:
LeaspyDataInputError

If the Data object does not contain any cofactors.

LeaspyTypeError

If the cofactors argument is not of a valid type.

Return type:

DataFrame

static from_dataframe(df, data_type='visit', factory_kws={}, **kws)#

Create a Data object from a DataFrame.

Parameters:
df : pandas.DataFrame

Dataframe containing ID, TIME and features.

data_type : str

Type of data to read. Can be ‘visit’, ‘event’ or ‘joint’.

factory_kws : Dict

Keyword arguments that are sent to dataframe_data_reader_factory()

**kws

Keyword arguments that are sent to DataframeDataReader

Returns:
Data
Return type:

Data
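
The "ID, TIME and features" layout expected by from_dataframe() is a long format with one row per visit. The sketch below shows how such rows group into per-individual timepoints and observations. It is only an illustration: the feature names memory and attention and the sample values are made up, and the standard csv module stands in for pandas to keep the example dependency-free, so this is not how leaspy parses its input.

```python
import csv
import io
from collections import defaultdict

# Made-up example data in long format: one row per (ID, TIME) visit
raw = """ID,TIME,memory,attention
A,70.0,0.10,0.20
A,71.5,0.15,0.25
B,65.2,0.30,0.40
"""

reader = csv.DictReader(io.StringIO(raw))
# Every column other than ID and TIME is treated as a feature
headers = [col for col in reader.fieldnames if col not in ("ID", "TIME")]

timepoints = defaultdict(list)  # ID -> [time, ...]
values = defaultdict(list)      # ID -> [[feature values at one visit], ...]
for row in reader:
    timepoints[row["ID"]].append(float(row["TIME"]))
    values[row["ID"]].append([float(row[h]) for h in headers])

# IDs in order of first appearance
indices = list(timepoints)
```

After grouping, each individual has one list of timepoints and one row of feature values per timepoint.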

static from_individual_values(indices, timepoints=None, values=None, headers=None, event_time_name=None, event_bool_name=None, event_time=None, event_bool=None, covariate_names=None, covariates=None)#

Construct Data from a collection of individual data points

Parameters:
indices : List[IDType]

List of the individuals’ unique IDs

timepoints : List[List[float]]

For each individual i, list of timepoints associated with the observations. The number of such timepoints is denoted n_timepoints_i

values : List[array-like[float, 2D]]

For each individual i, two-dimensional array-like object containing observed data points. Its expected shape is (n_timepoints_i, n_features)

headers : List[FeatureType]

Feature names. The number of features is denoted n_features

Returns:
Data:

A Data object containing the individuals and their data.

Return type:

Data
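
The shape constraints on the arguments of from_individual_values() can be made concrete with a small checker. This is a minimal sketch under stated assumptions: check_individual_values is a hypothetical helper, not part of leaspy, and it works on nested lists rather than arrays. It only restates the documented rule that values[i] has shape (n_timepoints_i, n_features).

```python
from typing import Sequence


def check_individual_values(
    indices: Sequence[str],
    timepoints: Sequence[Sequence[float]],
    values: Sequence[Sequence[Sequence[float]]],
    headers: Sequence[str],
) -> None:
    """Hypothetical helper (not part of leaspy): verify the documented shapes."""
    if not (len(indices) == len(timepoints) == len(values)):
        raise ValueError("indices, timepoints and values need one entry per individual")
    n_features = len(headers)
    for idx, times, obs in zip(indices, timepoints, values):
        # One row of observations per timepoint: n_timepoints_i rows
        if len(obs) != len(times):
            raise ValueError(f"{idx}: expected {len(times)} rows, got {len(obs)}")
        # Each row holds one value per feature: n_features columns
        if any(len(row) != n_features for row in obs):
            raise ValueError(f"{idx}: every row must have {n_features} values")


indices = ["A", "B"]
timepoints = [[70.0, 71.5], [65.2]]
values = [[[0.10, 0.20], [0.15, 0.25]], [[0.30, 0.40]]]
headers = ["memory", "attention"]
check_individual_values(indices, timepoints, values, headers)  # passes silently
```

With leaspy installed, the same four lists match the first four arguments in the signature above, i.e. Data.from_individual_values(indices, timepoints, values, headers).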

static from_individuals(individuals, headers=None, event_time_name=None, event_bool_name=None, covariate_names=None)#

Construct Data from a list of individuals

Parameters:
individuals : List[IndividualData]

List of individuals

headers : List[FeatureType]

List of feature names

Returns:
Data:

A Data object containing the individuals and their data.

Return type:

Data

extract_longitudinal_only()#

Extract longitudinal data from the Data object

Returns:
Data:

A Data object containing only longitudinal data.

Raises:
LeaspyDataInputError

If the Data object does not contain any longitudinal data.

Return type:

Data