summarease.summarize_target

Functions

summarize_target_df(dataset_name, target_variable, ...)

Summarize and evaluate the target variable for categorical or numerical types.

summarize_target_balance_plot(summary_df)

Visualize the balance condition of a categorical target.

Module Contents

summarease.summarize_target.summarize_target_df(dataset_name: pandas.DataFrame, target_variable: str, target_type: str, threshold=0.2)[source]

Summarize and evaluate the target variable for categorical or numerical types.

Parameters:
  • dataset_name (DataFrame) – The input dataset containing the target variable.

  • target_variable (str) – The name of the target column.

  • target_type (str, one of {"categorical", "numerical"}) – The type of the target variable.

  • threshold (float, optional) – Used only when target_type="categorical", to identify class imbalance. Default is 0.2.

Returns:

  • DataFrame

    If target_type="categorical", returns a summary DataFrame containing classes, proportions, imbalance flag, and threshold.

    If target_type="numerical", returns a DataFrame with the basic statistical summary.

Notes

  • For a categorical target, the function does not distinguish between binary and multi-class classification.

  • Balance criterion: with n classes, each class proportion should fall within [(1-threshold)/n, (1+threshold)/n].

  • threshold (float, optional) – Used only if target_type="categorical", to identify class imbalance. The user chooses the threshold. With the default of 0.2, a class is considered balanced if its proportion is within 20% of the average proportion 1/n. A smaller value, such as 0.1, gives a narrower balance range.
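The balance criterion in the notes above can be worked through by hand. The following is a minimal sketch of that arithmetic only, not the library's implementation: the helper names balance_range and flag_imbalanced are hypothetical, and treating the bounds as inclusive is an assumption, since the notes do not say how values exactly on a bound are handled.

```python
def balance_range(n_classes, threshold=0.2):
    """Return the (lower, upper) bounds from the documented criterion.

    The bounds are [(1 - threshold) / n, (1 + threshold) / n].
    """
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    expected = 1 / n_classes  # average proportion of a perfectly balanced class
    return (1 - threshold) * expected, (1 + threshold) * expected


def flag_imbalanced(proportions, threshold=0.2):
    """Map each class to True when its proportion is outside the range.

    Assumption: the bounds are inclusive.
    """
    lower, upper = balance_range(len(proportions), threshold)
    return {cls: not (lower <= p <= upper) for cls, p in proportions.items()}


# Hypothetical class proportions for a three-class target.
proportions = {"a": 0.5, "b": 0.3, "c": 0.2}
lower, upper = balance_range(len(proportions))  # roughly 0.267 and 0.4
flags = flag_imbalanced(proportions)
```

With three classes and the default threshold, class "b" falls inside the range while "a" and "c" fall outside it. For a binary target the same formula gives a range of 0.4 to 0.6.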

Examples

>>> summarize_target_df(
...     data, target_variable='target', target_type='categorical', threshold=0.2
... )

summarease.summarize_target.summarize_target_balance_plot(summary_df: pandas.DataFrame)[source]

Visualize the balance condition of a categorical target.

Parameters:

summary_df (DataFrame) – The input DataFrame, expected to match the output of summarize_target_df() with target_type="categorical". It must contain the columns ['class', 'proportion', 'imbalanced', 'threshold'].

Returns:

The Altair chart visualizing the balance of the categorical target variable.

Return type:

alt.Chart

Notes

The chart includes the following:
  • A bar plot for actual class proportions.

  • The expected proportion range (lower and upper bounds), shown as the balance range.

  • Imbalance status for each class indicated by color.

  • Highlighted ticks for expected lower and upper bounds.
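
Because summary_df must contain the four columns listed under Parameters, it can help to check a DataFrame before plotting it. The sketch below is illustrative only: missing_columns is a hypothetical helper that is not part of summarease, and the hand-built rows, including their dtypes, are assumptions about what a categorical summary might look like.

```python
import pandas as pd

# Column names taken from the summary_df parameter description.
REQUIRED_COLUMNS = ["class", "proportion", "imbalanced", "threshold"]


def missing_columns(summary_df):
    """Return the documented columns that are absent from summary_df."""
    return [col for col in REQUIRED_COLUMNS if col not in summary_df.columns]


# Hypothetical hand-built input; values and dtypes are illustrative only.
summary_df = pd.DataFrame(
    {
        "class": ["a", "b", "c"],
        "proportion": [0.5, 0.3, 0.2],
        "imbalanced": [True, False, True],
        "threshold": [0.2, 0.2, 0.2],
    }
)

missing = missing_columns(summary_df)  # empty list: all columns present
missing_after_drop = missing_columns(summary_df.drop(columns=["threshold"]))
```

An empty result means the DataFrame has the expected columns. It does not confirm that the values themselves are what summarize_target_balance_plot expects.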