summarease.summarize_target
Functions
| summarize_target_df | Summarize and evaluate the target variable for categorical or numerical types. |
| summarize_target_balance_plot | Visualize the balance condition of a categorical target. |
Module Contents
- summarease.summarize_target.summarize_target_df(dataset_name: pandas.DataFrame, target_variable: str, target_type: str, threshold=0.2)
Summarize and evaluate the target variable for categorical or numerical types.
- Parameters:
dataset_name (DataFrame) – The input dataset containing the target variable.
target_variable (str) – The name of the target column.
target_type (str, one of {"categorical", "numerical"}) – The type of the target variable.
threshold (float, optional) – Used only when target_type="categorical", to identify class imbalance. Default is 0.2.
- Returns:
DataFrame –
- If target_type="categorical", returns a summary DataFrame containing classes, proportions, imbalance flag, and threshold.
- If target_type="numerical", returns a DataFrame with the basic statistical summary.
Notes
For the categorical type, the function does not distinguish between binary and multi-class classification.
Balance criterion: assuming n classes, a class is considered balanced when its proportion lies within [(1-threshold)/n, (1+threshold)/n].
The threshold applies only when target_type="categorical", and the user chooses its value. Typically, a class is considered balanced if its proportion is within 20% of the average proportion 1/n. A narrower balance range, such as 10%, can be chosen instead.
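The balance criterion in the notes can be worked through by hand. The sketch below is not the library's implementation: `balance_range` and `flag_imbalance` are hypothetical helper names, and treating the bounds as inclusive is an assumption, since the notes do not say how boundary values are handled.

```python
from collections import Counter


def balance_range(n_classes, threshold=0.2):
    """Return the (lower, upper) proportion bounds from the documented formula."""
    return (1 - threshold) / n_classes, (1 + threshold) / n_classes


def flag_imbalance(labels, threshold=0.2):
    """Map each class to (proportion, imbalanced).

    Assumption: a proportion exactly on a bound counts as balanced.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    lower, upper = balance_range(len(counts), threshold)
    return {
        cls: (count / total, not (lower <= count / total <= upper))
        for cls, count in counts.items()
    }


# Made-up labels: three classes with proportions 0.45, 0.35, and 0.20.
labels = ["a"] * 45 + ["b"] * 35 + ["c"] * 20
lower, upper = balance_range(3, threshold=0.2)
flags = flag_imbalance(labels, threshold=0.2)
```

With three classes and a threshold of 0.2, the range is roughly 0.267 to 0.4, so under these assumptions only class "b" falls inside it.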
Examples
>>> summarize_target_df(data, target_variable='target', target_type='categorical', threshold=0.2)
- summarease.summarize_target.summarize_target_balance_plot(summary_df: pandas.DataFrame)
Visualize the balance condition of a categorical target.
- Parameters:
summary_df (DataFrame) – The input DataFrame, expected to match the output of summarize_target_df() with target_type="categorical". It must contain the columns ['class', 'proportion', 'imbalanced', 'threshold'].
- Returns:
The Altair chart visualizing the balance of the categorical target variable.
- Return type:
alt.Chart
Notes
- The chart includes the following:
A bar plot for actual class proportions.
Expected proportion range (lower and upper bounds) as balance range.
Imbalance status for each class indicated by color.
Highlighted ticks for expected lower and upper bounds.