summarease.summarize_target
===========================

.. py:module:: summarease.summarize_target


Functions
---------

.. autoapisummary::

   summarease.summarize_target.summarize_target_df
   summarease.summarize_target.summarize_target_balance_plot


Module Contents
---------------

.. py:function:: summarize_target_df(dataset_name: pandas.DataFrame, target_variable: str, target_type: str, threshold=0.2)

   Summarize and evaluate the target variable for categorical or numerical types.

   :param dataset_name: The input dataset containing the target variable.
   :type dataset_name: DataFrame
   :param target_variable: The name of the target column.
   :type target_variable: str
   :param target_type: The type of the target variable.
   :type target_type: str, within {"categorical", "numerical"}
   :param threshold: Only used when target_type="categorical", to identify class imbalance. Default is 0.2.
   :type threshold: float, optional

   :returns: If target_type="categorical", a summary DataFrame containing classes, proportions, imbalance flag, and threshold. If target_type="numerical", a DataFrame with the basic statistical summary.
   :rtype: DataFrame

   .. rubric:: Notes

   - For the categorical type, the function does not distinguish between binary and multi-class classification.
   - **Balance criteria**: assuming n classes, the proportion of each class should lie within [(1-threshold)/n, (1+threshold)/n].
   - **threshold** (*float, optional*): only used if ``target_type="categorical"``. It identifies class imbalance, and the user decides how much deviation counts as imbalance. Typically, a target class is considered balanced if its proportion varies within 20% of the average. Users can choose a narrower balance range, such as 10%.

   .. rubric:: Examples

   >>> summarize_target_df(
   ...     data,
   ...     target_variable='target',
   ...     target_type='categorical',
   ...     threshold=0.2
   ... )


.. py:function:: summarize_target_balance_plot(summary_df: pandas.DataFrame)

   Visualize the balance condition of a categorical target.

   :param summary_df: The input DataFrame, expected to match the output of summarize_target_df() with target_type="categorical". It must contain the columns ['class', 'proportion', 'imbalanced', 'threshold'].
   :type summary_df: DataFrame

   :returns: The Altair chart visualizing the balance of the categorical target variable.
   :rtype: alt.Chart

   .. rubric:: Notes

   The chart includes the following:

   - A bar plot for actual class proportions.
   - Expected proportion range (lower and upper bounds) as balance range.
   - Imbalance status for each class indicated by color.
   - Highlighted ticks for expected lower and upper bounds.
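
The balance criterion documented for ``summarize_target_df`` can be illustrated with a small standalone sketch. This is not the library's implementation: ``balance_summary`` is a hypothetical helper written in plain Python, and it assumes the bounds [(1-threshold)/n, (1+threshold)/n] are inclusive. It returns one record per class, using the same field names as the columns that ``summarize_target_balance_plot`` expects.

```python
from collections import Counter


def balance_summary(labels, threshold=0.2):
    """Hypothetical helper (not part of summarease): flag imbalanced classes.

    A class is treated as balanced when its proportion lies within
    [(1 - threshold) / n, (1 + threshold) / n] for n distinct classes.
    Inclusive bounds are an assumption of this sketch.
    """
    if not labels:
        return []

    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)

    # Expected balance range around the average proportion 1 / n.
    lower = (1 - threshold) / n_classes
    upper = (1 + threshold) / n_classes

    rows = []
    for cls, count in sorted(counts.items()):
        proportion = count / total
        rows.append(
            {
                "class": cls,
                "proportion": proportion,
                "imbalanced": not (lower <= proportion <= upper),
                "threshold": threshold,
            }
        )
    return rows


# Two classes at 45% / 55%: balanced at threshold=0.2 (range 0.4 to 0.6),
# but flagged at threshold=0.05 (range 0.475 to 0.525).
labels = ["a"] * 45 + ["b"] * 55
for row in balance_summary(labels, threshold=0.2):
    print(row)
```

Narrowing ``threshold`` shrinks the accepted range around 1/n, so the same data can move from balanced to imbalanced without any change in the class proportions.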