summarease.summarize_numeric

Functions

plot_numeric_density(dataset_numeric)

Generate density plots for each numeric column in the provided dataset. Each plot represents the distribution of values in a numeric column using a density estimate.

plot_correlation_heatmap(dataset_numeric)

Generate a correlation heatmap for the numeric columns in a dataset.

summarize_numeric(dataset[, summarize_by])

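Summarize the numeric variables in the dataset, either by computing summary statistics (e.g., mean, standard deviation, min, max) for each numeric column or by plotting a correlation heatmap of the numeric variables.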

Module Contents

summarease.summarize_numeric.plot_numeric_density(dataset_numeric: pandas.DataFrame)

Generate density plots for each numeric column in the provided dataset. Each plot represents the distribution of values in a numeric column using a density estimate.

Parameters:

dataset_numeric : pd.DataFrame

A pandas DataFrame containing numeric columns. The function will generate a density plot for each numeric column in the dataset.

Returns:

alt.Chart

An Altair chart object representing a vertical concatenation of density plots for each numeric column. The plots are grouped into rows with a maximum of 4 plots per row.

Example:

>>> plot_numeric_density(dataset_numeric=df)

summarease.summarize_numeric.plot_correlation_heatmap(dataset_numeric: pandas.DataFrame)

Generate a correlation heatmap for the numeric columns in a dataset.

Parameters:

dataset_numeric : pd.DataFrame

A pandas DataFrame containing the numeric columns whose pairwise correlations are shown in the heatmap.

Returns:

alt.Chart

The Altair chart visualizing the correlation heatmap.

Example:

>>> plot_correlation_heatmap(dataset_numeric=df)

summarease.summarize_numeric.summarize_numeric(dataset: pandas.DataFrame, summarize_by: str = 'table')

Summarize the numeric variables in the dataset, either by computing summary statistics (e.g., mean, standard deviation, min, max) for each numeric column or by plotting a correlation heatmap that visualizes the relationships between numeric variables. The type of summary returned is determined by the summarize_by argument.

Parameters:

dataset : pd.DataFrame

The dataset to analyze.

summarize_by : str, default "table"

The format for summarizing the numeric variables. Options are "table" (default) or "plot". If "table", a summary table with statistics for each numeric column is returned. If "plot", a correlation heatmap showing the correlations between numeric variables is returned.

Returns:

A table of summary statistics or a plot (correlation heatmap), depending on the summarize_by argument.

Notes:

  • The correlation heatmap is only applicable if there are two or more numeric columns in the dataset.

  • The summary statistics for numeric columns are computed with df.describe(), so the table reports the count, mean, standard deviation, min, 25th/50th/75th percentiles, and max of each numeric column.
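Both notes can be checked with plain pandas. The toy data below is invented for illustration, and selecting the numeric columns with `select_dtypes` is an assumption about how summarize_numeric chooses them.

```python
import pandas as pd

# Toy data invented for illustration; "city" is non-numeric and is dropped below.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
    "city": ["Oslo", "Lima", "Pune", "Kobe"],
})

numeric = df.select_dtypes(include="number")

# describe() returns count, mean, std, min, 25%, 50%, 75% and max for each column.
summary = numeric.describe()
print(summary)

# Heatmap note: a correlation matrix needs at least two numeric columns to be useful.
can_plot_heatmap = numeric.shape[1] >= 2
print(can_plot_heatmap)  # True
```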

Example:

>>> summarize_numeric(dataset=df, summarize_by="table")