Summarize_numeric¶
This tutorial walks through the summarease.summarize_numeric module, which generates summary statistics or visualizations for the numeric variables in a dataset, helping you understand their distributions and the relationships between them.
Getting started¶
The summarease.summarize_numeric module offers the following core functionalities:
Summarizing Numeric Variables:
Outputs summary statistics for each numeric column (mean, standard deviation, min, max, etc.).
Useful for quickly understanding the distribution and spread of numeric features.
Visualizing Numeric Relationships:
Generates density plots for each numeric column to visualize their distributions.
Creates a correlation heatmap to explore the relationships between multiple numeric variables.
Necessary libraries¶
To use the summarease.summarize_numeric module, ensure the following libraries are installed, then import them:
import pandas as pd
import altair as alt
from summarease.summarize_numeric import summarize_numeric, plot_numeric_density, plot_correlation_heatmap
Example dataset¶
We’ll use the following dataset to demonstrate the module’s functionality:
Dataset: Numeric Data¶
numeric_data = pd.DataFrame({
'feature1': [1.2, 2.3, 3.1, 4.8, 5.5, 6.7, 8.9, 10.1],
'feature2': [3.2, 4.5, 5.1, 6.0, 7.8, 8.3, 9.1, 10.7],
'feature3': [1.1, 2.2, 3.1, 4.0, 5.4, 6.6, 7.7, 8.5]
})
Example usage¶
summarize_numeric¶
This function calculates summary statistics or generates visualizations for numeric columns. It takes two parameters:
dataset: The input dataset containing numeric variables. It must be a DataFrame.
summarize_by: Specifies the format for summarizing the numeric variables. It can be either "table" (default) to generate a summary table, or "plot" to generate visualizations like density plots and a correlation heatmap.
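The parameter contract described above can be checked before calling the function. The following is a minimal sketch, not part of summarease: `check_summarize_args` and `VALID_SUMMARIZE_BY` are hypothetical names introduced here for illustration, and whether `summarize_numeric` itself raises on invalid input is not stated in this tutorial.

```python
import pandas as pd

# Hypothetical constant: the two summarize_by values documented in this tutorial
VALID_SUMMARIZE_BY = ("table", "plot")


def check_summarize_args(dataset, summarize_by="table"):
    """Hypothetical pre-check mirroring the documented parameter rules.

    Returns only the numeric columns of the dataset if both arguments
    satisfy the documented constraints.
    """
    if not isinstance(dataset, pd.DataFrame):
        raise TypeError("dataset must be a pandas DataFrame")
    if summarize_by not in VALID_SUMMARIZE_BY:
        raise ValueError(f"summarize_by must be one of {VALID_SUMMARIZE_BY}")
    # Keep numeric columns only, since the module targets numeric variables
    return dataset.select_dtypes(include="number")


# Usage: a mixed DataFrame is reduced to its numeric columns
mixed = pd.DataFrame({"a": [1.0, 2.0], "label": ["x", "y"]})
numeric_only = check_summarize_args(mixed, summarize_by="table")
print(list(numeric_only.columns))
```

Running a check like this first gives a clear error message for a wrong argument, regardless of how the library handles it internally.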
Example 1: Summarizing with a Table¶
# Summarize numeric variables in the dataset by generating summary statistics
summary_table = summarize_numeric(dataset=numeric_data, summarize_by="table")
summary_table["numeric_describe"]
| | feature1 | feature2 | feature3 |
|---|---|---|---|
| count | 8.000000 | 8.000000 | 8.000000 |
| mean | 5.325000 | 6.837500 | 4.825000 |
| std | 3.137219 | 2.550035 | 2.663912 |
| min | 1.200000 | 3.200000 | 1.100000 |
| 25% | 2.900000 | 4.950000 | 2.875000 |
| 50% | 5.150000 | 6.900000 | 4.700000 |
| 75% | 7.250000 | 8.500000 | 6.875000 |
| max | 10.100000 | 10.700000 | 8.500000 |
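The values in this table can be cross-checked with plain pandas. The sketch below assumes, without confirmation from this tutorial, that the table corresponds to what `DataFrame.describe()` returns. It rebuilds the example dataset and recomputes the statistics independently of summarease.

```python
import pandas as pd

# Rebuild the example dataset from this tutorial
numeric_data = pd.DataFrame({
    "feature1": [1.2, 2.3, 3.1, 4.8, 5.5, 6.7, 8.9, 10.1],
    "feature2": [3.2, 4.5, 5.1, 6.0, 7.8, 8.3, 9.1, 10.7],
    "feature3": [1.1, 2.2, 3.1, 4.0, 5.4, 6.6, 7.7, 8.5],
})

# Assumption: the summary table mirrors pandas' describe() output
reference = numeric_data.describe()

# Inspect a few rows to compare against the table above
print(reference.loc[["mean", "50%", "max"]].round(3))
```

If the rows printed here agree with the table, the summary can be read the same way as any `describe()` result: the 25%, 50%, and 75% rows are quartiles, and std is the sample standard deviation.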
Example 2: Visualizing Numeric Relationships¶
# Visualize the distribution of numeric variables and their correlations
numeric_plots = summarize_numeric(dataset=numeric_data, summarize_by="plot")
numeric_plots["numeric_plot"].show() # Display density plots
numeric_plots["corr_plot"].show() # Display correlation heatmap
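A correlation heatmap is a visual rendering of the pairwise correlations between numeric columns. As a rough numeric counterpart, the sketch below computes a correlation matrix with pandas' `DataFrame.corr()`. Which correlation method `summarize_numeric` uses internally is not stated in this tutorial, so Pearson is an assumption here.

```python
import pandas as pd

# Rebuild the example dataset from this tutorial
numeric_data = pd.DataFrame({
    "feature1": [1.2, 2.3, 3.1, 4.8, 5.5, 6.7, 8.9, 10.1],
    "feature2": [3.2, 4.5, 5.1, 6.0, 7.8, 8.3, 9.1, 10.7],
    "feature3": [1.1, 2.2, 3.1, 4.0, 5.4, 6.6, 7.7, 8.5],
})

# Assumption: Pearson correlation; the library's actual method may differ
corr_matrix = numeric_data.corr(method="pearson")

# Each cell is the correlation between the row feature and the column feature
print(corr_matrix.round(3))
```

Because all three example features increase together, every off-diagonal value is positive, so the heatmap for this dataset should show uniformly strong positive correlation.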
In conclusion, the summarease.summarize_numeric module provides an easy way to explore and understand your dataset’s numeric features. Whether you’re looking for summary statistics or visualizations to analyze distributions and correlations, this tool will help you gain deeper insights into your data.
Final notes¶
If you run into an error or something goes wrong while using the function, please submit an issue in the GitHub repo, where it will be addressed as soon as possible. Thanks for your time!