{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summarize dtypes\n",
"\n",
"This tutorial walks you through the `summarize_dtypes_table` function, which counts how many columns of a pandas DataFrame have each data type, giving you a quick overview of the dataset's structure.\n",
"\n",
"## Getting started\n",
"\n",
"The `summarize_dtypes_table` function provides the following core functionality:\n",
"\n",
"**1. Summarizing Data Types:**\n",
"- Inspects the data type (dtype) of every column in the input DataFrame.\n",
"- Returns a summary table with the number of columns of each data type.\n",
"- Converts the data types to strings (for example, `int64`) for consistent representation.\n",
"\n",
"**2. Error Handling:**\n",
"- Checks that the input is a pandas DataFrame.\n",
"- Raises a `TypeError` if it is not.\n",
"\n",
"### Function Parameters\n",
"\n",
"The `summarize_dtypes_table` function accepts the following parameter:\n",
"\n",
"- `dataset`: the input dataset to analyze. It must be a pandas DataFrame.\n",
"\n",
"### Function Output\n",
"\n",
"The function returns a pandas DataFrame with two columns: `DataType`, the name of each data type found in the dataset, and `Count`, the number of columns with that data type."
]
},
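{
"cell_type": "markdown",
"metadata": {},
"source": [
"The behavior described above can be approximated with plain pandas. The cell below is a minimal sketch for illustration only, not the `summarease` source code: the helper `sketch_dtype_summary` is hypothetical, and it assumes the function tallies the string names of `DataFrame.dtypes`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from collections import Counter\n",
"\n",
"\n",
"def sketch_dtype_summary(dataset):\n",
"    # Hypothetical stand-in for summarize_dtypes_table; not the summarease code.\n",
"    # Mirror the documented error handling: only DataFrames are accepted.\n",
"    if not isinstance(dataset, pd.DataFrame):\n",
"        raise TypeError('dataset must be a pandas DataFrame')\n",
"    # Convert each column's dtype to its string name (e.g. 'int64') and count them;\n",
"    # Counter keeps the dtypes in the order they first appear among the columns.\n",
"    counts = Counter(str(dtype) for dtype in dataset.dtypes)\n",
"    return pd.DataFrame({'DataType': list(counts.keys()), 'Count': list(counts.values())})\n",
"\n",
"\n",
"# Two integer columns and one float column, so the integer dtype should be counted twice.\n",
"sketch_dtype_summary(pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [0.5, 1.5]}))"
]
},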
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Necessary libraries\n",
"\n",
"To use the `summarize_dtypes_table` function, make sure `pandas` and `summarease` are installed, then import them:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from summarease.summarize_dtypes import summarize_dtypes_table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example dataset\n",
"\n",
"We'll use the following small dataset, which has one integer, one float, one string, and one boolean column, to demonstrate the function:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"data = pd.DataFrame({\n",
" 'int_col': [1, 2, 3],\n",
" 'float_col': [1.1, 2.2, 3.3],\n",
" 'str_col': ['a', 'b', 'c'],\n",
" 'bool_col': [True, False, True]\n",
"})"
]
},
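{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the per-column information that the summary aggregates, you can inspect the standard pandas `dtypes` attribute directly. This cell uses only pandas and assumes `data` was defined in the previous cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One entry per column (int_col, float_col, str_col, bool_col) mapped to its dtype;\n",
"# the summary table counts how often each of these dtypes occurs.\n",
"data.dtypes"
]
},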
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 1: Basic usage"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" DataType Count\n",
"0 int64 1\n",
"1 float64 1\n",
"2 object 1\n",
"3 bool 1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summarize the data types in the dataset\n",
"summary = summarize_dtypes_table(data)\n",
"summary"
]
},
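{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted under Error Handling, the function raises a `TypeError` when the input is not a pandas DataFrame. The cell below passes a plain Python list to show this; the exact error message depends on the `summarease` implementation, so it is simply printed rather than assumed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from summarease.summarize_dtypes import summarize_dtypes_table\n",
"\n",
"# A plain list is not a DataFrame, so a TypeError is expected.\n",
"try:\n",
"    summarize_dtypes_table([1, 2, 3])\n",
"except TypeError as err:\n",
"    print(f'TypeError raised: {err}')"
]
},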
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2: Analyzing Sales Data\n",
"\n",
"### Scenario\n",
"\n",
"Suppose you are analyzing a company's sales data. The dataset has four columns: `TransactionID`, `CustomerName`, `PurchaseAmount`, and `IsMember`.\n",
"\n",
"### Dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"sales_data = pd.DataFrame({\n",
" 'TransactionID': [1001, 1002, 1003],\n",
" 'CustomerName': ['Alice', 'Bob', 'Charlie'],\n",
" 'PurchaseAmount': [200.5, 150.0, 300.75],\n",
" 'IsMember': [True, False, True]\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" DataType Count\n",
"0 int64 1\n",
"1 object 1\n",
"2 float64 1\n",
"3 bool 1"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales_summary = summarize_dtypes_table(sales_data)\n",
"sales_summary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"\n",
"- `int64`: integer data, here `TransactionID`.\n",
"- `float64`: floating-point data, here `PurchaseAmount`.\n",
"- `object`: pandas' general-purpose dtype for Python objects; here it holds the strings in `CustomerName`.\n",
"- `bool`: boolean data, here `IsMember`."
]
},
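{
"cell_type": "markdown",
"metadata": {},
"source": [
"In both examples every column has a distinct type, so each count is 1. The cell below builds a version of the sales data with a second integer column and a second string column; based on the function description, the integer and string rows should then report a count of 2. The `Quantity` and `Region` columns are illustrative additions, not part of the original dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from summarease.summarize_dtypes import summarize_dtypes_table\n",
"\n",
"# Hypothetical extra columns: one more integer column and one more string column.\n",
"extended_sales = pd.DataFrame({\n",
"    'TransactionID': [1001, 1002, 1003],\n",
"    'CustomerName': ['Alice', 'Bob', 'Charlie'],\n",
"    'PurchaseAmount': [200.5, 150.0, 300.75],\n",
"    'IsMember': [True, False, True],\n",
"    'Quantity': [2, 1, 4],\n",
"    'Region': ['North', 'South', 'East']\n",
"})\n",
"\n",
"summarize_dtypes_table(extended_sales)"
]
},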
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Final notes\n",
"\n",
"If you encounter an error or unexpected behavior while using the function, please open an issue in the GitHub repository; it will be addressed as soon as possible. Thanks for your time!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}