Use Pandas-Profiling on Jupyter Notebook ❤ EDA

Nov 2, 2020

Automated Exploratory Data Analysis (EDA) Using Pandas Profiling in Jupyter

Pandas Profiling first displays a descriptive overview of the dataset: the number of variables and observations, the total number of missing cells, duplicate rows, memory usage, and the variable types. It then generates a detailed analysis of each variable (including its distribution), followed by sections on interactions, correlations, missing values, samples and duplicated rows, which you can explore by clicking each tab.
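To make the overview numbers concrete, here is a minimal sketch that computes roughly the same dataset-level statistics by hand with plain pandas. It is not Pandas Profiling's implementation: the helper `dataset_overview` and the tiny Titanic-like `sample` frame are hypothetical, and the type counts use pandas dtypes, whereas Pandas Profiling infers its own variable types (numeric, categorical, boolean and so on), so the figures in the report may differ.

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: approximate the report's dataset-level overview."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing_cells = int(df.isna().sum().sum())  # NaN/None across all columns
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / n_cells if n_cells else 0.0,
        # duplicated() flags every repeat of an earlier identical row
        "duplicate_rows": int(df.duplicated().sum()),
        # deep=True also measures the contents of string columns
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        # pandas dtypes, not Pandas Profiling's inferred variable types
        "dtypes": df.dtypes.astype(str).value_counts().to_dict(),
    }


# Hypothetical Titanic-like data: rows 0 and 3 are identical, one Age is missing
sample = pd.DataFrame({
    "Age": [22.0, 38.0, None, 22.0],
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
})

stats = dataset_overview(sample)
print(stats)
```

Because `memory_usage(deep=True)` counts the actual string contents of text columns, the memory figure can differ from the one shown in the report.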

# Install Pandas Profiling 2.9.0 (the latest version at the time of writing) from conda-forge
conda install -c conda-forge pandas-profiling=2.9.0

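Pandas Profiling extends the pandas DataFrame with a single line of code, df.profile_report(), which generates an interactive report of these statistics and goes well beyond the basic summary produced by df.describe(). You can read more about it in the Pandas Profiling documentation.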
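For comparison, `df.describe()` is the basic summary that `profile_report()` goes beyond. The sketch below prints it for a small Titanic-like frame and then computes, by hand, two of the extras a profile report adds: per-column missing values and a correlation matrix. The helper `variable_summary` and the `sample` data are hypothetical illustrations, not part of Pandas Profiling; the report itself offers several correlation coefficients (Pearson, Spearman and others), and Pearson is only one of them.

```python
import pandas as pd


def variable_summary(df: pd.DataFrame):
    """Hypothetical helper: per-column missing values and a Pearson correlation matrix."""
    missing = df.isna().sum()                      # missing count per column
    missing_pct = df.isna().mean() * 100           # missing share per column, in %
    numeric = df.select_dtypes(include="number")   # correlations need numeric columns
    corr = numeric.corr(method="pearson")          # uses pairwise-complete observations
    return missing, missing_pct, corr


# Hypothetical Titanic-like data with one missing Age
sample = pd.DataFrame({
    "Age": [22.0, 38.0, None, 22.0],
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
})

# Baseline: the summary pandas already provides
print(sample.describe(include="all"))

# Two of the extras, computed by hand
missing, missing_pct, corr = variable_summary(sample)
print(missing)
print(corr)
```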

# Import the libraries
import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport

# Read your data set
df = pd.read_csv("titanic.csv")
df.head(10)

# Define your profile report:
profile = ProfileReport(df, title='Pandas Profile Report', html={'style': {'full_width': True}})
# Save your output file in HTML format
profile.to_file(output_file="titanic_report.html")