Use Pandas-Profiling on Jupyter Notebook ❤ EDA

Wajdi HAJJI
Nov 2, 2020

Automated Exploratory Data Analysis (EDA) using Pandas Profiling in Jupyter

Pandas Profiling

Pandas_profiling displays a descriptive overview of the dataset: the number of variables and observations, the total number of missing cells, duplicate rows, memory usage, and the variable types. It then generates a detailed analysis of each variable, along with class distributions, interactions, correlations, missing values, a sample of rows, and the duplicated rows, which you can explore by clicking each tab.
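
To get a feel for what the Overview numbers mean, here is a minimal plain-pandas sketch that computes similar figures. It is not how pandas-profiling works internally: the function name `dataset_overview` and the toy data are hypothetical, and the "types" counted here are pandas dtypes rather than the report's own type labels (such as Numeric or Categorical).

```python
import pandas as pd


def dataset_overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: rough plain-pandas version of the report's Overview."""
    n_rows, n_cols = df.shape
    missing_cells = int(df.isna().sum().sum())  # NaN/None across the whole frame
    total_cells = n_rows * n_cols
    return {
        "variables": n_cols,       # number of columns
        "observations": n_rows,    # number of rows
        "missing_cells": missing_cells,
        "missing_cells_pct": 100 * missing_cells / total_cells if total_cells else 0.0,
        "duplicate_rows": int(df.duplicated().sum()),  # rows equal to an earlier row
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        # pandas dtypes, NOT pandas-profiling's own variable types
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
    }


# Hypothetical toy data: the last row duplicates the first one
toy = pd.DataFrame({
    "Age": [22.0, None, 35.0, 22.0],
    "Sex": ["male", "female", "female", "male"],
})
print(dataset_overview(toy))
```

Calling `dataset_overview(df)` on the Titanic DataFrame loaded below gives the same kind of summary, without the report's interactive tabs.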

# Install pandas-profiling version 2.9.0 (the latest release at the time of writing) with conda
conda install -c conda-forge pandas-profiling=2.9.0
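
If `import pandas_profiling` later fails, or an older version shows up, the notebook kernel is often running in a different environment from the one conda installed into. A small standard-library check, run inside the notebook, shows what the kernel actually sees. `installed_version` is a hypothetical helper name, and the sketch assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Hypothetical helper: installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Run in the notebook kernel: prints None if this environment lacks the package
print("pandas-profiling:", installed_version("pandas-profiling"))
```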

Once imported, pandas_profiling extends the pandas DataFrame with a profile_report() method, so you can generate the full report with a single line of code, df.profile_report(), which describes the statistics interactively. You can read more about it here.
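
Why does a single line work? After pandas_profiling is imported, every DataFrame gains a `profile_report` method; as far as I can tell, the 2.x releases do this by assigning a function as an attribute of `pandas.DataFrame`. The toy sketch below shows the same pattern with a hypothetical `quick_shape` function; it is an illustration, not pandas-profiling's code.

```python
import pandas as pd


def quick_shape(df: pd.DataFrame) -> dict:
    """Hypothetical stand-in for a report function: just rows and columns."""
    return {"observations": len(df), "variables": df.shape[1]}


# Assigning a function as a class attribute turns it into a method,
# so every DataFrame (existing or new) now has df.quick_shape()
pd.DataFrame.quick_shape = quick_shape

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(df.quick_shape())  # {'observations': 3, 'variables': 2}
```

For your own extensions, pandas' official `pd.api.extensions.register_dataframe_accessor` is the cleaner route, since it puts the methods under a namespace (`df.<accessor>.<method>()`) instead of patching the class directly.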

# Import the libraries
import pandas as pd
import numpy as np
from pandas_profiling import ProfileReport

# Read your data set
df = pd.read_csv("titanic.csv")
df.head(10)

[Figure: the first 10 rows of the 'titanic.csv' dataset]

# Define your profile report
profile = ProfileReport(df, title='Pandas Profile Report', html={'style': {'full_width': True}})
# Save the report as an HTML file
profile.to_file(output_file="titanic_report.html")

Variables (columns) and Observations (rows)
[Figure: (EDA) Data Profiling]

Variables
[Figure: (EDA) Data Profiling: Variables]
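
The Variables tab summarizes each column in turn. A rough plain-pandas approximation of a few of its per-column figures (type, distinct values, most frequent value) might look like this. `variable_summary` and the toy columns are hypothetical, and the real report shows much more, including histograms and quantiles for numeric columns.

```python
import pandas as pd


def variable_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column. Assumes a non-empty DataFrame."""
    distinct = df.nunique(dropna=True)       # missing values are not counted
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "distinct": distinct,
        "distinct_pct": distinct / len(df) * 100,   # share of all rows
        "top": df.mode(dropna=True).iloc[0],        # first most frequent value
    })


toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Embarked": ["S", "C", None, "S"],
})
summary = variable_summary(toy)
print(summary)
```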

Correlations
[Figure: (EDA) Data Profiling: Correlations]
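
The Correlations tab can show several coefficients, including Pearson's r (linear association) and Spearman's ρ (rank-based, monotonic association). pandas computes both directly. In this sketch with hypothetical toy data, Fare grows with Age but not linearly, so Spearman's coefficient is exactly 1 while Pearson's is lower.

```python
import pandas as pd

toy = pd.DataFrame({
    "Age": [4.0, 22.0, 35.0, 58.0, 80.0],
    "Fare": [10.0, 12.0, 30.0, 90.0, 500.0],  # rises with Age, but not linearly
    "Sex": ["m", "f", "f", "m", "f"],          # non-numeric, dropped below
})

numeric = toy.select_dtypes(include="number")   # keep only numeric columns
pearson = numeric.corr(method="pearson")        # linear association
spearman = numeric.corr(method="spearman")      # monotonic (rank) association
print(pearson.round(3))
print(spearman.round(3))
```

A large gap between the two coefficients, as here, is a hint that the relationship is monotonic but curved, or driven by a few extreme values.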

Missing Values
[Figure: (EDA) Data Profiling: Missing Values]
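
The Missing values tab shows how many values are missing in each column and where they occur. The same counts can be sketched with pandas alone; the toy columns below are hypothetical, loosely modeled on Titanic column names.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123"],
    "Embarked": ["S", "C", "S", "S"],
})

# Per-column counts and percentages, most incomplete column first
missing = (
    pd.DataFrame({
        "missing": toy.isna().sum(),
        "missing_pct": toy.isna().mean() * 100,
    })
    .sort_values("missing", ascending=False)
)
print(missing)

# Row-level view: how many rows have at least one missing value
rows_with_gaps = int(toy.isna().any(axis=1).sum())
print(rows_with_gaps, "rows with at least one missing value")
```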

I hope this helps you get started with pandas-profiling.
Happy exploring!

Wajdi HAJJI

Data Scientist and Machine Learning Enthusiast ❤❤❤