Exploratory Data Analysis in Python: A Step-by-Step Guide with Sample Data

Chinna Babu Singanamala
Oct 12, 2023


Exploratory Data Analysis (EDA) is a critical step in any data science or data analysis project. It involves understanding your data, uncovering patterns, and gaining insights that can inform your subsequent analysis. In this guide, we’ll walk through the process of performing EDA in Python using sample data to illustrate each step.

Step 1: Importing Necessary Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Step 2: Loading the Sample Data

# Load your dataset
df = pd.read_csv("your_data.csv")  # replace with the path to your own file
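If you don’t have a CSV file at hand, you can build a small synthetic DataFrame instead. The sketch below is an assumption for illustration only: it draws random values under the same hypothetical column names this guide uses (numeric_column, categorical_column, column_name) and plants a few missing and extreme values so that the later steps have something to find. It is not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names match the placeholders used in this guide
rng = np.random.default_rng(seed=42)  # fixed seed so the data is reproducible
n = 200

df = pd.DataFrame({
    "categorical_column": rng.choice(["A", "B", "C"], size=n),
    "numeric_column": rng.normal(loc=50, scale=10, size=n),
    "column_name": rng.normal(loc=100, scale=15, size=n),
})

# Blank out 10 random entries of "column_name" to simulate missing data
missing_idx = rng.choice(n, size=10, replace=False)
df.loc[missing_idx, "column_name"] = np.nan

# Overwrite rows 0-2 of "numeric_column" with extreme values to simulate outliers
df.loc[:2, "numeric_column"] = [150.0, -40.0, 160.0]

print(df.head())
```

To use it with the other steps, skip the read_csv call, or write the frame out first with df.to_csv("your_data.csv", index=False).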

Step 3: Getting an Overview of the Data

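  • Use df.head() to view the first few rows of the data (five by default).
  • Use df.shape to check the dimensions of the dataset as a (rows, columns) tuple.
  • Use df.info() to see each column’s data type and non-null count; a column with fewer non-null values than there are rows contains missing values.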
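These overview calls can be tried on a tiny hand-made frame (the values below are hypothetical). Note that df.info() prints its report and returns None; when you need the same missing-value picture inside your code, df.notna().sum() returns the non-null counts as a Series.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for your_data.csv
df = pd.DataFrame({
    "categorical_column": ["A", "B", "A", "C"],
    "numeric_column": [10.0, 12.5, np.nan, 9.0],
})

print(df.head())  # first rows (up to 5; here all 4)
print(df.shape)   # (rows, columns)
df.info()         # prints dtypes and non-null counts; returns None

# Non-null counts per column as a Series, usable in further code
non_null = df.notna().sum()
print(non_null)
```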

Step 4: Descriptive Statistics

Calculate summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numerical columns:

df.describe()
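By default, df.describe() summarizes only the numeric columns. A minimal sketch on made-up data that shows this, and also calls describe() on a single text column, which reports the count, the number of unique values, the most frequent value (top), and how often it occurs (freq):

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "categorical_column": ["A", "B", "A", "C", "A"],
    "numeric_column": [1.0, 2.0, 3.0, 4.0, 10.0],
})

# Default: numeric columns only (count, mean, std, min, 25%, 50%, 75%, max)
numeric_summary = df.describe()
print(numeric_summary)

# A text column: count, unique, top, freq
categorical_summary = df["categorical_column"].describe()
print(categorical_summary)
```

Comparing the mean (4.0) with the median, shown as 50% (3.0), is a quick skew check: the single large value 10.0 pulls the mean above the median.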

Step 5: Data Visualization

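Plots reveal the shape of a distribution, skew, and differences between groups that summary tables can hide. Let’s create two basic visualizations: a histogram of a numeric column and a box plot of that column split by category: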

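# Histogram; kde=True overlays a kernel density estimate of the distribution
sns.histplot(df["numeric_column"], kde=True)
plt.title("Distribution of Numeric Column")
plt.show()

# Box plot: median, quartiles, and spread of the numeric column within each category
sns.boxplot(x="categorical_column", y="numeric_column", data=df)
plt.title("Box Plot of Numeric Column by Category")
plt.show()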
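A box plot draws a handful of quantiles, so it can help to look at the numbers behind it. This sketch (hypothetical data) computes per-category summaries with groupby; the 25%, 50%, and 75% columns correspond to the box edges and the median line. In seaborn’s default box plot the whiskers stop at the most extreme points within 1.5 × IQR of the box, so min and max here can lie beyond them.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "categorical_column": ["A", "A", "B", "B", "B", "C"],
    "numeric_column": [1.0, 3.0, 10.0, 20.0, 30.0, 7.0],
})

# One row of summary statistics per category
group_stats = df.groupby("categorical_column")["numeric_column"].describe()
print(group_stats)
```

Check the count column too: a category with very few rows (like C here) gives a box that says little.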

Step 6: Correlation Analysis

Measure pairwise linear relationships between numerical variables with a correlation matrix (Pearson’s r by default, ranging from -1 to 1):

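# numeric_only=True restricts the matrix to numerical columns; since pandas 2.0,
# df.corr() raises an error without it if any text columns are present
correlation_matrix = df.corr(numeric_only=True)
# vmin/vmax pin the color scale to r's full range so colors are comparable across plots
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation Heatmap")
plt.show()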
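With many columns, a heatmap gets hard to read. One option, sketched here on a made-up frame, is to list each variable pair once, ranked by the absolute value of its correlation. Keep in mind that Pearson’s r only captures linear relationships; a strong nonlinear pattern can still produce a coefficient near zero.

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: y is exactly 2 * x, z tends to fall as x rises
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
    "label": ["a", "b", "c", "d", "e"],  # text column, excluded by numeric_only
})

corr = df.corr(numeric_only=True)

# Each unordered pair of columns once, sorted by strength regardless of sign
pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})
pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(pairs)
```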

Step 7: Handling Missing Data

Identify and handle missing data:

# Check for missing values
df.isnull().sum()

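# Handle missing values, e.g. fill with the column's median (or mean).
# Assign the result back instead of calling fillna(..., inplace=True) on a single
# column, which newer pandas versions warn about and may not apply to df.
df["column_name"] = df["column_name"].fillna(df["column_name"].median())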
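When several numeric columns have gaps, the same median fill can be applied to all of them at once. The sketch below uses hypothetical columns (age, income, city), reports the share of missing values per column first, and leaves the text column untouched, since a median is undefined for text.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in every column
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [50_000.0, 62_000.0, np.nan, np.nan],
    "city": ["Paris", None, "Lyon", "Nice"],
})

# Fraction of missing values per column (often more telling than raw counts)
missing_share = df.isna().mean()
print(missing_share)

# Fill each numeric column with its own median; text columns are skipped
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
print(df)
```

A column that is mostly empty (income is 50% missing here) may be better dropped or flagged than filled, because a single median repeated many times flattens its distribution.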

Step 8: Outlier Detection

Detect outliers, values far from the bulk of the data, before deciding whether to remove, cap, or keep them:

# Box plot: points drawn beyond the whiskers (more than 1.5 * IQR past the
# quartiles) are potential outliers
sns.boxplot(x=df["numeric_column"])
plt.title("Box Plot for Outlier Detection")
plt.show()
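Besides reading the box plot, the IQR method flags outliers numerically. Here is a minimal sketch using the conventional 1.5 multiplier (Tukey’s fences), the same rule seaborn uses by default for box plot whiskers; the series values are invented for illustration.

```python
import pandas as pd

# Hypothetical values with one obvious outlier
s = pd.Series([10, 12, 11, 13, 12, 11, 14, 95], name="numeric_column")

# Interquartile range and Tukey's fences
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = s[(s < lower) | (s > upper)]
print(f"bounds: [{lower}, {upper}]")
print(outliers)
```

Whether to drop, cap, or keep flagged values depends on whether they are data-entry errors or genuine extremes; the rule only tells you where to look.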

Exploratory Data Analysis is an essential first step in any data analysis project. It shows you what your columns contain, where values are missing or extreme, and which variables move together, so you can make informed decisions about cleaning, feature selection, and further analysis.
