🐍Python Programming

Pandas Tutorial

Updated 2026-05-15

30 min read

Pandas Tutorial

Introduction

Welcome to the Pandas tutorial! Pandas is a powerful open-source library in Python that provides high-performance, easy-to-use data structures and data analysis tools. It's an essential tool for anyone working with structured data, especially in fields like data science and machine learning.

In this tutorial, we'll cover the basics of creating Series and DataFrames, reading data from CSV and Excel files, selecting and filtering data using loc and iloc, handling missing values, performing groupby operations, merging/joining datasets, and conducting basic data analysis. By the end of this tutorial, you'll have a solid understanding of how to use Pandas for your data manipulation needs.

Core Content

1. Series and DataFrame Creation

A Series is a one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index. A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

Creating a Series

Python

import numpy as np
import pandas as pd

# Create a Series from a list; np.nan marks a missing value
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)

Output

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
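The Series above uses pandas' default integer index (0 to 5). A Series can also carry its own labels, which is the "associated array of data labels" described earlier. The sketch below passes custom labels through the `index` parameter and then looks values up by label. The city names and temperatures are made-up illustration data.

```python
import pandas as pd

# Illustrative data: made-up temperatures labelled by city
temps = pd.Series([21.5, 18.0, 15.2], index=["Paris", "London", "Oslo"], name="temp_c")

print(temps["London"])        # label lookup, prints 18.0
print(temps.index.tolist())   # the labels: ['Paris', 'London', 'Oslo']
print(temps[temps > 16])      # boolean selection keeps Paris and London
```

Label lookup works the same way for the default integer index. In that case the labels just happen to be 0, 1, 2, and so on.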

Creating a DataFrame

Python

# Create a DataFrame from a dictionary
data = {
    'Name': ['John', 'Anna', 'James'],
    'Age': [28, 24, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
print(df)

Output

    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  James   35    London
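Each DataFrame column is its own Series, so columns can hold different types. You can confirm this by inspecting `shape` and `dtypes`. The sketch below rebuilds the same `df` so that it runs on its own. The exact dtype shown for text columns depends on the pandas version: `object` in pandas 2.x, while newer releases may report a dedicated string dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Age": [28, 24, 35],
    "City": ["New York", "Paris", "London"],
})

print(df.shape)          # (rows, columns)
print(df.dtypes)         # Age is an integer column; Name and City hold text
print(type(df["Age"]))   # selecting a single column returns a Series
```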

2. Reading CSV/Excel Files

Pandas makes it easy to read data from various file formats, including CSV and Excel.

Reading a CSV File

Python

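# Read a CSV file into a DataFrame (assumes data.csv is in the working directory)
# Stored as csv_df so the df built above stays available for later sections
csv_df = pd.read_csv('data.csv')
print(csv_df.head())  # head() shows the first 5 rows by default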

Output

   Column1  Column2
0        A        B
1        C        D
2        E        F
3        G        H
4        I        J
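`read_csv` accepts many options besides the file path. The sketch below parses an in-memory string through `io.StringIO`, so no data.csv is needed. It shows three commonly used parameters:

- `usecols` keeps only some columns.
- `dtype` forces a column type.
- `nrows` reads only the first rows.

The CSV content is invented for illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (made-up content)
csv_text = """id,product,price,in_stock
1,apple,0.5,yes
2,banana,0.25,no
3,cherry,3.0,yes
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "product", "price"],  # skip the in_stock column
    dtype={"id": "int32"},               # override the inferred integer type
    nrows=2,                             # read only the first two data rows
)
print(df)
```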

Reading an Excel File

Python

# Read the first sheet of an Excel file into a DataFrame
# (reading .xlsx files requires the openpyxl package to be installed)
excel_df = pd.read_excel('data.xlsx')
print(excel_df.head())

Output

   Column1  Column2
0        A        B
1        C        D
2        E        F
3        G        H
4        I        J

3. Selecting/Filtering Data (loc, iloc)

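Pandas provides two primary indexers: loc selects by label (index labels and column names), and iloc selects by integer position. One difference matters when slicing. A loc slice includes its end label, while an iloc slice excludes its end position, just like a Python list slice. That is why loc[0:2] and iloc[0:3] return the same three rows here.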

Using loc

Python

# Select rows by index label and columns by name; the end label 2 is included
filtered_df = df.loc[0:2, ['Name', 'Age']]
print(filtered_df)

Output

    Name  Age
0   John   28
1   Anna   24
2  James   35

Using iloc

Python

# Select rows by position and columns by position; the end position 3 is excluded
filtered_df = df.iloc[0:3, [0, 1]]
print(filtered_df)

Output

    Name  Age
0   John   28
1   Anna   24
2  James   35
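To filter rows by a condition, pass a boolean mask to loc. Combine conditions with `&` (and) or `|` (or), and wrap each comparison in parentheses because of Python's operator precedence. `isin` tests membership in a list of values. Here is a sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Age": [28, 24, 35],
    "City": ["New York", "Paris", "London"],
})

# Rows where Age is over 25, keeping two columns
older = df.loc[df["Age"] > 25, ["Name", "City"]]

# Combine conditions: parentheses are required around each comparison
young_in_paris = df.loc[(df["Age"] < 30) & (df["City"] == "Paris")]

# Membership test against a list of values
europe = df.loc[df["City"].isin(["Paris", "London"])]

print(older)
print(young_in_paris)
print(europe)
```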

4. Handling Missing Values

Missing data is a common issue in datasets. Pandas provides several methods to handle missing values.

Checking for Missing Values

Python

# Count missing values in each column (isna() is an equivalent alias)
print(df.isnull().sum())

Output

Name      0
Age       0
City      0
dtype: int64
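The sample df has no gaps, so every count above is zero. To see the check report something, the sketch below builds a small frame with deliberately missing entries. pandas treats both `None` and `np.nan` as missing. The sketch counts the gaps per column, flags incomplete rows, and totals them. The data is made up.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps: None and np.nan are both recognized as missing
df = pd.DataFrame({
    "Name": ["John", "Anna", None],
    "Age": [28, np.nan, 35],
    "City": ["New York", "Paris", "London"],
})

print(df.isnull().sum())          # missing values per column
print(df.isnull().any(axis=1))    # True for rows with at least one gap
print(df.isnull().sum().sum())    # total number of missing cells
```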

Filling Missing Values

Python

# Replace every missing value with 0; this df has no gaps, so the result matches df
df_filled = df.fillna(value=0)
print(df_filled)

Output

    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  James   35    London
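Filling every gap with 0 rarely makes sense when columns have different types. `fillna` also accepts a dictionary that maps column names to fill values, so each column gets a suitable replacement. Alternatively, `dropna` removes incomplete rows. The fill choices below (the mean age and the placeholder "Unknown") are illustrative assumptions, not a general rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", None],
    "Age": [28, np.nan, 35],
    "City": ["New York", "Paris", "London"],
})

# Per-column fill values: the mean for numbers, a placeholder for text
filled = df.fillna({"Age": df["Age"].mean(), "Name": "Unknown"})

# Alternative: drop any row that has a missing value
dropped = df.dropna()

print(filled)
print(dropped)
```

Note that `df["Age"].mean()` skips missing values, so the fill value here is the mean of 28 and 35.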

5. Groupby Operations

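Grouping works in three steps: it splits the rows into groups based on the values of one or more columns, applies an aggregation such as mean() or sum() to each group, and combines the results. By default the group keys are sorted, which is why London comes first in the output.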

Python

# Group by the 'City' column and calculate the mean age
grouped = df.groupby('City')['Age'].mean()
print(grouped)

Output

City
London    35.0
New York  28.0
Paris     24.0
Name: Age, dtype: float64
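`agg` computes several statistics per group in one call. The sketch below uses named aggregation, where each keyword has the form `output_name=(column, function)`. It runs on a slightly larger made-up dataset so that one group contains more than one row.

```python
import pandas as pd

# Made-up data: two people in Paris so that group has several rows
df = pd.DataFrame({
    "Name": ["John", "Anna", "James", "Marie"],
    "Age": [28, 24, 35, 30],
    "City": ["New York", "Paris", "London", "Paris"],
})

summary = df.groupby("City").agg(
    people=("Name", "count"),   # rows per city
    mean_age=("Age", "mean"),   # average age per city
    max_age=("Age", "max"),     # oldest person per city
)
print(summary)
```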

6. Merging/Joining Datasets

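pd.merge() combines two DataFrames by matching values in one or more key columns, much like a SQL join. By default it performs an inner join (how='inner'), which keeps only keys present in both frames. As a result, keys A and D are dropped in the following example.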

Merging DataFrames

Python

# Create two DataFrames that share some, but not all, keys
df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['B', 'C', 'D'], 'Value2': [4, 5, 6]})

# Merge the DataFrames on the 'Key' column (inner join by default)
merged_df = pd.merge(df1, df2, on='Key')
print(merged_df)

Output

  Key  Value1  Value2
0   B       2       4
1   C       3       5
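The `how` parameter controls which keys survive a merge:

- `'left'` keeps every row of the left frame.
- `'outer'` keeps the keys from both frames.

`indicator=True` adds a `_merge` column that records where each row came from. Unmatched values become NaN. The `join()` method does the same kind of combination but aligns on the index, and its default is a left join.

```python
import pandas as pd

df1 = pd.DataFrame({"Key": ["A", "B", "C"], "Value1": [1, 2, 3]})
df2 = pd.DataFrame({"Key": ["B", "C", "D"], "Value2": [4, 5, 6]})

# Left join: all keys from df1; Value2 is NaN where df2 has no match
left = pd.merge(df1, df2, on="Key", how="left")

# Outer join: union of keys, with a _merge column describing the source
outer = pd.merge(df1, df2, on="Key", how="outer", indicator=True)

# DataFrame.join aligns on the index (how="left" by default)
joined = df1.set_index("Key").join(df2.set_index("Key"))

print(left)
print(outer)
print(joined)
```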

7. Basic Data Analysis

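Pandas provides a variety of methods for basic data analysis. describe() summarizes each numeric column with its count, mean, sample standard deviation, minimum, quartiles and maximum. By default it skips non-numeric columns such as Name and City, so only Age appears in the output.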

Descriptive Statistics

Python

# Get descriptive statistics of the DataFrame
print(df.describe())

Output

             Age
count   3.000000
mean   29.000000
std     5.567764
min    24.000000
25%    26.000000
50%    28.000000
75%    31.500000
max    35.000000
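After describe(), you usually ask a few targeted questions. The sketch below shows common one-liners on the same three-row df:

- `mean()` returns a single statistic.
- `std()` returns the sample standard deviation that describe() reports.
- `value_counts()` gives category frequencies.
- `sort_values()` ranks rows.
- `nlargest()` picks the top n rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Anna", "James"],
    "Age": [28, 24, 35],
    "City": ["New York", "Paris", "London"],
})

print(df["Age"].mean())                        # single statistic
print(df["Age"].std())                         # sample standard deviation (ddof=1)
print(df["City"].value_counts())               # frequency of each city
print(df.sort_values("Age", ascending=False))  # oldest first
print(df.nlargest(2, "Age")[["Name", "Age"]])  # top two by age
```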

Practical Example

Let's put these steps together in one complete example. It reads a CSV file, filters rows for one year, handles missing values, and totals sales per region with groupby. The example assumes that sales_data.csv contains Year, Region and Sales columns.

Python

import pandas as pd

# Read the dataset (assumes sales_data.csv has Year, Region and Sales columns)
df = pd.read_csv('sales_data.csv')

# Filter data for a specific year
filtered_df = df.loc[df['Year'] == 2020]

# Fill missing Sales values with 0; filling the whole frame would also
# turn a missing Region into a group labelled 0
cleaned_df = filtered_df.fillna({'Sales': 0})

# Group by 'Region' and calculate total sales
grouped_sales = cleaned_df.groupby('Region')['Sales'].sum()

# Print the results
print(grouped_sales)
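If you don't have a sales_data.csv, you can try the same read, filter, fill and group steps on an in-memory CSV. The column names Year, Region and Sales come from the example above. The rows are invented and include one missing Sales value to exercise the fill step.

```python
import io
import pandas as pd

# Made-up stand-in for sales_data.csv; one 2020 West row has no Sales value
csv_text = """Year,Region,Sales
2019,East,100
2020,East,150
2020,West,
2020,West,80
2020,East,50
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only 2020 rows
filtered_df = df.loc[df["Year"] == 2020]

# Fill only the Sales column, leaving Region labels untouched
cleaned_df = filtered_df.fillna({"Sales": 0})

# Total sales per region
grouped_sales = cleaned_df.groupby("Region")["Sales"].sum()
print(grouped_sales)
```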

Summary

Concept	Description
Series	One-dimensional array-like object with labels.
DataFrame	Two-dimensional labeled data structure with columns of potentially different types.
Reading Files	Use `pd.read_csv()` for CSV and `pd.read_excel()` for Excel files.
Selecting/Filtering	Use `loc` for label-based indexing and `iloc` for position-based indexing.
Handling Missing Values	Use `isnull()`, `fillna()`, etc., to manage missing data.
Groupby	Aggregate data using the `groupby()` method.
Merging/Joining	Combine datasets using `pd.merge()` or `join()`.
Basic Data Analysis	Use methods like `describe()` for summary statistics.

What's Next?

Now that you have a solid understanding of Pandas, the next step is to explore more advanced topics such as time series analysis, pivot tables, and more complex data manipulation techniques. You can continue your learning with the "SciPy Tutorial," where we'll dive into scientific computing in Python.

Stay tuned for more tutorials and happy coding!
