Guide · Intermediate · 20 min
Introduction to Pandas
Learn the basics of pandas, Python's powerful data analysis library.
Last updated: Jan 28, 2026
pandas is a Python library for working with tabular data, such as spreadsheets and CSV files. It underpins most data analysis in Python and is a standard tool for working with financial data.
Importing Pandas
python
import pandas as pd # "pd" is the standard alias
# Now use pd.something() to access pandas functions
DataFrames: The Core Concept
A DataFrame is like a spreadsheet: rows and columns of data with labels.
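To make "labels" concrete, here is a minimal sketch that builds a small frame from a list of row records and inspects its row and column labels. The `ticker` and `price` values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical records, one dict per row
records = [
    {"ticker": "AAA", "price": 10.5},
    {"ticker": "BBB", "price": 20.0},
]
frame = pd.DataFrame(records)

print(list(frame.columns))  # column labels: ['ticker', 'price']
print(list(frame.index))    # row labels: [0, 1]
print(frame.shape)          # (2, 2)
```

When no index is supplied, pandas assigns the default row labels 0, 1, 2, and so on.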
python
# Create from dictionary
data = {
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["NYC", "LA", "Chicago"]
}
df = pd.DataFrame(data)
print(df)
#       name  age     city
# 0    Alice   25      NYC
# 1      Bob   30       LA
# 2  Charlie   35  Chicago
Reading Data from Files
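`pd.read_csv` accepts file-like objects as well as paths and URLs, so the reading step can be tried without any file on disk. The sketch below assumes a hypothetical three-row CSV held in a string; `csv_text` and `people` are illustrative names.

```python
import io
import pandas as pd

# In-memory text standing in for a file on disk (hypothetical contents)
csv_text = "name,age,city\nAlice,25,NYC\nBob,30,LA\nCharlie,35,Chicago\n"

people = pd.read_csv(io.StringIO(csv_text))
print(people.shape)           # (3, 3)
print(people["age"].tolist()) # [25, 30, 35]
```

The first line of the text is treated as the header row, and the numeric column is parsed as integers rather than strings.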
python
# Read CSV file
df = pd.read_csv("data.csv")
# Read Excel file (.xlsx needs an Excel engine such as openpyxl installed)
df = pd.read_excel("data.xlsx")
# Read from URL
df = pd.read_csv("https://example.com/data.csv")
# Quick look at the data
print(df.head()) # First 5 rows
print(df.tail()) # Last 5 rows
print(df.shape) # (rows, columns)
print(df.columns) # Column names
df.info() # Data types and non-null counts (prints directly and returns None)
print(df.describe()) # Statistics for numeric columns
Selecting Data
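Position-based selection (`iloc`) and label-based selection (`loc`) only coincide when the index is the default 0, 1, 2, and so on. A minimal sketch, assuming a hypothetical frame indexed by name, shows where they differ:

```python
import pandas as pd

# Hypothetical frame indexed by name rather than by 0..n-1
people = pd.DataFrame(
    {"age": [25, 30, 35], "city": ["NYC", "LA", "Chicago"]},
    index=["Alice", "Bob", "Charlie"],
)

by_position = people.iloc[1]         # second row, whatever its label
by_label = people.loc["Bob"]         # row labelled "Bob"
print(by_position.equals(by_label))  # True

# Slices differ: iloc excludes the end position, loc includes the end label
print(len(people.iloc[0:2]))               # 2
print(len(people.loc["Alice":"Charlie"]))  # 3
```

The inclusive end of a `loc` slice is a common source of off-by-one surprises when switching between the two accessors.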
python
# Select a column (returns Series)
ages = df["age"]
print(ages)
# Select multiple columns (returns DataFrame)
subset = df[["name", "age"]]
# Select rows by index
first_row = df.iloc[0] # By position
first_three = df.iloc[0:3] # Slice by position
# Select rows by label
row = df.loc[0] # By label (same as iloc if using default index)
# Select specific cell
value = df.iloc[0, 1] # Row 0, column 1
value = df.loc[0, "age"] # Row 0, column "age"
Filtering Data
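Filtering works by building a boolean Series (a mask) with one True/False value per row, then using that mask to index the DataFrame. A minimal sketch, assuming a small hypothetical frame named `people`:

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["NYC", "LA", "Chicago"],
})

mask = people["age"] >= 30       # one boolean per row
print(mask.tolist())             # [False, True, True]

over_30 = people[mask]           # keeps rows where the mask is True
not_nyc = people[~(people["city"] == "NYC")]  # ~ negates a mask
print(over_30["name"].tolist())  # ['Bob', 'Charlie']
print(not_nyc["name"].tolist())  # ['Bob', 'Charlie']
```

When combining masks with `&` or `|`, each comparison needs its own parentheses because those operators bind more tightly than `<` or `==` in Python.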
python
# Filter rows based on condition
adults = df[df["age"] >= 18]
print(adults)
# Multiple conditions (use & for AND, | for OR)
young_nyc = df[(df["age"] < 30) & (df["city"] == "NYC")]
# Filter using isin()
cities = ["NYC", "LA"]
filtered = df[df["city"].isin(cities)]
# Filter using string methods (assign the result to keep it)
starts_with_a = df[df["name"].str.startswith("A")]
contains_li = df[df["name"].str.contains("li")]
Basic Operations
python
# Math on columns
df["age_in_months"] = df["age"] * 12
# Aggregate functions
print(df["age"].mean()) # Average
print(df["age"].sum()) # Total
print(df["age"].min()) # Minimum
print(df["age"].max()) # Maximum
print(df["age"].std()) # Standard deviation
# Value counts
print(df["city"].value_counts())
# NYC 1
# LA 1
# Chicago 1
# Sort
df_sorted = df.sort_values("age", ascending=False)
Handling Missing Data
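Missing values show up as `NaN`, and aggregate functions such as `mean()` skip them by default. The sketch below assumes a hypothetical `scores` frame with one gap and fills that gap with the column's own mean:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "score": [80.0, np.nan, 90.0],
})

print(scores["score"].isna().sum())  # 1

mean_score = scores["score"].mean()  # NaN is skipped: (80 + 90) / 2 = 85.0
scores["score"] = scores["score"].fillna(mean_score)
print(scores["score"].tolist())      # [80.0, 85.0, 90.0]
```

Filling one column at a time keeps a numeric mean from being written into unrelated columns.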
python
# Check for missing values
print(df.isna().sum()) # Count NaN per column
# Drop rows with any missing values
df_clean = df.dropna()
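# Fill missing values
df_filled = df.fillna(0) # Fill every missing value with 0
df_filled = df.fillna({"age": df["age"].mean()}) # Fill the "age" column with its own mean
# Fill forward/backward
df_filled = df.ffill() # Forward fill (fillna(method="ffill") is deprecated in recent pandas)
df_filled = df.bfill() # Backward fill
Grouping Data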
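Grouping is easier to follow when each group holds more than one row. A minimal sketch, assuming a hypothetical `sales` frame with repeated cities:

```python
import pandas as pd

sales = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA", "LA"],
    "amount": [100, 300, 50, 150, 100],
})

# Split rows by city, then compute the mean and row count within each group
summary = sales.groupby("city")["amount"].agg(["mean", "count"])

print(summary.loc["NYC", "mean"])   # 200.0
print(summary.loc["LA", "mean"])    # 100.0
print(summary.loc["LA", "count"])   # 3
```

The grouping column becomes the index of the result, which is why the values are looked up with `loc` and a city name.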
python
# Group by one column
grouped = df.groupby("city")["age"].mean()
print(grouped)
# city
# Chicago    35.0
# LA         30.0
# NYC        25.0
# Name: age, dtype: float64
# Multiple aggregations
stats = df.groupby("city")["age"].agg(["mean", "min", "max", "count"])
print(stats)
Saving Data
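A quick way to check what gets written is a round trip through an in-memory buffer. The sketch below assumes a hypothetical two-row frame and uses `io.StringIO` in place of a file path:

```python
import io
import pandas as pd

people = pd.DataFrame({"name": ["Alice", "Bob"], "age": [25, 30]})

buffer = io.StringIO()
people.to_csv(buffer, index=False)       # index=False omits the row labels
print(buffer.getvalue().splitlines())    # ['name,age', 'Alice,25', 'Bob,30']

buffer.seek(0)                           # rewind before reading back
restored = pd.read_csv(buffer)
print(restored.equals(people))           # True
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then comes back as a column called `Unnamed: 0` when the file is read again.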
python
# Save to CSV
df.to_csv("output.csv", index=False)
# Save to Excel (.xlsx needs an Excel engine such as openpyxl installed)
df.to_excel("output.xlsx", index=False)
Practice
- Load a CSV file and display basic statistics
- Filter a dataset to show only rows matching certain criteria
- Calculate the average value per category using groupby
- Handle missing values by filling them with the column mean
- Create a new column based on calculations from existing columns
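One possible way to work through these exercises is sketched below. It is a starting point, not a reference solution. The dataset, the column names (`category`, `price`, `quantity`), and the threshold of 3.0 are all hypothetical, and the in-memory text stands in for a real CSV file.

```python
import io
import pandas as pd

# Hypothetical dataset; a real exercise would pass a file path to pd.read_csv
csv_text = """category,price,quantity
fruit,2.0,10
fruit,,5
dairy,4.0,3
dairy,6.0,
"""
items = pd.read_csv(io.StringIO(csv_text))

# Load and display basic statistics
print(items.describe())

# Fill missing values with each column's mean
for col in ["price", "quantity"]:
    items[col] = items[col].fillna(items[col].mean())

# Create a new column from existing columns
items["total"] = items["price"] * items["quantity"]

# Filter rows matching a criterion
expensive = items[items["price"] > 3.0]
print(len(expensive))  # 3

# Average value per category
avg_total = items.groupby("category")["total"].mean()
print(avg_total["fruit"], avg_total["dairy"])  # 20.0 24.0
```

The missing values are filled before `total` is computed, since multiplying by `NaN` would otherwise produce `NaN` totals.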