Pandas is a powerhouse for data manipulation in Python, and knowing its hidden gems can save you hours of frustration. In this article, I'll walk you through 10 practical Pandas tricks using a dummy dataset of a small online store's sales. Each trick includes code, a sample dataset, and the result — so you can see the magic in action. Let's get started!
First, here's our dummy dataset:
import pandas as pd

data = {
    'order_id': [1, 2, 3, 4, 5],
    'customer': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eve'],
    'age': [25, 40, None, 33, 28],
    'sales': [100, 200, 150, 300, 50],
    'region': ['North', 'South', 'East', 'West', 'North'],
    'hobbies': [['reading', 'gaming'], ['swimming'], ['coding', 'hiking'], [], ['painting']],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
This dataset tracks orders with customer details. Now, let's dive into the tricks!
1. Speed Up Filtering with query()
Boolean masks can get clunky. Use query() for a cleaner alternative (and, with numexpr installed, often a faster one on large frames).
# Filter customers over 30
result = df.query('age > 30')
  order_id customer   age  sales region     hobbies gender
1        2      Bob  40.0    200  South  [swimming]      M
3        4     Dana  33.0    300   West          []      F
It's concise and perfect for complex conditions like df.query('age > 30 and sales < 250').
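query() can also reference Python variables with the @ prefix, which keeps thresholds out of the query string. A minimal sketch (the small DataFrame and the min_age variable are illustrative, not part of our store dataset):

```python
import pandas as pd

df = pd.DataFrame({
    'customer': ['Alice', 'Bob', 'Dana'],
    'age': [25.0, 40.0, 33.0],
    'sales': [100, 200, 300],
})

# Reference a local Python variable inside the query string with @
min_age = 30
result = df.query('age > @min_age and sales < 250')
print(result)  # only Bob: age 40 > 30 and sales 200 < 250
```

This avoids string formatting tricks when the threshold comes from user input or a config value.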
2. Handle Missing Data with fillna() and a Dictionary
Missing values? Fill them smartly with column-specific defaults.
# Fill missing age with the mean
values = {'age': df['age'].mean()} # Mean ≈ 31.5
df_filled = df.fillna(values)
order_id customer age sales region hobbies gender
0 1 Alice 25.0 100 North [reading, gaming] F
1 2 Bob 40.0 200 South [swimming] M
2 3 Charlie 31.5 150 East [coding, hiking] M
3 4 Dana 33.0 300 West [] F
4 5 Eve 28.0 50 North [painting] F
Charlie's missing age is now a sensible 31.5.
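The dictionary really shines when several columns need different defaults: one fillna() call handles them all. A small sketch with an illustrative two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'age': [25, None], 'city': ['Oslo', None]})

# One fillna() call, a different default per column
df_filled = df.fillna({'age': df['age'].mean(), 'city': 'Unknown'})
print(df_filled)
```

Numeric gaps get a statistic, text gaps get a sentinel, and columns not named in the dictionary are left untouched.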
3. Chain Operations with pipe()
Avoid spaghetti code — use pipe() to chain transformations.
def clean_data(df):
    return df.dropna().reset_index(drop=True)

def discount_sales(df, col):
    df = df.copy()  # avoid mutating the caller's frame
    df[col] = df[col] * 0.9  # 10% discount
    return df
df_piped = (df.pipe(clean_data)
.pipe(discount_sales, 'sales'))
order_id customer age sales region hobbies gender
0 1 Alice 25.0 90.0 North [reading, gaming] F
1 2 Bob 40.0 180.0 South [swimming] M
2 4 Dana 33.0 270.0 West [] F
3 5 Eve 28.0 45.0 North [painting] F
Clean and discounted in one smooth flow!
4. Pivot Tables with pivot_table()
Summarize data like a pro with pivot_table().
pivot = df.pivot_table(values='sales', index='region', aggfunc='sum', fill_value=0)
sales
region
East 150
North 150
South 200
West 300
Instant insights into regional sales totals.
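pivot_table() also takes a columns= argument for a two-dimensional summary. A quick sketch with an illustrative mini-frame, crossing region against gender:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North'],
    'gender': ['F', 'M', 'F'],
    'sales': [100, 200, 50],
})

# columns= adds a second grouping dimension; fill_value=0
# replaces the NaNs where a combination has no rows
pivot = df.pivot_table(values='sales', index='region', columns='gender',
                       aggfunc='sum', fill_value=0)
print(pivot)
```

The result is a region x gender grid of sales totals, with zeros where a combination never occurs.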
5. Bin Data with cut()
Group continuous data into categories with cut().
bins = [0, 30, 50, 100]
labels = ['Young', 'Middle', 'Senior']
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)
order_id customer age sales region hobbies gender age_group
0 1 Alice 25.0 100 North [reading, gaming] F Young
1 2 Bob 40.0 200 South [swimming] M Middle
2 3 Charlie NaN 150 East [coding, hiking] M NaN
3 4 Dana 33.0 300 West [] F Middle
4 5 Eve 28.0 50 North [painting] F Young
Age groups make analysis a breeze (NaN stays NaN unless filled).
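If you want Charlie binned too, fill the missing age first (with the mean, as in trick 2) and then cut. A sketch on a standalone Series with the same ages:

```python
import pandas as pd

ages = pd.Series([25, 40, None, 33, 28], name='age')
bins = [0, 30, 50, 100]
labels = ['Young', 'Middle', 'Senior']

# Fill the gap before binning so no row falls out as NaN
filled = ages.fillna(ages.mean())  # mean of the known ages = 31.5
groups = pd.cut(filled, bins=bins, labels=labels)
print(groups.tolist())  # ['Young', 'Middle', 'Middle', 'Middle', 'Young']
```

Charlie's imputed 31.5 lands in the (30, 50] bin, so he joins the Middle group instead of dropping out.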
6. Merge with Indicators
Track merge results with indicator=True. Let's merge with a second dataset:
df2 = pd.DataFrame({'order_id': [1, 2, 6], 'status': ['Shipped', 'Pending', 'Canceled']})
merged = df.merge(df2, on='order_id', how='outer', indicator=True)
order_id customer age sales region hobbies gender status _merge
0 1 Alice 25.0 100.0 North [reading, gaming] F Shipped both
1 2 Bob 40.0 200.0 South [swimming] M Pending both
2 3 Charlie NaN 150.0 East [coding, hiking] M NaN left_only
3 4 Dana 33.0 300.0 West [] F NaN left_only
4 5 Eve 28.0 50.0 North [painting] F NaN left_only
5 6 NaN NaN NaN NaN NaN NaN Canceled right_only
The _merge column shows what matched and what didn't.
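A quick way to audit a merge is to count the _merge categories instead of eyeballing rows. Reproducing just the keys from the example above:

```python
import pandas as pd

df = pd.DataFrame({'order_id': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'order_id': [1, 2, 6],
                    'status': ['Shipped', 'Pending', 'Canceled']})

merged = df.merge(df2, on='order_id', how='outer', indicator=True)

# value_counts() summarizes how many rows matched on each side
print(merged['_merge'].value_counts())
```

One glance tells you whether an outer join silently dropped nothing or quietly produced a pile of one-sided rows.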
7. Flatten Lists with explode()
Unpack list columns into rows with explode().
df_exploded = df.explode('hobbies')
order_id customer age sales region hobbies gender age_group
0 1 Alice 25.0 100 North reading F Young
0 1 Alice 25.0 100 North gaming F Young
1 2 Bob 40.0 200 South swimming M Middle
2 3 Charlie NaN 150 East coding M NaN
2 3 Charlie NaN 150 East hiking M NaN
3 4 Dana 33.0 300 West NaN F Middle
4 5 Eve 28.0 50 North painting F Young
Hobbies are now neatly separated.
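Since pandas 1.3, explode() also accepts a list of columns whose lists line up element-for-element. A sketch with an illustrative frame (the scores column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [1, 2],
    'hobbies': [['reading', 'gaming'], []],
    'scores': [[7, 9], []],
})

# Explode two aligned list columns at once (pandas >= 1.3);
# per-row list lengths must match across the named columns
out = df.explode(['hobbies', 'scores'])
print(out)
```

Empty lists still collapse to a single NaN row, just as in the single-column case.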
8. Style Your DataFrames
Add flair to outputs with style.
styled = df[['customer', 'sales']].style.highlight_max(subset='sales', color='lightgreen')
# Use styled in Jupyter, or styled.to_html() for export
Outcome (visual in Jupyter): Dana's 300 in sales is highlighted in light green. It's a simple way to spotlight key data.
9. Save Memory with Categorical Data
Convert repetitive columns to category type.
df['gender'] = df['gender'].astype('category')
No visible change in output, but memory usage drops (check with df.memory_usage(deep=True) — deep=True counts the actual string storage). For larger datasets, this is a lifesaver.
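On five rows the savings are negligible, so here's a rough illustration on a larger, repetitive column (the 10,000x repetition is purely for demonstration):

```python
import pandas as pd

# A repetitive string column: as object dtype, every row
# stores its own Python string reference
gender = pd.Series(['F', 'M', 'M', 'F', 'F'] * 10_000)

as_object = gender.memory_usage(deep=True)
as_category = gender.astype('category').memory_usage(deep=True)

print(f'object:   {as_object:,} bytes')
print(f'category: {as_category:,} bytes')
```

The categorical version stores the two unique labels once and a small integer code per row, which is where the drop comes from.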
10. Rolling Calculations with rolling()
Smooth data with rolling averages.
df['sales_roll_avg'] = df['sales'].rolling(window=2, min_periods=1).mean()
order_id customer age sales region hobbies gender age_group sales_roll_avg
0 1 Alice 25.0 100 North [reading, gaming] F Young 100.0
1 2 Bob 40.0 200 South [swimming] M Middle 150.0
2 3 Charlie NaN 150 East [coding, hiking] M NaN 175.0
3 4 Dana 33.0 300 West [] F Middle 225.0
4 5 Eve 28.0 50 North [painting] F Young 175.0
A quick way to smooth sales across consecutive orders.
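rolling() also composes with groupby() when each group should be smoothed independently. A sketch on an illustrative mini-frame:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North'],
    'sales': [100, 200, 50],
})

# Rolling average computed independently within each region;
# the result is indexed by (region, original row index)
per_region = df.groupby('region')['sales'].rolling(window=2, min_periods=1).mean()
print(per_region)
```

This keeps a South spike from bleeding into North's moving average, which a plain rolling() over the whole column would happily do.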
Wrap-Up
These 10 Pandas tricks transform how you handle data — from cleaning to visualizing. With our dummy store dataset, you've seen each one in action. Apply them to your own projects, and you'll cut wrangling time while boosting clarity.
Got a favorite trick? Drop it in the comments — I'd love to hear it!
If you enjoyed this article, please consider buying me a coffee as a token of appreciation for my work: Buy me a Coffee from here
Please, feel free to contact me if you have any questions, and consider following me to stay informed about new articles.