Pandas is a powerhouse for data manipulation in Python, and knowing its hidden gems can save you hours of frustration. In this article, I'll walk you through 10 practical Pandas tricks using a dummy dataset of a small online store's sales. Each trick includes code, a sample dataset, and the result — so you can see the magic in action. Let's get started!
First, here's our dummy dataset:
import pandas as pd

data = {
    'order_id': [1, 2, 3, 4, 5],
    'customer': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eve'],
    'age': [25, 40, None, 33, 28],
    'sales': [100, 200, 150, 300, 50],
    'region': ['North', 'South', 'East', 'West', 'North'],
    'hobbies': [['reading', 'gaming'], ['swimming'], ['coding', 'hiking'], [], ['painting']],
    'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
This dataset tracks orders with customer details. Now, let's dive into the tricks!
1. Speed Up Filtering with query()
Boolean masks can get clunky. Use query() for a cleaner alternative (and, with numexpr installed, often a faster one on large frames).
# Filter customers over 30
result = df.query('age > 30')
  order_id customer   age  sales region     hobbies gender
1        2      Bob  40.0    200  South  [swimming]      M
3        4     Dana  33.0    300   West          []      F
It's concise and perfect for complex conditions like df.query('age > 30 and sales < 250').
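query() can also reference Python variables with the @ prefix, which keeps thresholds out of the query string. A minimal sketch (the small DataFrame and the min_age variable are illustrative, not part of our store dataset):

```python
import pandas as pd

df = pd.DataFrame({
    'customer': ['Alice', 'Bob', 'Dana'],
    'age': [25.0, 40.0, 33.0],
    'sales': [100, 200, 300],
})

# Reference a local Python variable inside the query string with @
min_age = 30
result = df.query('age > @min_age and sales < 250')
print(result)  # only Bob: age 40 > 30 and sales 200 < 250
```

This avoids string formatting tricks when the threshold comes from user input or a config value.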
2. Handle Missing Data with fillna() and a Dictionary
Missing values? Fill them smartly with column-specific defaults.
# Fill missing age with the mean
values = {'age': df['age'].mean()} # Mean ≈ 31.5
df_filled = df.fillna(values)
order_id customer age sales region hobbies gender
0 1 Alice 25.0 100 North [reading, gaming] F
1 2 Bob 40.0 200 South [swimming] M
2 3 Charlie 31.5 150 East [coding, hiking] M
3 4 Dana 33.0 300 West [] F
4 5 Eve 28.0 50 North [painting] F
Charlie's missing age is now a sensible 31.5.
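The dictionary really shines when several columns need different defaults: one fillna() call handles them all. A small sketch with an illustrative two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'age': [25, None], 'city': ['Oslo', None]})

# One fillna() call, a different default per column
df_filled = df.fillna({'age': df['age'].mean(), 'city': 'Unknown'})
print(df_filled)
```

Numeric gaps get a statistic, text gaps get a sentinel, and columns not named in the dictionary are left untouched.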
3. Chain Operations with pipe()
Avoid spaghetti code — use pipe() to chain transformations.
def clean_data(df):
    return df.dropna().reset_index(drop=True)

def discount_sales(df, col):
    df = df.copy()  # avoid mutating the caller's frame
    df[col] = df[col] * 0.9  # 10% discount
    return df
df_piped = (df.pipe(clean_data)
.pipe(discount_sales, 'sales'))
order_id customer age sales region hobbies gender
0 1 Alice 25.0 90.0 North [reading, gaming] F
1 2 Bob 40.0 180.0 South [swimming] M
2 4 Dana 33.0 270.0 West [] F
3 5 Eve 28.0 45.0 North [painting] F
Clean and discounted in one smooth flow!
4. Pivot Tables with pivot_table()
Summarize data like a pro with pivot_table().
pivot = df.pivot_table(values='sales', index='region', aggfunc='sum', fill_value=0)
sales
region
East 150
North 150
South 200
West 300
Instant insights into regional sales totals.
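pivot_table() also takes a columns= argument for a two-dimensional summary. A quick sketch with an illustrative mini-frame, crossing region against gender:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North'],
    'gender': ['F', 'M', 'F'],
    'sales': [100, 200, 50],
})

# columns= adds a second grouping dimension; fill_value=0
# replaces the NaNs where a combination has no rows
pivot = df.pivot_table(values='sales', index='region', columns='gender',
                       aggfunc='sum', fill_value=0)
print(pivot)
```

The result is a region x gender grid of sales totals, with zeros where a combination never occurs.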
5. Bin Data with cut()
Group continuous data into categories with cut().
bins = [0, 30, 50, 100]
labels = ['Young', 'Middle', 'Senior']
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels)
order_id customer age sales region hobbies gender age_group
0 1 Alice 25.0 100 North [reading, gaming] F Young
1 2 Bob 40.0 200 South [swimming] M Middle
2 3 Charlie NaN 150 East [coding, hiking] M NaN
3 4 Dana 33.0 300 West [] F Middle
4 5 Eve 28.0 50 North [painting] F Young
Age groups make analysis a breeze (NaN stays NaN unless filled).
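If you want Charlie binned too, fill the missing age first (with the mean, as in trick 2) and then cut. A sketch on a standalone Series with the same ages:

```python
import pandas as pd

ages = pd.Series([25, 40, None, 33, 28], name='age')
bins = [0, 30, 50, 100]
labels = ['Young', 'Middle', 'Senior']

# Fill the gap before binning so no row falls out as NaN
filled = ages.fillna(ages.mean())  # mean of the known ages = 31.5
groups = pd.cut(filled, bins=bins, labels=labels)
print(groups.tolist())  # ['Young', 'Middle', 'Middle', 'Middle', 'Young']
```

Charlie's imputed 31.5 lands in the (30, 50] bin, so he joins the Middle group instead of dropping out.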
6. Merge with Indicators
Track merge results with indicator=True. Let's merge with a second dataset:
df2 = pd.DataFrame({'order_id': [1, 2, 6], 'status': ['Shipped', 'Pending', 'Canceled']})
merged = df.merge(df2, on='order_id', how='outer', indicator=True)
order_id customer age sales region hobbies gender status _merge
0 1 Alice 25.0 100.0 North [reading, gaming] F Shipped both
1 2 Bob 40.0 200.0 South [swimming] M Pending both
2 3 Charlie NaN 150.0 East [coding, hiking] M NaN left_only
3 4 Dana 33.0 300.0 West [] F NaN left_only
4 5 Eve 28.0 50.0 North [painting] F NaN left_only
5 6 NaN NaN NaN NaN NaN NaN Canceled right_only
The _merge column shows what matched and what didn't.
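A quick way to audit a merge is to count the _merge categories instead of eyeballing rows. Reproducing just the keys from the example above:

```python
import pandas as pd

df = pd.DataFrame({'order_id': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'order_id': [1, 2, 6],
                    'status': ['Shipped', 'Pending', 'Canceled']})

merged = df.merge(df2, on='order_id', how='outer', indicator=True)

# value_counts() summarizes how many rows matched on each side
print(merged['_merge'].value_counts())
```

One glance tells you whether an outer join silently dropped nothing or quietly produced a pile of one-sided rows.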
7. Flatten Lists with explode()
Unpack list columns into rows with explode().
df_exploded = df.explode('hobbies')
order_id customer age sales region hobbies gender age_group
0 1 Alice 25.0 100 North reading F Young
0 1 Alice 25.0 100 North gaming F Young
1 2 Bob 40.0 200 South swimming M Middle
2 3 Charlie NaN 150 East coding M NaN
2 3 Charlie NaN 150 East hiking M NaN
3 4 Dana 33.0 300 West NaN F Middle
4 5 Eve 28.0 50 North painting F Young
Hobbies are now neatly separated.
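Since pandas 1.3, explode() also accepts a list of columns whose lists line up element-for-element. A sketch with an illustrative frame (the scores column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'order_id': [1, 2],
    'hobbies': [['reading', 'gaming'], []],
    'scores': [[7, 9], []],
})

# Explode two aligned list columns at once (pandas >= 1.3);
# per-row list lengths must match across the named columns
out = df.explode(['hobbies', 'scores'])
print(out)
```

Empty lists still collapse to a single NaN row, just as in the single-column case.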
8. Style Your DataFrames
Add flair to outputs with style.
styled = df[['customer', 'sales']].style.highlight_max(subset='sales', color='lightgreen')
# Use styled in Jupyter, or styled.to_html() for export
Outcome (visual in Jupyter): Dana's 300 in sales is highlighted in light green. It's a simple way to spotlight key data.
9. Save Memory with Categorical Data
Convert repetitive columns to category type.
df['gender'] = df['gender'].astype('category')
No visible change in output, but memory usage drops (check with df.memory_usage(deep=True) — deep=True counts the actual string storage). For larger datasets, this is a lifesaver.
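On five rows the savings are negligible, so here's a rough illustration on a larger, repetitive column (the 10,000x repetition is purely for demonstration):

```python
import pandas as pd

# A repetitive string column: as object dtype, every row
# stores its own Python string reference
gender = pd.Series(['F', 'M', 'M', 'F', 'F'] * 10_000)

as_object = gender.memory_usage(deep=True)
as_category = gender.astype('category').memory_usage(deep=True)

print(f'object:   {as_object:,} bytes')
print(f'category: {as_category:,} bytes')
```

The categorical version stores the two unique labels once and a small integer code per row, which is where the drop comes from.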
10. Rolling Calculations with rolling()
Smooth data with rolling averages.
df['sales_roll_avg'] = df['sales'].rolling(window=2, min_periods=1).mean()
order_id customer age sales region hobbies gender age_group sales_roll_avg
0 1 Alice 25.0 100 North [reading, gaming] F Young 100.0
1 2 Bob 40.0 200 South [swimming] M Middle 150.0
2 3 Charlie NaN 150 East [coding, hiking] M NaN 175.0
3 4 Dana 33.0 300 West [] F Middle 225.0
4 5 Eve 28.0 50 North [painting] F Young 175.0
A quick way to smooth sales across consecutive orders.
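rolling() also composes with groupby() when each group should be smoothed independently. A sketch on an illustrative mini-frame:

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['North', 'South', 'North'],
    'sales': [100, 200, 50],
})

# Rolling average computed independently within each region;
# the result is indexed by (region, original row index)
per_region = df.groupby('region')['sales'].rolling(window=2, min_periods=1).mean()
print(per_region)
```

This keeps a South spike from bleeding into North's moving average, which a plain rolling() over the whole column would happily do.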
Wrap-Up
These 10 Pandas tricks transform how you handle data — from cleaning to visualizing. With our dummy store dataset, you've seen each one in action. Apply them to your own projects, and you'll cut wrangling time while boosting clarity.
Got a favorite trick? Drop it in the comments — I'd love to hear it!
If you enjoyed this article, please consider buying me a coffee as a token of appreciation for my work: Buy me a Coffee from here
Please, feel free to contact me if you have any questions, and consider following me to stay informed about new articles.