Over the past four years, I've automated almost every repetitive task in my daily workflow. From moving files around to fetching data from APIs, I've learned that the real magic happens when you don't just build isolated scripts — you connect them into pipelines. Think of it as an assembly line, but powered by Python.
In this article, I'll walk through how I build automation pipelines step by step. I'll cover file handling, APIs, databases, scheduling, error handling, and reporting. By the end, you'll see how to stitch everything together into systems that quietly work for you in the background.
1) Designing the Pipeline Structure
The biggest mistake I see beginners make is writing a single massive script that tries to do everything. That quickly becomes unmanageable. Instead, I design my pipeline as modular tasks that plug into each other.
Here's a simple structure I use for almost everything:
# pipeline.py

def extract():
    """Get data from source"""
    pass

def transform(data):
    """Clean and process the data"""
    pass

def load(data):
    """Save data to destination"""
    pass

def run_pipeline():
    raw_data = extract()
    processed = transform(raw_data)
    load(processed)

if __name__ == "__main__":
    run_pipeline()

This is the classic ETL (Extract, Transform, Load) pattern. It forces me to think about my automation in logical steps.
2) Automating File Handling with os and shutil
Every pipeline needs to deal with files: moving them, cleaning them, archiving them. I use os and shutil as my daily workhorses.
import os
import shutil

def archive_reports(src_folder, dest_folder):
    for file in os.listdir(src_folder):
        if file.endswith(".csv"):
            full_path = os.path.join(src_folder, file)
            shutil.move(full_path, os.path.join(dest_folder, file))
            print(f"Archived {file}")

# Example usage
archive_reports("reports/daily", "reports/archive")

This kind of snippet automatically organizes clutter on my desktop — or in my case, hundreds of CSV exports every month.
3) Fetching Data from APIs with requests
APIs are where pipelines really shine. Python's requests makes fetching external data as easy as reading a file.
import requests

def extract():
    url = "https://jsonplaceholder.typicode.com/todos"
    response = requests.get(url)
    response.raise_for_status()
    return response.json()

# Example test
data = extract()
print(f"Fetched {len(data)} records")

Once you master APIs, you stop depending on manually downloading data. Instead, the data comes to you automatically.
4) Transforming Data with pandas
Almost every dataset needs cleaning before it's useful. That's where pandas becomes the backbone of my pipelines.
import pandas as pd

def transform(data):
    df = pd.DataFrame(data)
    df = df[["id", "title", "completed"]]
    df["completed"] = df["completed"].astype(int)  # Convert bool to int
    return df

# Example usage
df = transform(data)
print(df.head())

By chaining transformations, I can turn raw, messy JSON or CSV into structured data ready for analysis.
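Since chaining is what makes this scale, here's a rough sketch of the same transform written as a single method chain. Same result, just a different style.

import pandas as pd

def transform(data):
    return (
        pd.DataFrame(data)
        .loc[:, ["id", "title", "completed"]]                      # keep only the columns we need
        .assign(completed=lambda d: d["completed"].astype(int))    # bool -> 0/1
    )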
5) Loading Data into Databases with sqlalchemy
Automation isn't just about cleaning files. It's also about storing results in the right place. I rely on sqlalchemy to insert processed data into databases.
from sqlalchemy import create_engine

def load(df):
    engine = create_engine("sqlite:///pipeline.db")
    df.to_sql("todos", engine, if_exists="replace", index=False)
    print("Data loaded into database")

Now the cleaned data lives in a database where it can be queried, visualized, or connected to dashboards.
6) Scheduling Pipelines with schedule
A pipeline isn't useful if you have to run it manually every time. I schedule my scripts using the schedule library.
import schedule
import time

def run_pipeline_job():
    print("Starting pipeline...")
    run_pipeline()
    print("Pipeline completed.")

schedule.every().day.at("07:00").do(run_pipeline_job)

while True:
    schedule.run_pending()
    time.sleep(1)

This script wakes up every morning, runs the pipeline, and quietly delivers results before I even open my laptop.
7) Handling Errors with logging
Every pipeline breaks eventually. Logging is the difference between silently failing and knowing exactly what went wrong.
import logging

logging.basicConfig(
    filename="pipeline.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

def safe_run():
    try:
        run_pipeline()
        logging.info("Pipeline completed successfully")
    except Exception as e:
        logging.error(f"Pipeline failed: {e}")

safe_run()

Logs save me hours of debugging — especially when I'm not even awake when the script runs.
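One refinement worth knowing: logging.exception records the full traceback, not just the error message, which makes overnight failures much easier to reconstruct. A sketch of the same wrapper with that change:

def safe_run():
    try:
        run_pipeline()
        logging.info("Pipeline completed successfully")
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback
        logging.exception("Pipeline failed")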
8) Generating Reports with matplotlib
I like to close the loop by creating a report or visualization from my pipeline. This way, I'm not just storing data — I'm communicating results.
import matplotlib.pyplot as plt

def generate_report(df):
    df["completed"].value_counts().plot(kind="bar")
    plt.title("Tasks Completed vs Not Completed")
    plt.xlabel("Completed (1=Yes, 0=No)")
    plt.ylabel("Count")
    plt.savefig("report.png")

generate_report(df)

Suddenly, a boring pipeline turns into something decision-makers can actually use.
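One caveat once this runs on a schedule: pyplot keeps drawing on the same implicit figure between calls, so repeated runs can pile plots on top of each other. A sketch of the same chart drawn on an explicit figure and closed afterwards:

import matplotlib.pyplot as plt

def generate_report(df, path="report.png"):
    fig, ax = plt.subplots()
    df["completed"].value_counts().plot(kind="bar", ax=ax)
    ax.set_title("Tasks Completed vs Not Completed")
    ax.set_xlabel("Completed (1=Yes, 0=No)")
    ax.set_ylabel("Count")
    fig.savefig(path)
    plt.close(fig)  # release the figure so scheduled runs don't accumulate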
9) Putting It All Together
Here's what a full mini-pipeline looks like when everything is stitched together:
def run_pipeline():
    raw_data = extract()
    processed = transform(raw_data)
    load(processed)
    generate_report(processed)

if __name__ == "__main__":
    safe_run()

And just like that, I've got a pipeline that:
- Fetches data from an API
- Cleans it with pandas
- Loads it into a database
- Logs the results
- Generates a report
This isn't a script anymore — it's a system.
Final Thoughts
The true power of Python isn't in single snippets — it's in pipelines. Once you master modular design, logging, scheduling, and reporting, you can automate entire workflows without touching them again.
My advice? Start small. Automate one annoying task this week. Then plug it into another. Over time, you'll have pipelines that feel like invisible coworkers.
"Don't automate for the sake of automating. Automate what frees your brain for deep work."