Following my deployment of a Blue/Green Node.js setup behind Nginx using only pre-built images, I had to introduce a new container: an alert watcher built with Python that tails Nginx access logs, detects blue/green pool flips, monitors 5xx error rates, and pushes alerts to Slack when thresholds are breached. In a nutshell, the goal was to make the setup self-observing.

In simpler terms:

When traffic shifts from blue → green, I should know.

When our service starts spitting 500s, I should also know.

The watcher was a Python script, alert_watcher.py, designed to do three main things:

  1. Tail /var/log/nginx/access.log
  2. Parse each request line, capturing the pool header (X-App-Pool) and status code
  3. Detect two events:
  • When the active pool switches between blue and green
  • When 5xx errors cross a defined threshold (e.g., 5 or more in a 10-second window)
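Conceptually, the core loop is small. Here is a minimal sketch of the tail-and-parse part (not the actual script; the log path, the WATCH_LOG env var, the regexes, and the handle_5xx placeholder are assumptions based on the structured log format shown later in this post):

import os
import re
import time

# assumed: the watcher tails whichever log file Nginx writes into the shared volume
LOG_PATH = os.environ.get("WATCH_LOG", "/var/log/nginx/access.log")

STATUS_RE = re.compile(r'" (\d{3}) ')      # status code right after the quoted request
POOL_RE = re.compile(r'pool="([^"|]+)')    # first value of the pool="..." field

def tail(path):
    """Yield lines appended to the file, like tail -f."""
    with open(path) as fh:
        fh.seek(0, os.SEEK_END)            # start at the end: only new requests
        while True:
            line = fh.readline()
            if not line:
                time.sleep(0.2)
                continue
            yield line

def handle_5xx():
    """Placeholder: the sliding-window threshold check is sketched later in the post."""
    pass

def watch():
    last_pool = None
    for line in tail(LOG_PATH):
        pool_match = POOL_RE.search(line)
        status_match = STATUS_RE.search(line)
        if pool_match:
            pool = pool_match.group(1)
            if last_pool and pool != last_pool:
                print(f"[INFO] Pool flip detected: {last_pool} → {pool}")
            last_pool = pool
        if status_match and status_match.group(1).startswith("5"):
            handle_5xx()

if __name__ == "__main__":
    watch()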

I started off inside my blue_green_deployment repo, with the familiar Docker Compose setup: nginx, app_blue, app_green, and alert_watcher.

version: "3.8"

services:
  app_blue:
    image: "${BLUE_IMAGE}"
    env_file: .env
    environment:
      - APP_POOL=blue
      - RELEASE_ID=${RELEASE_ID_BLUE}
      - PORT=${PORT}
    ports:
      - "${BLUE_DIRECT_PORT}:${PORT}"
    restart: unless-stopped

  app_green:
    image: "${GREEN_IMAGE}"
    env_file: .env
    environment:
      - APP_POOL=green
      - RELEASE_ID=${RELEASE_ID_GREEN}
      - PORT=${PORT}
    ports:
      - "${GREEN_DIRECT_PORT}:${PORT}"
    restart: unless-stopped

  nginx:
    image: nginx:stable-alpine
    env_file: .env
    depends_on:
      - app_blue
      - app_green
    ports:
      - "${NGINX_PUBLIC_PORT}:80"  
    volumes:
      - ./nginx/nginx.template.conf:/etc/nginx/nginx.template.conf
      - ./nginx/entrypoint.sh:/tmp/entrypoint.sh
      - logs:/var/log/nginx   # shared volume
    command: ["sh", "-c", "chmod +x /tmp/entrypoint.sh && /tmp/entrypoint.sh"]
    restart: unless-stopped

  alert_watcher:
    build: ./watcher
    env_file: .env
    depends_on:
      - nginx
    volumes:
      - logs:/var/log/nginx
    restart: unless-stopped

volumes:
  logs:

Then I updated the nginx.template.conf file below:

user nginx;
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    # upstreams: primary + backup
    upstream node_app {
        server app_blue:${PORT} max_fails=1 fail_timeout=3s;
        server app_green:${PORT} backup;
        keepalive 32;
    }

    # --- custom log format: captures pool, release, upstream status, addr, timings
    log_format structured '$remote_addr - $remote_user [$time_local] '
        '"$request" $status $body_bytes_sent '
        'pool="$upstream_http_x_app_pool|$http_x_app_pool" '
        'release="$upstream_http_x_release_id|$http_x_release_id" '
        'upstream_status="$upstream_status" upstream_addr="$upstream_addr" '
        'request_time="$request_time" upstream_response_time="$upstream_response_time" '
        'referrer="$http_referer" ua="$http_user_agent"';

    access_log /var/log/nginx/structured_access.log structured;
    error_log /var/log/nginx/error.log warn;

    server {
        listen 80;

        location / {
            # forward to upstream pool
            proxy_pass http://node_app;

            # upstream behaviour
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
            proxy_next_upstream_tries 2;
            proxy_connect_timeout 3s;
            proxy_send_timeout 3s;
            proxy_read_timeout 3s;

            # preserve host and forward headers
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Let upstream include response headers like X-App-Pool and X-Release-Id.
            # We log them via $upstream_http_x_app_pool and $upstream_http_x_release_id.
        }
    }
}
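
That file is a template rather than a live config: the ${PORT} placeholders have to be substituted when the container starts. The entrypoint.sh mounted in the compose file isn't reproduced here, but a minimal sketch would look like this (assuming envsubst, which the official nginx images ship with):

#!/bin/sh
set -e
# render the template with values from .env, then run nginx in the foreground
envsubst '${PORT}' < /etc/nginx/nginx.template.conf > /etc/nginx/nginx.conf
exec nginx -g 'daemon off;'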

Previously, I had Nginx acting as the gatekeeper, deciding whether Blue or Green handled traffic. Now, the focus shifted to operational visibility.

To do that, I built a small alert_watcher.py and paired it with a simple but powerful Dockerfile. The idea was to make the watcher stateless and disposable: if it crashed, Docker would restart it immediately without affecting the main Nginx or app containers.

That design made it feel like a true sidecar: not interfering, but always observing.

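The Dockerfile itself stayed tiny. The original isn't reproduced here, but a sketch along the same lines (base image, file names, and the requests dependency for Slack are assumptions) looks like:

FROM python:3.12-alpine
WORKDIR /app
# requests is assumed for the Slack webhook call; everything else is stdlib
RUN pip install --no-cache-dir requests
COPY alert_watcher.py .
# -u keeps Python's stdout unbuffered so docker compose logs streams in real time
CMD ["python", "-u", "alert_watcher.py"]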

I wired everything up with environment variables in .env:

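The actual values aren't shown here, but based on the variables the compose file references (plus assumed watcher-specific names for the Slack webhook and thresholds), it looked roughly like this:

# images and releases
BLUE_IMAGE=ghcr.io/example/node-app:1.0.0
GREEN_IMAGE=ghcr.io/example/node-app:1.1.0
RELEASE_ID_BLUE=1.0.0
RELEASE_ID_GREEN=1.1.0

# ports
PORT=3000
BLUE_DIRECT_PORT=8081
GREEN_DIRECT_PORT=8082
NGINX_PUBLIC_PORT=8080

# watcher settings (names are assumptions)
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ
ERROR_THRESHOLD=5
WINDOW_SECONDS=10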

With that done, I ran:

docker compose up -d

and began watching the logs dance in real time.

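With the structured log format above, each request lands in structured_access.log as a line like this (values illustrative):

172.18.0.1 - - [09/Oct/2025:12:00:01 +0000] "GET / HTTP/1.1" 200 612 pool="blue|" release="1.0.0|" upstream_status="200" upstream_addr="172.18.0.3:3000" request_time="0.004" upstream_response_time="0.004" referrer="-" ua="curl/8.5.0"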

Watching Logs and Reading Between the Lines

I started by simulating traffic, but first I checked the status of the running containers:

docker compose ps
curl -v http://localhost:8080/

Then I checked the watcher logs:

docker compose logs -f alert_watcher

and confirmed responses with the X-App-Pool header:

curl -I http://localhost:8080 | grep X-App-Pool

It felt good seeing X-App-Pool: blue appear in the headers, meaning Nginx was routing correctly to the primary pool.

Then I switched pools manually to trigger the watcher:

docker compose stop app_blue
curl -I http://localhost:8080 | grep X-App-Pool

The moment X-App-Pool flipped to green, the watcher detected it and logged a clean alert:

[INFO] Pool flip detected: blue → green

That part worked beautifully. But the real test was simulating errors.

Simulating a 5xx Storm

This was the tricky part. I wanted to create enough server errors to breach the threshold and trigger a Slack alert.

After a few trials, I found a reliable approach: stop the active app container and flood Nginx with requests so it responds with 502 Bad Gateway.

For instance:

docker compose stop app_green
for i in $(seq 1 200); do
  curl -s -o /dev/null -w "%{http_code} " http://localhost:8080;
  sleep 0.05;
done; echo

Within seconds, my terminal filled with:

502 502 502 502 502 502 ...

Then, flipping over to the watcher logs:

docker compose logs -f alert_watcher

I saw the magic line appear:

[ALERT] High 5xx error rate detected: 6 errors within 10s

A few moments later, the same message appeared on Slack — clean, structured, and timestamped. The threshold logic had worked exactly as designed.

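Getting the message into Slack is just a POST to an incoming webhook. A minimal sketch of the notifier (function and env var names are assumptions; Slack incoming webhooks accept a JSON body with a text field):

import os
import requests

def notify_slack(message: str) -> None:
    """Post a plain-text alert to the configured Slack incoming webhook."""
    webhook = os.environ["SLACK_WEBHOOK_URL"]   # assumed env var name
    resp = requests.post(webhook, json={"text": message}, timeout=5)
    resp.raise_for_status()                     # surface failures in the watcher logs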

The Bottlenecks and Debugging Pitfalls

This stage wasn't without its pain points. Some of the bottlenecks I ran into:

Nginx Log Path Confusion: My watcher initially couldn't find /var/log/nginx/access.log because it was reading from the container's internal path, not the mounted one. I fixed it by ensuring a shared volume between nginx and alert_watcher.

Silent Failures in the Watcher: The first few runs didn't send Slack alerts because the script was silently failing on missing environment variables. I learned to add sanity checks at startup to validate required env vars.
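The fix was a handful of lines at startup, something along these lines (variable names assumed):

import os
import sys

REQUIRED_VARS = ["SLACK_WEBHOOK_URL", "ERROR_THRESHOLD", "WINDOW_SECONDS"]  # assumed names

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    sys.exit(f"[FATAL] Missing required environment variables: {', '.join(missing)}")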

Threshold Logic Tuning: Setting the right detection window was tricky. Too small, and you get false positives. Too large, and you miss spikes. I settled on a 10-second window with a 5-error threshold, which felt balanced for testing.
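The window logic itself stays small: a deque of recent 5xx timestamps is enough. A sketch, with the threshold and window wired to the env var names assumed above:

import os
import time
from collections import deque

WINDOW_SECONDS = int(os.environ.get("WINDOW_SECONDS", 10))
ERROR_THRESHOLD = int(os.environ.get("ERROR_THRESHOLD", 5))

recent_5xx = deque()   # timestamps of recent 5xx responses

def handle_5xx() -> bool:
    """Record one 5xx and report whether the threshold is breached."""
    now = time.time()
    recent_5xx.append(now)
    # discard entries that have fallen out of the window
    while recent_5xx and now - recent_5xx[0] > WINDOW_SECONDS:
        recent_5xx.popleft()
    if len(recent_5xx) >= ERROR_THRESHOLD:
        print(f"[ALERT] High 5xx error rate detected: {len(recent_5xx)} errors within {WINDOW_SECONDS}s")
        return True
    return False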

Docker Restarts: At one point, every small code change meant restarting all services — which was inefficient. I eventually learned to recreate only the watcher service:

docker compose up -d --no-deps --force-recreate alert_watcher

That kept the log flow uninterrupted while iterating quickly.

When It Finally Clicked

The real moment of clarity came when I watched both behaviors happen in sequence:

A pool flip alert when I stopped app_blue

A 5xx threshold alert when I flooded traffic during downtime

That was when the system felt "alive." It wasn't just serving; it was observing.

My main takeaway was:

  • Every deployment pipeline should have built-in feedback, not just success/failure signals.

This felt like the moment our blue/green setup grew eyes and ears. And as someone who's constantly thinking about automation and resilience, this stage reminded me that knowing when things go wrong is just as important as making things go right.