When Google Cloud sells you on machine learning, they lead with Vertex AI. It's their flagship ML platform — a unified suite promising end-to-end MLOps with minimal friction. But talk to teams running serious production ML workloads on GCP, and you'll notice something curious: they're not really using Vertex AI.
They're orchestrating with Cloud Composer. Building pipelines with Cloud Run and Cloud Build. Managing state in BigQuery and Cloud Storage. Monitoring with Cloud Logging and custom dashboards. The actual ML platform? It's assembled from GCP primitives.
This isn't an accident. It's a deliberate architectural choice driven by the reality of production ML systems.
## The Common Misconception
Many blogs imply this flow:
Data → Vertex AI → Model → Endpoint → Done
That works for demos, POCs, and AutoML experiments.
But real production MLOps needs:
- CI/CD
- Versioning
- Governance
- Monitoring
- Security
- Rollbacks
- Cost control
- Auditability
Vertex AI does not solve most of these problems alone.
## What Vertex AI Is Actually Good At
Let's be fair. Vertex AI is excellent at:
| Capability | What Vertex AI Does Well |
| -------------- | --------------------------------------- |
| Training | Managed training jobs (custom & AutoML) |
| Pipelines | Orchestrating ML steps |
| Model Registry | Lineage & version tracking |
| Endpoints | Managed online prediction |
| Evaluation | Metric tracking & comparisons |

## The Real GCP MLOps Stack (Production Reality)
### Core MLOps Infrastructure (Mostly Not Vertex)
| MLOps Concern | GCP Service |
| --------------- | ----------------- |
| CI/CD | Cloud Build |
| Artifacts | Artifact Registry |
| IaC | Terraform |
| Secrets | Secret Manager |
| Access Control | IAM |
| Monitoring | Cloud Monitoring |
| Logs | Cloud Logging |
| Metadata | BigQuery |
| Data Versioning | BigQuery + GCS |
| Governance | Audit Logs |
Vertex AI plugs into this stack; it does not replace it.

## Reference Architecture (Mental Model)
Developer Commit
↓
Cloud Build (CI/CD)
↓
Artifact Registry (Images)
↓
Vertex AI Training Job
↓
Model Registry (Versioned)
↓
Vertex AI Endpoint
↓
Monitoring + Logging
↓
BigQuery (Metrics, Drift, Audits)

## End-to-End MLOps on GCP: A Simple Architecture
## Architecture (How to Explain the Diagram)
### Layer 1: Developer & Source Control
Purpose: Control & reproducibility
- GitHub / GitLab
- Code, Dockerfiles, pipeline specs
- Feature logic & model code
Nothing starts in Vertex AI.
### Layer 2: CI/CD & Infrastructure
This is the real MLOps backbone
Services
- Cloud Build → pipeline orchestration
- Artifact Registry → Docker images
- Terraform → infrastructure as code
- Secret Manager → credentials
📌 Why this matters. This layer decides:
- When training starts
- Which environment is targeted
- What version gets deployed
Vertex AI only executes what CI/CD approves.
### Layer 3: Data & Metadata
Source of truth
- BigQuery → datasets, metrics, drift signals
- Cloud Storage → raw artifacts
- Feature snapshots (not feature stores), sketched below
📌 True reproducibility lives here, not in notebooks.
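To make "feature snapshots" concrete, here is a minimal sketch, assuming the BigQuery Python client; the project, dataset, and table names are placeholders rather than anything from the original setup. It freezes the training features into an immutable, dated table whose name gets recorded next to the model version.

```python
# snapshot_features.py - freeze today's training features into a dated, immutable table
from datetime import date
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

snapshot_table = f"my-project.ml_features.transactions_{date.today():%Y%m%d}"

query = f"""
CREATE TABLE IF NOT EXISTS `{snapshot_table}` AS
SELECT *
FROM `my-project.ml_features.transactions`
WHERE DATE(event_timestamp) <= CURRENT_DATE()
"""

# The snapshot table name (not "the latest data") is what gets logged alongside
# the model version, so any training run can be reproduced later.
client.query(query).result()
print(f"Feature snapshot written to {snapshot_table}")
```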
### Layer 4: Vertex AI (EXECUTION LAYER)
What Vertex AI actually does
- Training jobs
- Pipelines (DAG execution)
- Model Registry
- Online endpoints
🚨 Important insight:
Vertex AI is a worker, not the boss.
It runs jobs — it does not control releases.
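To illustrate the worker-not-boss idea, here is a minimal sketch of the thin script CI/CD might call once tests and approvals pass, assuming the Vertex AI SDK's CustomContainerTrainingJob; the project, image, and paths are placeholders.

```python
# submit_training.py - invoked by Cloud Build; Vertex AI executes the job, CI/CD decides to run it
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

job = aiplatform.CustomContainerTrainingJob(
    display_name="fraud-detector-training",
    container_uri="us-central1-docker.pkg.dev/my-project/ml/trainer:latest",
)

# The image tag, data path, and output path all come from the CI/CD pipeline,
# not from anything configured inside Vertex AI.
job.run(
    args=["--data-path", "gs://bucket/data", "--model-output", "gs://bucket/models"],
    replica_count=1,
    machine_type="n1-standard-8",
)
```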
### Layer 5: Deployment & Serving
Where failures hurt
- Vertex AI Endpoints
- Canary / shadow deployments
- Traffic splitting (sketched just below)
📌 Rollbacks are driven by:
- CI/CD
- Monitoring alerts
- Human approvals
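For the traffic-splitting item above, a minimal canary sketch using the Vertex AI SDK; the endpoint and model IDs are placeholders, and in practice this step runs from the CI/CD pipeline, not from a console.

```python
# canary_deploy.py - route 10% of traffic to the new model version, keep 90% on the old one
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/1234567890")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/9876543210")

# traffic_percentage sends 10% of requests to the new deployment; the rest stays
# on the currently deployed model until monitoring and approvals say otherwise.
endpoint.deploy(
    model=model,
    deployed_model_display_name="fraud-detector-canary",
    machine_type="n1-standard-4",
    traffic_percentage=10,
)
# Promotion to 100% (or rollback to 0%) is a separate, CI/CD-approved step.
```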
### Layer 6: Observability & Governance
Enterprise-grade requirements
- Cloud Monitoring → latency, errors
- Cloud Logging → predictions & failures
- BigQuery → audits & drift analysis
- IAM & Audit Logs → compliance
This layer is mandatory in regulated industries.
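One concrete building block here is routing prediction logs into BigQuery with a log sink, so audits and drift analysis run as SQL. A minimal sketch, assuming the google-cloud-logging client; the sink name, filter, and dataset are placeholders.

```python
# create_log_sink.py - route prediction logs from Cloud Run into BigQuery for audit & drift analysis
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project

sink = client.sink(
    "prediction-logs-to-bq",  # hypothetical sink name
    filter_='resource.type="cloud_run_revision" AND logName:"model-predictions"',
    destination="bigquery.googleapis.com/projects/my-project/datasets/ml_audit",
)

if not sink.exists():
    sink.create()
    # The sink gets a writer identity; grant it BigQuery access before logs will flow.
    print(f"Created sink {sink.name}")
```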
## End-to-End Flow
- Developer commits code
- Cloud Build triggers pipeline
- Docker image stored in Artifact Registry
- Vertex AI training job executes
- Model registered with lineage
- CI/CD promotes model
- Endpoint updated
- Monitoring validates behavior
- Rollback if anomalies detected
## One-Line Diagram Insight (Highly Quotable)
Vertex AI runs models. CI/CD runs the system.
## Why Teams Avoid Vertex AI
### 1. **Vendor Lock-in at the Wrong Layer**
Vertex AI's APIs are deeply GCP-specific. Once you commit, migration becomes a multi-month rewrite project.
# Vertex AI - deeply coupled to GCP
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model.upload(
    display_name="custom-model",
    artifact_uri="gs://bucket/model",
    serving_container_image_uri="gcr.io/image",
)
endpoint = model.deploy(machine_type="n1-standard-4")

Compare this to a portable approach:
# Portable - works anywhere
import mlflow
from google.cloud import storage
# Log model (MLflow works on any cloud)
mlflow.sklearn.log_model(model, "model")
# Deploy anywhere
model_uri = "gs://bucket/mlflow-models/run-id/artifacts/model"
# Can deploy to GCP, AWS, Azure, or on-prem
### 2. **Cost Opacity and Surprise Bills**
Vertex AI's pricing is opaque. Prediction endpoints, training jobs, and managed notebooks each have complex billing structures that make cost forecasting difficult.
One team reported: *"Our Vertex AI bill jumped 3x in a month. Half the charges were for idle notebook instances we forgot to shut down."*
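Most of the surprise spend is hygiene, and hygiene can be scripted. A small sketch, assuming the Vertex AI SDK: list every prediction endpoint in a project so idle ones get noticed by a scheduled job instead of by the invoice.

```python
# audit_endpoints.py - a scheduled job that surfaces Vertex AI endpoints that may be idle
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

for endpoint in aiplatform.Endpoint.list():
    deployed = endpoint.list_models()  # models currently deployed (and billing) on this endpoint
    print(f"{endpoint.display_name}: {len(deployed)} deployed model(s), created {endpoint.create_time}")
    # An endpoint with deployed models but no recent traffic is pure cost;
    # cross-check request counts in Cloud Monitoring before undeploying anything.
```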
### 3. **Limited Flexibility in Production**
Real production ML needs:
- Custom authentication/authorization (example below)
- Fine-grained traffic control
- Integration with existing CI/CD
- Custom monitoring and alerting
- Multi-cloud or hybrid deployments
Vertex AI's managed services constrain all of these.
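Take custom authentication, the first item in the list. On Cloud Run you own the request path, so you can verify Google-signed ID tokens (or any scheme you need) in a few lines. A sketch, assuming FastAPI and the google-auth library; the audience URL is a placeholder.

```python
# auth.py - verify a Google-signed ID token on every request, the kind of custom auth Cloud Run allows
from fastapi import Depends, FastAPI, Header, HTTPException
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

app = FastAPI()
EXPECTED_AUDIENCE = "https://fraud-detector-xyz-uc.a.run.app"  # hypothetical service URL

def verify_caller(authorization: str = Header(...)) -> dict:
    token = authorization.removeprefix("Bearer ").strip()
    try:
        # Checks the signature, expiry, and audience of the ID token.
        return id_token.verify_oauth2_token(token, google_requests.Request(), EXPECTED_AUDIENCE)
    except ValueError as exc:
        raise HTTPException(status_code=401, detail=f"Invalid token: {exc}")

@app.post("/predict")
async def predict(claims: dict = Depends(verify_caller)):
    # Run the model only for allowed callers (e.g., check claims["email"] against an allowlist).
    return {"caller": claims.get("email")}
```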
## The Real GCP MLOps Stack
Here's what production teams actually use:
### Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ Orchestration │
│ Cloud Composer (Airflow) │
└────────────────────┬────────────────────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Training │ │ Data │ │ Serving │
│ │ │Processing│ │ │
│ Compute │ │ │ │Cloud Run │
│ Engine │ │Dataflow/ │ │ or │
│ or │ │BigQuery │ │ GKE │
│Cloud Batch│ │ │ │ │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└──────────────┼──────────────┘
│
▼
┌────────────────┐
│ Artifact │
│ Registry │
│ GCS + Artifact │
│ Registry │
└────────────────┘

### 1. Orchestration: Cloud Composer
Cloud Composer (managed Airflow) handles workflow orchestration:
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.compute import ComputeEngineStartInstanceOperator
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
default_args = {
'owner': 'ml-team',
'retries': 2,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'ml_training_pipeline',
default_args=default_args,
schedule_interval='@daily',
start_date=datetime(2025, 1, 1),
catchup=False,
) as dag:
# Extract features from BigQuery
extract_features = BigQueryInsertJobOperator(
task_id='extract_features',
configuration={
"query": {
"query": """
CREATE OR REPLACE TABLE `project.dataset.features` AS
SELECT * FROM `project.dataset.raw_data`
WHERE DATE(timestamp) = CURRENT_DATE()
""",
"useLegacySql": False,
}
}
)
# Train model on Compute Engine
def trigger_training():
from google.cloud import compute_v1
instance_client = compute_v1.InstancesClient()
# Start training instance with startup script
# that pulls code, trains model, saves to GCS
pass
train = PythonOperator(
task_id='train_model',
python_callable=trigger_training
)
    extract_features >> train

### 2. Training: Compute Engine or Cloud Batch
For training, use Compute Engine with preemptible instances or Cloud Batch for cost efficiency:
# cloud_batch_training.py
from datetime import datetime

from google.cloud import batch_v1
def create_training_job(project_id: str, region: str):
client = batch_v1.BatchServiceClient()
job = batch_v1.Job()
job.task_groups = [batch_v1.TaskGroup()]
# Define training container
runnable = batch_v1.Runnable()
runnable.container = batch_v1.Runnable.Container()
runnable.container.image_uri = "gcr.io/project/ml-trainer:latest"
runnable.container.commands = [
"python", "train.py",
"--data-path", "gs://bucket/data",
"--model-output", "gs://bucket/models"
]
job.task_groups[0].task_spec.runnables = [runnable]
# Use spot VMs (80% cheaper)
job.allocation_policy.instances = [batch_v1.AllocationPolicy.InstancePolicyOrTemplate()]
job.allocation_policy.instances[0].policy.provisioning_model = (
batch_v1.AllocationPolicy.ProvisioningModel.SPOT
)
# Machine type
job.allocation_policy.instances[0].policy.machine_type = "n1-highmem-8"
create_request = batch_v1.CreateJobRequest(
parent=f"projects/{project_id}/locations/{region}",
job=job,
job_id=f"training-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
)
    return client.create_job(create_request)

### 3. Model Registry: GCS + Artifact Registry
Simple, portable model versioning:
import json
from google.cloud import storage
class GCSModelRegistry:
def __init__(self, bucket_name: str):
self.bucket_name = bucket_name
self.client = storage.Client()
def register_model(self, model, model_name: str, version: str, metadata: dict):
"""Register model with metadata"""
bucket = self.client.bucket(self.bucket_name)
# Save model
model_path = f"models/{model_name}/{version}/model.pkl"
blob = bucket.blob(model_path)
import pickle
blob.upload_from_string(pickle.dumps(model))
# Save metadata
metadata_path = f"models/{model_name}/{version}/metadata.json"
metadata_blob = bucket.blob(metadata_path)
metadata_blob.upload_from_string(json.dumps(metadata))
# Update latest pointer
latest_blob = bucket.blob(f"models/{model_name}/latest.txt")
latest_blob.upload_from_string(version)
return f"gs://{self.bucket_name}/{model_path}"
def load_model(self, model_name: str, version: str = "latest"):
"""Load model from registry"""
bucket = self.client.bucket(self.bucket_name)
if version == "latest":
latest_blob = bucket.blob(f"models/{model_name}/latest.txt")
version = latest_blob.download_as_text().strip()
model_blob = bucket.blob(f"models/{model_name}/{version}/model.pkl")
import pickle
return pickle.loads(model_blob.download_as_bytes())
# Usage
registry = GCSModelRegistry("ml-models-bucket")
# Register
registry.register_model(
model=trained_model,
model_name="fraud-detector",
version="v1.2.0",
metadata={
"accuracy": 0.94,
"training_date": "2025-01-15",
"framework": "sklearn",
"features": ["amount", "merchant_category", "hour_of_day"]
}
)

### 4. Serving: Cloud Run for REST APIs
Cloud Run provides auto-scaling, serverless inference:
# app.py - FastAPI serving on Cloud Run
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from google.cloud import storage
import pickle
import numpy as np
app = FastAPI()
# Load model at startup
@app.on_event("startup")
async def load_model():
    global model, model_version
client = storage.Client()
bucket = client.bucket("ml-models-bucket")
# Get latest version
latest_blob = bucket.blob("models/fraud-detector/latest.txt")
version = latest_blob.download_as_text().strip()
# Load model
model_blob = bucket.blob(f"models/fraud-detector/{version}/model.pkl")
    model = pickle.loads(model_blob.download_as_bytes())
    model_version = version
    print(f"Loaded model version: {version}")
class PredictionRequest(BaseModel):
features: list[float]
class PredictionResponse(BaseModel):
prediction: int
probability: float
model_version: str
@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
try:
features = np.array(request.features).reshape(1, -1)
prediction = model.predict(features)[0]
probability = model.predict_proba(features)[0][1]
return PredictionResponse(
prediction=int(prediction),
probability=float(probability),
            model_version=model_version
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/health")
async def health():
    return {"status": "healthy"}

Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Cloud Run expects port 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Deploy:
# Build and push
gcloud builds submit --tag gcr.io/PROJECT_ID/fraud-detector
# Deploy to Cloud Run
gcloud run deploy fraud-detector \
--image gcr.io/PROJECT_ID/fraud-detector \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 2Gi \
--cpu 2 \
--min-instances 0 \
  --max-instances 100

### 5. Monitoring: Cloud Logging + Custom Metrics
from google.cloud import logging
from google.cloud import monitoring_v3
import time
class ModelMonitoring:
def __init__(self, project_id: str):
self.project_id = project_id
self.logging_client = logging.Client()
self.metrics_client = monitoring_v3.MetricServiceClient()
self.logger = self.logging_client.logger("model-predictions")
def log_prediction(self, features, prediction, probability, latency_ms):
"""Log prediction for monitoring"""
self.logger.log_struct({
"features": features,
"prediction": prediction,
"probability": probability,
"latency_ms": latency_ms,
"timestamp": time.time()
})
def write_custom_metric(self, metric_name: str, value: float):
"""Write custom metric for alerting"""
project_name = f"projects/{self.project_id}"
series = monitoring_v3.TimeSeries()
        series.metric.type = f"custom.googleapis.com/{metric_name}"
        series.resource.type = "global"
now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10 ** 9)
interval = monitoring_v3.TimeInterval(
{"end_time": {"seconds": seconds, "nanos": nanos}}
)
point = monitoring_v3.Point({
"interval": interval,
"value": {"double_value": value}
})
series.points = [point]
self.metrics_client.create_time_series(
name=project_name,
time_series=[series]
)
# Usage in prediction endpoint
monitor = ModelMonitoring("my-project")
@app.post("/predict")
async def predict(request: PredictionRequest):
start_time = time.time()
    # Build the feature vector and make a prediction
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
probability = model.predict_proba(features)[0][1]
latency_ms = (time.time() - start_time) * 1000
# Log for monitoring
monitor.log_prediction(
features=request.features,
prediction=int(prediction),
probability=float(probability),
latency_ms=latency_ms
)
# Track metrics
monitor.write_custom_metric("prediction_latency", latency_ms)
monitor.write_custom_metric("fraud_probability", probability)
    return PredictionResponse(...)

## Vertex AI vs. DIY GCP: The Comparison

## When to Use Vertex AI
Vertex AI does make sense for:
- Rapid Prototyping — Need a demo in 48 hours? Vertex AI's AutoML gets you there fast.
- Small Teams — 2–3 person ML teams without dedicated MLOps engineers.
- Experimental Projects — POCs where cost and lock-in don't matter.
- GCP-Only Shops — If you're 100% committed to GCP forever.
## The Real Cost Savings
A mid-sized ML team running 5 models in production with 1M predictions/day:
Vertex AI Approach:
- Prediction endpoints: $5,000/month
- Training (AutoML): $3,000/month
- Notebooks: $1,200/month
- Total: ~$9,200/month
DIY GCP Approach:
- Cloud Run serving: $800/month
- Cloud Batch training (spot): $400/month
- Composer orchestration: $300/month
- Storage + logging: $200/month
- Total: ~$1,700/month
Savings: $7,500/month or $90,000/year
## Why CI/CD Is the Real Heart of MLOps
Vertex AI does not:
- Trigger builds from Git
- Enforce approval gates
- Handle environment promotion
- Manage infra drift
That's Cloud Build + Terraform.
Example (simplified Cloud Build logic)
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/project/train:latest', '.']
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - ai
      - custom-jobs
      - create
      - --region=us-central1
      - --config=vertex_job.yaml
This is real MLOps, not notebook clicks.

## Model Registry ≠ Deployment Strategy
Vertex AI Model Registry:
- Tracks lineage
- Stores versions
- Captures metrics
But it does not:
- Decide when to deploy
- Perform canary releases
- Roll back automatically
Those decisions belong to (a sketch follows this list):
- CI/CD pipelines
- Release strategies
- Human approvals
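A minimal sketch of what "the decision lives in CI/CD" can look like, reusing the GCSModelRegistry layout from earlier; the candidate version and accuracy threshold are placeholders.

```python
# promote_model.py - run by CI/CD after training; the registry records, this script decides
import json
from google.cloud import storage

BUCKET = "ml-models-bucket"       # same bucket as the registry example above
MODEL = "fraud-detector"
CANDIDATE_VERSION = "v1.3.0"      # hypothetical new version
MIN_ACCURACY = 0.93               # promotion gate agreed with the team

bucket = storage.Client().bucket(BUCKET)
metadata = json.loads(
    bucket.blob(f"models/{MODEL}/{CANDIDATE_VERSION}/metadata.json").download_as_text()
)

if metadata["accuracy"] < MIN_ACCURACY:
    raise SystemExit(f"Refusing to promote: accuracy {metadata['accuracy']} < {MIN_ACCURACY}")

# Only after the gate passes does the "latest" pointer move, which is what the
# Cloud Run service loads at startup. Rollback means rewriting this pointer.
bucket.blob(f"models/{MODEL}/latest.txt").upload_from_string(CANDIDATE_VERSION)
print(f"Promoted {MODEL} {CANDIDATE_VERSION}")
```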
## Monitoring: Vertex AI Is Only One Signal
Vertex AI gives:
- Prediction latency
- Error rates
- Basic drift metrics
Production MLOps needs:
- Data drift (BigQuery)
- Concept drift
- Cost anomalies
- Behavioral anomalies
- Silent failure detection
Most teams (drift-check sketch below):
- Export logs to BigQuery
- Build dashboards manually
- Alert via Cloud Monitoring
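As an example of the BigQuery leg, a sketch of a scheduled drift check that compares today's serving traffic against the training snapshot; the tables, feature, and threshold are placeholders, and real setups track several statistics per feature.

```python
# drift_check.py - run from Composer on a schedule; compares serving vs. training feature distribution
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

query = """
SELECT
  AVG(amount) AS serving_mean,
  (SELECT AVG(amount)
   FROM `my-project.ml_features.transactions_training_snapshot`) AS training_mean
FROM `my-project.ml_audit.predictions`
WHERE DATE(timestamp) = CURRENT_DATE()
"""

row = list(client.query(query).result())[0]
relative_shift = abs(row.serving_mean - row.training_mean) / row.training_mean

# A crude mean-shift check on one feature; alert (or write a custom metric) when it drifts.
if relative_shift > 0.25:
    print(f"Drift alert: mean of `amount` shifted by {relative_shift:.0%}")
```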
## Security & IAM: The Hardest Part (And Least Written About)
In real enterprises:
- Pipelines run under service accounts
- Training jobs need data access
- Endpoints need restricted invocation
- Humans need read-only visibility
IAM complexity grows faster than models.
This is where most Vertex AI deployments break.
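To make the shape of the problem visible, here is an illustrative least-privilege layout, not a prescription: the service-account emails are placeholders, the role names are real predefined roles, and in practice these bindings live in Terraform rather than in ad-hoc commands.

```python
# iam_layout.py - an illustrative least-privilege split; prints the equivalent gcloud bindings
BINDINGS = {
    # CI/CD builds images and submits training jobs, nothing more.
    "serviceAccount:ci-cd@my-project.iam.gserviceaccount.com": [
        "roles/aiplatform.user",
        "roles/artifactregistry.writer",
    ],
    # Training jobs read features and write model artifacts.
    "serviceAccount:trainer@my-project.iam.gserviceaccount.com": [
        "roles/bigquery.dataViewer",
        "roles/storage.objectAdmin",
    ],
    # The serving identity only reads models; callers need roles/run.invoker separately.
    "serviceAccount:serving@my-project.iam.gserviceaccount.com": [
        "roles/storage.objectViewer",
    ],
    # Humans get read-only visibility.
    "group:ml-team@example.com": [
        "roles/logging.viewer",
        "roles/monitoring.viewer",
    ],
}

for member, roles in BINDINGS.items():
    for role in roles:
        print(f'gcloud projects add-iam-policy-binding my-project --member="{member}" --role="{role}"')
```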
## The Bottom Line
Vertex AI is Google's vision of what MLOps should be — simplified, integrated, managed. But production ML is messy. It needs custom authentication, complex deployment strategies, fine-grained cost control, and the ability to move workloads across clouds.
The teams building real production ML systems on GCP have figured this out. They use Composer for orchestration, Cloud Batch for training, Cloud Run for serving, and GCS for artifacts. They get better cost predictability, more flexibility, and avoid deep vendor lock-in.
Vertex AI is training wheels. True MLOps on GCP means taking them off and building with the platform's core infrastructure primitives.
Thank you for diving into this post. I hope it helps you build a clearer picture of production MLOps on GCP. If it did, your claps and a follow on Medium would mean a lot; they help this knowledge reach more readers and keep me motivated to write more. I really appreciate your time and support!