When Google Cloud sells you on machine learning, they lead with Vertex AI. It's their flagship ML platform — a unified suite promising end-to-end MLOps with minimal friction. But talk to teams running serious production ML workloads on GCP, and you'll notice something curious: they're not really using Vertex AI.
They're orchestrating with Cloud Composer. Building pipelines with Cloud Run and Cloud Build. Managing state in BigQuery and Cloud Storage. Monitoring with Cloud Logging and custom dashboards. The actual ML platform? It's assembled from GCP primitives.
This isn't an accident. It's a deliberate architectural choice driven by the reality of production ML systems.
## The Common Misconception
Many blogs imply this flow:
Data → Vertex AI → Model → Endpoint → Done
That works for demos, POCs, and AutoML experiments.
But real production MLOps needs:
- CI/CD
- Versioning
- Governance
- Monitoring
- Security
- Rollbacks
- Cost control
- Auditability
Vertex AI does not solve most of these problems alone.
## What Vertex AI Is Actually Good At
Let's be fair. Vertex AI is excellent at:
| Capability | What Vertex AI Does Well |
| -------------- | --------------------------------------- |
| Training | Managed training jobs (custom & AutoML) |
| Pipelines | Orchestrating ML steps |
| Model Registry | Lineage & version tracking |
| Endpoints | Managed online prediction |
| Evaluation | Metric tracking & comparisons |

## The Real GCP MLOps Stack (Production Reality)
### Core MLOps Infrastructure (Mostly Not Vertex)
| MLOps Concern | GCP Service |
| --------------- | ----------------- |
| CI/CD | Cloud Build |
| Artifacts | Artifact Registry |
| IaC | Terraform |
| Secrets | Secret Manager |
| Access Control | IAM |
| Monitoring | Cloud Monitoring |
| Logs | Cloud Logging |
| Metadata | BigQuery |
| Data Versioning | BigQuery + GCS |
| Governance | Audit Logs |
Vertex AI plugs into this stack; it does not replace it.

## Reference Architecture (Mental Model)
Developer Commit
↓
Cloud Build (CI/CD)
↓
Artifact Registry (Images)
↓
Vertex AI Training Job
↓
Model Registry (Versioned)
↓
Vertex AI Endpoint
↓
Monitoring + Logging
↓
BigQuery (Metrics, Drift, Audits)

## End-to-End MLOps on GCP: A Simple Architecture
## Architecture (How to Explain the Diagram)
### Layer 1: Developer & Source Control
Purpose: Control & reproducibility
- GitHub / GitLab
- Code, Dockerfiles, pipeline specs
- Feature logic & model code
Nothing starts in Vertex AI.
### Layer 2: CI/CD & Infrastructure
This is the real MLOps backbone
Services
- Cloud Build → pipeline orchestration
- Artifact Registry → Docker images
- Terraform → infrastructure as code
- Secret Manager → credentials
📌 Why this matters. This layer decides:
- When training starts
- Which environment is targeted
- What version gets deployed
Vertex AI only executes what CI/CD approves.
### Layer 3: Data & Metadata
Source of truth
- BigQuery → datasets, metrics, drift signals
- Cloud Storage → raw artifacts
- Feature snapshots (not feature stores), sketched below
📌 True reproducibility lives here, not in notebooks.
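To make "feature snapshots" concrete, here is a minimal sketch, assuming the BigQuery Python client; the project, dataset, and table names are placeholders rather than anything from the original setup. It freezes the training features into an immutable, dated table whose name gets recorded next to the model version.

```python
# snapshot_features.py - freeze today's training features into a dated, immutable table
from datetime import date
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

snapshot_table = f"my-project.ml_features.transactions_{date.today():%Y%m%d}"

query = f"""
CREATE TABLE IF NOT EXISTS `{snapshot_table}` AS
SELECT *
FROM `my-project.ml_features.transactions`
WHERE DATE(event_timestamp) <= CURRENT_DATE()
"""

# The snapshot table name (not "the latest data") is what gets logged alongside
# the model version, so any training run can be reproduced later.
client.query(query).result()
print(f"Feature snapshot written to {snapshot_table}")
```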
### Layer 4: Vertex AI (EXECUTION LAYER)
What Vertex AI actually does
- Training jobs
- Pipelines (DAG execution)
- Model Registry
- Online endpoints
🚨 Important insight:
Vertex AI is a worker, not the boss.
It runs jobs — it does not control releases.
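To illustrate the worker-not-boss idea, here is a minimal sketch of the thin script CI/CD might call once tests and approvals pass, assuming the Vertex AI SDK's CustomContainerTrainingJob; the project, image, and paths are placeholders.

```python
# submit_training.py - invoked by Cloud Build; Vertex AI executes the job, CI/CD decides to run it
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

job = aiplatform.CustomContainerTrainingJob(
    display_name="fraud-detector-training",
    container_uri="us-central1-docker.pkg.dev/my-project/ml/trainer:latest",
)

# The image tag, data path, and output path all come from the CI/CD pipeline,
# not from anything configured inside Vertex AI.
job.run(
    args=["--data-path", "gs://bucket/data", "--model-output", "gs://bucket/models"],
    replica_count=1,
    machine_type="n1-standard-8",
)
```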
### Layer 5: Deployment & Serving
Where failures hurt
- Vertex AI Endpoints
- Canary / shadow deployments
- Traffic splitting (sketched just below)
📌 Rollbacks are driven by:
- CI/CD
- Monitoring alerts
- Human approvals
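For the traffic-splitting item above, a minimal canary sketch using the Vertex AI SDK; the endpoint and model IDs are placeholders, and in practice this step runs from the CI/CD pipeline, not from a console.

```python
# canary_deploy.py - route 10% of traffic to the new model version, keep 90% on the old one
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/1234567890")
model = aiplatform.Model("projects/my-project/locations/us-central1/models/9876543210")

# traffic_percentage sends 10% of requests to the new deployment; the rest stays
# on the currently deployed model until monitoring and approvals say otherwise.
endpoint.deploy(
    model=model,
    deployed_model_display_name="fraud-detector-canary",
    machine_type="n1-standard-4",
    traffic_percentage=10,
)
# Promotion to 100% (or rollback to 0%) is a separate, CI/CD-approved step.
```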
### Layer 6: Observability & Governance
Enterprise-grade requirements
- Cloud Monitoring → latency, errors
- Cloud Logging → predictions & failures
- BigQuery → audits & drift analysis
- IAM & Audit Logs → compliance
This layer is mandatory in regulated industries.
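One concrete building block here is routing prediction logs into BigQuery with a log sink, so audits and drift analysis run as SQL. A minimal sketch, assuming the google-cloud-logging client; the sink name, filter, and dataset are placeholders.

```python
# create_log_sink.py - route prediction logs from Cloud Run into BigQuery for audit & drift analysis
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project

sink = client.sink(
    "prediction-logs-to-bq",  # hypothetical sink name
    filter_='resource.type="cloud_run_revision" AND logName:"model-predictions"',
    destination="bigquery.googleapis.com/projects/my-project/datasets/ml_audit",
)

if not sink.exists():
    sink.create()
    # The sink gets a writer identity; grant it BigQuery access before logs will flow.
    print(f"Created sink {sink.name}")
```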
## End-to-End Flow
- Developer commits code
- Cloud Build triggers pipeline
- Docker image stored in Artifact Registry
- Vertex AI training job executes
- Model registered with lineage
- CI/CD promotes model
- Endpoint updated
- Monitoring validates behavior
- Rollback if anomalies detected
## One-Line Diagram Insight (Highly Quotable)
Vertex AI runs models. CI/CD runs the system.
## Why Teams Avoid Vertex AI
### 1. **Vendor Lock-in at the Wrong Layer**
Vertex AI's APIs are deeply GCP-specific. Once you commit, migration becomes a multi-month rewrite project.
# Vertex AI - deeply coupled to GCP
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model.upload(
    display_name="custom-model",
    artifact_uri="gs://bucket/model",
    serving_container_image_uri="gcr.io/image",
)
endpoint = model.deploy(machine_type="n1-standard-4")

Compare this to a portable approach:
# Portable - works anywhere
import mlflow
from google.cloud import storage
# Log model (MLflow works on any cloud)
mlflow.sklearn.log_model(model, "model")
# Deploy anywhere
model_uri = "gs://bucket/mlflow-models/run-id/artifacts/model"
# Can deploy to GCP, AWS, Azure, or on-prem
### 2. **Cost Opacity and Surprise Bills**
Vertex AI's pricing is opaque. Prediction endpoints, training jobs, and managed notebooks each have complex billing structures that make cost forecasting difficult.
One team reported: *"Our Vertex AI bill jumped 3x in a month. Half the charges were for idle notebook instances we forgot to shut down."*
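Most of the surprise spend is hygiene, and hygiene can be scripted. A small sketch, assuming the Vertex AI SDK: list every prediction endpoint in a project so idle ones get noticed by a scheduled job instead of by the invoice.

```python
# audit_endpoints.py - a scheduled job that surfaces Vertex AI endpoints that may be idle
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder values

for endpoint in aiplatform.Endpoint.list():
    deployed = endpoint.list_models()  # models currently deployed (and billing) on this endpoint
    print(f"{endpoint.display_name}: {len(deployed)} deployed model(s), created {endpoint.create_time}")
    # An endpoint with deployed models but no recent traffic is pure cost;
    # cross-check request counts in Cloud Monitoring before undeploying anything.
```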
### 3. **Limited Flexibility in Production**
Real production ML needs:
- Custom authentication/authorization (example below)
- Fine-grained traffic control
- Integration with existing CI/CD
- Custom monitoring and alerting
- Multi-cloud or hybrid deployments
Vertex AI's managed services constrain all of these.
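Take custom authentication, the first item in the list. On Cloud Run you own the request path, so you can verify Google-signed ID tokens (or any scheme you need) in a few lines. A sketch, assuming FastAPI and the google-auth library; the audience URL is a placeholder.

```python
# auth.py - verify a Google-signed ID token on every request, the kind of custom auth Cloud Run allows
from fastapi import Depends, FastAPI, Header, HTTPException
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

app = FastAPI()
EXPECTED_AUDIENCE = "https://fraud-detector-xyz-uc.a.run.app"  # hypothetical service URL

def verify_caller(authorization: str = Header(...)) -> dict:
    token = authorization.removeprefix("Bearer ").strip()
    try:
        # Checks the signature, expiry, and audience of the ID token.
        return id_token.verify_oauth2_token(token, google_requests.Request(), EXPECTED_AUDIENCE)
    except ValueError as exc:
        raise HTTPException(status_code=401, detail=f"Invalid token: {exc}")

@app.post("/predict")
async def predict(claims: dict = Depends(verify_caller)):
    # Run the model only for allowed callers (e.g., check claims["email"] against an allowlist).
    return {"caller": claims.get("email")}
```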
## The Real GCP MLOps Stack
Here's what production teams actually use:
### Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ Orchestration │
│ Cloud Composer (Airflow) │
└────────────────────┬────────────────────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Training │ │ Data │ │ Serving │
│ │ │Processing│ │ │
│ Compute │ │ │ │Cloud Run │
│ Engine │ │Dataflow/ │ │ or │
│ or │ │BigQuery │ │ GKE │
│Cloud Batch│ │ │ │ │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└──────────────┼──────────────┘
│
▼
┌────────────────┐
│ Artifact │
│ Registry │
│ GCS + Artifact │
│ Registry │
└────────────────┘

### 1. Orchestration: Cloud Composer
Cloud Composer (managed Airflow) handles workflow orchestration:
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.compute import ComputeEngineStartInstanceOperator
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
default_args = {
'owner': 'ml-team',
'retries': 2,
'retry_delay': timedelta(minutes=5),
}
with DAG(
'ml_training_pipeline',
default_args=default_args,
schedule_interval='@daily',
start_date=datetime(2025, 1, 1),
catchup=False,
) as dag:
# Extract features from BigQuery
extract_features = BigQueryInsertJobOperator(
task_id='extract_features',
configuration={
"query": {
"query": """
CREATE OR REPLACE TABLE `project.dataset.features` AS
SELECT * FROM `project.dataset.raw_data`
WHERE DATE(timestamp) = CURRENT_DATE()
""",
"useLegacySql": False,
}
}
)
# Train model on Compute Engine
def trigger_training():
from google.cloud import compute_v1
instance_client = compute_v1.InstancesClient()
# Start training instance with startup script
# that pulls code, trains model, saves to GCS
pass
train = PythonOperator(
task_id='train_model',
python_callable=trigger_training
)
    extract_features >> train

### 2. Training: Compute Engine or Cloud Batch
For training, use Compute Engine with preemptible instances or Cloud Batch for cost efficiency:
# cloud_batch_training.py
from datetime import datetime

from google.cloud import batch_v1
def create_training_job(project_id: str, region: str):
client = batch_v1.BatchServiceClient()
job = batch_v1.Job()
job.task_groups = [batch_v1.TaskGroup()]
# Define training container
runnable = batch_v1.Runnable()
runnable.container = batch_v1.Runnable.Container()
runnable.container.image_uri = "gcr.io/project/ml-trainer:latest"
runnable.container.commands = [
"python", "train.py",
"--data-path", "gs://bucket/data",
"--model-output", "gs://bucket/models"
]
job.task_groups[0].task_spec.runnables = [runnable]
# Use spot VMs (80% cheaper)
job.allocation_policy.instances = [batch_v1.AllocationPolicy.InstancePolicyOrTemplate()]
job.allocation_policy.instances[0].policy.provisioning_model = (
batch_v1.AllocationPolicy.ProvisioningModel.SPOT
)
# Machine type
job.allocation_policy.instances[0].policy.machine_type = "n1-highmem-8"
create_request = batch_v1.CreateJobRequest(
parent=f"projects/{project_id}/locations/{region}",
job=job,
job_id=f"training-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
)
    return client.create_job(create_request)

### 3. Model Registry: GCS + Artifact Registry
Simple, portable model versioning:
import json
from google.cloud import storage
class GCSModelRegistry:
def __init__(self, bucket_name: str):
self.bucket_name = bucket_name
self.client = storage.Client()
def register_model(self, model, model_name: str, version: str, metadata: dict):
"""Register model with metadata"""
bucket = self.client.bucket(self.bucket_name)
# Save model
model_path = f"models/{model_name}/{version}/model.pkl"
blob = bucket.blob(model_path)
import pickle
blob.upload_from_string(pickle.dumps(model))
# Save metadata
metadata_path = f"models/{model_name}/{version}/metadata.json"
metadata_blob = bucket.blob(metadata_path)
metadata_blob.upload_from_string(json.dumps(metadata))
# Update latest pointer
latest_blob = bucket.blob(f"models/{model_name}/latest.txt")
latest_blob.upload_from_string(version)
return f"gs://{self.bucket_name}/{model_path}"
def load_model(self, model_name: str, version: str = "latest"):
"""Load model from registry"""
bucket = self.client.bucket(self.bucket_name)
if version == "latest":
latest_blob = bucket.blob(f"models/{model_name}/latest.txt")
version = latest_blob.download_as_text().strip()
model_blob = bucket.blob(f"models/{model_name}/{version}/model.pkl")
import pickle
return pickle.loads(model_blob.download_as_bytes())
# Usage
registry = GCSModelRegistry("ml-models-bucket")
# Register
registry.register_model(
model=trained_model,
model_name="fraud-detector",
version="v1.2.0",
metadata={
"accuracy": 0.94,
"training_date": "2025-01-15",
"framework": "sklearn",
"features": ["amount", "merchant_category", "hour_of_day"]
}
)

### 4. Serving: Cloud Run for REST APIs
Cloud Run provides auto-scaling, serverless inference:
# app.py - FastAPI serving on Cloud Run
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from google.cloud import storage
import pickle
import numpy as np
app = FastAPI()
# Load model at startup
@app.on_event("startup")
async def load_model():
    global model, model_version
client = storage.Client()
bucket = client.bucket("ml-models-bucket")
# Get latest version
latest_blob = bucket.blob("models/fraud-detector/latest.txt")
version = latest_blob.download_as_text().strip()
# Load model
model_blob = bucket.blob(f"models/fraud-detector/{version}/model.pkl")
    model = pickle.loads(model_blob.download_as_bytes())
    model_version = version
    print(f"Loaded model version: {version}")
class PredictionRequest(BaseModel):
features: list[float]
class PredictionResponse(BaseModel):
prediction: int
probability: float
model_version: str
@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
try:
features = np.array(request.features).reshape(1, -1)
prediction = model.predict(features)[0]
probability = model.predict_proba(features)[0][1]
return PredictionResponse(
prediction=int(prediction),
probability=float(probability),
            model_version=model_version
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/health")
async def health():
    return {"status": "healthy"}

Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
# Cloud Run expects port 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Deploy:
# Build and push
gcloud builds submit --tag gcr.io/PROJECT_ID/fraud-detector
# Deploy to Cloud Run
gcloud run deploy fraud-detector \
--image gcr.io/PROJECT_ID/fraud-detector \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 2Gi \
--cpu 2 \
--min-instances 0 \
  --max-instances 100

### 5. Monitoring: Cloud Logging + Custom Metrics
from google.cloud import logging
from google.cloud import monitoring_v3
import time
class ModelMonitoring:
def __init__(self, project_id: str):
self.project_id = project_id
self.logging_client = logging.Client()
self.metrics_client = monitoring_v3.MetricServiceClient()
self.logger = self.logging_client.logger("model-predictions")
def log_prediction(self, features, prediction, probability, latency_ms):
"""Log prediction for monitoring"""
self.logger.log_struct({
"features": features,
"prediction": prediction,
"probability": probability,
"latency_ms": latency_ms,
"timestamp": time.time()
})
def write_custom_metric(self, metric_name: str, value: float):
"""Write custom metric for alerting"""
project_name = f"projects/{self.project_id}"
series = monitoring_v3.TimeSeries()
        series.metric.type = f"custom.googleapis.com/{metric_name}"
        series.resource.type = "global"
now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10 ** 9)
interval = monitoring_v3.TimeInterval(
{"end_time": {"seconds": seconds, "nanos": nanos}}
)
point = monitoring_v3.Point({
"interval": interval,
"value": {"double_value": value}
})
series.points = [point]
self.metrics_client.create_time_series(
name=project_name,
time_series=[series]
)
# Usage in prediction endpoint
monitor = ModelMonitoring("my-project")
@app.post("/predict")
async def predict(request: PredictionRequest):
start_time = time.time()
    # Build the feature vector and make a prediction
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
probability = model.predict_proba(features)[0][1]
latency_ms = (time.time() - start_time) * 1000
# Log for monitoring
monitor.log_prediction(
features=request.features,
prediction=int(prediction),
probability=float(probability),
latency_ms=latency_ms
)
# Track metrics
monitor.write_custom_metric("prediction_latency", latency_ms)
monitor.write_custom_metric("fraud_probability", probability)
    return PredictionResponse(...)

## Vertex AI vs. DIY GCP: The Comparison

## When to Use Vertex AI
Vertex AI does make sense for:
- Rapid Prototyping — Need a demo in 48 hours? Vertex AI's AutoML gets you there fast.
- Small Teams — 2–3 person ML teams without dedicated MLOps engineers.
- Experimental Projects — POCs where cost and lock-in don't matter.
- GCP-Only Shops — If you're 100% committed to GCP forever.
## The Real Cost Savings
A mid-sized ML team running 5 models in production with 1M predictions/day:
Vertex AI Approach:
- Prediction endpoints: $5,000/month
- Training (AutoML): $3,000/month
- Notebooks: $1,200/month
- Total: ~$9,200/month
DIY GCP Approach:
- Cloud Run serving: $800/month
- Cloud Batch training (spot): $400/month
- Composer orchestration: $300/month
- Storage + logging: $200/month
- Total: ~$1,700/month
Savings: $7,500/month or $90,000/year
## Why CI/CD Is the Real Heart of MLOps
Vertex AI does not:
- Trigger builds from Git
- Enforce approval gates
- Handle environment promotion
- Manage infra drift
That's Cloud Build + Terraform.
Example (simplified Cloud Build logic)
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/project/train:latest', '.']
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - ai
      - custom-jobs
      - create
      - --region=us-central1
      - --config=vertex_job.yaml
This is real MLOps, not notebook clicks.

## Model Registry ≠ Deployment Strategy
Vertex AI Model Registry:
- Tracks lineage
- Stores versions
- Captures metrics
But it does not:
- Decide when to deploy
- Perform canary releases
- Roll back automatically
Those decisions belong to (a sketch follows this list):
- CI/CD pipelines
- Release strategies
- Human approvals
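A minimal sketch of what "the decision lives in CI/CD" can look like, reusing the GCSModelRegistry layout from earlier; the candidate version and accuracy threshold are placeholders.

```python
# promote_model.py - run by CI/CD after training; the registry records, this script decides
import json
from google.cloud import storage

BUCKET = "ml-models-bucket"       # same bucket as the registry example above
MODEL = "fraud-detector"
CANDIDATE_VERSION = "v1.3.0"      # hypothetical new version
MIN_ACCURACY = 0.93               # promotion gate agreed with the team

bucket = storage.Client().bucket(BUCKET)
metadata = json.loads(
    bucket.blob(f"models/{MODEL}/{CANDIDATE_VERSION}/metadata.json").download_as_text()
)

if metadata["accuracy"] < MIN_ACCURACY:
    raise SystemExit(f"Refusing to promote: accuracy {metadata['accuracy']} < {MIN_ACCURACY}")

# Only after the gate passes does the "latest" pointer move, which is what the
# Cloud Run service loads at startup. Rollback means rewriting this pointer.
bucket.blob(f"models/{MODEL}/latest.txt").upload_from_string(CANDIDATE_VERSION)
print(f"Promoted {MODEL} {CANDIDATE_VERSION}")
```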
## Monitoring: Vertex AI Is Only One Signal
Vertex AI gives:
- Prediction latency
- Error rates
- Basic drift metrics
Production MLOps needs:
- Data drift (BigQuery)
- Concept drift
- Cost anomalies
- Behavioral anomalies
- Silent failure detection
Most teams (drift-check sketch below):
- Export logs to BigQuery
- Build dashboards manually
- Alert via Cloud Monitoring
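As an example of the BigQuery leg, a sketch of a scheduled drift check that compares today's serving traffic against the training snapshot; the tables, feature, and threshold are placeholders, and real setups track several statistics per feature.

```python
# drift_check.py - run from Composer on a schedule; compares serving vs. training feature distribution
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

query = """
SELECT
  AVG(amount) AS serving_mean,
  (SELECT AVG(amount)
   FROM `my-project.ml_features.transactions_training_snapshot`) AS training_mean
FROM `my-project.ml_audit.predictions`
WHERE DATE(timestamp) = CURRENT_DATE()
"""

row = list(client.query(query).result())[0]
relative_shift = abs(row.serving_mean - row.training_mean) / row.training_mean

# A crude mean-shift check on one feature; alert (or write a custom metric) when it drifts.
if relative_shift > 0.25:
    print(f"Drift alert: mean of `amount` shifted by {relative_shift:.0%}")
```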
## Security & IAM: The Hardest Part (And Least Written About)
In real enterprises:
- Pipelines run under service accounts
- Training jobs need data access
- Endpoints need restricted invocation
- Humans need read-only visibility
IAM complexity grows faster than models.
This is where most Vertex AI deployments break.
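To make the shape of the problem visible, here is an illustrative least-privilege layout, not a prescription: the service-account emails are placeholders, the role names are real predefined roles, and in practice these bindings live in Terraform rather than in ad-hoc commands.

```python
# iam_layout.py - an illustrative least-privilege split; prints the equivalent gcloud bindings
BINDINGS = {
    # CI/CD builds images and submits training jobs, nothing more.
    "serviceAccount:ci-cd@my-project.iam.gserviceaccount.com": [
        "roles/aiplatform.user",
        "roles/artifactregistry.writer",
    ],
    # Training jobs read features and write model artifacts.
    "serviceAccount:trainer@my-project.iam.gserviceaccount.com": [
        "roles/bigquery.dataViewer",
        "roles/storage.objectAdmin",
    ],
    # The serving identity only reads models; callers need roles/run.invoker separately.
    "serviceAccount:serving@my-project.iam.gserviceaccount.com": [
        "roles/storage.objectViewer",
    ],
    # Humans get read-only visibility.
    "group:ml-team@example.com": [
        "roles/logging.viewer",
        "roles/monitoring.viewer",
    ],
}

for member, roles in BINDINGS.items():
    for role in roles:
        print(f'gcloud projects add-iam-policy-binding my-project --member="{member}" --role="{role}"')
```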
## The Bottom Line
Vertex AI is Google's vision of what MLOps should be — simplified, integrated, managed. But production ML is messy. It needs custom authentication, complex deployment strategies, fine-grained cost control, and the ability to move workloads across clouds.
The teams building real production ML systems on GCP have figured this out. They use Composer for orchestration, Cloud Batch for training, Cloud Run for serving, and GCS for artifacts. They get better cost predictability, more flexibility, and avoid deep vendor lock-in.
Vertex AI is training wheels. True MLOps on GCP means taking them off and building with the platform's core infrastructure primitives.
Thank you for diving into this post. I hope it helps you build a clearer picture of production MLOps on GCP. If it did, your claps and a follow on Medium would mean a lot; they help this knowledge reach more readers and keep me motivated to write more. I really appreciate your time and support!