Use Postgres and pgvector for production-grade retrieval and RAG. Learn schema design, indexes, and architecture patterns that scale on a startup budget.
The first time someone suggested pgvector to me, I shrugged.
"Cute for prototypes. We'll use a real vector database in production."
A year later, we were still arguing about which vector database to standardize on… while our poor little Postgres cluster quietly powered most of the real workloads.
Here's the thing no one wants to admit in vendor decks: Postgres + pgvector is not a toy.
If you treat it like a first-class citizen — good schema, sane indexes, thoughtful architecture — it will happily serve millions of embeddings, sub-100ms queries, and RAG workloads on a very normal hardware bill.
Let's walk through how.
Why Postgres + pgvector Is More Serious Than It Looks
Many teams dismiss pgvector because they've seen it used like this:
- single table, no index, 200k rows
- cosine distance on CPU with no tuning
- random VACUUM schedule
Of course it looks slow. That's not pgvector's fault; that's "demo-driven development."
In reality, Postgres gives you three unfair advantages:
- Mature reliability story — backups, replicas, monitoring… your SRE team already knows the playbook.
- Transactional semantics — you can keep embeddings in sync with the data they represent.
- Shared infra — you avoid spinning a whole new cluster and operational surface area just for vectors.
pgvector layers vector types and indexes on top of that. Treat it like any other Postgres extension: learn its knobs, design for your workload, and it will behave.
The Minimal Production Schema
Let's ground this in something concrete.
Imagine you're building retrieval for support articles and past tickets. A pragmatic schema:
```sql
CREATE TABLE documents (
    id          BIGSERIAL PRIMARY KEY,
    external_id TEXT,
    source      TEXT,              -- 'kb', 'ticket', 'email'
    title       TEXT,
    body        TEXT,
    metadata    JSONB,
    created_at  TIMESTAMPTZ DEFAULT now()
);

-- chunked text for retrieval
CREATE TABLE document_chunks (
    id          BIGSERIAL PRIMARY KEY,
    document_id BIGINT REFERENCES documents(id) ON DELETE CASCADE,
    chunk_index INT,
    content     TEXT,
    embedding   VECTOR(1536),      -- match your model dimension
    created_at  TIMESTAMPTZ DEFAULT now()
);
```

Two core ideas here:
- Separate documents from chunks. You search chunks but display documents.
- Store embeddings alongside the chunk. No extra "embeddings" microservice needed.
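The chunking itself doesn't need a framework. A minimal overlapping chunker, as a sketch (window and overlap sizes are assumptions you'd tune per corpus):

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, preferring to break on whitespace."""
    if len(text) <= max_chars:
        return [text]
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        # try to break on the last space inside the window
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        if end == len(text):
            break
        # back up by `overlap` so context straddles chunk boundaries
        start = max(end - overlap, start + 1)
    return chunks
```

Each returned chunk maps to one `document_chunks` row, with `chunk_index` set from its position in the list.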
Then you add pgvector indexes. For example, an IVF index on cosine distance:

```sql
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS vector;

-- Choose an index type: ivfflat is a good default
CREATE INDEX ON document_chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100); -- tweak based on dataset size
```

(Strictly speaking, the extension must already exist before you can declare the `VECTOR(1536)` column above, so in real migrations `CREATE EXTENSION` runs first.)

Yes, this is ready for production… as long as you understand what you just built.
How the Query Path Actually Works
A typical retrieval query looks like this:
```sql
-- $1 is the query embedding (vector)
SELECT c.document_id,
       c.chunk_index,
       c.content,
       1 - (c.embedding <=> $1) AS similarity
FROM document_chunks c
JOIN documents d ON d.id = c.document_id
WHERE d.source = 'kb'
ORDER BY c.embedding <=> $1
LIMIT 10;
```

Note the join: `source` lives on `documents`, not on the chunks. And the ORDER BY uses `<=>`, the same cosine operator the index was built for; ordering by a different operator would bypass the index.

A few details worth noting:
- `<->` (Euclidean) and `<=>` (cosine) are distance operators; lower is closer. Use the one that matches your index's operator class, or the planner can't use the index.
- For cosine similarity, you often transform distance into "similarity" for easier ranking.
- Narrowing by filters (`source`, `created_at`, `metadata->'lang'`) is where Postgres shines. Normal B-tree indexes still work.
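To make the ranking concrete: `<=>` returns cosine distance, so the `similarity` alias in the query is just `1 - distance`. A pure-Python sketch of the same math:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector's <=> computes it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def similarity(a: list[float], b: list[float]) -> float:
    """What the SQL alias `similarity` evaluates to."""
    return 1.0 - cosine_distance(a, b)
```

Identical directions give distance 0 (similarity 1); orthogonal vectors give distance 1.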
From your app, you rarely hit this query directly. You wrap it in a retrieval service:
```
User query
    │
    ▼
+----------------+        +-------------------+
|  Embed Query   | -----> | Postgres + pgvec  |
+----------------+        +-------------------+
        │                           ▲
        ▼                           │
 Rerank / format        document_chunks + documents
        │
        ▼
 Return top-N passages
```

That's it. No separate vector DB, no extra persistence layer.
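The wrapper around that query is a few lines. A sketch in Python, where `embed` and `run_query` are placeholders for your embedding client and DB call (both assumptions, not a specific API):

```python
from typing import Callable

def retrieve(query: str,
             embed: Callable[[str], list[float]],
             run_query: Callable[[list[float], int], list[dict]],
             top_n: int = 10) -> list[dict]:
    """Embed the user query, fetch nearest chunks, and rank by similarity."""
    query_vec = embed(query)
    rows = run_query(query_vec, top_n)   # executes the SQL shown earlier
    # rerank / format step; here just sort by the similarity column
    return sorted(rows, key=lambda r: r["similarity"], reverse=True)
```

Injecting the two callables keeps the service trivially testable with fakes before any model or database is wired in.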
Architecture: "One Database, Two Faces"
A simple (and surprisingly scalable) pattern for startups is:
```
      +----------------+
      |  Application   |
      |   (API, UI)    |
      +--------+-------+
               |
               v
 +-------------+------------+
 | Retrieval / RAG Service  |
 +-------------+------------+
               |
 +-------------+------------+
 |    Postgres Cluster      |
 |  OLTP + pgvector schema  |
 +--------------------------+
```

For heavier workloads, you evolve into:
```
        Writes                          Reads
+------------------+          +--------------------+
|   Primary DB     |  <----   |   Read Replica(s)  |
| (OLTP, ingest)   |          | (Retrieval-heavy)  |
+------------------+          +--------------------+
         ^                              ^
         |                              |
   ETL / streaming                RAG / search
```

You offload vector-heavy queries to read replicas while keeping transactional updates on the primary. Postgres has had this pattern nailed for years; pgvector just rides along.
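Routing between the two faces can be as simple as picking a connection string per query type. A minimal sketch (the DSNs are illustrative; drivers like psycopg accept such strings directly):

```python
import random

class PgRouter:
    """Send read-only retrieval queries to a replica, everything else to the primary."""

    def __init__(self, primary_dsn: str, replica_dsns: list[str]):
        self.primary_dsn = primary_dsn
        self.replica_dsns = replica_dsns

    def dsn_for(self, readonly: bool) -> str:
        # Fall back to the primary when no replicas are configured
        if readonly and self.replica_dsns:
            return random.choice(self.replica_dsns)
        return self.primary_dsn
```

Real setups usually delegate this to a pooler or load balancer, but the decision logic is the same.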
Performance Tuning Without Losing Your Weekend
You don't need a PhD to make pgvector fast, but you do need a bit of tuning.
1. Choose the right index type
- `ivfflat` – good default for large collections; approximate search, configurable recall vs speed.
- `hnsw` (if available in your version) – great recall, often excellent latency, slightly heavier writes.
- No index – acceptable only for <100k rows or offline jobs.
Rule of thumb: if your p95 query time > 150 ms with ivfflat, consider:
- increasing `lists` (more buckets)
- using the `probes` parameter at query time for better recall

```sql
SET ivfflat.probes = 10; -- higher = more accurate, slower
```

2. Keep embeddings dense, not massive
Don't store 4,096-dim embeddings "just because the model offered them." For many workloads, 768 or 1,536 dimensions are fine and significantly cheaper to compute and store.
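If your model was trained to tolerate truncation (Matryoshka-style embeddings), trimming dimensions is a slice plus re-normalization; if it wasn't, re-embed with a smaller model instead. A sketch under that assumption:

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length.
    Only meaningful for models trained for truncation (Matryoshka-style)."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head
    return [x / norm for x in head]
```

Re-normalizing matters because cosine distance assumes you compare directions, not magnitudes.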
3. Watch IO, not just CPU
Large embeddings mean big rows. Use:
- TOAST-aware schema: keep text columns reasonable, offload raw HTML elsewhere.
- Fillfactor/partitioning: especially if you're frequently deleting/refreshing chunks.
If your bottleneck becomes disk IO, a surprisingly effective trick is separate tablespaces (fast disk for document_chunks, cheaper disk for everything else).
A Quick Case Study: Startup Support Search
A small B2B SaaS team I know had:
- ~3M support email threads
- ~50k knowledge base articles
- ~3 support engineers drowning in copy-paste
They considered a dedicated vector DB, but every option added new infra and new failure modes. Instead they:
- Added `pgvector` to their existing managed Postgres.
- Nightly ETL to chunk and embed new tickets + articles.
- A small retrieval API backing:
- "similar tickets" in their support tool
- AI-generated reply suggestions
- a customer-facing semantic search bar
Hardware: one beefy primary, two read replicas. Results:
- Median retrieval latency: ~70–90 ms.
- Support handle time dropped because agents started from better context.
- Total infra bill lower than the cost of most standalone managed vector DBs at that scale.
Did they eventually outgrow it? Maybe. But it carried them through the first meaningful two years of AI features with zero "we lost the vector cluster" incidents.
When Postgres + pgvector Is the Wrong Tool
Let's be honest: sometimes you do need a dedicated vector database. Reasons might include:
- Billions of embeddings with strict low-latency SLAs.
- Multi-tenant, cross-region deployments where per-tenant isolation is more important than reuse.
- You want built-in hybrid search, document ingestion pipelines, and analytics features out of the box.
The point isn't "never use a vector DB." It's: don't start there if you already run Postgres and your problem fits comfortably within its envelope.
Getting Started Without Regretting It Later
If you're pgvector-curious, here's a practical starting checklist:
1. Enable pgvector in a dev database and create a `document_chunks` table with real data.
2. Run a backfill of embeddings for ~100k–500k chunks.
3. Add an ivfflat index and measure:
- p50/p95 query latency
- memory and CPU impact
4. Wire a minimal retrieval API, put it behind a feature flag, and dogfood internally.
5. Only after that decide whether you truly need a separate vector infra.
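For the latency measurements in step 3, p50/p95 are easy to compute from raw timings without a metrics stack. A nearest-rank percentile sketch:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: good enough for a latency checklist."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))  # 1-indexed nearest rank
    return ordered[rank - 1]
```

Collect a few thousand query timings, then compare `percentile(timings, 95)` against your latency budget before and after each index change.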
You'll learn more from a week of realistic pgvector experiments than from a month of reading vendor benchmarks.
Closing: The Boring Stack That Wins
In a hype cycle, it's tempting to collect new databases like Pokémon.
But the stack that quietly wins is often the boring one:
- Postgres
- pgvector
- A small retrieval service
- Good indexes
- Clear observability
If you're a startup or a team on a budget, you can ship serious retrieval and RAG features today using the database you already know how to run.
And if you are running Postgres + pgvector in production, I'd love to hear your war stories — good, bad, or "we accidentally stored 10x more dimensions than we needed."
👇 Drop a comment with your setup, follow for more pragmatic AI infra pieces, or share this with the teammate currently writing an RFP for three different vector databases.