The Chunking Dilemma

Remember when we talked about chunking? We had a problem:

Small chunks:

"Einstein published his theory in 1905."
  • ✅ Precise matching (easy to find with "Einstein 1905")
  • ❌ Missing context (which theory? what was the impact?)

Large chunks:

[Entire 2000-word biography section]
  • ✅ Full context
  • ❌ Poor matching (query about "1905" buried in text)

We want both! Precise search + complete context.

The Parent-Child Solution

The idea: Index small chunks, but return large parent sections.

Parent Document: "Einstein Biography - Chapter 3"
├─ Child Chunk 1: "Einstein was born in 1879..."
├─ Child Chunk 2: "He published relativity in 1905..." ← Match!
└─ Child Chunk 3: "Won Nobel Prize in 1921..."

Query: "Einstein 1905 theory"
Search Result: Child Chunk 2 (precise match)
LLM Receives: Entire Chapter 3 (full context)

How It Works

Step 1: Create Parent-Child Structure

When chunking your documents:

{
    "doc_id": "einstein_bio",          # Parent
    "chunk_id": "einstein_bio:::2",    # Child (2nd chunk)
    "text": "He published relativity in 1905...",
    "meta": {
        "source": "biography.pdf",
        "parent_id": "einstein_bio"    # Link to parent
    }
}
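
If you're rolling your own chunker rather than relying on a library, a minimal sketch that emits records in this shape might look like the following (make_child_chunks is a hypothetical helper, and it assumes nltk's sent_tokenize is available):

from nltk.tokenize import sent_tokenize  # requires nltk's "punkt" data

def make_child_chunks(doc_id, text, source, sentences_per_chunk=2):
    """Split one parent document into small child chunks that link back to it."""
    sentences = sent_tokenize(text)
    chunks = []
    for i in range(0, len(sentences), sentences_per_chunk):
        chunks.append({
            "doc_id": doc_id,
            "chunk_id": f"{doc_id}:::{i // sentences_per_chunk}",
            "text": " ".join(sentences[i:i + sentences_per_chunk]),
            "meta": {"source": source, "parent_id": doc_id},
        })
    return chunks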

Step 2: Index the Children

# Build the index from SMALL chunks (1-2 sentences each)
store = build_store(docs, strategy="sentence", sentences_per_chunk=2)
dense_idx.build(store["chunks"])  # keep store["chunks"] around for Step 3

Step 3: Retrieve Children, Return Parents

from collections import defaultdict

def attach_parent_sections(retrieved_chunks, all_chunks, max_chars=1200):
    """For each retrieved chunk, find and attach parent context."""
    
    # Group all chunks by parent document
    by_parent = defaultdict(list)
    for chunk in all_chunks:
        parent_id = chunk.get("meta", {}).get("parent_id")
        if parent_id:
            by_parent[parent_id].append(chunk)
    
    enriched = []
    for chunk in retrieved_chunks:
        # Get parent document ID
        parent_id = chunk.get("meta", {}).get("parent_id")
        
        if not parent_id:
            # No parent, just return original chunk
            enriched.append(chunk)
            continue
        
        # Get all sibling chunks from same parent
        siblings = by_parent.get(parent_id, [])
        
        # Combine sibling chunks into parent context
        parent_text = " ".join([s["text"] for s in siblings])
        
        # Truncate if too long
        if len(parent_text) > max_chars:
            parent_text = parent_text[:max_chars] + "..."
        
        # Attach parent context
        chunk["parent_context"] = parent_text
        enriched.append(chunk)
    
    return enriched
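
A quick sanity check with toy data (this assumes the function above; the chunk contents are made up):

# Two children of the same parent document
all_chunks = [
    {"text": "Einstein was born in 1879.",
     "meta": {"parent_id": "einstein_bio"}},
    {"text": "He published relativity in 1905.",
     "meta": {"parent_id": "einstein_bio"}},
]

# Pretend the second chunk was retrieved for "Einstein 1905 theory"
hits = attach_parent_sections([all_chunks[1]], all_chunks)
print(hits[0]["parent_context"])
# Einstein was born in 1879. He published relativity in 1905.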

Visual Example

Original Document Structure:

Document: "Machine Learning Tutorial"

Chapter 1: Introduction
├─ Chunk 1-1: "ML is a subset of AI..."
├─ Chunk 1-2: "It learns from data..."
└─ Chunk 1-3: "Common applications include..."

Chapter 2: Supervised Learning
├─ Chunk 2-1: "Supervised learning uses labeled data..."
├─ Chunk 2-2: "Examples include classification..."
└─ Chunk 2-3: "Linear regression is the simplest..."

Query: "What is supervised learning?"

Without Parent-Child:

Retrieved: Chunk 2-1 "Supervised learning uses labeled data..."
LLM Sees: Just that one sentence
Answer: Limited, might miss important details

With Parent-Child:

Retrieved: Chunk 2-1 (matched query)
LLM Sees: All of Chapter 2 (Chunks 2-1, 2-2, 2-3 combined)
Answer: Complete explanation with examples and context ✅

Sentence Window: A Simpler Alternative

If you don't have a clear parent-child structure, use sentence window expansion:

from nltk.tokenize import sent_tokenize  # requires nltk's "punkt" data

def sentence_window_expand(retrieved_chunks, full_document, window_sentences=2):
    """Expand each chunk by N sentences before/after within its source document."""
    
    # Split the full document into sentences once
    sentences = sent_tokenize(full_document)
    
    expanded = []
    for chunk in retrieved_chunks:
        chunk_sentences = sent_tokenize(chunk["text"])
        
        # Find where the chunk's first sentence appears in the document
        try:
            start_idx = sentences.index(chunk_sentences[0])
        except ValueError:
            # Chunk not found verbatim; keep it unexpanded
            expanded.append(chunk)
            continue
        
        # Take a window of sentences around the chunk
        window_start = max(0, start_idx - window_sentences)
        window_end = min(len(sentences), start_idx + len(chunk_sentences) + window_sentences)
        
        chunk["window_context"] = " ".join(sentences[window_start:window_end])
        expanded.append(chunk)
    
    return expanded

Example:

Full Text: "Sentence 1. Sentence 2. Sentence 3. Sentence 4. Sentence 5."
Retrieved Chunk: "Sentence 3"
Window (±2): "Sentence 1. Sentence 2. Sentence 3. Sentence 4. Sentence 5."
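
In code, using the function above (toy sentences chosen so sent_tokenize splits them cleanly):

doc = "The sky is blue. Grass is green. Water is wet. Fire is hot. Snow is cold."
hits = sentence_window_expand([{"text": "Water is wet."}], doc, window_sentences=2)
print(hits[0]["window_context"])
# The sky is blue. Grass is green. Water is wet. Fire is hot. Snow is cold.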

Multi-Vector Documents: Advanced Approach

The problem: Documents have different parts — title, summary, body. Queries might match different parts.

Example:

Document: "Python Programming Guide"
├─ Title: "Python Programming Guide"
├─ Summary: "Learn Python basics, syntax, and best practices"
└─ Body: [5000 words of detailed content]

Query: "Python guide"
- Matches title strongly (short, distinctive)
- Would match body weakly (buried in long text)

Solution: Create multiple embeddings per document:

def multi_vector_views(document):
    """Create multiple searchable views of one document."""
    
    doc_id = document["doc_id"]
    views = []
    
    # View 1: Title (for broad matching)
    views.append({
        "chunk_id": f"{doc_id}:::title",
        "text": document["title"],
        "view_type": "title",
        "parent_id": doc_id,
        "returns": "body"  # If matched, return body
    })
    
    # View 2: Summary (for topic-level matching)
    views.append({
        "chunk_id": f"{doc_id}:::summary",
        "text": document["summary"],
        "view_type": "summary",
        "parent_id": doc_id,
        "returns": "body"
    })
    
    # View 3: Body (for specific facts)
    # chunk_by_sentence: your sentence-based chunker (e.g. from the chunking step)
    body_chunks = chunk_by_sentence(document["body"], sentences_per_chunk=3)
    for i, chunk_text in enumerate(body_chunks):
        views.append({
            "chunk_id": f"{doc_id}:::body_{i}",
            "text": chunk_text,
            "view_type": "body",
            "parent_id": doc_id,
            "returns": "self"
        })
    
    return views

Why this works:

Query: "Python programming tutorial"

Matches:
  1. Title view: "Python Programming Guide" (score: 0.92)
  2. Summary view: "Learn Python basics…" (score: 0.85)
  3. Body chunk 47: "…advanced Python features…" (score: 0.71)

All three point to the same document → boost that document's rank.
Return: the full body content (what the user actually needs).
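
One way to implement that boost is to collapse view-level matches into a single score per parent document. Here's a minimal sketch, assuming each match carries a parent_id and a score; the per-view bonus weight is arbitrary and worth tuning:

from collections import defaultdict

def boost_by_parent(matches, bonus=0.05):
    """Best view score per document, plus a small bonus for each extra matching view."""
    by_doc = defaultdict(list)
    for m in matches:
        by_doc[m["parent_id"]].append(m["score"])
    
    ranked = [
        (doc_id, max(scores) + bonus * (len(scores) - 1))
        for doc_id, scores in by_doc.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)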


Which Approach Should You Use?

Approach          Best For                                             Complexity
Parent-Child      Structured docs (books, manuals with chapters)       Medium
Sentence Window   Unstructured text (articles, blogs)                  Low
Multi-Vector      Documents with distinct sections (papers, reports)   High

Implementation Tips

1. How big should parent sections be?

# Too small (defeats the purpose)
max_parent_chars = 200     # ❌

# Too large (exceeds LLM context)
max_parent_chars = 10000   # ❌

# Just right (roughly 1000-1500 characters)
max_parent_chars = 1200    # ✅

2. Should you always return the parent?

No! Sometimes the child chunk is enough:

FACTOID_STARTERS = ("when", "who", "where", "how many", "what year")

def is_factoid_query(query):
    # Simple heuristic: factoid questions ("When was X born?")
    # usually open with when/who/where/how many/what year
    return query.lower().startswith(FACTOID_STARTERS)

def should_expand_context(chunk, query):
    # If the chunk is already long, no need
    if len(chunk["text"]) > 500:
        return False
    
    # If the query asks for a specific fact, the chunk alone is sufficient
    if is_factoid_query(query):
        return False
    
    # Otherwise, expand
    return True

3. Deduplication

If multiple child chunks from the same parent are retrieved, don't repeat the parent context:

seen_parents = set()
for chunk in retrieved:
    parent_id = chunk.get("meta", {}).get("parent_id")
    
    if parent_id in seen_parents:
        chunk["parent_context"] = "[See previous result]"
    else:
        # get_parent_text() stands in for the parent lookup,
        # e.g. joining sibling chunks as in attach_parent_sections
        chunk["parent_context"] = get_parent_text(parent_id)
        seen_parents.add(parent_id)

Real-World Example

# Your RAG pipeline with parent-child
def rag_with_context_expansion(query):
    # Step 1: Retrieve small chunks (precise)
    candidates = hybrid_search(query, dense_idx, sparse_idx, k=100)
    
    # Step 2: Rerank
    top_chunks = cross_encoder_rerank(query, candidates[:50], top_k=10)
    
    # Step 3: Expand context (return large parent sections)
    expanded = attach_parent_sections(
        top_chunks,
        all_chunks=store["chunks"],
        max_chars=1200
    )
    
    # Step 4: Generate with expanded context
    answer = openai_generate(
        query,
        # Fall back to the chunk's own text if it has no parent
        [{"text": c.get("parent_context", c["text"])} for c in expanded]
    )
    
    return answer

Before (without expansion):

Query: "How does photosynthesis work?"
Retrieved: "Plants convert light to energy."
LLM Answer: "Plants convert light energy to chemical energy."
Quality: ⭐⭐⭐ (vague, missing details)

After (with expansion):

Query: "How does photosynthesis work?"
Retrieved: "Plants convert light to energy." (small chunk)
Expanded: [Full paragraph about chlorophyll, light reactions, Calvin cycle...]
LLM Answer: "Photosynthesis occurs in two stages. First, in the light-dependent reactions, chlorophyll absorbs photons..."
Quality: ⭐⭐⭐⭐⭐ (detailed, accurate, complete)

What's Next

You now know how to:

  1. Index small chunks for precise retrieval
  2. Return large parent context for complete answers
  3. Handle structured and unstructured documents

But what if someone asks about "recent news" or "last quarter's reports"? Time matters!

In the next article, we'll cover Time-Based Filtering and Freshness Boosting — how to prioritize recent documents and filter by date ranges. Essential for news, updates, and time-sensitive queries!