Chunk Extractability
chunk-extractability
What this datapoint measures
Whether prose paragraphs are self-contained and extractable as standalone citations. AI systems often cite content in chunks (paragraphs or short passages); whether those chunks make sense in isolation is the chunk-extractability question.
What high looks like
- Paragraphs that contain complete ideas
- Paragraphs that don’t depend heavily on the previous paragraph for context
- First sentences that establish what the paragraph is about
- Specific facts or claims within paragraphs (not deferred to subsequent paragraphs)
- Section openings that introduce the section’s content
What low looks like
- Paragraphs that hold only a fragment of a larger idea
- Heavy use of “this” or “that” referring to prior paragraphs without re-establishing context
- Paragraphs that begin mid-thought
- Stream-of-consciousness prose where context flows across paragraphs without anchors
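Some of the low-end signals above can be approximated with a crude automated check. The Python sketch below is a hypothetical heuristic, not the datapoint’s actual scoring method. It splits prose into paragraphs on blank lines, flags paragraphs whose first word appears in an illustrative list of pronouns, demonstratives, and connectives (`CONTEXT_DEPENDENT_OPENERS`, an assumption), and reports the share of unflagged paragraphs. It also flags anchored openings such as “This datapoint measures…”, so its output is best treated as a prompt for editorial review rather than as a score.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative list of openers that often lean on a prior
# paragraph for context. Not the datapoint's actual rubric.
CONTEXT_DEPENDENT_OPENERS = (
    "this", "that", "these", "those", "it", "they",
    "however", "also", "moreover", "furthermore", "additionally",
)


@dataclass
class ParagraphFlag:
    index: int   # zero-based paragraph position
    opener: str  # lowercased first word that triggered the flag
    text: str    # the full paragraph


def split_paragraphs(text: str) -> list[str]:
    """Split prose on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def flag_context_dependent(text: str) -> list[ParagraphFlag]:
    """Flag paragraphs whose first word suggests a missing antecedent."""
    flags = []
    for i, para in enumerate(split_paragraphs(text)):
        match = re.match(r"[A-Za-z']+", para)
        if match is None:
            continue
        first_word = match.group(0).lower()
        if first_word in CONTEXT_DEPENDENT_OPENERS:
            flags.append(ParagraphFlag(i, first_word, para))
    return flags


def extractability_ratio(text: str) -> float:
    """Share of paragraphs that were NOT flagged (1.0 means none flagged)."""
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return 1.0
    return 1 - len(flag_context_dependent(text)) / len(paragraphs)


sample = """Chunk extractability measures whether paragraphs stand alone.

This makes citations harder to read.

However, the fix is editorial."""

print([f.index for f in flag_context_dependent(sample)])  # prints [1, 2]
print(round(extractability_ratio(sample), 2))             # prints 0.33
```

A first-word check is deliberately simple: it catches paragraphs that begin mid-thought but misses pronouns later in a paragraph whose antecedent sits elsewhere, which still needs a human read.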
What at floor looks like
A brand at floor on chunk-extractability has prose where paragraphs don’t stand alone. AI systems extracting citations get fragments that lose meaning in isolation. The citations may be technically correct but unhelpful to readers who didn’t see the full original context.
The remedy is editorial restructuring: rewriting prose so each paragraph contains a complete idea with enough context to stand on its own. This is a matter of editorial discipline and often requires training content authors across the team.
What affects this datapoint
- Paragraph-level idea completeness
- First-sentence establishment of paragraph topic
- Context anchoring within each paragraph
- Section opening clarity
- Pronoun resolution within paragraphs
OMG actions that influence this datapoint
| Action | Influence |
|---|---|
| M-2 Answer-First Content Architecture | Direct, primary. Answer-first work explicitly produces self-contained answer paragraphs. |
| M-6 Evidence-Based Content & Citation Architecture | Substantial. Citation-architecture work emphasizes extractable claim-paragraphs. |
| O-6 Content Audit & Baseline Optimization | Substantial. Audit surfaces context-dependent paragraph patterns. |
Multilingual considerations
Per-language paragraph conventions vary:
- English paragraph conventions emphasize topic-sentence-first structure that aids chunk-extractability naturally
- Japanese paragraph conventions are more discursive; literal translation from English may improve chunk-extractability while losing native-reading flow
- Korean and Traditional Chinese conventions vary by content type
The team’s working rule is that per-language editorial standards apply. The underlying goal (paragraphs that make sense in isolation for AI extraction) carries across languages, but the specific structural patterns differ.
Common failure modes
- Long-form content built on a cumulative argument that can’t be chunked
- Stream-of-consciousness or essay-style prose where context flows across paragraphs
- Heavy pronoun use without antecedent in the same paragraph
- Section openings that begin in the middle of a thought from the prior section
Diagnostic interpretation
Chunk-extractability at floor with content-formatting also low indicates that broad structural work is needed. The remedy is editorial restructuring across the affected content.
Chunk-extractability at low with content-depth at high indicates substantive content that doesn’t extract well: the structure hides the depth. M-2 (Answer-First Content Architecture) work restructures that depth into extractable form.
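The two diagnostic readings above can be expressed as a small lookup. This is a hypothetical sketch: the function name `interpret` and the plain string levels are assumptions, and treating a content-formatting level of floor as also satisfying “also low” is an interpretation, not a documented rule.

```python
from typing import Optional

# Levels treated as "low" for content-formatting (assumption: floor counts too).
LOW_OR_BELOW = ("floor", "low")


def interpret(chunk_extractability: str,
              content_formatting: str,
              content_depth: str) -> Optional[str]:
    """Map datapoint levels to the diagnostic readings; None if no rule applies."""
    if chunk_extractability == "floor" and content_formatting in LOW_OR_BELOW:
        return ("Broad structural work needed: editorial restructuring "
                "across the affected content.")
    if chunk_extractability == "low" and content_depth == "high":
        return ("Substantive content hidden by structure: M-2 work "
                "restructures the depth into extractable form.")
    return None


print(interpret("low", "high", "high"))
```

Rules are checked in the order the text gives them, so a profile that matches both (not possible with these exact levels) would get the broader structural reading first.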