Chunk Extractability

structural-legibility floor concept multilingual

Influenced by actions

M-2 Answer-First Content Architecture
M-6 Evidence-Based Content & Citation Architecture
O-6 Content Audit & Baseline Optimization

`chunk-extractability`

What this datapoint measures

This datapoint measures whether prose paragraphs are self-contained and can be extracted as standalone citations. AI systems often cite content in chunks (paragraphs or short passages), so the chunk-extractability question is whether those chunks still make sense in isolation.

What high looks like

Paragraphs that contain complete ideas
Paragraphs that don’t depend heavily on the previous paragraph for context
First sentences that establish what the paragraph is about
Specific facts or claims within paragraphs (not deferred to subsequent paragraphs)
Section openings that introduce the section’s content

What low looks like

Paragraphs that are sentence fragments of larger ideas
Heavy use of “this” or “that” referring to prior paragraphs without re-establishing context
Paragraphs that begin mid-thought
Stream-of-consciousness prose where context flows across paragraphs without anchors
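
Some of the low-scoring patterns above can be approximated with a rough heuristic that looks only at how each paragraph opens. The following is a minimal sketch, not the scoring method behind this datapoint: the function names `first_word` and `flag_context_dependent`, the word lists, and the rules are all illustrative assumptions, and the word lists cover English only.

```python
import re

# Hypothetical word lists (assumptions); tune per language and content type.
DANGLING_OPENERS = {"this", "that", "these", "those", "it", "they", "such"}
CONTINUATION_OPENERS = {"and", "but", "so", "also", "however", "therefore", "thus"}


def first_word(paragraph: str) -> str:
    """Return the lowercased first word of a paragraph, or '' if none."""
    match = re.match(r"\W*([A-Za-z']+)", paragraph)
    return match.group(1).lower() if match else ""


def flag_context_dependent(text: str) -> list[tuple[int, str]]:
    """Flag paragraphs whose opening suggests dependence on prior context.

    Paragraphs are split on blank lines. Returns (paragraph_index, reason) pairs.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    flags = []
    for i, para in enumerate(paragraphs):
        word = first_word(para)
        if word in DANGLING_OPENERS:
            flags.append((i, f"opens with unanchored reference '{word}'"))
        elif word in CONTINUATION_OPENERS:
            flags.append((i, f"opens mid-thought with '{word}'"))
    return flags
```

A check like this will produce false positives (a paragraph opening with "This datapoint measures..." anchors its own subject), so its output is better treated as a list of paragraphs for an editor to review than as a score.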

What at floor looks like

A brand at floor on chunk-extractability has prose where paragraphs don’t stand alone. AI systems extracting citations get fragments that lose meaning in isolation. The citations may be technically correct but unhelpful to readers who didn’t see the full original context.

The remedy is editorial restructuring: rewriting prose so that each paragraph contains a complete idea with sufficient context. This is a discipline that often requires training across content authors.

What affects this datapoint

Paragraph-level idea completeness
First-sentence establishment of paragraph topic
Context anchoring within each paragraph
Section opening clarity
Pronoun resolution within paragraphs

OMG actions that influence this datapoint

Action	Influence
M-2 Answer-First Content Architecture	Direct, primary. Answer-first work explicitly produces self-contained answer paragraphs.
M-6 Evidence-Based Content & Citation Architecture	Substantial. Citation-architecture work emphasizes extractable claim-paragraphs.
O-6 Content Audit & Baseline Optimization	Substantial. Audit surfaces context-dependent paragraph patterns.

Multilingual considerations

Per-language paragraph conventions vary:

English paragraph conventions emphasize topic-sentence-first structure that aids chunk-extractability naturally
Japanese paragraph conventions are more discursive; content translated literally from English may score higher on chunk-extractability while losing native reading flow
Korean and Traditional Chinese conventions vary by content type

The team’s working rule is that per-language editorial standards apply. The underlying principle (paragraphs should make sense in isolation for AI extraction) carries across languages, but the specific structural patterns differ.

Common failure modes

Long-form content with cumulative argument that can’t be chunked
Stream-of-consciousness or essay-style prose where context flows across paragraphs
Heavy pronoun use without antecedent in the same paragraph
Section openings that begin in the middle of a thought from the prior section

Diagnostic interpretation

When chunk-extractability is at floor and content-formatting is also low, broad structural work is needed. The remedy is editorial restructuring across the affected content.

When chunk-extractability is low but content-depth is high, the content is substantive but does not extract well: its depth is hidden by structure. M-2 work releases that depth into extractable form.
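
The two diagnostic readings above can be written down as a small lookup, which may help when reviewing many pages at once. This is a sketch under stated assumptions: the function name `interpret`, the parameter names, and the level labels `"floor"`, `"low"`, and `"high"` are illustrative, and treating "also low" as covering both `"low"` and `"floor"` is a guess rather than something the text specifies.

```python
from typing import Optional

# Assumed level labels; "also low" is read here as "low" or "floor".
LOW_LEVELS = {"floor", "low"}


def interpret(chunk_extractability: str,
              content_formatting: Optional[str] = None,
              content_depth: Optional[str] = None) -> Optional[str]:
    """Return a diagnostic reading for a combination of levels, or None."""
    if chunk_extractability == "floor" and content_formatting in LOW_LEVELS:
        return ("Broad structural work needed: editorial restructuring "
                "across the affected content.")
    if chunk_extractability == "low" and content_depth == "high":
        return ("Depth hidden by structure: M-2 work releases it "
                "into extractable form.")
    # No reading is defined for other combinations.
    return None
```

For example, `interpret("low", content_depth="high")` returns the M-2 reading, while a combination the text does not cover returns `None`.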