Semantic HTML

signal-architecture floor concept

semantic-html

What this datapoint measures

Use of meaningful HTML elements rather than div-based layout. Whether the brand’s pages use article, section, header, footer, nav, aside, main, h1-h6, ul/ol/li, dl/dt/dd, and other semantic elements appropriately, versus relying entirely on div containers with class names that carry semantics only for human developers.

Semantic HTML matters for AI consumption because AI systems use HTML element semantics as a parsing guide. A page that wraps article content in <article> gives the AI a direct structural signal that what is inside is article-type content. A page that uses <div class="article"> forces the AI to infer the semantics from the class name, which is less reliable because class names follow no shared standard.
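The difference between the two signals can be sketched in a few lines of Python using the standard-library html.parser module. This is a minimal illustration, not a description of how any particular AI system parses pages: the CLASS_HINTS set and the function name find_article_signals are assumptions introduced here for the example.

```python
from html.parser import HTMLParser

# Hypothetical class-name hints a consumer might fall back on.
# This set is an assumption for illustration; no standard defines it.
CLASS_HINTS = {"article", "post", "entry"}


class ArticleSignalFinder(HTMLParser):
    """Records how each article-like container was identified."""

    def __init__(self):
        super().__init__()
        self.signals = []  # list of (tag, how_identified)

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            # The element name itself states the content type.
            self.signals.append((tag, "element"))
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and CLASS_HINTS.intersection(classes):
            # Only a naming convention suggests this is an article.
            self.signals.append((tag, "class-name guess"))


def find_article_signals(html: str):
    finder = ArticleSignalFinder()
    finder.feed(html)
    return finder.signals


semantic = "<article><h1>Title</h1><p>Body</p></article>"
div_based = '<div class="article"><div class="title">Title</div></div>'
unconventional = '<div class="c-story__wrap"><div class="title">Title</div></div>'

print(find_article_signals(semantic))        # [('article', 'element')]
print(find_article_signals(div_based))       # [('div', 'class-name guess')]
print(find_article_signals(unconventional))  # []
```

The third sample shows why class-name inference is fragile: as soon as a site uses a naming scheme the consumer did not anticipate, the guess fails, while the <article> element works regardless of class naming.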

What high looks like

  • Article content wrapped in <article> elements
  • Page sections wrapped in <section> elements with appropriate headings
  • Navigation in <nav> elements
  • Page header in <header>, footer in <footer>, main content in <main>
  • Heading hierarchy uses h1-h6 to reflect content structure (one h1 per page; h2-h6 used for nested headings)
  • Lists use ul/ol/li; definition lists use dl/dt/dd
  • Time references use <time> with datetime attributes
  • Images have descriptive alt text (missing or non-descriptive alt text degrades both semantic-html and accessibility-score)
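Several of the signals in the list above are mechanically checkable. The sketch below, again using Python's standard-library html.parser, audits a sample page for landmark elements, <time> elements without a datetime attribute, and images without an alt attribute. The names HighSignalAudit and audit_high_signals, the LANDMARKS set, and both sample pages are assumptions made for this example; this is not the scoring method behind the datapoint.

```python
from html.parser import HTMLParser

# Landmark elements taken from the list above (assumed subset for the sketch).
LANDMARKS = {"header", "nav", "main", "article", "footer"}


class HighSignalAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.landmarks_seen = set()
        self.time_without_datetime = 0
        self.img_without_alt = 0

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in LANDMARKS:
            self.landmarks_seen.add(tag)
        elif tag == "time" and not attr_map.get("datetime"):
            # Flags a missing or empty datetime attribute.
            self.time_without_datetime += 1
        elif tag == "img" and "alt" not in attr_map:
            # Only detects an absent attribute; it cannot judge
            # whether existing alt text is actually descriptive.
            self.img_without_alt += 1


def audit_high_signals(html: str) -> dict:
    audit = HighSignalAudit()
    audit.feed(html)
    return {
        "missing_landmarks": sorted(LANDMARKS - audit.landmarks_seen),
        "time_without_datetime": audit.time_without_datetime,
        "img_without_alt": audit.img_without_alt,
    }


high_page = """
<header><nav><ul><li><a href="/">Home</a></li></ul></nav></header>
<main>
  <article>
    <h1>Guide</h1>
    <p>Published <time datetime="2024-05-01">1 May 2024</time></p>
    <img src="chart.png" alt="Bar chart of monthly signups">
    <section><h2>Details</h2><p>Body text</p></section>
  </article>
</main>
<footer><p>Contact</p></footer>
"""

weak_page = '<div class="post"><time>1 May 2024</time><img src="chart.png"></div>'

print(audit_high_signals(high_page))
# {'missing_landmarks': [], 'time_without_datetime': 0, 'img_without_alt': 0}
print(audit_high_signals(weak_page))
```

A clean result here is necessary but not sufficient for a high reading: the check says nothing about whether sections carry appropriate headings or whether alt text is meaningful.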

What low looks like

  • Layout entirely div-based with class names carrying semantics
  • Headings used for visual size rather than structural meaning (h1 for any large text; h3 used for emphasis)
  • Lists rendered as div containers with bullet-point styling
  • Navigation in <div class="nav"> rather than <nav>
  • Multiple h1 elements per page or no h1 at all
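The div-based pattern described above can be made visible by counting elements. The following Python sketch compares the number of <div> containers with the number of semantic sectioning elements and lists the divs whose class name is standing in for an element. STAND_IN_CLASSES is a hypothetical list chosen for this example, as are the names LowSignalAudit and audit_low_signals.

```python
from collections import Counter
from html.parser import HTMLParser

SEMANTIC_TAGS = {"article", "section", "header", "footer", "nav", "aside", "main"}

# Hypothetical class names that mimic a semantic element (assumption).
STAND_IN_CLASSES = {"nav", "header", "footer", "article", "section", "sidebar", "main"}


class LowSignalAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.stand_in_divs = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # A div whose class name matches a semantic element name
            # is carrying meaning that only a human reader can see.
            self.stand_in_divs.extend(sorted(STAND_IN_CLASSES.intersection(classes)))


def audit_low_signals(html: str) -> dict:
    audit = LowSignalAudit()
    audit.feed(html)
    return {
        "div_count": audit.tag_counts["div"],
        "semantic_count": sum(audit.tag_counts[t] for t in SEMANTIC_TAGS),
        "stand_in_divs": audit.stand_in_divs,
    }


low_page = """
<div class="header"><div class="nav"><div class="item">Home</div></div></div>
<div class="main">
  <div class="article"><h3>Emphasis</h3><div class="text">Body</div></div>
</div>
<div class="footer">Contact</div>
"""

print(audit_low_signals(low_page))
# {'div_count': 7, 'semantic_count': 0,
#  'stand_in_divs': ['header', 'nav', 'main', 'article', 'footer']}
```

Each entry in stand_in_divs is a candidate for a direct swap to the matching element during template work.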

What at floor looks like

A brand at floor on semantic-html has pages built entirely from div containers, with semantic HTML elements absent or used incorrectly. This is common in older WordPress themes, in pages built by older CMSes, and in pages built by visual page builders that generate div-heavy output.

Pages at floor are still readable by AI systems — HTML is HTML — but the AI must work harder to identify content structure, and parsing failures are more likely. The path off floor is template work: update site templates to use semantic HTML elements in place of generic divs.

What affects this datapoint

  • Use of article, section, header, footer, nav, aside, main elements
  • Use of h1-h6 in proper hierarchy
  • Use of list elements (ul, ol, li, dl, dt, dd) for actual lists
  • Use of figure, figcaption for images with captions
  • Use of time element for dates and timestamps
  • Avoidance of heading elements used purely for visual styling

OMG actions that influence this datapoint

| Action | Influence |
| --- | --- |
| O-4 Technical Infrastructure, Performance & International Foundation | Substantial. Template-level work in O-4 includes correcting non-semantic HTML in site templates. |
| O-5 Core Structured Data Foundation | Indirect. O-5 work often reveals non-semantic HTML during structured-data implementation, prompting cleanup. |
| M-2 Answer-First Content Architecture | Indirect. M-2 work creates new content using current templates; if templates are updated to be semantically correct as part of M-2, semantic-html lifts. |

Multilingual considerations

Semantic HTML is language-neutral in its markup. The <article>, <section>, and similar elements work the same way regardless of content language.

One consideration applies to RTL languages (Arabic and Hebrew), which are not among Avonetiq’s five calibrated primary languages and so require no handling today: text direction should be declared in the markup, with dir="rtl" on the <html> element or on individual content blocks, rather than through CSS alone. CSS-only directionality breaks semantic parsing for some AI systems.
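The dir recommendation can be checked without rendering the page, which is the point: a consumer that reads only markup sees a dir attribute but never sees a stylesheet rule. The Python sketch below lists the elements that declare direction in markup. The function name markup_directions and the three sample documents are assumptions for illustration.

```python
from html.parser import HTMLParser


class DirAttributeFinder(HTMLParser):
    """Collects elements that declare text direction in markup."""

    def __init__(self):
        super().__init__()
        self.declared = []  # list of (tag, dir value)

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("dir")
        if value:
            self.declared.append((tag, value.lower()))


def markup_directions(html: str):
    finder = DirAttributeFinder()
    finder.feed(html)
    return finder.declared


page_level = '<html lang="ar" dir="rtl"><body><p>مرحبا</p></body></html>'
block_level = '<html lang="en"><body><blockquote lang="he" dir="rtl">שלום</blockquote></body></html>'
css_only = '<html lang="ar"><body style="direction: rtl"><p>مرحبا</p></body></html>'

print(markup_directions(page_level))   # [('html', 'rtl')]
print(markup_directions(block_level))  # [('blockquote', 'rtl')]
print(markup_directions(css_only))     # []
```

The CSS-only sample may render correctly in a browser, yet it exposes no direction signal to a markup-level parser.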

For CJK languages, semantic HTML is identical to English. No special handling required.

Common failure modes

  • Visual page builders generating heavily nested divs with no semantic elements
  • Older WordPress themes using <div class="post"> instead of <article>
  • React/Vue/Angular components using <div> as the default container instead of semantic elements
  • Heading hierarchy violations (multiple h1, skipped levels, h-elements used for visual size)
  • Lists implemented with CSS ::before content rather than ul/li
  • Tables used for layout rather than for tabular data
  • iframe content lacking semantic context
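Of the failure modes above, heading hierarchy violations are the easiest to detect automatically. The Python sketch below collects heading levels in document order and reports a missing h1, multiple h1 elements, and skipped levels. The function name heading_violations and the message wording are assumptions for this example. It cannot detect headings chosen purely for visual size, since that requires judging intent.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.levels = []  # heading levels in document order

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.levels.append(int(tag[1]))


def heading_violations(html: str):
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []

    h1_count = levels.count(1)
    if h1_count == 0:
        problems.append("no h1")
    elif h1_count > 1:
        problems.append(f"{h1_count} h1 elements")

    # Moving deeper by more than one level skips a heading level.
    # Moving back up by any amount is fine.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"skipped level: h{previous} -> h{current}")
    return problems


clean = "<h1>A</h1><h2>B</h2><h3>C</h3><h2>D</h2>"
broken = "<h1>A</h1><h3>B</h3><h1>C</h1><h4>D</h4>"

print(heading_violations(clean))   # []
print(heading_violations(broken))
# ['2 h1 elements', 'skipped level: h1 -> h3', 'skipped level: h1 -> h4']
```

Running a check like this against rendered output rather than template source also catches headings injected by page builders or client-side components.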

Diagnostic interpretation

Semantic-html at floor or low almost always means template-level engineering work is needed. Content authors write the content, but the markup around it comes from the template they author into, so lifting this datapoint requires template work rather than content edits.

Semantic-html at low with high schema-presence is a recoverable anomaly — the brand has done structured-data work, indicating engineering capacity, but the underlying templates are not semantically clean. O-4 work to update templates produces measurable lift.

Semantic-html at high with low chunk-extractability (V2.2) indicates a brand with semantically correct HTML but content that is not structured for extraction. The HTML is right but the content within it is monolithic. M-2 and M-pillar work is the remedy.