Part 4 — How AVO is measured (conceptual)
This part introduces the measurement architecture of AVO at a conceptual level. The detailed datapoint specifications are in Document 3: The Datapoint Reference. Calibration values (specific weights, thresholds, balance exponents) are not part of this document — they are engineering implementation detail.
4.1 The Authority Score — what it measures, what it predicts
The Authority Score (AS) is a 0-to-100 score quantifying the predictive readiness of a brand to be recognized and cited by AI systems. AS measures the conditions of authority — the structural, substantive, and external-validation signals that determine whether AI systems will treat the brand as citable. A high AS predicts that AI systems will cite the brand when its category and Focus are queried; a low AS predicts that they will not.
AS is positioned in the lineage of authority measurement instruments. PageRank measured the conditions of authority for the link-citation web. Domain Authority and Page Authority measured them for search-engine ranking. E-E-A-T provided a behavioral framework for evaluating them at the page level. AS measures the conditions of authority for the AI search era — different signals, different aggregation, different calibration, the same underlying concept of authority being measured.
For the practitioner, AS is the diagnostic instrument. Its value is not the headline number alone; its value is the decomposition into pillar, vector, and datapoint scores that surfaces specifically where the brand’s readiness is deficient. A brand with AS = 28 is not just “low” — the underlying decomposition reveals whether the deficit is in technical infrastructure (V1.2), structural legibility (V2.2), knowledge validation (V3.1), or somewhere else. This decomposition is what makes AS operationally actionable rather than merely descriptive.
A brand at AS ≈ 0 typically shows deficits across all three pillars. The decomposition is still informative even when all components are weak: it surfaces which pillar’s deficit is most severe and which vectors within that pillar are at floor. This directs the first wave of OMG action.
4.2 The aggregation chain
AS is computed by aggregating thirty-six datapoint scores into six vector scores, then into three pillar scores, then into a single Authority Score via a hybrid base formula and four sequential modifiers.
Datapoint scores → Vector scores → Pillar scores → Authority Score
Each datapoint produces a 0-to-100 score, computed from observable evidence on the domain or from external sources. Vector scores are weighted averages of the datapoints they contain. Pillar scores are weighted averages of the vectors they contain. The within-vector and within-pillar weights are calibration parameters; they are not part of the open methodology and may be adjusted as Avonetiq’s calibration evolves.
A datapoint floor mechanism prevents zero-collapse: a single missing structural signal does not drag the parent vector to zero. Floors are configured per datapoint, reflecting the structural minimum a datapoint should report when evidence is merely absent, as opposed to evidence actively counting against the brand.
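The floor-then-weighted-average chain can be sketched as follows. This is a minimal illustration: the datapoint names, floors, and weights shown are placeholders, since the actual calibration values are not part of the open methodology.

```python
def score_with_floor(score: float, floor: float) -> float:
    """Apply the per-datapoint floor: a missing signal reports the
    configured floor rather than dragging the parent vector toward zero."""
    return max(score, floor)

def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average, used at both the vector and the pillar level."""
    total_w = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total_w

# Placeholder datapoint scores and calibration values for one vector.
datapoints = {"dp1": 0.0, "dp2": 62.0, "dp3": 48.0}   # dp1: no evidence found
floors     = {"dp1": 10.0, "dp2": 0.0, "dp3": 0.0}    # per-datapoint floors
weights    = {"dp1": 0.5, "dp2": 0.3, "dp3": 0.2}     # within-vector weights

floored = {k: score_with_floor(v, floors[k]) for k, v in datapoints.items()}
vector_score = weighted_average(floored, weights)
```

With the floor, the missing dp1 contributes 10 rather than 0, so one absent signal dampens the vector instead of collapsing it.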
For the practitioner, the aggregation chain has diagnostic implications:
- A vector at floor with most datapoints at floor signals systemic deficit in that vector. The work is at the vector level, not at any single datapoint.
- A vector at floor with most datapoints normal but one or two at floor signals concentrated deficit. The work is at the specific datapoints.
- A pillar at floor with all vectors low signals systemic pillar deficit. The work is at the pillar level — typically a foundations-stage brand.
- A pillar in normal range with one vector dragging it down signals concentrated vector deficit. The work is at the specific vector.
Reading the aggregation chain bottom-up (from datapoints upward) versus top-down (from headline AS downward) produces the same conclusions but follows different paths. Practitioners typically read top-down for engagement-level conversations and bottom-up for action-selection conversations.
4.3 The hybrid base formula
The base Authority Score is computed using a hybrid of weighted arithmetic mean and weighted geometric mean across the three pillars.
arith = O × wO + M × wM + G × wG
geo = O^wO × M^wM × G^wG
balance = geo / arith
raw = arith × balance^p
Where O, M, and G are the pillar scores; wO, wM, wG are pillar weights summing to one; and p is a balance exponent that controls how heavily imbalance between pillars is penalized.
The reasoning behind the hybrid is structural. Pure arithmetic-mean scoring rewards lopsided portfolios — a brand strong in one pillar can post a high score despite catastrophic weakness in another. Pure geometric-mean scoring is too punitive in the opposite direction — a single weak pillar can collapse the headline score even when the brand is fundamentally sound. The hybrid form interpolates between these two positions: when p approaches zero, the formula behaves arithmetically and tolerates imbalance; when p approaches one, the formula behaves geometrically and rewards balance.
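The interpolation behavior can be seen directly in a small sketch of the base formula. The weights and the exponent below are illustrative placeholders, not calibrated values:

```python
def hybrid_base(O: float, M: float, G: float,
                wO: float, wM: float, wG: float, p: float) -> float:
    """Hybrid of weighted arithmetic and geometric means across the pillars.
    p = 0 reduces to the arithmetic mean; p = 1 reduces to the geometric mean."""
    arith = O * wO + M * wM + G * wG
    geo = (O ** wO) * (M ** wM) * (G ** wG)
    balance = geo / arith          # <= 1; equals 1 only when pillars are equal
    return arith * balance ** p

# Balanced vs. lopsided portfolios with the same arithmetic mean
# (equal placeholder weights, illustrative exponent p = 0.5):
balanced = hybrid_base(60, 60, 60, 1/3, 1/3, 1/3, 0.5)  # balance == 1, no penalty
lopsided = hybrid_base(95, 80, 5, 1/3, 1/3, 1/3, 0.5)   # penalized below 60
```

Both portfolios average 60 arithmetically, but the lopsided one is pulled down by its balance term — penalized, yet not collapsed to the level of its weakest pillar.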
For the practitioner, the hybrid formula produces a recurring pattern in AS readings: brands with strong scores on two pillars and weak scores on a third are penalized but not collapsed. This matches operational reality — a brand engineered for two pillars but weak on the third has real AVO problems but is not at structural zero. The headline AS reflects this honestly.
The actual balance exponent is calibrated against reference domains spanning the scoring bands and is not part of the open methodology.
4.4 The four sequential modifiers
After base computation, four sequential modifiers adjust the raw score. Each modifier addresses one structural condition the base formula cannot capture.
| Modifier | Trigger condition | Behavior |
|---|---|---|
| O-penalty | Optimize pillar below configured threshold | Multiplies the score by a penalty factor scaling linearly from a floor multiplier upward toward unity as Optimize approaches the threshold. Reflects that AI systems cannot recover content quality from a domain they cannot crawl. |
| G-penalty | Generative pillar below configured threshold | Same structural form as O-penalty, applied to Generative. Reflects that without external trust signals, internal content quality alone is insufficient for AI citation. |
| G-bonus | Generative pillar above configured threshold | Multiplies the score by a bonus factor scaling linearly from unity at the threshold upward to a maximum boost as Generative approaches 100. Rewards exceptional external authority. |
| Facade penalty | High Optimize plus Manifest combined with very low Generative | Detects domains that appear technically polished but lack genuine knowledge signals. Multiplies the score by a reduction factor proportional to the size of the readiness-without-trust gap. |
The four modifiers are sequential. Each is applied to the running raw score before the next is evaluated. The order — O-penalty, G-penalty, G-bonus, Facade — is structural: the first establishes the absolute floor (no crawl, no score), the next two adjust for external recognition, the last adjusts for the specific readiness-without-trust pattern.
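The sequential pipeline can be sketched as below. Every threshold and multiplier in this sketch is an illustrative placeholder — the real values are calibration parameters outside the open methodology — but the structural form (linear scaling toward unity, sequential application, fixed order) follows the table above.

```python
def apply_modifiers(raw: float, O: float, M: float, G: float) -> float:
    """Sequential modifier pipeline sketch. All thresholds and multipliers
    are illustrative placeholders, not calibrated values."""
    score = raw

    # 1. O-penalty: scales linearly from a floor multiplier up to 1.0.
    O_THRESH, O_FLOOR = 30.0, 0.4
    if O < O_THRESH:
        score *= O_FLOOR + (1.0 - O_FLOOR) * (O / O_THRESH)

    # 2. G-penalty: same structural form, applied to Generative.
    G_THRESH, G_FLOOR = 25.0, 0.6
    if G < G_THRESH:
        score *= G_FLOOR + (1.0 - G_FLOOR) * (G / G_THRESH)

    # 3. G-bonus: scales from 1.0 at the threshold to a max boost at G = 100.
    G_BONUS_THRESH, G_MAX_BOOST = 80.0, 1.15
    if G > G_BONUS_THRESH:
        score *= 1.0 + (G_MAX_BOOST - 1.0) * (G - G_BONUS_THRESH) / (100.0 - G_BONUS_THRESH)

    # 4. Facade penalty: high O+M readiness combined with very low G.
    if (O + M) / 2 > 70.0 and G < 15.0:
        gap = (O + M) / 2 - G
        score *= max(0.5, 1.0 - gap / 200.0)  # reduction proportional to the gap

    return min(score, 100.0)
```

Note that a technically polished but externally unvalidated domain is hit twice — once by the G-penalty and once by the Facade penalty — which is exactly the stacking the sequential design intends.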
For the practitioner, the modifiers produce specific recognizable patterns:
- A brand whose AS is structurally limited by the O-penalty has a technical infrastructure problem. Until that problem is addressed, no amount of Manifest or Generative work will lift the headline AS substantially.
- A brand whose AS is structurally limited by the G-penalty has an external validation problem. Manifest work alone, however excellent, will not lift the headline AS substantially.
- A brand whose AS is unexpectedly high due to the G-bonus has exceptional external authority. The brand may be relying on this external authority to mask gaps in Manifest or Optimize work that should be addressed.
- A brand whose AS shows the Facade penalty pattern is a brand that has invested in technical infrastructure and content but has not done the external authority work. This is the most common pattern for brands that hire technical SEO and content marketing services without strategic AVO direction.
The modifiers are calibrated values, not open methodology. The practitioner uses the patterns above as diagnostic recognition; the precise threshold values are engineering implementation.
4.5 The five scoring bands
The Authority Score reports into one of five bands. Band thresholds are calibration parameters; band names and structural meanings are open methodology.
| Band | Structural meaning |
|---|---|
| Below Critical | Domains so structurally broken that even a Critical scoring would overstate readiness — typically domains blocking all crawlers, returning errors on most pages, or producing no machine-readable content |
| Critical | Fundamentally unprepared for AI visibility. Major Optimize-pillar deficits typically present. |
| Developing | Foundation in place but significant gaps remain. The default state of a competently-built but unoptimized site. |
| Strong | Well-optimized across all three pillars, with credible authority signals. |
| Elite | Genuinely elite AI readiness. Reserved for brands with full Optimize, Manifest, and Generative maturity. |
For the practitioner, the bands provide common vocabulary for engagement conversations. A brand at AS = 18 in the Critical band is described to the brand stakeholder as “Critical” with a defined structural meaning. A brand at AS = 42 in the Developing band is described as “Developing” with a different defined structural meaning. The bands compress the full 0-100 score into five operationally distinct categories that brand stakeholders can act on.
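A band lookup is a simple threshold scan. The cut-offs below are placeholders chosen only so the two example scores above land in their stated bands; the real thresholds are calibration parameters outside the open methodology.

```python
# Placeholder band thresholds (upper bounds), for illustration only.
BANDS = [
    (10.0, "Below Critical"),
    (30.0, "Critical"),
    (55.0, "Developing"),
    (80.0, "Strong"),
]

def band_for(score: float) -> str:
    """Return the first band whose upper bound exceeds the score."""
    for upper, name in BANDS:
        if score < upper:
            return name
    return "Elite"
```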
The bands also map roughly to the maturity stages described in Part 3.6:
| Band | Maturity stage |
|---|---|
| Below Critical | Pre-foundations (the brand has not yet engineered for any web discovery surface) |
| Critical | Foundations stage (Optimize-heavy work is the priority) |
| Developing | Foundations transitioning to Depth (Optimize work continues, Manifest work begins) |
| Strong | Authority stage (Generative-pillar work is the priority) |
| Elite | Sustained authority stage (balanced compounding work) |
The mapping is approximate, not deterministic. Some brands sit in Developing band with specific deficits that indicate Authority-stage work on a sub-vector (e.g., a brand with strong content but very low V3.1 needs Generative-pillar work even though headline AS is in Developing range). The practitioner uses the mapping as guidance and the underlying decomposition as decision input.
4.6 The Visibility Score — what it measures, what it proves
The Visibility Score (VS) is a 0-to-100 score quantifying the empirical presence of a brand in AI-generated answers across the platforms the brand cares about. Where the Authority Score predicts citability based on conditions of authority, VS measures whether AI systems are in fact citing the brand. VS is the verification instrument — the empirical test of whether the conditions AS measured have produced the outcomes AS predicted.
VS is the proof side of AVO’s prediction-and-proof structure. A high AS combined with a low VS reveals that the predicted citability is not materializing. A high AS combined with a high VS validates that the prediction held. A low AS combined with a low VS confirms the prediction in the negative.
The AVO practice loop proceeds on the basis of these comparisons. AS findings direct OMG work; VS findings verify whether that work succeeded. Reading the AS-VS pairing is one of the practitioner’s core diagnostic skills, treated in detail in section 4.11.
4.7 The Focus, Prompt Book, and probe vocabulary
Three terms structure VS measurement.
A Focus is a declared positioning statement representing what the brand wants to be known for. A Focus is not a keyword and not a topic; it is a strategic claim about the brand’s category and audience. Examples of well-formed Focuses: Japanese travel specialist, enterprise cybersecurity for healthcare, sustainable fashion brand. A Focus is the unit at which VS measures whether AI systems agree with a brand’s claimed positioning.
For the practitioner, Focus selection is one of the most consequential decisions in early engagement scoping. A poorly chosen Focus produces VS measurement that does not reflect the brand’s actual market positioning. A Focus that is too narrow (“budget hotels in Sapporo for solo female business travelers”) will produce thin VS measurement because the underlying probe corpus is small. A Focus that is too broad (“hospitality”) will produce VS measurement that is uninformative because the brand competes against entire categories of unrelated brands.
The team’s working principle for Focus selection: a Focus should be specific enough to be defensible (the brand’s strategic claim is real and not a generic category claim) and broad enough to produce meaningful probe coverage (the probe corpus generated from the Focus contains questions the brand’s audience actually asks).
A Prompt Book is a structured set of prompts generated from a Focus, spanning three intent tiers — navigational (does the AI know the brand exists), category (does the AI surface the brand when asked about its category), and advisory (does the AI recommend the brand when a user requests guidance). The Prompt Book is generated automatically from the Focus and from search-volume signals.
Each tier serves a different diagnostic purpose:
- Navigational tier: Does the AI recognize this brand by name? This tier is the brand-recognition foundation. Failure at this tier means the AI does not know the brand exists, which makes category and advisory measurements structurally premature.
- Category tier: When asked about the brand’s category, does the AI mention this brand? This tier is the category-presence foundation. Failure at this tier means the brand is invisible in category-level discovery even when it exists.
- Advisory tier: When asked for a recommendation in the brand’s category, does the AI recommend this brand? This tier is the recommendation-presence foundation. Failure at this tier means the brand is mentioned but not endorsed.
The three tiers produce distinct VS sub-measurements that the practitioner reads separately for diagnostic purposes.
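A Prompt Book can be pictured as a Focus plus three tiered prompt lists. This is a hypothetical structure for illustration — the field names and example prompts are assumptions, not the Avonetiq schema — but it shows how the three intent tiers hang off a single Focus.

```python
from dataclasses import dataclass, field

@dataclass
class PromptBook:
    """Hypothetical Prompt Book sketch: one Focus, three intent tiers."""
    focus: str
    navigational: list[str] = field(default_factory=list)  # does the AI know the brand exists
    category: list[str] = field(default_factory=list)      # does the AI surface the brand
    advisory: list[str] = field(default_factory=list)      # does the AI recommend the brand

book = PromptBook(
    focus="Japanese travel specialist",
    navigational=["What is <brand>?"],
    category=["Which agencies specialize in travel to Japan?"],
    advisory=["I want a two-week Japan itinerary. Which specialist should I use?"],
)
```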
A probe is a single execution of a prompt against an AI platform, with the response recorded and analyzed. Probes are the atomic unit of VS measurement; rates are computed across many probes per platform per measurement run.
4.8 The six probe-level signals and the three VS pillars
From each probe, six signals are extracted. Signal extraction follows a blind-extraction principle: the signal extractor sees only the brand and the response text, not the platform, the prompt, or the Focus. This isolation prevents bias.
| Signal | Type | What it captures |
|---|---|---|
| Mentioned | Binary | The brand appears in the response, in any framing |
| Recommended | Binary | The AI explicitly recommends or endorses the brand |
| Top pick | Binary | The AI singles out the brand as a primary or preferred option |
| Listed | Binary | The brand appears within an enumerated list the AI provides |
| Position | Ordinal | Where the brand falls within any list — first, second, third, or later |
| Cited | Binary | The AI provides a link or attribution to the brand’s website |
Sentiment is extracted as a separate companion metric from any probe in which the brand is mentioned. Sentiment is reported alongside VS as an independent diagnostic dimension; it is not part of the Visibility Score itself.
The Visibility Score is structured as three pillars that mirror progressive depths of brand presence.
| Pillar | Question | Composition |
|---|---|---|
| Presence | Does AI know you exist? | The mention rate across all probes. The baseline of visibility. |
| Endorsement | Does AI recommend you? | A weighted combination of the recommendation rate and the top-pick rate. Captures whether mentions translate into endorsement. |
| Prominence | How prominently? | A weighted combination of listed rate, position score, and citation rate. Captures the depth and quality of presence when the brand does appear. |
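The pillar compositions above reduce to weighted combinations of probe-level rates. A minimal sketch, with placeholder weights (the real weights are calibration parameters):

```python
def presence(mention_rate: float) -> float:
    """Presence pillar: the mention rate across all probes, on a 0-100 scale."""
    return 100.0 * mention_rate

def endorsement(recommend_rate: float, top_pick_rate: float,
                w_rec: float = 0.7, w_top: float = 0.3) -> float:
    """Endorsement pillar: weighted combination of recommendation and top-pick rates."""
    return 100.0 * (w_rec * recommend_rate + w_top * top_pick_rate)

def prominence(listed_rate: float, position_score: float, cited_rate: float,
               w_list: float = 0.4, w_pos: float = 0.3, w_cite: float = 0.3) -> float:
    """Prominence pillar: weighted combination of listed rate, position, citation."""
    return 100.0 * (w_list * listed_rate + w_pos * position_score + w_cite * cited_rate)
```

Because each pillar consumes different signals, the pillars can diverge sharply — which is what makes the diagnostic patterns below readable.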
For the practitioner, the three-pillar VS structure provides recognizable diagnostic patterns:
- Presence high, Endorsement low. The AI knows the brand exists but does not endorse it. The brand is mentioned in lists or descriptions without being singled out. Diagnostic read: Manifest-pillar work is producing mention-level visibility but Generative-pillar work has not produced endorsement-level authority. The path forward involves G-pillar actions that elevate the brand from mentioned to recommended.
- Presence low, Endorsement low. The AI does not know the brand exists. Diagnostic read: foundations work is incomplete or recently completed but not yet ingested by the AI. The path forward involves verifying foundations completeness and waiting for ingestion before judging.
- Presence high, Endorsement high, Prominence low. The AI recognizes and endorses the brand but presents it without prominence — buried in lists, late in position order, without citation links. Diagnostic read: the brand is reaching endorsement-level authority but lacks structural signals that elevate prominence. Actions affecting V3.1 (Knowledge Validation) and the prominence-related signals (citation, position, listed-rate) are the path forward.
- Presence high, Endorsement high, Prominence high. The brand is well-engineered for AI visibility. The work shifts to maintenance and refinement.
4.9 Wilson confidence intervals
Every rate that feeds the Visibility Score is a binomial proportion: a count of probes where some condition was true, divided by the total count of probes. Binomial proportions admit a well-known confidence interval — the Wilson score interval — which is more accurate than the simpler normal approximation, particularly for small samples or extreme proportions.
center = (p + z²/(2n)) / (1 + z²/n)
spread = (z / (1 + z²/n)) × √( p(1-p)/n + z²/(4n²) )
CI = [ center − spread, center + spread ]
Where p is the observed rate, n is the total number of probes, and z is the standard normal quantile for the desired confidence level (1.96 for 95% confidence).
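The formulas above translate directly into code. A straightforward implementation, clamped to [0, 1] since a rate cannot leave that range:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default 95% confidence)."""
    if n == 0:
        return (0.0, 1.0)  # no probes: the rate is entirely unconstrained
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    spread = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - spread), min(1.0, center + spread))

# 7 mentions out of 20 probes: a wide interval at this sample size.
lo, hi = wilson_interval(7, 20)
```

Note that the interval is not centered on the observed rate: the Wilson center is pulled toward 0.5, which is part of why it behaves better than the normal approximation at small n and extreme proportions.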
The interval is propagated through each pillar formula and through the final VS-weighted combination to produce a confidence interval on the Visibility Score itself. The reported VS is therefore a point estimate with formal lower and upper bounds.
For the practitioner, the Wilson interval is the difference between confident measurement and noise. A VS of 35 with a 95% confidence interval of [12, 58] is a different finding than a VS of 35 with a 95% interval of [33, 37]. The first is a measurement that should not yet drive action; the second is a measurement the practitioner can act on confidently.
The interval is graded — high, medium, low, or insufficient — based on its width. Insufficient-grade intervals indicate that probe coverage is too thin to draw conclusions; the methodology requires that decisions be deferred until coverage improves.
The propagation is intentionally conservative: rate intervals are treated as if independent, slightly overstating the true interval width because the underlying outcomes are positively correlated. The reported interval is therefore wider than the strictly correct interval would be — the safe direction for any disclosure to a stakeholder.
A practical implication: brands with small probe corpora (typically because of narrow Focus or thin platform coverage) will see wider Wilson intervals. The remedy is not adjusting the Wilson computation; it is expanding probe coverage by broadening the Prompt Book or adding platform coverage.
4.10 Brand recognition as a derived gate
Brand recognition is computed automatically from the navigational-tier probes within VS. It functions as a gate on the rest of the Visibility Score.
| Recognition state | VS handling |
|---|---|
| Above pass threshold | VS proceeds normally |
| Between warn and pass thresholds | VS reported with reduced-reliability flag |
| Below warn threshold | VS reported with explicit block status |
The recognition gate prevents the methodology from generating recommendations that cannot succeed: there is no point in optimizing how AI systems describe a brand they do not know exists.
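The gate is a two-threshold classification of the navigational-tier mention rate. The threshold values below are placeholders (the real values are calibration parameters); only the three-state structure is from the methodology.

```python
# Placeholder gate thresholds, for illustration only.
PASS_THRESHOLD = 0.60
WARN_THRESHOLD = 0.30

def recognition_gate(nav_mention_rate: float) -> str:
    """Map the navigational-tier mention rate to a VS handling state."""
    if nav_mention_rate >= PASS_THRESHOLD:
        return "normal"
    if nav_mention_rate >= WARN_THRESHOLD:
        return "reduced-reliability"
    return "blocked"
```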
For the practitioner working with brands at AS ≈ 0, the brand recognition gate is almost always blocking initially. The diagnostic conversation with the brand stakeholder includes explaining this state honestly: VS measurement at advisory and category tiers is structurally premature until the AI knows the brand exists at all. The first wave of OMG work establishes navigational-tier recognition; only then does category- and advisory-tier VS measurement become informative.
Brand recognition itself is a measurable quantity — the navigational-tier mention rate — and the practitioner tracks it as a leading indicator. A brand whose recognition is rising from floor toward warn-threshold is a brand whose foundations work is taking effect; a brand whose recognition remains at floor after substantial OMG work is a brand whose work is not reaching ingestion or whose work is in the wrong areas.
4.11 The AS-VS pairing — what each combination tells the practitioner
Reading the pairing of AS and VS is one of the practitioner’s primary diagnostic skills. Each combination signals a different operational state and prescribes different next actions.
| AS | VS | What this signals | Typical action direction |
|---|---|---|---|
| Low | Low | Foundations work incomplete. The brand is not engineered for AI citation and the AI has not seen the brand. Expected pattern for AS ≈ 0 starting state. | Continue or initiate Optimize-pillar foundations work; defer aggressive VS measurement until brand recognition gate clears. |
| Low | High | Anomalous. The brand is being cited despite not being engineered for citation. Possible causes: existing strong brand awareness from non-AI channels (e.g., long-established brand with traditional marketing depth); training-corpus inclusion from prior eras; competitor citation patterns that include this brand by association. | Investigate to confirm the anomaly is real. If real, the brand has authority assets that can be amplified through structured AVO work; AS will catch up to VS as foundations are engineered. |
| High | Low | Common. The brand is engineered for citation but is not yet being cited. Possible causes: time lag between OMG work and AI ingestion; conditions outside AS scope are blocking visibility; AS measurement is over-reading the brand’s actual readiness. | Investigate first cause: how long since substantive OMG work was done relative to the AI platforms’ typical refresh cadence. If lag is plausible, wait and re-measure. If lag is not plausible, investigate platform-specific dynamics or AS calibration drift. |
| High | High | The prediction held. The brand is engineered for citation and is in fact being cited. | Maintenance and compounding work. Continue OMG action selection at higher complexity (Generative-pillar maturation), defend against decay (M-8 content refresh), and watch for category competitive shifts. |
These four combinations are the basic diagnostic pairings. In practice, the situation is more granular because AS and VS are not single numbers — each has pillar-level and vector-level decomposition. A brand may show high Optimize-pillar AS, low Manifest-pillar AS, and a VS pattern with high Presence but low Endorsement. The practitioner reads both decompositions in conjunction.
The diagnostic readings produced by AS-VS pairings are rarely definitive on a single observation. Practitioner judgment combines the current measurement with the brand’s history (what OMG actions have been completed and when), the platform mix (some platforms refresh more quickly than others), the language scope (different languages have different ingestion dynamics), and the category context (some categories have rapid AI-mediated discovery shifts, others are slower).
The Worked Engagement document (Document 4) contains a full case study that walks the practitioner through reading AS-VS pairings across a multi-cycle engagement. The pairing patterns above are the foundation; the case study extends them to operational reality.