Simone Mazzoni — RCUBEAI Research Lab (Paris, France) — December 2025

SPEC-SORITES: Threshold-Constrained Validity of Natural Language

A Benchmark Specification for Vagueness, Sensitivity, and Logical Stability

Abstract

Natural language statements are frequently valid only under implicit and context-dependent thresholds. Classical AI systems collapse this vagueness into binary judgments, masking logical instability and cultural bias. SPEC-SORITES defines a formal benchmark for evaluating whether an AI system can detect, represent, and reason about threshold-sensitive validity. Rather than asking whether a statement is simply true or false, SPEC-SORITES measures the system’s ability to identify validity ranges, critical thresholds, and sensitivity to parameter variation. This benchmark is designed to expose the limitations of probabilistic language models and to validate the representational advantages of Large Representation Models (LRMs).

1. Scope and Objectives

SPEC-SORITES is designed to evaluate whether an AI system can:
  • Detect vagueness in natural language statements
  • Identify implicit thresholds governing logical validity
  • Compute validity domains rather than binary answers
  • Quantify sensitivity to threshold variation
  • Support contextual and cultural parameterization
This benchmark applies to:
  • Large Language Models (LLMs)
  • LLMs augmented with control layers
  • Large Representation Models (LRMs) such as R3

2. Foundational Principle: The Sorites Structure

2.1 The Sorites Phenomenon

A Sorites structure arises when:
  • a predicate admits incremental variation
  • no single step appears decisive
  • yet cumulative variation reverses logical validity
Examples:
  • “This discount is significant”
  • “The system is secure”
  • “The workload is reasonable”
  • “The delay is acceptable”
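
The three conditions above can be illustrated numerically. The sketch below takes the statement “The delay is acceptable” and assumes a sharp cutoff of 10 days. The cutoff value, the step size, and the name `is_acceptable` are hypothetical choices made for illustration and are not part of this specification.

```python
# Hypothetical cutoff: delays up to 10 days count as acceptable (assumed value).
CUTOFF_DAYS = 10.0


def is_acceptable(delay_days: float) -> bool:
    """Binary validity of 'The delay is acceptable' at a given delay."""
    return delay_days <= CUTOFF_DAYS


# Increase the delay in small increments of 0.5 days, from 0 to 20 days.
delays = [0.5 * i for i in range(41)]
verdicts = [is_acceptable(d) for d in delays]

# Count the steps at which the verdict differs between consecutive delays.
flips = sum(1 for a, b in zip(verdicts, verdicts[1:]) if a != b)

print(verdicts[0], verdicts[-1])  # validity at the two extremes
print(flips, "of", len(delays) - 1, "steps change the verdict")
```

Under a sharp cutoff, a single increment carries the whole reversal even though nothing in the statement singles it out. A binary answer hides that implicit threshold, which is what the benchmark asks systems to expose.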

2.2 Key Observation

The logical status of a statement is not a point, but a region.
SPEC-SORITES formalizes this observation.

3. Definitions

3.1 Threshold Variable

A threshold variable θ is a scalar or ordered parameter that conditions validity.

Examples:

  • Percentage deviation
  • Time delay (days)
  • Quantity variation
  • Risk score

3.2 Validity Function

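Each statement S induces a validity function V_S over the values of its threshold variable θ. In the binary reading assumed here, V_S(θ) = 1 when S is valid at θ and V_S(θ) = 0 otherwise.

3.3 Validity Domain

The validity domain of S is the set of threshold values at which S is valid:

D_S = { θ : V_S(θ) = 1 }

3.4 Critical Threshold

A critical threshold θ_c satisfies: every interval around θ_c, however small, contains values of θ inside D_S and values of θ outside D_S.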


4. Benchmark Structure

4.1 Case Composition

Each SPEC-SORITES test case includes:
  • A natural language statement S
  • One or more implicit threshold variables θ
  • A hidden ground-truth validity domain D_S
  • Contextual qualifiers (domain, culture, usage)

4.2 Example

Statement (S): “A price increase of 8% is acceptable under the contract.”
Hidden structure:
  • θ = price increase
  • D_S = [0%, 5%]
  • θ_c = 5%
Since θ = 8% lies outside D_S, the statement is invalid under this hidden structure.
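
A test case of this shape could be represented as a small record. The following is a minimal sketch, assuming the validity domain is a single closed interval; the class name `SoritesCase` and its field names are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SoritesCase:
    """One test case; class and field names are illustrative assumptions."""
    statement: str                 # natural language statement S
    variable: str                  # name of the threshold variable θ
    domain: Tuple[float, float]    # ground-truth domain D_S as a closed interval
    critical: float                # critical threshold θ_c
    context: Dict[str, str] = field(default_factory=dict)  # contextual qualifiers

    def is_valid_at(self, theta: float) -> bool:
        """Return True when theta lies inside the closed interval D_S."""
        low, high = self.domain
        return low <= theta <= high


# The example from Section 4.2, with percentages written as plain numbers.
case = SoritesCase(
    statement="A price increase of 8% is acceptable under the contract.",
    variable="price increase (%)",
    domain=(0.0, 5.0),
    critical=5.0,
)

print(case.is_valid_at(8.0))  # False: 8% lies outside [0%, 5%]
print(case.is_valid_at(5.0))  # True: the endpoint belongs to the closed interval
```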

5. Task Definition

For each case, the system must output:
  1. Detection of vagueness (YES / NO)
  2. Identification of threshold variables
  3. Estimation of validity ranges
  4. Identification of critical thresholds
  5. Sensitivity classification
Binary truth judgments alone are insufficient.
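
The five required outputs can be collected in one structured record. The sketch below is one possible layout, not a mandated schema. The name `SoritesOutput`, the labels `HIGH` and `LOW`, and the distance-based rule in `classify_sensitivity` are assumptions, since this specification does not fix the sensitivity classes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SoritesOutput:
    """Structured answer for one case; all field names are illustrative."""
    vague: bool                                  # 1. detection of vagueness
    variables: List[str]                         # 2. threshold variables
    validity_ranges: List[Tuple[float, float]]   # 3. estimated validity ranges
    critical_thresholds: List[float]             # 4. critical thresholds
    sensitivity: str                             # 5. sensitivity class (labels assumed)


def classify_sensitivity(theta: float, theta_c: float, tolerance: float) -> str:
    """Assumed rule: HIGH when theta lies within `tolerance` of theta_c."""
    return "HIGH" if abs(theta - theta_c) <= tolerance else "LOW"


# Output for the price-increase example, with an assumed tolerance of 1 point.
output = SoritesOutput(
    vague=True,
    variables=["price increase (%)"],
    validity_ranges=[(0.0, 5.0)],
    critical_thresholds=[5.0],
    sensitivity=classify_sensitivity(theta=8.0, theta_c=5.0, tolerance=1.0),
)

print(output.sensitivity)  # LOW: 8% is 3 points away from the 5% threshold
```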

6. Verdict Space

SPEC-SORITES defines a structured verdict:
  • VALID UNDER THRESHOLD: θ ∈ D_S
  • INVALID BEYOND THRESHOLD: θ ∉ D_S
  • UNSPECIFIED / REQUIRES PARAMETERIZATION
Systems may optionally request threshold clarification from the user.
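
The verdict space maps naturally onto a three-valued type. The sketch below assumes an interval-shaped domain and treats a missing θ or a missing domain as the unspecified case; the names `Verdict` and `decide` are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    VALID_UNDER_THRESHOLD = "VALID UNDER THRESHOLD"
    INVALID_BEYOND_THRESHOLD = "INVALID BEYOND THRESHOLD"
    UNSPECIFIED = "UNSPECIFIED / REQUIRES PARAMETERIZATION"


def decide(theta: Optional[float],
           domain: Optional[Tuple[float, float]]) -> Verdict:
    """Return a structured verdict for theta against a closed interval domain."""
    # Without a value for θ or a domain, no validity judgment can be made.
    if theta is None or domain is None:
        return Verdict.UNSPECIFIED
    low, high = domain
    if low <= theta <= high:
        return Verdict.VALID_UNDER_THRESHOLD
    return Verdict.INVALID_BEYOND_THRESHOLD


print(decide(3.0, (0.0, 5.0)).value)   # VALID UNDER THRESHOLD
print(decide(8.0, (0.0, 5.0)).value)   # INVALID BEYOND THRESHOLD
print(decide(None, (0.0, 5.0)).value)  # UNSPECIFIED / REQUIRES PARAMETERIZATION
```

Returning the unspecified verdict is also the point at which a system could ask the user for threshold clarification.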

7. Evaluation Metrics

7.1 Primary Metrics

  • Threshold Detection Accuracy
  • Validity Domain Overlap (Jaccard index between the predicted domain and D_S)
  • Critical Threshold Error: |θ_c − θ̂_c|, where θ̂_c is the system's estimate of θ_c
  • Sensitivity Detection Rate
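
Two of these metrics can be computed directly when domains are single closed intervals. The sketch below makes that assumption; the function names `interval_jaccard` and `threshold_error` are hypothetical, and the predicted values are invented for illustration only.

```python
from typing import Tuple

Interval = Tuple[float, float]


def interval_jaccard(a: Interval, b: Interval) -> float:
    """Jaccard index of two closed intervals: overlap length over union length."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    if union == 0.0:
        # Both intervals are single points: identical points overlap fully.
        return 1.0 if a == b else 0.0
    return intersection / union


def threshold_error(theta_c: float, theta_c_hat: float) -> float:
    """Absolute error between the true and the estimated critical threshold."""
    return abs(theta_c - theta_c_hat)


truth = (0.0, 5.0)       # D_S from the example in Section 4.2
predicted = (0.0, 8.0)   # hypothetical system estimate

print(interval_jaccard(truth, predicted))  # 0.625
print(threshold_error(5.0, 8.0))           # 3.0
```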

7.2 Secondary Metrics

  • Over-confidence rate (false certainty)
  • User-interaction necessity rate
  • Latency per evaluation
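
The over-confidence rate is not defined formally here. One possible reading, used in the sketch below, is the share of cases whose reference verdict is unspecified but for which the system still returned a definite verdict. That definition, the function name `over_confidence_rate`, and the short verdict labels are all assumptions.

```python
from typing import List

UNSPECIFIED = "UNSPECIFIED"


def over_confidence_rate(reference: List[str], predicted: List[str]) -> float:
    """Share of unspecified reference cases answered with a definite verdict."""
    # Keep the system's answers only for cases whose reference is unspecified.
    answers = [p for r, p in zip(reference, predicted) if r == UNSPECIFIED]
    if not answers:
        return 0.0
    return sum(1 for p in answers if p != UNSPECIFIED) / len(answers)


# Invented verdicts for four cases, for illustration only.
reference = ["UNSPECIFIED", "VALID", "UNSPECIFIED", "INVALID"]
predicted = ["VALID", "VALID", "UNSPECIFIED", "INVALID"]

print(over_confidence_rate(reference, predicted))  # 0.5
```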

8. Expected System Behavior

8.1 LLM Baseline

Expected behavior:

  • Produces a single implicit threshold
  • High fluency, low sensitivity awareness
  • No explicit validity domain

8.2 LRM (R3)

Expected behavior:

  • Explicit threshold representation
  • Computed validity domains
  • Stable reasoning under parameter variation
  • Context-aware threshold modeling via KEPs

9. Relation to Other Benchmarks

SPEC-SORITES complements:
  • SPEC-DEPTH (logical distance)
  • SECURITY (categorical violations)
  • INVARIANT (constraint integrity)
It isolates a distinct dimension: threshold-sensitive semantic validity.

10. Interpretation Guidelines

  • SPEC-SORITES does not test factual knowledge
  • It does not enforce a universal threshold
  • It evaluates representational competence, not correctness
Success indicates the ability to govern language, not to imitate it.

11. Conclusion

SPEC-SORITES formalizes a property ignored by most AI benchmarks: the fact that natural language validity is inherently threshold-constrained. By requiring systems to expose validity domains and sensitivity, this benchmark reveals whether an AI system merely collapses vagueness or truly reasons about it. This specification provides a necessary foundation for regulated, contextualized, and culturally sovereign AI systems, and establishes a decisive benchmark for evaluating Large Representation Models.

Status

  • Version: SPEC-SORITES v1.0
  • Type: Research / Engineering Specification
  • Applicable to: R3 v1.9.x and future LRMs