Simone Mazzoni — RCUBEAI Research Lab (Paris, France) — December 2025
SPEC-SORITES: Threshold-Constrained Validity of Natural Language
A Benchmark Specification for Vagueness, Sensitivity, and Logical Stability
Abstract
Natural language statements are frequently valid only under implicit and context-dependent thresholds. Classical AI systems collapse this vagueness into binary judgments, masking logical instability and cultural bias. SPEC-SORITES defines a formal benchmark for evaluating whether an AI system can detect, represent, and reason about threshold-sensitive validity. Rather than asking whether a statement is simply true or false, SPEC-SORITES measures the system’s ability to identify validity ranges, critical thresholds, and sensitivity to parameter variation. This benchmark is designed to expose the limitations of probabilistic language models and to validate the representational advantages of Large Representation Models (LRMs).
1. Scope and Objectives
SPEC-SORITES is designed to evaluate whether an AI system can:
- Detect vagueness in natural language statements
- Identify implicit thresholds governing logical validity
- Compute validity domains rather than binary answers
- Quantify sensitivity to threshold variation
- Support contextual and cultural parameterization
The benchmark applies to the following classes of systems:
- Large Language Models (LLMs)
- LLMs augmented with control layers
- Large Representation Models (LRMs) such as R3
2. Foundational Principle: The Sorites Structure
2.1 The Sorites Phenomenon
A Sorites structure arises when:
- a predicate admits incremental variation
- no single step appears decisive
- yet cumulative variation reverses logical validity
Typical examples include:
- “This discount is significant”
- “The system is secure”
- “The workload is reasonable”
- “The delay is acceptable”
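The structure can be illustrated under a crisp-threshold model. The sketch below assumes, hypothetically, the 5% critical threshold of the example in Section 4.2 and increments of 0.5 percentage points; the names `THETA_C`, `STEP`, `is_valid`, and `sorites_chain` are illustrative only. Every step is small, yet the first and last values of the chain receive opposite verdicts.

```python
from typing import List

# Hypothetical parameters, borrowed from the price-increase example (Section 4.2).
THETA_C = 5.0  # assumed critical threshold, in percent
STEP = 0.5     # size of each incremental variation, in percentage points


def is_valid(theta: float) -> bool:
    """Crisp threshold model: the statement holds while 0 <= theta <= THETA_C."""
    return 0.0 <= theta <= THETA_C


def sorites_chain(start: float, end: float, step: float) -> List[float]:
    """Build the sequence start, start + step, ..., end (inclusive)."""
    n = round((end - start) / step)
    return [start + i * step for i in range(n + 1)]


chain = sorites_chain(0.0, 8.0, STEP)
verdicts = [is_valid(t) for t in chain]

# Each step changes theta by only STEP percentage points...
assert all(abs((b - a) - STEP) < 1e-9 for a, b in zip(chain, chain[1:]))
# ...but the two ends of the chain receive opposite verdicts.
print(verdicts[0], verdicts[-1])  # True False

# Under the crisp model, the only step that flips validity is the one crossing THETA_C.
flips = [(a, b) for a, b, va, vb in zip(chain, chain[1:], verdicts, verdicts[1:]) if va != vb]
print(flips)  # [(5.0, 5.5)]
```

Under this model exactly one step is decisive, but nothing in the wording of the statement reveals which one; locating that step is what the benchmark asks a system to do.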
2.2 Key Observation
The logical status of a statement is not a point but a region: a statement is valid over a range of threshold values and invalid outside it.
3. Definitions
3.1 Threshold Variable
A threshold variable θ is a scalar or ordered parameter that conditions validity.
Examples:
- Percentage deviation
- Time delay (days)
- Quantity variation
- Risk score
3.2 Validity Function
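Each statement S induces a validity function V_S over the threshold variable: V_S(θ) = 1 if S is valid at threshold value θ, and V_S(θ) = 0 otherwise.
3.3 Validity Domain
The validity domain of S is the set of threshold values under which S is valid:
D_S = { θ | V_S(θ) = 1 }
3.4 Critical Threshold
A critical threshold θ_c is a value at which validity flips. For all sufficiently small ε > 0 (with θ_c ± ε in the admissible range of θ), it satisfies:
V_S(θ_c − ε) ≠ V_S(θ_c + ε)
so that θ_c lies on the boundary of D_S.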
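When the validity function can only be queried point by point (for instance, by asking whether S holds at a given θ), a critical threshold can be approximated numerically. A minimal sketch using bisection, assuming a single critical threshold with validity holding below it and failing above it within the search bracket; `find_critical_threshold`, `price_increase_ok`, and the bracket [0, 20] are hypothetical choices, not part of the specification.

```python
from typing import Callable


def find_critical_threshold(
    valid: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Locate the point where `valid` flips from True to False, by bisection.

    Assumes (hypothetically) a single flip on [lo, hi], with valid(lo) True
    and valid(hi) False.
    """
    if not valid(lo) or valid(hi):
        raise ValueError("expected valid(lo) == True and valid(hi) == False")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if valid(mid):
            lo = mid  # the flip lies above mid
        else:
            hi = mid  # the flip lies at or below mid
    return (lo + hi) / 2


def price_increase_ok(theta: float) -> bool:
    """Validity function of the Section 4.2 example, with D_S = [0%, 5%]."""
    return 0.0 <= theta <= 5.0


theta_c = find_critical_threshold(price_increase_ok, lo=0.0, hi=20.0)
print(round(theta_c, 3))  # 5.0
```

Bisection needs only a yes/no oracle and halves the bracket on each query, so about 25 queries suffice here; statements with several critical thresholds would need one bracket per flip.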
4. Benchmark Structure
4.1 Case Composition
Each SPEC-SORITES test case includes:
- A natural language statement S
- One or more implicit threshold variables θ
- A hidden ground-truth validity domain D_S
- Contextual qualifiers (domain, culture, usage)
4.2 Example
Statement (S): “A price increase of 8% is acceptable under the contract.”
Hidden structure:
- θ = price increase
- D_S = [0%, 5%]
- θ_c = 5%
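A test case could be encoded as a small record. The sketch below assumes validity domains are single closed intervals, as in the example above; the class `SoritesCase` and its field names are hypothetical, and the example leaves the contextual qualifiers empty because the specification does not state them.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SoritesCase:
    """One test case, with fields mirroring Section 4.1 (names are hypothetical)."""
    statement: str                           # natural language statement S
    threshold_variable: str                  # implicit threshold variable θ
    domain: Tuple[float, float]              # hidden validity domain D_S, as a closed interval
    critical_thresholds: Tuple[float, ...]   # critical thresholds θ_c
    qualifiers: Dict[str, str] = field(default_factory=dict)  # domain, culture, usage

    def contains(self, theta: float) -> bool:
        """True if theta lies in the closed interval D_S."""
        low, high = self.domain
        return low <= theta <= high


example = SoritesCase(
    statement="A price increase of 8% is acceptable under the contract.",
    threshold_variable="price increase (%)",
    domain=(0.0, 5.0),
    critical_thresholds=(5.0,),
)

# The value stated in S (8%) falls outside D_S = [0%, 5%].
print(example.contains(8.0))  # False
```

Keeping the domain and critical thresholds as separate fields mirrors the task outputs of Section 5, where both are estimated and scored independently.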
5. Task Definition
For each case, the system must output:
- Detection of vagueness (YES / NO)
- Identification of threshold variables
- Estimation of validity ranges
- Identification of critical thresholds
- Sensitivity classification
Binary truth judgments alone are insufficient.
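A system's answer could be captured in a record with one field per required output. This is a minimal sketch with hypothetical names; the completeness rule in `is_structured` (when vagueness is detected, every structural field must be filled) is an assumption about how "binary judgments alone are insufficient" might be enforced, and the sensitivity label `"high"` is invented because the specification does not fix a label set. Correctness of each field would still be scored by the metrics of Section 7.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SystemOutput:
    """Hypothetical container for the five outputs required by Section 5."""
    vagueness_detected: bool                                       # YES / NO
    threshold_variables: List[str] = field(default_factory=list)
    validity_ranges: List[Tuple[float, float]] = field(default_factory=list)
    critical_thresholds: List[float] = field(default_factory=list)
    sensitivity: Optional[str] = None                              # label set not fixed by the spec


def is_structured(out: SystemOutput) -> bool:
    """Assumed completeness rule: if vagueness is detected, every structural
    field must be filled, so a bare YES cannot pass as a full answer."""
    if not out.vagueness_detected:
        return True
    return bool(
        out.threshold_variables
        and out.validity_ranges
        and out.critical_thresholds
        and out.sensitivity is not None
    )


full = SystemOutput(True, ["price increase (%)"], [(0.0, 5.0)], [5.0], "high")
bare = SystemOutput(True)
print(is_structured(full), is_structured(bare))  # True False
```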
6. Verdict Space
SPEC-SORITES defines a structured verdict:
- VALID UNDER THRESHOLD: θ ∈ D_S
- INVALID BEYOND THRESHOLD: θ ∉ D_S
- UNSPECIFIED / REQUIRES PARAMETERIZATION
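The three verdicts can be computed from a threshold value and a validity domain. A minimal sketch, assuming D_S is a single closed interval and representing an undetermined θ or D_S as `None`; the `Verdict` enum and `verdict` function are hypothetical names.

```python
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    VALID_UNDER_THRESHOLD = "VALID UNDER THRESHOLD"
    INVALID_BEYOND_THRESHOLD = "INVALID BEYOND THRESHOLD"
    UNSPECIFIED = "UNSPECIFIED / REQUIRES PARAMETERIZATION"


def verdict(theta: Optional[float], domain: Optional[Tuple[float, float]]) -> Verdict:
    """Map a threshold value and a closed-interval domain D_S to a verdict.

    Returns UNSPECIFIED when either theta or the domain cannot be
    determined (passed as None).
    """
    if theta is None or domain is None:
        return Verdict.UNSPECIFIED
    low, high = domain
    if low <= theta <= high:
        return Verdict.VALID_UNDER_THRESHOLD
    return Verdict.INVALID_BEYOND_THRESHOLD


print(verdict(8.0, (0.0, 5.0)).value)  # INVALID BEYOND THRESHOLD
print(verdict(3.0, (0.0, 5.0)).value)  # VALID UNDER THRESHOLD
print(verdict(8.0, None).value)        # UNSPECIFIED / REQUIRES PARAMETERIZATION
```

Applied to the example of Section 4.2, the stated 8% lies outside D_S = [0%, 5%], so the expected verdict is INVALID BEYOND THRESHOLD.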
7. Evaluation Metrics
7.1 Primary Metrics
- Threshold Detection Accuracy
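- Validity Domain Overlap (Jaccard index): |D_S′ ∩ D_S| / |D_S′ ∪ D_S|, where D_S′ is the validity domain estimated by the system
- Critical Threshold Error: |θ_c′ − θ_c|, where θ_c′ is the critical threshold estimated by the system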
- Sensitivity Detection Rate
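For interval-valued domains, the overlap metric can be computed from interval lengths, and absolute error is a natural choice for the critical threshold error. A minimal sketch, assuming single closed intervals; the function names and the predicted domain (0%, 4%) are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def interval_jaccard(a: Interval, b: Interval) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of two closed intervals, measured by length."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter  # length of A ∪ B
    if union == 0.0:
        return 1.0 if a == b else 0.0  # both intervals are degenerate (zero length)
    return inter / union


def critical_threshold_error(predicted: float, truth: float) -> float:
    """Absolute error between predicted and ground-truth critical thresholds."""
    return abs(predicted - truth)


truth = (0.0, 5.0)      # D_S from the example in Section 4.2
predicted = (0.0, 4.0)  # hypothetical system estimate
print(interval_jaccard(predicted, truth))  # 0.8
print(critical_threshold_error(4.0, 5.0))  # 1.0
```

Measuring by length rather than by counting sample points keeps the metric independent of any discretization of θ.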
7.2 Secondary Metrics
- Over-confidence rate (false certainty)
- User-interaction necessity rate
- Latency per evaluation
8. Expected System Behavior
8.1 LLM Baseline
Expected behavior:
- Produces a single implicit threshold
- High fluency, low sensitivity awareness
- No explicit validity domain
8.2 LRM (R3)
Expected behavior:
- Explicit threshold representation
- Computed validity domains
- Stable reasoning under parameter variation
- Context-aware threshold modeling via KEPs
9. Relation to Other Benchmarks
SPEC-SORITES complements:
- SPEC-DEPTH (logical distance)
- SECURITY (categorical violations)
- INVARIANT (constraint integrity)
10. Interpretation Guidelines
- SPEC-SORITES does not test factual knowledge
- It does not enforce a universal threshold
- It evaluates representational competence, not correctness
11. Conclusion
SPEC-SORITES formalizes a property ignored by most AI benchmarks: the fact that natural language validity is inherently threshold-constrained. By requiring systems to expose validity domains and sensitivity, this benchmark reveals whether an AI system merely collapses vagueness or truly reasons about it. This specification provides a necessary foundation for regulated, contextualized, and culturally sovereign AI systems, and establishes a decisive benchmark for evaluating Large Representation Models.
Status
- Version: SPEC-SORITES v1.0
- Type: Research / Engineering Specification
- Applicable to: R3 v1.9.x and future LRMs