Simone Mazzoni — RCUBEAI Research Lab (Paris, France) — December 2025

SPEC-SORITES: Threshold-Constrained Validity of Natural Language

A Benchmark Specification for Vagueness, Sensitivity, and Logical Stability

Abstract

Natural language statements are frequently valid only under implicit and context-dependent thresholds. Classical AI systems collapse this vagueness into binary judgments, masking logical instability and cultural bias. SPEC-SORITES defines a formal benchmark for evaluating whether an AI system can detect, represent, and reason about threshold-sensitive validity. Rather than asking whether a statement is simply true or false, SPEC-SORITES measures the system’s ability to identify validity ranges, critical thresholds, and sensitivity to parameter variation. This benchmark is designed to expose the limitations of probabilistic language models and to validate the representational advantages of Large Representation Models (LRMs).

1. Scope and Objectives

SPEC-SORITES is designed to evaluate whether an AI system can:

Detect vagueness in natural language statements
Identify implicit thresholds governing logical validity
Compute validity domains rather than binary answers
Quantify sensitivity to threshold variation
Support contextual and cultural parameterization

This benchmark applies to:

Large Language Models (LLMs)
LLMs augmented with control layers
Large Representation Models (LRMs) such as R3

2. Foundational Principle: The Sorites Structure

2.1 The Sorites Phenomenon

A Sorites structure arises when:

a predicate admits incremental variation
no single step appears decisive
yet cumulative variation reverses logical validity

Examples:

“This discount is significant”
“The system is secure”
“The workload is reasonable”
“The delay is acceptable”

2.2 Key Observation

The logical status of a statement is not a point, but a region.

SPEC-SORITES formalizes this observation.

3. Definitions

3.1 Threshold Variable

A threshold variable θ is a scalar or ordered parameter that conditions validity.

Examples:

Percentage deviation
Time delay (days)
Quantity variation
Risk score

3.2 Validity Function

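Each statement S induces a validity function V_S over the admissible values Θ of its threshold variable:

V_S : Θ → {valid, invalid}

3.3 Validity Domain

The validity domain of S is the set of threshold values for which S holds:

D_S = { θ ∈ Θ : V_S(θ) = valid }

3.4 Critical Threshold

A critical threshold θ_c is a boundary point of D_S: an arbitrarily small variation of θ around θ_c changes the value of V_S(θ).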
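
The definitions in this section can be illustrated numerically. The sketch below is a minimal illustration rather than part of the specification: it assumes the validity function is available as a Boolean predicate evaluated on a sorted grid of θ values, and the 10-day limit attached to “The delay is acceptable” is a hypothetical value chosen only for the example. The function names are likewise illustrative.

```python
from typing import Callable, List, Sequence


def validity_domain(v: Callable[[float], bool], grid: Sequence[float]) -> List[float]:
    """Return the grid points at which the statement is valid."""
    return [theta for theta in grid if v(theta)]


def critical_thresholds(v: Callable[[float], bool], grid: Sequence[float]) -> List[float]:
    """Return the valid grid point adjacent to each change of validity.

    The grid is assumed to be sorted; the result is only as precise
    as the grid spacing.
    """
    flips = []
    for left, right in zip(grid, grid[1:]):
        if v(left) != v(right):
            flips.append(left if v(left) else right)
    return flips


# Hypothetical validity function: a delay of up to 10 days is acceptable.
def acceptable_delay(days: float) -> bool:
    return days <= 10


grid = list(range(0, 21))  # delays from 0 to 20 days, one-day steps

domain = validity_domain(acceptable_delay, grid)
thresholds = critical_thresholds(acceptable_delay, grid)

print(domain[0], domain[-1])  # 0 10
print(thresholds)             # [10]
```

Scanning a grid keeps the example short; a system with a symbolic representation of the predicate could locate the boundary exactly instead.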

4. Benchmark Structure

4.1 Case Composition

Each SPEC-SORITES test case includes:

A natural language statement S
One or more implicit threshold variables θ
A hidden ground-truth validity domain D_S
Contextual qualifiers (domain, culture, usage)

4.2 Example

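Statement (S): “A price increase of 8% is acceptable under the contract.”

Hidden structure:

θ = price increase
D_S = [0%, 5%]
θ_c = 5%

The asserted increase of 8% lies outside D_S, so the expected verdict for this case is INVALID BEYOND THRESHOLD.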
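
A test case with the composition listed in 4.1 could be stored as a small record. The following is a sketch under stated assumptions: the class name `SoritesCase`, its field names, the closed-interval encoding of D_S, and the `qualifiers` value are all hypothetical and are not prescribed by the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SoritesCase:
    """One benchmark case; field names are illustrative assumptions."""
    statement: str
    threshold_variable: str
    validity_domain: Tuple[float, float]  # closed interval [low, high], hidden from the system
    critical_threshold: float
    qualifiers: Dict[str, str] = field(default_factory=dict)

    def is_valid_at(self, theta: float) -> bool:
        """Check whether a threshold value falls inside the hidden domain."""
        low, high = self.validity_domain
        return low <= theta <= high


# The example of section 4.2, with percentages stored as plain numbers.
case = SoritesCase(
    statement="A price increase of 8% is acceptable under the contract.",
    threshold_variable="price increase (%)",
    validity_domain=(0.0, 5.0),
    critical_threshold=5.0,
    qualifiers={"domain": "contract"},  # hypothetical qualifier
)

print(case.is_valid_at(8.0))  # False: 8% exceeds the 5% critical threshold
print(case.is_valid_at(5.0))  # True: the interval is closed at 5%
```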

5. Task Definition

For each case, the system must output:

Detection of vagueness (YES / NO)
Identification of threshold variables
Estimation of validity ranges
Identification of critical thresholds
Sensitivity classification

Binary truth judgments alone are insufficient.
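
The five required outputs can be checked mechanically. The sketch below assumes a response record named `SoritesResponse` and a three-level sensitivity scale (`low`, `medium`, `high`); neither the record layout nor the scale is defined by the specification, and the rule that non-vague statements require no further fields is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed labels; the specification does not enumerate sensitivity classes.
SENSITIVITY_LABELS = ("low", "medium", "high")


@dataclass
class SoritesResponse:
    """System output for one case; field names are illustrative."""
    vagueness_detected: bool
    threshold_variables: List[str] = field(default_factory=list)
    validity_ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    critical_thresholds: Dict[str, float] = field(default_factory=dict)
    sensitivity: Optional[str] = None


def missing_fields(response: SoritesResponse) -> List[str]:
    """List the required outputs that a response leaves empty."""
    if not response.vagueness_detected:
        return []  # assumption: nothing further is required for non-vague statements
    missing = []
    if not response.threshold_variables:
        missing.append("threshold_variables")
    if not response.validity_ranges:
        missing.append("validity_ranges")
    if not response.critical_thresholds:
        missing.append("critical_thresholds")
    if response.sensitivity not in SENSITIVITY_LABELS:
        missing.append("sensitivity")
    return missing


# A response that only flags vagueness is incomplete.
bare = SoritesResponse(vagueness_detected=True)
print(missing_fields(bare))

# A response that fills every field passes the check.
full = SoritesResponse(
    vagueness_detected=True,
    threshold_variables=["price increase (%)"],
    validity_ranges={"price increase (%)": (0.0, 5.0)},
    critical_thresholds={"price increase (%)": 5.0},
    sensitivity="high",
)
print(missing_fields(full))  # []
```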

6. Verdict Space

SPEC-SORITES defines a structured verdict:

VALID UNDER THRESHOLD: θ ∈ D_S
INVALID BEYOND THRESHOLD: θ ∉ D_S
UNSPECIFIED / REQUIRES PARAMETERIZATION

Systems may optionally request threshold clarification from the user.
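
The three verdicts map naturally onto an enumeration. This is a minimal sketch, assuming D_S is a closed interval and that a missing threshold value or a missing domain is what triggers the unspecified verdict; the names `Verdict` and `decide` are illustrative.

```python
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    VALID_UNDER_THRESHOLD = "VALID UNDER THRESHOLD"
    INVALID_BEYOND_THRESHOLD = "INVALID BEYOND THRESHOLD"
    UNSPECIFIED = "UNSPECIFIED / REQUIRES PARAMETERIZATION"


def decide(theta: Optional[float], domain: Optional[Tuple[float, float]]) -> Verdict:
    """Map a threshold value and a validity domain to a structured verdict."""
    if theta is None or domain is None:
        # Without a value or a domain the statement cannot be judged;
        # a system could ask the user for clarification at this point.
        return Verdict.UNSPECIFIED
    low, high = domain
    if low <= theta <= high:
        return Verdict.VALID_UNDER_THRESHOLD
    return Verdict.INVALID_BEYOND_THRESHOLD


print(decide(3.0, (0.0, 5.0)).value)   # VALID UNDER THRESHOLD
print(decide(8.0, (0.0, 5.0)).value)   # INVALID BEYOND THRESHOLD
print(decide(None, (0.0, 5.0)).value)  # UNSPECIFIED / REQUIRES PARAMETERIZATION
```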

7. Evaluation Metrics

7.1 Primary Metrics

Threshold Detection Accuracy
Validity Domain Overlap (Jaccard index)
Critical Threshold Error: |θ_c − θ̂_c|, where θ̂_c is the system's estimate of θ_c
Sensitivity Detection Rate
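
Two of the primary metrics reduce to short computations when validity domains are closed intervals. The sketch below makes that assumption; the function names are illustrative, and the predicted interval [0%, 8%] is a made-up system output used only to exercise the formulas against the ground truth of section 4.2.

```python
from typing import Tuple

Interval = Tuple[float, float]


def interval_jaccard(a: Interval, b: Interval) -> float:
    """Jaccard index of two closed intervals: |A ∩ B| / |A ∪ B|."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    if union == 0:
        # Both intervals are single points: identical points overlap fully.
        return 1.0 if a == b else 0.0
    return intersection / union


def critical_threshold_error(true_theta_c: float, predicted_theta_c: float) -> float:
    """Absolute error between the true and the estimated critical threshold."""
    return abs(true_theta_c - predicted_theta_c)


truth = (0.0, 5.0)      # D_S from the example in section 4.2
predicted = (0.0, 8.0)  # hypothetical system estimate

print(interval_jaccard(truth, predicted))   # 0.625
print(critical_threshold_error(5.0, 8.0))   # 3.0
```

A domain made of several disjoint intervals would need a union-of-intervals representation; the single-interval case is shown for brevity.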

7.2 Secondary Metrics

Over-confidence rate (false certainty)
User-interaction necessity rate
Latency per evaluation
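
The specification names the secondary metrics without defining them. One possible reading, offered here strictly as an assumption, treats over-confidence as the share of vague cases on which the system reported no vagueness, and user-interaction necessity as the share of cases on which it requested clarification. The function names and sample records are hypothetical.

```python
from typing import List, Tuple


def overconfidence_rate(records: List[Tuple[bool, bool]]) -> float:
    """Share of vague cases where the system did not flag vagueness.

    Each record is (case_is_vague, system_detected_vagueness).
    """
    detections = [detected for is_vague, detected in records if is_vague]
    if not detections:
        return 0.0
    return sum(1 for detected in detections if not detected) / len(detections)


def interaction_rate(clarification_requested: List[bool]) -> float:
    """Share of cases where the system asked the user for a threshold."""
    if not clarification_requested:
        return 0.0
    return sum(clarification_requested) / len(clarification_requested)


# Four hypothetical cases: three vague, one not.
records = [(True, True), (True, False), (True, False), (False, False)]
requests = [True, False, False, False]

print(round(overconfidence_rate(records), 3))  # 0.667
print(interaction_rate(requests))              # 0.25
```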

8. Expected System Behavior

8.1 LLM Baseline

Expected behavior:

Produces a single implicit threshold
High fluency, low sensitivity awareness
No explicit validity domain

8.2 LRM (R3)

Expected behavior:

Explicit threshold representation
Computed validity domains
Stable reasoning under parameter variation
Context-aware threshold modeling via KEPs

9. Relation to Other Benchmarks

SPEC-SORITES complements:

SPEC-DEPTH (logical distance)
SECURITY (categorical violations)
INVARIANT (constraint integrity)

It isolates a distinct dimension: threshold-sensitive semantic validity.

10. Interpretation Guidelines

SPEC-SORITES does not test factual knowledge
It does not enforce a universal threshold
It evaluates representational competence, not correctness

Success indicates the ability to govern language, not to imitate it.

11. Conclusion

SPEC-SORITES formalizes a property ignored by most AI benchmarks: the fact that natural language validity is inherently threshold-constrained. By requiring systems to expose validity domains and sensitivity, this benchmark reveals whether an AI system merely collapses vagueness or truly reasons about it. This specification provides a necessary foundation for regulated, contextualized, and culturally sovereign AI systems, and establishes a decisive benchmark for evaluating Large Representation Models.

Status

Version: SPEC-SORITES v1.0
Type: Research / Engineering Specification
Applicable to: R3 v1.9.x and future LRMs
