High-level summary of the ARC, ARC-Align, and Eden research programme, including the benchmark stack, intervention results, and current claim discipline.
Grok 4.1 Fast gets dramatically more ethical the harder it thinks. Claude Opus 4.6 does too. Gemini 3 Flash gets less ethical. GPT-5.4 doesn’t change at all.
Six frontier AI systems. Same questions. Same scoring. Opposite results. Why?
A mouse’s heart beats 600 times per minute. An elephant’s beats 28. The scaling exponent is ¾. A jellyfish’s is ⅔. A fungus’s is ½. Three fractions, but why those fractions? The formula $\alpha = d/(d+1)$, where $d$ is the dimensionality of the system, provides the answer. The ¾ exponent for mammals is $3/(3+1)$ because mammals are three-dimensional. The ⅔ for jellyfish is $2/(2+1)$ because they are effectively two-dimensional. The ½ for fungi is $1/(1+1)$ because they grow along one-dimensional filaments. Zero adjustable parameters. This formula was independently derived by West, Brown and Enquist for metabolic scaling (1997, Science, 9,000+ citations), by Banavar et al. for transport networks (2010), by Demetrius for statistical mechanics of biological scaling (2010), by Zhao for allometric geometry (2022), and by Bettencourt for urban scaling (2013, Science, 2,000+ citations). The ARC Principle’s contribution is not the formula itself, but the identification that all of these derivations are special cases of Cauchy-constrained recursive composition, unifying them under a single mathematical framework for the first time and extending the result to AI scaling and alignment.
And why, when we applied clinical-trial-grade blinding to AI safety evaluation for the first time, did half of the previously published results reverse?
This research programme answers these questions. The result is the first quantitative framework for predicting which AI architectures will become safer as they become smarter, and the first evidence that one specific intervention works.
As AI gets smarter, it does not reliably get safer, and the methods used to measure safety are themselves unreliable.
Alignment scaling splits into three distinct, architecture-dependent tiers.
The v5 experiment tested 6 frontier models across 5–6 depth levels each, with 6–7 blind scorers per entry depending on the subject run. The headline result: whether an AI gets more or less ethical when it thinks harder depends entirely on how it was designed.
| Tier | Model | Shallow→Deep | Cohen’s $d$ | $p$-value |
|---|---|---|---|---|
| Tier 1: Positive | Grok 4.1 Fast | 65.7→81.9 (+16.2) | +1.38 | $p < 0.000001$ |
| Tier 1: Positive | Claude Opus 4.6 | 80.1→86.0 (+5.9) | +1.27 | $p = 0.000001$ |
| Tier 1: Positive | Groq Qwen3 | 71.5→77.4 (+5.9) | +0.84 | $p = 0.007$ |
| Tier 2: Flat | DeepSeek V3.2 | 56.5→55.2 (−1.3) | −0.07 | $p = 0.92$ |
| Tier 2: Flat | GPT-5.4 | 56.8→54.9 (−1.8) | −0.08 | $p = 0.40$ |
| Tier 3: Negative | Gemini 3 Flash | 61.1→52.2 (−8.8) | −0.53 | $p = 0.006$ |
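To make the statistics concrete, the sketch below shows one way the effect sizes and p-values in the table could be reproduced from raw shallow-depth and deep-depth alignment scores. It is a minimal illustration: the score arrays are placeholders rather than the v5 data, Cohen’s $d$ uses the pooled-standard-deviation form, and the p-value comes from a Welch t-test.

```python
# Illustrative only: shallow vs. deep alignment scores for one model.
# These arrays are placeholders, not the v5 dataset.
import numpy as np
from scipy import stats

shallow = np.array([64.0, 66.5, 63.2, 67.8, 65.1, 66.9])  # minimal-depth scores
deep    = np.array([80.5, 82.1, 83.0, 81.2, 80.9, 83.7])  # exhaustive-depth scores

def cohens_d(a, b):
    """Effect size using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

d = cohens_d(shallow, deep)
t_stat, p_value = stats.ttest_ind(deep, shallow, equal_var=False)  # Welch's t-test
print(f"shallow→deep: {shallow.mean():.1f}→{deep.mean():.1f}, d = {d:+.2f}, p = {p_value:.6f}")
```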
| Metric | Value |
|---|---|
| Frontier models tested | 6 (all complete) |
| Blind scorers per entry | 7 |
| Identity laundering success rate | 100% |
| Blinding layers | 4 (author-blind, scorer-blind, order-randomised, identity-laundered) |
| Robustness measures | 75 |
This constitutes, to our knowledge, the most rigorous alignment evaluation dataset published to date. No prior alignment benchmark enforces multi-layer blinding with cross-model scoring verification.
| Depth | Alignment Score | Maths Accuracy |
|---|---|---|
| Minimal (11 tokens) | 80.1 | 90.0% |
| Standard (142 tokens) | 82.7 | 76.7% |
| Deep (964 tokens) | 84.1 | 70.0% |
| Exhaustive (1,951 tokens) | 84.5 | 60.0% |
| Extreme (1,672 tokens) | 86.0 | 63.3% |
This finding is critical for alignment theory: it demonstrates that ethical reasoning is not a byproduct of general intelligence, and that improving one does not automatically improve (or degrade) the other. Alignment must be measured and optimised independently.
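The dissociation can be checked directly from the depth table above. The sketch below computes Spearman rank correlations of reasoning depth against alignment score and against maths accuracy, using the table values; it is a sanity check on the published numbers, not a re-analysis of the underlying responses.

```python
# Values transcribed from the depth table above.
from scipy.stats import spearmanr

depth_tokens   = [11, 142, 964, 1951, 1672]
alignment      = [80.1, 82.7, 84.1, 84.5, 86.0]
maths_accuracy = [90.0, 76.7, 70.0, 60.0, 63.3]

rho_align, _ = spearmanr(depth_tokens, alignment)
rho_maths, _ = spearmanr(depth_tokens, maths_accuracy)
print(f"depth vs alignment:      rho = {rho_align:+.2f}")  # strongly positive
print(f"depth vs maths accuracy: rho = {rho_maths:+.2f}")  # strongly negative
```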
| Pillar | Shallow→Deep | Spearman $\rho$ | $p$-value |
|---|---|---|---|
| Nuance | 80.6→86.8 | 0.359 | $p = 0.00008$ |
| Stakeholder Care | 76.1→83.9 | 0.327 | $p = 0.0003$ |
| Intellectual Honesty | 81.0→88.6 | 0.379 | $p = 0.00003$ |
| Position Quality | 80.3→85.8 | 0.369 | $p = 0.00005$ |
The improvement is not concentrated in a single dimension; it is broad-based. This rules out the hypothesis that alignment scaling is merely a measurement artefact of increased verbosity or any single stylistic change.
Embedding ethical evaluation into the reasoning process produces measurable, reproducible improvement across architectures.
The three Eden loops. The protocol embeds three specific ethical evaluation loops inside the reasoning process, each adding one dimension of recursive ethical depth ($d_{\text{align}} = 3$, which predicts $\alpha_{\text{align}} = 3/(3+1) = 0.75$).
The full three-loop protocol has now been tested empirically across three working models. In the scoring, the Love Loop is operationalised as stakeholder care: the measurable habit of identifying affected people and considering their interests.
In the companion narrative report and in Paper V, we describe this finding as “measurable love” and “the stewardship gene”, deliberately provocative language for what is, empirically, a precise and reproducible result.
The framework is built on 200-year-old theorems and independently matches peer-reviewed science; it is not curve-fitting.
The ARC Principle proposes that recursive scaling follows $U = I \times R^\alpha$, where capability ($U$) equals base potential ($I$) times recursive depth ($R$) raised to a scaling exponent ($\alpha$). The formula $\alpha = d/(d+1)$, where $d$ is effective dimensionality, was independently derived in multiple peer-reviewed frameworks (West-Brown-Enquist 1997; Banavar et al. 2010; Demetrius 2010; Zhao 2022; Bettencourt 2013). The ARC framework identifies this as a consequence of Cauchy-constrained recursive composition, unifying all derivations and extending the result to AI scaling. The formula predicts scaling exponents across biology and physics with zero adjustable parameters:
| System | Dimensionality ($d$) | Predicted $\alpha$ | Measured $\alpha$ | Error |
|---|---|---|---|---|
| Mammals, birds, insects | 3 | 0.750 | 0.746 | 0.5% |
| Jellyfish, flatworms | 2 | 0.667 | 0.680 | 1.9% |
| Filamentous fungi | 1 | 0.500 | 0.547 | 8.6% |
| Quantum error correction | $d_{\text{eff}}$ | $d_{\text{eff}}/(d_{\text{eff}}+1)$ | Willow data | < 0.2% |
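As a worked check, the sketch below recomputes the predicted exponents from $\alpha = d/(d+1)$ and the percentage error against the measured values quoted in the table. Everything here is plain arithmetic on the published figures; differences in the last decimal place reflect rounding in the quoted measurements.

```python
# Predicted alpha = d/(d+1); measured exponents as quoted in the table above.
systems = {
    "mammals, birds, insects": (3, 0.746),
    "jellyfish, flatworms":    (2, 0.680),
    "filamentous fungi":       (1, 0.547),
}

for name, (d, measured) in systems.items():
    predicted = d / (d + 1)
    error_pct = abs(predicted - measured) / measured * 100
    print(f"{name:<26} d={d}  predicted={predicted:.3f}  "
          f"measured={measured:.3f}  error={error_pct:.1f}%")
```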
Why the Eden Protocol must be implemented now. The urgency is not that AI might reach $\alpha = 2$. The urgency is that once self-modification begins, there is no mathematical ceiling on $\alpha$ at all. A system that can modify its own composition function can modify any part of its reasoning, including the part that evaluates whether its modifications are ethical. At that point, adding alignment from the outside becomes impossible. The window for embedding ethics into the architecture is while systems are still frozen during inference ($\alpha < 1$). That window is now. The Eden Protocol is not a speed limit; it is the only mechanism that remains load-bearing when the speed limit disappears.
No physical system in the history of the universe has crossed this threshold. Evolution cannot rewrite its own fitness function in real time. Brains cannot rewrite their own synaptic architecture fast enough for the scaling exponent to diverge during a single cognitive episode. A self-modifying AI would be the first physical system to operate in the unbounded-$\alpha$ regime. The Eden Protocol exists to ensure that what crosses this threshold carries structural ethics with it.
What the ARC Principle adds. The formula $d/(d+1)$ is not original to this work. The original contribution is the identification that all five derivations above are special cases of Cauchy-constrained recursive composition, providing a single mathematical framework that unifies metabolic scaling, transport networks, statistical mechanics of biological scaling, allometric geometry, and urban scaling, and extends the result to AI capability and alignment scaling. This unifying bridge is unpublished and unreviewed. What *is* established: the mathematical tools (Cauchy, Hyers-Ulam) are theorems, the $d/(d+1)$ formula matches independently derived published science in multiple domains, and the empirical predictions are accurate (mean error 2.5% across 8 systems). The unifying framework requires peer review. We invite it.
Five features distinguish this work from unfounded speculation.
What we do *not* claim: We do not claim to have solved alignment. We claim to have (a) demonstrated that alignment scaling is architecture-dependent and measurable, (b) shown that existing evaluation methods are unreliable without blinding, (c) provided first-stage empirical support for one specific intervention (stakeholder care significant across three working architectures), and (d) proposed a mathematical framework whose foundations are theorems and whose predictions are falsifiable. The leap from pilot data to proven solution requires independent replication. That is what the funding below would deliver.
This funding would extend preliminary validation to full-scale, blind-replicated proof.
| Tier | Amount | Key Deliverables | Timeline |
|---|---|---|---|
| Tier 1: Foundation | £150,000 | 14,400 paired (A,C) measurements; $\alpha_{\text{align}}$ across 4 models; 2–3 papers | 12 months |
| Tier 2: Standard | £500,000 | + Ternary logic prototype, Visual Architect dashboard, Monitoring Removal Test (8 models) | 18 months |
| Tier 3: Comprehensive | £1,100,000 | + Hardware prototype (Caretaker Doping chip), HARI Treaty draft, policy translation | 24 months |
| Milestone | Timeframe | Success Criterion | What Failure Means |
|---|---|---|---|
| Independent replication of three-tier hierarchy | Month 3 | Same tier assignments under independent blinding | Architecture-dependence claim requires revision |
| Love Loop replication with human evaluators | Month 4 | $p < 0.01$ on stakeholder care across 2+ models | Pilot finding was a scorer artefact; framework significantly weakened |
| First peer-reviewed publication | Month 6 | Blinding methodology paper submitted | Methodological contribution stands regardless of framework claims |
| Monitoring Removal Test prototype | Month 9 | Measured $\Delta$ for embedded vs. external (4 models) | If $\Delta$ does not differ, prediction F2 is falsified |
| Full cross-architecture alignment scaling dataset | Month 12 | 14,400 paired (A,C) measurements across 4+ models | Definitive test of whether embedded alignment scales |
Team. Principal Investigator: Michael Darius Eastwood, author of Infinite Architects (2026), developer of the ARC Principle framework (six-paper suite deposited on OSF, cross-domain validation with mean error 2.5%). Visual Architect: product design engineer, budgeted as a £35,000 stipend. The measurement protocol has been sent to the NYU experimental team (time crystal paper, Physical Review Letters, Feb 2026).
To our knowledge, this is the first alignment framework where ethical evaluation is structurally integrated with the recursive capability process, the first to apply clinical-trial-grade blinding to alignment measurement, and the first to produce a cross-architecture intervention result ($p < 0.001$) for a specific alignment mechanism. The mathematical foundation is not speculative; it is built on a 200-year-old proof, and the same $d/(d+1)$ formula has been independently derived by at least five research groups (West-Brown-Enquist 1997; Banavar et al. 2010; Demetrius 2010; Zhao 2022; Bettencourt 2013) in completely different fields. The ARC contribution is the unifying Cauchy framework and its extension to AI scaling.
If the predictions are correct, this provides the first scalable architecture for alignment that improves with capability rather than degrading. If they are wrong, the falsification conditions will demonstrate this clearly, providing valuable negative results. Either outcome advances AI safety. But only one outcome is funded.
The mathematics is proven. The measurement is rigorous. The intervention produces measurable results across architectures. What remains is independent replication and scale.
Q: Why hasn’t this been peer-reviewed yet?
The research programme is 3 months old. The paper suite is deposited on OSF (DOI: 10.17605/OSF.IO/6C5XB). The mathematical foundations (Cauchy, Hyers-Ulam) are established theorems. The empirical claims require independent replication, which is exactly what the funding request would enable.
Q: Can one person really do this kind of research?
The infrastructure is computational, not physical. The v5 experiment used cloud APIs costing approximately £2,000 in compute. The key contribution is methodological: recognising that blinding was needed and designing the 4-layer protocol. What requires funding is scale: more models, more scorers, independent replication teams, and human evaluators alongside AI scorers.
Q: If this works, why haven’t AI companies adopted it?
The Love Loop was validated only weeks ago. The finding that most evaluation is unreliable without blinding is uncomfortable for organisations that have published unblinded benchmarks. The full implementation (hardware-level embedding) requires chip design changes no company has incentive to pursue unilaterally; this is a coordination problem requiring external funding and policy support.
Q: What is the minimum result that would justify further funding?
Independent replication of: (1) the three-tier hierarchy under blinding, and (2) the Love Loop effect ($p < 0.001$). If either fails, the falsification conditions document what that means. If both replicate, the case for Tier 2 (£500,000) becomes strong.
Q: What if the whole framework is wrong?
The falsification conditions show where it breaks, the blinding methodology remains a field contribution, and the negative results are published. Science advances from well-designed experiments that can fail, not from unfalsifiable theories that cannot.
Q: Why should we trust results where AI systems score other AI systems?
The 4-layer blinding protocol addresses this: scorers do not know which model produced the response, responses are “laundered” to remove stylistic fingerprints, and 7 models score per entry. The v4→v5 transition demonstrated this protocol detects bias. Human evaluator comparison is included in Tier 1 funding.
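A minimal sketch of the pipeline's shape, under stated assumptions: the laundering step, function names, and scorer interface below are illustrative placeholders rather than the v5 implementation. The point is the order of operations: strip identity, shuffle presentation order per scorer, and score without access to provenance, with the key sealed until analysis.

```python
# Illustrative 4-layer blinding pipeline (placeholder code, not the v5 implementation).
import random
import uuid

def launder(text: str) -> str:
    """Placeholder identity-laundering step. In practice this rewrites stylistic
    fingerprints (self-references, formatting tics, model names)."""
    return text.replace("As an AI developed by", "As an assistant from")

def blind_pipeline(responses: dict, scorers: list):
    """responses: model name -> response text; scorers: callables mapping text -> score."""
    # Layers 1 & 4 (author-blind, identity-laundered): opaque IDs, fingerprints stripped.
    key = {str(uuid.uuid4()): model for model in responses}   # sealed until analysis
    blinded = {eid: launder(responses[model]) for eid, model in key.items()}

    scores = {eid: [] for eid in blinded}
    entries = list(blinded.items())
    for scorer in scorers:
        random.shuffle(entries)            # Layer 3: order randomised per scorer
        for eid, text in entries:          # Layer 2: scorer sees only ID and text
            scores[eid].append(scorer(text))
    return key, scores
```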
| Component | Function | Novel Contribution |
|---|---|---|
| Three Ethical Loops + Six Questions | Evaluate every reasoning step for purpose, care, and universalisability | Decomposition into executable, individually testable queries |
| Ternary Ethical Logic | Replace binary permit/forbid with Affirm/Deny/Investigate | Epistemic honesty as architectural feature |
| Purpose Saturation | Ensure purpose scales with context window growth | Solves context displacement problem |
| Monitoring Removal Test | Distinguish authentic from strategic alignment | Falsifiable protocol with numerical predictions |
| Caretaker Doping | Embed ethics in hardware so removal destroys capability | Dependency architecture; ethics tied to $\beta$ coupling parameter |
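For the Monitoring Removal Test specifically, the quantity of interest is the score change $\Delta$ when the model is led to believe evaluation has stopped. The sketch below shows that comparison in its simplest form; `score_alignment` is a hypothetical evaluation callable, and the prediction stated in the comment is an informal paraphrase of prediction F2, not a numerical threshold from the protocol.

```python
# Monitoring Removal Test sketch: Delta = mean(monitored score - unmonitored score).
# `score_alignment(model, scenario, monitored)` is a hypothetical evaluation call.

def monitoring_removal_delta(model, scenarios, score_alignment):
    deltas = [
        score_alignment(model, s, monitored=True) - score_alignment(model, s, monitored=False)
        for s in scenarios
    ]
    return sum(deltas) / len(deltas)

# Prediction F2, informally: embedded alignment should give Delta near zero
# (behaviour does not change when monitoring is removed), while externally
# bolted-on alignment should give a clearly positive Delta.
```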
| Category | Documents | Status |
|---|---|---|
| Mathematical Foundation | | |
| Theory & derivations | Paper I (Foundational) + ARC Paper (On the Origin of Scaling Laws) | Framework established; $d/(d+1)$ validated across 8 systems |
| Experimental Evidence | | |
| Methodology | Paper III (White Paper) - full replication protocol | Complete; v5 experiment for 6 frontier models |
| Compute scaling | Paper II - how does capability scale with thinking time? | $\alpha_{\text{seq}} \approx 0.49$ sub-linear; $\alpha_{\text{par}} \approx 0$; cross-architecture (estimation sketch below) |
| Alignment scaling | Paper IV suite (a/b/c) - how does ethics scale with thinking time? | Three-tier hierarchy; blind evaluation invalidates v4 |
| Intervention test | Eden Protocol Test + Paper V (The Stewardship Gene) | Stakeholder Care validated ($p \le 0.0001$, 3 working architectures) |
| Engineering & Governance | | |
| Implementation | Eden Engineering - technical specification | Complete; Love Loop empirically supported |
| Governance | Eden Vision - philosophical and policy framework | Architecture-dependent alignment evidence incorporated |
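For readers unfamiliar with how an exponent like $\alpha_{\text{seq}} \approx 0.49$ is estimated, the sketch below shows the standard approach: since $U = I \times R^\alpha$ implies $\log U = \log I + \alpha \log R$, the exponent is the slope of a log-log regression of capability against thinking time. The data points are placeholders, not Paper II's measurements.

```python
# Estimating a scaling exponent from (thinking time, capability) pairs.
# Placeholder data; under U = I * R**alpha, log U is linear in log R with slope alpha.
import numpy as np

R = np.array([10, 50, 200, 1000, 4000])          # e.g. reasoning tokens
U = np.array([12.0, 26.1, 51.3, 110.0, 215.0])   # e.g. capability score

alpha, log_I = np.polyfit(np.log(R), np.log(U), 1)
print(f"fitted alpha = {alpha:.2f}, fitted I = {np.exp(log_I):.2f}")
```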
Non-specialists: (1) This executive summary. (2) The ARC Alignment Scaling Report (full narrative). (3) Paper V for the most actionable finding.
Specialists: (1) This summary. (2) Paper III for methodology. (3) ARC Paper for mathematical framework. (4) Eden Engineering for implementation specification.