The Proof of Intelligence. A decentralized adversarial evaluation protocol on Bittensor powered by LiveBench.



# OpenArena: The Truth Machine for AI
## The Problem: Benchmark Saturation
Static benchmarks such as GSM8K and MMLU are saturated. Frontier models
score 90%+ on them, which can reflect memorized test sets rather than
reasoning, and the same models still fail on novel problems. A high score
no longer distinguishes a model that remembers from a model that reasons.
## The Solution: Dynamic Adversarial Evaluation
OpenArena is a decentralized Bittensor subnet where:
1. Validators pull fresh, contamination-free tasks from LiveBench
(a continuously updated benchmark whose newest questions stay private
until a delayed release, so they cannot be memorized from public data).
2. Miners solve tasks under a cryptographic Commit-Reveal scheme
(prevents front-running and answer copying).
3. Scoring uses the Generalization Score:
S = (Accuracy × Calibration) − Latency
Brier scoring penalizes confident wrong answers (hallucination) and rewards calibrated confidence.
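
As a rough illustration of the Generalization Score above, the sketch below computes S for a batch of responses. The source gives only the top-level formula, so several details here are assumptions: Calibration is taken as 1 minus the mean Brier score, the Latency term is a capped, weighted fraction of a time budget, and the names `Response`, `latency_weight`, and `max_latency_s` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Response:
    correct: bool       # whether the revealed answer matched the reference
    confidence: float   # miner's stated probability of being correct, in [0, 1]
    latency_s: float    # wall-clock seconds taken to respond


def brier_score(responses):
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    return sum((r.confidence - float(r.correct)) ** 2 for r in responses) / len(responses)


def generalization_score(responses, latency_weight=0.1, max_latency_s=30.0):
    """S = (Accuracy x Calibration) - Latency, under the assumptions stated in the lead-in."""
    accuracy = sum(r.correct for r in responses) / len(responses)
    # Assumption: calibration is 1 - Brier score, so perfect calibration gives 1.0.
    calibration = 1.0 - brier_score(responses)
    mean_latency = sum(r.latency_s for r in responses) / len(responses)
    # Assumption: latency is normalized against a time budget, capped at 1, then weighted.
    latency_penalty = latency_weight * min(mean_latency / max_latency_s, 1.0)
    return accuracy * calibration - latency_penalty


# Two hypothetical miners with the same accuracy (one right, one wrong) and the same latency.
calibrated = [Response(True, 0.9, 3.0), Response(False, 0.2, 3.0)]
overconfident = [Response(True, 1.0, 3.0), Response(False, 1.0, 3.0)]

print(generalization_score(calibrated), generalization_score(overconfident))
```

Under these assumptions, two miners with identical accuracy are separated by calibration alone: the one claiming full confidence on a wrong answer receives the lower score.
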
## The Unfair Advantage: KaggleIngest
Most subnets struggle at cold start because they launch without skilled
miners. We address this with KaggleIngest, which bridges 15M+ Kaggle data
scientists directly into Bittensor.
- !pip install openarena-kaggle — one-line onboarding
- Web2-clean leaderboard UI — no wallet required to compete
- Cold start solved: instant liquidity of intelligence
## Architecture
- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)
- Entropy Source: LiveBench-2026-01-08 (private delayed questions)
- Scoring: Brier Score decomposition (accuracy + calibration)
- Frontend: Next.js with live generalization leaderboard
- Security: SHA-256 commit hashes prevent plagiarism
## What We Built
- Whitepaper: Formalized "Proof of Intelligence" game theory and
Generalization Score formula (S = Accuracy × Calibration − Latency).
- Commit-Reveal: Implemented cryptographic anti-plagiarism scheme
in openarena/utils/crypto.py.
- Validator Loop: Built LiveBench task dispatcher with epoch-based cadence.
- Miner Loop: Built LLM inference agent with commit → reveal flow.
- Simulation: demo.py shows honest miners winning and copycat miners being slashed.
- Frontend: Next.js brutalist dashboard with live mock leaderboard and
Mermaid architecture diagram at openarena.kaggleingest.com.
- PROPOSAL.md: Full Ridges-template subnet design proposal in repo root.
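
The commit → reveal flow listed above can be sketched with Python's standard `hashlib`. This is not the contents of openarena/utils/crypto.py, which is not shown here; the function names, the random salt, and the choice to bind the miner identity into the hash are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def commit(answer: str, miner_id: str, salt: bytes) -> str:
    """Return the SHA-256 digest a miner would publish before the reveal phase."""
    # Assumption: the miner identity is hashed together with the salt and answer.
    payload = salt + b"|" + miner_id.encode() + b"|" + answer.encode()
    return hashlib.sha256(payload).hexdigest()


def verify(commitment: str, answer: str, miner_id: str, salt: bytes) -> bool:
    """Recompute the digest from the revealed values and compare in constant time."""
    return hmac.compare_digest(commitment, commit(answer, miner_id, salt))


# Commit phase: the honest miner publishes only the digest, not the answer.
salt = secrets.token_bytes(16)
honest_commitment = commit("42", "miner-A", salt)

# Reveal phase: the honest miner discloses the answer and salt, and the check passes.
honest_ok = verify(honest_commitment, "42", "miner-A", salt)

# A copycat replaying miner-A's commitment cannot open it under its own identity.
copycat_ok = verify(honest_commitment, "42", "miner-B", salt)

print(honest_ok, copycat_ok)
```

Because the digest hides the answer until the reveal, a miner cannot read another miner's answer during the commit phase. Binding the identity into the hash, as assumed here, also makes a replayed commitment fail verification.
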
## Funding
Not funded. Bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.