# OpenArena: The Truth Machine for AI

## The Problem: Benchmark Saturation

Static benchmarks (GSM8K, MMLU) are saturated. Frontier models score 90%+ on them, in part by memorizing leaked test sets, yet fail on novel problems. The industry cannot distinguish a model that remembers from a model that reasons.

## The Solution: Dynamic Adversarial Evaluation

OpenArena is a decentralized Bittensor subnet where:

1. Validators pull fresh, contamination-free tasks from LiveBench (a continuously updated benchmark whose newest questions stay private until a delayed release, so they cannot already be in a model's training data).

2. Miners solve tasks under a cryptographic Commit-Reveal scheme (prevents front-running and answer copying).

3. Scoring uses the Generalization Score:

   S = (Accuracy × Calibration) − Latency

   Brier scoring penalizes confident wrong answers (hallucination) and rewards calibrated confidence.
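
The Generalization Score leaves its units unspecified, so the following is only a minimal sketch under stated assumptions: Calibration is taken as 1 minus the mean Brier score, and Latency is seconds multiplied by a hypothetical `latency_weight`. The names `Answer`, `brier_score`, and `generalization_score` are illustrative and are not taken from the OpenArena codebase.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Answer:
    correct: bool      # did the revealed answer match the reference?
    confidence: float  # miner's stated probability of being correct, in [0, 1]


def brier_score(answers: Sequence[Answer]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    if not answers:
        raise ValueError("at least one answer is required")
    return sum((a.confidence - float(a.correct)) ** 2 for a in answers) / len(answers)


def generalization_score(
    answers: Sequence[Answer],
    latency_s: float,
    latency_weight: float = 0.01,  # assumption: penalty per second, not from the source
) -> float:
    """S = Accuracy * Calibration - Latency, with the assumed definitions above."""
    accuracy = sum(a.correct for a in answers) / len(answers)
    calibration = 1.0 - brier_score(answers)  # assumption: 1 - Brier, so higher is better
    return accuracy * calibration - latency_weight * latency_s


calibrated = [Answer(True, 0.9), Answer(False, 0.2)]
overconfident = [Answer(True, 0.9), Answer(False, 1.0)]
print(generalization_score(calibrated, latency_s=2.0))
print(generalization_score(overconfident, latency_s=2.0))
```

Under these assumptions, two miners with the same accuracy and latency are separated only by calibration: the one that states full confidence in its wrong answer receives the lower score.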

## The Unfair Advantage: KaggleIngest

Most subnets fail at cold start because they have no skilled miners. We solve this with KaggleIngest, which bridges 15M+ Kaggle data scientists directly into Bittensor.

- `!pip install openarena-kaggle`: one-line onboarding

- Web2-clean leaderboard UI — no wallet required to compete

- Cold start solved: instant liquidity of intelligence

## Architecture

- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)

- Entropy Source: LiveBench-2026-01-08 (private delayed questions)

- Scoring: Brier Score decomposition (accuracy + calibration)

- Frontend: Next.js with live generalization leaderboard

- Security: SHA-256 commit hashes prevent plagiarism

- Whitepaper: Formalized "Proof of Intelligence" game theory and Generalization Score formula (S = Accuracy × Calibration − Latency).

- Commit-Reveal: Implemented cryptographic anti-plagiarism scheme in openarena/utils/crypto.py.

- Validator Loop: Built LiveBench task dispatcher with epoch-based cadence.

- Miner Loop: Built LLM inference agent with commit → reveal flow.

- Simulation: demo.py simulates the incentive game and shows honest miners winning while copycat miners are slashed.

- Frontend: Next.js brutalist dashboard with live mock leaderboard and Mermaid architecture diagram at openarena.kaggleingest.com.

- PROPOSAL.md: Full Ridges-template subnet design proposal in repo root.
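
The commit → reveal flow listed above can be sketched with Python's standard `hashlib`. This illustrates the general technique rather than the contents of openarena/utils/crypto.py: the function names, the nonce length, and the `miner_id:nonce:answer` preimage layout are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def _digest(miner_id: str, nonce: str, answer: str) -> str:
    # Assumed preimage layout; the real scheme may serialize fields differently.
    preimage = f"{miner_id}:{nonce}:{answer}".encode("utf-8")
    return hashlib.sha256(preimage).hexdigest()


def make_commitment(answer: str, miner_id: str) -> tuple[str, str]:
    """Commit phase: publish the hash now, keep the nonce and answer secret."""
    nonce = secrets.token_hex(16)  # random salt so short answers cannot be brute-forced
    return _digest(miner_id, nonce, answer), nonce


def verify_reveal(commitment: str, answer: str, nonce: str, miner_id: str) -> bool:
    """Reveal phase: recompute the hash and compare it in constant time."""
    return hmac.compare_digest(_digest(miner_id, nonce, answer), commitment)


commitment, nonce = make_commitment("42", miner_id="miner-a")
print(verify_reveal(commitment, "42", nonce, "miner-a"))  # honest reveal
print(verify_reveal(commitment, "42", nonce, "miner-b"))  # copied commitment, different miner
```

Binding the miner's identity into the hash is one way to make a copied commitment useless: a copycat who replays another miner's hash cannot produce a reveal that verifies under its own identity.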

Not funded. Bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.