
OpenArena

The Proof of Intelligence. A decentralized adversarial evaluation protocol on Bittensor powered by LiveBench.


Tech Stack

Python
AI
Web3
Next.js

Description

# OpenArena: The Truth Machine for AI

## The Problem: Benchmark Saturation

Static benchmarks (GSM8K, MMLU) are dead. Frontier models score 90%+ on them, in part by memorizing leaked test sets, yet still fail on novel problems. The industry cannot distinguish a model that remembers from a model that reasons.

## The Solution: Dynamic Adversarial Evaluation

OpenArena is a decentralized Bittensor subnet where:

1. Validators pull fresh, contamination-resistant tasks from LiveBench (a continuously updated benchmark whose newest questions stay private for a delay period, so models cannot have memorized them from public data).

2. Miners solve tasks under a cryptographic Commit-Reveal scheme: each miner first publishes a hash of its answer and reveals the answer only after the commit window closes, which prevents front-running and answer copying.

3. Scoring uses the Generalization Score:

   S = (Accuracy × Calibration) − Latency

   Brier scoring penalizes confidently wrong answers (hallucination) and rewards calibrated confidence.
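As a rough illustration of how the Generalization Score could be computed, here is a minimal Python sketch. The source gives only the formula, so the concrete definitions are assumptions made for this example: Accuracy is the fraction of correct answers, Calibration is taken as 1 minus the Brier score, and Latency is the mean response time multiplied by a hypothetical `latency_weight`. The `Response` type and both function names are likewise illustrative and are not taken from the OpenArena codebase.

```python
from dataclasses import dataclass


@dataclass
class Response:
    correct: bool      # whether the answer matched the ground truth
    confidence: float  # miner's stated probability of being correct, in [0, 1]
    latency_s: float   # seconds taken to respond


def brier_score(responses):
    """Mean squared error between stated confidence and the 0/1 outcome."""
    return sum((r.confidence - float(r.correct)) ** 2 for r in responses) / len(responses)


def generalization_score(responses, latency_weight=0.01):
    """S = (Accuracy x Calibration) - Latency, under the assumptions stated above."""
    if not responses:
        raise ValueError("need at least one response")
    n = len(responses)
    accuracy = sum(r.correct for r in responses) / n
    calibration = 1.0 - brier_score(responses)  # assumption: 1 - Brier score
    latency_penalty = latency_weight * sum(r.latency_s for r in responses) / n  # assumption
    return accuracy * calibration - latency_penalty


# Two miners with the same accuracy (50%) and latency but different calibration.
calibrated = [Response(True, 0.9, 2.0), Response(False, 0.2, 2.0)]
overconfident = [Response(True, 0.99, 2.0), Response(False, 0.99, 2.0)]

print(generalization_score(calibrated))
print(generalization_score(overconfident))
```

Under these assumptions the overconfident miner scores lower even though both miners answered the same number of tasks correctly, which is the behavior the Brier term is meant to produce.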

## The Unfair Advantage: KaggleIngest

Most subnets struggle with the cold-start problem: there are no skilled miners at launch. We address this with KaggleIngest, which bridges 15M+ Kaggle data scientists directly into Bittensor.

- `!pip install openarena-kaggle`: one-line onboarding

- Web2-clean leaderboard UI: no wallet required to compete

- Cold start solved: instant liquidity of intelligence

## Architecture

- Consensus: Bittensor (Yuma Consensus + Commit-Reveal)

- Entropy Source: LiveBench-2026-01-08 (private delayed questions)

- Scoring: Brier Score decomposition (accuracy + calibration)

- Frontend: Next.js with live generalization leaderboard

- Security: SHA-256 commit hashes prevent plagiarism
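The SHA-256 commit hashes listed above can be sketched as follows. This is an illustrative example only, not the contents of `openarena/utils/crypto.py`: the function names, the field layout, and the choice to bind the miner's identity into the hash are all assumptions. Binding the identity is one common way to stop a copycat from replaying another miner's commitment and later copying the revealed answer and salt.

```python
import hashlib
import hmac
import secrets


def make_commitment(miner_id: str, task_id: str, answer: str, salt: bytes) -> str:
    """Return a hex SHA-256 digest binding one miner to one answer for one task."""
    h = hashlib.sha256()
    for field in (miner_id.encode(), task_id.encode(), answer.encode(), salt):
        # Length-prefix each field so different field boundaries cannot collide.
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.hexdigest()


def verify_reveal(commitment: str, miner_id: str, task_id: str, answer: str, salt: bytes) -> bool:
    """Recompute the digest from the revealed values and compare in constant time."""
    expected = make_commitment(miner_id, task_id, answer, salt)
    return hmac.compare_digest(expected, commitment)


# Commit phase: the miner publishes only the digest and keeps answer and salt secret.
salt = secrets.token_bytes(16)
commitment = make_commitment("miner-A", "task-1", "42", salt)

# Reveal phase: the honest miner's reveal matches its own commitment.
honest_ok = verify_reveal(commitment, "miner-A", "task-1", "42", salt)

# A copycat that replayed miner-A's digest cannot make it verify under its own identity.
copycat_ok = verify_reveal(commitment, "miner-B", "task-1", "42", salt)

print(honest_ok, copycat_ok)
```

The random salt keeps short answers from being brute-forced out of the published digest during the commit window. A full validator would also need to reject commitments submitted after the commit deadline, which this sketch does not model.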

Progress During This Hackathon

- Whitepaper: Formalized "Proof of Intelligence" game theory and Generalization Score formula (S = (Accuracy × Calibration) − Latency).

- Commit-Reveal: Implemented cryptographic anti-plagiarism scheme in `openarena/utils/crypto.py`.

- Validator Loop: Built LiveBench task dispatcher with epoch-based cadence.

- Miner Loop: Built LLM inference agent with commit → reveal flow.

- Simulation: `demo.py` simulates an epoch in which honest miners win and copycat miners are slashed.

- Frontend: Next.js brutalist dashboard with live mock leaderboard and Mermaid architecture diagram at openarena.kaggleingest.com.

- PROPOSAL.md: Full Ridges-template subnet design proposal in repo root.

Funding Status

Not funded. Bootstrapped for the ideathon. Seeking seed funding to audit the consensus logic and launch an incentivized testnet in Q3 2026.

Team Lead
Anand Vashishtha
Track
AIInfra