SolveSpeech
Miners solve the core speech problem of who said what. We need transcripts in real time despite noise, reverberation, and overlapping speech (when users talk at the same time).
Tech Stack
Python
Description
# SolveSpeech
## Problem
- Real-world audio contains noise, reverberation, and overlapping speech
- Current ASR benchmarks are clean & unrealistic
- Need decentralized real-world evaluation
### Real Dialogue Timestamps Example
```text
[00:00.00 - 00:02.10] Speaker A: "Hi, can you hear me?"
[00:02.15 - 00:03.40] Speaker B: "Yes, loud and clear."
[00:03.45 - 00:06.20] Speaker A: "Great, let's start."
[00:06.00 - 00:06.90] Speaker B: "Sure." (overlap)
```
## Architecture: Full Subnet Flow
```text
+-----------+     +-------+     +------------+     +-----------+
| Validator | --> | Miner | --> | Evaluation | --> | Incentive |
+-----------+     +-------+     +------------+     +-----------+
```
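One round of the flow above can be sketched as follows. This is a minimal illustration, not the subnet's actual scoring: word error rate (WER) as the evaluation metric, `1 - WER` as the score, and score normalization as the incentive step are all assumptions, and the names `word_error_rate` and `run_round` are hypothetical.

```python
from typing import Callable, Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # prev[j] = edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def run_round(reference: str,
              miners: Dict[str, Callable[[str], str]],
              audio: str) -> Dict[str, float]:
    """Validator -> Miner -> Evaluation -> Incentive for a single task."""
    scores = {}
    for name, transcribe in miners.items():
        hypothesis = transcribe(audio)                # Miner step
        wer = word_error_rate(reference, hypothesis)  # Evaluation step
        scores[name] = max(0.0, 1.0 - wer)            # assumed scoring rule
    total = sum(scores.values())
    # Incentive step (assumed): normalize scores into weights that sum to 1.
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: s / total for name, s in scores.items()}


# Toy miners standing in for real models; the audio path is a placeholder.
miners = {
    "good": lambda audio: "hi can you hear me",
    "bad": lambda audio: "hi can you",
}
weights = run_round("hi can you hear me", miners, audio="task-001.wav")
```

Because the validator generates the script itself, it already holds the reference transcript, so scoring needs no human labels. A fuller metric would also penalize wrong speaker labels and timestamps.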
## Architecture: Validator
```text
+--------+     +----------+     +-----+     +----------------+     +---------+     +---------------+
| Script | --> | Speakers | --> | TTS | --> | Noise / Reverb | --> | Overlap | --> | Mix & Publish |
+--------+     +----------+     +-----+     +----------------+     +---------+     +---------------+
```
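The audio stages of this pipeline (reverb, overlap, noise, mix) can be sketched in plain Python. This is a simplified illustration under stated assumptions: signals are lists of float samples, sine tones stand in for TTS output, the room is a toy impulse response, and noise is added once after the overlap as shared ambient noise. The helper names `rms`, `reverb`, `overlay`, and `add_noise` are hypothetical.

```python
import math
import random
from typing import List


def rms(signal: List[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / max(len(signal), 1))


def reverb(signal: List[float], impulse_response: List[float]) -> List[float]:
    """Direct convolution of the signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def overlay(base: List[float], other: List[float], offset: int) -> List[float]:
    """Sum `other` into a copy of `base` starting at sample `offset`."""
    out = base + [0.0] * max(0, offset + len(other) - len(base))
    for i, x in enumerate(other):
        out[offset + i] += x
    return out


def add_noise(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale the noise to the requested speech-to-noise ratio, then add it."""
    noise_level = rms(noise)
    if noise_level == 0:
        raise ValueError("noise must not be silent")
    gain = rms(speech) / (noise_level * 10 ** (snr_db / 20))
    # The noise is looped if it is shorter than the speech.
    return [s + gain * noise[i % len(noise)] for i, s in enumerate(speech)]


sr = 8000  # assumed sample rate in Hz
speaker_a = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]       # 1.0 s
speaker_b = [math.sin(2 * math.pi * 330 * t / sr) for t in range(sr // 2)]  # 0.5 s
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(sr)]

room = [1.0, 0.0, 0.0, 0.3]  # toy impulse response: direct path plus one echo
# Speaker B starts at 0.8 s, before speaker A has finished, creating an overlap.
mix = overlay(reverb(speaker_a, room), reverb(speaker_b, room), offset=int(0.8 * sr))
mix = add_noise(mix, noise, snr_db=10.0)
```

Since the validator chooses each offset, it knows the exact start and end time of every utterance and can publish the mix while keeping the timestamped, speaker-labeled transcript as ground truth.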
## Architecture: Miner
```text
              +---------+
              |  Input  |
              +----+----+
                   |
              +----v----+
              | Encoder |
              +--+---+--+
                 |   |
     +-----------+   +-----------+
     |                           |
+----v----+                 +----v----+
|   ASR   |                 | Speaker |
+----+----+                 +----+----+
     |                           |
     +-----------+   +-----------+
                 |   |
              +--v---v--+
              | Overlap |
              +----+----+
                   |
              +----v----+
              |  Merge  |
              +----+----+
                   |
              +----v----+
              | Output  |
              +---------+
```
## Target Users
- Wearables (Glasses, Earbuds)
- Call Centers
- Meeting Rooms
- Classrooms
- Smart Home Devices
## Roadmap (Quarterly)
- Q2: Synthetic generation (2-10 speakers, single-room simulation)
- Q3: Launch inference chutes
- Q4: Collaborate with real users (enterprise & consumer)
Progress During Hackathon
From idea to pipelines.
Fundraising Status
Seeking 500 TAO to open the subnet.