# SolveSpeech

## Problem

- Real-world audio contains noise, reverberation, and overlapping speech
- Current ASR benchmarks use clean audio and do not reflect these conditions
- ASR systems need decentralized evaluation under real-world conditions


### Real Dialogue Timestamps Example
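
Overlap in a timestamped transcript like the one in this section can be found by comparing segment intervals. The sketch below is illustrative only: the `Segment` type, the `parse_line` and `find_overlaps` helpers, and the regular expression are assumptions made for this example, not part of SolveSpeech.

```python
import re
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    speaker: str
    text: str


# Matches lines such as: [00:02.15 - 00:03.40] Speaker B: "Yes, loud and clear."
LINE_RE = re.compile(
    r'\[(\d+):(\d+\.\d+) - (\d+):(\d+\.\d+)\] Speaker (\w+): "(.*?)"'
)


def parse_line(line: str) -> Segment:
    """Parse one transcript line; trailing notes like "(overlap)" are ignored."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized transcript line: {line!r}")
    m1, s1, m2, s2, speaker, text = m.groups()
    return Segment(int(m1) * 60 + float(s1), int(m2) * 60 + float(s2), speaker, text)


def find_overlaps(segments: List[Segment]) -> List[Tuple[Segment, Segment, float]]:
    """Return (earlier, later, overlap_seconds) for each overlapping pair."""
    ordered = sorted(segments, key=lambda s: s.start)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # sorted by start, so no later segment can overlap `a`
            overlaps.append((a, b, min(a.end, b.end) - b.start))
    return overlaps


transcript = '''
[00:00.00 - 00:02.10] Speaker A: "Hi, can you hear me?"
[00:02.15 - 00:03.40] Speaker B: "Yes, loud and clear."
[00:03.45 - 00:06.20] Speaker A: "Great, let's start."
[00:06.00 - 00:06.90] Speaker B: "Sure."   (overlap)
'''

segments = [parse_line(line) for line in transcript.strip().splitlines()]
for a, b, seconds in find_overlaps(segments):
    print(f"Speaker {a.speaker} and Speaker {b.speaker} overlap for {seconds:.2f} s")
```

Only the last two segments intersect in time, which agrees with the `(overlap)` annotation in the transcript.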

```text
[00:00.00 - 00:02.10] Speaker A: "Hi, can you hear me?"
[00:02.15 - 00:03.40] Speaker B: "Yes, loud and clear."
[00:03.45 - 00:06.20] Speaker A: "Great, let's start."
[00:06.00 - 00:06.90] Speaker B: "Sure."   (overlap)
```

## Architecture: Full Subnet Flow

```text
+-----------+     +-------+     +------------+     +-----------+
| Validator | --> | Miner | --> | Evaluation | --> | Incentive |
+-----------+     +-------+     +------------+     +-----------+
```
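
As a rough illustration of the Evaluation and Incentive stages in the flow above, the sketch below scores each miner's transcript against a reference and turns the scores into normalized weights. The source does not specify a metric or a weighting rule: word error rate (WER), the `1 - WER` score, and the names `word_error_rate` and `incentive_weights` are all assumptions chosen for this example.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def incentive_weights(wer_by_miner: Dict[str, float]) -> Dict[str, float]:
    """Map each miner's WER to a weight; weights sum to 1 unless all scores are 0."""
    # Hypothetical scoring rule: score = 1 - WER, floored at zero.
    scores = {miner: max(0.0, 1.0 - wer) for miner, wer in wer_by_miner.items()}
    total = sum(scores.values())
    if total == 0:
        return {miner: 0.0 for miner in scores}
    return {miner: score / total for miner, score in scores.items()}


reference = "yes loud and clear"
submissions = {
    "miner_a": "yes loud and clear",
    "miner_b": "yes loud clear",
}
wers = {miner: word_error_rate(reference, text) for miner, text in submissions.items()}
weights = incentive_weights(wers)
print(wers, weights)
```

A production scorer would likely also account for speaker attribution and timestamp accuracy, which plain WER ignores.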

## Architecture: Validator

```text
+--------+     +----------+     +-----+     +----------------+     +---------+     +---------------+
| Script | --> | Speakers | --> | TTS | --> | Noise / Reverb | --> | Overlap | --> | Mix & Publish |
+--------+     +----------+     +-----+     +----------------+     +---------+     +---------------+
```
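
Two of the validator stages above, noise injection and overlap, reduce to simple operations on sample arrays. The following is a minimal sketch using only the Python standard library and plain lists of float samples. The function names `add_noise` and `overlay`, the 16 kHz sample rate, and the 10 dB target are assumptions for illustration; reverberation is left out.

```python
import math
import random
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a non-empty signal."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def add_noise(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then add it.

    Assumes both signals have the same length and the noise is not silent.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def overlay(track_a: List[float], track_b: List[float], offset: int) -> List[float]:
    """Sum `track_b` into `track_a` starting at sample index `offset`.

    An offset smaller than len(track_a) makes the two speakers overlap.
    """
    length = max(len(track_a), offset + len(track_b))
    mixed = [0.0] * length
    for i, v in enumerate(track_a):
        mixed[i] += v
    for i, v in enumerate(track_b):
        mixed[offset + i] += v
    return mixed


random.seed(0)
sample_rate = 16_000  # hypothetical sample rate
# Stand-ins for two TTS utterances: 0.1 s tones at different pitches.
speaker_a = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(1600)]
speaker_b = [math.sin(2 * math.pi * 330 * t / sample_rate) for t in range(1600)]

# Speaker B starts 0.02 s before speaker A finishes, so the turns overlap.
dialogue = overlay(speaker_a, speaker_b, offset=1600 - 320)
noise = [random.gauss(0.0, 1.0) for _ in dialogue]
noisy_dialogue = add_noise(dialogue, noise, snr_db=10.0)
```

Because the noise gain is derived from the measured RMS of both signals, the same code yields a consistent SNR regardless of how loud the source clips are.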

## Architecture: Miner
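
The diagram in this section shows separate ASR and Speaker branches that are combined at a Merge stage. The source does not say how the merge works. One common approach, sketched here purely as an assumption, labels each ASR segment with the speaker whose turn overlaps it most in time. The tuple layouts and the names `overlap_seconds` and `merge` are hypothetical.

```python
from typing import List, Tuple

AsrSegment = Tuple[float, float, str]     # (start_s, end_s, text)
SpeakerTurn = Tuple[float, float, str]    # (start_s, end_s, speaker)
Labeled = Tuple[float, float, str, str]   # (start_s, end_s, speaker, text)


def overlap_seconds(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def merge(asr_segments: List[AsrSegment], speaker_turns: List[SpeakerTurn]) -> List[Labeled]:
    """Attach to each ASR segment the speaker with the largest time overlap."""
    merged = []
    for start, end, text in asr_segments:
        best_speaker, best_overlap = "unknown", 0.0
        for t_start, t_end, speaker in speaker_turns:
            shared = overlap_seconds(start, end, t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        merged.append((start, end, best_speaker, text))
    return merged


asr_output = [
    (0.00, 2.10, "hi can you hear me"),
    (2.15, 3.40, "yes loud and clear"),
]
speaker_output = [
    (0.00, 2.12, "A"),
    (2.10, 3.50, "B"),
]
for start, end, speaker, text in merge(asr_output, speaker_output):
    print(f"[{start:.2f} - {end:.2f}] Speaker {speaker}: {text}")
```

A segment that matches no speaker turn is labeled `unknown` rather than guessed, so attribution gaps stay visible downstream.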

```text
                 +---------+
                 |  Input  |
                 +----+----+
                      |
                 +----v----+
                 | Encoder |
                 +--+---+--+
                    |   |
        +-----------+   +-----------+
        |                           |
   +----v----+                 +----v----+
   |   ASR   |                 | Speaker |
   +----+----+                 +----+----+
        |                           |
        +-----------+   +-----------+
                    |   |
                 +--v---v--+
                 | Overlap |
                 +----+----+
                      |
                 +----v----+
                 |  Merge  |
                 +----+----+
                      |
                 +----v----+
                 | Output  |
                 +---------+
```

## Target Users

- Wearables (Glasses, Earbuds)
- Call Centers
- Meeting Rooms
- Classrooms
- Smart Home Devices

## Roadmap (Quarterly)

- Q2: Synthetic generation (2-10 speakers, single-room simulation)
- Q3: Launch inference chutes
- Q4: Collaborate with real users (enterprise & consumer)