
TrustTrace

TrustTrace is a provenance tracking system that helps AI companies identify the origins of their training data. By fingerprinting content, detecting similarities with known sources, and recording line

Tech Stack

Next
Web3
Python
Solidity
AI
Provenance

Description

The Problem

AI companies face a growing crisis: they don't know where their training data comes from. Recent lawsuits from NYT, artists, and content creators highlight a critical gap in the AI supply chain. Companies cannot answer basic questions:

  • "Does our dataset contain copyrighted content?"

  • "Where did this data originally come from?"

  • "Are we compliant with EU AI Act requirements?"

This creates massive legal liability ($200M+ in recent lawsuits) and blocks enterprise adoption of AI technology.

The Solution

TrustTrace creates an immutable provenance layer for AI training data through a four-step process:

1. Fingerprint

Text content is converted to compact signatures using MinHash and sentence-transformers embeddings. These are similarity-preserving fingerprints rather than cryptographic hashes, so they stay robust to paraphrasing and minor edits.
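As an illustration of the MinHash half of this step, here is a minimal sketch using only the Python standard library. The shingle size, signature length, and the choice of BLAKE2b as the seeded hash are assumptions for the example, not details taken from TrustTrace. Robustness to paraphrasing would come from the embedding side (sentence-transformers), which is not shown here.

```python
import hashlib
import re

NUM_HASHES = 64   # signature length (assumed value)
SHINGLE_SIZE = 3  # words per shingle (assumed value)


def shingles(text: str, k: int = SHINGLE_SIZE) -> set[str]:
    """Split text into overlapping k-word shingles (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed: int, shingle: str) -> int:
    """64-bit hash of one shingle; the seed selects the hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text: str, num_hashes: int = NUM_HASHES) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    items = shingles(text)
    if not items:
        return [0] * num_hashes
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]
```

Two texts that share most of their shingles will agree in most signature positions, which is what makes the signature usable for similarity search in the next step.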

2. Trace

CrewAI-powered agents compare query fingerprints against a database of 102+ known sources (NYT, Wikipedia, Reddit), using Jaccard similarity to detect content origins.
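The comparison at the core of this step can be sketched as follows. This is an illustrative example, not the project's agent code: the function names, the 3-word shingles, and the 0.5 threshold are assumptions. It shows both the exact Jaccard similarity on shingle sets and the standard MinHash estimate (the fraction of signature positions that match).

```python
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word shingles; empty set if the text has fewer than k words."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_from_signatures(sig_a: list[int], sig_b: list[int]) -> float:
    """MinHash estimate of Jaccard similarity: share of matching positions."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def best_matches(
    query: str, sources: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Return (source_id, similarity) pairs at or above threshold, best first."""
    q = shingles(query)
    scored = [(sid, jaccard(q, shingles(text))) for sid, text in sources.items()]
    hits = [m for m in scored if m[1] >= threshold]
    return sorted(hits, key=lambda m: m[1], reverse=True)
```

With roughly a hundred sources a linear scan like this is cheap; a larger corpus would typically need an index such as locality-sensitive hashing over the signatures.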

3. Assess

The system returns similarity scores, license types (COPYRIGHT, CC-BY-SA, NONE), and risk levels (LOW/MEDIUM/HIGH/CRITICAL) to help companies understand their legal exposure.
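One way such a mapping from similarity and license to risk level could look is sketched below. The cut-off values and the treatment of CC-BY-SA and NONE are hypothetical, chosen only so that a 0.91 match against COPYRIGHT content comes out as HIGH, as in the sample output; the source does not state the actual rules.

```python
from enum import Enum


class Risk(str, Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"
    CRITICAL = "CRITICAL"


# Hypothetical cut-offs per license: (minimum similarity, risk),
# listed from the highest threshold to the lowest.
THRESHOLDS: dict[str, list[tuple[float, Risk]]] = {
    "COPYRIGHT": [(0.95, Risk.CRITICAL), (0.80, Risk.HIGH), (0.50, Risk.MEDIUM)],
    "CC-BY-SA": [(0.80, Risk.MEDIUM)],
    "NONE": [],
}


def assess(similarity: float, license_type: str) -> Risk:
    """Return the first risk level whose threshold the similarity reaches."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    if license_type not in THRESHOLDS:
        raise ValueError(f"unknown license type: {license_type!r}")
    for minimum, risk in THRESHOLDS[license_type]:
        if similarity >= minimum:
            return risk
    return Risk.LOW
```

Keeping the thresholds in a table rather than in nested conditionals makes them easy to review with legal counsel and to adjust per jurisdiction.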

4. Record

All lineage findings are recorded on Mantle L2 (Sepolia testnet) at contract 0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1, creating an immutable, auditable trail for compliance and dispute resolution.
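A common pattern for this kind of recording is to store only a digest of the lineage record on-chain and keep the full record off-chain. The sketch below shows how such a digest could be derived. The use of SHA-256 over canonical JSON and the record fields are assumptions; the source does not say how TrustTrace computes its lineage hash, and the contract call itself is not shown.

```python
import hashlib
import json


def lineage_hash(record: dict) -> str:
    """Hex SHA-256 digest of a lineage record serialized as canonical JSON.

    Keys are sorted and whitespace is removed so that the same record
    always produces the same digest regardless of key order.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record layout, for illustration only.
example_record = {
    "source": "nyt-article-00042",
    "similarity": 0.91,
    "license": "COPYRIGHT",
    "risk": "HIGH",
}
digest = lineage_hash(example_record)
```

Anyone holding the off-chain record can recompute the digest and compare it with the on-chain value to verify that the record was not altered.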

Architecture

┌─────────────────────────────────────────────────────────┐
│              FRONTEND (Next.js + TypeScript)            │
│                   Query & Lineage Viewer                 │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                BACKEND (FastAPI + Python)               │
│  ┌─────────────────────────────────────────────────────┐│
│  │ CrewAI Orchestrator                                 ││
│  │ → Tracer Agent (similarity search)                  ││
│  │ → Registry Agent (blockchain writes)                ││
│  └─────────────────────────────────────────────────────┘│
└─────────────┬──────────────────┬────────────────────────┘
              │                  │
              ▼                  ▼
    ┌─────────────────┐  ┌─────────────────┐
    │ SQLite DB       │  │ Mantle L2       │
    │ 102 fingerprints│  │ (Sepolia)       │
    └─────────────────┘  └─────────────────┘
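To make the SQLite box in the diagram concrete, the following sketch stores and reloads fingerprints with the standard-library sqlite3 module. The table name, column layout, and JSON encoding of the signature are assumptions for illustration, not the project's actual schema.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the fingerprint store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fingerprints (
               source_id TEXT PRIMARY KEY,
               license   TEXT NOT NULL,
               signature TEXT NOT NULL  -- JSON-encoded list of integers
           )"""
    )
    return conn


def save_fingerprint(
    conn: sqlite3.Connection, source_id: str, license_type: str, signature: list[int]
) -> None:
    """Insert or overwrite one source fingerprint inside a transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO fingerprints VALUES (?, ?, ?)",
            (source_id, license_type, json.dumps(signature)),
        )


def load_fingerprints(conn: sqlite3.Connection) -> dict[str, tuple[str, list[int]]]:
    """Return {source_id: (license, signature)} for every stored source."""
    rows = conn.execute(
        "SELECT source_id, license, signature FROM fingerprints ORDER BY source_id"
    )
    return {sid: (lic, json.loads(sig)) for sid, lic, sig in rows}
```

Loading all rows at once is reasonable at the scale shown in the diagram (102 fingerprints); a larger corpus would call for an indexed lookup instead.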

Tech Stack

Layer           Technology                      Purpose
--------------  ------------------------------  ----------------------------
Fingerprinting  MinHash, sentence-transformers  Content similarity detection
Agents          CrewAI                          Orchestration & automation
Backend         FastAPI                         REST API server
Database        SQLite                          Fingerprint storage
Blockchain      Mantle L2, Web3                 On-chain provenance
Frontend        Next.js, Tailwind CSS           User interface

Demo Experience

  1. Query: User pastes text into the web interface

  2. Analysis: CrewAI agents fingerprint and compare against known sources

  3. Results: System detects matches (e.g., 87% similarity to NYT article with HIGH copyright risk)

  4. Verification: Full lineage tree displayed with on-chain proof link to Mantle Explorer

Sample Query

Input:

"The New York Times reported today on the ongoing developments in the technology sector, highlighting key innovations and market trends."

Output:

{
  "matches": [
    {
      "source": "nyt-article-00042",
      "similarity": 0.91,
      "license": "COPYRIGHT",
      "risk": "HIGH"
    }
  ],
  "risk_assessment": "HIGH",
  "on_chain_proof": "0xcb3d0be2..."
}
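A client consuming this response might parse it into typed objects before acting on it. The sketch below assumes the field names shown in the sample output above; the class and function names are hypothetical and not part of any published TrustTrace client.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    source: str
    similarity: float
    license: str
    risk: str


@dataclass(frozen=True)
class QueryResult:
    matches: list[Match]
    risk_assessment: str
    on_chain_proof: str


def parse_result(payload: str) -> QueryResult:
    """Parse a JSON response; raises KeyError or TypeError on unexpected fields."""
    data = json.loads(payload)
    return QueryResult(
        matches=[Match(**m) for m in data["matches"]],
        risk_assessment=data["risk_assessment"],
        on_chain_proof=data["on_chain_proof"],
    )


def flagged_sources(result: QueryResult) -> list[str]:
    """Sources whose individual risk is HIGH or CRITICAL."""
    return [m.source for m in result.matches if m.risk in ("HIGH", "CRITICAL")]
```

Failing loudly on missing or unexpected fields is deliberate here: a compliance check should not silently pass a response it could not fully read.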

Sample Lineage Data Hash

Input: 1dc950094c6b6b36e7b93e5527ee5bf7c19e66d98d96e9cdac8d045a811be40f

Business Model

Pay-per-query API for enterprises training AI models:

  • Pre-deployment compliance: Check datasets before training

  • Continuous monitoring: Scan data pipelines for copyright risks

  • Audit support: Generate lineage reports for regulators and legal teams

Why Mantle L2

  • Low gas fees: Cost-effective on-chain recording for high-volume data pipelines

  • High throughput: Handles thousands of lineage records per second

  • Modular architecture: Scalable from testnet to mainnet production deployments

  • EVM compatibility: Seamless integration with existing Web3 tooling

Deployment Status

Impact

TrustTrace enables the responsible AI ecosystem by:

  • Reducing legal risk: Identify copyright issues before deployment

  • Ensuring compliance: Meet EU AI Act data documentation requirements

  • Building trust: Provide transparency for AI model consumers

  • Enabling licensing: Fair attribution and compensation for content creators

Progress During This Hackathon

MVP Fully Functional

Fundraising Status

-
Team Leader
Kenji Hikmatullah
Project Link
Track
AINFT