TrustTrace

The Problem

AI companies face a growing crisis: they don't know where their training data comes from. Recent lawsuits from The New York Times, artists, and other content creators highlight a critical gap in the AI supply chain. Companies cannot answer basic questions:

"Does our dataset contain copyrighted content?"
"Where did this data originally come from?"
"Are we compliant with EU AI Act requirements?"

This creates massive legal liability ($200M+ in recent lawsuits) and blocks enterprise adoption of AI technology.

The Solution

TrustTrace creates an immutable provenance layer for AI training data through a four-step process:

1. Fingerprint

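Text content is converted into similarity-preserving signatures: MinHash signatures that tolerate minor edits, and sentence-transformers embeddings that capture meaning even when the wording is paraphrased. Unlike cryptographic hashes, these fingerprints change only slightly when the text changes slightly.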
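A minimal sketch of the MinHash half of this step, assuming word 3-gram shingles and 128 seeded BLAKE2b hashes standing in for random permutations. The function names and parameters are illustrative, not TrustTrace's code, and the sentence-transformers embedding half is omitted:

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Lowercase the text, split it into words, and return overlapping k-word shingles."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short inputs become a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}


def seeded_hash(shingle: str, seed: int) -> int:
    """64-bit BLAKE2b hash of the shingle; each seed plays the role of one hash function."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(text: str, num_perm: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash value over all shingles."""
    grams = shingles(text)
    if not grams:
        raise ValueError("text contains no words to fingerprint")
    return [min(seeded_hash(g, seed) for g in grams) for seed in range(num_perm)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The share of positions where two signatures agree estimates their Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of hash functions")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

For "the quick brown fox jumps over the lazy dog" versus the same sentence ending in "cat", six of the eight distinct 3-word shingles are shared, so the true Jaccard similarity is 0.75 and the signature estimate should land near it. Punctuation and case differences do not change the signature at all, because the tokenizer drops them.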

2. Trace

CrewAI-powered agents compare the query's fingerprint against 102 stored fingerprints from known sources (NYT, Wikipedia, Reddit), using Jaccard similarity to identify likely content origins.
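The comparison itself can be sketched as a ranked Jaccard search. This minimal Python version computes exact Jaccard similarity over word 3-gram shingles rather than estimating it from stored signatures; the two-entry source table and its texts are made up for illustration, and `trace` and its `threshold` parameter are hypothetical names:

```python
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word shingles of the lowercased text (empty if fewer than k words)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def trace(query: str, sources: dict[str, str], threshold: float = 0.5) -> list[tuple[str, float]]:
    """Score the query against every known source; return matches at or above threshold, best first."""
    query_grams = shingles(query)
    scored = ((source_id, jaccard(query_grams, shingles(text))) for source_id, text in sources.items())
    matches = [(source_id, score) for source_id, score in scored if score >= threshold]
    return sorted(matches, key=lambda match: match[1], reverse=True)


# Hypothetical two-entry source table with invented texts.
SOURCES = {
    "nyt-article-00042": "The New York Times reported today on the ongoing developments in the technology sector",
    "wiki-sample-001": "An encyclopedia entry describing the history of the printing press in Europe",
}
```

Run on the sample query from the demo, this returns only `nyt-article-00042`, with a score of 12/18 (about 0.67): all 12 shingles of the stored passage appear among the query's 18 distinct shingles.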

3. Assess

The system returns similarity scores, license types (COPYRIGHT, CC-BY-SA, NONE), and risk levels (LOW/MEDIUM/HIGH/CRITICAL) to help companies understand their legal exposure.
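The document does not specify how similarity and license combine into a risk level, so the rule below is purely an assumption chosen for illustration. The cutoffs (0.50, 0.80, 0.95), the reading of NONE as "no license information found", and the names `assess_match` and `overall_risk` are all hypothetical:

```python
RISK_LEVELS = ("LOW", "MEDIUM", "HIGH", "CRITICAL")  # ordered from least to most severe


def assess_match(similarity: float, license_type: str) -> str:
    """Map one match to a risk level. Every threshold here is an illustrative assumption."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    if license_type == "COPYRIGHT":
        if similarity >= 0.95:
            return "CRITICAL"  # near-verbatim copy of copyrighted text
        if similarity >= 0.80:
            return "HIGH"
        return "MEDIUM" if similarity >= 0.50 else "LOW"
    if license_type in ("CC-BY-SA", "NONE"):
        # CC-BY-SA permits reuse but carries attribution and share-alike terms;
        # NONE is assumed to mean no license was found, so the terms are unknown.
        return "MEDIUM" if similarity >= 0.80 else "LOW"
    raise ValueError(f"unknown license type: {license_type}")


def overall_risk(levels: list[str]) -> str:
    """The overall assessment is the most severe per-match level, or LOW with no matches."""
    return max(levels, key=RISK_LEVELS.index, default="LOW")
```

With these cutoffs, a 0.91 similarity to a COPYRIGHT source maps to HIGH, which agrees with the sample output; taking the maximum per-match level for the overall assessment is one simple, conservative aggregation choice.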

4. Record

All lineage findings are recorded immutably on the Mantle L2 blockchain (Mantle Sepolia testnet) at contract 0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1, creating an auditable trail for compliance and dispute resolution.
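Before a finding goes on-chain it has to be reduced to a fixed-size value. This sketch assumes a SHA-256 digest over canonical JSON; the document does not say which hash ProvenanceRegistry.sol expects, and Ethereum's keccak256 is a different function from Python's `hashlib.sha3_256`, so the digest choice is an assumption. `lineage_hash` and `as_bytes32` are hypothetical helpers, and the contract call itself (which would need the contract's ABI) is not shown:

```python
import hashlib
import json


def lineage_hash(record: dict) -> str:
    """SHA-256 over canonical JSON, so key order and whitespace do not change the digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def as_bytes32(hex_digest: str) -> bytes:
    """Convert a 64-character hex digest (optionally 0x-prefixed) into 32 raw bytes for a bytes32 argument."""
    raw = bytes.fromhex(hex_digest.removeprefix("0x"))
    if len(raw) != 32:
        raise ValueError("expected a 32-byte digest")
    return raw


# Example finding shaped like the sample output.
record = {
    "matches": [{"source": "nyt-article-00042", "similarity": 0.91, "license": "COPYRIGHT", "risk": "HIGH"}],
    "risk_assessment": "HIGH",
}
digest = lineage_hash(record)
```

Canonical encoding matters because two services serializing the same finding with different key orders would otherwise record different hashes. The 64-hex-character sample lineage hash in this document has the length of a 32-byte digest of this kind.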

Architecture

┌─────────────────────────────────────────────────────────┐
│              FRONTEND (Next.js + TypeScript)            │
│                   Query & Lineage Viewer                │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                BACKEND (FastAPI + Python)               │
│  ┌─────────────────────────────────────────────────────┐│
│  │ CrewAI Orchestrator                                 ││
│  │ → Tracer Agent (similarity search)                  ││
│  │ → Registry Agent (blockchain writes)                ││
│  └─────────────────────────────────────────────────────┘│
└─────────────┬──────────────────┬────────────────────────┘
              │                  │
              ▼                  ▼
    ┌─────────────────┐  ┌─────────────────┐
    │ SQLite DB       │  │ Mantle L2       │
    │ 102 fingerprints│  │ (Sepolia)       │
    └─────────────────┘  └─────────────────┘
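The hand-off between the two agents in the diagram can be sketched in plain Python. This is not CrewAI's API: `TracerAgent`, `RegistryAgent`, and `handle_query` are hypothetical names, the search is exact Jaccard over word 3-grams, and an in-memory list stands in for the Mantle contract:

```python
import hashlib
import json
import re
from dataclasses import dataclass, field


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word shingles of the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i : i + k]) for i in range(len(words) - k + 1)}


@dataclass
class TracerAgent:
    """Stand-in for the Tracer Agent: Jaccard similarity search over known sources."""
    sources: dict[str, str]
    threshold: float = 0.5

    def run(self, query: str) -> list[dict]:
        query_grams = shingles(query)
        matches = []
        for source_id, text in self.sources.items():
            grams = shingles(text)
            union = query_grams | grams
            score = len(query_grams & grams) / len(union) if union else 0.0
            if score >= self.threshold:
                matches.append({"source": source_id, "similarity": round(score, 2)})
        return sorted(matches, key=lambda m: m["similarity"], reverse=True)


@dataclass
class RegistryAgent:
    """Stand-in for the Registry Agent: an append-only list replaces the on-chain contract."""
    ledger: list[str] = field(default_factory=list)

    def run(self, findings: dict) -> str:
        payload = json.dumps(findings, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.ledger.append(digest)  # a real deployment would send a transaction here
        return digest


def handle_query(query: str, tracer: TracerAgent, registry: RegistryAgent) -> dict:
    """Run the tracer, then record its findings, mirroring the orchestrator's sequence."""
    findings = {"matches": tracer.run(query)}
    proof = registry.run(findings)
    return {**findings, "on_chain_proof": proof}
```

Keeping the registry write as a separate step means a query can be traced and reviewed even when the chain is unavailable, and the proof returned to the frontend is the same value that was appended to the ledger.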

Tech Stack

Layer           Technology                      Purpose
Fingerprinting  MinHash, sentence-transformers  Content similarity detection
Agents          CrewAI                          Orchestration & automation
Backend         FastAPI                         REST API server
Database        SQLite                          Fingerprint storage
Blockchain      Mantle L2, Web3                 On-chain provenance
Frontend        Next.js, Tailwind CSS           User interface
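One way the SQLite fingerprint store could be laid out, sketched with Python's built-in `sqlite3` module. The table name, column names, and helper functions are hypothetical; the CHECK constraint simply restricts licenses to the three types listed under Assess:

```python
import json
import sqlite3

# Hypothetical schema: one row per known source, MinHash signature stored as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fingerprints (
    source_id    TEXT PRIMARY KEY,
    license_type TEXT NOT NULL CHECK (license_type IN ('COPYRIGHT', 'CC-BY-SA', 'NONE')),
    signature    TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_fingerprint(conn: sqlite3.Connection, source_id: str, license_type: str, signature: list[int]) -> None:
    """Insert or overwrite one source's fingerprint."""
    with conn:  # commits on success, rolls back if the statement raises
        conn.execute(
            "INSERT OR REPLACE INTO fingerprints (source_id, license_type, signature) VALUES (?, ?, ?)",
            (source_id, license_type, json.dumps(signature)),
        )


def load_fingerprints(conn: sqlite3.Connection) -> list[tuple[str, str, list[int]]]:
    """Return every stored fingerprint, ordered by source ID."""
    rows = conn.execute("SELECT source_id, license_type, signature FROM fingerprints ORDER BY source_id")
    return [(source_id, license_type, json.loads(sig)) for source_id, license_type, sig in rows]
```

Storing the signature as JSON text rather than as INTEGER columns sidesteps SQLite's signed 64-bit INTEGER limit, which unsigned 64-bit hash values can exceed. A store of 102 rows can simply be loaded in full and scanned on each query.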

Demo Experience

Query: User pastes text into the web interface
Analysis: CrewAI agents fingerprint and compare against known sources
Results: System detects matches (e.g., 87% similarity to an NYT article with HIGH copyright risk)
Verification: Full lineage tree displayed with on-chain proof link to Mantle Explorer

Sample Query

Input:

"The New York Times reported today on the ongoing developments in the technology sector, highlighting key innovations and market trends."

Output:

{
  "matches": [
    {
      "source": "nyt-article-00042",
      "similarity": 0.91,
      "license": "COPYRIGHT",
      "risk": "HIGH"
    }
  ],
  "risk_assessment": "HIGH",
  "on_chain_proof": "0xcb3d0be2..."
}
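A client consuming this response would typically decode and validate it before acting on it. A minimal Python sketch, assuming the fields in the sample output are the complete schema; `Match`, `parse_response`, and `sources_needing_review` are hypothetical names, not part of a published TrustTrace SDK:

```python
import json
from dataclasses import dataclass

RISK_LEVELS = ("LOW", "MEDIUM", "HIGH", "CRITICAL")  # ordered from least to most severe


@dataclass(frozen=True)
class Match:
    source: str
    similarity: float
    license: str
    risk: str


def parse_response(raw: str) -> tuple[list[Match], str, str]:
    """Decode a response shaped like the sample output and validate each match."""
    data = json.loads(raw)
    matches = [Match(**item) for item in data["matches"]]
    for match in matches:
        if match.risk not in RISK_LEVELS:
            raise ValueError(f"unexpected risk level: {match.risk}")
        if not 0.0 <= match.similarity <= 1.0:
            raise ValueError(f"similarity out of range: {match.similarity}")
    return matches, data["risk_assessment"], data["on_chain_proof"]


def sources_needing_review(matches: list[Match], min_level: str = "HIGH") -> list[str]:
    """Return the source IDs whose risk is at or above min_level."""
    cutoff = RISK_LEVELS.index(min_level)
    return [m.source for m in matches if RISK_LEVELS.index(m.risk) >= cutoff]
```

Because `Match(**item)` rejects unexpected or missing keys with a `TypeError`, a schema change on the server surfaces immediately on the client instead of being silently ignored.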

Sample Lineage Data Hash

Input: 1dc950094c6b6b36e7b93e5527ee5bf7c19e66d98d96e9cdac8d045a811be40f

Business Model

Pay-per-query API for enterprises training AI models:

Pre-deployment compliance: Check datasets before training
Continuous monitoring: Scan data pipelines for copyright risks
Audit support: Generate lineage reports for regulators and legal teams

Why Mantle L2

Low gas fees: Cost-effective on-chain recording for high-volume data pipelines
High throughput: Handles thousands of lineage records per second
Modular architecture: Scalable from testnet to mainnet production deployments
EVM compatibility: Seamless integration with existing Web3 tooling

Deployment Status

Contract: ProvenanceRegistry.sol deployed on Mantle Sepolia
Contract Address: 0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1
Explorer: https://sepolia.mantlescan.xyz/address/0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1
Test Data: 102 pre-seeded fingerprints from NYT, Wikipedia, Reddit
Status: ✅ Fully functional MVP
Source: https://github.com/kenjihikmatullah/TrustTrace
Web: https://trusttrace.hikmatullah.com/

Impact

TrustTrace supports a responsible AI ecosystem by:

Reducing legal risk: Identify copyright issues before deployment
Ensuring compliance: Meet EU AI Act data documentation requirements
Building trust: Provide transparency for AI model consumers
Enabling licensing: Fair attribution and compensation for content creators
