TrustTrace

The Problem

AI companies face a growing crisis: they don't know where their training data comes from. Recent lawsuits from NYT, artists, and content creators highlight a critical gap in the AI supply chain. Companies cannot answer basic questions:

"Does our dataset contain copyrighted content?"
"Where did this data originally come from?"
"Are we compliant with EU AI Act requirements?"

This creates massive legal liability ($200M+ in recent lawsuits) and blocks enterprise adoption of AI technology.

The Solution

TrustTrace creates an immutable provenance layer for AI training data through a four-step process:

1. Fingerprint

Text content is converted to unique signatures using MinHash and sentence-transformers, creating cryptographic fingerprints that are robust to paraphrasing and minor edits.

2. Trace

CrewAI-powered agents compare query fingerprints against a database of 102+ known sources (NYT, Wikipedia, Reddit), using Jaccard similarity to detect content origins.

3. Assess

System returns similarity scores, license types (COPYRIGHT, CC-BY-SA, NONE), and risk levels (LOW/MEDIUM/HIGH/CRITICAL) to help companies understand legal exposure.

4. Record

All lineage findings are immutably stored on Mantle L2 blockchain at contract 0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1, creating an auditable trail for compliance and dispute resolution.

Architecture

┌─────────────────────────────────────────────────────────┐
│              FRONTEND (Next.js + TypeScript)            │
│                   Query & Lineage Viewer                 │
└────────────────────────┬────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────┐
│                BACKEND (FastAPI + Python)               │
│  ┌─────────────────────────────────────────────────────┐│
│  │ CrewAI Orchestrator                                 ││
│  │ → Tracer Agent (similarity search)                  ││
│  │ → Registry Agent (blockchain writes)                ││
│  └─────────────────────────────────────────────────────┘│
└─────────────┬──────────────────┬────────────────────────┘
              │                  │
              ▼                  ▼
    ┌─────────────────┐  ┌─────────────────┐
    │ SQLite DB       │  │ Mantle L2       │
    │ 102 fingerprints│  │ (Sepolia)       │
    └─────────────────┘  └─────────────────┘

Tech Stack

Layer	Technology	Purpose
Fingerprinting	MinHash, sentence-transformers	Content similarity detection
Agents	CrewAI	Orchestration & automation
Backend	FastAPI	REST API server
Database	SQLite	Fingerprint storage
Blockchain	Mantle L2, Web3	On-chain provenance
Frontend	Next.js, Tailwind CSS	User interface

Demo Experience

Query: User pastes text into the web interface
Analysis: CrewAI agents fingerprint and compare against known sources
Results: System detects matches (e.g., 87% similarity to NYT article with HIGH copyright risk)
Verification: Full lineage tree displayed with on-chain proof link to Mantle Explorer

Sample Query

Input:

"The New York Times reported today on the ongoing developments in the technology sector, highlighting key innovations and market trends."

Output:

{
  "matches": [
    {
      "source": "nyt-article-00042",
      "similarity": 0.91,
      "license": "COPYRIGHT",
      "risk": "HIGH"
    }
  ],
  "risk_assessment": "HIGH",
  "on_chain_proof": "0xcb3d0be2..."
}

Sample Lineage Data Hash

Input: 1dc950094c6b6b36e7b93e5527ee5bf7c19e66d98d96e9cdac8d045a811be40f

Business Model

Pay-per-query API for enterprises training AI models:

Pre-deployment compliance: Check datasets before training
Continuous monitoring: Scan data pipelines for copyright risks
Audit support: Generate lineage reports for regulators and legal teams

Why Mantle L2

Low gas fees: Cost-effective on-chain recording for high-volume data pipelines
High throughput: Handles thousands of lineage records per second
Modular architecture: Scalable from testnet to mainnet production deployments
EVM compatibility: Seamless integration with existing Web3 tooling

Deployment Status

Contract: ProvenanceRegistry.sol deployed on Mantle Sepolia
Contract Address: 0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1
Explorer: https://sepolia.mantlescan.xyz/address/0xefA667dB730A3aFbaE3Dbbe71bdf2268F5A627E1
Test Data: 102 pre-seeded fingerprints from NYT, Wikipedia, Reddit
Status: ✅ Fully functional MVP
Source: https://github.com/kenjihikmatullah/TrustTrace
Web: https://trusttrace.hikmatullah.com/

Impact

TrustTrace enables the responsible AI ecosystem by:

Reducing legal risk: Identify copyright issues before deployment
Ensuring compliance: Meet EU AI Act data documentation requirements
Building trust: Provide transparency for AI model consumers
Enabling licensing: Fair attribution and compensation for content creators