Problem To Solve

It is not always possible to tell what data a model was trained on, even with access to the model weights, because training compresses the training data into the weights. This introduces several challenges that do not exist in traditional software:

• It complicates copyright.
• It is harder to fairly compensate the owners of the data.
• It is harder to know who trained each part of a model when training is done by more than one party, e.g. Model Zoo.
• It is easier to introduce biases into models.

Source: Mohamed Baioumy & Alex Cheema (AI x Crypto Primer)

Problem Solution

Make the training process itself verifiable. Build tools that break down how a model was trained and check whether it contains a given piece of data. Several approaches can be explored. One is to integrate cryptographic primitives into the training process itself. For example, the PyTorch NFT Callback hashes the current network weights, some metadata (data, accuracy, etc.) and your Ethereum address every N epochs, which proves who trained the model.

Note: This approach adds performance overhead to model training.
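The hashing step described above can be sketched in plain Python. This is a minimal illustration, not the actual code of the PyTorch NFT Callback: the function names (`hash_weights`, `checkpoint_record`, `verify_chain`, `run_demo`), the record layout, the `prev_hash` chaining, and the placeholder address are all assumptions made for the example. Weights are modelled as a dict of float lists so the sketch runs with the standard library only; in PyTorch they would come from `model.state_dict()`. Note that including an address in a hash only commits to it. Binding the record to the holder of that address would additionally need a signature with the matching private key, which is omitted here.

```python
import hashlib
import json
import struct


def hash_weights(weights):
    """SHA-256 over parameter names and values, visited in sorted-name order."""
    h = hashlib.sha256()
    for name in sorted(weights):
        name_bytes = name.encode("utf-8")
        values = weights[name]
        # Length prefixes keep the name/value boundaries unambiguous.
        h.update(struct.pack("<I", len(name_bytes)))
        h.update(name_bytes)
        h.update(struct.pack("<I", len(values)))
        h.update(struct.pack(f"<{len(values)}d", *values))
    return h.hexdigest()


def _digest(body):
    """Hash a record body using a canonical (sorted-key) JSON encoding."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def checkpoint_record(weights, metadata, trainer_address, epoch, prev_hash):
    """Commit to the weights, metadata and trainer address at one checkpoint."""
    record = {
        "epoch": epoch,
        "weights_hash": hash_weights(weights),
        "metadata": metadata,
        "trainer_address": trainer_address,
        "prev_hash": prev_hash,  # links this record to the previous one
    }
    record["record_hash"] = _digest(record)
    return record


def verify_chain(records):
    """Recompute every record hash and check the links between records."""
    prev = ""  # the first record is expected to carry an empty prev_hash
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != prev or _digest(body) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True


def run_demo(total_epochs=6, every_n=2):
    """Simulated training loop that emits a record every `every_n` epochs."""
    # Hypothetical placeholder address, not a real account.
    address = "0x" + "00" * 20
    weights = {"layer1.weight": [0.1, -0.2], "layer1.bias": [0.0]}
    records, prev = [], ""
    for epoch in range(1, total_epochs + 1):
        # Stand-in for a real optimizer step.
        weights = {k: [v - 0.01 for v in vs] for k, vs in weights.items()}
        if epoch % every_n == 0:
            metadata = {"dataset": "demo-v1", "accuracy": 0.5 + 0.05 * epoch}
            rec = checkpoint_record(weights, metadata, address, epoch, prev)
            records.append(rec)
            prev = rec["record_hash"]
    return records
```

Calling `verify_chain(run_demo())` recomputes each hash from the stored fields, so editing any recorded field afterwards (for example the accuracy) makes verification fail. Hashing every parameter at each checkpoint is also where the performance overhead mentioned in the note comes from.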

Inspiration

Source: Mohamed Baioumy & Alex Cheema (AI x Crypto Primer). Full credit goes to these two legends.

Verifiable Training

Idea Provided by

Harry Zhang

Ecosystem: All

Track: AI

Creation Time: Jun 18, 2024

Open to Team up: No

Telegram: harryzhang18