Modalities got latent

We present a novel approach to multi-modal problems that we call "latent processing". The approach was heavily inspired by the Multi-head Latent Attention paper.
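
To make the idea of working in a compressed space concrete, here is a minimal PyTorch sketch of a low-rank bottleneck: features are projected down to a small latent vector, processed there, and projected back up. This is only loosely analogous to the low-rank compression used in Multi-head Latent Attention and is not the team's actual implementation. The class name `LatentBottleneck`, the helper `count_params`, and the sizes 512 and 64 are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class LatentBottleneck(nn.Module):
    """Hypothetical block: compress to a small latent space, process, expand."""

    def __init__(self, dim: int = 512, latent_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, latent_dim, bias=False)  # compress to latent
        self.mix = nn.Linear(latent_dim, latent_dim)         # work done in latent space
        self.up = nn.Linear(latent_dim, dim, bias=False)     # expand back to original size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.down(x)
        z = torch.relu(self.mix(z))
        return self.up(z)


def count_params(module: nn.Module) -> int:
    """Total number of parameters in a module."""
    return sum(p.numel() for p in module.parameters())


torch.manual_seed(0)
block = LatentBottleneck()       # illustrative sizes, not taken from the project
dense = nn.Linear(512, 512)      # a full-width layer for comparison
x = torch.randn(4, 512)          # stand-in batch of 4 feature vectors
out = block(x)

print(out.shape)
print(count_params(block), count_params(dense))
```

With these illustrative sizes the bottleneck has 69,696 parameters against 262,656 for a single full-width linear layer, which is the kind of saving that can make training and evaluation cheaper.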

Description

This approach not only makes the model easier to train but also speeds up evaluation.

With further improvement, this model could be useful in edge AI applications such as robotics. Given suitable hardware and a more diverse dataset, the approach could plausibly compete in the same league as Vision-Language-Action models.

The Visual Question Answering model that we built with this approach is only a demonstration of what latent processing can do.

Team IkAI members:
Srijito Ghosh (GitHub: https://www.github.com/Srijito354)
Muskan Kumari (GitHub: https://www.github.com/Muskan040399)

In this project we built a Visual Question Answering (VQA) web app around a CLIP model written entirely from scratch and trained with the same original-to-latent-space compression technique described above. It was trained on the EasyVQA dataset (GitHub: https://github.com/vzhou842/easy-VQA.git).
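
As a rough illustration of how a VQA model might answer questions from compressed features, the sketch below projects image and question features into one shared latent space, normalizes them in the CLIP style, fuses them, and classifies over a fixed answer set. It is a sketch under stated assumptions, not the project's code: the class name `LatentVQAHead`, the feature sizes, the element-wise fusion, and the answer count of 10 are all hypothetical, and random tensors stand in for real encoder outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentVQAHead(nn.Module):
    """Hypothetical VQA head operating on a shared low-dimensional latent space."""

    def __init__(self, image_dim=512, text_dim=256, latent_dim=64, num_answers=10):
        super().__init__()
        self.image_proj = nn.Linear(image_dim, latent_dim)   # image features -> latent
        self.text_proj = nn.Linear(text_dim, latent_dim)     # question features -> latent
        self.classifier = nn.Linear(latent_dim, num_answers)

    def forward(self, image_feats, text_feats):
        # L2-normalize both embeddings, as CLIP does before comparing modalities
        zi = F.normalize(self.image_proj(image_feats), dim=-1)
        zt = F.normalize(self.text_proj(text_feats), dim=-1)
        fused = zi * zt  # element-wise fusion (an assumption, one of several options)
        return self.classifier(fused)


torch.manual_seed(0)
model = LatentVQAHead()
image_feats = torch.randn(8, 512)  # placeholder for image encoder output
text_feats = torch.randn(8, 256)   # placeholder for question encoder output
logits = model(image_feats, text_feats)
predicted = logits.argmax(dim=-1)  # index of the highest-scoring answer per example

print(logits.shape, predicted.shape)
```

In a Streamlit app, a head like this would sit behind the image upload and question text box, with `predicted` mapped back to an answer string.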

Libraries and frameworks used: PyTorch, Streamlit

Hackathon progress

Found that processing in a latent space gives smaller, faster models that operate at a scale comparable to larger models.

Tech stack

Python
PyTorch
Streamlit
AI
GenAI
Computer vision
C++
Web2

Fundraising status

NA

Team leader: Srijito Ghosh
Sector: AI, Other