Modalities got latent

Video

Description

This approach, processing data in a compressed latent space instead of its original form, not only makes the model easier to train but also speeds up evaluation.
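A rough way to see where the speed-up can come from: once inputs have been compressed into latents (assumed here to be computed once and cached), every downstream layer works on the smaller vector. The sketch below compares the multiply-accumulate (MAC) cost of a single dense layer applied to a flattened image versus to its latent. All sizes (`ORIGINAL_DIM`, `LATENT_DIM`, `HIDDEN_DIM`) are hypothetical and are not taken from the project.

```python
# Hypothetical sizes for illustration only; the project's real input and
# latent dimensions are not stated in this write-up.
ORIGINAL_DIM = 64 * 64 * 3  # a flattened 64x64 RGB image (assumed)
LATENT_DIM = 256            # size of the compressed latent vector (assumed)
HIDDEN_DIM = 512            # width of the first downstream layer (assumed)


def dense_macs(d_in: int, d_out: int) -> int:
    """Multiply-accumulate operations for one dense layer on one sample."""
    return d_in * d_out


def first_layer_savings(original_dim: int, latent_dim: int, hidden_dim: int) -> float:
    """How many times cheaper the first downstream layer is on latents."""
    return dense_macs(original_dim, hidden_dim) / dense_macs(latent_dim, hidden_dim)


if __name__ == "__main__":
    on_pixels = dense_macs(ORIGINAL_DIM, HIDDEN_DIM)
    on_latents = dense_macs(LATENT_DIM, HIDDEN_DIM)
    print(f"first layer on pixels:  {on_pixels:,} MACs")
    print(f"first layer on latents: {on_latents:,} MACs")
    print(f"ratio: {first_layer_savings(ORIGINAL_DIM, LATENT_DIM, HIDDEN_DIM):.0f}x")
```

With these assumed sizes the first layer is 48 times cheaper (12288 / 256 = 48); the saving scales with the compression ratio, and the encoder's one-time cost of producing the latents is not counted.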

With further improvement, this model could be useful in edge AI applications such as robotics. Given suitable hardware and a more diverse dataset, we believe this approach could compete with larger Vision-Language-Action (VLA) models.

The Visual Question Answering model we built with this approach is only a small demonstration of what latent processing can do.

Team IkAI members:
Srijito Ghosh (GitHub: https://www.github.com/Srijito354)
Muskan Kumari (GitHub: https://www.github.com/Muskan040399)

In this project, we built a Visual Question Answering (VQA) web app around a CLIP model implemented entirely from scratch and trained with the same technique of compressing inputs from their original space into a latent space. It was trained on the EasyVQA dataset (GitHub: https://github.com/vzhou842/easy-VQA.git).
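For readers unfamiliar with CLIP: the standard CLIP training objective pulls matching image and text embeddings together and pushes mismatched pairs apart, using a symmetric cross-entropy over a matrix of cosine similarities. The dependency-free sketch below shows that objective on toy two-dimensional vectors. It is not the team's code; the function names are hypothetical, and the temperature of 0.07 (the initial value used in the original CLIP paper) is an assumption about this project.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(images: List[Vector], texts: List[Vector],
                      temperature: float) -> List[List[float]]:
    """Row i, column j = cosine(image_i, text_j) / temperature."""
    images = [_normalize(v) for v in images]
    texts = [_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
            for img in images]


def _diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(images: List[Vector], texts: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: average of image->text and text->image terms."""
    logits = similarity_logits(images, texts, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(clip_loss(imgs, matched))  # close to 0: each image lines up with its text
    print(clip_loss(imgs, swapped))  # large: every pair is mismatched
```

In PyTorch, the same loss is commonly written as the average of `F.cross_entropy(logits, labels)` and `F.cross_entropy(logits.T, labels)` with `labels = torch.arange(n)`, which also lets the temperature be learned.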

Libraries and frameworks used: PyTorch, Streamlit

Hackathon Progress

We explored processing data in latent spaces, which lets smaller, faster models aim for performance comparable to that of larger models.

Tech Stack

Python

PyTorch

Streamlit

AI

GenAI

Computer vision

C++

Web2

Fundraising Status

NA
