Modalities got latent

Description

This approach not only makes the model easier to train, but also speeds up evaluation.

With further improvement, this model could be useful in edge AI applications such as robotics. Given suitable hardware and a more diverse dataset, the approach could plausibly compete with larger Vision-Language-Action models.

The Visual Question Answering model that we built with this approach is only a demonstration of what latent processing can do.

Team IkAI members:

Srijito Ghosh (GitHub: https://www.github.com/Srijito354)

Muskan Kumari (GitHub: https://www.github.com/Muskan040399)

In this project we built a Visual Question Answering (VQA) web app around a CLIP model written entirely from scratch. The model was trained with the same original-to-latent-space compression technique mentioned earlier, on the EasyVQA dataset (GitHub: https://github.com/vzhou842/easy-VQA.git).
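To make the idea concrete, here is a minimal PyTorch sketch of a CLIP-style dual encoder that answers questions in a shared latent space. It is an illustration under stated assumptions, not the team's actual model: the class name `LatentVQA`, the layer sizes, the 64x64 input resolution, the vocabulary size, and the answer count of 13 are all placeholders. The compression technique itself is not detailed here, so a small convolutional encoder stands in for it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentVQA(nn.Module):
    """Hypothetical sketch: image and question are mapped to one latent space,
    fused there, and classified into a fixed set of answers."""

    def __init__(self, vocab_size=32, num_answers=13, latent_dim=64):
        super().__init__()
        # Image branch: compress a 3x64x64 image into a latent_dim vector.
        # (Stand-in for the project's compression step, which is assumed.)
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # average over the spatial grid
            nn.Flatten(),             # (B, 32, 1, 1) -> (B, 32)
            nn.Linear(32, latent_dim),
        )
        # Text branch: embed question tokens, then average them.
        self.token_embedding = nn.Embedding(vocab_size, latent_dim)
        # Answer head operating on the fused latent vector.
        self.classifier = nn.Linear(latent_dim, num_answers)

    def forward(self, images, question_tokens):
        # L2-normalise both latents, as CLIP does before comparing them.
        img_z = F.normalize(self.image_encoder(images), dim=-1)
        txt_z = F.normalize(self.token_embedding(question_tokens).mean(dim=1), dim=-1)
        fused = img_z * txt_z  # element-wise fusion in latent space
        return self.classifier(fused)


model = LatentVQA()
images = torch.randn(4, 3, 64, 64)          # random stand-in for a batch of images
questions = torch.randint(0, 32, (4, 8))    # random stand-in for tokenised questions
logits = model(images, questions)
print(logits.shape)
```

All downstream computation happens on 64-dimensional vectors rather than on raw pixels, which is the kind of saving that latent processing aims for. A real pipeline would replace the random tensors with EasyVQA images and tokenised questions and train the model with a cross-entropy loss over the answer set.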

Libraries and frameworks used: PyTorch, Streamlit

Hackathon progress

Discovered that processing in latent space allows smaller, faster models to operate at a scale similar to larger models.

Tech stack

Python

PyTorch

Streamlit

AI

GenAI

Computer vision

C++

Web2

Fundraising status