
AutoCon

AutoCon, an AI-powered confidence analysis tool, evaluates facial expressions, vocal tone, and speech sentiment, giving users personalized insights and actionable tips for impactful communication.


Description

🔍 Project Title: AutoCon – AI-Driven Confidence Analysis for Impactful Communication

💡 Problem Statement

In a world where communication defines success—be it job interviews, public speaking, online education, or leadership—people often struggle with confidence, clarity, and emotional impact. Traditional soft-skill training methods are subjective, non-scalable, and lack real-time personalized feedback. There's a growing need for a data-driven, AI-powered solution that quantifies and improves communication confidence.


🚀 Solution Overview

AutoCon is an intelligent confidence analysis system that utilizes advanced AI to evaluate and enhance a user's communication skills. It provides deep insights into facial expressions, vocal features, and emotional tone, and delivers real-time, personalized feedback with recommendations powered by large language models.

AutoCon bridges computer vision, NLP, and audio signal processing to offer a holistic view of a user’s communication impact—all while ensuring scalability, performance, and usability.


🧠 Core Features & Workflow

  1. Facial Emotion Detection

    • Uses MTCNN for accurate face detection in video streams.

    • Extracts cropped faces frame-by-frame.

    • A CNN model pre-trained on the FER-2013 dataset classifies expressions across seven emotions (Happy, Sad, Angry, etc.).

    • Outputs emotion timelines to gauge emotional variance and presence.
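
A minimal sketch of this stage, assuming a Keras-format CNN trained on FER-2013 and saved as `fer_cnn.h5` (48×48 grayscale input); the file name, sampling interval, and label order are illustrative:

```python
# Sketch: per-frame face detection (MTCNN) + emotion classification (FER CNN).
# "fer_cnn.h5" is a placeholder for the pre-trained FER-2013 model file.
import cv2
import numpy as np
from mtcnn import MTCNN
from tensorflow.keras.models import load_model

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

detector = MTCNN()
fer_model = load_model("fer_cnn.h5")

def emotion_timeline(video_path, sample_every=10):
    """Return (frame_index, emotion) pairs sampled every `sample_every` frames."""
    cap = cv2.VideoCapture(video_path)
    timeline, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            for face in detector.detect_faces(rgb):
                x, y, w, h = face["box"]
                x, y = max(x, 0), max(y, 0)  # MTCNN boxes can spill off-frame
                crop = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
                crop = cv2.resize(crop, (48, 48)).astype("float32") / 255.0
                probs = fer_model.predict(crop[None, :, :, None], verbose=0)[0]
                timeline.append((idx, EMOTIONS[int(np.argmax(probs))]))
        idx += 1
    cap.release()
    return timeline
```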

  2. Speech Sentiment & Emotion Analysis

    • Extracts audio using FFmpeg.

    • Processes audio with Deepgram API to generate real-time transcripts.

    • Applies VADER sentiment analysis to the transcript for textual sentiment classification (positive, neutral, negative).

    • Correlates audio tone with spoken content to detect emotional dissonance.
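
A minimal sketch of the extract → transcribe → classify pipeline, calling Deepgram's v1 REST endpoint directly; the ±0.05 cutoffs on the compound score are VADER's conventional thresholds:

```python
# Sketch: FFmpeg audio extraction -> Deepgram transcript -> VADER sentiment.
import subprocess
import requests
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def extract_audio(video_path, wav_path="speech.wav"):
    # Mono 16 kHz WAV is a common choice for speech APIs.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )
    return wav_path

def transcribe(wav_path, api_key):
    with open(wav_path, "rb") as f:
        resp = requests.post(
            "https://api.deepgram.com/v1/listen",
            headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
            data=f,
        )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def classify_sentiment(transcript):
    compound = SentimentIntensityAnalyzer().polarity_scores(transcript)["compound"]
    if compound >= 0.05:
        return "positive", compound
    if compound <= -0.05:
        return "negative", compound
    return "neutral", compound
```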

  3. Audio Feature Analysis with Librosa

    • Extracts pitch, energy, speech rate, and spectral features.

    • Analyzes clarity, fluency, and vocal modulation.

    • Detects filler words, stammering, or low-energy tones to give vocal feedback.
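
A minimal sketch of the Librosa feature pass; which statistics feed the final score, and what counts as a "low-energy" delivery, are illustrative assumptions:

```python
# Sketch: extract pitch, energy, and spectral statistics for vocal feedback.
import librosa
import numpy as np

def vocal_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    # pYIN pitch tracking; f0 is NaN on unvoiced frames, hence the nan-stats.
    f0, voiced_flag, voiced_probs = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6")
    )
    rms = librosa.feature.rms(y=y)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    return {
        "duration_s": len(y) / sr,
        "pitch_mean_hz": float(np.nanmean(f0)),
        "pitch_std_hz": float(np.nanstd(f0)),    # proxy for vocal modulation
        "energy_mean": float(rms.mean()),        # low values suggest flat delivery
        "spectral_centroid": float(centroid.mean()),
    }
```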

  4. Posture & Gesture Recognition

    • Integration-ready with MoveNet Thunder to assess body language.

    • Posture symmetry and hand gesture energy contribute to the engagement score.
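
Since this module is integration-ready rather than fully wired in, here is a minimal sketch of how MoveNet Thunder (via TensorFlow Hub) could score posture symmetry; the shoulder-gap heuristic is an illustrative assumption:

```python
# Sketch: single-frame posture symmetry with MoveNet Thunder.
import tensorflow as tf
import tensorflow_hub as hub

movenet = hub.load("https://tfhub.dev/google/movenet/singlepose/thunder/4")

def posture_symmetry(frame_rgb):
    """frame_rgb: HxWx3 uint8 array. Returns |left-right shoulder height gap|."""
    img = tf.image.resize_with_pad(tf.expand_dims(frame_rgb, 0), 256, 256)
    img = tf.cast(img, tf.int32)  # Thunder expects 256x256 int32 input
    keypoints = movenet.signatures["serving_default"](img)["output_0"]
    kp = keypoints[0, 0].numpy()           # 17 keypoints as (y, x, score)
    l_shoulder, r_shoulder = kp[5], kp[6]  # COCO indices for the shoulders
    return abs(float(l_shoulder[0]) - float(r_shoulder[0]))
```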

  5. Insight & Recommendation Engine

    • All scores are fed into a Gemini-powered LLM that generates:

      • Personalized feedback.

      • Growth suggestions.

      • Weekly improvement plans based on emotion trends and vocal clarity.
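
A minimal sketch of the recommendation call using the google-generativeai package; the model name, prompt template, and score dictionary are illustrative:

```python
# Sketch: turn per-session metrics into coaching feedback via Gemini.
import google.generativeai as genai

genai.configure(api_key="GEMINI_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

def feedback(scores: dict) -> str:
    prompt = (
        "You are a communication coach. Given these per-session metrics, "
        "write personalized feedback, growth suggestions, and a weekly "
        "improvement plan.\n"
        f"Metrics: {scores}"
    )
    return model.generate_content(prompt).text
```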

  6. Scalable Backend & User Tracking

    • Built on MongoDB Atlas to handle analysis results and user metadata at scale.

    • Tracks user progress, computes monthly averages, and offers performance dashboards.
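
A minimal sketch of the monthly-average computation as a MongoDB aggregation via PyMongo; the connection URI, collection, and field names are illustrative:

```python
# Sketch: group a user's sessions by month and average the confidence score.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<pass>@cluster.mongodb.net")  # placeholder URI
sessions = client["autocon"]["sessions"]

def monthly_averages(user_id):
    return list(sessions.aggregate([
        {"$match": {"user_id": user_id}},
        {"$group": {
            "_id": {"$dateToString": {"format": "%Y-%m", "date": "$created_at"}},
            "avg_confidence": {"$avg": "$confidence_score"},
        }},
        {"$sort": {"_id": 1}},
    ]))
```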


🛠 Tech Stack

  • Frontend: HTML, TailwindCSS, JS

  • Backend: Python, Flask

  • AI/ML Models: Pre-trained CNN (FER), VADER, Librosa, Gemini

  • Computer Vision: MTCNN, OpenCV, MoveNet

  • Audio: Librosa, FFmpeg

  • Transcription: Deepgram

  • Database: MongoDB Atlas

  • Packaging: PyInstaller, Electron (for desktop app)

  • Deployment: Render / Heroku / local for demo


📊 Impact

  • Democratizes soft-skill improvement using AI.

  • Offers objective, quantifiable metrics to replace subjective feedback.

  • Highly relevant for students, professionals, educators, and coaches.

  • Can be deployed in corporate training, ed-tech, job platforms, and therapy tools.


📈 Scalability & Innovation

  • Modular design with plug-and-play analysis pipelines.

  • Horizontally scalable with MongoDB Atlas and async backend support.

  • Real-time emotion fusion across modalities (face, voice, speech).

  • Pioneers holistic, AI-driven communication scoring in a lightweight, portable desktop package.


🏆 Why AutoCon Stands Out

  • Tackles the deeply human challenge of public speaking with all-round analysis and extensive AI insights into every aspect of delivery.

  • Combines CV, NLP, audio processing, and LLMs—all in one cohesive system.

  • Designed with future-readiness and extensibility in mind (mobile, real-time, cloud).

  • Built not only to detect, but to empower users with tools to grow.

Hackathon Progress

🚀 Progress During Hackathon

  1. Ideation & Planning
    a. Brainstormed a unique idea focused on analyzing human confidence through multimodal inputs (facial emotion, voice tone, posture, etc.).
    b. Defined key problem statements and how our solution addresses them.
  2. Tech Stack Finalization
    a. Selected powerful tools and libraries like FER, MTCNN, VADER, Librosa, and Deepgram.
    b. Chose Python and Flask for the backend, and MongoDB Atlas for scalable data storage.
  3. Facial Emotion Detection Module
    a. Implemented MTCNN for face detection.
    b. Integrated a CNN model pre-trained on FER-2013 to detect seven core emotions.
  4. Speech Sentiment Analysis
    a. Used FFmpeg to extract audio from video input.
    b. Applied VADER sentiment analysis on transcripts generated via the Deepgram API.
  5. Audio Feature Extraction
    a. Leveraged Librosa to extract audio features like pitch, clarity, and tempo for scoring speech confidence.
  6. Posture & Gesture Analysis
    a. Integrated MoveNet for pose estimation to track posture and gestures.
    b. Evaluated posture consistency during speech delivery.
  7. Scoring & Confidence Evaluation
    a. Designed a scoring system combining multiple factors: emotion, sentiment, clarity, posture, and eye contact (see the sketch after this list).
    b. Generated feedback based on computed metrics.
  8. AI-Powered Recommendations
    a. Used the Google Gemini API to provide personalized feedback and actionable tips for improving presentation skills.
  9. Frontend & Dashboard
    a. Developed an intuitive frontend using HTML, TailwindCSS, and JS.
    b. Built a real-time dashboard to visualize metrics and track progress.
  10. User Management & Authentication
    a. Implemented a secure user registration and login system.
    b. Added authentication and authorization.
  11. Data Visualization
    a. Created visual charts to show month-over-month confidence growth.
    b. Summarized insights using MongoDB aggregations and charted them with Chart.js.
  12. Desktop App Packaging
    a. Planned packaging of the web app with PyInstaller and Electron for a cross-platform desktop experience.
  13. Testing & Debugging
    a. Thoroughly tested all modules individually and in integration.
    b. Performed real-time tests with sample users to refine performance.
  14. Deployment
    a. Can be deployed on Render/Heroku for live demos.
    b. Ensured smooth operation in a local environment for offline presentations.
  15. Final Touches & Documentation
    a. Wrote a detailed README with setup instructions and tech insights.
    b. Prepared demo videos and presentation slides for submission.
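
As referenced in step 7, here is a minimal sketch of what the combined scoring could look like; the weights and the assumption that each component arrives pre-normalized to [0, 1] are illustrative:

```python
# Sketch: weighted fusion of per-modality metrics into one confidence score.
WEIGHTS = {
    "emotion": 0.25,
    "sentiment": 0.20,
    "clarity": 0.25,
    "posture": 0.15,
    "eye_contact": 0.15,
}

def confidence_score(components: dict) -> float:
    """components: metric name -> value normalized to [0, 1]. Returns 0-100."""
    return round(100 * sum(WEIGHTS[k] * components[k] for k in WEIGHTS), 1)

# Example: confidence_score({"emotion": 0.8, "sentiment": 0.7, "clarity": 0.9,
#                            "posture": 0.6, "eye_contact": 0.5}) -> 73.0
```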

Tech Stack

Python
FER Model with MTCNN
Deepgram
Gemini
MongoDB
Librosa, FFmpeg, and VADER Sentiment
MoveNet Thunder model for gesture recognition
ReportLab
Team Lead: Samannay Saha
Open Source
Sector
AI