AutoCon, an AI-powered Confidence Analysis Tool, evaluates facial expressions, vocal tone, and speech sentiment, giving users personalized insights and actionable tips for impactful communication.
In a world where communication defines success—be it job interviews, public speaking, online education, or leadership—people often struggle with confidence, clarity, and emotional impact. Traditional soft-skill training methods are subjective, non-scalable, and lack real-time personalized feedback. There's a growing need for a data-driven, AI-powered solution that quantifies and improves communication confidence.
AutoCon is an intelligent confidence analysis system that utilizes advanced AI to evaluate and enhance a user's communication skills. It provides deep insights into facial expressions, vocal features, and emotional tone, and delivers real-time, personalized feedback with recommendations powered by large language models.
AutoCon bridges computer vision, NLP, and audio signal processing to offer a holistic view of a user’s communication impact—all while ensuring scalability, performance, and usability.
Facial Emotion Detection
Uses MTCNN for accurate face detection in video streams.
Extracts cropped faces frame-by-frame.
A pre-trained CNN model trained on the FER-2013 dataset classifies expressions across 7 emotions (Happy, Sad, Angry, etc.).
Outputs emotion timelines to gauge emotional variance and presence (see the sketch below).
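The face-to-emotion step can be illustrated roughly as follows. This is a minimal sketch rather than the production pipeline: it assumes a Keras FER model saved as `fer_model.h5` (hypothetical filename) with 48×48 grayscale input, and it uses the `mtcnn` and `opencv-python` packages.

```python
# Minimal sketch: per-frame face detection (MTCNN) + FER classification.
# "fer_model.h5" and the label order below are assumptions; match them to your model.
import cv2
import numpy as np
from mtcnn import MTCNN
from tensorflow.keras.models import load_model

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

detector = MTCNN()
fer_model = load_model("fer_model.h5")  # assumed 48x48 grayscale input

def emotion_timeline(video_path, frame_step=15):
    """Return (frame_index, emotion) pairs sampled every `frame_step` frames."""
    timeline, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            for face in detector.detect_faces(rgb):
                x, y, w, h = face["box"]
                crop = cv2.cvtColor(rgb[y:y + h, x:x + w], cv2.COLOR_RGB2GRAY)
                crop = cv2.resize(crop, (48, 48)).astype("float32") / 255.0
                probs = fer_model.predict(crop[None, :, :, None], verbose=0)[0]
                timeline.append((idx, EMOTIONS[int(np.argmax(probs))]))
        idx += 1
    cap.release()
    return timeline
```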
Speech Sentiment & Emotion Analysis
Extracts audio using FFmpeg.
Processes audio with Deepgram API to generate real-time transcripts.
Applies VADER sentiment analysis to the transcript for textual sentiment classification (positive, neutral, negative).
Correlates audio tone with spoken content to detect emotional dissonance (the pipeline is sketched below).
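In outline, the speech pipeline looks like the sketch below. It is illustrative rather than the exact implementation: the Deepgram call goes through its REST endpoint with an assumed `DEEPGRAM_API_KEY` environment variable, and the response parsing follows Deepgram's standard pre-recorded JSON shape.

```python
# Sketch of the audio -> transcript -> sentiment flow (illustrative, not the exact code).
import os
import subprocess
import requests
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def extract_audio(video_path, wav_path="speech.wav"):
    # FFmpeg: drop video, resample to 16 kHz mono WAV for speech processing.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1",
                    "-ar", "16000", wav_path], check=True)
    return wav_path

def transcribe(wav_path):
    # Deepgram pre-recorded transcription via the REST endpoint.
    with open(wav_path, "rb") as f:
        resp = requests.post(
            "https://api.deepgram.com/v1/listen",
            headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
                     "Content-Type": "audio/wav"},
            data=f,
        )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def sentiment(transcript):
    # VADER compound score: > 0.05 positive, < -0.05 negative, else neutral.
    scores = SentimentIntensityAnalyzer().polarity_scores(transcript)
    label = ("positive" if scores["compound"] > 0.05
             else "negative" if scores["compound"] < -0.05 else "neutral")
    return label, scores
```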
Audio Feature Analysis with Librosa
Extracts pitch, energy, speech rate, and spectral features.
Analyzes clarity, fluency, and vocal modulation.
Detects filler words, stammering, or low-energy tones to give vocal feedback (see the feature sketch below).
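A minimal Librosa sketch of the kind of vocal features involved (pitch, energy, tempo, spectral shape); the thresholds and exact feature set used in AutoCon may differ.

```python
# Illustrative vocal-feature extraction with Librosa (not the exact feature set).
import librosa
import numpy as np

def vocal_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)

    # Pitch contour via pYIN; NaNs mark unvoiced frames.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)

    # Frame-level energy and spectral brightness.
    rms = librosa.feature.rms(y=y)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

    # Rough speaking-pace proxy from onset-based tempo estimation.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    return {
        "pitch_mean_hz": float(np.nanmean(f0)),
        "pitch_variation_hz": float(np.nanstd(f0)),  # vocal-modulation proxy
        "energy_mean": float(rms.mean()),
        # Fraction of frames well below average energy (possible low-energy delivery).
        "low_energy_ratio": float((rms < 0.3 * rms.mean()).mean()),
        "spectral_centroid_mean": float(centroid.mean()),
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),
    }
```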
Posture & Gesture Recognition
Integration-ready with MoveNet Thunder to assess body language.
Posture symmetry and hand gesture energy contribute to the engagement score (a rough integration sketch follows).
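Since the posture module is described as integration-ready rather than fully wired in, the sketch below only shows how MoveNet Thunder from TensorFlow Hub could plug in; the shoulder-symmetry heuristic is an illustrative assumption, not the scoring actually used.

```python
# Illustrative MoveNet Thunder integration point (not final code).
import tensorflow as tf
import tensorflow_hub as hub

movenet = hub.load("https://tfhub.dev/google/movenet/singlepose/thunder/4")
infer = movenet.signatures["serving_default"]

def keypoints(frame_rgb):
    """frame_rgb: HxWx3 uint8 image -> 17 keypoints as (y, x, score) in [0, 1]."""
    inp = tf.image.resize_with_pad(tf.expand_dims(frame_rgb, 0), 256, 256)
    inp = tf.cast(inp, tf.int32)  # Thunder expects int32 input
    return infer(inp)["output_0"].numpy()[0, 0]  # shape (17, 3)

def shoulder_symmetry(kps):
    # Simple posture cue: vertical offset between left (5) and right (6) shoulders.
    left_y, right_y = kps[5][0], kps[6][0]
    return 1.0 - min(abs(left_y - right_y) / 0.1, 1.0)  # 1.0 = level shoulders
```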
Insight & Recommendation Engine
All scores are fed into a Gemini-powered LLM that generates:
Personalized feedback.
Growth suggestions.
Weekly improvement plans based on emotion trends and vocal clarity (see the sketch after this list).
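The recommendation step boils down to turning the computed scores into a prompt for Gemini, roughly as below. The model name and prompt wording are placeholders; the `google-generativeai` SDK call itself is the standard one.

```python
# Sketch: feed the multimodal scores to Gemini for personalized feedback.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")      # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")   # model name is an assumption

def recommendations(scores: dict) -> str:
    prompt = (
        "You are a communication coach. Given these presentation metrics, "
        "give personalized feedback, growth suggestions, and a weekly improvement plan.\n"
        f"Metrics: {scores}"
    )
    return model.generate_content(prompt).text
```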
Scalable Backend & User Tracking
Built on MongoDB Atlas to handle analysis results and user metadata at scale.
Tracks user progress, computes monthly averages, and offers performance dashboards (aggregation sketch below).
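Monthly progress tracking can be expressed as a straightforward MongoDB aggregation. The collection and field names below (`analyses`, `user_id`, `created_at`, `confidence_score`) are assumptions for illustration.

```python
# Sketch: monthly average confidence per user via a MongoDB aggregation.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<cluster-uri>")  # Atlas connection string placeholder
analyses = client["autocon"]["analyses"]             # assumed database/collection names

def monthly_averages(user_id):
    pipeline = [
        {"$match": {"user_id": user_id}},
        {"$group": {
            "_id": {"year": {"$year": "$created_at"}, "month": {"$month": "$created_at"}},
            "avg_confidence": {"$avg": "$confidence_score"},
            "sessions": {"$sum": 1},
        }},
        {"$sort": {"_id.year": 1, "_id.month": 1}},
    ]
    return list(analyses.aggregate(pipeline))
```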
| Domain | Technologies Used |
|---|---|
| Frontend | HTML, TailwindCSS, JS |
| Backend | Python, Flask |
| AI/ML Models | Pre-trained CNN (FER), VADER, Librosa, Gemini |
| Computer Vision | MTCNN, OpenCV, MoveNet |
| Audio | Librosa, FFmpeg |
| Transcription | Deepgram |
| Database | MongoDB Atlas |
| Packaging | PyInstaller, Electron (for desktop app) |
| Deployment | Render / Heroku / Local for demo |
Democratizes soft-skill improvement using AI.
Offers objective, quantifiable metrics to replace subjective feedback.
Highly relevant for students, professionals, educators, and coaches.
Can be deployed in corporate training, ed-tech, job platforms, and therapy tools.
Modular design with plug-and-play analysis pipelines.
Horizontally scalable with MongoDB Atlas and async backend support.
Real-time emotion fusion across modalities (face, voice, speech).
Pioneers holistic, AI-driven communication scoring in a lightweight, portable desktop package.
Tackles the deeply human problem of public speaking with all-round analysis and extensive AI insights into every aspect of delivery.
Combines CV, NLP, audio processing, and LLMs—all in one cohesive system.
Designed with future-readiness and extensibility in mind (mobile, real-time, cloud).
Built to not only detect—but empower users with tools to grow.
🚀 Progress During Hackathon

1. Ideation & Planning
   a. Brainstormed a unique idea focused on analysing human confidence through multimodal inputs (facial emotion, voice tone, posture, etc.).
   b. Defined key problem statements and how our solution addresses them.
2. Tech Stack Finalization
   a. Selected powerful tools and libraries like FER, MTCNN, VADER, Librosa, and Deepgram.
   b. Chose Python and Flask for the backend, MongoDB Atlas for scalable data storage.
3. Facial Emotion Detection Module
   a. Implemented MTCNN for face detection.
   b. Integrated a pre-trained CNN model trained on FER-2013 to detect seven core emotions.
4. Speech Sentiment Analysis
   a. Used FFmpeg to extract audio from video input.
   b. Applied VADER sentiment analysis on transcripts generated via the Deepgram API.
5. Audio Feature Extraction
   a. Leveraged Librosa to extract audio features like pitch, clarity, and tempo for scoring speech confidence.
6. Posture & Gesture Analysis
   a. Integrated MoveNet for pose estimation to track posture and gestures.
   b. Evaluated posture consistency during speech delivery.
7. Scoring & Confidence Evaluation
   a. Designed a scoring system combining multiple factors: emotion, sentiment, clarity, posture, and eye contact (a sketch of the weighting follows this list).
   b. Generated feedback based on computed metrics.
8. AI-Powered Recommendations
   a. Used the Google Gemini API to provide personalized feedback and actionable tips for improving presentation skills.
9. Frontend & Dashboard
   a. Developed an intuitive frontend using HTML, TailwindCSS, and JS.
   b. Built a real-time dashboard to visualize metrics and track progress.
10. User Management & Authentication System
    a. Implemented a secure user registration and login system.
    b. Added authentication and authorisation.
11. Data Visualization
    a. Created visual charts to show month-wise confidence growth.
    b. Summarized insights using MongoDB aggregations and charted them with Chart.js.
12. Desktop App Packaging
    a. The web app can be packaged with PyInstaller and Electron for a cross-platform desktop experience.
13. Testing & Debugging
    a. Thoroughly tested all modules individually and in integration.
    b. Performed real-time tests with sample users to refine performance.
14. Deployment
    a. Can be deployed on Render/Heroku for live demo purposes.
    b. Ensured smooth operation in the local environment for offline presentations.
15. Final Touches & Documentation
    a. Wrote a detailed README with setup instructions and tech insights.
    b. Prepared demo videos and presentation slides for submission.
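As a rough illustration of the scoring in step 7: the weights and normalisation below are placeholders for explanation, not the tuned values used in the app.

```python
# Illustrative composite confidence score (weights are placeholders, not tuned values).
def confidence_score(emotion, sentiment, clarity, posture, eye_contact):
    """Each input is a sub-score already normalised to [0, 1]."""
    weights = {
        "emotion": 0.25,
        "sentiment": 0.15,
        "clarity": 0.25,
        "posture": 0.20,
        "eye_contact": 0.15,
    }
    parts = {"emotion": emotion, "sentiment": sentiment, "clarity": clarity,
             "posture": posture, "eye_contact": eye_contact}
    return round(100 * sum(weights[k] * parts[k] for k in weights), 1)

# Example: confidence_score(0.8, 0.6, 0.7, 0.9, 0.5) -> overall score out of 100.
```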