AutoCon, an AI-powered Confidence Analysis Tool, evaluates facial expressions, vocal tone, and speech sentiment, giving users personalized insights and actionable tips for impactful communication.
In a world where communication defines success—be it job interviews, public speaking, online education, or leadership—people often struggle with confidence, clarity, and emotional impact. Traditional soft-skill training methods are subjective, non-scalable, and lack real-time personalized feedback. There's a growing need for a data-driven, AI-powered solution that quantifies and improves communication confidence.
AutoCon is an intelligent confidence analysis system that utilizes advanced AI to evaluate and enhance a user's communication skills. It provides deep insights into facial expressions, vocal features, and emotional tone, and delivers real-time, personalized feedback with recommendations powered by large language models.
AutoCon bridges computer vision, NLP, and audio signal processing to offer a holistic view of a user’s communication impact—all while ensuring scalability, performance, and usability.
Facial Emotion Detection
Uses MTCNN for accurate face detection in video streams.
Extracts cropped faces frame-by-frame.
A pre-trained CNN model trained on the FER-2013 dataset classifies expressions across 7 emotions (Happy, Sad, Angry, etc.).
Outputs emotion timelines to gauge emotional variance and presence (see the sketch below).
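The face-to-emotion step can be illustrated roughly as follows. This is a minimal sketch rather than the production pipeline: it assumes a Keras FER model saved as `fer_model.h5` (hypothetical filename) with 48×48 grayscale input, and it uses the `mtcnn` and `opencv-python` packages.

```python
# Minimal sketch: per-frame face detection (MTCNN) + FER classification.
# "fer_model.h5" and the label order below are assumptions; match them to your model.
import cv2
import numpy as np
from mtcnn import MTCNN
from tensorflow.keras.models import load_model

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

detector = MTCNN()
fer_model = load_model("fer_model.h5")  # assumed 48x48 grayscale input

def emotion_timeline(video_path, frame_step=15):
    """Return (frame_index, emotion) pairs sampled every `frame_step` frames."""
    timeline, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            for face in detector.detect_faces(rgb):
                x, y, w, h = face["box"]
                crop = cv2.cvtColor(rgb[y:y + h, x:x + w], cv2.COLOR_RGB2GRAY)
                crop = cv2.resize(crop, (48, 48)).astype("float32") / 255.0
                probs = fer_model.predict(crop[None, :, :, None], verbose=0)[0]
                timeline.append((idx, EMOTIONS[int(np.argmax(probs))]))
        idx += 1
    cap.release()
    return timeline
```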
Speech Sentiment & Emotion Analysis
Extracts audio using FFmpeg.
Processes audio with Deepgram API to generate real-time transcripts.
Applies VADER sentiment analysis to the transcript for textual sentiment classification (positive, neutral, negative).
Correlates audio tone with spoken content to detect emotional dissonance (the pipeline is sketched below).
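In outline, the speech pipeline looks like the sketch below. It is illustrative rather than the exact implementation: the Deepgram call goes through its REST endpoint with an assumed `DEEPGRAM_API_KEY` environment variable, and the response parsing follows Deepgram's standard pre-recorded JSON shape.

```python
# Sketch of the audio -> transcript -> sentiment flow (illustrative, not the exact code).
import os
import subprocess
import requests
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def extract_audio(video_path, wav_path="speech.wav"):
    # FFmpeg: drop video, resample to 16 kHz mono WAV for speech processing.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1",
                    "-ar", "16000", wav_path], check=True)
    return wav_path

def transcribe(wav_path):
    # Deepgram pre-recorded transcription via the REST endpoint.
    with open(wav_path, "rb") as f:
        resp = requests.post(
            "https://api.deepgram.com/v1/listen",
            headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
                     "Content-Type": "audio/wav"},
            data=f,
        )
    resp.raise_for_status()
    return resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]

def sentiment(transcript):
    # VADER compound score: > 0.05 positive, < -0.05 negative, else neutral.
    scores = SentimentIntensityAnalyzer().polarity_scores(transcript)
    label = ("positive" if scores["compound"] > 0.05
             else "negative" if scores["compound"] < -0.05 else "neutral")
    return label, scores
```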
Audio Feature Analysis with Librosa
Extracts pitch, energy, speech rate, and spectral features.
Analyzes clarity, fluency, and vocal modulation.
Detects filler words, stammering, or low-energy tones to give vocal feedback (see the feature sketch below).
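A minimal Librosa sketch of the kind of vocal features involved (pitch, energy, tempo, spectral shape); the thresholds and exact feature set used in AutoCon may differ.

```python
# Illustrative vocal-feature extraction with Librosa (not the exact feature set).
import librosa
import numpy as np

def vocal_features(wav_path):
    y, sr = librosa.load(wav_path, sr=16000)

    # Pitch contour via pYIN; NaNs mark unvoiced frames.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)

    # Frame-level energy and spectral brightness.
    rms = librosa.feature.rms(y=y)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

    # Rough speaking-pace proxy from onset-based tempo estimation.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    return {
        "pitch_mean_hz": float(np.nanmean(f0)),
        "pitch_variation_hz": float(np.nanstd(f0)),  # vocal-modulation proxy
        "energy_mean": float(rms.mean()),
        # Fraction of frames well below average energy (possible low-energy delivery).
        "low_energy_ratio": float((rms < 0.3 * rms.mean()).mean()),
        "spectral_centroid_mean": float(centroid.mean()),
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),
    }
```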
Posture & Gesture Recognition
Integration-ready with MoveNet Thunder to assess body language.
Posture symmetry and hand gesture energy contribute to the engagement score (a rough integration sketch follows).
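Since the posture module is described as integration-ready rather than fully wired in, the sketch below only shows how MoveNet Thunder from TensorFlow Hub could plug in; the shoulder-symmetry heuristic is an illustrative assumption, not the scoring actually used.

```python
# Illustrative MoveNet Thunder integration point (not final code).
import tensorflow as tf
import tensorflow_hub as hub

movenet = hub.load("https://tfhub.dev/google/movenet/singlepose/thunder/4")
infer = movenet.signatures["serving_default"]

def keypoints(frame_rgb):
    """frame_rgb: HxWx3 uint8 image -> 17 keypoints as (y, x, score) in [0, 1]."""
    inp = tf.image.resize_with_pad(tf.expand_dims(frame_rgb, 0), 256, 256)
    inp = tf.cast(inp, tf.int32)  # Thunder expects int32 input
    return infer(inp)["output_0"].numpy()[0, 0]  # shape (17, 3)

def shoulder_symmetry(kps):
    # Simple posture cue: vertical offset between left (5) and right (6) shoulders.
    left_y, right_y = kps[5][0], kps[6][0]
    return 1.0 - min(abs(left_y - right_y) / 0.1, 1.0)  # 1.0 = level shoulders
```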
Insight & Recommendation Engine
All scores are fed into a Gemini-powered LLM that generates:
Personalized feedback.
Growth suggestions.
Weekly improvement plans based on emotion trends and vocal clarity (see the sketch after this list).
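The recommendation step boils down to turning the computed scores into a prompt for Gemini, roughly as below. The model name and prompt wording are placeholders; the `google-generativeai` SDK call itself is the standard one.

```python
# Sketch: feed the multimodal scores to Gemini for personalized feedback.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")      # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")   # model name is an assumption

def recommendations(scores: dict) -> str:
    prompt = (
        "You are a communication coach. Given these presentation metrics, "
        "give personalized feedback, growth suggestions, and a weekly improvement plan.\n"
        f"Metrics: {scores}"
    )
    return model.generate_content(prompt).text
```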
Scalable Backend & User Tracking
Built on MongoDB Atlas to handle analysis results and user metadata at scale.
Tracks user progress, computes monthly averages, and offers performance dashboards (aggregation sketch below).
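Monthly progress tracking can be expressed as a straightforward MongoDB aggregation. The collection and field names below (`analyses`, `user_id`, `created_at`, `confidence_score`) are assumptions for illustration.

```python
# Sketch: monthly average confidence per user via a MongoDB aggregation.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<cluster-uri>")  # Atlas connection string placeholder
analyses = client["autocon"]["analyses"]             # assumed database/collection names

def monthly_averages(user_id):
    pipeline = [
        {"$match": {"user_id": user_id}},
        {"$group": {
            "_id": {"year": {"$year": "$created_at"}, "month": {"$month": "$created_at"}},
            "avg_confidence": {"$avg": "$confidence_score"},
            "sessions": {"$sum": 1},
        }},
        {"$sort": {"_id.year": 1, "_id.month": 1}},
    ]
    return list(analyses.aggregate(pipeline))
```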
| Domain | Technologies Used |
|---|---|
| Frontend | HTML, TailwindCSS, JS |
| Backend | Python, Flask |
| AI/ML Models | Pre-trained CNN (FER), VADER, Librosa, Gemini |
| Computer Vision | MTCNN, OpenCV, MoveNet |
| Audio | Librosa, FFmpeg |
| Transcription | Deepgram |
| Database | MongoDB Atlas |
| Packaging | PyInstaller, Electron (for desktop app) |
| Deployment | Render / Heroku / Local for demo |
Democratizes soft-skill improvement using AI.
Offers objective, quantifiable metrics to replace subjective feedback.
Highly relevant for students, professionals, educators, and coaches.
Can be deployed in corporate training, ed-tech, job platforms, and therapy tools.
Modular design with plug-and-play analysis pipelines.
Horizontally scalable with MongoDB Atlas and async backend support.
Real-time emotion fusion across modalities (face, voice, speech).
Pioneers holistic, AI-driven communication scoring in a lightweight, portable desktop package.
Tackles the deeply human problem of public speaking with all-round analysis and extensive AI insights into every aspect of delivery.
Combines CV, NLP, audio processing, and LLMs—all in one cohesive system.
Designed with future-readiness and extensibility in mind (mobile, real-time, cloud).
Built to not only detect—but empower users with tools to grow.
🚀 Progress During Hackathon

1. Ideation & Planning
   a. Brainstormed a unique idea focused on analysing human confidence through multimodal inputs (facial emotion, voice tone, posture, etc.).
   b. Defined key problem statements and how our solution addresses them.
2. Tech Stack Finalization
   a. Selected powerful tools and libraries like FER, MTCNN, VADER, Librosa, and Deepgram.
   b. Chose Python and Flask for the backend, MongoDB Atlas for scalable data storage.
3. Facial Emotion Detection Module
   a. Implemented MTCNN for face detection.
   b. Integrated a pre-trained CNN model trained on FER-2013 to detect seven core emotions.
4. Speech Sentiment Analysis
   a. Used FFmpeg to extract audio from video input.
   b. Applied VADER sentiment analysis on transcripts generated via the Deepgram API.
5. Audio Feature Extraction
   a. Leveraged Librosa to extract audio features like pitch, clarity, and tempo for scoring speech confidence.
6. Posture & Gesture Analysis
   a. Integrated MoveNet for pose estimation to track posture and gestures.
   b. Evaluated posture consistency during speech delivery.
7. Scoring & Confidence Evaluation
   a. Designed a scoring system combining multiple factors: emotion, sentiment, clarity, posture, and eye contact (a sketch of the weighting follows this list).
   b. Generated feedback based on computed metrics.
8. AI-Powered Recommendations
   a. Used the Google Gemini API to provide personalized feedback and actionable tips for improving presentation skills.
9. Frontend & Dashboard
   a. Developed an intuitive frontend using HTML, TailwindCSS, and JS.
   b. Built a real-time dashboard to visualize metrics and track progress.
10. User Management & Authentication System
    a. Implemented a secure user registration and login system.
    b. Added authentication and authorisation.
11. Data Visualization
    a. Created visual charts to show month-wise confidence growth.
    b. Summarized insights using MongoDB aggregations and charted them with Chart.js.
12. Desktop App Packaging
    a. The web app can be packaged with PyInstaller and Electron for a cross-platform desktop experience.
13. Testing & Debugging
    a. Thoroughly tested all modules individually and in integration.
    b. Performed real-time tests with sample users to refine performance.
14. Deployment
    a. Can be deployed on Render/Heroku for live demo purposes.
    b. Ensured smooth operation in the local environment for offline presentations.
15. Final Touches & Documentation
    a. Wrote a detailed README with setup instructions and tech insights.
    b. Prepared demo videos and presentation slides for submission.
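As a rough illustration of the scoring in step 7: the weights and normalisation below are placeholders for explanation, not the tuned values used in the app.

```python
# Illustrative composite confidence score (weights are placeholders, not tuned values).
def confidence_score(emotion, sentiment, clarity, posture, eye_contact):
    """Each input is a sub-score already normalised to [0, 1]."""
    weights = {
        "emotion": 0.25,
        "sentiment": 0.15,
        "clarity": 0.25,
        "posture": 0.20,
        "eye_contact": 0.15,
    }
    parts = {"emotion": emotion, "sentiment": sentiment, "clarity": clarity,
             "posture": posture, "eye_contact": eye_contact}
    return round(100 * sum(weights[k] * parts[k] for k in weights), 1)

# Example: confidence_score(0.8, 0.6, 0.7, 0.9, 0.5) -> overall score out of 100.
```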