RagFin AI delivers personalized financial insights by integrating real-time regulatory notifications and market data, helping freelancers and small businesses optimize taxes and investments.
RagFin AI: AI-Powered Personal Finance & Tax Advisor
PROBLEM STATEMENT
Many freelancers, gig workers, and small business owners struggle with managing their finances due to fragmented tools, manual data entry, and generic advice that fails to address their specific needs. They face challenges in keeping up with rapidly changing tax regulations, tracking expenses across multiple accounts, and creating cohesive financial plans—resulting in wasted time, increased costs, and potential non-compliance.
SOLUTION
Our AI-powered Personal Finance & Tax Advisor integrates budgeting, tax filing, and investment planning into a single, seamless platform. Leveraging a RAG-based AI model, it automates data extraction, delivers personalized real-time financial insights, and adapts to regulatory changes, all while reducing manual effort. This solution not only enhances financial management for users but also creates scalable revenue streams through premium subscriptions, API licensing, and affiliate partnerships.
⚙ Key Features of Our Solution:
🤖 AI-Powered Financial Advisor – Personalized budgeting, tax tips & investment suggestions using a RAG-based model.
🧾 Automated Tax Filing – Real-time filing support with compliance updates.
📊 Smart Budgeting Dashboard – Income, expenses, and savings visualized in one place.
🔄 Tax Laws & Investment Updates – Provides users with the latest circulars and notifications regarding income tax laws and regulations.
🤝 Context-Based Tips – Upload a PDF of your financial details and get personalized advice.
Hackathon Progress Timeline (5th April 12:00 PM - 6th April 12:00 PM IST)

Apr 5, 1 PM:
- Initial Setup & RAG Core: Cloned the repository and set up the backend (Python/Flask) and frontend (Next.js) environments. Established a basic RAG pipeline connecting to the Pinecone vector DB and the Groq LLM API.
- Web Scraping Integration: Integrated Python scripts (Selenium/BeautifulSoup) to scrape recent RBI/Income Tax notifications, creating the initial data.json knowledge source (see the scraping sketch after this timeline).
- Initial Indexing: Indexed the scraped notification data (whole documents) into Pinecone using all-MiniLM-L6-v2 embeddings.
- Basic Backend API: Developed Flask endpoints (/api/query) to handle user queries, perform retrieval, prompt the LLM, and return answers.
- Frontend UI: Built the initial chat interface using Shadcn UI and Next.js.

Apr 5, Evening/Night:
- Debugging & Refinement (RAG v1): Tested the initial RAG pipeline. Retrieval relevance was poor due to whole-document indexing and the limitations of the initial embedding model, and LLM responses were generic and lacked specific context.
- Chunking Implementation: Refactored backend/index.py to chunk notification content (RecursiveCharacterTextSplitter, ~1000 characters per chunk).
- Metadata Storage: Modified indexing to store only chunk metadata (filename, URL, chunk index) in Pinecone, keeping the full chunk text in separate local storage to stay within Pinecone's metadata size limits (see the indexing sketch below).
- Context Fetching: Updated backend/app.py to retrieve chunk metadata from Pinecone and dynamically fetch the corresponding full chunk text from local storage (data.json loaded into memory) when building context (see the query-endpoint sketch below).

Apr 6, Morning:
- Model Upgrade & Re-indexing: Switched the embedding model from all-MiniLM-L6-v2 (384 dimensions) to all-roberta-large-v1 (1024 dimensions) for better semantic understanding. Created a new 1024-dimension Pinecone index (finance2) and re-indexed all chunked data with the new model.
- Improved Retrieval Testing: Tested specific queries targeting known notifications. Retrieval relevance improved significantly with the new model and chunking, and LLM answers became contextually grounded.
- Frontend-Backend Integration: Connected the Next.js frontend (chat-interface.tsx) to the Flask backend API (/api/query), replacing simulated responses with live API calls. Handled loading states and basic error display.
- Chat History & Session Management: Integrated MongoDB for persistent chat history. Added backend endpoints (/api/chats, /api/chat/<id>, POST /api/chats) and frontend logic (ChatHeader, state management) to save, list, and load chat sessions. Resolved session ID consistency issues (see the chat-history sketch below).

Apr 6, Late Morning:
- Premium Feature Prototype (Document Upload): Implemented a backend endpoint (/api/upload) to receive user-uploaded documents (PDF, XLSX, CSV, TXT), with file parsing and text extraction via PyMuPDF and pandas. Added temporary in-memory storage (session_document_store) of document text chunks keyed by session ID. Modified /api/query to run an on-the-fly similarity search over the document chunks and combine that context with the RAG context for the LLM. Updated the frontend to handle file upload API calls and display the active document context (see the upload and document-search sketches below).
- UI Refinement & Bug Fixing: Resolved UI glitches (duplicate buttons), ensured the file upload interaction works, added the assistant logo, and fixed path/import errors.
- Deployment Preparation: Added Gunicorn, created a Procfile, and configured the backend for the deployment environment. Addressed deployment memory issues on the Render free tier by optimizing model loading and reducing workers, though a final deployment may require a paid tier (see the lazy-loading sketch below).
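The timeline mentions scraping RBI/Income Tax notifications into data.json with Selenium/BeautifulSoup. Below is a minimal sketch of that step using requests + BeautifulSoup only; the listing URL, the link filter, and the JSON schema are illustrative assumptions, not the project's actual scraper.

```python
# Minimal scraping sketch: pull notification titles/links into data.json.
# The URL, selector, and output schema are placeholders; the real scraper
# reportedly uses Selenium for pages that need JavaScript rendering.
import json
import requests
from bs4 import BeautifulSoup

SOURCE_URL = "https://www.rbi.org.in/Scripts/NotificationUser.aspx"  # assumed listing page

def scrape_notifications(url: str = SOURCE_URL) -> list[dict]:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    notifications = []
    for link in soup.select("a"):                      # selector is a placeholder
        href, title = link.get("href", ""), link.get_text(strip=True)
        if title and "Notification" in title:          # crude filter for demo purposes
            notifications.append({"title": title, "url": href, "content": ""})
    return notifications

if __name__ == "__main__":
    data = scrape_notifications()
    with open("data.json", "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(f"Saved {len(data)} notifications to data.json")
```

Where the notification pages are rendered client-side, a Selenium-driven browser would replace the plain requests call.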
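A hedged sketch of the chunk-and-index step (backend/index.py): split each notification into ~1000-character pieces with RecursiveCharacterTextSplitter, embed with all-roberta-large-v1 (1024-dim), and upsert only lightweight metadata to the finance2 Pinecone index while the full chunk text stays local. Environment-variable names, the metadata field names, and the data.json layout are assumptions.

```python
# Chunk-and-index sketch: metadata-only vectors in Pinecone, full text local.
import json
import os

from langchain_text_splitters import RecursiveCharacterTextSplitter
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("finance2")                      # 1024-dim index named in the timeline
model = SentenceTransformer("sentence-transformers/all-roberta-large-v1")
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

with open("data.json", encoding="utf-8") as f:
    notifications = json.load(f)

vectors = []
for doc in notifications:
    chunks = splitter.split_text(doc["content"])
    embeddings = model.encode(chunks)
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
        vectors.append({
            "id": f'{doc["title"]}-{i}',
            "values": emb.tolist(),
            # Store only lightweight metadata; the full chunk text stays in
            # local storage to avoid Pinecone's per-vector metadata limits.
            "metadata": {"filename": doc["title"], "url": doc["url"], "chunk_index": i},
        })

# Upsert in small batches to stay within request size limits.
for start in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[start:start + 100])
```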
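The /api/query flow described above (embed the question, retrieve chunk metadata from Pinecone, fetch the full chunk text from memory, prompt the Groq LLM) could look roughly like the sketch below. The Groq model name, prompt wording, and chunk-store layout are assumptions.

```python
# Query-endpoint sketch: Pinecone retrieval + local chunk lookup + Groq LLM.
import json
import os

from flask import Flask, jsonify, request
from groq import Groq
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("finance2")
embedder = SentenceTransformer("sentence-transformers/all-roberta-large-v1")
llm = Groq(api_key=os.environ["GROQ_API_KEY"])

# Rebuild the (filename, chunk_index) -> full text map from data.json so that
# Pinecone only has to return lightweight metadata.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
CHUNK_STORE = {}
with open("data.json", encoding="utf-8") as f:
    for doc in json.load(f):
        for i, chunk in enumerate(splitter.split_text(doc["content"])):
            CHUNK_STORE[(doc["title"], i)] = chunk

@app.post("/api/query")
def query():
    question = (request.get_json(silent=True) or {}).get("query", "")
    vector = embedder.encode(question).tolist()
    matches = index.query(vector=vector, top_k=4, include_metadata=True).matches

    # Fetch the full chunk text for each retrieved metadata record.
    context = "\n\n".join(
        CHUNK_STORE.get((m.metadata["filename"], int(m.metadata["chunk_index"])), "")
        for m in matches
    )
    completion = llm.chat.completions.create(
        model="llama-3.1-8b-instant",   # placeholder Groq model name
        messages=[
            {"role": "system", "content": "Answer using only the provided RBI/Income Tax context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return jsonify({"answer": completion.choices[0].message.content})
```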
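A sketch of the /api/upload prototype: extract text from a PDF/XLSX/CSV/TXT upload with PyMuPDF or pandas, chunk it, and keep the chunks in an in-memory session_document_store keyed by session ID. The form-field names and chunking parameters are assumptions.

```python
# Upload sketch: parse a document and stash its chunks per session in memory.
import io

import fitz                      # PyMuPDF
import pandas as pd
from flask import Flask, jsonify, request
from langchain_text_splitters import RecursiveCharacterTextSplitter

app = Flask(__name__)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
session_document_store: dict[str, list[str]] = {}   # session_id -> document chunks

def extract_text(filename: str, data: bytes) -> str:
    name = filename.lower()
    if name.endswith(".pdf"):
        with fitz.open(stream=data, filetype="pdf") as doc:
            return "\n".join(page.get_text() for page in doc)
    if name.endswith(".csv"):
        return pd.read_csv(io.BytesIO(data)).to_string(index=False)
    if name.endswith(".xlsx"):
        return pd.read_excel(io.BytesIO(data)).to_string(index=False)  # needs openpyxl
    return data.decode("utf-8", errors="ignore")      # plain .txt fallback

@app.post("/api/upload")
def upload():
    session_id = request.form.get("session_id", "default")
    file = request.files["file"]
    text = extract_text(file.filename, file.read())
    session_document_store[session_id] = splitter.split_text(text)
    return jsonify({"chunks": len(session_document_store[session_id])})
```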
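To combine uploaded-document context with RAG context, /api/query reportedly runs an on-the-fly similarity search over the session's document chunks. A possible shape for that step, assuming the same embedding model and the session_document_store from the previous sketch:

```python
# Document-search sketch: rank the session's document chunks against the
# query by cosine similarity and return the best ones as extra context.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("sentence-transformers/all-roberta-large-v1")
session_document_store: dict[str, list[str]] = {}   # populated by /api/upload (see above)

def document_context(session_id: str, question: str, top_k: int = 3) -> str:
    chunks = session_document_store.get(session_id, [])
    if not chunks:
        return ""
    query_emb = embedder.encode(question, convert_to_tensor=True)
    chunk_embs = embedder.encode(chunks, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, chunk_embs)[0]          # shape: (len(chunks),)
    best = scores.topk(k=min(top_k, len(chunks))).indices.tolist()
    return "\n\n".join(chunks[i] for i in best)

# Inside /api/query, the combined prompt context would then be roughly:
#   context = document_context(session_id, question) + "\n\n" + rag_context
```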
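A sketch of the MongoDB-backed chat-history endpoints (/api/chats, POST /api/chats, /api/chat/&lt;id&gt;). The database and collection names and the document shape are assumptions; the real schema may differ.

```python
# Chat-history sketch: list, create, and load chat sessions from MongoDB.
import os
from datetime import datetime, timezone

from bson import ObjectId
from flask import Flask, jsonify, request
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient(os.environ.get("MONGODB_URI", "mongodb://localhost:27017"))["ragfin"]

@app.get("/api/chats")
def list_chats():
    # Return id + title for the chat list shown in ChatHeader.
    chats = db.chats.find({}, {"title": 1}).sort("updated_at", -1)
    return jsonify([{"id": str(c["_id"]), "title": c.get("title", "Untitled")} for c in chats])

@app.post("/api/chats")
def create_chat():
    payload = request.get_json(silent=True) or {}
    result = db.chats.insert_one({
        "title": payload.get("title", "New chat"),
        "messages": [],
        "updated_at": datetime.now(timezone.utc),
    })
    return jsonify({"id": str(result.inserted_id)}), 201

@app.get("/api/chat/<chat_id>")
def get_chat(chat_id):
    chat = db.chats.find_one({"_id": ObjectId(chat_id)})
    if not chat:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": chat_id, "title": chat.get("title"), "messages": chat.get("messages", [])})
```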
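One way to "optimize model loading" for a memory-constrained Render instance, as mentioned under Deployment Preparation, is to load the heavy embedding model lazily and run a single Gunicorn worker. This is a guess at the optimization, not a description of the actual fix.

```python
# Lazy-loading sketch: load the embedding model on first use, once per process,
# so a single Gunicorn worker does not pay the memory cost at import time.
from functools import lru_cache

from sentence_transformers import SentenceTransformer

@lru_cache(maxsize=1)
def get_embedder() -> SentenceTransformer:
    # Called from request handlers instead of instantiating at module import;
    # with one worker this keeps exactly one copy of the model in memory.
    return SentenceTransformer("sentence-transformers/all-roberta-large-v1")
```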
Fundraising Status & Needs

Current Status: Pre-seed / self-funded prototype. RagFin AI is currently operating without external funding; all development has been bootstrapped during this hackathon.

Funding Needs: We are seeking seed funding to accelerate development and bring RagFin AI to market. Funds are required for:
- Compute Resources (Cloud Hosting): GPU instances for efficient fine-tuning of a custom financial LLM (crucial for nuanced understanding and safety) and potentially for hosting the optimized model for low-latency inference; scalable backend hosting beyond free/starter tiers on platforms like Render, or deployment on AWS/GCP, to handle user load, document processing, and reliable API performance.
- Data Acquisition & Annotation: Licensing or sourcing high-quality, comprehensive financial data (Indian regulations, market data, anonymized case studies) beyond basic web scraping, and funding the manual review, annotation, and creation of a large-scale, high-quality dataset for fine-tuning the financial LLM (especially RAG Q&A and reasoning examples).
- Team Expansion: Hiring specialized engineers (ML/NLP for fine-tuning and RAG optimization, backend for scalability and security, frontend for richer UI/UX) and potentially financial domain experts for data validation.
- Third-Party APIs & Services: Potential costs for premium market data feeds, enhanced security scanning, or specialized financial APIs.
- Marketing & User Acquisition: Initial go-to-market activities upon product launch.

Goal: Secure funding to transition from a functional prototype to a robust, scalable, and highly intelligent financial assistant platform, starting with refining the core RAG and fine-tuning capabilities.