MediCare is an AI-powered disease prediction system that allows users to input symptoms and get potential diagnoses along with detailed disease information. It also suggests nearby hospitals.
This project implements a Machine Learning-based Disease Prediction System designed to predict probable diseases based on a patient's symptoms. The core model is an optimized XGBoost Classifier (XGBClassifier), fine-tuned using RandomizedSearchCV for enhanced accuracy and performance.
Language: Python
Libraries: pandas, numpy, scikit-learn, xgboost, joblib, difflib, tabulate, scipy
Model Used: XGBClassifier (eXtreme Gradient Boosting)
Training Strategy:
Features selected using SelectKBest with mutual_info_classif.
Dataset split into training and testing sets using stratification.
Hyperparameters optimized via RandomizedSearchCV.
Final model saved as optimized_disease_predictor.pkl.
Interactive symptom input system with fuzzy matching (difflib) for typo correction.
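As an illustration of the typo correction, the sketch below uses difflib.get_close_matches from the standard library. The symptom vocabulary, the function name match_symptom, and the 0.6 similarity cutoff are assumptions made for this example; the real system would match against the symptom names the model was trained on.

```python
from difflib import get_close_matches

# Hypothetical vocabulary, for illustration only.
KNOWN_SYMPTOMS = ["headache", "fever", "nausea", "joint pain", "skin rash"]


def match_symptom(user_input, known_symptoms=KNOWN_SYMPTOMS, cutoff=0.6):
    """Return the closest known symptom, or None if nothing is similar enough."""
    cleaned = user_input.strip().lower()
    if cleaned in known_symptoms:
        return cleaned
    # n=1 keeps only the best candidate at or above the similarity cutoff.
    matches = get_close_matches(cleaned, known_symptoms, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["hedache", "  Fever ", "joint pian", "qqqq"]:
        print(repr(raw), "->", match_symptom(raw))
```

A higher cutoff rejects more near misses and a lower one accepts looser matches, so the value is worth tuning against real user input.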
Dynamic prediction output showing:
Primary predicted disease with confidence score.
Alternative likely diseases.
Symptom data encoded as binary features (1 = present, 0 = absent).
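The binary encoding can be illustrated with a small sketch. The function name encode_symptoms and the three-symptom feature list are hypothetical; in practice the feature list would be the model's selected symptom columns, in training order.

```python
def encode_symptoms(reported, feature_names):
    """Turn a list of reported symptoms into a 0/1 vector.

    The output follows the order of feature_names, which is assumed to hold
    lowercase symptom names; reported symptoms not in the list are ignored.
    """
    reported_set = {s.strip().lower() for s in reported}
    return [1 if name in reported_set else 0 for name in feature_names]


if __name__ == "__main__":
    features = ["cough", "fever", "headache"]  # hypothetical feature order
    print(encode_symptoms(["Fever", "cough"], features))
```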
Model trained on the 40 most informative symptoms, selected for relevance and generalization.
Predicts top three possible diseases with confidence percentages.
Offers clear suggestions and highlights the need for professional medical consultation.
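One way to produce the top-three output is to rank the class probabilities for a single patient, such as one row returned by a classifier's predict_proba. The sketch below is model-agnostic and its names (top_predictions, format_report) and the report wording are assumptions for illustration, not the project's actual output format.

```python
def top_predictions(probabilities, class_names, k=3):
    """Return the k most probable classes as (name, confidence %) pairs."""
    ranked = sorted(
        zip(class_names, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(name, round(float(p) * 100, 1)) for name, p in ranked[:k]]


def format_report(predictions):
    """Build a short text report: primary prediction first, then alternatives."""
    primary_name, primary_conf = predictions[0]
    lines = [f"Primary prediction: {primary_name} ({primary_conf}%)"]
    for name, conf in predictions[1:]:
        lines.append(f"Alternative: {name} ({conf}%)")
    lines.append("Please consult a medical professional for a diagnosis.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical probabilities and disease names for one patient.
    probs = [0.10, 0.60, 0.05, 0.25]
    names = ["Disease A", "Disease B", "Disease C", "Disease D"]
    print(format_report(top_predictions(probs, names)))
```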
💡 Project Name: MediCare
🩺 Tagline / Slogan: Predict Prevent Protect
✅ Current Progress:
Model Development:
Symptom-based disease prediction system built.
Trained an XGBoost Classifier on real-world medical symptom datasets.
Feature selection using Mutual Information to improve prediction accuracy.
Model Accuracy:
Achieved predictive accuracy in the high-90% range, depending on dataset split.
Functionality:
Predicts the top 3 probable diseases based on user-reported symptoms.
Includes both interactive input and predefined input support.
Intelligent fuzzy symptom matching (helps with typos or approximate inputs).
Tech Stack:
Python, Pandas, Scikit-learn, XGBoost, Joblib.
Difflib for close symptom matching.
Tabulate for clean terminal output.
Deployment Preparation:
Model is saved and can be integrated into web / desktop apps.
CLI prototype works for demonstration.
⚡ Next Steps:
🌐 Web / App Integration (suggested: Streamlit / Flask for web app).
🧠 Expand dataset for more diverse symptoms and conditions.
📊 Add more context-aware suggestions (e.g. severity ranking).