PORTFOLIO

Chirag Jain
Hi! 👋 I am Chirag
A Data Science enthusiast focusing on AI research and development, and a final-year student at Vellore Institute of Technology, Amaravati, Andhra Pradesh, India.
👨‍💻 Technical & Scientific Interests: Research, Deep Learning, Machine Learning, Python Development, Data Analysis, Database Management, Astronomy
Experience
Indian National Science Academy (INSA) Summer Research Fellow
May 2025 – Jul. 2025
Technologies: PyTorch, Computer Vision
Supervisor: Dr. Ramesh Venkadamchalam (Department of Mathematics, Central University of Tamil Nadu)
- Conducted research on AI-based disaster damage assessment using the xView2 dataset comprising 20,000 pre/post-disaster image pairs, focusing on building-level classification to support Humanitarian Assistance and Disaster Relief (HADR) efforts.
- Designed a scalable preprocessing pipeline that extracted building patches, applied augmentation, and implemented majority-class undersampling, resulting in a 94.4% dataset expansion (from 304,370 to 591,583 image patches) and improved class balance for model training.
- Developed and optimized a lightweight CNN with residual connections (3.7M parameters, 28.71 MB) achieving 83.3% accuracy on 128x128 inputs, closely matching ResNet18's 83.5% (7M parameters, 63.33 MB) with over 54.66% reduction in model size, enabling faster inference for field deployment.
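The two percentages quoted above follow directly from the reported counts and file sizes. A minimal arithmetic check, where the helper name `percent_change` is an illustrative choice and not part of the original project:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Figures taken from the bullets above.
patch_growth = percent_change(304_370, 591_583)   # image patches before/after
size_reduction = -percent_change(63.33, 28.71)    # ResNet18 MB vs. lightweight CNN MB

print(f"Patch count growth: {patch_growth:.1f}%")
print(f"Model size reduction: {size_reduction:.2f}%")
```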
Education
Vellore Institute of Technology - Amaravati, Andhra Pradesh
2022-2026
B.Tech in Computer Science and Engineering (Core)
Current CGPA: 9.12
Chennai Public School - Chennai, Tamil Nadu
2020-2022
Central Board of Secondary Education (CBSE)
Senior Secondary
Percentage: 92.6%
Chennai Public School - Chennai, Tamil Nadu
2018-2020
Central Board of Secondary Education (CBSE)
Secondary
Percentage: 95.6%
Certifications
- AWS Solutions Architect Associate (SAA-C03)
- IBM Professional Data Science Certification
- R Programming Certification
- The Complete Python Developer Course - ZerotoMastery Academy
- Data Analyst with R Certification
- What is Data Science
- Tools for Data Science
- Data Science Methodology
- Python for Data Science, AI and Development
- Python Project for Data Science
- Database and SQL for Data Science with Python
- Data Analysis with Python
- Data Visualisation with Python
- Machine Learning with Python
- Generative AI: Elevate Your Data Science Career
- Data Scientist Career Guide and Interview Preparation
- ISRO IIRS DLP - Exploring Earth's Moon through Chandrayaan
- ISRO IIRS DLP - Aditya L1: India's first space based observatory
- ISRO IIRS - Space Science and Technology Awareness Training (START)
Professional Certifications
Fundamental Certifications
Data Science and Analysis Certifications
Astronomy Certifications
Projects
Interactive Portfolio Analytics & Risk Assessment Dashboard
- Architected a quantitative finance platform to analyse 10 blue-chip equities, implementing a suite of risk metrics including Sharpe Ratio optimization, 95% Value-at-Risk, correlation matrices, max drawdown, and beta coefficients.
- Deployed a mobile-responsive analytics dashboard using a containerized architecture (Render.com) and Plotly/Dash, featuring real-time data visualization and an automated processing pipeline for 500 trading days of market data via yfinance API.
- Implemented an advanced statistical modelling system featuring dynamic risk-free rate integration (10Y Treasury), Monte Carlo simulation for risk analysis, and performance attribution across technology, financial, and healthcare sectors.
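A few of the risk metrics listed above can be sketched with the standard library alone. This is a simplified illustration and not the dashboard's actual code: the function names, the 252-day annualisation factor, the 4% risk-free rate, and the sample prices are all assumptions, and the VaR shown is the plain historical (empirical quantile) variant.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualisation factor

def daily_returns(prices):
    """Simple day-over-day returns from a list of closing prices."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]

def sharpe_ratio(returns, annual_risk_free=0.04):
    """Annualised Sharpe ratio; the risk-free rate is an assumed placeholder."""
    daily_rf = annual_risk_free / TRADING_DAYS
    excess = [r - daily_rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)

def historical_var(returns, confidence=0.95):
    """Historical VaR: the loss at the (1 - confidence) empirical quantile, as a positive number."""
    ordered = sorted(returns)
    index = int((1.0 - confidence) * len(ordered))
    return -ordered[index]

def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1.0)
    return worst

prices = [100.0, 102.0, 101.0, 105.0, 98.0, 99.0, 104.0]  # made-up closing prices
returns = daily_returns(prices)
print(f"Sharpe ratio: {sharpe_ratio(returns):.2f}")
print(f"95% VaR:      {historical_var(returns):.2%}")
print(f"Max drawdown: {max_drawdown(prices):.2%}")
```

In practice the price series would come from a market data source such as yfinance, and the risk-free rate from the 10Y Treasury yield mentioned above.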
Comparative Analysis of DL Models for Brain MRI Tumour Detection
- Conducted a comparative analysis of Deep Learning models (VGG16, VGG19, Xception, Simple CNN, EfficientNet-Attn), standardizing training conditions (same optimizer, scheduler, and number of epochs) so that performance could be compared fairly.
- Processed ~12,000 brain MRI scans from the MRI ND-5 dataset (sourced from IEEE Dataport) through transformation pipelines to train deep learning models and generate comparative performance visualizations.
- EfficientNet-Attn achieved the highest performance, with 99.82% accuracy on the external dataset and 97.45% on the internal dataset, validated using the Nemenyi test for statistical significance and Cohen’s d for effect size.
- Research selected for presentation at an upcoming IEEE international conference, with subsequent publication slated for a Scopus-indexed journal.
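Cohen's d, mentioned above, measures how far apart two sets of scores are in units of their pooled standard deviation. A minimal sketch, assuming the compared samples are per-run accuracy lists for two models; the function name and the sample values are illustrative placeholders, not results from the study:

```python
import math
import statistics

def cohens_d(sample_a, sample_b):
    """Cohen's d using the pooled (sample) standard deviation of both groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd

# Placeholder scores for two hypothetical models (not data from the project).
model_a_scores = [2.0, 4.0, 6.0]
model_b_scores = [1.0, 3.0, 5.0]
print(f"Cohen's d: {cohens_d(model_a_scores, model_b_scores):.2f}")
```

By the usual rule of thumb, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects.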
Stacked Ensemble Learning Model to Classify Potentially Hazardous Near-Earth Asteroids
- Developed a novel stacked ensemble model to classify Near-Earth asteroids as Potentially Hazardous Asteroids (PHAs) using physical and orbital attributes, achieving a recall of 99.29% and accuracy of 99.53%, critical for asteroid impact analysis.
- The dataset was acquired from NASA’s Jet Propulsion Laboratory Solar System Dynamics open datasets and consists of approximately 1.3 million records, which were pre-processed before model building.
- Built a stacked ensemble with Random Forest and XGBoost as base models and Logistic Regression as the meta-model, optimized using GridSearchCV, RFECV, and 15-fold cross-validation.
- Demonstrated the stacked model’s superior recall performance compared to individual base and meta models, underscoring its robustness in asteroid classification; results are currently under review for journal publication.
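The stacking arrangement described above can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under several assumptions, not the project's code: the data is synthetic, `GradientBoostingClassifier` stands in for XGBoost to keep the example to a single dependency, 5-fold stacking replaces the 15-fold setup, and the GridSearchCV and RFECV steps are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the asteroid data (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base models feed out-of-fold predictions to the logistic regression meta-model.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),  # stand-in for XGBoost
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
recall = recall_score(y_test, pred)
accuracy = accuracy_score(y_test, pred)
print(f"Recall: {recall:.3f}  Accuracy: {accuracy:.3f}")
```

Recall is reported alongside accuracy because, for a hazard classifier, missing a true positive is the costlier error.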
Terms and Conditions Summariser
- Designed and implemented a responsive SaaS Terms & Conditions Summarizer with a dual-mode architecture: AI-powered local analysis using a pre-trained NLP model (Legal Pegasus) and the KeyBERT library for development, plus optimized rule-based NLP for public production deployment.
- Built an intelligent document processing pipeline that extracts text from multiple file formats (.pdf, .docx, .txt), performs automated section identification using legal keyword recognition, and applies either neural summarization (AI mode) or rule-based sentence scoring (Lite mode). Enhanced with threading for concurrent processing, professional PDF generation with color-coded formatting, and real-time user feedback collection stored in JSON format.
- Developed a modern full-stack web application using HTML5, CSS3, JavaScript, and a Flask backend, with production deployment on the Railway platform featuring an optimized build pipeline and health monitoring.
- Key Features: Dual-mode processing (AI/Lite), custom summary length controls (50-350 words), multi-format file support with drag-and-drop upload, mobile-responsive interface, structured PDF output with professional formatting, real-time notifications, user feedback system, and production-ready deployment with 99% uptime.
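The rule-based sentence scoring used in Lite mode can be illustrated with a small keyword-count sketch. The keyword set, function names, and scoring rule here are assumptions made for illustration; the project's actual rules are not described in this page.

```python
import re

# Hypothetical legal keyword list (assumption, not the project's list).
LEGAL_KEYWORDS = {
    "liability", "terminate", "termination", "privacy",
    "data", "refund", "arbitration", "warranty",
}

def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentence(sentence):
    """Score a sentence by how many of its words are legal keywords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(1 for word in words if word in LEGAL_KEYWORDS)

def summarise(text, max_words=50):
    """Pick the highest-scoring sentences within a word budget, kept in original order."""
    sentences = split_sentences(text)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i]),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length > max_words:
            continue  # skip sentences that would exceed the budget
        chosen.append(i)
        used += length
    return " ".join(sentences[i] for i in sorted(chosen))

sample = (
    "Welcome to our service. "
    "We may terminate your account and limit liability. "
    "Enjoy your stay."
)
print(summarise(sample, max_words=10))
```

The `max_words` argument plays the role of the summary length control (50-350 words) listed among the key features.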
Early Prediction of Chronic Kidney Disease using Machine Learning
- Designed and implemented a predictive machine learning model that analysed medical records to identify chronic kidney disease, achieving an accuracy of 93.33% and a recall score of 94.44%.
- Four models were considered: Random Forest, Decision Tree, Logistic Regression, and XGBoost. Of the four, XGBoost gave the best overall performance.
- Trained on a CKD dataset of 400 records acquired from the UC Irvine Machine Learning Repository, plus 200 synthetic records generated using the Copulas library.
- Deployed the model locally via Flask with a user-friendly web interface scalable for public deployment using PythonAnywhere, enhancing accessibility and potential for broader user engagement.
Chymes: A Spotify Playlist Curator
- Designed and developed a playlist curator in Python that creates a Spotify playlist based on real-time weather conditions.
- The application uses the OpenWeatherMap API and the Spotify API to gather information and generate a playlist of 30 songs.
- Deployed the web page using Flask, with a mobile application planned for release on the Play Store. The web application is currently in beta testing with 5+ users.
X (formerly Twitter) Sentiment Analysis: COVID-19 Tweets
- Built a sentiment analysis model using Python that categorised 2021 COVID-19 pandemic tweets into positive, negative, and neutral sentiments.
- The datasets were acquired from Kaggle and merged to form a single dataset with 200,000+ records. Transfer learning was implemented on a pre-trained model, VADER, built in Python by C.J. Hutto.
- The tuned model achieved an accuracy of 88%, and the results were visualised on a daily and monthly basis in a window created using the Tkinter library.
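The three-way labelling and daily aggregation described above can be sketched as follows. VADER returns a compound score in [-1, 1], and its documentation suggests ±0.05 as the conventional cut-offs; the function names and the sample records here are assumptions, and the project's tuned thresholds may differ.

```python
from collections import Counter

def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to positive / negative / neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def daily_counts(scored_tweets):
    """Count sentiment labels per day from (date_string, compound_score) pairs."""
    counts = {}
    for day, compound in scored_tweets:
        counts.setdefault(day, Counter())[label_from_compound(compound)] += 1
    return counts

# Made-up scores; in practice each would come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"] (vaderSentiment package).
scored = [("2021-05-01", 0.6), ("2021-05-01", -0.3), ("2021-05-02", 0.0)]
counts = daily_counts(scored)
for day in sorted(counts):
    print(day, dict(counts[day]))
```

Monthly totals follow the same pattern by grouping on the first seven characters of the date string.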
Contact & Socials
Gmail: chiragajay.jain@gmail.com
LinkedIn: linkedin.com/in/chiragajain
GitHub: github.com/Chirag-65-Jain
Kaggle: kaggle.com/chiragajain
© Copyright 2025 | Made by Chirag Jain