PORTFOLIO

Chirag Jain

Hi! 👋 I am Chirag

I build software to solve problems I have personally encountered. My focus is on leveraging AI and SaaS technologies to develop applications that are accessible, insightful, and designed to make a tangible, positive impact on people's lives.

👨‍💻Technical & Scientific Interests: Research, Deep Learning, Machine Learning, Python Software Development, Data Science, Database Management, Astronomy

Work Experience

Apollo Tyres LTD Digital Innovation Hub

Data Science Intern

Jan. 2026 – Present

Location: Hyderabad - Hybrid

Starting January 5, 2026 - Updates coming soon

Weekly Progress Blogs (Coming Soon)

Research Experience

Indian National Science Academy (INSA) Summer Research Fellow

May 2025 – Jul. 2025

Technologies: PyTorch, Computer Vision

Supervisor: Dr. Ramesh Venkadamchalam (Department of Mathematics, Central University of Tamil Nadu)

Conducted research on AI-based disaster damage assessment using the xView2 dataset comprising 20,000 pre/post-disaster image pairs, focusing on building-level classification to support Humanitarian Assistance and Disaster Relief (HADR) efforts.
Designed a scalable preprocessing pipeline that extracted building patches, applied augmentation, and implemented majority-class undersampling, resulting in a 94.4% dataset expansion (from 304,370 to 591,583 image patches) and improved class balance for model training.
Developed and optimized a lightweight CNN with residual connections (3.7M parameters, 28.71 MB) that achieved 83.3% accuracy on 128x128 inputs, closely matching ResNet18's 83.5% (7M parameters, 63.33 MB) while reducing model size by 54.66%, enabling faster inference for field deployment.
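The majority-class undersampling step mentioned above can be illustrated with a minimal sketch. This is not the project's actual pipeline: the helper name `undersample_majority`, the per-class cap, and the toy labels are assumptions made purely for illustration.

```python
import random
from collections import Counter, defaultdict


def undersample_majority(samples, cap, seed=0):
    """Randomly keep at most `cap` samples per class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))

    balanced = []
    for label, group in by_class.items():
        # Only classes larger than the cap are reduced; smaller ones are kept whole.
        if len(group) > cap:
            group = rng.sample(group, cap)
        balanced.extend(group)
    return balanced


# Toy data (made up): 90 patches of a majority class and 10 of a minority class.
samples = [(f"patch_{i}", "no-damage") for i in range(90)]
samples += [(f"patch_{i}", "destroyed") for i in range(90, 100)]

balanced = undersample_majority(samples, cap=20)
counts = Counter(label for _, label in balanced)
print(counts)
```

Capping only the oversized classes keeps every minority example, which is usually the point when the rare classes are the ones that matter most.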

GitHub Repository

Comparative Analysis of DL Models for Brain MRI Tumour Detection

Publication

Technologies: PyTorch, Computer Vision

Supervisor: Dr. Deepasikha Mishra (School of Computer Science and Engineering, VIT-AP University)

Conducted a comparative analysis of deep learning models (VGG16, VGG19, Xception, Simple CNN, EfficientNet-Attn) under standardized training conditions (same optimizer, scheduler, and number of epochs) so that their performance could be compared fairly.
Processed ~12,000 brain MRI scans from the MRI ND-5 dataset (sourced from IEEE Dataport) through transformation pipelines to train deep learning models and generate comparative performance visualizations.
EfficientNet achieved the highest performance with 99.82% accuracy on the external dataset and 97.45% on the internal dataset, validated using the Nemenyi post-hoc test and Cohen's d effect size.
Research selected for presentation at an IEEE international conference, with subsequent publication slated for the Scopus-indexed IEEE Digital Library.
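Cohen's d, used in the validation above, measures how far apart two sets of scores are in units of their pooled standard deviation. The sketch below shows the standard pooled-variance formula. The per-fold accuracies are made-up numbers for two hypothetical models, not results from this study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (sample variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Made-up per-fold accuracies for two hypothetical models.
model_a = [0.98, 0.97, 0.99, 0.98, 0.97]
model_b = [0.95, 0.96, 0.94, 0.95, 0.96]

d = cohens_d(model_a, model_b)
print(f"Cohen's d = {d:.2f}")
```

A positive value means the first group scored higher on average; by a common rule of thumb, values around 0.8 or more are read as a large effect.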

GitHub Repository Publication Link

Education

Vellore Institute of Technology - Amaravati, Andhra Pradesh

2022-2026

B.Tech in Computer Science and Engineering (Core)

Current CGPA: 9.12

Chennai Public School - Chennai, Tamil Nadu

2020-2022

Central Board of Secondary Education (CBSE)

Senior Secondary

Percentage: 92.6%

Chennai Public School - Chennai, Tamil Nadu

2018-2020

Central Board of Secondary Education (CBSE)

Secondary

Percentage: 95.6%

Certifications

Professional Certifications

Fundamental Certifications

Data Science and Analysis Certifications

Astronomy Certifications

Projects

KOSH: Open Government Data MCP Server

University Capstone Project - Group

Developed Kosh, a conversational interface for India’s Open Government Data that allows users to query complex public datasets and generate instant, dynamic visualizations using natural language, reducing time-to-insight from minutes to seconds.
Engineered a specialized Model Context Protocol (MCP) server using Python and FastMCP to wrap 25+ government APIs, solving the N × M integration bottleneck and creating a standardized interoperability layer for AI agents.
Built a robust full-stack solution leveraging Google Gemini 2.5 Pro for advanced reasoning, Node.js/Express for the agent backend, and React.js for the frontend, featuring real-time response streaming and automated chart rendering.
Contributed towards the development of Kosh React UI with a custom in-chat visualisation feature and engineered specialised backend tools to integrate 8+ government APIs, enabling the system to fetch, filter, and render complex public datasets dynamically.
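The N × M bottleneck mentioned above can be sketched in plain Python: when every client talks to every API directly, the number of adapters grows as N × M, whereas a shared tool interface needs only N + M. The registry below is a simplified stand-in and is not the FastMCP API. The tool name `rainfall_by_state` and its return value are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical in-process registry standing in for a protocol server.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(name):
    """Register a function under a uniform tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("rainfall_by_state")  # hypothetical tool name
def rainfall_by_state(state: str) -> dict:
    # Placeholder for a real government API call.
    return {"dataset": "rainfall", "state": state}


def call_tool(name, **kwargs):
    """Single entry point every client uses, whatever API sits behind it."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


def adapters_needed(n_clients, m_apis):
    """Compare point-to-point integrations with a shared interface."""
    return {
        "point_to_point": n_clients * m_apis,
        "shared_protocol": n_clients + m_apis,
    }


print(call_tool("rainfall_by_state", state="Kerala"))
print(adapters_needed(3, 25))
```

With three clients and 25 wrapped APIs, the shared interface cuts 75 bespoke integrations down to 28 components.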

GitHub Repository

GenAI Legal Assistant (Recently Revamped)

Designed and developed a responsive, full-stack SaaS application to analyze and summarize complex legal documents, serving as a "GenAI Legal Assistant."
Built an intelligent document processing pipeline that ingests multiple file formats (.pdf, .docx, .txt), extracts text, and performs automated section identification using legal keyword recognition.
Engineered a resilient, dual-mode architecture for analysis, featuring a primary mode powered by the Gemini API and a fallback "Lite" mode using a fine-tuned Legal Pegasus model and KeyBERT for continued operation.
Automated the end-to-end software delivery lifecycle by establishing a CI/CD pipeline, deploying a scalable and monitored solution to a production environment on the Railway platform.
Delivered a rich user experience with features including drag-and-drop file uploads, custom summary length controls, and the ability to export the structured analysis as a professionally formatted PDF.
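The dual-mode design described above follows a common try-then-fall-back pattern, sketched here under stated assumptions. The functions `remote_summarizer` and `local_summarizer` are hypothetical stand-ins for the hosted and local models; they do not call any real API.

```python
def analyze_with_fallback(text, primary, fallback):
    """Try the primary analyzer; on any failure, use the fallback instead."""
    try:
        return {"mode": "primary", "summary": primary(text)}
    except Exception:
        # Covers network errors, quota limits, or any other primary-mode failure.
        return {"mode": "lite", "summary": fallback(text)}


def remote_summarizer(text):
    # Stand-in for a hosted LLM call; always fails here to show the fallback path.
    raise ConnectionError("remote API unavailable")


def local_summarizer(text):
    # Stand-in for a local model: returns the first sentence only.
    return text.split(".")[0].strip() + "."


result = analyze_with_fallback(
    "The tenant shall pay rent monthly. Late fees apply after five days.",
    primary=remote_summarizer,
    fallback=local_summarizer,
)
print(result)
```

Returning the mode alongside the summary lets the interface tell the user which path produced the result.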

GitHub Repository Website

GitDone: A GitHub-Integrated Deadline Tracker

Deployed an open-source tool that uses GitHub OAuth2 for secure user sign-in, allowing developers to create and manage deadline countdowns for their repositories.
Constructed a 4-endpoint REST API to provide users with a unique embed link for a real-time countdown widget, enabling seamless integration into applications like Notion by configuring appropriate CORS headers.
Automated the software delivery lifecycle by engineering a CI/CD pipeline with AWS CodePipeline for deployments to AWS Elastic Beanstalk. Fortified security and performance by implementing Amazon CloudFront as a CDN to handle custom domain routing, SSL certificate termination, and cached asset delivery, with 99.5% uptime.
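The core of a deadline countdown is a small time calculation. The sketch below shows one way to do it with the standard library. The function name `time_remaining` and the example dates are illustrative assumptions, not code from the project.

```python
from datetime import datetime, timezone


def time_remaining(deadline, now):
    """Return days/hours/minutes/seconds until `deadline`, clamped at zero."""
    total = max(int((deadline - now).total_seconds()), 0)
    days, rest = divmod(total, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    return {"days": days, "hours": hours, "minutes": minutes, "seconds": seconds}


# Example timestamps (made up); timezone-aware values avoid local-time ambiguity.
now = datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
deadline = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)

remaining = time_remaining(deadline, now)
print(remaining)
```

Clamping at zero keeps an embedded widget from showing negative values once the deadline has passed.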

GitHub Repository Website

Interactive Portfolio Analytics & Risk Assessment Dashboard

Architected a quantitative finance platform to analyse 10 blue-chip equities, implementing a suite of risk metrics including Sharpe Ratio optimization, 95% Value-at-Risk, correlation matrices, max drawdown, and beta coefficients.
Deployed a mobile-responsive analytics dashboard using a containerized architecture (Render.com) and Plotly/Dash, featuring real-time data visualization and an automated processing pipeline for 500 trading days of market data via the yfinance API.
Implemented an advanced statistical modelling system featuring dynamic risk-free rate integration (10Y Treasury), Monte Carlo simulation for risk analysis, and performance attribution across technology, financial, and healthcare sectors.
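Three of the risk metrics listed above have compact textbook definitions, sketched here with the standard library only. The conventions are assumptions and may differ from the dashboard's own implementation: 252 trading days per year, historical (non-parametric) VaR, and a simple index-based percentile. The price series is made up.

```python
from math import sqrt
from statistics import mean, stdev

TRADING_DAYS = 252  # common annualisation convention, assumed here


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualised Sharpe ratio from daily returns."""
    excess = [r - daily_risk_free for r in daily_returns]
    return mean(excess) / stdev(excess) * sqrt(TRADING_DAYS)


def historical_var(daily_returns, confidence=0.95):
    """Historical Value-at-Risk, reported as a positive loss fraction."""
    ordered = sorted(daily_returns)
    index = int((1 - confidence) * len(ordered))
    return -ordered[index]


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak, worst = prices[0], 0.0
    for price in prices:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


# Made-up closing prices, converted to simple daily returns.
prices = [100.0, 120.0, 90.0, 110.0]
returns = [b / a - 1 for a, b in zip(prices, prices[1:])]

print(f"Sharpe ratio: {sharpe_ratio(returns):.2f}")
print(f"Max drawdown: {max_drawdown(prices):.2%}")
```

A dynamic risk-free rate, as described above, would be passed in through `daily_risk_free` after converting the annual yield to a daily figure.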

GitHub Repository Website

Stacked Ensemble Learning Model to Classify Potentially Hazardous Near-Earth Asteroids

Developed a novel stacked ensemble model to classify Near-Earth asteroids as Potentially Hazardous Asteroids (PHAs) using physical and orbital attributes, achieving a recall of 99.29% and accuracy of 99.53%, critical for asteroid impact analysis.
The dataset was acquired from NASA’s Jet Propulsion Laboratory Solar System Dynamics open datasets and consists of approximately 1.3 million records, which were pre-processed before model building.
Built a stacked ensemble with Random Forest and XGBoost as base models and Logistic Regression as the meta-model, optimized using GridSearchCV, RFECV, and 15-fold cross-validation.
Demonstrated the stacked model’s superior recall performance compared to individual base and meta models, underscoring its robustness in asteroid classification; results are currently under review for journal publication.
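Recall is reported alongside accuracy above because hazardous asteroids are the rare class, and on imbalanced data a model can score high accuracy while missing most of them. The sketch below illustrates this with made-up toy labels. It is not the project's evaluation code.

```python
def recall_and_accuracy(y_true, y_pred, positive=1):
    """Compute recall for the positive class and overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return tp / (tp + fn), correct / len(pairs)


# Toy labels (made up): 1 = hazardous, 0 = not hazardous.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_safe = [0] * 10                        # never flags anything
one_missed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # misses one hazardous object

print(recall_and_accuracy(y_true, always_safe))
print(recall_and_accuracy(y_true, one_missed))
```

The classifier that never flags anything still looks reasonable on accuracy but has zero recall, which is why recall is the metric to watch when a missed positive is costly.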

GitHub Repository

Early Prediction of Chronic Kidney Disease using Machine Learning

Designed and implemented a predictive machine learning model that analysed medical records to identify chronic kidney disease, achieving an accuracy of 93.33% and a recall score of 94.44%.
Four models were considered: Random Forest, Decision Tree, Logistic Regression, and XGBoost. Of these, XGBoost performed best overall.
Trained on a CKD dataset from the UC Irvine Machine Learning Repository (400 records), augmented with 200 synthetic records generated using the Copulas library.
Deployed the model locally via Flask with a user-friendly web interface scalable for public deployment using PythonAnywhere, enhancing accessibility and potential for broader user engagement.

GitHub Repository Project Report

Chymes: A Spotify Playlist Curator

Developed and designed a playlist curator using Python that creates a Spotify playlist using real-time weather status.
The application uses the OpenWeatherMap API and the Spotify API to gather information and generate a playlist of 30 songs.
Used Flask to deploy the web page; a mobile application for the Play Store is planned. The web application is currently in beta testing with 5+ users.

GitHub Repository

X (formerly Twitter) Sentiment Analysis: COVID-19 Tweets

Built a sentiment analysis model using Python that categorised 2021 COVID-19 pandemic tweets into positive, negative, and neutral sentiments.
The datasets were acquired from Kaggle and merged to form a single dataset with 200,000+ records. Transfer learning was implemented on a pre-trained model, VADER, built in Python by C.J. Hutto.
The tuned model achieved an accuracy of 88%, and the results were visualised on a daily and monthly basis in a window built with the Tkinter library.
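VADER produces a compound score between -1 and 1 for each text, and a common convention is to label scores at or above 0.05 as positive, at or below -0.05 as negative, and anything in between as neutral. The sketch below applies that thresholding to made-up scores. It does not call the VADER library itself, and the project may have used different cut-offs.

```python
from collections import Counter


def label_from_compound(compound, threshold=0.05):
    """Map a compound sentiment score in [-1, 1] to a three-way label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores standing in for analysed tweets.
scores = [0.62, -0.48, 0.01, -0.03, 0.30]
labels = [label_from_compound(s) for s in scores]
tally = Counter(labels)
print(tally)
```

Grouping such labels by tweet date would give the kind of daily and monthly breakdown described above.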

GitHub Repository Project Report

Contact & Socials

Gmail:

chiragajay.jain@gmail.com

LinkedIn:

linkedin.com/in/chiragajain

GitHub:

github.com/ChiragAJain

Kaggle:

kaggle.com/chiragajain