PORTFOLIO
Chirag Jain
Hi! 👋 I am Chirag
I build software to solve problems I have personally encountered. My focus is on leveraging AI and SaaS technologies to develop applications that are accessible, insightful, and designed to make a tangible, positive impact on people's lives.
👨‍💻Technical & Scientific Interests: Research, Deep Learning, Machine Learning, Python Software Development, Data Science, Database Management, Astronomy
Work Experience
Apollo Tyres LTD Digital Innovation Hub
Data Science Intern
Jan. 2026 – Present
Location: Hyderabad - Hybrid
Starting January 5, 2026 - Updates coming soon
Research Experience
Indian National Science Academy (INSA) Summer Research Fellow
May 2025 – Jul. 2025
Technologies: PyTorch, Computer Vision
Supervisor: Dr. Ramesh Venkadamchalam (Department of Mathematics, Central University of Tamil Nadu)
- Conducted research on AI-based disaster damage assessment using the xView2 dataset comprising 20,000 pre/post-disaster image pairs, focusing on building-level classification to support Humanitarian Assistance and Disaster Relief (HADR) efforts.
- Designed a scalable preprocessing pipeline that extracted building patches, applied augmentation, and implemented majority-class undersampling, resulting in a 94.4% dataset expansion (from 304,370 to 591,583 image patches) and improved class balance for model training.
- Developed and optimized a lightweight CNN with residual connections (3.7M parameters, 28.71 MB) that achieved 83.3% accuracy on 128x128 inputs, closely matching ResNet18's 83.5% (7M parameters, 63.33 MB) while reducing model size by over 54.66%, enabling faster inference for field deployment.
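The two percentages quoted in the bullets above can be reproduced from the reported counts and file sizes. The following is a minimal arithmetic check that uses only figures stated above; the helper name `percent_change` is illustrative and not taken from the project.

```python
def percent_change(before: float, after: float) -> float:
    """Return the signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# Image patch counts before and after preprocessing, as reported above.
expansion = percent_change(304_370, 591_583)

# Model sizes in MB (ResNet18 baseline vs. the lightweight CNN), as reported above.
# The change is negative, so negate it to express it as a reduction.
size_reduction = -percent_change(63.33, 28.71)

print(f"dataset expansion: {expansion:.1f}%")
print(f"model size reduction: {size_reduction:.2f}%")
```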
Comparative Analysis of DL Models for Brain MRI Tumour Detection
Publication
Technologies: PyTorch, Computer Vision
Supervisor: Dr. Deepasikha Mishra (School of Computer Science and Engineering, VIT-AP University)
- Conducted a comparative analysis of deep learning models (VGG16, VGG19, Xception, Simple CNN, EfficientNet-Attn), standardizing training conditions (same optimizer, scheduler, and number of epochs) so that performance could be evaluated on equal terms.
- Processed ~12,000 brain MRI scans from the MRI ND-5 dataset (sourced from IEEE Dataport) through transformation pipelines to train deep learning models and generate comparative performance visualizations.
- EfficientNet achieved the highest performance with 99.82% accuracy on the external dataset and 97.45% on the internal dataset, validated using the Nemenyi post-hoc test and Cohen's d effect size.
- Research selected for presentation at an IEEE international conference, with subsequent publication slated for the Scopus-indexed IEEE Digital Library.
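Cohen's d, mentioned above, measures how far apart two sets of scores are in units of their pooled standard deviation. The sketch below uses the textbook pooled-variance definition; the source does not say which variant was used in the study, and the per-fold accuracies are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical per-fold accuracies, NOT results from the study.
model_a = [0.97, 0.98, 0.99, 0.98, 0.97]
model_b = [0.93, 0.94, 0.95, 0.94, 0.93]

d = cohens_d(model_a, model_b)
print(f"Cohen's d = {d:.2f}")
```

By the usual rule of thumb, values of d around 0.2, 0.5, and 0.8 are read as small, medium, and large effects.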
Education
Vellore Institute of Technology - Amaravati, Andhra Pradesh
2022-2026
B.Tech in Computer Science and Engineering (Core)
Current CGPA: 9.12
Chennai Public School - Chennai, Tamil Nadu
2020-2022
Central Board of Secondary Education (CBSE)
Senior Secondary
Percentage: 92.6%
Chennai Public School - Chennai, Tamil Nadu
2018-2020
Central Board of Secondary Education (CBSE)
Secondary
Percentage: 95.6%
Certifications
- AWS Solution Architect Associate (SAA-C03)
- IBM Professional Data Science Certification
- Associate Data Engineer in Snowflake
- R Programming Certification
- The Complete Python Developer Course - ZerotoMastery Academy
- Introduction to PowerBI
- Introduction to DAX in PowerBI
- Introduction to Snowflake
- Introduction to Snowflake SQL
- Introduction to Data Modelling in Snowflake
- Fully Automated MLOps
- Monitoring Machine Learning Concepts
- Data Manipulation in Snowflake
- Data Preparation in PowerBI
- Database Design
- Data Visualization in PowerBI
- Data Analyst with R Certification
- What is Data Science
- Tools for Data Science
- Data Science Methodology
- Python for Data Science, AI and Development
- Python Project for Data Science
- Database and SQL for Data Science with Python
- Data Analysis with Python
- Data Visualisation with Python
- Machine Learning with Python
- Generative AI: Elevate Your Data Science Career
- Data Scientist Career Guide and Interview Preparation
- ISRO IIRS DLP - Exploring Earth's Moon through Chandrayaan
- ISRO IIRS DLP - Aditya L1: India's first space based observatory
- ISRO IIRS - Space Science and Technology Awareness Training (START)
Professional Certifications
Fundamental Certifications
Data Science and Analysis Certifications
Astronomy Certifications
Projects
KOSH: Open Government Data MCP Server
University Capstone Project - Group
- Developed Kosh, a conversational interface for India’s Open Government Data that allows users to query complex public datasets and generate instant, dynamic visualizations using natural language, reducing time-to-insight from minutes to seconds.
- Engineered a specialized Model Context Protocol (MCP) server using Python and FastMCP to wrap 25+ government APIs, solving the N x M integration bottleneck and creating a standardized interoperability layer for AI agents.
- Built a robust full-stack solution leveraging Google Gemini 2.5 Pro for advanced reasoning, Node.js/Express for the agent backend, and React.js for the frontend, featuring real-time response streaming and automated chart rendering.
- Contributed towards the development of Kosh React UI with a custom in-chat visualisation feature and engineered specialised backend tools to integrate 8+ government APIs, enabling the system to fetch, filter, and render complex public datasets dynamically.
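The N x M bottleneck mentioned above is the observation that connecting N AI clients directly to M APIs takes one adapter per pair, whereas a shared protocol layer needs only one implementation per client and one per API. A small sketch of that count follows; the function names and the number of clients are hypothetical, and only the 25-API figure comes from the bullets above.

```python
def point_to_point_integrations(n_clients: int, m_apis: int) -> int:
    """Every client needs its own adapter for every API."""
    return n_clients * m_apis


def shared_protocol_integrations(n_clients: int, m_apis: int) -> int:
    """Every client and every API implements the shared protocol once."""
    return n_clients + m_apis


n_clients = 4   # hypothetical number of AI agents or clients
m_apis = 25     # the project reports wrapping 25+ government APIs

print("point-to-point:", point_to_point_integrations(n_clients, m_apis))
print("shared protocol:", shared_protocol_integrations(n_clients, m_apis))
```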
GenAI Legal Assistant (Recently Revamped)
- Designed and developed a responsive, full-stack SaaS application to analyze and summarize complex legal documents, serving as a "GenAI Legal Assistant."
- Built an intelligent document processing pipeline that ingests multiple file formats (.pdf, .docx, .txt), extracts text, and performs automated section identification using legal keyword recognition.
- Engineered a resilient, dual-mode architecture for analysis, featuring a primary mode powered by the Gemini API and a fallback "Lite" mode using a fine-tuned Legal Pegasus model and KeyBERT for continued operation.
- Automated the end-to-end software delivery lifecycle by establishing a CI/CD pipeline, deploying a scalable and monitored solution to a production environment on the Railway platform.
- Delivered a rich user experience with features including drag-and-drop file uploads, custom summary length controls, and the ability to export the structured analysis as a professionally formatted PDF.
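The dual-mode design described above can be sketched as a primary call wrapped in a fallback. This is only an illustration of the pattern, not the project's code: `summarize_with_fallback`, `failing_primary`, and `lite_summarizer` are hypothetical stand-ins for the Gemini-backed mode and the local "Lite" mode.

```python
from typing import Callable, Tuple


def summarize_with_fallback(
    text: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the primary summarizer; on any failure, use the fallback.

    Returns (mode, summary) so the caller can show which mode produced the result.
    """
    try:
        return "primary", primary(text)
    except Exception:
        # Broad catch is deliberate in this sketch: quota errors, timeouts,
        # and network failures should all degrade to the local mode.
        return "lite", fallback(text)


def failing_primary(text: str) -> str:
    """Stand-in for a remote API call that is currently unavailable."""
    raise RuntimeError("remote API unavailable")


def lite_summarizer(text: str) -> str:
    """Stand-in for a local model: return the first sentence only."""
    return text.split(". ")[0].rstrip(".") + "."


mode, summary = summarize_with_fallback(
    "This agreement is binding. It lasts two years.",
    failing_primary,
    lite_summarizer,
)
print(mode, "->", summary)
```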
GitDone: A GitHub-Integrated Deadline tracker
- Deployed an open-source tool that uses GitHub OAuth2 for secure user sign-in, allowing developers to create and manage deadline countdowns for their repositories.
- Built a 4-endpoint REST API that gives each user a unique embed link for a real-time countdown widget, enabling integration into applications such as Notion through appropriately configured CORS headers.
- Automated the software delivery lifecycle by engineering a CI/CD pipeline with AWS CodePipeline for deployments to AWS Elastic Beanstalk. Strengthened security and performance by using Amazon CloudFront as a CDN for custom domain routing, SSL/TLS termination, and cached asset delivery, with 99.5% uptime.
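An embeddable countdown of the kind described above comes down to two pieces: computing the time left until a deadline, and returning it with a CORS header so a page on another origin can load it. The following framework-free sketch makes that concrete; the function names and the payload shape are assumptions, not the project's actual API.

```python
from datetime import datetime, timezone


def countdown_payload(deadline: datetime, now: datetime) -> dict:
    """Split the time remaining until `deadline` into days/hours/minutes/seconds."""
    remaining = max(int((deadline - now).total_seconds()), 0)  # clamp past deadlines to 0
    days, rest = divmod(remaining, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    return {"days": days, "hours": hours, "minutes": minutes, "seconds": seconds}


def cors_headers(allowed_origin: str = "*") -> dict:
    """Response headers that let a page on another origin read the widget data."""
    return {
        "Access-Control-Allow-Origin": allowed_origin,
        "Access-Control-Allow-Methods": "GET",
    }


deadline = datetime(2026, 1, 10, 12, 0, tzinfo=timezone.utc)
now = datetime(2026, 1, 8, 10, 30, tzinfo=timezone.utc)
payload = countdown_payload(deadline, now)
print(payload, cors_headers())
```

In a real deployment the allowed origin would normally be restricted to the hosts that embed the widget rather than left as `*`.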
Interactive Portfolio Analytics & Risk Assessment Dashboard
- Architected a quantitative finance platform to analyse 10 blue-chip equities, implementing a suite of risk metrics including Sharpe Ratio optimization, 95% Value-at-Risk, correlation matrices, max drawdown, and beta coefficients.
- Deployed a mobile-responsive analytics dashboard using a containerized architecture (Render.com) and Plotly/Dash, featuring real-time data visualization and an automated processing pipeline for 500 trading days of market data via yfinance API.
- Implemented an advanced statistical modelling system featuring dynamic risk-free rate integration (10Y Treasury), Monte Carlo simulation for risk analysis, and performance attribution across technology, financial, and healthcare sectors.
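Three of the risk metrics listed above can be written in a few lines each. The sketch below uses common textbook definitions: an annualized Sharpe ratio assuming 252 trading days, historical-simulation VaR, and peak-to-trough drawdown. The dashboard's exact conventions are not stated in the source, and the sample prices and returns are made up.

```python
from math import sqrt
from statistics import mean, stdev

TRADING_DAYS = 252  # common annualization convention, assumed here


def sharpe_ratio(daily_returns, annual_risk_free=0.0):
    """Annualized Sharpe ratio from daily returns."""
    daily_rf = annual_risk_free / TRADING_DAYS
    excess = [r - daily_rf for r in daily_returns]
    return mean(excess) / stdev(excess) * sqrt(TRADING_DAYS)


def historical_var(daily_returns, confidence=0.95):
    """Historical VaR: the loss at the (1 - confidence) quantile, as a positive number."""
    ordered = sorted(daily_returns)
    index = int((1 - confidence) * len(ordered))
    return -ordered[index]


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, (price - peak) / peak)
    return worst


# Made-up sample data for illustration only.
prices = [100, 120, 90, 110]
returns = [-0.05, -0.03, -0.01, 0.0, 0.01, 0.02, 0.01, 0.0, 0.02, 0.01] * 2

print("Sharpe:", round(sharpe_ratio(returns, annual_risk_free=0.04), 2))
print("95% VaR:", historical_var(returns))
print("Max drawdown:", max_drawdown(prices))
```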
Stacked Ensemble Learning Model to Classify Potentially Hazardous Near-Earth Asteroids
- Developed a novel stacked ensemble model to classify Near-Earth asteroids as Potentially Hazardous Asteroids (PHAs) using physical and orbital attributes, achieving a recall of 99.29% and accuracy of 99.53%, critical for asteroid impact analysis.
- Sourced the dataset from the open datasets of NASA’s Jet Propulsion Laboratory Solar System Dynamics group, comprising approximately 1.3 million records, and pre-processed it before model building.
- Built a stacked ensemble with Random Forest and XGBoost as base models and Logistic Regression as the meta-model, optimized using GridSearchCV, RFECV, and 15-fold cross-validation.
- Demonstrated the stacked model’s superior recall performance compared to individual base and meta models, underscoring its robustness in asteroid classification; results are currently under review for journal publication.
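Recall is reported alongside accuracy above because hazardous asteroids are typically a small minority of the catalogue, so a model can score high accuracy while missing most of them. A minimal sketch of the two metrics, using invented confusion-matrix counts (not the project's results):

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of truly hazardous objects that the model flags."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all objects classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for 1,000 objects, 100 of them hazardous.
useful = {"tp": 90, "fn": 10, "fp": 5, "tn": 895}

# A degenerate model that labels everything "not hazardous".
trivial = {"tp": 0, "fn": 100, "fp": 0, "tn": 900}

for name, c in (("useful", useful), ("trivial", trivial)):
    print(name, "accuracy:", accuracy(**c), "recall:", recall(c["tp"], c["fn"]))
```

The trivial model still reaches 90% accuracy on these counts while its recall is zero, which is why recall is the more informative figure for this task.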
Early Prediction of Chronic Kidney Disease using Machine Learning
- Designed and implemented a predictive machine learning model that analysed medical records to identify chronic kidney disease, achieving an accuracy of 93.33% and a recall of 94.44%.
- Compared four models: Random Forest, Decision Tree, Logistic Regression, and XGBoost. XGBoost gave the best overall performance of the four.
- Trained on a CKD dataset of 400 records from the UC Irvine Machine Learning Repository, supplemented with 200 synthetic records generated using the Copulas library.
- Deployed the model locally via Flask with a user-friendly web interface scalable for public deployment using PythonAnywhere, enhancing accessibility and potential for broader user engagement.
Chymes: A Spotify Playlist Curator
- Designed and developed a playlist curator in Python that creates a Spotify playlist based on real-time weather conditions.
- The application uses the OpenWeatherMap API and the Spotify API to gather information and generate a playlist of 30 songs.
- Used Flask to deploy the web page, with a mobile application for the Play Store planned. The web application is currently in beta testing with 5+ users.
X (formerly Twitter) Sentiment Analysis: COVID-19 Tweets
- Built a sentiment analysis model using Python that categorised 2021 COVID-19 pandemic tweets into positive, negative, and neutral sentiments.
- The datasets were acquired from Kaggle and merged into a single dataset with 200,000+ records. Transfer learning was implemented on VADER, a pre-trained model built in Python by C.J. Hutto.
- The tuned model achieved an accuracy of 88%, and the results were visualised on a daily and monthly basis in a window built with the Tkinter library.
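VADER reports a compound score between -1 and 1 for each text, and the usual way to turn it into the three classes named above is a pair of symmetric cut-offs. The sketch below uses the commonly cited default of 0.05; the source does not state which thresholds the tuned model used, so treat the value and the function name as assumptions.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores, not taken from the project's data.
for score in (0.62, -0.41, 0.01):
    print(score, "->", label_from_compound(score))
```

Tuning in this setting could amount to adjusting `threshold` against labelled tweets, although the source does not describe the procedure.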
Contact & Socials
Gmail: chiragajay.jain@gmail.com
LinkedIn: linkedin.com/in/chiragajain
GitHub: github.com/ChiragAJain
Kaggle: kaggle.com/chiragajain
© Copyright 2026 | Made by Chirag Jain