- Building a QIIME 2 pipeline for 16S rRNA sequencing data across colorectal cancer cohorts from Malaysia, Sri Lanka, and Saudi Arabia.
- Training and cross-validating Random Forest and gradient-boosted classifiers for early CRC detection with explicit focus on cross-population robustness.
- Investigating which microbial taxa remain predictive across geographies versus which signals are population-specific.
- Framing the project around deployment readiness for diverse populations, with manuscript preparation in progress.
Experience / Research
Academic projects across interpretable ML, computational biology, and healthcare AI.

Capstone Researcher
Capstone Project, NYU Abu Dhabi
Gut Microbiome ML for Colorectal Cancer Detection
Sep 2025 to May 2026
Python, QIIME 2, 16S rRNA, Random Forest, Bioinformatics, Cross-cohort validation

Research Assistant
Computational Medicine Laboratory, NYU New York
Long COVID Risk Factor Analysis
Sep 2024 to Dec 2024
- Developed and implemented data-cleaning pipelines for electronic health records (EHR) to identify long COVID risk factors and symptom patterns across diverse patient populations.
- Engineered a predictive modeling pipeline integrating clinical data and biomarkers for early identification of high-risk long COVID patients, improving detection rates from 65% to 89%.
- Applied differential equation models to biomarker datasets, including skin conductance data, to quantify patient fatigue levels.
Python, EHR, Predictive Modeling, ETL, Clinical Data

Research Assistant
Spine Labs, University of New South Wales
Spinal Pathology Research
Jun 2024 to Aug 2024
- Engineered a comprehensive data pipeline that processed and standardized more than 1,200 medical manuscripts across eight formats, achieving 95% data accuracy while reducing processing time by 60%.
- Developed a multi-shot prompt engineering workflow for medical NLP, improving data labeling accuracy from 55% to 92% across more than 10,000 medical terms.
- Led evaluation of LLM architectures through systematic literature analysis, selecting a cost-effective solution that reduced computational overhead by 40% while maintaining high labeling accuracy.
Python, LLMs, Prompt Engineering, Medical NLP

Machine Learning Researcher
Laboratory for Advanced Bio-Photonics and Imaging, NYU Abu Dhabi
Raman Spectroscopy Meets Computational Intelligence
Aug 2023 to Jun 2024
- Developed and validated three machine learning models for real-time tissue classification using Raman spectroscopy data, achieving more than 98% classification accuracy.
- Co-authored a peer-reviewed publication on ML-based tissue classification for early detection in surgical settings, later recognized as Editor's Choice in Lasers in Surgery and Medicine.
- Prepared and preprocessed Raman spectral data for downstream analysis, supporting tissue characterization work for precision medicine.
- Presented the research at Harvard's National Collegiate Research Conference.
Python, scikit-learn, Deep Learning, Interpretable ML, Biomedical

Research Assistant
Decision Making Lab, IIT Kanpur
Depression Diagnosis and Gaming as Mental Health Indicators
May 2021 to Aug 2021
- Analyzed longitudinal medical literature on depression diagnosis to identify recurring clinical trends across several decades of published work.
- Investigated neurological responses to gaming as a potential behavioral signal of deteriorating mental health.
- Explored passive, non-invasive signal detection as an early research direction in mental health AI.
Medical Literature Review, Neuroscience, Mental Health AI