Soumyadeep Roy
Address:
Complex Networks Research Lab, Dept. of CSE,
Indian Institute of Technology Kharagpur,
West Bengal, India - 721302
Curriculum Vitae
I am an Institute Ph.D. Research Fellow in the Department of Computer Science and Engineering at IIT Kharagpur, supervised by Prof. Niloy Ganguly and Prof. Shamik Sural. I submitted my Ph.D. thesis titled “Domain Adaptation for Medical Language Understanding” in October 2024.
Currently, I am doing a research internship at the Health Innovation and Technology Centre of GE Healthcare India. I am looking for research scientist and postdoc positions in Foundation Models and Generative AI for medicine.
Research Interests: I have expertise in developing foundational models for medical applications, ranging from NLP (text) models to DNA and sc-RNA-based biological foundational models. For NLP, I work on developing efficient domain adaptation techniques for adapting open-domain generative AI models to medical tasks, specifically by optimizing the model vocabulary during fine-tuning (vocabulary adaptation).
Research Experience: During my Ph.D., I was part of the Complex Networks Research Group (CNeRG) at IIT Kharagpur. I spent 2.5 years (January 2021 - July 2023) working as a Research Associate at the Leibniz AI Future Lab, L3S Research Center in Germany, with Prof. Wolfgang Nejdl. In collaboration with the Hannover Medical School, Germany, I developed a machine-learning methodology for identifying novel patient subtypes of Parkinson’s disease. Before joining the Ph.D. program, I completed an M.S. (Research) degree in the CSE Dept. at IIT Kharagpur and worked as a Junior Research Fellow (first 2 years) and Senior Research Fellow. During my bachelor's studies, I also secured the Indian Academy of Sciences Summer Research Fellowship and spent a three-month summer internship at IIT Kharagpur.
news
Oct 25, 2024 | Our tutorial proposal “Building Trustworthy AI Models for Medicine: From Theory to Applications” has been accepted at the 18th ACM International Conference on Web Search and Data Mining (WSDM 2025), an A* conference, to be held in Germany (https://www.wsdm-conference.org/2025/) |
---|---|
Oct 16, 2024 | Submitted my Ph.D. thesis titled “Domain Adaptation for Medical Language Understanding”. |
Sep 20, 2024 | Our work at IIT Kharagpur “Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models” got accepted to EMNLP 2024 Findings (an A* conference) as a short paper |
Jul 04, 2024 | Our work at IIT Kharagpur “Unlocking Efficiency: Adaptive Masking for Gene Transformer Models” got accepted to ECAI 2024 (a Core A conference with an acceptance rate of 23%). |
Apr 17, 2024 | Our work at IIT Kharagpur “MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization” got accepted to the IJCAI 2024 Main track (a Core A* conference). |
selected publications
- Building Trustworthy AI Models for Medicine: From Theory to Applications. In The 18th ACM International Conference on Web Search and Data Mining, 2025