Soumyadeep Roy

Curriculum Vitae

Address:

Complex Networks Research Lab, Dept. of CSE,

Indian Institute of Technology Kharagpur,

West Bengal, India - 721302

I am an Institute Ph.D. Research Fellow in the Department of Computer Science and Engineering at IIT Kharagpur, supervised by Prof. Niloy Ganguly and Prof. Shamik Sural. I submitted my Ph.D. thesis, titled “Domain Adaptation for Medical Language Understanding”, in October 2024.

Currently, I am doing a research internship at the Health Innovation and Technology Centre of GE Healthcare India. I am looking for research scientist and postdoc positions in Foundation Models and Generative AI for medicine.

Research Interests: I have expertise in developing foundational models for medical applications, ranging from NLP (text) models to DNA and sc-RNA-based biological foundational models. For NLP, I work on developing efficient domain adaptation techniques for adapting open-domain generative AI models to medical tasks, specifically by optimizing the model vocabulary during fine-tuning (vocabulary adaptation).

Research Experience: During my Ph.D., I was part of the Complex Networks Research Group (CNeRG) at IIT Kharagpur. I spent 2.5 years (January 2021 - July 2023) working as a Research Associate at the Leibniz AI Future Lab, L3S Research Center in Germany, with Prof. Wolfgang Nejdl. In collaboration with the Hannover Medical School, Germany, I developed a novel machine-learning methodology for identifying new patient subtypes for Parkinson’s disease. Before joining the Ph.D. program, I completed an M.S. Research degree in the CSE Dept. at IIT Kharagpur and worked as a Junior Research Fellow (first 2 years) and Senior Research Fellow. During my bachelor's, I also secured the Indian Academy of Sciences Summer Research Fellowship and spent a three-month summer internship at IIT Kharagpur.

news

Oct 25, 2024 Our tutorial proposal “Building Trustworthy AI Models for Medicine: From Theory to Applications” has been accepted at the 18th ACM International Conference on Web Search and Data Mining (WSDM 2025), an A* conference, to be held in Germany (https://www.wsdm-conference.org/2025/).
Oct 16, 2024 Submitted my Ph.D. thesis titled “Domain Adaptation for Medical Language Understanding”.
Sep 20, 2024 Our work at IIT Kharagpur “Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models” got accepted to EMNLP 2024 Findings (an A* conference) as a short paper.
Jul 04, 2024 Our work at IIT Kharagpur “Unlocking Efficiency: Adaptive Masking for Gene Transformer Models” got accepted to ECAI 2024 (a Core A conference with an acceptance rate of 23%).
Apr 17, 2024 Our work at IIT Kharagpur “MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization” got accepted to IJCAI 2024 Main track (a Core A* conference).

selected publications

  1. Building Trustworthy AI Models for Medicine: From Theory to Applications
    Soumyadeep Roy, Sowmya S. Sundaram, Dominik Wolff, and 1 more author
    In The 18th ACM International Conference on Web Search and Data Mining, 2025
  2. Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
    Gunjan Balde, Soumyadeep Roy, Mainack Mondal, and 1 more author
    In Findings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
  3. Unlocking Efficiency: Adaptive Masking for Gene Transformer Models
    Soumyadeep Roy, Shamik Sural, and Niloy Ganguly
    In Proceedings of the 27th European Conference on Artificial Intelligence, 2024
  4. MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization
    Gunjan Balde, Soumyadeep Roy, Mainack Mondal, and 1 more author
    In Proceedings of the 33rd International Joint Conference on Artificial Intelligence, 2024
  5. Beyond Accuracy: Investigating Error Types in GPT-4 Responses to USMLE Questions
    Soumyadeep Roy, Aparup Khatua, Fatemeh Ghoochani, and 3 more authors
    In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024
  6. GENEMASK: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning
    Soumyadeep Roy, Jonas Wallat, Sowmya S. Sundaram, and 2 more authors
    In 26th European Conference on Artificial Intelligence (ECAI 2023), Sep 2023
  7. Interpretable Clinical Trial Search using Pubmed Citation Network
    Soumyadeep Roy, Niloy Ganguly, Shamik Sural, and 1 more author
    In 2023 IEEE International Conference on Digital Health (ICDH), Sep 2023
  8. Knowledge-Aware Neural Networks for Medical Forum Question Classification
    Soumyadeep Roy, Sudip Chakraborty, Aishik Mandal, and 6 more authors
    In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Sep 2021
  9. An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis, and Recommendation
    Soumyadeep Roy, Shamik Sural, Niyati Chhaya, and 2 more authors
    ACM Trans. Web, May 2021