Soumyadeep Roy
Address:
Division of Computational Medicine,
Dept. of Medicine,
Stanford University,
California, United States
Curriculum Vitae | Research Blog
I am a postdoctoral scholar working with Prof. Tina Hernandez-Boussard at the Division of Computational Medicine in the Department of Medicine at Stanford University. My project uses large language models to study, at scale, how real-world patient trajectories deviate from established clinical guidelines, and whether those deviations lead to better patient outcomes or avoidable harm.
My primary area of research is natural language processing, with expertise in medical and healthcare applications. My research areas of interest are Foundation Models for Medicine, Generative AI, Text Summarization, and Efficient Pretraining.
I hold a PhD in Computer Science and Engineering from the Indian Institute of Technology Kharagpur, where I worked with Prof. Niloy Ganguly and Prof. Shamik Sural. There, I was part of the Complex Networks Research Group (CNeRG).
My PhD thesis is titled “Domain Adaptation for Medical Language Understanding”, where I developed novel domain adaptation techniques to effectively and efficiently adapt open-domain AI models to the medical domain.
Please go through my CV for further details.
news
| Dec 05, 2025 | Presented our work on vocabulary adaptation (VA) for training medical language models at the Microsoft Research India (Bangalore) Friday Breakfast talk series. Link to slides |
|---|---|
| Nov 21, 2025 | Our Parkinson's disease subtyping paper with L3S Research Center, Germany, and Hannover Medical School was published in the Frontiers in AI journal, in the Medicine and Public Health section (https://doi.org/10.3389/frai.2025.1668206). Link to slides |
| Sep 03, 2025 | Started my postdoc at Stanford Medicine with Prof. Tina Hernandez-Boussard. I will work on understanding how real-world patient trajectories deviate from clinical guidelines. Do these deviations lead to positive patient outcomes or avoidable harm? |
| Aug 01, 2025 | Served as a reviewer for A* conferences such as EMNLP, AAAI, and ACL Rolling Review (July cycle), and for journals such as Frontiers in Genetics and Knowledge and Information Systems. |
| Jul 11, 2025 | Delighted to share that our work “Where does it hurt? Medical Intent Classification from Dialogues: A Dataset on Doctor Intents and Benchmarking Study” with Berlin University of Applied Sciences (BHT) and L3S Research Center, Germany, has been accepted to the 28th European Conference on Artificial Intelligence (ECAI 2025), to be held in Bologna, Italy. |
| Jun 01, 2025 | Relaunched my research blog about NLP, medical AI, and academic research at https://datanalytics101.com |
| May 15, 2025 | Our work “Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings”, carried out at IIT Kharagpur, was accepted to ACL 2025 Findings, an A* conference in natural language processing. |
| May 05, 2025 | Successfully defended my PhD thesis, “Domain Adaptation for Medical Language Understanding”. |
| Mar 14, 2025 | Moderated a panel discussion with Prof. Preslav Nakov, Prof. Krishna Gummadi, Prof. Ritumbra Manuvie, Prof. Jeanne Mifsud Bonnici, and Prof. Prasenjit Mitra at the Workshop on Generative AI for Disinformation and Misinformation Detection, held at WSDM 2025. |
| Mar 10, 2025 | Delivered a 3-hour tutorial, “Building Trustworthy AI Models for Medicine: From Theory to Applications” (tutorial website), together with Dominik Wolff (Hannover Medical School, Germany), Sowmya S. Sundaram (Stanford University), and Prof. Niloy Ganguly (IIT Kharagpur) at the 18th ACM International Conference on Web Search and Data Mining (WSDM 2025), held in Hannover, Germany. |