Soumyadeep Roy

Postdoc at Stanford Medicine

I work on deployable medical language models — making them efficient enough to run in real clinical workflows, reliable enough for clinical decision-making, and auditable against established medical guidelines.

I am a postdoctoral scholar working with Prof. Tina Hernandez-Boussard in the Division of Computational Medicine, Department of Medicine, at Stanford University. My current research focuses on perioperative pain management, where I collaborate with a clinical team of surgeons, trauma care physicians and psychologists. I lead two research directions: (i) developing auditable reasoning LLMs for clinical tasks, using clinical guidelines as the evidence base, and (ii) using LLMs to construct executable guideline pathways and to understand deviations in real-world patient trajectories, with the goal of improving patient care outcomes.

My research spans three connected threads: efficient pretraining and adaptation of medical and biomedical foundation models (GeneMask, vocabulary adaptation during finetuning, adaptive BPE); reliability evaluation of clinical LLMs (USMLE error taxonomy, OOV impact studies, trustworthy AI tutorials); and guideline-grounded reasoning for clinical decision support. I have published 16+ peer-reviewed papers at venues including SIGIR, ACL, IJCAI, EMNLP, ECAI, CIKM, and Frontiers in AI.

Please see my resume for further details.

Previously: PhD from IIT Kharagpur (2025), Research Associate at L3S Research Center / Leibniz University Hannover (2021–2023), AI Intern at Wipro GE Healthcare (2024–2025), and Research Intern at Adobe Research (2018).

news

Mar 16, 2026 Presented AuriCare, a holistic pain management decision support concept, at MIT Hacking Medicine’s Boston GrandHack 2026. Demo Blog
Feb 20, 2026 Served as a reviewer for 2 conferences (FAccT 2026, ACL ARR January 2026) and 2 journals (JAMIA, Frontiers in AI)
Feb 14, 2026 Our work “LongTailQA: Benchmarking LLMs and RAG Models on Disambiguated Long-Tail Entities” was accepted to LREC 2026. This was a year-long collaborative effort with PhD students and colleagues from L3S Research Center, Germany.
Dec 05, 2025 Presented our work on vocabulary adaptation (VA) for training medical language models at the Microsoft Research India (Bangalore) Friday Breakfast talk series. Link to slides
Nov 21, 2025 Our paper on Parkinson's disease subtyping, with L3S Research Center, Germany and Hannover Medical School, was published in the journal Frontiers in AI, in the section Medicine and Public Health: https://doi.org/10.3389/frai.2025.1668206. Link to slides
Sep 03, 2025 Started my postdoc at Stanford Medicine with Prof. Tina Hernandez-Boussard. I will work on understanding how real-world patient trajectories deviate from clinical guidelines. Do such deviations lead to positive patient outcomes or to avoidable harm?
Aug 01, 2025 Served as a reviewer for A* conferences such as EMNLP, AAAI and ACL ARR (July), and for journals such as Frontiers in Genetics and Knowledge and Information Systems.
Jul 11, 2025 Delighted to share that our work “Where does it hurt? Medical Intent Classification from Dialogues: A Dataset on Doctor Intents and Benchmarking Study”, with Berlin University of Applied Sciences (BHT) and L3S Research Center, Germany, has been accepted to the 28th European Conference on Artificial Intelligence (ECAI 2025), to be held in Bologna, Italy.
Jun 01, 2025 Relaunched my research blog on NLP, medical AI and academic research at https://datanalytics101.com
May 15, 2025 Our work “Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings”, carried out at IIT Kharagpur, was accepted to ACL 2025 Findings, an A* conference in natural language processing.