Soumyadeep Roy

Postdoc at Stanford Medicine

I work on building deployable medical language models using real-world data (structured EHR and notes).

I am a postdoctoral scholar working with Prof. Tina Hernandez-Boussard at the Division of Computational Medicine, Department of Medicine, Stanford University. My current research focuses on perioperative pain management, where I collaborate with a clinical team of surgeons, trauma care physicians, and psychologists. I am leading two research directions: (i) developing auditable reasoning LLMs for clinical tasks, using clinical guidelines as the evidence base, and (ii) using LLMs as a tool to construct executable guideline pathways and understand deviations in real-world patient trajectories, with the goal of improving patient care outcomes.

My research spans three connected threads: efficient pretraining and adaptation of medical and biomedical foundation models (GeneMask, vocabulary adaptation during finetuning, adaptive BPE); reliability evaluation of clinical LLMs (USMLE error taxonomy, OOV impact studies, trustworthy AI tutorials); and guideline-grounded reasoning for clinical decision support. I have published 16+ peer-reviewed papers at SIGIR, ACL, IJCAI, EMNLP, ECAI, CIKM, and Frontiers in AI.

Please see my resume for further details.

Previously: PhD from IIT Kharagpur (2025), Research Associate at L3S Research Center / Leibniz University Hannover (2021–2023), AI Intern at Wipro GE Healthcare (2024–2025), and Research Intern at Adobe Research (2018).

news

May 10, 2026 Guest Lecturer for Stanford BMDS 223 Course “Deploying and Evaluating Fair AI in Healthcare” (Spring 2026). One lecture on “Bias Evaluation in LLMs” and one hands-on coding workshop on “Bias Audit on Real-World Data (MIMIC-IV)”
Apr 10, 2026 Our work on efficient vocabulary adaptation for the medical and legal domains was accepted to the ACL 2026 Main track as a full paper. I will be presenting in person in San Diego, California. Code Preprint
Mar 16, 2026 Presented AuriCare, a holistic pain management decision support concept, at MIT Hacking Medicine’s Boston GrandHack 2026. Demo Blog
Feb 20, 2026 Served as a reviewer for 2 conferences (FAccT 2026, ACL ARR January 2026) and 2 journals (JAMIA, Frontiers in AI)
Feb 14, 2026 Our work “LongTailQA: Benchmarking LLMs and RAG Models on Disambiguated Long-Tail Entities” was accepted to LREC 2026. This was a year-long collaborative effort with PhD students and colleagues from L3S Research Center, Germany.
Dec 05, 2025 Presented our work on vocabulary adaptation (VA) for training medical language models at the Microsoft Research India (Bangalore) Friday Breakfast talk series. Link to slides
Nov 21, 2025 Our Parkinson Disease Subtyping paper with L3S Research Center, Germany, and Hannover Medical School was published in Frontiers in AI (section: Medicine and Public Health): https://doi.org/10.3389/frai.2025.1668206. Link to slides
Sep 03, 2025 Started my postdoc at Stanford Medicine with Prof. Tina Hernandez-Boussard. I will work on understanding how real-world patient trajectories deviate from clinical guidelines, and whether those deviations lead to positive patient outcomes or avoidable harm.
Aug 01, 2025 Served as a reviewer for A* conferences such as EMNLP, AAAI, and ACL ARR (July), and for journals such as Frontiers in Genetics and Knowledge and Information Systems.
Jul 11, 2025 Delighted to share that our work “Where does it hurt? Medical Intent Classification from Dialogues: A Dataset on Doctor Intents and Benchmarking Study” with Berlin University of Applied Sciences (BHT) and L3S Research Center, Germany, has been accepted to the 28th European Conference on Artificial Intelligence (ECAI 2025), to be held in Bologna, Italy.