I want to build AI systems with coherent, updateable, and interpretable models of internal and external phenomena. Specifically, my research focuses on the following three types of models:
- World models: models of the external environment that update in the presence of new information and support coherent downstream prediction and reasoning.
- User models: models of the user's preferences, goals, beliefs, values, learning styles, and workflows.
- Self models: models of the AI system's own internal computations, external behaviors, and limitations.
Together, these models enable AI systems to behave more reliably and predictably, in ways that are transparent and safe for humans. Ultimately, my goal is to pave the way for AI systems that we can collaborate with and learn from—systems that empower rather than replace people.
About Me
I am a PhD candidate at MIT CSAIL, affiliated with the Language & Intelligence (LINGO) Lab. My advisor is Jacob Andreas. I am funded by a Clare Boothe Luce Graduate Fellowship [Press] and was named a 2024 Rising Star in EECS. Previously, I spent a year at Facebook AI Applied Research; before that, I received my B.S. in Computer Science from the University of Washington, where I worked with Luke Zettlemoyer. You can find more details in my CV.
Representative Papers
- Training Language Models to Explain Their Own Computations
  Belinda Z. Li, Zifan Carl Guo, Vincent Huang, Jacob Steinhardt, Jacob Andreas
  ArXiv Preprint
- QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
  Belinda Z. Li, Been Kim, Zi Wang
  NeurIPS Datasets & Benchmarks Track, 2025
- (How) Do Language Models Track State
  Belinda Z. Li, Zifan Carl Guo, Jacob Andreas
  ICML, 2025
- Bayesian Preference Elicitation with Language Models
  Kunal Handa, Yarin Gal, Ellie Pavlick, Noah Goodman, Jacob Andreas, Alex Tamkin, Belinda Z. Li
  ArXiv Preprint
- Eliciting Human Preferences with Language Models
  Belinda Z. Li, Alex Tamkin, Noah Goodman, Jacob Andreas
  ICLR, 2025
- Implicit Representations of Meaning in Neural Language Models
  Belinda Z. Li, Maxwell Nye, Jacob Andreas
  ACL, 2021
All Papers
This list is updated only intermittently. For the most up-to-date list, please check my Google Scholar.
- Preference-Conditioned Language-Guided Abstraction
  Andi Peng, Andreea Bobu, Belinda Z. Li, Theodore R. Sumers, Ilia Sucholutsky, Nishanth Kumar, Thomas L. Griffiths, Julie A. Shah
  HRI, 2024
- Bayesian Preference Elicitation with Language Models
  Kunal Handa, Yarin Gal, Ellie Pavlick, Noah Goodman, Jacob Andreas, Alex Tamkin*, Belinda Z. Li*
  ArXiv Preprint (* = Equal contribution)
- Learning with language-guided state abstractions
  Andi Peng, Ilia Sucholutsky*, Belinda Z. Li*, Theodore R. Sumers, Thomas L. Griffiths, Jacob Andreas, Julie A. Shah
  ICLR, 2024 (* = Equal contribution)
- Eliciting Human Preferences with Language Models
  Belinda Z. Li*, Alex Tamkin*, Noah Goodman, Jacob Andreas
  ArXiv Preprint (* = Equal contribution)
- Measuring and Manipulating Knowledge Representations in Language Models
  Evan Hernandez, Belinda Z. Li, Jacob Andreas
  ArXiv Preprint
- LaMPP: Language Models as Probabilistic Priors for Perception and Action
  Belinda Z. Li, William Chen, Pratyusha Sharma, Jacob Andreas
  ArXiv Preprint
- Toward Interactive Dictation
  Belinda Z. Li, Jason Eisner, Adam Pauls, Sam Thomson
  ACL, 2023
- Language Modeling with Latent Situations
  Belinda Z. Li, Maxwell Nye, Jacob Andreas
  ACL Findings, 2023
- Quantifying Adaptability in Pre-trained Language Models with 500 Tasks
  Belinda Z. Li, Jane Yu, Madian Khabsa, Luke Zettlemoyer, Alon Halevy, Jacob Andreas
  NAACL, 2022
- Implicit Representations of Meaning in Neural Language Models
  Belinda Z. Li, Maxwell Nye, Jacob Andreas
  ACL, 2021
- On Unifying Misinformation Detection
  Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa
  NAACL, 2021
- On the Influence of Masking Policies in Intermediate Pre-training
  Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa
  EMNLP, 2021
- Studying Strategically: Learning to Mask for Closed-book QA
  Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa
  ArXiv Preprint
- Efficient One-Pass End-to-End Entity Linking for Questions
  Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih
  EMNLP, 2020
- Linformer: Self-Attention with Linear Complexity
  Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
  ArXiv Preprint
- Language Models as Fact Checkers?
  Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, Madian Khabsa
  FEVER (Fact Extraction and VERification) Workshop @ ACL, 2020
- Active Learning for Coreference Resolution using Discrete Annotation
  Belinda Z. Li, Gabriel Stanovsky, Luke Zettlemoyer
  ACL, 2020