Gabriele Sarti
Postdoctoral Researcher, Northeastern University
-
Gabriele Sarti is a postdoctoral researcher at the BauLab at Northeastern University, working on interpretability interfaces for the National Deep Inference Fabric (NDIF) project. His research spans several areas of interpretability, including context attribution for faithful answer citations, steering for personalization, and evaluation of interpretability insights in professional workflows. He is an active maintainer of multiple open-source interpretability libraries and a regular mentor for the SPAR program.
-
Analysis, control and compression of reasoning chains and agentic traces produced by large reasoning models (LRMs) and LLM-based agents.
References: Fast KV Compaction via Attention Matching, Thought Anchors: Which LLM Reasoning Steps Matter?, A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents
Visual components and interfaces to support prototyping and interpretability workflows on LLMs at scale.
References: Designing a Dashboard for Transparency and Control of Conversational AI, Model Diffing Toolkit, NDIF Workbench
Analysis and control of user modeling in LLM conversations, with a focus on implicit modeling and how it causally influences model behavior.
References: Large-scale online deanonymization with LLMs, Scalably Extracting Latent Representations of Users
Analysis of memory structures, usage and updating for stateful LLM agents.
-
Strong candidates will have:
Hands-on research experience with large language models (LLMs), in particular on evaluations and/or interpretability analyses beyond small toy tasks.
Previous experience with interpretability techniques and tools (e.g. TransformerLens, NNsight), with a good understanding of their limitations when conducting analyses at scale.
The ability to independently identify promising avenues for research, translate research hypotheses into a solid set of experiments, and connect their results to the original motivations.
Experience with writing production-quality code and designing abstraction hierarchies and data formats for streamlining analytical workflows.
Previous experience with user-facing interfaces, experimental design, and user-centered evaluation in the HCI domain.