Publications & Presentations

Key research outputs. First-author publications are highlighted.

The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features

Jeremias Lino Ferrao, et al.

Self-Ablating Transformers: More Interpretability, Less Sparsity

Jeremias Lino Ferrao, et al.

World Model Agents with Change-Based Intrinsic Motivation

Jeremias Lino Ferrao, et al.