Stories by bearseascape

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

MooseAgent: A LLM Based Multi-Agent Framework for Automating Moose Simulation

Automated Researchers Can Subtly Sandbag

Auditing Language Models for Hidden Objectives

Policy for LLM Writing on LessWrong

Towards Understanding Distilled Reasoning Models: A Representational Approach

Transformers Learn to Implement Multistep Gradient Descent with Chain of Thought

(Mis)Fitting: A Survey of Scaling Laws

Resurrecting saturated LLM benchmarks with adversarial encoding

Deep Double Descent: Where Bigger Models and More Data Hurt

Value-Based Deep RL Scales Predictably