Stories by karinemellata
Alignment is not free: How model upgrades can silence your confidence signals
117 points
karinemellata
2025-05-06T23:22:49Z
www.variance.co
We used sparse autoencoders to explain LLM moderation flags of violent threats
6 points
karinemellata
2025-04-21T20:11:58Z
www.variance.co