HN
Paper
Stories by smaddrellmander
The annotated PyTorch training loop
5 points
smaddrellmander
2026-06-22T23:44:59Z
idlemachines.co.uk
MAE vs. MSE: more than just the mean vs. median debate
1 point
smaddrellmander
2026-06-19T00:18:45Z
idlemachines.co.uk
DiffusionGemma: Discrete diffusion in a large language model
3 points
smaddrellmander
2026-06-11T22:21:21Z
idlemachines.co.uk
Heaven knows I'm perplexed now
2 points
smaddrellmander
2026-06-06T20:41:46Z
idlemachines.co.uk
Reading MAI's efficiency gain. How to pick architectures like serious people
9 points
smaddrellmander
2026-06-04T00:05:15Z
idlemachines.co.uk
MAI-Thinking-1: Building a Hill-Climbing Machine [pdf]
2 points
smaddrellmander
2026-06-03T07:46:03Z
microsoft.ai
Are contrastive losses just cross entropy all along?
2 points
smaddrellmander
2026-06-02T14:15:35Z
idlemachines.co.uk
Every token, everywhere, all at once
2 points
smaddrellmander
2026-05-26T13:32:58Z
idlemachines.co.uk
The cut in the Mixture of Experts compute graph
1 point
smaddrellmander
2026-05-19T07:50:10Z
idlemachines.co.uk
DeepSeek V4 from the Inside
2 points
smaddrellmander
2026-05-12T10:52:57Z
idlemachines.co.uk
Softmax, can you derive the Jacobian? And should you care?
131 points
smaddrellmander
2026-04-27T20:38:22Z
idlemachines.co.uk
Gemma 4 is not your standard transformer
2 points
smaddrellmander
2026-04-22T08:42:29Z
idlemachines.co.uk