Stories by ModelForge

Claude Code's Real Secret Sauce Isn't the Model

The State of LLMs 2025: Progress, Problems, and Predictions

A Researcher's Field Guide to Non-Standard LLM Architectures

Explanation of Gated DeltaNet (Qwen3-Next and Kimi Linear)

The Core Components of Modern LLMs and the Models Beyond Transformers [video]

Popular Attention Alternatives: GQA, MLA, SWA

Multi-Head Latent Attention

Thinking Machines Lab Co-Founder Departs for Meta

OpenAI's internal Slack messages could cost it billions in copyright suit

LLM Evaluation from Scratch: Multiple Choice, Verifiers, Leaderboards, LLM Judge

Gemma 3 270M re-implemented in pure PyTorch for local tinkering

GPT-OSS vs. Qwen3 and a detailed look at how things evolved since GPT-2

LLM Research Papers: The 2024 List

Scaling Test-Time Compute with Open LLM Models