Stories by kumama

Designing dev onboarding for an agent-first world

I post-trained a model to reliably roll a die

Open-Weight Models Don't Need to Win

Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads

Pokegents: Making multi-agent coding feel like a team

GRPO explained: group relative policy optimization for LLM finetuning

Do RL on a model with your vector db

What is reinforcement learning finetuning

RAG to riches: synthetic data for training RAG agents

rag not lag: rl for fast agentic retrieval

Show HN: Benchmax, a new open-source RL environment framework for LLM finetuning

Beating o3/o4-mini with Codebase-specific Reinforcement Learning

We might be overestimating coding agent performance on SWE-Bench

How to Improve Code Completion LLMs with Repo-Specific Finetuning

Show HN: Free AI Code Completion for Xcode with model choice/codebase context