HN
Paper
All
Show
Ask
Jobs
Top
Today
Last 7 days
Last months
This year
Statistics
Stories by NicoConstant
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
218 points
NicoConstant
2026-05-29T09:47:23Z
blog.kog.ai
Kog AI – Building a Real-Time Inference Stack on AMD Instinct GPUs [video]
8 points
NicoConstant
2026-05-15T08:17:35Z
www.youtube.com