Stories by zone411
Show HN: Bazaar – a new LLM benchmark for economic reasoning under uncertainty
8 points
zone411
2025-07-22T17:55:18Z
github.com
AI Comes Up with Physics Experiments. But They Work
4 points
zone411
2025-07-21T14:40:32Z
www.quantamagazine.org
Emergent Price-Fixing by LLM Auction Agents
7 points
zone411
2025-07-15T11:37:28Z
github.com
Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark
7 points
zone411
2025-03-20T15:32:52Z
github.com
Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception
5 points
zone411
2025-02-25T19:55:17Z
github.com
SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork
110 points
zone411
2025-02-18T05:25:05Z
arxiv.org
LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21
17 points
zone411
2025-02-10T19:09:20Z
github.com
Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure
7 points
zone411
2025-01-22T16:17:47Z
github.com
Show HN: LLM Thematic Generalization Benchmark
6 points
zone411
2025-01-14T17:32:45Z
github.com
Show HN: LLM Creative Story-Writing Benchmark
5 points
zone411
2025-01-06T16:36:03Z
github.com
Show HN: LLM Divergent Thinking Creativity Benchmark
8 points
zone411
2024-12-30T18:29:30Z
github.com
Show HN: LLM Deceptiveness and Gullibility Benchmark
7 points
zone411
2024-10-22T16:27:14Z
github.com
LLM Confabulation (Hallucination) Leaderboard
6 points
zone411
2024-10-10T15:34:33Z
github.com
4 points
zone411
2024-10-10T15:21:07Z
news.ycombinator.com
O1-preview and o1-mini results on NYT Connections
2 points
zone411
2024-09-13T23:16:56Z
twitter.com
Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy
213 points
zone411
2023-11-05T05:02:33Z
twitter.com
Can you beat a stochastic parrot? ParrotChess.com
3 points
zone411
2023-09-22T05:41:46Z
parrotchess.com
Generative AI while browsing in Chrome
3 points
zone411
2023-08-15T23:50:55Z
labs.google.com
1 point
zone411
2023-07-08T02:20:23Z
news.ycombinator.com
Statement on AI Risk
341 points
zone411
2023-05-30T10:08:15Z
www.safe.ai