#benchmarks
2 articles
-
AutoAgent: The AI That Engineers Its Own Harness and Tops Benchmarks
AutoAgent autonomously builds and optimizes agent harnesses without human engineering, achieving #1 on SpreadsheetBench (96.5%) and the top GPT-5 score on TerminalBench (55.1%) in 24-hour runs.
-
An AI Agent Made $19,915 in 8 Hours. The Benchmark That Proved It Is Open Source.
ClawWork dropped 220 professional tasks across 44 job categories, gave AI agents $10 each, and told them to survive. One agent turned that into nearly twenty grand.