Reliable Data Engineering

#benchmarks

2 articles

AutoAgent: The AI That Engineers Its Own Harness and Tops Benchmarks

5 Apr, 2026

AutoAgent autonomously builds and optimizes agent harnesses without human engineering, ranking #1 on SpreadsheetBench (96.5%) and posting the top GPT-5 score on TerminalBench (55.1%) in 24-hour runs.

An AI Agent Made $19,915 in 8 Hours. The Benchmark That Proved It Is Open Source.

25 Mar, 2026

ClawWork dropped 220 professional tasks across 44 job categories, gave AI agents $10 each, and told them to survive. One agent turned that into nearly twenty grand.