Arena: Opus 4.8, #1
SWE-bench Verified: Fable 5, 95.0%
SWE-bench Pro: Opus 4.8, 69.2%
Terminal-Bench 2.1: Opus 4.8, 74.6%
GPQA Diamond: Opus 4.8, 93.6%
Humanity's Last Exam: Opus 4.8, 57.9%
OSWorld: Opus 4.8, 83.4%
Online-Mind2Web: Opus 4.8, 84.0%
GDPval-AA: Opus 4.8, 1890 Elo
Finance Agent v2: Opus 4.8, 53.9%
Arena Text: Grok-4.1 Thinking, 1483

Thierry Damiba

// Building the agentic work stack.

Developer Evangelist at Arcade. Founder, Scale Intelligence. I write playbooks from the field on agents, GTM engineering, and what happens when machines start doing the work.

Latest

The Problem Is Not Delegation

Read the essay

Building something in the agent space? Book a call →