clawbench

The agent benchmark that scores the full stack — harness, config, and model — not just the LLM. Trace-based scoring, reliability metrics, configuration diagnostics.

Category: Extension
Stars: 88
Updated: 2026-05-09
Source: https://github.com/openclaw/clawbench

Use case

Skill, extension, or automation surface discovered via GitHub search.

About this listing

Part of the OpenClaw board, ranked #24 by GitHub stars within that slice. The crawler refreshes this data automatically, so validate licenses and maturity on the upstream repository.

Neighbors

More from OpenClaw

openclaw

clawhub

gogcli