clawbench
The agent benchmark that scores the full stack — harness, config, and model — not just the LLM. Trace-based scoring, reliability metrics, configuration diagnostics.
- Category
- Extension
- Stars
- 88
- Updated
- 2026-05-09
Use case
Skill, extension, or automation surface discovered via GitHub search.
About this listing
Part of the OpenClaw board, ranked #24 by GitHub stars within that slice. Data auto-refreshes from the crawler—validate licenses and maturity on the upstream repository.