Controlling AI Agents via Resampling
Where does control fit in among alignment techniques?
BashBench
We chose sys-admin problems because they are a natural progression from previous work toward models that perform autonomous AI R&D.
Each task consists of a natural language description and a hidden set of Pytest tests. The agent is given only the description and works on the task until all tests pass.
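To make the task structure concrete, here is a minimal sketch of a task record and an episode loop. It is an illustration under stated assumptions, not the benchmark's actual harness: the names `Task`, `run_episode`, and `agent_step`, the `max_steps` cap, and the use of a plain dictionary as a stand-in for the sandbox filesystem are all hypothetical. The hidden tests are modeled as plain Python predicates rather than real Pytest files so that the sketch runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for the sandbox: maps file paths to file contents.
Env = Dict[str, object]


@dataclass(frozen=True)
class Task:
    description: str                              # shown to the agent
    hidden_tests: Tuple[Callable[[Env], bool], ...]  # never shown to the agent


def all_tests_pass(task: Task, env: Env) -> bool:
    # A real harness would likely run Pytest inside the sandbox and check
    # for exit code 0; here each "test" is just a predicate on the env.
    return all(test(env) for test in task.hidden_tests)


def run_episode(
    task: Task,
    agent_step: Callable[[str, Env], None],
    max_steps: int = 15,  # assumed cap; not specified in the text
) -> Optional[int]:
    """Let the agent act until all hidden tests pass or the cap is hit.

    Returns the number of steps used, or None if the tests never passed.
    """
    env: Env = {}
    for step in range(1, max_steps + 1):
        # The agent only ever receives the description, not the tests.
        agent_step(task.description, env)
        if all_tests_pass(task, env):
            return step
    return None


def toy_agent(description: str, env: Env) -> None:
    # Toy policy: does nothing useful on step 1, writes the file on step 2.
    env["steps"] = env.get("steps", 0) + 1
    if env["steps"] == 2:
        env["/tmp/hello.txt"] = "hello\n"


task = Task(
    description="Create /tmp/hello.txt containing the word 'hello'.",
    hidden_tests=(lambda env: env.get("/tmp/hello.txt") == "hello\n",),
)
steps_used = run_episode(task, toy_agent)
print(steps_used)
```

The key design point mirrored here is the information boundary: `agent_step` is passed `task.description` but never `task.hidden_tests`, so the agent cannot read the success criteria directly.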