/ar:run — Single Experiment Iteration¶
Run exactly ONE experiment iteration: review history, decide a change, edit, commit, evaluate.
Usage¶
What It Does¶
Step 1: Resolve experiment¶
If no experiment specified, run python {skill_path}/scripts/setup_experiment.py --list and ask the user to pick.
Step 2: Load context¶
# Read experiment config
cat .autoresearch/{domain}/{name}/config.cfg
# Read strategy and constraints
cat .autoresearch/{domain}/{name}/program.md
# Read experiment history
cat .autoresearch/{domain}/{name}/results.tsv
# Checkout the experiment branch
git checkout autoresearch/{domain}/{name}
Step 3: Decide what to try¶
Review results.tsv: - What changes were kept? What pattern do they share? - What was discarded? Avoid repeating those approaches. - What crashed? Understand why. - How many runs so far? (Escalate strategy accordingly)
Strategy escalation: - Runs 1-5: Low-hanging fruit (obvious improvements) - Runs 6-15: Systematic exploration (vary one parameter) - Runs 16-30: Structural changes (algorithm swaps) - Runs 30+: Radical experiments (completely different approaches)
Step 4: Make ONE change¶
Edit only the target file specified in config.cfg. Change one thing. Keep it simple.
Step 5: Commit and evaluate¶
git add {target}
git commit -m "experiment: {short description of what changed}"
python {skill_path}/scripts/run_experiment.py \
--experiment {domain}/{name} --single
Step 6: Report result¶
Read the script output. Tell the user: - KEEP: "Improvement! {metric}: {value} ({delta} from previous best)" - DISCARD: "No improvement. {metric}: {value} vs best {best}. Reverted." - CRASH: "Evaluation failed: {reason}. Reverted."
Step 7: Self-improvement check¶
After every 10th experiment (check results.tsv line count), update the Strategy section of program.md with patterns learned.
Rules¶
- ONE change per iteration. Don't change 5 things at once.
- NEVER modify the evaluator (evaluate.py). It's ground truth.
- Simplicity wins. Equal performance with simpler code is an improvement.
- No new dependencies.