From 3791b90ab781a7384ed0da6dcf6f41b096a37721 Mon Sep 17 00:00:00 2001
From: flucchetti
Date: Fri, 14 Jul 2023 13:58:03 +0000
Subject: [PATCH] readme

---
 benchmark/README.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 benchmark/README.md

diff --git a/benchmark/README.md b/benchmark/README.md
new file mode 100644
index 0000000..1ddd05c
--- /dev/null
+++ b/benchmark/README.md
@@ -0,0 +1,12 @@
+# Benchmark and Robot Simulator
+
+The benchmark tasks are stored in `benchmark.jsonl`. The benchmark works by running a simulation of the LLM-generated code using ASP.
+
+The simulation is checked with ASP temporal constraints for each task. A readable version of constraints can be found in `evaluator/constraints`. An ASP solver (Clingo) is used to determine whether the simulation trace satisfies the constraints.
+
+## Walkthrough
+
+- `evaluator/robot.lp` contains the ASP rules governing state changes in our simulated world.
+- `simple_tracer.py` contains a script for turning python generated code into a trace of ASP instructions to feed to the simulation.
+- `evaluator/evaluate.py` is called by the top-level RoboEval script and runs the simulation.
+- `evaluator/solve_utils.py` contains a class of helper python functions that can be called in ASP.
\ No newline at end of file
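
The README added by this patch says the benchmark tasks are stored in `benchmark.jsonl`. As a rough illustration of how such a file could be inspected, here is a minimal sketch that reads a JSON Lines file using only the Python standard library. The function name `load_tasks` is hypothetical and not part of the repository. The sketch assumes only that each non-empty line holds one JSON value, and it makes no assumption about the fields a task contains.

```python
import json
from pathlib import Path


def load_tasks(path):
    """Read a JSON Lines file: one JSON value per non-empty line.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    tasks = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                tasks.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so a malformed record is easy to find.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({err.msg})"
                ) from err
    return tasks
```

Assuming the file sits next to the README, a call such as `load_tasks("benchmark/benchmark.jsonl")` would return a list with one entry per task. The path is a guess based on the README's location and may need adjusting.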