GRAPES (Graph-based Reasoning and Planning with Ensemble Systems) is an approach to improving language model reasoning through a prompting mechanism based on Monte Carlo Tree Search (MCTS). It uses a dual-model setup, with one model called Andromeda and another called Prometheus, to improve reasoning outcomes. The system described below uses the open-source llama3.1:8b model.
At the core of GRAPES is the MCTS algorithm, adapted for language model reasoning:
- Selection: The algorithm starts from the root node and selects child nodes based on the UCB1 (Upper Confidence Bound 1) score, balancing exploration and exploitation.
- Expansion: When a leaf node is reached, it is expanded by generating potential next steps in the reasoning process. Generation is guided by probing questions from the Prometheus model.
- Simulation: The current path is evaluated to produce an answer and a reason.
- Backpropagation: The evaluation results are propagated back up the tree, updating node statistics. Evaluation is done by the Prometheus model.
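The four phases above can be sketched as a single loop. This is a minimal illustration under stated assumptions, not the GRAPES implementation: the `Node` class, `path_to`, `toy_expand`, and `toy_evaluate` are hypothetical names, and the two toy functions stand in for the model calls that would generate next steps and score a path.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical tree node holding one reasoning step."""
    step: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0


def ucb1(node: Node, parent_visits: int) -> float:
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + math.sqrt(2 * math.log(parent_visits) / node.visits)


def path_to(node: Optional[Node]) -> List[str]:
    """Return the reasoning steps from the root down to `node`."""
    steps: List[str] = []
    while node is not None:
        steps.append(node.step)
        node = node.parent
    return steps[::-1]


def mcts(root: Node,
         expand_fn: Callable[[List[str]], List[str]],
         evaluate_fn: Callable[[List[str]], float],
         max_iterations: int = 20,
         max_depth: int = 3) -> Node:
    """Run the loop and return the most visited child of the root.

    Assumes max_iterations >= 1 and that expand_fn yields at least one step
    for the root path.
    """
    for _ in range(max_iterations):
        # Selection: descend by UCB1 until a node without children is reached.
        node, depth = root, 0
        while node.children:
            parent = node
            node = max(parent.children, key=lambda c: ucb1(c, parent.visits))
            depth += 1
        # Expansion: attach candidate next steps unless the depth limit is hit.
        if depth < max_depth:
            for step in expand_fn(path_to(node)):
                node.children.append(Node(step=step, parent=node))
            if node.children:
                node = node.children[0]
        # Simulation: score the current path (a model call in a real system).
        reward = evaluate_fn(path_to(node))
        # Backpropagation: update statistics on every node up to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.value += reward
            current = current.parent
    return max(root.children, key=lambda c: c.visits)


def toy_expand(path: List[str]) -> List[str]:
    """Stand-in for step generation: two candidate continuations."""
    return [path[-1] + "a", path[-1] + "b"]


def toy_evaluate(path: List[str]) -> float:
    """Stand-in for evaluation: only paths through step 'qa' are rewarded."""
    return 1.0 if "qa" in path else 0.0
```

Calling `mcts(Node(step="q"), toy_expand, toy_evaluate)` concentrates visits on the rewarded branch while still sampling the other one occasionally.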
GRAPES employs two specialized models:
- Andromeda: Responsible for generating reasoning steps and potential answers.
- Prometheus: Evaluates the quality and logical consistency of the generated answers.
The system includes an MCTSVisualizer class that creates a visual representation of the search tree, enhancing interpretability.
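As an illustration of what a tree visualization involves, the sketch below turns a small tree into Graphviz DOT text. It does not reproduce MCTSVisualizer, whose interface is not shown here: `TreeNode`, its fields, and `tree_to_dot` are hypothetical, and the choice of DOT as an output format is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """Hypothetical node: a label plus the statistics worth displaying."""
    label: str
    visits: int = 0
    value: float = 0.0
    children: List["TreeNode"] = field(default_factory=list)


def tree_to_dot(root: TreeNode) -> str:
    """Serialize the tree as DOT text, one graph node per tree node."""
    lines = ["digraph mcts {"]
    counter = 0

    def visit(node: TreeNode) -> int:
        nonlocal counter
        node_id = counter
        counter += 1
        # Escape double quotes so the label stays a valid DOT string.
        text = node.label.replace('"', '\\"')
        lines.append(
            f'  n{node_id} [label="{text}\\nvisits={node.visits} value={node.value:.2f}"];'
        )
        for child in node.children:
            child_id = visit(child)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be written to a file and rendered with any Graphviz-compatible tool, which makes each reasoning path and its visit counts visible at a glance.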
The UCB1 score is used for node selection:
import math

def ucb1_score(node: MCTSNode, parent_visits: int) -> float:
    # Unvisited nodes score infinity so that every child is tried at least once.
    if node.visits == 0:
        return float('inf')
    return (node.value / node.visits) + math.sqrt(2 * math.log(parent_visits) / node.visits)
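To see how the score trades off exploitation against exploration, the standalone sketch below applies the same formula to three children. The `MCTSNode` here is a stand-in with only the two fields the formula needs (the real class presumably carries more), and `select_child` and the child names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class MCTSNode:
    """Stand-in node with only the fields used by the score."""
    value: float = 0.0
    visits: int = 0


def ucb1_score(node: MCTSNode, parent_visits: int) -> float:
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits) + math.sqrt(2 * math.log(parent_visits) / node.visits)


def select_child(children: Dict[str, MCTSNode], parent_visits: int) -> str:
    """Return the name of the child with the highest UCB1 score."""
    return max(children, key=lambda name: ucb1_score(children[name], parent_visits))


children = {
    "well_explored": MCTSNode(value=6.0, visits=10),  # mean 0.60, small bonus
    "promising": MCTSNode(value=3.0, visits=4),       # mean 0.75, larger bonus
    "unvisited": MCTSNode(),                          # infinite score
}
```

With 14 parent visits, `select_child(children, 14)` picks the unvisited child first. Once that child is removed from the candidates, the less-visited "promising" child wins over "well_explored", because both its mean value and its exploration bonus are higher.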
Prometheus generates probing questions to guide the exploration:
probe_prompt = f"Based on the question '{question}' and the current reasoning path:\n{' -> '.join(path)}\nGenerate a probing question that could lead to insightful next steps in the reasoning process."
Prometheus evaluates the generated answers:
prompt = f"Evaluate this answer and reason, focusing on inconsistencies or logical errors:\nAnswer: {answer}\nReason: {reason}\n Original question:{question}\n Is this answer logically consistent and correct?"
The system maintains a rejection history to avoid repeating mistakes:
rejection_context = "\n".join(
    f"Previous attempt {i+1}:\nAnswer: {item['Answer']}\nReason: {item['Reason']}\nRejection: {item['Rejection']}"
    for i, item in enumerate(rejection_history)
)
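The sketch below shows one way the rejection history could feed back into later attempts. It is an assumption about the control flow, not the project's code: `answer_with_retries`, `generate_answer`, and `evaluate_answer` are hypothetical names, with the last two standing in for the Andromeda and Prometheus calls. Only the dictionary keys and the context format come from the snippet above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Attempt = Dict[str, str]


def build_rejection_context(rejection_history: List[Attempt]) -> str:
    """Format earlier rejected attempts exactly as in the snippet above."""
    return "\n".join(
        f"Previous attempt {i+1}:\nAnswer: {item['Answer']}\nReason: {item['Reason']}\nRejection: {item['Rejection']}"
        for i, item in enumerate(rejection_history)
    )


def answer_with_retries(
    question: str,
    generate_answer: Callable[[str, str], Tuple[str, str]],
    evaluate_answer: Callable[[str, str, str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Generate, evaluate, and retry with the accumulated rejections as context."""
    rejection_history: List[Attempt] = []
    for _ in range(max_attempts):
        context = build_rejection_context(rejection_history)
        answer, reason = generate_answer(question, context)
        # Assumed convention: the evaluator returns None to accept,
        # or a rejection message to reject.
        rejection = evaluate_answer(question, answer, reason)
        if rejection is None:
            return {"Answer": answer, "Reason": reason}, rejection_history
        rejection_history.append(
            {"Answer": answer, "Reason": reason, "Rejection": rejection}
        )
    return None, rejection_history


def toy_generate(question: str, context: str) -> Tuple[str, str]:
    """Toy generator: changes its answer once it sees a rejection."""
    if "Rejection:" in context:
        return "no", "the earlier answer was rejected"
    return "yes", "first guess"


def toy_evaluate(question: str, answer: str, reason: str) -> Optional[str]:
    """Toy evaluator: rejects 'yes' and accepts anything else."""
    return "contradicts the premise" if answer == "yes" else None
```

With the toy functions, the first attempt is rejected, its rejection is included in the next prompt context, and the second attempt is accepted.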
- Asynchronous Processing: The system uses asyncio for concurrent operations, enhancing efficiency.
- Pydantic Models: Structured data models ensure type safety and validation.
- Configurable Parameters: The MCTS process can be fine-tuned with parameters like max_iterations and max_depth.
- Error Handling: Robust error handling with retries for model queries.
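The combination of asyncio concurrency and retries can be sketched as follows. This is a generic pattern, not the project's code: `query_with_retries`, `make_flaky_model`, and the exponential backoff policy are assumptions, and the fake model replaces a real model query.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def query_with_retries(
    query_fn: Callable[[str], Awaitable[str]],
    prompt: str,
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> str:
    """Await query_fn(prompt), retrying on any exception with backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return await query_fn(prompt)
        except Exception as exc:
            last_error = exc
            if attempt < max_retries - 1:
                # Delay doubles after each failed attempt.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Model query failed after {max_retries} attempts") from last_error


def make_flaky_model(failures: int) -> Callable[[str], Awaitable[str]]:
    """Build a fake model that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    async def model(prompt: str) -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("simulated transient failure")
        return f"response to: {prompt}"

    return model


async def main() -> List[str]:
    model = make_flaky_model(failures=2)
    # Both queries run concurrently; gather preserves the argument order.
    return await asyncio.gather(
        query_with_retries(model, "step 1"),
        query_with_retries(model, "step 2"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Wrapping each query in its own retry loop keeps one failing call from cancelling the others that run alongside it.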
The results of the GRAPES model show significant improvement in reasoning capabilities:
- Accuracy Improvement:
- Reasoning Quality: The GRAPES model demonstrated more coherent and logically consistent reasoning paths.
- Explainability: The graph visualization provides insight into the reasoning process, making the model's decisions easier to interpret. The reasoning path is also easy to audit.
- Dataset:
  - Models are tested on the following dataset: https://github.com/Mihir3009/LogicBench/tree/main/data/LogicBench(Eval)
  - Questions and answers are a collection of 64 reasoning questions from the repository above.
GRAPES shows promising improvement in the reasoning capabilities of the llama3.1:8b model. By combining MCTS with a dual-model architecture, it achieves more accurate and logically consistent responses. The system's ability to learn from rejections and adapt its reasoning path shows its potential for continual improvement in complex reasoning tasks.