GRAPES: Graph-based Reasoning and Planning with Ensemble Systems

Overview

GRAPES (Graph-based Reasoning and Planning with Ensemble Systems) improves language model reasoning through a prompting mechanism based on Monte Carlo Tree Search (MCTS). It uses a dual-model setup in which one model, Andromeda, generates reasoning and a second, Prometheus, critiques it. The system described below uses the open source llama3.1:8b model.

Key Components

1. Monte Carlo Tree Search (MCTS)

At the core of GRAPES is the MCTS algorithm, adapted for language model reasoning:

Selection: The algorithm starts from the root node and selects child nodes based on the UCB1 (Upper Confidence Bound 1) score, balancing exploration and exploitation.
Expansion: When a leaf node is reached, it is expanded by generating potential next steps in the reasoning process. These steps are driven by probing questions from the Prometheus model.
Simulation: The current path is evaluated to produce an answer and reason.
Backpropagation: The evaluation results are propagated back up the tree, updating node statistics. Evaluation is done by the Prometheus model.
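
The four phases above can be sketched as a single loop. This is a minimal illustration, not the repository's implementation: the Node class, propose_steps and evaluate_path are hypothetical stand-ins, and the two model calls are replaced by deterministic or random placeholders so the loop runs without a language model.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; the repository's MCTSNode may look different."""
    step: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0


def ucb1(node: Node, parent_visits: int) -> float:
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + math.sqrt(2 * math.log(parent_visits) / node.visits)


def propose_steps(path: List[str]) -> List[str]:
    """Stand-in for the model call that proposes next reasoning steps."""
    return [f"{path[-1]} / option {i}" for i in range(2)]


def evaluate_path(path: List[str], rng: random.Random) -> float:
    """Stand-in for the evaluator model: a random score in [0, 1)."""
    return rng.random()


def path_to(node: Optional[Node]) -> List[str]:
    steps: List[str] = []
    while node is not None:
        steps.append(node.step)
        node = node.parent
    return steps[::-1]


def run_mcts(question: str, max_iterations: int = 20, max_depth: int = 3, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(step=question)
    for _ in range(max_iterations):
        # Selection: follow the highest UCB1 score down to a leaf.
        node, depth = root, 0
        while node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda c: ucb1(c, parent_visits))
            depth += 1
        # Expansion: add candidate next steps unless the depth limit is reached.
        if depth < max_depth:
            node.children = [Node(step=s, parent=node) for s in propose_steps(path_to(node))]
            node = node.children[0]
        # Simulation: score the reasoning path that ends at this node.
        reward = evaluate_path(path_to(node), rng)
        # Backpropagation: update statistics from the node back to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.value += reward
            current = current.parent
    return root
```

Calling run_mcts("some question", max_iterations=10) returns the root node, whose visits field equals the number of iterations because every backpropagation pass ends at the root.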

2. Dual Model Architecture

GRAPES employs two specialized models:

Andromeda: Responsible for generating reasoning steps and potential answers.
Prometheus: Evaluates the quality and logical consistency of the generated answers.
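
The division of labor between the two models can be illustrated with a small generate-then-review loop. This is a sketch under assumptions: answer_with_review, the callable signatures and the two fake models below are hypothetical and only mimic the roles described above; the real system calls a language model in each role.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rejection = Dict[str, str]
# Generator role (Andromeda): question + past rejections -> (answer, reason).
Generate = Callable[[str, List[Rejection]], Tuple[str, str]]
# Evaluator role (Prometheus): question, answer, reason -> (accepted, feedback).
Evaluate = Callable[[str, str, str], Tuple[bool, str]]


def answer_with_review(
    question: str,
    generate: Generate,
    evaluate: Evaluate,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first answer the evaluator accepts, or None if none is accepted."""
    rejections: List[Rejection] = []
    for _ in range(max_attempts):
        answer, reason = generate(question, rejections)
        accepted, feedback = evaluate(question, answer, reason)
        if accepted:
            return answer
        # Keep the rejected attempt so the generator can avoid repeating it.
        rejections.append({"Answer": answer, "Reason": reason, "Rejection": feedback})
    return None


def fake_generator(question: str, rejections: List[Rejection]) -> Tuple[str, str]:
    # Hypothetical generator: wrong on the first try, corrected after a rejection.
    if not rejections:
        return "no", "Socrates is not stated to be mortal."
    return "yes", "All men are mortal and Socrates is a man."


def fake_evaluator(question: str, answer: str, reason: str) -> Tuple[bool, str]:
    # Hypothetical evaluator: accepts only the logically valid conclusion.
    if answer == "yes":
        return True, ""
    return False, "The premises imply that Socrates is mortal."
```

With these stand-ins, answer_with_review("Is Socrates mortal?", fake_generator, fake_evaluator) is rejected once and accepted on the second attempt.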

3. Graph Visualization

The system includes an MCTSVisualizer class that creates a visual representation of the search tree, which makes the search easier to interpret.
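
How MCTSVisualizer renders the tree is not described here. As one possible approach, a search tree can be serialized to Graphviz DOT text using only the standard library. The Node class and the to_dot function below are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node carrying the statistics worth showing in a plot."""
    label: str
    visits: int = 0
    value: float = 0.0
    children: List["Node"] = field(default_factory=list)


def to_dot(root: Node) -> str:
    """Serialize the tree as Graphviz DOT text, one graph node per tree node."""
    lines = ["digraph mcts {"]
    counter = 0

    def visit(node: Node) -> int:
        nonlocal counter
        node_id = counter
        counter += 1
        # Escape double quotes so the label stays a valid DOT string.
        text = node.label.replace('"', '\\"')
        label = f"{text}\\nvisits={node.visits} value={node.value:.2f}"
        lines.append(f'  n{node_id} [label="{label}"];')
        for child in node.children:
            child_id = visit(child)
            lines.append(f"  n{node_id} -> n{child_id};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)


tree = Node("question", visits=3, value=2.0, children=[
    Node("step A", visits=2, value=1.5),
    Node("step B", visits=1, value=0.5),
])
dot_text = to_dot(tree)
```

The resulting text can be saved to a file and rendered with the Graphviz dot command-line tool.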

Key Algorithms and Techniques

1. UCB1 Score Calculation

The UCB1 score is used for node selection:

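import math

def ucb1_score(node: MCTSNode, parent_visits: int) -> float:
    # Unvisited nodes score infinity so that every child is tried at least once.
    if node.visits == 0:
        return float('inf')
    # Mean value (exploitation) plus a bonus that shrinks as the node is visited (exploration).
    return (node.value / node.visits) + math.sqrt(2 * math.log(parent_visits) / node.visits)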
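
To see how the score trades off exploration and exploitation, the formula can be applied to a few sibling nodes. The SimpleNode class below is a hypothetical stand-in for MCTSNode that carries only the two fields the formula reads, and the visit counts are made-up illustration values.

```python
import math
from dataclasses import dataclass


@dataclass
class SimpleNode:
    """Hypothetical stand-in for MCTSNode with only the fields UCB1 needs."""
    name: str
    value: float
    visits: int


def ucb1_score(node: SimpleNode, parent_visits: int) -> float:
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits) + math.sqrt(2 * math.log(parent_visits) / node.visits)


parent_visits = 10
well_explored = SimpleNode("well explored", value=3.0, visits=4)  # mean 0.75
barely_tried = SimpleNode("barely tried", value=1.0, visits=1)    # mean 1.00
untried = SimpleNode("untried", value=0.0, visits=0)

children = [well_explored, barely_tried, untried]
scores = {child.name: ucb1_score(child, parent_visits) for child in children}

# The untried node always wins selection because its score is infinite.
first_pick = max(children, key=lambda c: ucb1_score(c, parent_visits))

# Among visited nodes, the rarely visited one gets the larger exploration bonus.
second_pick = max(children[:2], key=lambda c: ucb1_score(c, parent_visits))
```

Here the well explored node scores roughly 1.82 and the barely tried node roughly 3.15, so the search tries the untried node first and then prefers the barely tried one.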

2. Probing Questions

Prometheus generates probing questions to guide the exploration:

probe_prompt = f"Based on the question '{question}' and the current reasoning path:\n{' -> '.join(path)}\nGenerate a probing question that could lead to insightful next steps in the reasoning process."

3. Answer Evaluation

Prometheus evaluates the generated answers:

prompt = f"Evaluate this answer and reason, focusing on inconsistencies or logical errors:\nAnswer: {answer}\nReason: {reason}\n Original question:{question}\n Is this answer logically consistent and correct?"

4. Rejection History

The system maintains a rejection history to avoid repeating mistakes:

rejection_context = "\n".join(
    f"Previous attempt {i+1}:\nAnswer: {item['Answer']}\nReason: {item['Reason']}\nRejection: {item['Rejection']}"
    for i, item in enumerate(rejection_history)
)
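
The following sketch wraps the same expression in a function and applies it to sample data to show the text it produces. The function name build_rejection_context and the sample entries are made up for illustration; only the dictionary keys and the format string come from the snippet above.

```python
from typing import Dict, List


def build_rejection_context(rejection_history: List[Dict[str, str]]) -> str:
    """Format earlier rejected attempts as plain text for the next prompt."""
    return "\n".join(
        f"Previous attempt {i+1}:\nAnswer: {item['Answer']}\nReason: {item['Reason']}\nRejection: {item['Rejection']}"
        for i, item in enumerate(rejection_history)
    )


# Hypothetical history with two rejected attempts.
history = [
    {
        "Answer": "no",
        "Reason": "The premise does not mention rain.",
        "Rejection": "The premise states that it rained last night.",
    },
    {
        "Answer": "maybe",
        "Reason": "Wet ground could have other causes.",
        "Rejection": "The question asks only what follows from the premises.",
    },
]

context = build_rejection_context(history)
print(context)
```

Each attempt becomes a four-line block, and an empty history yields an empty string, so the context can be added to a prompt unconditionally.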

Implementation Details

Asynchronous Processing: The system uses asyncio for concurrent operations, enhancing efficiency.
Pydantic Models: Structured data models ensure type safety and validation.
Configurable Parameters: The MCTS process can be fine-tuned with parameters like max_iterations and max_depth.
Error Handling: Model queries are retried on failure.
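
As an illustration of combining asyncio with retries, the sketch below retries an asynchronous call with exponential backoff. The names query_with_retries and make_flaky_model, the retry count and the delays are assumptions chosen for the example; the repository's actual retry policy is not documented here.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def query_with_retries(
    call: Callable[[], Awaitable[str]],
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> str:
    """Await call(), retrying on any exception with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return await call()
        except Exception as exc:
            last_error = exc
            # Wait base_delay, then twice that, then four times that, and so on.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"model query failed after {max_retries} attempts") from last_error


def make_flaky_model(failures: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical model call that fails a fixed number of times, then succeeds."""
    remaining = {"count": failures}

    async def call() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("simulated model timeout")
        return "ok"

    return call


if __name__ == "__main__":
    # Two simulated failures are absorbed by the three allowed attempts.
    print(asyncio.run(query_with_retries(make_flaky_model(2))))
```

Because the wrapper is itself a coroutine, several queries can be issued concurrently with asyncio.gather while each one keeps its own retry budget.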

Results

On a 64-question reasoning benchmark, GRAPES improves on the accuracy of the base model by 15.62 percentage points:

Accuracy Improvement:
- llama3.1:8b model: 62.50% accuracy
- GRAPES (grapes_llama3.1:8b): 78.12% accuracy
Reasoning Quality: The GRAPES model demonstrated more coherent and logically consistent reasoning paths.
Explainability: The graph visualization provides insight into the reasoning process, which makes the model's decisions easier to interpret and the reasoning path easy to audit.
Dataset:
- Models are tested on the following dataset: https://github.com/Mihir3009/LogicBench/tree/main/data/LogicBench(Eval)
- The questions and answers are a collection of 64 reasoning questions drawn from the repository above.
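
As a consistency check on the reported figures, the percentages can be converted back into question counts. The counts below are inferred from the percentages and the 64-question total, assuming each question is scored simply as right or wrong; they are not reported separately in the results.

```python
TOTAL_QUESTIONS = 64


def correct_count(accuracy_percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Nearest whole number of correct answers for a reported accuracy."""
    return round(accuracy_percent / 100 * total)


baseline_correct = correct_count(62.50)  # 40
grapes_correct = correct_count(78.12)    # 50

# Recompute the percentages from the inferred counts.
baseline_accuracy = baseline_correct / TOTAL_QUESTIONS * 100  # 62.5
grapes_accuracy = grapes_correct / TOTAL_QUESTIONS * 100      # 78.125

extra_correct = grapes_correct - baseline_correct  # 10
```

Under that assumption, the reported gain corresponds to 10 additional questions answered correctly out of 64, which is worth keeping in mind when judging how far the result generalizes.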

Conclusion

GRAPES shows a promising improvement in the reasoning capabilities of the llama3.1:8b model. By combining MCTS with a dual-model architecture, it achieves more accurate and logically consistent responses. The system's ability to learn from rejections and adapt its reasoning path shows its potential for continued improvement on complex reasoning tasks.
