Data Analysis Crow is an AI agent framework designed to perform complex scientific data analysis tasks by iteratively working through Jupyter notebooks. This agent takes in datasets and prompts, then systematically explores, analyzes, and interprets the data to provide comprehensive answers and insights.
The agent was used to produce the trajectories for the BixBench benchmark.
- Accepts datasets and natural language prompts
- Iteratively builds Jupyter notebooks to answer research questions
- Executes Python, R, and Bash code
- Specializes in bioinformatics analysis but is adaptable to other domains
- Comes with a Docker image including most common bioinformatics packages
```bash
# Clone the repository
git clone https://github.com/Future-House/data-analysis-crow.git
cd data-analysis-crow

# Install dependencies
pip install -e .

# Optional: pull the Docker image with bioinformatics packages
docker pull futurehouse/bixbench:aviary-notebook-env
```
We support all LLMs that are supported by litellm. Create a `.env` file with the API keys for the LLMs you want to use. For example:
```
OPENAI_API_KEY="your-openai-api-key"
ANTHROPIC_API_KEY="your-anthropic-api-key"
```
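litellm reads provider credentials from environment variables such as `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`, so a missing key usually only surfaces once the agent makes its first call. The sketch below is a minimal, stdlib-only way to check your keys before a long run. It assumes the simple `KEY=value` format shown above; `parse_env_file` and `missing_keys` are hypothetical helpers, not part of this repository, and a dedicated loader such as python-dotenv handles many more cases.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=value lines from a .env-style file (minimal sketch).

    Blank lines and '#' comments are skipped; whitespace around '=' and one
    pair of matching single or double quotes around the value are stripped.
    """
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(required: list[str], env: dict[str, str]) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Hypothetical list: keep only the providers you actually use.
    required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
    env_file = Path(".env")
    # Variables already exported in the shell take precedence over the file.
    env = {**(parse_env_file(env_file) if env_file.exists() else {}), **os.environ}
    missing = missing_keys(required, env)
    print(("Missing keys: " + ", ".join(missing)) if missing else "All required keys are set.")
```

Only the keys for the models you configure need to be present; the check is deliberately read-only and does not modify `os.environ`.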
The agent works by taking a dataset and a prompt, then iteratively building a Jupyter notebook to answer the question. Visit the tutorial for a simple step-by-step guide on how to use the agent.
For advanced evaluations, you can configure `server.yaml` and `runner.yaml` in the `src/scripts/bixbench_evaluation` directory and then run the evaluation script:

```bash
bash src/scripts/bixbench_evaluation/run.sh
```
This will:
- Load the specified dataset
- Process the prompt to understand the research question
- Generate a Jupyter notebook with progressive analysis steps
- Provide a final answer based on the analysis
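The steps above can be illustrated with a loop that grows a notebook one cell at a time, feeds each cell's output back to a decision-making policy, and stops when the policy returns an answer. This is a hypothetical, simplified sketch, not the agent's implementation: `run_analysis`, `Step`, `scripted_policy`, `save_notebook`, and the `DATA_FILES` variable are invented for this example, the LLM is replaced by a scripted policy, and cells run in-process with `exec()` (Python only, no sandbox) rather than in a Jupyter kernel inside the Docker image. Cells are stored in the nbformat v4 JSON layout so the result can be opened in Jupyter.

```python
from __future__ import annotations

import contextlib
import io
import json
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Step:
    """One action from the policy: a code cell to run, or a final answer."""
    code: Optional[str] = None
    answer: Optional[str] = None


# A policy maps (prompt, current notebook) to the next step.
Policy = Callable[[str, dict], Step]


def _cell(cell_type: str, source: str) -> dict:
    """Create a cell in nbformat v4 JSON layout."""
    cell = {"cell_type": cell_type, "id": uuid.uuid4().hex[:8], "metadata": {}, "source": source}
    if cell_type == "code":
        cell.update(execution_count=None, outputs=[])
    return cell


def run_analysis(prompt: str, dataset_files: list[str], policy: Policy, max_steps: int = 20):
    """Grow a notebook one cell at a time until the policy returns an answer."""
    nb = {"cells": [_cell("markdown", f"# Question\n\n{prompt}")],
          "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
    # Shared namespace so later cells see earlier variables, like a kernel would.
    namespace: dict = {"DATA_FILES": list(dataset_files)}
    for count in range(1, max_steps + 1):
        step = policy(prompt, nb)
        if step.answer is not None:
            return nb, step.answer
        if step.code is None:
            raise ValueError("policy returned neither code nor an answer")
        outputs: list[dict] = []
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(step.code, namespace)  # no sandboxing: illustration only
        except Exception as exc:
            # Record the error so the policy can see it and try a fix.
            outputs.append({"output_type": "error", "ename": type(exc).__name__,
                            "evalue": str(exc), "traceback": []})
        if buffer.getvalue():
            outputs.insert(0, {"output_type": "stream", "name": "stdout", "text": buffer.getvalue()})
        cell = _cell("code", step.code)
        cell.update(execution_count=count, outputs=outputs)
        nb["cells"].append(cell)
    return nb, None  # step budget exhausted without an answer


def save_notebook(nb: dict, path: str | Path) -> Path:
    """Write the notebook as .ipynb JSON, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(nb, indent=1))
    return path


def scripted_policy(prompt: str, nb: dict) -> Step:
    """Toy stand-in for an LLM: two fixed cells, then answer from the last output."""
    n_code = sum(cell["cell_type"] == "code" for cell in nb["cells"])
    if n_code == 0:
        return Step(code="values = [3, 1, 2]")
    if n_code == 1:
        return Step(code="print(sum(values))")
    return Step(answer=nb["cells"][-1]["outputs"][0]["text"].strip())


if __name__ == "__main__":
    notebook, answer = run_analysis("What is the sum of the values?", ["data.csv"], scripted_policy)
    print(answer)  # prints 6
```

Replacing `scripted_policy` with a function that sends the prompt and the current notebook to an LLM gives the basic shape of an iterative analysis loop. Recording exceptions as error outputs, instead of aborting, lets the policy inspect a failed cell and propose a corrected one.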
Results are saved in the output directory specified in your configuration file.
Note that the dataset and environment configuration must be updated to match your own data and setup. For an example, see `dataset.py`, which includes the capsule dataset configuration used for the BixBench benchmark.
We also recommend visiting the BixBench repository, where we share a full evaluation harness for the agent.
Data Analysis Crow was used to produce the trajectories for the BixBench benchmark, which evaluates AI agents on real-world bioinformatics tasks.
BixBench tests AI agents' ability to:
- Explore biological datasets
- Perform long, multi-step computational analyses
- Interpret nuanced results in the context of a research question
You can find the BixBench dataset on Hugging Face, along with the accompanying paper and blog post.
To use this agent for BixBench evaluations, we recommend visiting the BixBench repository for more details.