This Research Agent is a tool that automates gathering and distilling data into a report using a variety of APIs and web-scraping (LangChain) tools.
It generates research outlines, converts them into Directed Acyclic Graphs (DAGs), scrapes webpages, and interacts with APIs (such as Brave, You.com, and Exa) to answer each question in the outline. You can also add your own tools to connect additional data sources.
Consider this a draft. There is potential to improve both the repository and the agent itself. Contributions are welcome!
Breadth vs. depth: The research agent works well for topics that are widely covered on the internet, such as US visa options or key metrics for SaaS companies. It is less effective for narrow, in-depth research that requires digging into a few specialized sources.
Planning: The planning step includes no reflection and no feedback from the user. The recursive decomposition strategy from the ADaPT paper could be a valid approach here.
Watch our demo to see the Research Agent in action, from generating research outlines to summarizing comprehensive responses.
To get started with this research agent, follow these steps:
1. Clone the repository and set up a virtual environment:

   ```bash
   # Clone the repository
   git clone https://github.com/z33pX/research-agent.git

   # Navigate to the project directory
   cd research-agent

   # Set up a virtual environment
   python3 -m venv venv

   # Activate the virtual environment and install dependencies
   source venv/bin/activate && pip install -r requirements.txt

   # On Windows, use the following instead of the command above:
   # venv\Scripts\activate && pip install -r requirements.txt
   ```

2. Rename the `env` file to `.env` and fill in the necessary keys:

   ```bash
   mv env .env
   # Open the .env file and add your keys
   ```

3. Run the agent:

   ```bash
   python app.py
   ```
The workflow starts with a user prompt, which an LLM processes to generate an outline and a directed acyclic graph (DAG) of research topics (questions and their sub-questions). The task scheduler then executes a task for each topic: it collects and stores content using various search tools (LangChain plug-and-play), evaluates its relevance, and generates additional research topics if more content is needed. After extracting cliff notes for each topic, the agent combines and summarizes them into one concise summary.
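To make the DAG step concrete, a generated graph might look like the following. This is a hypothetical sketch of the schema, not the repository's actual JSON format; the key idea is that each node is a research question and `depends_on` edges tell the scheduler which questions can run in parallel.

```python
# Hypothetical example of an outline converted into a JSON DAG.
# The field names here are illustrative; the repository's schema may differ.
dag = {
    "topic": "Key metrics for SaaS companies",
    "tasks": [
        {"id": "t1", "question": "What are the standard SaaS growth metrics?", "depends_on": []},
        {"id": "t2", "question": "How is net revenue retention calculated?", "depends_on": ["t1"]},
        {"id": "t3", "question": "What are typical churn benchmarks?", "depends_on": ["t1"]},
    ],
}

# Once t1 is done, every task whose dependencies are satisfied becomes ready,
# so t2 and t3 can be executed in parallel:
done = {"t1"}
ready = [t["id"] for t in dag["tasks"] if t["id"] not in done and set(t["depends_on"]) <= done]
print(ready)  # ['t2', 't3']
```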
- User Prompt: The user initiates the process by providing a prompt.
- Context Window: The user prompt and system instructions are fed into the LLM to generate an outline.
- LLM - Generate Outline: The LLM creates a research outline based on the user prompt.
- Generate DAG: The outline is then converted into a JSON DAG by another LLM.
- Task Scheduler: The JSON DAG is processed by the task scheduler, which executes tasks in an optimized way (in parallel when possible, otherwise sequentially).
- Execute Task: Each task from the DAG is executed.
- Result Handling: Results are either collected from existing content or new content is gathered.
- Load Existing Content: Checks if content from previous tasks can be used.
- Collect Content: If new content is needed, it is collected, processed and added to the state and database.
- Pick Tools to Get Information: The LLM selects the appropriate tools for information gathering.
- Scrape and Store: Tools are executed to scrape webpages and retrieve data from APIs. Then, the content is stored.
- Scrape and Retrieve: Various tools like YouComSearch, SimilarWebSearch, ExaCompanySearch, and NewsSearch are used to gather content.
- Store: The collected content is stored to be retrieved later by subsequent tasks or other research jobs.
- Decide Relevancy: The LLM evaluates the relevance of the collected content.
- Extract 30+ Notes: Relevant content is processed to extract useful notes.
- More Content Needed: Checks if additional content is required. If yes, the process loops back to content collection.
- LLM - Summarize Task Result: The final results are summarized and concatenated, providing the user with a comprehensive research output.
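The scheduling logic in the steps above can be sketched as a simple topological executor that runs each "wave" of ready tasks in parallel. This is a minimal sketch under assumed data shapes; `execute_task` and the task dict format are placeholders, not the repository's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, execute_task):
    """Execute DAG tasks in parallel waves: each wave contains every task
    whose dependencies have already completed (parallel when possible,
    sequential otherwise)."""
    done, results = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            ready = [t for t in tasks
                     if t["id"] not in done and set(t["depends_on"]) <= done]
            if not ready:
                raise ValueError("cycle or unsatisfiable dependency in DAG")
            # Run the whole wave concurrently, then mark it complete.
            for task, result in zip(ready, pool.map(execute_task, ready)):
                results[task["id"]] = result
                done.add(task["id"])
    return results
```

A task with no dependencies runs in the first wave; dependent tasks wait until every prerequisite's result is available, which is where "Load Existing Content" can reuse earlier output.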
This tool searches for news on Google using the Serper API by first selecting relevant links with an LLM. It then scrapes the selected websites for content, which is processed by the LLM to generate a response.
This tool searches for information using the You.com Search Engine by first selecting relevant results. It then processes these results using an LLM, which generates a response based on the user prompt and contextual information.
This tool retrieves relevant data from SimilarWeb based on the user prompt. For example, if the prompt "Get web traffic for Tesla" does not include the relevant domain, the tool first searches for related domains using Brave Search. After finding the domain, it inserts it into a SimilarWeb URL, scrapes the page using Zyte, and provides the content to GPT-4o for summarization before delivering the final response to the user.
This tool processes user queries by first using Exa Company Search to find relevant company information. It then scrapes the identified websites for content, which is subsequently processed by an LLM to generate a response for the user.
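Because the tools are plug-and-play, connecting a new data source mostly means wrapping it in the same shape as the tools above: fetch raw content for a research question, then let the LLM distill it into a response. The sketch below is illustrative only; the class name, `ArxivSearch`, and its interface are hypothetical, not the repository's actual tool contract.

```python
class ArxivSearch:
    """Hypothetical custom tool following the same pattern as
    YouComSearch or NewsSearch: retrieve raw content, then have
    the LLM generate a response from it."""

    name = "arxiv_search"
    description = "Searches arXiv for academic papers on a topic."

    def __init__(self, llm, http_get):
        self.llm = llm            # callable: prompt -> str
        self.http_get = http_get  # callable: url -> str (e.g. a requests wrapper)

    def run(self, question: str) -> str:
        # 1. Retrieve raw content for the research question.
        raw = self.http_get(
            f"https://export.arxiv.org/api/query?search_query={question}"
        )
        # 2. Let the LLM distill the raw content into an answer.
        return self.llm(f"Answer '{question}' using:\n{raw}")
```

Swapping in a different backend (an internal wiki, a database, another search API) only changes step 1; the relevance check and note extraction downstream stay the same.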
We welcome contributions to enhance this project, whether it's adding new tools, fixing bugs, or improving documentation. Please fork the repository and create a pull request with your improvements.
This project is licensed under the MIT License, allowing for free use, modification, and distribution. See the LICENSE file for details.
For any questions or suggestions, please contact me on X (Twitter). For practical AI news, including hands-on tutorials, tools, and research, check out Tech-Optimist.