Add code search tool with RAG capabilities #34

Open: wants to merge 7 commits into base: main

@neubig (Contributor) commented Dec 22, 2024

Description

This PR adds a new code search tool that uses Retrieval-Augmented Generation (RAG) to enable semantic code search across repositories.

Features

  • Semantic code search using sentence transformers (configurable via env var)
  • Fast similarity search using FAISS
  • Support for indexing any git repository with configurable file extensions
  • Save/load functionality for search indices
  • Comprehensive error handling and test coverage
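
To make the retrieval step concrete, here is a minimal sketch of similarity-based search. It is not the code in this PR. To stay dependency-free it swaps in a toy bag-of-words embedding and a brute-force cosine scan where the PR uses Sentence Transformers and FAISS. The names `embed`, `cosine`, `search`, and the sample documents are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(index: list[tuple[str, Counter]], query: str, k: int = 5):
    """Return the k (path, score) pairs most similar to the query."""
    q = embed(query)
    scored = [(path, cosine(q, vec)) for path, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical documents: path -> source text.
documents = {
    "editor.py": "def edit_file(path): open the file and replace text",
    "http.py": "def handle_request(request): parse http request headers",
}
index = [(path, embed(text)) for path, text in documents.items()]
results = search(index, "function that handles http request", k=1)
```

A dense embedding model would also match paraphrases that share no words with the code, which this word-count stand-in cannot do. The ranking step (embed the query, score every document, keep the top `k`) has the same shape either way.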

Implementation Details

  • Added new dependencies: PyTorch CPU, Sentence Transformers, FAISS
  • New module structure:
    • code_search/core.py: Core indexing and search functionality
    • code_search/tools.py: High-level tool functions
    • Tests in tests/test_code_search.py

Example Usage

# Initialize search for a repository
result = initialize_code_search(
    repo_path="/path/to/repo",
    save_dir="/path/to/save",
    extensions=[".py"],  # optional
    embedding_model="BAAI/bge-base-en-v1.5"  # optional
)

# Search code
result = search_code(
    save_dir="/path/to/save",
    query="function that handles HTTP requests",
    k=5  # number of results
)
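
The `extensions` and `save_dir` arguments above suggest two steps: gathering matching files and persisting what was indexed. The following is a rough sketch of how those steps could look, assuming a pickle file named `documents.pkl` (the name mirrors the file mentioned in the review threads on this PR; the actual on-disk format is not shown here). `collect_files`, `save_documents`, and `load_documents` are hypothetical helpers, not functions from this PR.

```python
import pickle
import tempfile
from pathlib import Path


def collect_files(repo_path: str, extensions: list[str]) -> dict[str, str]:
    """Map relative path -> file text for files with a matching suffix."""
    root = Path(repo_path)
    documents = {}
    for path in sorted(root.rglob("*")):
        if ".git" in path.relative_to(root).parts:
            continue  # skip git metadata
        if path.is_file() and path.suffix in extensions:
            documents[path.relative_to(root).as_posix()] = path.read_text(
                encoding="utf-8", errors="ignore"
            )
    return documents


def save_documents(documents: dict[str, str], save_dir: str) -> Path:
    """Write the collected documents to save_dir/documents.pkl."""
    out = Path(save_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "documents.pkl"
    with target.open("wb") as fh:
        pickle.dump(documents, fh)
    return target


def load_documents(save_dir: str) -> dict[str, str]:
    """Read the documents back; only load pickle files you trust."""
    with (Path(save_dir) / "documents.pkl").open("rb") as fh:
        return pickle.load(fh)


# Demo on a throwaway directory with one matching and one ignored file.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    repo.mkdir()
    (repo / "app.py").write_text("print('hello')\n", encoding="utf-8")
    (repo / "notes.md").write_text("# notes\n", encoding="utf-8")
    save_documents(collect_files(str(repo), [".py"]), str(Path(tmp) / "index"))
    restored = load_documents(str(Path(tmp) / "index"))
```

Unpickling can execute arbitrary code, so a saved index of this kind should only be loaded from a trusted location.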

Testing

  • Added comprehensive unit tests
  • Tested on the openhands-aci repository itself with good results
  • Example search results for "code that handles file editing":
    File: openhands_aci/editor/__init__.py
    Score: 0.727
    ...
    

Notes

  • The embedding model can be configured through an environment variable
  • All tests are passing
  • Documentation included in code
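
For the environment-variable note above, a possible resolution order is sketched below. The variable name `CODE_SEARCH_EMBEDDING_MODEL` is hypothetical, since this description does not name the real one, and treating the model from the usage example as the default is likewise an assumption.

```python
import os
from typing import Optional

# Hypothetical variable name; the PR description does not state the real one.
ENV_VAR = "CODE_SEARCH_EMBEDDING_MODEL"
# Model name taken from the usage example; using it as the default is an assumption.
DEFAULT_MODEL = "BAAI/bge-base-en-v1.5"


def resolve_embedding_model(explicit: Optional[str] = None) -> str:
    """Pick the model: explicit argument first, then env var, then default."""
    if explicit:
        return explicit
    return os.environ.get(ENV_VAR, DEFAULT_MODEL)
```

Letting an explicit argument win over the environment keeps per-call overrides predictable, while the variable still changes the default for a whole deployment.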

@neubig changed the title from "[AI Generated] Add code search tool with RAG capabilities" to "Add code search tool with RAG capabilities" on Dec 22, 2024
Resolved review threads:
  • code_search_index/index.faiss (outdated)
  • code_search_index/documents.pkl (outdated)
  • pyproject.toml
openhands-agent and others added 4 commits December 22, 2024 16:43
- Remove binary index files and add to .gitignore
- Reorganize dependencies into optional groups:
  - code-search: sentence-transformers and faiss-cpu
  - pytorch-cpu: PyTorch CPU version
  - pytorch: Default PyTorch version
@neubig neubig marked this pull request as ready for review December 22, 2024 16:58
3 participants