Polish search API, frontend and backend #28
Merged
30 commits
a83106c first working with debugger (rbs333)
16f6817 first decent cards (rbs333)
9ece6b8 FE checkpoint (rbs333)
65d6281 remove old backend, pydantic update (rbs333)
405569a wip working tests (rbs333)
7ea0d5c update fonts (rbs333)
28d81ee FE updates (rbs333)
19ad80d fix warning and reload (rbs333)
7a0898d lint, mypy, etc (rbs333)
a8fc337 ls (rbs333)
2b3ae1f cwd (rbs333)
4e4ec35 actually run tests (rbs333)
45d229d add env back (rbs333)
f8d666d fix typo (rbs333)
2cfcd21 try localhost (rbs333)
6e0713f yet another type fix (rbs333)
328624f fix index init (rbs333)
20437da try, try again (rbs333)
f1adb39 wip (rbs333)
5fc9b52 first working with docker-compose (rbs333)
0fac771 by_text working (rbs333)
d7f309b cleanup before push (rbs333)
ec4368e add s3 bit (rbs333)
cafde63 remove unneeded, modify readme, modify test (rbs333)
434b181 update function (rbs333)
29172b3 more cleanup (rbs333)
463bdbe fix circular import (rbs333)
e97f4a6 add templates for test (rbs333)
78550eb remove pin (rbs333)
041913b path thing (rbs333)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
name: Test Suite | ||
|
||
on: | ||
pull_request: | ||
branches: | ||
- main | ||
|
||
push: | ||
branches: | ||
- main | ||
|
||
jobs: | ||
test: | ||
name: Python ${{ matrix.python-version }} - ${{ matrix.connection }} [redis-stack ${{matrix.redis-stack-version}}] | ||
runs-on: ubuntu-latest | ||
|
||
strategy: | ||
fail-fast: false | ||
matrix: | ||
python-version: ["3.11"] | ||
redis-stack-version: ['latest'] | ||
|
||
services: | ||
redis: | ||
image: redis/redis-stack-server:${{matrix.redis-stack-version}} | ||
ports: | ||
- 6379:6379 | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Set up Python ${{ matrix.python-version }} | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: ${{ matrix.python-version }} | ||
cache: 'pip' | ||
|
||
- name: Install Poetry | ||
uses: snok/install-poetry@v1 | ||
|
||
- name: Install dependencies | ||
working-directory: ./backend | ||
run: | | ||
poetry install --all-extras | ||
|
||
- name: Run tests | ||
env: | ||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
COHERE_API_KEY: ${{ secrets.COHERE_API_KEY }} | ||
working-directory: ./backend | ||
run: | | ||
poetry run test |
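The `Run tests` step above passes `OPENAI_API_KEY` and `COHERE_API_KEY` from repository secrets. Secrets are generally not exposed to workflows triggered by pull requests from forks, so those variables can arrive empty. A minimal sketch of how a test suite might detect that case, assuming a hypothetical helper `missing_provider_keys` that is not part of this PR:

```python
import os
from typing import Mapping, Optional

# Variable names copied from the workflow's `env` block.
PROVIDER_KEYS = ("OPENAI_API_KEY", "COHERE_API_KEY")


def missing_provider_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the provider key names that are unset or blank in `env`.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if not env.get(name, "").strip()]
```

A test that needs a given provider could call `pytest.skip` when that provider's key appears in the returned list, rather than failing on an authentication error.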
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,17 @@ | ||
arxiv-metadata-oai-snapshot.json | ||
arxiv-papers-1000.json | ||
arxiv.zip | ||
*.DS_STORE | ||
*.log | ||
.env | ||
.ipynb_checkpoints | ||
*.pkl | ||
.venv | ||
venv | ||
__pycache__ | ||
new_backend/arxivsearch/templates/ | ||
*/.nvm | ||
.coverage* | ||
coverage.* | ||
htmlcov/ | ||
legacy-data/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
{ | ||
// Use IntelliSense to learn about possible attributes. | ||
// Hover to view descriptions of existing attributes. | ||
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387 | ||
"version": "0.2.0", | ||
"configurations": [ | ||
{ | ||
"name": "Python Debugger: FastAPI", | ||
"type": "debugpy", | ||
"cwd": "${workspaceFolder}/backend/", | ||
"env": { | ||
"PYTHONPATH": "${cwd}" | ||
}, | ||
"request": "launch", | ||
"module": "uvicorn", | ||
"args": [ | ||
"arxivsearch.main:app", | ||
"--port=8888", | ||
"--reload" | ||
], | ||
"jinja": true, | ||
} | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{ | ||
"python.testing.pytestArgs": [ | ||
"backend" | ||
], | ||
"python.testing.unittestEnabled": false, | ||
"python.testing.pytestEnabled": true, | ||
"python.testing.cwd": "${workspaceFolder}/backend/", | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,38 +1,47 @@ | ||
- FROM node:18.8-alpine AS ReactImage | ||
+ FROM node:22.0.0 AS ReactImage | ||
|
||
WORKDIR /app/frontend | ||
|
||
ENV NODE_PATH=/app/frontend/node_modules | ||
ENV PATH=$PATH:/app/frontend/node_modules/.bin | ||
|
||
COPY ./frontend/package.json ./ | ||
- RUN yarn install --no-optional | ||
+ RUN npm install | ||
|
||
ADD ./frontend ./ | ||
- RUN yarn build | ||
+ RUN npm run build | ||
|
||
|
||
- FROM python:3.9-slim-buster AS ApiImage | ||
+ FROM python:3.11 AS ApiImage | ||
|
||
ENV PYTHONUNBUFFERED 1 | ||
ENV PYTHONDONTWRITEBYTECODE 1 | ||
|
||
RUN python3 -m pip install --upgrade pip setuptools wheel | ||
|
||
WORKDIR /app/ | ||
COPY ./data/ ./data | ||
VOLUME [ "/data" ] | ||
|
||
RUN apt-get update && \ | ||
apt-get install -y curl && \ | ||
rm -rf /var/lib/apt/lists/* | ||
|
||
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=/opt/poetry python && \ | ||
cd /usr/local/bin && \ | ||
ln -s /opt/poetry/bin/poetry && \ | ||
poetry config virtualenvs.create false | ||
|
||
RUN mkdir -p /app/backend | ||
|
||
# copy deps first so we don't have to reload everytime | ||
COPY ./backend/poetry.lock ./backend/pyproject.toml ./backend/ | ||
|
||
WORKDIR /app/backend | ||
RUN poetry install --all-extras --no-interaction | ||
|
||
COPY ./backend/ . | ||
RUN pip install -e . --no-cache-dir | ||
|
||
# add static react files to fastapi image | ||
COPY --from=ReactImage /app/frontend/build /app/backend/arxivsearch/templates/build | ||
|
||
LABEL org.opencontainers.image.source https://github.com/RedisVentures/redis-arxiv-search | ||
|
||
WORKDIR /app/backend/arxivsearch | ||
|
||
- CMD ["sh", "./entrypoint.sh"] | ||
+ CMD ["poetry", "run", "start-app"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,12 @@ | ||
|
||
Review comment: Is the rest of this README up to date with instructions to run the service?
Reply: It is now. |
||
<div align="center"> | ||
- <a href="https://github.com/RedisVentures/redis-arXiv-search"><img src="https://github.com/RedisVentures/redis-arXiv-search/blob/main/backend/arxivsearch/data/redis-logo.png?raw=true" width="30%"><img></a> | ||
+ <a href="https://github.com/RedisVentures/redis-arXiv-search"><img src="https://redis.io/wp-content/uploads/2024/04/Logotype.svg?raw=true" width="30%"><img></a> | ||
<br /> | ||
<br /> | ||
<div display="inline-block"> | ||
<a href="https://docsearch.redisvl.com"><b>Hosted Demo</b></a> | ||
<a href="https://github.com/RedisVentures/redis-arXiv-search"><b>Code</b></a> | ||
<a href="https://github.com/redis-developer/redis-ai-resources"><b>More AI Recipes</b></a> | ||
<a href="https://datasciencedojo.com/blog/ai-powered-document-search/"><b>Blog Post</b></a> | ||
<a href="https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/"><b>Redis Vector Search Documentation</b></a> | ||
</div> | ||
|
@@ -16,15 +17,14 @@ | |
# 🔎 Redis arXiv Search | ||
*This repository is the official codebase for the arxiv paper search app hosted at: **https://docsearch.redisvl.com*** | ||
|
||
|
||
[Redis](https://redis.com) is a highly performant, production-ready vector database, which can be used for many types of applications. Here we showcase Redis vector search applied to a document retrieval use case. Read more about AI-powered search in [the technical blog post](https://datasciencedojo.com/blog/ai-powered-document-search/) published by our partners, *[Data Science Dojo](https://datasciencedojo.com)*. | ||
|
||
### Dataset | ||
|
||
The arXiv papers dataset was sourced from the following [Kaggle link](https://www.kaggle.com/Cornell-University/arxiv). arXiv is commonly used for scientific research in a variety of fields. Exposing a semantic search layer enables natural human language to be used to discover relevant papers. | ||
|
||
|
||
![Demo](data/assets/arXivSearch.png) | ||
|
||
## Application | ||
|
||
This app was built as a Single Page Application (SPA) with the following components: | ||
|
@@ -39,9 +39,46 @@ This app was built as a Single Page Application (SPA) with the following compone | |
- **[React-Bootstrap](https://react-bootstrap.github.io/)** for some UI elements | ||
- **[Huggingface](https://huggingface.co/sentence-transformers)**, **[OpenAI](https://platform.openai.com)**, and **[Cohere](https://cohere.com)** for vector embedding creation | ||
|
||
- Some inspiration was taken from this [Cookiecutter project](https://github.com/Buuntu/fastapi-react) | ||
+ Some inspiration was taken from this [tiangolo/full-stack-fastapi-template](https://github.com/tiangolo/full-stack-fastapi-template) | ||
and turned into a SPA application instead of a separate front-end server approach. | ||
|
||
### General Project Structure | ||
|
||
``` | ||
/backend | ||
/arxivsearch | ||
/api | ||
/routes | ||
papers.py # primary paper search logic lives here | ||
/db | ||
load.py # seeds Redis DB | ||
redis_helpers.py # redis util | ||
/schema | ||
# pydantic models for serialization/validation from API | ||
/tests | ||
/utils | ||
config.py | ||
spa.py # logic for serving compiled react project | ||
main.py # entrypoint | ||
/frontend | ||
/public | ||
# index, manifest, logos, etc. | ||
/src | ||
/config | ||
/styles | ||
/views | ||
# primary components live here | ||
|
||
api.ts # logic for connecting with BE | ||
App.tsx # project entry | ||
Routes.tsx # route definitions | ||
... | ||
/data | ||
# folder mounted as volume in Docker | ||
# load script auto populates initial data from S3 | ||
|
||
``` | ||
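The tree above lists `spa.py` as the logic for serving the compiled React project. The PR's implementation is not shown in this view; the following is only a minimal sketch of the usual SPA fallback rule, assuming a hypothetical helper `resolve_spa_path` and a build directory that contains `index.html`:

```python
from pathlib import Path


def resolve_spa_path(build_dir: Path, request_path: str) -> Path:
    """Pick the file to serve for `request_path` from a compiled React build.

    Existing static assets are served as-is; anything else falls back to
    index.html so the client-side router can handle the URL.
    """
    root = build_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Reject paths that escape the build directory (e.g. "../secrets").
    inside_build = candidate == root or root in candidate.parents
    if inside_build and candidate.is_file():
        return candidate
    return root / "index.html"
```

A FastAPI catch-all route could pass the returned path to `FileResponse`; that wiring is an assumption and is left out here.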
|
||
### Embedding Providers | ||
Embeddings represent the semantic properties of the raw text and enable vector similarity search. This application supports `HuggingFace`, `OpenAI`, and `Cohere` embeddings out of the box. | ||
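To make "vector similarity search" concrete, here is a small pure-Python sketch of ranking documents by cosine similarity. In the app itself Redis performs the search; the function names and the toy two-dimensional vectors below are illustrative assumptions, not real embeddings and not code from this repository:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], docs: dict[str, Sequence[float]]
) -> list[str]:
    """Return document ids ordered from most to least similar to `query`."""
    return sorted(
        docs, key=lambda doc_id: cosine_similarity(query, docs[doc_id]), reverse=True
    )


# Toy vectors standing in for paper embeddings.
papers = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
ranking = rank_by_similarity([1.0, 0.1], papers)
```

Real embeddings from the providers above have hundreds or thousands of dimensions, but the ranking idea is the same.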
|
||
|
@@ -99,22 +136,33 @@ $ docker compose -f docker-local-redis.yml up | |
|
||
|
||
## Customizing (optional) | ||
- You can use the provided Jupyter Notebook in the [`data/`](data/README.md) directory to create paper embeddings and metadata. The output JSON files will end up stored in the `data/` directory and used when creating your own container. | ||
- Use the `./build.sh` script to build your own docker image based on the application source code and dataset changes. | ||
- If you want to use K8s instead of Docker Compose, we have some [resources to help you get started](k8s/README.md). | ||
|
||
### Run local redis with Docker | ||
```bash | ||
docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest | ||
``` | ||
|
||
### FastAPI with Poetry | ||
To run the backend locally: | ||
|
||
1. `cd backend` | ||
2. `poetry install` | ||
3. `poetry run start-app` | ||
|
||
*poetry run start-app runs the initial DB load script and launches the API.* | ||
|
||
### React Dev Environment | ||
It's typically easier to build the front end in an interactive environment, testing changes in real time. | ||
|
||
1. Deploy the app using steps above. | ||
- 2. Install packages (you may need to use `npm` to install `yarn`) | ||
+ 2. Install packages | ||
```bash | ||
$ cd frontend/ | ||
- $ yarn install --no-optional | ||
+ $ npm install | ||
```` | ||
- 4. Use `yarn` to serve the application from your machine | ||
+ 4. Use `npm` to serve the application from your machine | ||
```bash | ||
- $ yarn start | ||
+ $ npm run start | ||
``` | ||
5. Navigate to `http://localhost:3000` in a browser. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Python | ||
__pycache__ | ||
app.egg-info | ||
*.pyc | ||
.mypy_cache | ||
.coverage | ||
htmlcov | ||
.venv |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
from fastapi import APIRouter | ||
|
||
from arxivsearch.api.routes import papers | ||
|
||
api_router = APIRouter() | ||
api_router.include_router(papers.router, prefix="/papers", tags=["papers"]) |
I do like checking in launch scripts in case people need help setting up a debugger.