Add Beam for serverless /ingest #223

Merged: 39 commits, Mar 12, 2024

9f41df9
Fully working initial work - in production for webscrape
KastanDay Feb 22, 2024
6d5eba4
cleanup
KastanDay Feb 22, 2024
4a3cf55
Add retries to ingest
KastanDay Feb 22, 2024
ade5ab1
Change scaling strategy
KastanDay Feb 22, 2024
da26920
Change scaling strategy
KastanDay Feb 22, 2024
aa70cc5
trunk improvement -- add isort
KastanDay Feb 22, 2024
316b1bd
disable timeouts, set max replicas to 10 (max we get for free)
KastanDay Feb 22, 2024
c6d5aed
Reduce autoscaling to improve costs. Set workers from 1 to 2.
KastanDay Mar 5, 2024
8cce02b
Remove all ingest code from vector_database.py
KastanDay Mar 5, 2024
2f30285
Clean up and format all code
KastanDay Mar 5, 2024
cc27a28
Revert "Clean up and format all code"
KastanDay Mar 5, 2024
bb6c41a
Merge branch 'main' into add_beam_serverless_ingest
KastanDay Mar 5, 2024
80a1119
Clean up requirements.txt, removed all related to ingest
KastanDay Mar 5, 2024
9460933
Add detailed posthog logging for /ingest failures
KastanDay Mar 6, 2024
44e9c11
Add nomic logging and deleting back to /ingest
KastanDay Mar 6, 2024
3281c70
Add nomic dependency
KastanDay Mar 6, 2024
ea9c432
Major Refactor introducing Dependency Injection
rohan-uiuc Mar 7, 2024
f0542a8
Move utils_tokenization to utils
rohan-uiuc Mar 7, 2024
f40d533
Merge branch 'add_beam_serverless_ingest' of github.com:UIUC-Chatbot/…
rohan-uiuc Mar 7, 2024
f6a787e
Add Flask-Injector to dependencies
rohan-uiuc Mar 7, 2024
a14cc44
Added executors for async operations
rohan-uiuc Mar 7, 2024
ad220a6
Adding injection to ExportService __init__, and add SQLDatabase injec…
rohan-uiuc Mar 7, 2024
8451a6b
fix posthog error logs
KastanDay Mar 7, 2024
f6615b1
Increase workers to 3, remove callback url for now
KastanDay Mar 7, 2024
db076a7
Add nomic logging to Beam ingest
KastanDay Mar 7, 2024
484b2a2
Fix sentry service instantiation issue in ExportService constructor
rohan-uiuc Mar 7, 2024
016e48b
Clean up env vars and minor type errors
KastanDay Mar 8, 2024
3017603
Clean up trunk check recommendation
KastanDay Mar 8, 2024
81fc4ef
Reduce workers from 6 to 3, should be more than enough with reduced r…
KastanDay Mar 8, 2024
5951fe4
Update OpenAI API type to be fetched from environment variable
rohan-uiuc Mar 8, 2024
ae00694
Reduce threads from 1_000 to 100. more sensible
KastanDay Mar 8, 2024
75251d1
Update posthog /getTopContexts name in so we can track impovements fr…
KastanDay Mar 8, 2024
f5a36fc
Add new method to SQLDatabase, update environment variable usage, som…
rohan-uiuc Mar 8, 2024
3abc7eb
Increase workers from 3 to 4
KastanDay Mar 8, 2024
8a17e3d
Fix deletion bug in retrieval_service to check for materials before d…
rohan-uiuc Mar 8, 2024
107eef5
Fix mimetype parsing for ValueError: not enough values to unpack (exp…
KastanDay Mar 12, 2024
d6976fe
Remove ffmpeg and tesseract-ocr, no ingest here anymore
KastanDay Mar 12, 2024
cf5e4df
Merge branch 'add_beam_serverless_ingest' of github.com:UIUC-Chatbot/…
rohan-uiuc Mar 12, 2024
bea873d
Merge pull request #228 from UIUC-Chatbot/dependency_injection
rohan-uiuc Mar 12, 2024
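The message of commit 107eef5 is truncated, so its actual fix is not visible on this page. As a hedged illustration only: the `ValueError` it names is exactly what Python raises when a two-target unpack of `str.split("/")` meets a MIME string with no "/". For example, `mimetypes.guess_type` returns `None` for an unknown extension, and a malformed upstream value can lack the slash entirely. The helper `split_mimetype` below is hypothetical and is not the project's code.

```python
from typing import Optional, Tuple


def split_mimetype(mimetype: Optional[str]) -> Tuple[str, str]:
    """Split a MIME type such as "application/pdf" into (maintype, subtype).

    Hypothetical helper. Returns ("", "") for None or empty input, and an
    empty subtype when the value has no "/", instead of raising.
    """
    if not mimetype:
        return "", ""
    # Drop parameters such as "; charset=utf-8" before splitting.
    essence = mimetype.split(";", 1)[0].strip().lower()
    # partition() always returns a 3-tuple, so this unpack cannot fail.
    maintype, _, subtype = essence.partition("/")
    return maintype, subtype


if __name__ == "__main__":
    # The fragile pattern: a two-target unpack of str.split("/").
    try:
        maintype, subtype = "pdf".split("/")
    except ValueError as err:
        print(err)  # not enough values to unpack (expected 2, got 1)

    print(split_mimetype("text/plain; charset=utf-8"))  # ('text', 'plain')
    print(split_mimetype("pdf"))                        # ('pdf', '')
```

Using `str.partition` instead of `str.split` makes the shape of the result independent of the input. The caller can then decide how to treat an empty subtype, for example by logging it rather than crashing the ingest job.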
2 changes: 1 addition & 1 deletion .env.template
@@ -5,7 +5,7 @@ SUPABASE_READ_ONLY=
 SUPABASE_JWT_SECRET=

 MATERIALS_SUPABASE_TABLE=uiuc_chatbot
-NEW_NEW_NEWNEW_MATERIALS_SUPABASE_TABLE=documents
+SUPABASE_DOCUMENTS_TABLE=documents

 # QDRANT
 QDRANT_COLLECTION_NAME=uiuc-chatbot
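The `.env.template` change renames the documents-table variable to `SUPABASE_DOCUMENTS_TABLE`. The sketch below shows one way application code could read it. It assumes that the template's `KEY=value` lines supply defaults and that real environment variables override them. The project may well use a library such as python-dotenv for this. `parse_env_lines` is a hypothetical stand-in, and `TEMPLATE` is only an excerpt of the post-change template lines shown above.

```python
import os
from typing import Dict, Iterable


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=value lines from a .env-style template (hypothetical helper).

    Blank lines and '#' comments are skipped; empty values such as
    "SUPABASE_JWT_SECRET=" are kept as empty strings.
    """
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


# Excerpt of the template after this PR (not the whole file).
TEMPLATE = """\
SUPABASE_JWT_SECRET=

MATERIALS_SUPABASE_TABLE=uiuc_chatbot
SUPABASE_DOCUMENTS_TABLE=documents

# QDRANT
QDRANT_COLLECTION_NAME=uiuc-chatbot
"""

defaults = parse_env_lines(TEMPLATE.splitlines())

# A value set in the real process environment takes precedence over the template default.
documents_table = os.environ.get(
    "SUPABASE_DOCUMENTS_TABLE", defaults["SUPABASE_DOCUMENTS_TABLE"]
)
```

If `SUPABASE_DOCUMENTS_TABLE` is unset in the process environment, `documents_table` falls back to `"documents"`. Code that still looked up the old `NEW_NEW_NEWNEW_MATERIALS_SUPABASE_TABLE` name would find no entry in a freshly copied template.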
1 change: 1 addition & 0 deletions .trunk/.gitignore
@@ -6,3 +6,4 @@
 plugins
 user_trunk.yaml
 user.yaml
+tmp
24 changes: 15 additions & 9 deletions .trunk/trunk.yaml
@@ -2,12 +2,12 @@
 # To learn more about the format of this file, see https://docs.trunk.io/reference/trunk-yaml
 version: 0.1
 cli:
-  version: 1.18.0
+  version: 1.20.1
 # Trunk provides extensibility via plugins. (https://docs.trunk.io/plugins)
 plugins:
   sources:
     - id: trunk
-      ref: v1.3.0
+      ref: v1.4.3
       uri: https://github.com/trunk-io/plugins
 # Many linters and tools depend on runtimes - configure them here. (https://docs.trunk.io/runtimes)
 runtimes:
@@ -18,20 +18,26 @@ runtimes:
# This is the section where you manage your linters. (https://docs.trunk.io/check/configuration)
# - [email protected] # too sensitive, causing failures that make devs skip checks.
lint:
disabled:
- black
enabled:
# - [email protected]
# - [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected]
- [email protected].5
- checkov@3.1.9
- [email protected].7
- checkov@3.2.22
- git-diff-check
- markdownlint@0.37.0
- markdownlint@0.39.0
- [email protected]
- prettier@3.1.0
- ruff@0.1.7
- prettier@3.2.5
- ruff@0.2.2
- [email protected]
- [email protected]
- trivy@0.48.0
- yamllint@1.33.0
- trivy@0.49.1
- yamllint@1.35.1
ignore:
- linters: [ALL]
paths:
64 changes: 0 additions & 64 deletions ai_ta_backend/aws.py

This file was deleted.

7 changes: 7 additions & 0 deletions ai_ta_backend/beam/.beamignore
@@ -0,0 +1,7 @@
+.venv
+venv
+.idea
+.vscode
+.git
+*.pyc
+__pycache__
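The new `.beamignore` keeps virtualenvs, editor settings, Git metadata and bytecode out of what is shipped to Beam. This PR does not document Beam's exact matching rules. The sketch below therefore assumes gitignore-like behaviour for these simple patterns: a name without a slash matches at any depth, and `*` is a shell-style wildcard. `is_ignored` and `files_to_upload` are hypothetical helpers that model that assumption. They do not describe Beam's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List

# Patterns copied from ai_ta_backend/beam/.beamignore in this PR.
BEAMIGNORE = [".venv", "venv", ".idea", ".vscode", ".git", "*.pyc", "__pycache__"]


def is_ignored(path: str, patterns: Iterable[str] = BEAMIGNORE) -> bool:
    """Return True if any path component matches any ignore pattern.

    Assumes gitignore-like semantics for slash-free patterns: they match a
    file or directory name at any depth (case-sensitive here).
    """
    pats = list(patterns)
    return any(
        fnmatchcase(part, pat)
        for part in PurePosixPath(path).parts
        for pat in pats
    )


def files_to_upload(paths: Iterable[str]) -> List[str]:
    """Keep only the paths that no ignore pattern excludes."""
    return [p for p in paths if not is_ignored(p)]
```

Matching whole path components, rather than substrings, is what keeps a file such as `src/venv_helper.py` in the upload while excluding everything under `venv/`.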