Commit
* make torch optional
* WIP datatrove integration
* fix limit break
* hard coded doc schema
* hard coded doc id
* hard coded doc id
* WIP datatrove
* WIP removed compression
* added favicons
* Migrated to Ruff linter
* CI for all PRs
* disable docs build on dev branch
* fixed install variation
* changed linter
* clean up tlsh install
* clean up tlsh install
* added datatrove dependency
* recover missing files
* added trust remote code
Showing 145 changed files with 1,696 additions and 1,501 deletions.
@@ -2,9 +2,7 @@ name: docs
on:
  push:
    branches:
      - master
      - main
      - dev
permissions:
  contents: write
jobs:
@@ -161,5 +161,6 @@ cython_debug/

.DS_Store

/logs/
data/
./data/*
This file was deleted.
@@ -0,0 +1,37 @@
install:
	@echo "--- 🚀 Installing project dependencies ---"
	pip install -e ".[all]"

install-for-tests:
	@echo "--- 🚀 Installing project dependencies for test ---"
	@echo "This ensures that the project is not installed in editable mode"
	pip install ".[dev]"

install-tlsh:
	@echo "--- 🚀 Installing TLSH dependency (same version as OSCAR 23.01) ---"
	pip download python-tlsh==4.5.0 && \
	tar -xvf python-tlsh-4.5.0.tar.gz && \
	cd python-tlsh-4.5.0 && \
	sed -i 's/set(TLSH_BUCKETS_128 1)/set(TLSH_BUCKETS_256 1)/g; s/set(TLSH_CHECKSUM_1B 1)/set(TLSH_CHECKSUM_3B 1)/g' CMakeLists.txt && \
	python setup.py install && \
	rm -rf ../python-tlsh-4.5.0*

lint:
	@echo "--- 🧹 Running linters ---"
	ruff format . # running ruff formatting
	ruff check . --fix # running ruff linting

lint-check:
	@echo "--- 🧹 Check if project is linted ---"
	# Required for CI to work, otherwise it will just pass
	ruff format . --check # running ruff formatting
	ruff check **/*.py # running ruff linting

test:
	@echo "--- 🧪 Running tests ---"
	pytest --durations=5 ./tests

pr:
	@echo "--- 🚀 Running requirements for a PR ---"
	make lint
	make test
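
Reviewer note on `install-tlsh`: the `sed` step swaps two build settings in `CMakeLists.txt` (128 buckets to 256, 1-byte checksum to 3-byte) before building. The sketch below mirrors those two substitutions in Python so the effect can be checked in isolation. It assumes the file contains the two `set(...)` lines exactly as the `sed` patterns spell them; the names `SUBSTITUTIONS`, `patch_cmake_text`, and `patch_cmake_file` are hypothetical and are not part of this commit.

```python
from pathlib import Path

# The (old, new) pairs are copied from the sed expression in the install-tlsh target.
# In sed's basic regex syntax the parentheses are literal, so a plain string
# replacement matches the same text.
SUBSTITUTIONS = [
    ("set(TLSH_BUCKETS_128 1)", "set(TLSH_BUCKETS_256 1)"),
    ("set(TLSH_CHECKSUM_1B 1)", "set(TLSH_CHECKSUM_3B 1)"),
]


def patch_cmake_text(text: str) -> str:
    """Return `text` with every occurrence of each old setting replaced."""
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    return text


def patch_cmake_file(path: str) -> bool:
    """Hypothetical helper: patch a file in place, return True if it changed."""
    cmake_file = Path(path)
    original = cmake_file.read_text()
    patched = patch_cmake_text(original)
    if patched != original:
        cmake_file.write_text(patched)
    return patched != original


if __name__ == "__main__":
    sample = "set(TLSH_BUCKETS_128 1)\nset(TLSH_CHECKSUM_1B 1)\n"
    print(patch_cmake_text(sample))
```

One reason to consider something like this: `sed -i` with no backup suffix is accepted by GNU sed, but BSD sed (the macOS default) expects a suffix argument after `-i`, so the target as written may fail outside Linux. The boolean returned by `patch_cmake_file` also makes it visible when neither pattern matched, which `sed` would pass over silently.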
Binary file not shown.