-
Notifications
You must be signed in to change notification settings - Fork 218
dataprep: support air gapped env #1492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
eero-t
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be Dockerfile comments stating that these models and nltk are redis dependencies.
Actually, these are common to most DB backends. But I've only tested it on redis. I"ve updated the description to be more precise |
f5db704 to
ac7b982
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some doc/comment suggestions.
- Enhance dataprep to be able to run in the air gapped environment. - Add documentionation of how to run dataprep in the air gapped environment. Related to bug opea-project#1488. Signed-off-by: Lianhao Lu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces partial support for running the dataprep microservice in air-gapped environments by adding offline mode configurations for redis, qdrant, and milvus backends. Key changes include:
- Adding an optional "offline" parameter to start and validate service functions in the test scripts.
- Updating documentation and Docker configurations to guide air-gapped setup.
- Extending docker-compose configurations to support offline service variants.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/dataprep/test_dataprep_redis.sh | Added offline parameter handling and adjusted service startup logic. |
| tests/dataprep/test_dataprep_qdrant.sh | Introduced offline mode with corresponding start and validate logic. |
| tests/dataprep/test_dataprep_milvus.sh | Added offline mode support with a minor inconsistency in cleanup logic. |
| tests/dataprep/dataprep_utils.sh | Added a new utility function for pre-downloading dataprep models. |
| comps/dataprep/src/README_{redis,qdrant,milvus}.md | Updated documentation to include air gapped environment instructions. |
| comps/dataprep/src/Dockerfile | Created a data directory and configured environment variables for offline mode. |
| comps/dataprep/deployment/docker_compose/compose.yaml | Extended docker-compose definitions to include offline service variants. |
| comps/dataprep/README.md | Added common guide and steps for running dataprep in an air gapped env. |
yinghu5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
eero-t
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, looks OK to me.
Description
This is partial support running dataprep in airgap env. This only has been tested on redis/milvus/qdrant backend.
Issues
Partial of #1488.
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
List the newly introduced 3rd party dependency if exists.
Tests
Describe the tests that you ran to verify your changes.