[docs] [template] [data] Refactor current LLM Batch inference template #59897
Conversation
Signed-off-by: Aydin Abiar <[email protected]>
Code Review
This pull request introduces two new beginner-level Ray Data LLM batch inference examples: one for text data (reformatting dates) and another for vision data (generating image captions). The changes include adding new vocabulary terms like 'postprocess' and 'reformat' to the Vale style guide, updating the documentation configuration to include the new example READMEs, and adding entries for these examples in examples.yml.

For each example, new files were added, including Jupyter notebooks, corresponding Python scripts (for small and scaled datasets), Anyscale job configurations, CI scripts for testing and converting notebooks to markdown, and cluster configurations for AWS and GCE.

Review comments highlighted several inconsistencies and areas for improvement: a grammatical error ('On contrary' instead of 'On the contrary'), trailing blank lines in YAML files, duplication of the nb2py.py utility script across examples, and an out-of-sync README for the vision example. Additionally, there were inconsistencies in the number of partitions used in the vision example's Python script versus its notebook, and a missing repartitioning step for large datasets in the vision notebook.
doc/source/data/examples/llm_batch_inference_text/content/README.ipynb
doc/source/data/examples/llm-batch-inference-vision/content/README.md
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision.py
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision_scaled.py
doc/source/data/examples/llm_batch_inference_vision/ci/nb2py.py
nrghosh
left a comment
- left comments inline
- broader and other detailed comments below

Both Notebooks
- `detokenize=False` issue
- Job config filename mismatches

Comments & Explanations
- `max_model_len` is a hard sequence length cap, not an "estimate." Sequences exceeding it will error at runtime.
- Repartition numbers (64, 128, 256) are arbitrary magic numbers with no explained heuristic (should be ~2-4x worker count).
- `batch_size` recommendations don't explain the actual constraint (KV cache memory per sequence).
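A minimal sketch of the partition heuristic mentioned above, assuming the worker count and multiplier are free parameters (the function name and defaults are hypothetical, not from the PR):

```python
def pick_num_partitions(num_llm_workers: int, factor: int = 2) -> int:
    # Reviewer's heuristic: repartition to ~2-4x the LLM stage's
    # worker count, rather than a hard-coded magic number.
    if not 2 <= factor <= 4:
        raise ValueError("factor should stay in the ~2-4x range")
    return factor * num_llm_workers

# e.g. with 32 vLLM replicas: ds = ds.repartition(pick_num_partitions(32))
```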
Performance Section
- Concurrency and batch size advice is circular, vague, and not very actionable: no guidance on how to determine or actually think about those values.
- Quantization examples need caveats, disclaimers, and an explanation of model architecture, hardware type, and quantization method; they are not interchangeable.
- The model parallelism example suddenly switches to a different model (Llama 90B) without explanation, and the model reference keeps changing between examples, which makes it confusing to walk through.
Missing Content
- No error handling for malformed inputs (corrupt images, bad data)
- No checkpointing example or discussion, despite the intro mentioning it
Text Notebook Specific
Use Case & Prompt
- Date reformatting seems like a trivial use case even for a demo; why do something that a regex could do?
- Why `temperature=0.3`? Why not 0?
- No output validation in postprocess; should probably validate the `MM-DD-YYYY` format. Mention guided/constrained decoding to guarantee valid output?
- String concatenation is missing a space: `"...MM-DD-YYYY.""Be concise..."`
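One hedged way to realize the output-validation suggestion is a regex check in the postprocess step; the column name `generated_text` is an assumption, not taken from the PR:

```python
import re

# Validate the target MM-DD-YYYY format (months 01-12, days 01-31).
DATE_RE = re.compile(r"^(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])-\d{4}$")

def postprocess(row: dict) -> dict:
    # Flag rows whose model output is not a valid MM-DD-YYYY string.
    row["valid"] = bool(DATE_RE.match(row["generated_text"].strip()))
    return row
```

Guided/constrained decoding would make this check unnecessary by construction, but a cheap validation step still catches drift.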
Vision Notebook Specific
Code
- Explain the 225x225 resize: why that size, and is it model specific? Users may not understand why or how to adapt it.
- Notebook does `%pip install datasets`; maybe pin the version?
Explanations
- Explain why `max_model_len=8192` vs. the text example's 256 (a 32x difference); talk about vision overhead?
- "go over your Anyscale Job" → "go to your Anyscale Job"
- Show image validation as part of the demo?
```python
from io import BytesIO

from PIL import Image

def preprocess(row):
    try:
        row["image"] = Image.open(BytesIO(row["jpg"]["bytes"])).convert("RGB")
    except Exception:
        return None
    return row
```

and maybe show off how Ray Data handles None returns (or use filter() to drop failures).
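One way to realize the "drop failures" suggestion is a small wrapper for `flat_map`-style transforms, which can return an empty list for bad rows so they disappear from the pipeline; the wrapper name and usage are a sketch, not code from the PR:

```python
def drop_on_error(fn):
    """Wrap a row transform so that failures produce no output rows.

    Intended for a flat_map-style API, which flattens the returned
    lists, so returning [] silently drops the failing row.
    """
    def wrapper(row):
        try:
            return [fn(row)]
        except Exception:
            return []
    return wrapper

# Sketch of intended use (assumed pipeline):
# ds = ds.flat_map(drop_on_error(preprocess))
```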
Both
The performance remarks/tips and pasted code examples don't read smoothly and are confusing (putting on my user hat and reading it like it's the first time): jumping between models, magic numbers, etc. The explanations could be a lot more thorough. Will defer to https://github.com/anyscale/docs/pull/1626/changes for more detailed tuning and performance focus.
doc/source/data/examples/llm_batch_inference_vision/content/README.ipynb
doc/source/data/examples/llm_batch_inference_text/content/README.ipynb
…of batch size, concurrency, more refs to docs links, refactor quantization and model parallelism section for more readability, add image validation, mention anyscale runtime, pin datasets version Signed-off-by: Aydin Abiar <[email protected]>
@nrghosh
doc/source/data/examples/llm_batch_inference_text/content/README.ipynb
doc/source/data/examples/llm_batch_inference_vision/content/README.ipynb
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision_scaled.py
/gemini review
nrghosh
left a comment
thanks @Aydin-ab - see also the cursor/gemini comments when it comes to code. As long as you're able to run the examples successfully, they should be free of serious bugs now.
Code Review
The pull request introduces new LLM batch inference examples for both text and vision data, along with their corresponding CI configurations and helper scripts. The changes effectively split the existing template into two distinct, independently readable content pieces, which is a good refactoring step. The new examples demonstrate how to use Ray Data LLM APIs for batch inference, including data preparation, processor configuration, and scaling considerations. The addition of CI scripts ensures these examples remain functional.
However, there are a few areas that could be improved for robustness and clarity:
- The `nb2py.py` scripts use specific string matching to modify dataset limits for CI. This approach is brittle and could break if the exact string in the notebook changes.
- Some comments in the Jupyter notebooks are slightly misleading regarding dataset size limits.
- The standalone Python scripts contain hardcoded configuration values that would ideally be configurable for real-world use cases.
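One way to address the hardcoded-configuration point is a small CLI layer; the flag names and defaults below are hypothetical, a sketch rather than the PR's actual scripts:

```python
import argparse

def parse_args(argv=None):
    # Surface the hard-coded values as CLI flags so the standalone
    # scripts can be reused (and capped for CI) without string matching.
    parser = argparse.ArgumentParser(description="LLM batch inference example")
    parser.add_argument("--model", default="<model-id>", help="Model id to serve")
    parser.add_argument("--limit", type=int, default=None,
                        help="Optional cap on dataset size, e.g. for CI")
    parser.add_argument("--batch-size", type=int, default=64)
    return parser.parse_args(argv)
```

A `--limit` flag would also let the CI harness shrink the dataset without the brittle notebook string matching `nb2py.py` currently relies on.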
doc/source/data/examples/llm_batch_inference_text/content/batch_inference_text.py
doc/source/data/examples/llm_batch_inference_vision/content/README.ipynb
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision.py
doc/source/data/examples/llm_batch_inference_text/content/batch_inference_text.py
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision.py
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision_scaled.py
doc/source/data/examples/llm_batch_inference_text/content/batch_inference_text.py
doc/source/data/examples/llm_batch_inference_vision/content/batch_inference_vision.py
Follow-up of #58571 (closed for inactivity over the holidays).
There are a lot of files changed, but the main technical content to review is the README.ipynb files.
For context, the goal is to refactor this current template:
https://console.anyscale.com/template-preview/llm_batch_inference
and split it into two: one on text data, the other on vision data.
Both are very similar, but should be read independently as two distinct pieces of content 👍