
[FEA]: Create a NeMo Service and NeMo Stage #1130

Closed
2 tasks done
Tracked by #1140 ...
mdemoret-nv opened this issue Aug 18, 2023 · 1 comment · May be fixed by #1204
Labels
feature request (New feature or request) · sherlock (Issues/PRs related to Sherlock workflows and components)


@mdemoret-nv
Contributor

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

This feature would allow Morpheus pipelines to integrate with NVIDIA's NeMo LLM service by sending inference requests to the service from a stage in the pipeline.

The ability to run LLM models from a Morpheus pipeline will allow pipelines to execute complex NLP tasks with very large models. These models are often too large to run inside of a Morpheus pipeline itself, so sending the requests off to an external service fits well with how other inference services, such as Triton, are already used.

Describe your ideal solution

This new feature should be built from 2 components:

  1. A NeMo LLM service which lives outside of the pipeline
    1. The LLM service should live outside of the pipeline so that multiple stages can utilize the same LLM model while batching their requests together. The best way to do this is to make a singleton service which can be accessed at any time from multiple stages (see the sketch after this list).
    2. Requests sent to this service should be batched and then sent off to the NeMo endpoint via the nemo_llm library (Python) or cURL (C++)
  2. A NeMo LLM Inference Stage which lives inside of the pipeline
    1. This stage will primarily interact with the LLM service, sending input messages to it for inference
    2. Messages returned from the LLM service will be sent to the next stage in the pipeline.
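
As a rough illustration of the singleton-plus-batching idea, here is a minimal sketch in Python. All names (`NeMoLLMService`, `generate`, etc.) are hypothetical, and the actual NeMo call is left as a placeholder rather than a real `nemo_llm` invocation:

```python
import queue
import threading


class NeMoLLMService:
    """Hypothetical process-wide singleton that batches prompts from many stages."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        # Ensure every stage gets the same service instance.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self, batch_size: int = 32):
        if getattr(self, "_initialized", False):
            return
        self._initialized = True
        self._batch_size = batch_size
        self._pending = queue.Queue()  # (prompt, reply queue) pairs from stages
        threading.Thread(target=self._batch_loop, daemon=True).start()

    def generate(self, prompt: str) -> str:
        """Called by a stage; blocks until the batched response is available."""
        reply = queue.Queue(maxsize=1)
        self._pending.put((prompt, reply))
        return reply.get()

    def _batch_loop(self):
        while True:
            batch = [self._pending.get()]  # wait for at least one request
            while len(batch) < self._batch_size and not self._pending.empty():
                batch.append(self._pending.get_nowait())

            prompts = [p for p, _ in batch]
            # Placeholder for the real batched call, e.g. via the nemo_llm client
            # (Python) or an HTTP/cURL request to the NeMo endpoint (C++).
            responses = [f"<response for: {p}>" for p in prompts]

            for (_, reply), resp in zip(batch, responses):
                reply.put(resp)
```

Because the service is a singleton, any number of stages can call `NeMoLLMService().generate(...)` concurrently and their prompts will be coalesced into batched requests.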

Configurable Options
The NeMo Inference stage should include (but not be limited to) the following configurable parameters (see the configuration sketch after this list):

  • The column containing the text to use for the inference request
  • The model name
  • The model customization ID
  • NeMo endpoint
  • API Key
  • Organization Key
  • Any model parameters that nemo_llm supports
    • For example, tokens_to_generate, stop, temperature, etc.
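
Purely as an illustration, the options above could be grouped into a configuration object along these lines. All names (`NeMoInferenceStageConfig`, `input_column`, the endpoint URL, the model name, etc.) are assumptions made for this sketch, not the final Morpheus API:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NeMoInferenceStageConfig:
    input_column: str                        # DataFrame column containing the prompt text
    model_name: str                          # name of the NeMo model to invoke
    customization_id: Optional[str] = None   # model customization ID, if any
    nemo_endpoint: str = "https://nemo.example.invalid"  # placeholder endpoint URL
    api_key: Optional[str] = None
    org_id: Optional[str] = None
    # Extra parameters forwarded to nemo_llm, e.g. tokens_to_generate, stop, temperature
    model_kwargs: dict = field(default_factory=dict)


# Example usage with made-up values:
config = NeMoInferenceStageConfig(
    input_column="body",
    model_name="example-nemo-model",
    model_kwargs={"tokens_to_generate": 100, "temperature": 0.7},
)
```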

Describe any alternatives you have considered

A test prototype has been created here: https://github.com/mdemoret-nv/Morpheus/tree/mdd_nemo-stage/examples/nemo

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@mdemoret-nv mdemoret-nv added the feature request New feature or request label Aug 18, 2023
@mdemoret-nv mdemoret-nv added this to the 23.11 - Sherlock milestone Aug 21, 2023
@mdemoret-nv mdemoret-nv added the sherlock Issues/PRs related to Sherlock workflows and components label Sep 8, 2023
@mdemoret-nv
Contributor Author

Closing since this was completed in 23.11.
