Add support for a CPU-only Mode #1851

Draft
wants to merge 212 commits into
base: branch-24.10

dagardner-nv (Contributor) commented Aug 19, 2024

Description

  • Adds a new enum morpheus.config.ExecutionMode with members GPU & CPU along with a new morpheus.config.Config.execution_mode attribute.
  • For backwards compatibility, Config.execution_mode defaults to GPU.
  • Add a new supported_execution_modes to StageBase, which returns ExecutionMode.GPU by default. This ensures that building a pipeline with a stage that does not support the configured execution mode raises a clear error for the user.
  • Add CpuOnlyMixin and GpuAndCpuMixin mixins to automate overriding this, which also makes it easier for users to see at a glance which execution modes a given stage supports.
  • Since the C++ stage/message impls can only support cuDF DataFrames and RMM tensors, this PR repurposes the existing Python stage/message impls to serve as CPU-only mode.
  • CPU-only mode will center around pandas DataFrames and NumPy arrays for tensors, since the current Python code which expects cuDF/CuPy is already 99% compatible with pandas/NumPy.
  • Avoid importing cudf, or any other GPU-based package that will fail on import, at the top level of a module. This is important for stages, messages and modules which are automatically imported by the morpheus CLI tool.
  • Add new utility methods to morpheus.utils.type_utils (e.g. get_df_pkg, is_cudf_type) to help avoid importing cudf directly.
  • Add a new Config.freeze method which makes a config object immutable. It is called the first time a config object is used to construct a pipeline or stage object, which prevents config parameters from being changed in the middle of pipeline construction.
  • CudfHelper::load is no longer called automatically on import; instead it is called manually on pipeline build when the execution mode is GPU.
  • Add Python implementation of ControlMessage
  • To simulate a system without a GPU when testing CPU-only mode, docker/run_container_dev.sh will launch the container using the runc runtime if the CPU_ONLY environment variable is defined.
  • Remove automatic test parameterization of C++/Python mode, since supporting CPU-only mode will become the exception, not the rule. Add a new gpu_and_cpu_mode test marker to explicitly indicate a test intended to be parameterized over execution modes.
  • Fix copy constructor for ControlMessage
  • AppShieldSourceStage now emits ControlMessages; AppShieldMessageMeta is now deprecated.
  • AutoencoderSourceStage, and thus AzureSourceStage, CloudTrailSourceStage, and DuoSourceStage, now emit ControlMessages; UserMessageMeta is now deprecated.
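
To illustrate two of the ideas above (a config that becomes immutable once frozen, and choosing the DataFrame package from the execution mode without a top-level cudf import), here is a minimal sketch. It is not the code in this PR: SketchConfig, pick_df_pkg_name and load_df_pkg are hypothetical names, the ExecutionMode class is a stand-in with assumed values, and the real Config.freeze and get_df_pkg may behave differently.

```python
import enum
import importlib


class ExecutionMode(enum.Enum):
    """Stand-in for the enum described above; the member values are assumed."""
    GPU = "GPU"
    CPU = "CPU"


class SketchConfig:
    """Hypothetical config object that rejects attribute changes once frozen."""

    def __init__(self):
        # Bypass our own __setattr__ so the flag exists before any check runs.
        object.__setattr__(self, "_frozen", False)
        # GPU is the default, matching the backwards-compatible behavior above.
        self.execution_mode = ExecutionMode.GPU

    def freeze(self):
        """Mark the config as immutable; calling it again is harmless."""
        object.__setattr__(self, "_frozen", True)

    def __setattr__(self, name, value):
        if self._frozen:
            raise AttributeError(f"config is frozen, cannot set {name!r}")
        object.__setattr__(self, name, value)


def pick_df_pkg_name(mode):
    """Return the DataFrame package name for a mode without importing it."""
    return "cudf" if mode is ExecutionMode.GPU else "pandas"


def load_df_pkg(mode):
    """Import the DataFrame package lazily, only when a caller asks for it."""
    return importlib.import_module(pick_df_pkg_name(mode))
```

In this sketch a module can be imported on a machine without a GPU, because cudf is only imported if load_df_pkg is actually called with ExecutionMode.GPU.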

Known Issues:

  • The ransomware pipeline now takes 9 min to execute (previously about 40 s). This appears to be the result of an adverse interaction between dask and MRC coroutines.

Stages that support GPU & CPU mode:

  • ArxivSource
  • DataBricksDeltaLakeSourceStage
  • FileSourceStage
  • HttpClientSourceStage
  • HttpServerSourceStage
  • InMemoryDataGenStage
  • KafkaSourceStage
  • DeserializeStage
  • DropNullStage
  • GroupByColumnStage
  • MonitorStage
  • TriggerStage
  • SerializeStage
  • HttpClientSinkStage
  • HttpServerSinkStage
  • InMemorySinkStage
  • WriteToFileStage
  • WriteToKafkaStage

Closes #1646
Closes #1846
Closes #1852

Open Questions:

  • Should the Python impl for GPU-only stages such as PreprocessFILStage be removed, given that the C++ impl is the only one that will be used?
  • Should CppConfig be removed/renamed? The concept of a C++ mode is becoming a bit confusing given that we want to support LLM pipelines in CPU/Python mode, yet much of that code only contains a C++ impl. The big advantage of the current singleton is that message constructors can consult it to determine whether the C++ or Python impl should be used.
  • What should be done about Python-only pipelines such as AE, Ransomware & GNN Fraud which require a GPU? Most of these depend on a Python-only message implementation. I think the easiest approach would be to simply create a C++ subclass of MessageMeta which can store arbitrary Python objects in a map, and then handle the API in the Python bindings.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

dagardner-nv and others added 30 commits August 12, 2024 13:35
This is a fixup to commit 4e3edb9 that refactored the setup files

Signed-off-by: Anuradha Karuppiah <[email protected]>
rapids-bot bot pushed a commit to nv-morpheus/MRC that referenced this pull request Sep 11, 2024
* Since CPU-only mode will become a supported feature we want to avoid unnecessary warnings.

Relates to nv-morpheus/Morpheus#1851

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #497
Labels
  • breaking: Breaking change
  • DO NOT MERGE: PR should not be merged; see PR for details
  • feature request: New feature or request
  • skip-ci: Optionally Skip CI for this PR
Projects
Status: Review - Ready for Review
2 participants