Draft
Changes from 6 commits
95 changes: 95 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 7 additions & 1 deletion container/Dockerfile
Contributor

This decoding is optional: if dynamo was not built with this feature, or if no decoding configuration is passed, the URLs are passed through unprocessed.

If the feature is gated behind a compile-time feature flag, I think it will be difficult for most users to consume, since they'll need to build from source one way or another. Could this be controlled by a frontend flag or an environment variable instead? What do you think about how to control the frontend-side media decoding feature, @grahamking @krishung5 @indrajit96?
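A minimal sketch of the environment-variable option raised above, in Rust. The variable name `DYN_MEDIA_DECODING` and both function names are hypothetical; the PR does not define them. The parsing is kept in a pure function so it can be checked without touching the process environment.

```rust
use std::env;

/// Hypothetical variable name, used only for illustration.
const MEDIA_DECODING_ENV: &str = "DYN_MEDIA_DECODING";

/// Interpret a raw toggle value. Unset or unrecognized values mean "disabled",
/// so the default behavior stays the plain URL pass-through.
fn parse_toggle(raw: Option<&str>) -> bool {
    match raw.map(|s| s.trim().to_ascii_lowercase()) {
        Some(v) => matches!(v.as_str(), "1" | "true" | "yes" | "on"),
        None => false,
    }
}

/// Read the toggle from the process environment at startup.
fn media_decoding_enabled() -> bool {
    parse_toggle(env::var(MEDIA_DECODING_ENV).ok().as_deref())
}
```

A runtime toggle like this only helps if the decoding code is already compiled into the wheel, which is the trade-off the rest of the thread discusses.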

Hi @rmccorm4,
I have a WIP draft PR (#3929) into Alexandre's branch that adds this compile-time flag.
I modeled the workflow on enable_kvbm and the block-manager feature group:

if [ "$ENABLE_KVBM" = "true" ]; then \

Do you think that workflow is too tedious on the user side for a frontend change? After all, using KVBM also requires the user to compile or rebuild the wheel.

Contributor Author

Ultimately, we don't want to prevent people from running the regular frontend if they don't need media decoding and don't have the required media-loading system dependencies (mostly ffmpeg) at runtime.

So the solution we are working on is a build-time flag. That way, the build itself doesn't need ffmpeg either. But yes, this means having different wheels for different features.

If we are in charge of the build and don't mind having ffmpeg on our side during the build, another solution could be to require ffmpeg at build time but, at runtime, disable video decoding if dynamic linking fails to find ffmpeg. We need to see how doable this is in Rust.
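The runtime-fallback idea can be sketched as follows, assuming the decoder is loaded once at startup and every request then either decodes or forwards the URL untouched. All names here (`MediaPayload`, `Decoder`, `prepare`) are hypothetical stand-ins, and the `ffmpeg_available` argument stands in for whatever probe would actually detect the ffmpeg libraries; none of this is taken from the PR.

```rust
/// Hypothetical result for one media URL in a request.
#[derive(Debug, PartialEq)]
enum MediaPayload {
    /// Decoding unavailable or not configured: forward the URL unprocessed.
    Url(String),
    /// Decoding succeeded: forward bytes (a stand-in for decoded frames).
    Decoded(Vec<u8>),
}

/// Hypothetical decoder handle.
struct Decoder;

impl Decoder {
    /// Fails when the (simulated) ffmpeg probe reports nothing usable.
    fn load(ffmpeg_available: bool) -> Result<Self, String> {
        if ffmpeg_available {
            Ok(Decoder)
        } else {
            Err("ffmpeg libraries not found".to_string())
        }
    }

    /// Placeholder "decode": returns the URL bytes instead of real frames.
    fn decode(&self, url: &str) -> Vec<u8> {
        url.as_bytes().to_vec()
    }
}

/// Decode when a decoder is present, otherwise pass the URL through.
fn prepare(url: &str, decoder: Option<&Decoder>) -> MediaPayload {
    match decoder {
        Some(d) => MediaPayload::Decoded(d.decode(url)),
        None => MediaPayload::Url(url.to_string()),
    }
}
```

Converting the load error into `None` with `Decoder::load(..).ok()` keeps the frontend running when ffmpeg is missing, which is the behavior the comment asks for. Whether a failed dynamic link can be caught this cleanly in Rust is the open question raised above.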

@@ -125,6 +125,11 @@ RUN apt-get update -y \
 clang \
 libclang-dev \
 protobuf-compiler \
+# media-loading rust build+runtime dependencies
+libavcodec-dev \
+libavutil-dev \
+libavformat-dev \
+pkg-config \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

@@ -296,8 +301,9 @@ ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
 PATH=/usr/local/cargo/bin:/opt/dynamo/venv/bin:$PATH

 # Install system dependencies
+RUN dnf install -y https://download1.rpmfusion.org/free/el/rpmfusion-free-release-8.noarch.rpm && dnf install -y https://download1.rpmfusion.org/nonfree/el/rpmfusion-nonfree-release-8.noarch.rpm
 RUN dnf update -y \
-&& dnf install -y llvm-toolset protobuf-compiler wget unzip \
+&& dnf install -y llvm-toolset protobuf-compiler wget unzip libavdevice-dev libavutil-dev libavcodec-dev libavformat-dev pkg-config \
 && dnf clean all \
 && rm -rf /var/cache/dnf

5 changes: 5 additions & 0 deletions container/Dockerfile.vllm
@@ -305,6 +305,11 @@ ARG WORKSPACE_DIR=/workspace
 RUN apt-get update -y && \
 apt-get install -y --no-install-recommends \
 # Install utilities
+wget \
+&& rm -f /etc/apt/sources.list.d/cuda*.list \
+&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && dpkg -i cuda-keyring_1.1-1_all.deb && \
+apt-get update -y && \
+apt-get install -y --no-install-recommends \
 nvtop \
 wget \
 tmux \
69 changes: 69 additions & 0 deletions lib/bindings/python/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 6 additions & 2 deletions lib/bindings/python/rust/lib.rs
@@ -32,6 +32,7 @@ use dynamo_llm::{self as llm_rs};
 use dynamo_llm::{entrypoint::RouterConfig, kv_router::KvRouterConfig};

 use crate::llm::local_model::ModelRuntimeConfig;
+use crate::llm::preprocessor::MediaDecoder;

 #[pyclass(eq, eq_int)]
 #[derive(Clone, Debug, PartialEq)]
@@ -154,6 +155,7 @@ fn _core(m: &Bound<'_, PyModule>) -> PyResult<()> {
 m.add_class::<llm::kv::WorkerMetricsPublisher>()?;
 m.add_class::<llm::model_card::ModelDeploymentCard>()?;
 m.add_class::<llm::local_model::ModelRuntimeConfig>()?;
+m.add_class::<llm::preprocessor::MediaDecoder>()?;
 m.add_class::<llm::preprocessor::OAIChatPreprocessor>()?;
 m.add_class::<llm::backend::Backend>()?;
 m.add_class::<llm::kv::OverlapScores>()?;
@@ -214,7 +216,7 @@ fn log_message(level: &str, message: &str, module: &str, file: &str, line: u32)
 /// Create an engine and attach it to an endpoint to make it visible to the frontend.
 /// This is the main way you create a Dynamo worker / backend.
 #[pyfunction]
-#[pyo3(signature = (model_input, model_type, endpoint, model_path, model_name=None, context_length=None, kv_cache_block_size=None, router_mode=None, migration_limit=0, runtime_config=None, user_data=None, custom_template_path=None))]
+#[pyo3(signature = (model_input, model_type, endpoint, model_path, model_name=None, context_length=None, kv_cache_block_size=None, router_mode=None, migration_limit=0, runtime_config=None, user_data=None, custom_template_path=None, media_decoder=None))]
 #[allow(clippy::too_many_arguments)]
 fn register_llm<'p>(
 py: Python<'p>,
@@ -230,6 +232,7 @@ fn register_llm<'p>(
 runtime_config: Option<ModelRuntimeConfig>,
 user_data: Option<&Bound<'p, PyDict>>,
 custom_template_path: Option<&str>,
+media_decoder: Option<MediaDecoder>,
 ) -> PyResult<Bound<'p, PyAny>> {
 // Validate Prefill model type requirements
 if model_type.inner == llm_rs::model_type::ModelType::Prefill {
@@ -302,7 +305,8 @@ fn register_llm<'p>(
 .migration_limit(Some(migration_limit))
 .runtime_config(runtime_config.unwrap_or_default().inner)
 .user_data(user_data_json)
-.custom_template_path(custom_template_path_owned);
+.custom_template_path(custom_template_path_owned)
+.media_decoder(media_decoder.map(|m| m.inner));
 // Load the ModelDeploymentCard
 let mut local_model = builder.build().await.map_err(to_pyerr)?;
 // Advertise ourself on etcd so ingress can find us
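The `media_decoder.map(|m| m.inner)` call in this hunk unwraps an optional Python-facing wrapper into the core type the builder expects. A self-contained sketch of that pattern follows, using stand-in types: `CoreMediaDecoder`, `PyMediaDecoder`, the `max_frames` field, and this `LocalModelBuilder` are hypothetical and do not mirror the real definitions in dynamo.

```rust
/// Stand-in for the core decoder configuration (field is illustrative only).
#[derive(Clone, Debug, Default, PartialEq)]
struct CoreMediaDecoder {
    max_frames: Option<u32>,
}

/// Stand-in for the Python-facing wrapper, which holds the core type in `inner`.
#[derive(Clone, Debug, Default)]
struct PyMediaDecoder {
    inner: CoreMediaDecoder,
}

/// Stand-in builder with a single optional setting.
#[derive(Debug, Default)]
struct LocalModelBuilder {
    media_decoder: Option<CoreMediaDecoder>,
}

impl LocalModelBuilder {
    /// Accepts `None` to mean "no decoding configured".
    fn media_decoder(mut self, decoder: Option<CoreMediaDecoder>) -> Self {
        self.media_decoder = decoder;
        self
    }
}

/// Mirrors the call shape in the diff: unwrap the wrapper only if present.
fn configure(media_decoder: Option<PyMediaDecoder>) -> LocalModelBuilder {
    LocalModelBuilder::default().media_decoder(media_decoder.map(|m| m.inner))
}
```

Because the new `media_decoder` argument defaults to `None`, existing callers of `register_llm` keep the pass-through behavior without any change.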