Commit c34be40: add proposal: Migrate the Joint Inference Example for LLM from KubeEdge-Ianvs to KubeEdge-Sedna
Signed-off-by: Ying Jiaze <[email protected]>
# Proposal: Migrate the Joint Inference Example for LLM from KubeEdge-Ianvs to KubeEdge-Sedna
This proposal outlines a project to migrate the Large Language Model (LLM) joint inference example from `kubeedge-ianvs` to `kubeedge-sedna`. The project will focus on implementing custom query routing algorithms for NLP tasks and creating the necessary `Estimator` classes and data handlers to support them.
## Background and Motivation
KubeEdge-Sedna excels at edge-cloud collaborative AI for Computer Vision (CV) tasks but lacks examples for the increasingly important domain of LLMs. The `kubeedge-ianvs` project already contains an example for LLM joint inference. This project aims to migrate that proven pattern to Sedna, enriching the Sedna ecosystem with a powerful, real-world example for developers looking to deploy collaborative LLMs on the edge.
## Goals

- Migrate the core functionality of the ianvs LLM joint inference example to Sedna.
- Implement custom Hard Example Mining (HEM) routing algorithms suitable for NLP tasks.
- Modify Sedna's data pipeline to enable routers to access raw input data, not just model inference results.
- Develop new Estimator classes and modular LLM handlers (HuggingfaceLLM, APIBasedLLM, etc.) for NLP workflows.
- Produce a complete and well-documented example, including code and configuration files.

## Design Details
### Architecture Overview

The architecture of the joint inference system will consist of:

- **Edge Worker**: A lightweight model running on edge devices, responsible for fast local inference and routing decisions.
- **Cloud Worker**: A more powerful model running in the cloud, handling complex inference tasks and producing more accurate results. Since API-based LLMs are commonly used in this role, this worker will also include API-based LLM handlers.

![architecture](./images/joint-inference-qa-architecture.png)
### Custom Router and Data Path Modification
Sedna's existing routers (`HardExampleMining`) are designed for CV tasks and follow an "inference-then-mining" pattern, where the router can only access the inference result from the edge model. The ianvs example includes a `BERTFilter` which requires a "mining-then-inference" approach, needing access to the original input data to perform its routing logic.
I will reference the implementation in https://github.com/kubeedge/ianvs/blob/main/examples/resources/third_party/sedna-0.6.0.1-py3-none-any.whl when introducing these features. By adding an optional `mining_mode` parameter to the `inference` method of the `JointInference` class (accepting "inference-then-mining" or "mining-then-inference", and defaulting to the former for seamless compatibility with existing examples), I will enable `JointInference` to switch flexibly between these data paths at inference time.
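
The intended branching can be sketched as follows. This is a minimal illustration, not Sedna's actual source: `JointInference`, `inference`, and `mining_mode` come from this proposal, while the filter interface and the `_cloud_inference` helper are assumptions.

```python
class JointInference:
    """Sketch of the proposed dual data path (illustrative only)."""

    def __init__(self, estimator, hard_example_mining):
        self.estimator = estimator          # edge model wrapper
        self.filter = hard_example_mining   # router / HEM algorithm

    def inference(self, data, mining_mode="inference-then-mining", **kwargs):
        if mining_mode == "inference-then-mining":
            # Existing CV-style path: run the edge model first, then let
            # the router inspect the edge *result*.
            edge_result = self.estimator.predict(data, **kwargs)
            is_hard = self.filter(edge_result)
        elif mining_mode == "mining-then-inference":
            # New NLP path: the router sees the *raw input* and decides
            # before any edge inference happens.
            is_hard = self.filter(data)
            edge_result = None if is_hard else self.estimator.predict(data, **kwargs)
        else:
            raise ValueError(f"unknown mining_mode: {mining_mode}")

        if is_hard:
            return self._cloud_inference(data, **kwargs)  # offload to cloud worker
        return edge_result

    def _cloud_inference(self, data, **kwargs):
        ...  # call the cloud worker (BigModelService) over the network
```

Because the default value preserves the current behavior, existing CV examples run unchanged.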
The new parameter `mining_mode` will be passed to the model via container environment variables defined in the YAML config file, thus avoiding changes to the Go-written Sedna control layer.
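
For instance, the worker spec in the example's YAML could surface the parameter like this (an illustrative fragment; the field layout is modeled on Sedna's `JointInferenceService` CRD, and `MINING_MODE` is an assumed variable name):

```yaml
edgeWorker:
  template:
    spec:
      containers:
        - name: little-model
          image: little-model:latest
          env:
            - name: MINING_MODE
              value: "mining-then-inference"
```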
![data path](./images/joint-inference-data-path.png)
### Support for NLP Tasks
Sedna's current Estimator classes and data modules are CV-focused. To handle LLMs, they must be adapted for text-based workflows.
To address this, I will:
- Create new Estimator classes specifically for NLP inference.
- Develop modular LLM handlers (e.g., `HuggingfaceLLM`, `APIBasedLLM`) that can be reused by both edge and cloud models.
- Adapt Sedna's data management to handle text datasets.
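
The handlers could share one unified interface so that edge and cloud Estimators reuse the same code. A minimal sketch (only `HuggingfaceLLM` and `APIBasedLLM` are named in this proposal; the base class, method names, and the `create_llm` factory are assumptions):

```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    """Assumed unified interface shared by edge and cloud handlers."""

    @abstractmethod
    def inference(self, prompt: str, **kwargs) -> str:
        ...

class HuggingfaceLLM(BaseLLM):
    def __init__(self, model_id: str, device: str = "cpu"):
        self.model_id = model_id
        self.device = device
        # A real implementation would load the model via transformers here.

    def inference(self, prompt, **kwargs):
        raise NotImplementedError("run the local transformers model here")

class APIBasedLLM(BaseLLM):
    def __init__(self, endpoint: str, api_key: str = ""):
        self.endpoint = endpoint
        self.api_key = api_key

    def inference(self, prompt, **kwargs):
        raise NotImplementedError("POST the prompt to self.endpoint here")

def create_llm(config: dict) -> BaseLLM:
    """Pick a handler from configuration so both workers share one code path."""
    if config.get("backend") == "huggingface":
        return HuggingfaceLLM(config["model"], config.get("device", "cpu"))
    if config.get("backend") == "api":
        return APIBasedLLM(config["endpoint"], config.get("api_key", ""))
    raise ValueError(f"unknown backend: {config.get('backend')}")
```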
### Implementation Details
Files to be added include:
```
|- examples
|  |- joint_inference
|     |- answer_generation_inference
|        |- big_model
|        |  |- interface.py
|        |  |- big_model.py
|        |- little_model
|        |  |- interface.py
|        |  |- little_model.py
|        |- answer_generation_inference.yaml
|        |- README.md
```
The `interface.py` files will define the `Estimator` classes for the edge and cloud models, while the `big_model.py` and `little_model.py` files will create and launch the `BigModelService` and `JointInference` instances. The `Estimator` classes will automatically load models from local storage, URLs, or switch to API-based LLMs based on configuration settings.
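
The source-selection logic described above could look like the following sketch. Only the configuration-driven decision is shown; the class name, the `backend` field, and the `_detect` helper are assumptions, not Sedna's API.

```python
import os

class LLMEstimator:
    """Sketch of the example's Estimator: decide how to load the model
    from configuration. Actual weight loading is elided."""

    def __init__(self, model_url: str, backend: str = "auto"):
        self.model_url = model_url
        self.backend = backend if backend != "auto" else self._detect(model_url)

    @staticmethod
    def _detect(model_url: str) -> str:
        if model_url.startswith(("http://", "https://")):
            return "api"            # e.g. an OpenAI-compatible endpoint
        if os.path.exists(model_url):
            return "local"          # weights already on the node
        return "huggingface"        # treat as a hub id like "bert-base-uncased"

    def load(self):
        ...  # dispatch on self.backend to the matching LLM handler

    def predict(self, data, **kwargs):
        ...  # run inference via the loaded handler
```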
Files to be modified include:
```
|- lib
|  |- sedna
|     |- algorithms
|     |  |- hard_example_mining.py
|     |- backend
|     |  |- torch
|     |     |- __init__.py
|     |- core
|        |- joint_inference.py
```
The modification to `hard_example_mining.py` will focus on adding several new hard-example-mining algorithms: `BertRouter`, `EdgeOnly`, and `CloudOnly`. These new algorithms will be implemented as separate classes and will not affect existing algorithms, ensuring backward compatibility.
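
A minimal sketch of the three routers, assuming they follow the callable-filter pattern of Sedna's existing mining classes (the `__call__` signature, `threshold` knob, and `_score` helper are assumptions; the BERT classifier itself is elided):

```python
class EdgeOnly:
    """Never offload: every sample is handled by the edge model."""
    def __call__(self, data) -> bool:
        return False  # never a "hard example"

class CloudOnly:
    """Always offload: every sample is sent to the cloud model."""
    def __call__(self, data) -> bool:
        return True

class BertRouter:
    """Route on the raw query with a BERT classifier ("mining-then-inference")."""
    def __init__(self, model_id: str = "bert-base-uncased", threshold: float = 0.5):
        self.model_id = model_id
        self.threshold = threshold
        # A real implementation would load a sequence-classification model here.

    def __call__(self, data) -> bool:
        score = self._score(data)       # estimated probability the query is "hard"
        return score >= self.threshold

    def _score(self, data) -> float:
        raise NotImplementedError("run the BERT classifier on `data` here")
```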
The modification to `torch/__init__.py` and `joint_inference.py` aims to enable the framework to support importing URL-based models, rather than only local model weights. This will only involve minor modifications to judgment conditions without changing the main logic, and should not affect existing examples.
The modification is necessary because the existing library performs file path validation when loading models, checking whether the specified model path corresponds to an actual file on the local filesystem. However, for HuggingFace-based and API-based LLMs, the `model_url` parameter passed in is a URL string (such as a HuggingFace model identifier like "bert-base-uncased" or an API endpoint), not a local file path. The current validation logic would incorrectly reject these valid URL-based model specifications, preventing the framework from supporting modern LLM deployment patterns where models are loaded directly from remote repositories or accessed via APIs.
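
The relaxed check could be as simple as the following sketch (illustrative only; Sedna's actual condition lives in the torch backend, and `is_local_model` is an assumed helper name):

```python
import os
from urllib.parse import urlparse

def is_local_model(model_url: str) -> bool:
    """Treat the value as a local weights file only when it really is one;
    otherwise pass it through so HuggingFace identifiers and API endpoints
    survive validation instead of being rejected as missing files."""
    parsed = urlparse(model_url)
    if parsed.scheme in ("http", "https"):
        return False                    # API endpoint or remote repository
    return os.path.isfile(model_url)    # "bert-base-uncased" -> False, kept as hub id
```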
## Project Plan
### Phase 1: Foundation and Analysis (Weeks 1-2)

1. **Environment Setup and Analysis**
   - Set up development environment with KubeEdge-Sedna
   - Conduct deep analysis of the existing joint inference implementation in Sedna

2. **Architecture Design**
   - Design the "mining-then-inference" data flow modification for the `JointInference` class
   - Plan NLP-specific Estimator class hierarchy and interfaces
   - Define modular LLM handler architecture (HuggingfaceLLM, APIBasedLLM)

### Phase 2: Core Framework Development (Weeks 3-4)

3. **Data Path Enhancement**
   - Implement `mining_mode` parameter in the `JointInference.inference()` method
   - Modify core joint inference logic to support both inference patterns
   - Update hard example mining algorithms to handle raw input data access

4. **NLP Infrastructure Development**
   - Create base NLP Estimator classes for edge and cloud models
   - Implement modular LLM handlers with unified interfaces
   - Develop text data processing and management capabilities
   - Add URL-based model loading support to the torch backend

### Phase 3: Algorithm Implementation (Weeks 5-6)

5. **Custom Router Development**
   - Implement `BertRouter` for BERT-based filtering decisions
   - Create `EdgeOnly` and `CloudOnly` routing algorithms
   - Ensure backward compatibility with existing CV-based routers

6. **Model Integration**
   - Develop edge model interface with lightweight inference capabilities
   - Implement cloud model interface supporting both local and API-based LLMs
   - Create configuration-driven model loading and switching logic

### Phase 4: Example Development and Testing (Weeks 7-8)

7. **Complete Example Implementation**
   - Build the answer generation inference example with all components
   - Create comprehensive configuration files and deployment scripts
   - Write documentation with usage examples and troubleshooting guides

### Deliverables

1. Modified Sedna components supporting NLP-based joint inference
2. NLP Estimator classes and LLM handlers
3. Custom routing algorithms for NLP tasks
4. Working example implementation with configuration files
5. Detailed documentation and usage guide
