Here is an example you can refer to; it uses NVIDIA Triton.
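In the documentation I saw OneEmbedding's distributed training solution, which implements a multi-level cache spanning GPU memory, host memory, and SSD.
However, the documentation only covers distributed training and says nothing about inference.
The "Model Deployment" section of the documentation says the model needs to be converted to ONNX and then loaded by Triton for inference. Can Triton read OneEmbedding's multi-level cache? Is there a concrete implementation example?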