Development Roadmap (2024 Q4) #1487

Ying1123 · 2024-09-21T22:38:00Z

fengyang95 · 2024-09-22T02:02:41Z

Are there any plans to optimize long context latency?

lumiere-ml · 2024-10-17T02:24:33Z

Hi, can I help with the multi-layer radix cache (GPU/CPU/Disk)? I am really interested in that.

tanzelin430 · 2024-10-17T11:58:58Z

I am interested in contributing to the P-D split inference architecture, and I have machines available to support its development. If you have any related development plans, please let me know. Thank you @Ying1123 @zhyncs @fengyang95

merrymercy · 2024-10-19T13:58:47Z

@lumiere-ml @tanzelin430 Are you in the Slack channel? We can follow up on that there.

zhyncs · 2024-10-20T06:01:03Z

@lumiere-ml @tanzelin430 You are welcome to join our Slack channel: https://join.slack.com/t/sgl-fru7574/shared_invite/zt-2ngly9muu-t37XiH87qvD~6rVBTkTEHw

tanzelin430 · 2024-10-20T06:14:54Z

Thanks for the invitation, I am in Slack now. Looking forward to collaborating with you.

lumiere-ml · 2024-10-20T09:01:30Z

Thanks for your invitation!

Ying1123 changed the title ~~[WIP] Development Roadmap (2024 Q4)~~ Development Roadmap (2024 Q4) Sep 22, 2024

zhyncs pinned this issue Sep 22, 2024

zhyncs mentioned this issue Sep 22, 2024

[Feature] Are there plans to implement a prefill-decode split inference architecture? #1080

Closed

ByronHsu mentioned this issue Oct 4, 2024

Provide an offline engine API #1567

Merged

3 tasks

ByronHsu mentioned this issue Oct 15, 2024

Support vLLM-style rope flashinfer-ai/flashinfer#530

Open

zhaochenyang20 mentioned this issue Oct 20, 2024

Add documentations for Installation #1733

Closed

3 tasks

zhyncs mentioned this issue Nov 1, 2024

Development Roadmap (2024 Q3) #634

Closed

29 tasks

Comments

Ying1123 commented Sep 21, 2024 • edited by merrymercy

Performance

Parallelism

Hardware Coverage

Model Coverage

LoRA support

LMCache Integration

Quantization

Server API

Observability

Others
