Releases: Tencent/PatrickStar
v0.4.6
v0.4.5
Refactor the files in example and add chunk size search.
v0.4.4
The system has been successfully evaluated in a multi-node setting.
The benchmark scripts are integrated with memory-centric tiling borrowed from DeepSpeed.
It trains an 18B model on WeChat Yard.
v0.4.3
PatrickStar is evaluated on 8xA100 SuperNode.
- Fix async copy bug in chunk move.
- Add memory allocation cache.
- Add memory-saving communication.
v0.4.2
Refactored memory tracer.
v0.3.0
v0.1.0
Single-node, single-GPU version. Uses eager mode for chunk schema scheduling. Performance is poor because of the large CPU-GPU data movement overhead.