MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU.
arXiv.org

ms.lane
in reply to ☆ Yσɠƚԋσʂ ☆