New LLM Scaling Law
post by wrmedford · 2025-02-19T20:21:17.475Z
This is a link post for https://github.com/wrmedford/moe-scaling
Hi all,
I'm an independent researcher, and I believe I've come across a new scaling law for Mixture of Experts models. I'd appreciate any review and critique. The result challenges the notion that performant inference and training require holding all weights in VRAM, and suggests that as long as bus bandwidth is sufficient (as on modern hardware such as NVIDIA's GH200), even NVMe could be a viable place to store expert weights without measurable performance degradation.
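To illustrate the general idea (this is my own minimal sketch, not code from the linked repo): an MoE layer can keep expert weights in host memory or on NVMe and copy only the selected experts' weights into VRAM at the moment tokens are routed to them. The class names, shapes, and top-1 routing below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StreamedExpert(nn.Module):
    """A feed-forward expert whose weights live in host memory until selected."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # Weights stay on the CPU; on real hardware they could be pinned or
        # memory-mapped from NVMe so copies over the bus can overlap compute.
        self.w_in = nn.Parameter(torch.randn(d_ff, d_model) * d_model**-0.5, requires_grad=False)
        self.w_out = nn.Parameter(torch.randn(d_model, d_ff) * d_ff**-0.5, requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Copy this expert's weights to the compute device just in time.
        w_in = self.w_in.to(x.device, non_blocking=True)
        w_out = self.w_out.to(x.device, non_blocking=True)
        return F.linear(F.gelu(F.linear(x, w_in)), w_out)


class StreamedMoE(nn.Module):
    """Top-1 routed MoE layer where only the router is resident on the GPU."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(StreamedExpert(d_model, d_ff) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); send each token to its highest-scoring expert.
        top1 = self.router(x).argmax(dim=-1)
        out = torch.zeros_like(x)
        for idx in top1.unique().tolist():
            mask = top1 == idx
            out[mask] = self.experts[idx](x[mask])
        return out


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    moe = StreamedMoE(d_model=64, d_ff=256, n_experts=8)
    moe.router.to(device)      # only the router lives on the device
    tokens = torch.randn(32, 64, device=device)
    print(moe(tokens).shape)   # torch.Size([32, 64])
```

Whether this is competitive in practice hinges on the bus: the per-token traffic is the selected experts' weights, so the claim above is that on hardware with enough host-to-device bandwidth (e.g. GH200-class links) the streaming cost can be hidden.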
I am doing this in my free time on my own dime, so please forgive any mistakes. I promise they were made in good faith.