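A tweet on HBF: the author argues it mainly suits read-heavy, high-capacity data and could become useful as models grow. It also notes that Nvidia's current strategy is to expand the scale-up domain to 144/576/1152 GPUs, keep the model weights in that single large domain, and offload the KV cache to SSDs via STX.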
"HBF can only target a narrow set of workloads, like read-heavy, high-capacity data that benefits from being close to the GPU."
HBF can be useful if model sizes grow to 50T-100T parameters
But as far as I can tell, Nvidia's current strategy is to increase the size of the scale-up domain to 144/576/1152 to store the weights in a big single domain and offload the KV cache to SSDs thru STX
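A rough back-of-envelope sketch of why a larger scale-up domain helps hold 50T-100T parameter weights. It assumes 1 byte per parameter for FP8 and 2 bytes for BF16, even sharding of weights across every GPU in the domain, and 1 GB = 1e9 bytes. It ignores KV cache, activations, and replication. The function name `per_gpu_weight_gb` is hypothetical and made up for illustration. It does not model HBF, STX, or any specific GPU's memory capacity.

```python
def per_gpu_weight_gb(params: float, bytes_per_param: float, domain_size: int) -> float:
    """Weight footprint per GPU in GB (1 GB = 1e9 bytes).

    Assumes weights are sharded evenly across all GPUs in the scale-up domain
    and ignores KV cache, activations, and any replicated tensors.
    """
    return params * bytes_per_param / domain_size / 1e9


if __name__ == "__main__":
    # Model sizes mentioned in the tweet, with the scale-up domain sizes it names.
    for params in (50e12, 100e12):
        for domain in (144, 576, 1152):
            fp8 = per_gpu_weight_gb(params, 1, domain)   # assumed FP8: 1 byte/param
            bf16 = per_gpu_weight_gb(params, 2, domain)  # assumed BF16: 2 bytes/param
            print(f"{params / 1e12:.0f}T params, {domain:>4} GPUs: "
                  f"FP8 {fp8:7.1f} GB, BF16 {bf16:7.1f} GB")
```

Under these assumptions, a 50T model in FP8 needs about 86.8 GB of weights per GPU across 576 GPUs. Doubling the domain to 1152 halves that. This is the kind of scaling the tweet attributes to Nvidia's approach, in contrast to adding a denser memory tier such as HBF.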