1-bit LLMs Could Solve AI’s Energy Demands

ylai@lemmy.ml · 1 month ago

qjkxbmwvz · 30 days ago

Only briefly skimmed, but don’t you need nonlinearity for these things to work (e.g., rectifier, sigmoid…)? Otherwise it’s just linear algebra, and more layers can’t help: a product of matrices is itself a single matrix, so a stack of linear layers collapses into one and only the dimensionality matters. I don’t think you can really get nonlinearity with one bit.

Not my field, so I’m sure I’m missing something. If anyone wants to ELI5 though…
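
For anyone who wants to see the "more layers can’t help" point concretely, here is a minimal sketch in plain Python. The matrices `W1`, `W2` and the input `x` are made-up toy values, not taken from any model. Two stacked linear layers give the same result as one merged matrix, while putting a rectifier between them breaks that equivalence.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Elementwise rectifier: negative entries become 0."""
    return [[max(0.0, v) for v in row] for row in m]

# Toy weights and input (hypothetical values chosen for illustration).
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 0.0], [-1.0, 1.0]]
x = [[1.0], [1.0]]  # column vector

# Two linear layers applied one after the other.
two_linear_layers = matmul(W2, matmul(W1, x))

# The same two layers merged into a single matrix first.
one_merged_layer = matmul(matmul(W2, W1), x)

# A rectifier between the layers: no single matrix reproduces this for all x.
with_relu = matmul(W2, relu(matmul(W1, x)))

print(two_linear_layers, one_merged_layer, with_relu)
```

The first two results are identical, which is the collapse described above. The third differs because the rectifier zeroes the negative intermediate value before the second layer sees it.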

howrar@lemmy.ca · 30 days ago

This article got me curious about how these 1-bit models worked so I read up on it a bit.

https://arxiv.org/html/2402.11295v3

The model parameters aren’t completely converted to 1-bit. Each weight matrix is decomposed into a sign matrix (the 1-bit part) and two full-precision vectors. The outer product of the two vectors is a rank-1 matrix that carries the magnitudes, and multiplying it elementwise by the sign matrix approximates the original matrix. So if I understand correctly, this means everything still functions the same way as a regular transformer: input vectors, intermediate values, and outputs are all full precision and have no problem going through nonlinearities.
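
Here is a rough sketch of that decomposition as I understand it, in plain Python. It is not the paper’s code. The function names, the toy matrix `W`, and the use of power iteration to get the rank-1 part are all my own assumptions for illustration, and it assumes `W` is not all zeros. The idea is W ≈ S ⊙ (a bᵀ), where S holds only ±1 and `a`, `b` are ordinary floats.

```python
import math

def sign_rank1_decompose(W, iters=50):
    """Split W into a +/-1 sign matrix S and float vectors a, b
    so that abs(W) is approximated by the outer product of a and b."""
    rows, cols = len(W), len(W[0])
    S = [[1 if v >= 0 else -1 for v in row] for row in W]
    A = [[abs(v) for v in row] for row in W]
    # Power iteration for the leading singular pair of abs(W)
    # (an illustrative choice, not necessarily what the paper does).
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(t * t for t in u))
        u = [t / norm_u for t in u]
        v = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(t * t for t in v))
        v = [t / sigma for t in v]
    a = [sigma * t for t in u]  # fold the singular value into a
    b = v
    return S, a, b

def forward(S, a, b, x):
    """Compute y = (S * outer(a, b)) x as a * (S (b * x)), all in floats."""
    bx = [bj * xj for bj, xj in zip(b, x)]
    return [ai * sum(s * t for s, t in zip(row, bx)) for ai, row in zip(a, S)]

# Toy weight matrix (hypothetical values).
W = [[0.9, -1.1, 0.4], [-2.0, 2.1, -0.8]]
S, a, b = sign_rank1_decompose(W)

# Reconstructed approximation of W from the 1-bit signs and two vectors.
W_hat = [[S[i][j] * a[i] * b[j] for j in range(3)] for i in range(2)]

x = [0.5, -1.5, 2.0]  # full-precision input
y = forward(S, a, b, x)
print(W_hat, y)
```

Only `S` is 1-bit here. The input `x`, the scaling vectors, and the output `y` are regular floats, so a nonlinearity applied to `y` works exactly as in a normal network.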