This was a real bummer for anyone interested in running local LLMs. Memory bandwidth is the limiting factor for performance in inference, and the Mac unified memory architecture is one of the relatively cheaper ways to get a lot of memory rather than buying a specialist AI GPU for $5-10k. I was planning to upgrade the memory a bit further than normal on my next MBP upgrade in order to experiment with AI, but now I’m questioning whether the pro chip will be fast enough to be useful.
Yeah, I’m unaware how loud I’m being when gaming on voice chat, this would be nice if it were a lower price