• ☆ Yσɠƚԋσʂ ☆@lemmy.ml
    48 upvotes · 2 months ago

    I think by the time AI becomes efficient enough to be profitable, it’s going to be efficient enough to run locally, and the whole AI-as-a-service business model is going to collapse. We’re basically in the mainframe era of AI right now, and we’ve seen this happen with many technologies before. There’s no reason to think this case will be different.

    Just to give you an idea of how fast this stuff is moving: Qwen 3.6 was just released and can run on a high-end laptop, and it outperforms Qwen 3.5 from February, which required a commercial-grade server to run. https://qwen.ai/blog?id=qwen3.6-27b
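
    As a rough sanity check on the laptop claim, here is a minimal back-of-envelope sketch of how much memory the weights alone would take at different precisions. The 27 billion parameter count is an assumption read off the "27b" in the linked URL, and the function name is made up for illustration. It ignores the KV cache, activations, and runtime overhead, so real usage would be higher.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumption: 27e9 parameters (inferred from the "27b" in the linked URL).
# Ignores KV cache, activations, and runtime overhead.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PARAMS = 27e9  # assumed parameter count, not an official figure

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

    Under those assumptions, 16-bit weights come to about 54 GB and 4-bit weights to about 13.5 GB, so whether a model of this size fits on a laptop depends heavily on how aggressively it is quantized.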

    • grue@lemmy.world
      11 upvotes · 2 months ago

      “There’s no reason to think this case will be different.”

      Not even the end of Moore’s Law?

      I’m not sure if you’re aware, but processors aren’t really getting much more efficient anymore. They’re just getting bigger (more parallel), which is why the price for the newer generations of GPUs has been skyrocketing. A new top-end GPU costs twice as much (or more) as a previous-gen one because it has twice as many (or more) compute units, since they can’t make the individual compute units much faster due to fundamental laws of physics.

      • ☆ Yσɠƚԋσʂ ☆@lemmy.ml
        16 upvotes · 2 months ago

        I expect that software will continue to get optimized, and we’ll see new algorithms that are more efficient than what people are doing currently. It’s also possible we’ll start seeing hardware built specifically for models. For example, there’s already a startup that uses ASICs to print the model directly onto the chip. Since each transistor acts as a state, it doesn’t need DRAM, and the whole chip requires only a small amount of SRAM, which isn’t in short supply right now: https://www.anuragk.com/blog/posts/Taalas.html

        The limitation with this approach is that the chip is made for a specific model, but that’s not really that different from the way regular chips work either. You buy a chip and if it does what you need, it keeps working. When new models come out, new chips get printed, and if you need the new capabilities then you upgrade.

        You can see how absurdly fast their hardware version of Llama 3 is here: https://chatjimmy.ai/

      • iByteABit@lemmy.ml
        8 upvotes · 1 downvote · 2 months ago

        There are always two sides to performance: one is the power of the hardware, and the other is the efficiency of the software. I think in this case OP means that AI will be optimized so much that it will require a tiny fraction of the resources it previously needed, at least for the casual use cases of an average person asking a simple question or performing a small task.

    • pyr0ball@reddthat.com
      9 upvotes · 2 months ago

      Yup. Already working on a suite of local pipeline apps and an orchestration platform for this. Happy to share if interested! Source