Millions of articles from The New York Times were used to train chatbots that now compete with it, the lawsuit said.

  • EnderMB@lemmy.world
    6 months ago

    These models can still be trained on data they’re allowed to use, but I think what we’re seeing is that the better LLM services are probably trained on shocking amounts of private data, whereas the less performant ones probably don’t use stolen data.

    • spaduf@slrpnk.net
      6 months ago

      Textbooks are a big one that I suspect we’ll see a set of suits over, particularly because they seem to be some of the most valuable training data.