• Humanius@lemmy.world
    2 days ago

    Let me be the devil’s advocate for this one.

    These companies were already training their models on Wikipedia’s wealth of information anyway. In this way Wikipedia is earning some revenue from the thing that was already happening, letting them put that money back into the non-profit.

    • Sunspear@piefed.social
      2 days ago

      Yeah, I mean, Wikipedia has regular dump files where you can just… download its entire content, or parts of it if you so wish. Getting money for that bandwidth instead is immediately an improvement.

      • Nollij@sopuli.xyz
        1 day ago

        They weren’t using those dumps. They were scraping the main site, at incredible expense to Wikipedia.

    • phaedrus@piefed.world
      2 days ago

      Wikipedia is also public knowledge. I personally think it’s OK to use for training data.

      However, there are other concerns about inaccuracies: because anyone can edit the site, some of its info needs to be scrutinized and verified, and the LLMs being trained on it can’t do that part.

      What actually bothers me about this is that the companies training the LLMs are going to put them behind paywalls, removing the public knowledge part of this.

  • hector@lemmy.today
    2 days ago

    Wikipedia has an owner? I thought it was all non-profit-y or some shit.

    • Humanius@lemmy.world
      2 days ago

      Non-profits also have owners. It being a non-profit just means that the company’s main priority isn’t turning a profit, so excess revenue generally gets pumped back into the non-profit.