Wikipedia is also public knowledge. I personally think it’s OK to use for training data.
However, there are other concerns about accuracy. Because anyone can edit the site, some of its information needs to be scrutinized and verified, and the LLMs being trained on it can’t do that part.
What actually bothers me is that the companies training the LLMs are going to put them behind paywalls, which takes away the public-knowledge part of this.