Mona Awad and Paul Tremblay’s lawsuit claims their books were used without their consent. But copyright protection doesn’t apply to ideas – they’ll need to demonstrate the likelihood of economic loss.
There are already laws regarding producing works too similar to copyrighted material.
Production is infringement, not training.
If I feed all of Stephen King into an LLM so that it learns what well-written horror narratives look like, and it produces a story with original plot elements distinct from copyrighted works, that’s fine.
If it starts writing about killer clowns thwarted by child orgies in the sewers then you might have an infringement problem.
And ironically, the best tool for protecting copyrighted material from infringement is going to be…LLMs (acting in a discriminator role comparing indexed copy to protected works).
If ‘training’ ends up successfully labeled as infringement, we’re going to end up with much worse long-term outcomes in jurisdictions that honor that ruling than we otherwise would.
This is the long-tail masses adopting MPAA math to tally potential losses. In their efforts to protect the status quo, they are shooting themselves in the foot when it comes to laying claim to the future of the industry, and will inevitably be left out of the next round of growth.
Also, from an ‘infringement’ standpoint, it just means we’ll see fewer open models and more closed ones, which will end up using models from other jurisdictions to launder copyrighted materials into synthetic training data.
This is beyond dumb.