Mark Zuckerberg authorized Meta to use “pirated copies” of copyrighted books to train the company’s artificial intelligence models, a group of authors has alleged in a US court filing.
Citing internal Meta communications, the filing claims that the social network company’s chief executive backed the use of the LibGen dataset, a vast online archive of books, despite warnings within the company’s AI executive team that the dataset was “known to be pirated.”
According to the filing, an internal message warns that using a database containing pirated content could undermine the Facebook and Instagram owner’s negotiations with regulators: “Media reports suggesting that we have used data sets known to be pirated, such as LibGen, could undermine our bargaining position with regulators.”
The American author Ta-Nehisi Coates, the comedian Sarah Silverman and the other authors suing Meta for copyright infringement made the accusations in a filing made public in California federal court on Wednesday.
The authors sued Meta in 2023, alleging that the social media company misused their books to train Llama, the large language model that powers its chatbots.
The Library Genesis (LibGen) dataset is a “shadow library” from Russia that claims to contain millions of novels, nonfiction books, and scientific journal articles. Last year, a New York federal court ordered the anonymous operator of LibGen to pay a group of publishers $30m (£24m) in damages for copyright infringement.
The use of copyrighted content in the training of AI models has become a legal battleground in the development of generative AI tools such as the ChatGPT chatbot, with creative professionals and publishers warning that using their work without permission could jeopardize their livelihoods and business models.
The filing cites a memo that refers to Mark Zuckerberg by his initials, noting that “following escalation to MZ,” Meta’s AI team “was approved to use LibGen.”
The filing also cited internal communications in which Meta’s engineers discussed accessing and reviewing LibGen data but were reluctant to start the process, because “torrenting,” meaning peer-to-peer sharing of files, from a Meta-owned corporate laptop did not feel right to them.
U.S. District Judge Vince Chhabria last year rejected claims that the text generated by Meta’s AI models infringed the authors’ copyrights and that Meta illegally stripped their books of copyright management information (CMI), which refers to information about a work, including its title and the names of its author and copyright holder. However, the plaintiffs were granted permission to amend their claims.
The authors argued this week that the evidence supports their infringement claims and warrants reinstating the CMI claim and adding new computer fraud claims.
At Thursday’s hearing, Chhabria said he would allow the writers to file an amended complaint, but expressed skepticism about the merits of the fraud and CMI claims.
Meta has been approached for comment.
Reuters contributed to this article