Facebook parent company Meta is currently fighting a class action lawsuit alleging copyright infringement and unfair competition over how it trained Llama. According to a post by VX-Underground on X (formerly Twitter), court records show that the social media company used pirated torrents to download 81.7TB of data from shadow libraries, including Anna’s Archive, Z-Library, and LibGen. It then used this data to train its AI models.
The evidence, in the form of written communications, shows researchers’ concerns about Meta’s use of pirated materials. One senior AI researcher said, “I don’t think we should use pirated materials. We need to draw a line here.” Another said that using pirated material should be beyond the company’s ethical threshold, adding, “SciHub, ResearchGate, LibGen are basically similar to PirateBay or something like that.”
Then, in January 2023, Mark Zuckerberg himself attended a meeting where he said, “We need to move forward with something like this… we need to find a way to unblock it.” About three months later, a Meta employee sent a message saying they were worried about Meta IP addresses being used to load pirated content. They also added that torrenting from a corporate laptop did not feel right, followed by a laughing-out-loud emoji.
Aside from these messages, the documents revealed that the company took steps to keep its own infrastructure out of these downloading and seeding operations, so that the activity could not be traced back to Meta. The court documents say this constitutes evidence of Meta’s illegal activity.
However, this is not the first time that an AI company has been accused of stealing information from the internet to train its models. OpenAI has been sued by novelists as far back as June 2023 for using their books to train its large language models. Nvidia is also on the receiving end of a lawsuit filed by authors for using 196,640 books to train its NeMo model. A former Nvidia employee blew the whistle on the company in August last year, saying it scraped more than 426,000 hours of video every day for use in AI training. More recently, OpenAI has been investigating whether DeepSeek illegally obtained data from ChatGPT, which shows how ironic things can get.
As the lawsuit against Meta is still ongoing, we must wait for the court to decide whether the company committed direct infringement. And even if the writers win this case, Meta, with its huge financial war chest, will probably appeal the decision. So we may have to wait several months, if not years, to see the final court decision.