Karachi Chronicle

Inside Meta’s race to beat OpenAI: “We need to learn how to build frontiers and win this race.”

By Adnan Mahar, January 14, 2025

A major copyright lawsuit against Meta has revealed a trove of internal communications about the company’s plans to develop its open-source Llama AI models, including discussions about avoiding “media coverage that suggests the use of datasets known to be pirated.”

The messages are part of a series of exhibits unsealed by a California court that suggest Meta used copyrighted data to train its AI systems and tried to conceal it. Some of the messages first came to light last week.

In an October 2023 email to Hugo Touvron, a researcher at Meta AI, Ahmad Al-Dahle, Meta’s vice president of generative AI, wrote that the company’s goal “needs to be GPT4,” referring to OpenAI’s large language model. “We need to learn how to build frontiers and win this race,” Al-Dahle added. These plans appear to have included training Meta’s AI systems on the book piracy site Library Genesis (LibGen).

An undated email from Sony Theakanath, Meta’s director of product, to Joelle Pineau, vice president of AI research, weighed whether to use LibGen only internally, for benchmarks included in blog posts, or to create a model trained on the site. In the email, Theakanath wrote that after escalating to “MZ” (presumably Meta CEO Mark Zuckerberg), “GenAI has been approved to use LibGen for Llama3…with a number of agreed mitigation measures.” As stated in the email, Theakanath believed that “Libgen is essential to meeting SOTA numbers” (SOTA meaning state-of-the-art) and added that “OpenAI and Mistral are using this library in their models.” Mistral and OpenAI have not disclosed whether they use LibGen. (The Verge has contacted both companies for more information.)

Meta’s Theakanath writes that LibGen is “essential” to achieving “SOTA numbers across all categories.”
Screenshot: The Verge

The court documents stem from a class action lawsuit filed against Meta by author Richard Kadrey, comedian Sarah Silverman, and others, who allege that the company used illegally obtained copyrighted content to train its AI models in violation of intellectual property laws. Meta, like other AI companies, argues that using copyrighted material as training data constitutes legal fair use. The Verge reached out to Meta for comment, but the company did not immediately respond.

Some of the “mitigation measures” for using LibGen included stipulations that Meta must remove “data clearly marked as pirated/stolen” while avoiding externally citing “use of training data” from the site. Theakanath’s email also said the company’s models would need to be “red teamed” against “biological weapons and CBRNE” (chemical, biological, radiological, nuclear, and explosives) risks.

The email also touched on some of the “policy risks” posed by the use of LibGen, including how regulators might respond to media reports suggesting Meta used pirated content. “This could undermine our negotiating position with regulators on these issues,” the email said. An April 2023 conversation between Meta researcher Nikolay Bashlykov and AI team member David Esiobu also showed Bashlykov acknowledging, “I don’t know if Meta’s IP can be used to load torrents (of) pirated content.”

Other internal documents show steps Meta took to obscure copyright information in LibGen’s training data. A document titled “Observations about LibGen-SciMag” contains comments left by employees about how to improve the dataset. One suggestion is to “further remove copyright headers and document identifiers,” including any lines containing “ISBN,” “Copyright,” “All Rights Reserved,” or the copyright symbol. Other notes mention removing more metadata “to avoid potential legal issues” and considering whether to remove a paper’s author list “to reduce liability.”

This document describes the removal of “Copyright Headers and Document Identifiers”.
Screenshot: The Verge

Last June, the New York Times reported on the frenzied competition within Meta after ChatGPT’s debut, revealing that the company had hit a wall: it had exhausted almost every available English-language book, article, and poem it could find online. Executives reportedly wanted more data, so they discussed an outright acquisition of Simon & Schuster and considered hiring contractors in Africa to summarize books without permission.

According to the report, some executives justified the company’s approach by pointing to OpenAI’s “market precedent” for using copyrighted works, while others argued that Google’s 2015 court victory, which established its right to scan books, could provide legal cover. According to the New York Times, one executive said in a meeting, “The only thing holding us back from performing as well as ChatGPT is literally the amount of data.”

Frontier labs like OpenAI and Anthropic are reportedly hitting a data wall, meaning there is not enough new data left to train large language models. Many leaders have denied this. OpenAI CEO Sam Altman put it plainly: “There is no wall.” OpenAI co-founder Ilya Sutskever, who left the company last May to start a new frontier lab, was more candid about the possibility of a data wall. At a major AI conference last month, Sutskever said: “Data has peaked and can’t go any further. We have to work with the data we have. There’s only one Internet.”

This lack of data has led to many strange new ways of obtaining unique data. Bloomberg reports that frontier labs, including OpenAI and Google, are paying digital content creators $1 to $4 per minute for unused video footage to train their LLMs (both companies have competing AI video generation products).

Companies like Meta and OpenAI want to grow their AI systems as quickly as possible, so things are bound to get a little messy. A judge partially dismissed Kadrey and Silverman’s class action lawsuit last year, but the evidence outlined here could strengthen parts of their case as it moves through the courts.


