Karachi Chronicle

DeepMind’s JetFormer: Unified multimodal models without modeling constraints

By Adnan Mahar | December 26, 2024

Recent advances in training large-scale multimodal models are driven by efforts to eliminate modeling constraints and unify architectures across domains. Despite these advances, many existing models still rely on individually trained components, such as modality-specific encoders and decoders.

In a new paper, “JetFormer: An Autoregressive Generative Model of Raw Images and Text,” a Google DeepMind research team introduces JetFormer, an autoregressive decoder-only Transformer designed to model raw data directly. The model maximizes the likelihood of raw data without relying on any pre-trained components, and it can both understand and generate text and images.

The team summarizes JetFormer’s key innovations as follows:

Leveraging normalizing flows for image representation: The key insight behind JetFormer is its use of a powerful normalizing flow, called “jet,” to encode images into latent representations suitable for autoregressive modeling. Autoregressive modeling of raw image patches encoded as pixels has traditionally been impractical because of the complexity of their structure. JetFormer’s flow model addresses this by providing a lossless, invertible representation that integrates seamlessly with the multimodal model. At inference time, the invertibility of the flow enables direct decoding of images.

Guiding the model toward high-level information: To sharpen the focus on essential high-level information, the researchers employ two strategies:

  • Progressive Gaussian noise augmentation: Gaussian noise is added during training and gradually reduced, encouraging the model to prioritize the overall structure of an image early in the learning process.
  • Managing redundancy in image data: JetFormer allows redundant dimensions of natural images to be selectively excluded from the autoregressive model. As an alternative, principal component analysis (PCA) is explored to reduce dimensionality without sacrificing important information.
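To make the two ideas above more concrete, here is a minimal NumPy sketch. It is not the paper's code: it uses a generic affine coupling layer (a common building block of normalizing flows) with a toy one-layer conditioner, and a linear noise schedule chosen purely for illustration. The function names, shapes, and schedule are all assumptions. What it shows is that a coupling transform can be inverted exactly, which is the property that lets a flow serve as a lossless image encoder and decoder, and how a noise level might shrink over training.

```python
import numpy as np

def coupling_forward(x, w, b):
    """Affine coupling: rescale and shift the second half of x, conditioned on the first half."""
    x1, x2 = np.split(x, 2, axis=-1)
    h = np.tanh(x1 @ w + b)                      # toy conditioner (hypothetical, not the paper's network)
    log_scale, shift = np.split(h, 2, axis=-1)
    z2 = x2 * np.exp(log_scale) + shift
    return np.concatenate([x1, z2], axis=-1)     # first half passes through unchanged

def coupling_inverse(z, w, b):
    """Exact inverse of coupling_forward; no information is lost."""
    z1, z2 = np.split(z, 2, axis=-1)
    h = np.tanh(z1 @ w + b)                      # same conditioner, recomputed from the untouched half
    log_scale, shift = np.split(h, 2, axis=-1)
    x2 = (z2 - shift) * np.exp(-log_scale)
    return np.concatenate([z1, x2], axis=-1)

def noise_std(step, total_steps, start_std=1.0, end_std=0.0):
    """Linearly decay the noise standard deviation (assumed schedule, for illustration only)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_std + frac * (end_std - start_std)

rng = np.random.default_rng(0)
dim = 8                                          # flattened "patch" size, arbitrary for the demo
w = rng.normal(size=(dim // 2, dim)) * 0.1
b = np.zeros(dim)

patches = rng.normal(size=(4, dim))              # stand-in for four flattened image patches
latents = coupling_forward(patches, w, b)
recovered = coupling_inverse(latents, w, b)
max_error = float(np.max(np.abs(recovered - patches)))

# Noisy training input at an early step: heavy noise hides fine detail.
noisy = patches + noise_std(10, 100) * rng.normal(size=patches.shape)
```

In this sketch `max_error` should sit at floating-point round-off, reflecting the invertibility of the transform, while `noise_std` falls from `start_std` to `end_std` as training progresses. A real flow stacks many such layers and alternates which half is transformed.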

The team evaluated JetFormer on two challenging tasks: class-conditional image generation on ImageNet and web-scale multimodal generation. The results show that JetFormer is competitive with less flexible models and performs well on both image and text generation tasks when trained on large-scale data. Its ability to be trained end to end further underscores its flexibility and effectiveness.

JetFormer represents a major advance in simplifying multimodal architectures by unifying the modeling approaches for text and images. Its use of normalizing flows and its emphasis on prioritizing high-level features mark a step toward fully end-to-end generative modeling. This work lays a foundation for further exploration of unified multimodal systems and points toward a more integrated and efficient approach to AI model development.

The paper “JetFormer: An Autoregressive Generative Model of Raw Images and Text” is available on arXiv.

Author: Hekate He | Editor: Chain Zhang

