Google DeepMind, Google’s AI research arm, in collaboration with the Massachusetts Institute of Technology (MIT) and New York University (NYU), has published new research introducing inference-time scaling for diffusion models.
The study, titled “Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps,” investigates what happens when image generation models are given additional computing resources while producing their results.
Diffusion models start the process from “pure noise” and require multiple denoising steps to arrive at a clean output that matches the input. “In this study, we investigate the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how generation performance can further improve with increased computation,” the authors write.
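For readers unfamiliar with that loop, here is a minimal sketch of diffusion sampling. The `denoiser` is a stand-in for a trained model that predicts the noise present in the current sample, and the update rule is deliberately simplified (real samplers such as DDPM or DDIM use a learned noise schedule); none of this is the paper’s actual code:

```python
import torch

def generate(denoiser, steps: int = 50, shape=(1, 3, 64, 64)):
    """Start from pure Gaussian noise and iteratively denoise.

    `denoiser` is a placeholder for a trained model that predicts the
    noise present in `x` at timestep `t`; the update below is a crude
    simplification of a real sampler schedule.
    """
    x = torch.randn(shape)  # the "pure noise" starting point
    for t in reversed(range(steps)):
        predicted_noise = denoiser(x, t)
        x = x - predicted_noise / steps  # peel off a little noise per step
    return x

# Dummy denoiser so the sketch runs end to end.
sample = generate(lambda x, t: 0.1 * x, steps=50)
print(sample.shape)  # torch.Size([1, 3, 64, 64])
```

Increasing `steps` is the standard way to spend more compute at inference time; the paper asks what can be gained beyond that axis.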
The study found that increasing inference-time computation “significantly improved” the quality of the samples produced. The accompanying technical report details the components and techniques used.
Nanye Ma, one of the researchers, said the study found improvements when searching for better starting noises. “This suggests that investing compute to look for better noise pushes the scaling limit on inference time,” he said on X.
“Our search framework consists of two components: a verifier that provides feedback and an algorithm that finds better noise candidates,” he added.
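In its simplest form (random search over starting noises), that framework might look like the sketch below. Here `run_diffusion` and `verifier` are hypothetical placeholders, not the paper’s implementations; the actual work studies several verifiers and search algorithms:

```python
import torch

def search_starting_noise(run_diffusion, verifier,
                          num_candidates: int = 8,
                          shape=(1, 3, 64, 64)):
    """Random search over initial noises: generate one sample per
    candidate noise and keep whichever the verifier scores highest.

    `run_diffusion` maps a starting noise to an image;
    `verifier` returns a scalar quality score for an image.
    """
    best = (None, None, float("-inf"))  # (noise, image, score)
    for _ in range(num_candidates):
        noise = torch.randn(shape)   # candidate starting noise
        image = run_diffusion(noise) # full denoising run
        score = verifier(image)      # feedback from the verifier
        if score > best[2]:
            best = (noise, image, score)
    return best

# Toy placeholders so the sketch runs: an identity "sampler" and a
# verifier that simply rewards low overall signal energy.
noise, image, score = search_starting_noise(
    run_diffusion=lambda z: z,
    verifier=lambda img: -img.pow(2).mean().item(),
)
print(score)
```

Under this framing, spending more inference compute simply means trying more candidates or using a smarter search algorithm, which is the axis the study scales.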
The study compared the effectiveness of inference-time search techniques across different models and showed that small models with search can outperform larger models without it.
“These results demonstrate that the substantial training costs are partially offset by modest inference time computations, allowing for more efficient high-quality samples,” said Ma.
Scaling inference-time compute is a widely used concept in large language models, most notably in OpenAI’s o1 reasoning model.
“These studies show that LLMs can generate higher-quality and more contextually relevant responses by allocating more computation during inference, often through sophisticated search processes,” the authors state, providing motivation for applying these techniques to diffusion models.
This also appears to hold for diffusion models, as Google DeepMind and its collaborators demonstrate. Saining Xie, one of the authors, said he was surprised by how naturally diffusion models scale during inference. “We train with fixed flops, but we can increase this by (approximately) 1,000x at test time,” he said on X.
This research focuses primarily on image generation and is evaluated on text-to-image benchmarks, but if these techniques can be extended to video generation, OpenAI may find it difficult to beat Google: Google’s Veo 2 model already outperforms OpenAI’s Sora in both quality and prompt adherence.