ByteDance, the parent company of TikTok, recently unveiled a new AI tool that can use a single photo to generate realistic videos of people talking or playing instruments. ByteDance says the new tool, known as OmniHuman-1, generates "very realistic human videos based on weak signal inputs, especially audio."
OmniHuman-1: What do we know?
In a research paper recently published on arXiv, the company announced the development of a new AI tool that can seamlessly work with images of any aspect ratio. Whether the input is a portrait, a half-body shot, or a full-body image, the tool can produce highly realistic and detailed results across a wide range of scenarios. This level of versatility marks a significant advance over existing AI models, many of which are limited to changing facial expressions and generating simple lip-sync effects to make a static image look as if it were talking.
According to details shared on the OmniHuman-1 page hosted on Beehiiv, the research team released several sample videos demonstrating the tool's impressive capabilities. These examples feature dynamic hand gestures, full-body movements captured from multiple angles, and even animated sequences of moving animals, highlighting the model's adaptability and accuracy. One standout example is a black-and-white video in which OmniHuman-1 animates the famous physicist Albert Einstein.
ByteDance, the company behind the tool, claims that OmniHuman-1 was trained on an extensive dataset of more than 18,700 hours of human video footage. The training process drew on a variety of input types, including text prompts, audio clips, and physical pose data, allowing the model to accurately replicate natural human movements and expressions.
The researchers claim that OmniHuman-1 currently outperforms similar AI systems across multiple performance benchmarks, setting a new standard for image-to-video generation technology. Although it is not the first tool designed to convert static images into dynamic video, ByteDance's model appears to have an edge over its competitors in the sheer volume of footage available for the AI to learn from.