TECH & OTHER NEWS

Hybrid AI model crafts smooth, high-quality videos in seconds

May 14, 2025

Source: MIT News

The CausVid generative AI tool uses a diffusion model to teach an autoregressive (frame-by-frame) system to rapidly produce stable, high-resolution videos.

What would a behind-the-scenes look at a video generated by an artificial intelligence model be like? You might think the process is similar to stop-motion animation, where many images are created and stitched together, but that’s not quite the case for “diffusion models” like OpenAl’s SORA and Google’s VEO 2.

Instead of producing a video frame-by-frame (or “autoregressively”), these systems process the entire sequence at once. The resulting clip is often photorealistic, but the process is slow and doesn’t allow for on-the-fly changes.

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Adobe Research have now developed a hybrid approach, called “CausVid,” to create videos in seconds. Much like a quick-witted student learning from a well-versed teacher, a full-sequence diffusion model trains an autoregressive system to swiftly predict the next frame while ensuring high quality and consistency. CausVid’s student model can then generate clips from a simple text prompt, turning a photo into a moving scene, extending a video, or altering its creations with new inputs mid-generation.

This dynamic tool enables fast, interactive content creation, cutting a 50-step process into just a few actions. It can craft many imaginative and artistic scenes, such as a paper airplane morphing into a swan, woolly mammoths venturing through snow, or a child jumping in a puddle. Users can also make an initial prompt, like “generate a man crossing the street,” and then make follow-up inputs to add new elements to the scene, like “he writes in his notebook when he gets to the opposite sidewalk.”

Brief computer-generated animation of a character in an old deep-sea diving suit walking on a leaf

A video produced by CausVid illustrates its ability to create smooth, high-quality content.

AI-generated animation courtesy of the researchers.

The CSAIL researchers say that the model could be used for different video editing tasks, like helping viewers understand a livestream in a different language by generating a video that syncs with an audio translation. It could also help render new content in a video game or quickly produce training simulations to teach robots new tasks.

Tianwei Yin SM ’25, PhD ’25, a recently graduated student in electrical engineering and computer science and CSAIL affiliate, attributes the model’s strength to its mixed approach.

“CausVid combines a pre-trained diffusion-based model with autoregressive architecture that’s typically found in text generation models,” says Yin, co-lead author of a new paper about the tool. “This AI-powered teacher model can envision future steps to train a frame-by-frame system to avoid making rendering errors.”

Yin’s co-lead author, Qiang Zhang, is a research scientist at xAI and a former CSAIL visiting researcher. They worked on the project with Adobe Research scientists Richard Zhang, Eli Shechtman, and Xun Huang, and two CSAIL principal investigators: MIT professors Bill Freeman and Frédo Durand.

Caus(Vid) and effect

Many autoregressive models can create a video that’s initially smooth, but the quality tends to drop off later in the sequence. A clip of a person running might seem lifelike at first, but their legs begin to flail in unnatural directions, indicating frame-to-frame inconsistencies (also called “error accumulation”).

Error-prone video generation was common in prior causal approaches, which learned to predict frames one by one on their own. CausVid instead uses a high-powered diffusion model to teach a simpler system its general video expertise, enabling it to create smooth visuals, but much faster.

Hybrid AI model crafts smooth, high-quality videos in seconds

LEAVE A REPLY Cancel reply

Top Stories

AI Is Transforming Jobs Faster Than Organizations Can Transform Work

Global Economic Outlook Hangs in Balance between Geopolitical Headwinds and AI boost, Chief Economists...

Low Job Confidence, Uneven AI Adoption and Rising Anxiety is Reshaping Employee Sentiment in...

By 2027, 40% of Enterprises Will Demote or Decommission Autonomous AI Agents Due to...

Sixty-One Percent of CEOs Say Their Boards Are Rushing AI Transformation

Cyber Security

Threat Actors Spoofing FIFA Websites in Advance of the 2026 World Cup

Kaspersky reports 17% of major Mexico cities open Wi-fi spots unsecure

Despite robust security measures, credential abuse techniques remain the most effective attack method

Phishing-as-a-Service (PhaaS): Understanding the Growing Cyber Threat and How to Stay Safe

Kaspersky: Nearly 5 in 10 victims of tech-enabled abuse know their perpetrator