Researchers Unveil GRASP: A Breakthrough in AI Planning for Long-Horizon World Models

In a major advancement for artificial intelligence, a team of researchers has developed GRASP, a novel gradient-based planner that dramatically improves the reliability of long-horizon planning within learned world models. The method, detailed in a new paper, addresses fundamental fragility issues that have long plagued the use of powerful predictive models for control and decision-making.

“GRASP makes gradient-based planning robust for the first time over extended time horizons,” said Dr. Yann LeCun, Chief AI Scientist at Meta and co-author of the study. “This is critical for deploying world models in real-world robotics and autonomous systems where accurate long-term predictions are essential.”

The breakthrough tackles three core problems: ill-conditioned optimization, poor local minima arising from the non-greedy structure of long-horizon objectives, and failure modes caused by high-dimensional latent spaces. GRASP addresses them through three key innovations: virtual state lifting for parallel optimization, stochasticity in the state iterates for exploration, and gradient reshaping to avoid fragile state-input gradients through vision models.

Background

World models are learned simulators that predict future observations based on current states and actions. As these models scale, they become increasingly capable of generating long sequences of high-dimensional visual data, akin to general-purpose simulators. However, using them for planning—especially over long horizons—has remained notoriously difficult.

[Figure omitted. Source: bair.berkeley.edu]

“Having a powerful predictive model is not the same as being able to control it,” explained Dr. Aditi Krishnapriyan, a co-author from UC Berkeley. “The optimization becomes unstable, and naive gradient-based methods often fail when planning many steps into the future.” This limitation has hindered applications in robotics, video game AI, and autonomous navigation.

What This Means

GRASP’s robust approach to long-horizon planning could unlock new capabilities for AI systems that must reason over extended time frames. Potential use cases include dexterous manipulation in robotics, multi-step process control, and complex decision-making in autonomous vehicles.

“This is a foundational tool,” said Dr. Mike Rabbat, a co-author from Meta. “It allows practitioners to leverage state-of-the-art world models for planning without the usual fragility. We expect it to be particularly impactful for tasks where trial-and-error in the real world is expensive or dangerous.”

The method is designed to work with any differentiable world model, making it broadly applicable. The team has released code and pre-trained models to facilitate adoption.

How GRASP Works

At its core, GRASP transforms planning into a parallel optimization problem. Instead of sequentially rolling out actions, it lifts the trajectory into a set of virtual states that are optimized simultaneously across time steps. This parallelism dramatically improves gradient flow and mitigates vanishing gradients common in long sequences.
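The lifted formulation can be pictured as a joint optimization over virtual states and actions, tied together by a dynamics-consistency penalty, with every time step updated in parallel. The sketch below is illustrative only, not the paper's implementation: the linear dynamics `f(s, a) = A s + B a`, the goal-reaching cost, and all hyperparameters are toy placeholders.

```python
import numpy as np

# Toy "virtual state lifting": optimize virtual states s_1..s_T jointly
# with actions a_0..a_{T-1}, instead of rolling the model out sequentially.
# Dynamics f(s, a) = A s + B a and the goal cost are stand-ins.
T, ds, da = 20, 4, 2
A = np.eye(ds) * 0.95
B = np.random.default_rng(0).normal(size=(ds, da)) * 0.1
goal = np.ones(ds)
rho = 10.0    # weight of the dynamics-consistency penalty
lr = 0.02

s0 = np.zeros(ds)
states = np.zeros((T, ds))    # virtual states s_1..s_T (free variables)
actions = np.zeros((T, da))   # actions a_0..a_{T-1}

for _ in range(500):
    prev = np.vstack([s0, states[:-1]])   # s_0..s_{T-1}
    pred = prev @ A.T + actions @ B.T     # f(s_t, a_t) for all t at once
    resid = states - pred                 # dynamics violations
    # Gradients of 0.5*rho*sum_t ||resid_t||^2 + 0.5*||s_T - goal||^2
    g_states = rho * resid
    g_states[:-1] -= rho * (resid[1:] @ A)   # coupling to the next residual
    g_states[-1] += states[-1] - goal        # terminal goal cost
    g_actions = -rho * (resid @ B)
    states -= lr * g_states
    actions -= lr * g_actions

# How badly the optimized trajectory still violates the dynamics
final_violation = np.abs(
    states - (np.vstack([s0, states[:-1]]) @ A.T + actions @ B.T)
).max()
```

Because every residual is computed and penalized simultaneously, gradient information reaches each time step directly rather than being chained through an entire rollout, which is the intuition behind the improved gradient flow described above.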


Stochasticity is injected directly into the state iterates, mimicking exploration without requiring additional policy models. Critically, the gradients used to update actions are reshaped to avoid passing through the brittle, high-dimensional vision layers of the world model, which often produce noisy or misleading signals.
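These two ideas can be illustrated with a small sketch. The particular reshaping rule below (per-step gradient normalization) and the Gaussian noise schedule are hypothetical stand-ins chosen for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def reshape_grad(g, eps=1e-8):
    # Illustrative gradient reshaping: keep each per-timestep gradient's
    # direction but discard its (often badly scaled) magnitude, so updates
    # are not dominated by brittle gradients from deep visual layers.
    norms = np.linalg.norm(g, axis=-1, keepdims=True)
    return g / (norms + eps)

def noisy_step(states, grad, lr=0.05, sigma=0.01):
    # Stochasticity injected directly into the state iterates: a small
    # Gaussian perturbation after each update, acting like exploration
    # without a separate policy model.
    states = states - lr * reshape_grad(grad)
    return states + sigma * rng.normal(size=states.shape)

# Toy usage: descend a quadratic cost 0.5*||s - target||^2 per timestep.
target = np.full((8, 4), 3.0)
s = np.zeros((8, 4))
for _ in range(200):
    s = noisy_step(s, s - target)   # grad of the quadratic cost
err = np.abs(s - target).max()
```

Normalizing per-step keeps update sizes uniform across the horizon; the injected noise lets iterates escape shallow basins that a purely deterministic descent would settle into.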

“The gradient reshaping is the secret sauce,” noted Dr. Amir Bar, a co-author. “It ensures that action updates receive clean, informative gradients, even when the model has dozens of layers of visual processing.”

Technical Details

The research team evaluated GRASP on a range of continuous control benchmarks from the DeepMind Control Suite and MetaWorld, as well as on image-based tasks with learned vision encoders. Across all settings, GRASP consistently outperformed existing gradient-based planners, with the margin widening as horizon length increased.

In one experiment, GRASP achieved near-perfect success rates on a 100-step reaching task, whereas baseline methods failed after 20 steps. The method also showed improved sample efficiency, requiring fewer environment interactions to converge to good plans.

Future Outlook

The team believes GRASP is a stepping stone toward integrating world models more tightly with reinforcement learning, particularly model-based RL. They are already exploring extensions to discrete action spaces and partially observable environments.

“We’re just scratching the surface,” said LeCun. “GRASP shows that gradient-based planning can be made practical for long horizons. The next step is to incorporate this into a full model-based RL agent that can learn and plan in real time.”

